More details on the libusbdi problem

To the usb guru,

I have been having problems with my USB program. It SIGSEGVs in the USB
library code from time to time. I looked at the USB library source code and
saw that physical memory is mapped using mmap(). After some tests, I found
that the program SIGSEGVs because the memory that is mmap’ed gets
corrupted.

In the USB library memory module, there is a control structure that points
to 13 memory buckets. Each bucket manages a list of memory header structs
that in turn are used to manage memory chunk entries. The problem occurs in
the header structs and the entry structs. Each memory chunk has a header
and a number of entries. The size of a memory chunk is 1 page (4K).
Through the function UseFreeEntry, memory is mapped for use. Normally, the
calling function is usbd_alloc. The memory that is mapped (when this
problem occurs) ranges from 18 to 64 bytes. After memory has been mapped,
it is used for various purposes by the USB library code. When it comes time
to free the memory, UnuseAllocEntry is called. This function should free up
the entry for others to use. It is here that I saw that the entry struct
had been corrupted. When the mapped entry was the first entry in the memory
chunk (right after the header), I saw that the header was corrupted as
well. Here are the details:

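    /* forward declaration so this excerpt compiles on its own */
    typedef union MemchunkEntry MemchunkEntry;

    typedef struct MemchunkHdr
    {
        struct MemchunkHdr *link;
        MemchunkEntry      *unused;
        paddr_t             paddr;
        unsigned short      used;
        unsigned short      ctrl;
    } MemchunkHdr;

    union MemchunkEntry
    {
        union MemchunkEntry *link;   /* used while the entry is free */
        struct MemchunkHdr  *owner;  /* used while the entry is mapped */
    };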

When the memory is mapped, the entry struct (4 bytes long) immediately
follows the header (16 bytes long). The mapped entry occupies more than 4
bytes (the MemchunkEntry struct plus the requested size). The owner element
is supposed to hold the address of the header. What I noticed was that the
contents of the header had been copied to an offset 8 bytes into the header.
For example, if the header is supposed to contain
01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10, then after the corruption
it contains 01 02 03 04 05 06 07 08 01 02 03 04 05 06 07 08 and the first
entry begins with 09 0A 0B 0C 0D 0E 0F 10. As you can see, this overwrites
the owner pointer and causes the segmentation fault.
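
The byte pattern above is what you would get if the 16-byte header were
copied to an address 8 bytes past its own start. Here is a minimal sketch
in C that reproduces the pattern in an ordinary byte buffer. The function
name simulate_shifted_copy and the two constants are made up for this
illustration and are not part of the USB library. The sketch shows only that
the dump is consistent with such a copy. It says nothing about who performs
it.

```c
#include <stddef.h>
#include <string.h>

#define HDR_SIZE 16   /* header size on this target, per the text above */
#define SHIFT     8   /* offset at which the stray copy appears to land */

/*
 * Hypothetical helper: fill the first 16 bytes of `chunk` with 01..10 (hex)
 * and then copy those 16 bytes to offset 8, as the dump suggests happened.
 * `chunk` must be at least HDR_SIZE + SHIFT bytes long.
 */
void simulate_shifted_copy(unsigned char *chunk)
{
    size_t i;

    for (i = 0; i < HDR_SIZE; i++)
        chunk[i] = (unsigned char)(i + 1);

    /* memmove, not memcpy, because source and destination overlap */
    memmove(chunk + SHIFT, chunk, HDR_SIZE);
}
```

After the call, bytes 0-7 and bytes 8-15 both hold 01..08, and bytes 16-23
(where the first entry starts) hold 09..10, which matches what I saw.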

I have also seen a segmentation fault occur when the entry used is not the
first in the memory chunk. In that case, the header itself was not
corrupted. I believe the contents of the header were copied over the
entry, which changed “owner” and caused the fault. The corrupted values
looked similar to some of the header’s values.

Can someone take a look at this? I know that the mapped memory is passed to
devu-ohci to use, so the problem may lie in the devu-ohci code. The window
in which this happens is quite small. I see it when usbd_descriptor is
called:

  1. memory is mapped for use - usbd_alloc.
  2. more memory is mapped for use - usbd_alloc_urb.
  3. calls to devu-ohci.
  4. the memory from step 2 (usbd_alloc_urb) is freed.
  5. the memory from step 1 (usbd_alloc) is freed.

On step 5, a segmentation fault occurs because entry->owner has been
corrupted. I did a printf of entry->owner when it is mapped and verified
that the value is good. Somewhere between the time it is mapped and the
time it is freed, the memory gets corrupted.
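
To narrow that window further, entry->owner could be checked at several
points between the allocation and the free. The sketch below assumes that
every chunk starts on a 4K boundary with its header first, so that the
owner of any entry should equal the entry's address rounded down to a page
boundary. The page alignment is my assumption (mmap returns page-aligned
mappings, but I have not confirmed that each chunk is its own mapping), and
owner_looks_valid is a hypothetical helper, not a library function.

```c
#include <stdint.h>

#define CHUNK_SIZE 4096u   /* one page, per the description above */

/*
 * Hypothetical check, not part of libusbdi. Returns 1 if `owner` is the
 * page-aligned base of the chunk that contains `entry`, 0 otherwise.
 * Assumes every chunk starts on a CHUNK_SIZE boundary with its header first.
 * Neither pointer is dereferenced, so it is safe on a corrupted owner.
 */
int owner_looks_valid(const void *entry, const void *owner)
{
    uintptr_t base = (uintptr_t)entry & ~(uintptr_t)(CHUNK_SIZE - 1);

    return (uintptr_t)owner == base;
}
```

Calling owner_looks_valid(entry, entry->owner) after each of steps 1 to 4
and logging the first failure would show which step the corruption falls
in.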

Here are a few things I tried:

  1. When I stopped using the pre-mapped memory (13 buckets), the problem
    went away. All memory was then mmap’ed on the fly when allocated and
    munmap’ed when no longer needed.
  2. When I added 12 bytes to the MemchunkHdr struct, the problem went away.
    These 12 bytes could be added anywhere in the struct. I used three
    uint32_t fields.
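
The effect of the 12 extra bytes on the layout can be modelled as follows.
This is only a sketch: pointers and paddr_t are replaced by uint32_t
stand-ins so that the sizes match the 16-byte header on any host, and the
names ModelHdr, ModelHdrPadded and the two functions are made up for the
illustration. Here the padding sits at the end of the struct, although
where it goes made no difference in my test.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit stand-ins: pointers and paddr_t are modelled as uint32_t so the
 * sizes match the 16-byte header described above on any host. */
typedef struct ModelHdr {
    uint32_t link;    /* struct MemchunkHdr *link */
    uint32_t unused;  /* MemchunkEntry *unused */
    uint32_t paddr;   /* paddr_t paddr (assumed to be 4 bytes) */
    uint16_t used;
    uint16_t ctrl;
} ModelHdr;

typedef struct ModelHdrPadded {
    uint32_t link;
    uint32_t unused;
    uint32_t paddr;
    uint16_t used;
    uint16_t ctrl;
    uint32_t pad[3];  /* the 12 extra bytes from the experiment */
} ModelHdrPadded;

/* Offset of the first entry, which immediately follows the header. */
size_t first_entry_offset(void)        { return sizeof(ModelHdr); }
size_t first_entry_offset_padded(void) { return sizeof(ModelHdrPadded); }
```

With the padding, the first entry moves from offset 16 to offset 28 in the
chunk. That may be why the symptom disappears even though whatever writes
into the chunk is presumably still there, but this is a guess on my part.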

One final note: this problem only occurs when my video driver is running.
The video driver also uses mmap. The evidence seems to indicate that the
video driver is just a catalyst that causes this problem to manifest itself.
The way the memory is corrupted (header contents copied within the same
chunk) supports this point.

Please help. I need to know for sure what is causing the problem. I am on
a tight schedule to get my project done by the end of May. If it is in the
usb code, I would appreciate a fix soon. If the problem lies elsewhere, I
would like to know ASAP too. Please give me a response either way.

Thank you.

Rex Lam