Found it! It was a rather sneaky recursion path that resulted in returning an npkt twice, which
is why the npkt looked like the buffer with the Ethernet header had been properly inserted in the
list and then removed - that’s exactly what had happened. Once again, it sure would have been
helpful to be able to observe the io-net internals, and see that the counter had gone negative
instead of bumping into the limit.
This one doesn’t ring a bell. It sounds like undetermined memory
corruption. The arp module has (or is supposed to have) exclusive
access to the npkt itself as down headed packets go to each module
in turn. Thus the lack of any locks when the actual TAILQ_INSERT_HEAD
is performed.
I’d look for something being freed too early / twice.
-seanb
John A. Murphy <murf@perftech.com> wrote:
While chasing this problem, I’ve isolated a case where the ARP convertor sends the Ethernet
driver an npkt that has tot_iov set to 2, but has just one buffer on the buffer list; i.e.,
an IP packet with no Ethernet header. There are two npkt_done_t’s registered, and the
second one points to a buffer with an Ethernet header. And that buffer, the one containing
the Ethernet header, has its tqe_prev pointing at the npkt, as it should be, but its
tqe_next points off into outer space. In other words, it looks like the ARP convertor’s
add_header routine got interrupted in the middle of doing its TAILQ_INSERT_HEAD - or else
both the npkt and its buffers list got corrupted on the way from the ARP convertor to the
Ethernet driver - or else ???
Any ideas?
Murf
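The mismatch described above (tot_iov says 2, but only one buffer is on the list) is the kind of thing a driver could check for at its entry point. What follows is only a sketch: `struct pkt`, `struct buf`, and the `pkt_*` helpers are hypothetical stand-ins, not the real io-net npkt layout. It assumes a `<sys/queue.h>` that provides the TAILQ macros, including TAILQ_FOREACH.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-ins for an npkt and its buffers; NOT the io-net types. */
struct buf {
    TAILQ_ENTRY(buf) link;      /* supplies tqe_next / tqe_prev */
};
TAILQ_HEAD(buf_list, buf);

struct pkt {
    int tot_iov;                /* what the packet claims to carry */
    struct buf_list buffers;    /* what is actually on the list */
};

static void pkt_init(struct pkt *p)
{
    p->tot_iov = 0;
    TAILQ_INIT(&p->buffers);
}

/* Prepend a buffer (e.g. a link-layer header) and keep tot_iov in step. */
static void pkt_prepend(struct pkt *p, struct buf *b)
{
    TAILQ_INSERT_HEAD(&p->buffers, b, link);
    p->tot_iov++;
}

/* Walk the list and count the buffers really linked to the packet. */
static int pkt_count_buffers(struct pkt *p)
{
    int n = 0;
    struct buf *b;

    TAILQ_FOREACH(b, &p->buffers, link)
        n++;
    return n;
}

/* Non-zero when the claimed and actual buffer counts agree. */
static int pkt_is_consistent(struct pkt *p)
{
    return p->tot_iov == pkt_count_buffers(p);
}
```

If a buffer is taken off the list (TAILQ_REMOVE) without tot_iov being adjusted, pkt_is_consistent() returns 0, which is the state seen at the driver. The removed buffer's own link fields may still hold stale values afterwards, so they prove nothing about list membership.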
Sean Boudreau wrote:
If your driver isn’t seeing the packet to tx, the error is being raised by
io-net because it thinks the driver has reached its limit as far as how
many packets it’s allowed to queue up for tx. The default limit is 64K
and can be manipulated with the DCMD_IO_NET_MAX_QUEUE ioctl. Every time
you receive a packet to tx, io-net increments the count. Every time you
call ion->tx_done() on a packet, io-net decrements the count.
I’d suggest making the limit a large value and monitor memory usage
with ‘pidin me’. If io-net grows until ENOBUFS, you’re probably leaking
packets. If it doesn’t, you may be releasing a packet twice?
-seanb
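The accounting described above can be modelled in a few lines. This is a sketch under assumptions, not io-net's code: `struct tx_account`, `account_tx()` and `account_tx_done()` are made-up names, and the counter is assumed to be unsigned, so an extra release wraps it around rather than leaving it at -1.

```c
#include <errno.h>
#include <limits.h>

/* Hypothetical model of per-driver tx accounting; NOT the io-net internals. */
struct tx_account {
    unsigned int num_queued;    /* packets handed down, not yet completed */
    unsigned int max_queue;     /* limit, e.g. 65536 for the 64K default */
};

/* Packet handed to the driver for tx: refuse at the limit, else count it. */
static int account_tx(struct tx_account *a)
{
    if (a->num_queued >= a->max_queue)
        return ENOBUFS;
    a->num_queued++;
    return 0;
}

/* Driver reports completion (the tx_done() step): uncount the packet.
 * With an unsigned counter, one release too many wraps 0 to UINT_MAX. */
static void account_tx_done(struct tx_account *a)
{
    a->num_queued--;
}
```

In this model both failure modes end the same way. A leak lets num_queued climb until it reaches max_queue; a double release wraps it past max_queue in one step. Either way every later account_tx() returns ENOBUFS, which matches the "one failure, then nothing can be sent" symptom.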
John A. Murphy <murf@perftech.com> wrote:
At the moment I’m seeing it with one of my own drivers, which is why I’m fairly sure
that the driver has released all its packets (I have a devctl that tells me if there
are outstanding xmit packets). Is there any way for a driver to find the address of
_ion, so that I can watch the num_queued count myself? I may well have a bug in this
driver that’s causing all the problems, but it would sure help the debug effort to
know what io-net thinks is happening.
Murf
Sean Boudreau wrote:
I think the algorithm has been described in this thread. If you
always get ENOBUFS, the driver isn’t releasing packets. Or a
packet has been released twice and a counter has wrapped around…
Which versions of which drivers are you seeing this with?
-seanb
John A. Murphy <murf@perftech.com> wrote:
I’ve noticed this same problem, “once I receive a single
TX_DOWN_FAILED/ENOBUFS, I can no longer send any packets. Any attempt to send a
packet results in this error”, in several different drivers. Anybody got any
idea what’s going on?
Murf
Shaun Jackman wrote:
I noticed in pcnet_transmit_packets() (the rx_down function) the driver
never actually refuses a packet. So, io-net must be returning ENOBUFS to me
like you said. My problem is that once I receive a single
TX_DOWN_FAILED/ENOBUFS, I can no longer send any packets. Any attempt to
send a packet results in this error. So, what mechanism does io-net use to
enforce this limit? How does it calculate the number of outstanding packets?
I think somehow I’m not playing by the rules, and io-net is believing that
some number of packets that have actually been sent are unaccounted for.
Thanks,
Shaun
Sean Boudreau <seanb@qnx.com> wrote in message
news:aekqmk$6pa$1@nntp.qnx.com…
The driver can either manage its queue length itself, or it
can use the DCMD_IO_NET_MAX_QUEUE devctl, which tells io-net to
enforce the limit. And yes, errno is set.
-seanb