IO-Net failing under heavy network load

Hi, I’m currently developing a network driver based on the reference
PCNet driver from QNX.

This driver works well most of the time, but I have a strange problem
under heavy network traffic and would appreciate some suggestions.

The symptom of the problem is that io-net fails to send packets from the
IP stack to the transmit function of the driver. This eventually causes
the IP stack to report that there is no buffer space available.

The receive side of the driver is still working fine, and nicinfo reports
packets being accepted. Checking the driver shows that all appears well:
it’s not blocked or waiting for anything, and the last transmission
appears to have completed correctly.

Is there any way to debug what io-net is doing in this situation? I have
tried placing a filter module between the driver and IP_EN; this shows
packets going across the filter but not being consumed by the driver.

An examination of the RX side of the driver shows that RX packets are
being held by the upper stack. Typically I see a packet level of 3 or 4
in the stack when operating normally; when the driver starts to fail,
this rises to around 100! When the TX side finally dies, these packets
get released and the RX level drops back to around 3 or 4. I’ve tried
examining the packets just as the failure starts, but they appear OK.

Additionally, I have seen references to stack-size issues with TCP/IP. Is
there any way to check the size of the stack while it is running, or any
guidelines as to what will cause the IP stack to fail?

I’m running the driver with QNX 6.1A on both PPC and x86 targets. The
problem appears on both systems.


Thanks in Advance

Dave

You are essentially being flow controlled. When the driver is
sent a packet to tx, io-net increments a counter. When you
call ion->tx_done() upon tx completion, io-net decrements the
counter. If too many packets are outstanding in the driver,
io-net won’t send you any more and starts returning ENOBUFS on
your behalf. The threshold is controlled with the DCMD_IO_NET_MAX_QUEUE
devctl (I bet you’re setting it to 100).
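
To make that accounting concrete, here’s a rough model of what io-net is
doing on your behalf (illustration only: the real counter lives inside
io-net, and all of the names below are made up):

    #include <errno.h>
    #include <stdio.h>

    /* Toy model of io-net's per-driver flow control.                       */
    static int outstanding;        /* packets currently held by the driver  */
    static int max_queue = 100;    /* threshold set via DCMD_IO_NET_MAX_QUEUE */

    static int hand_to_driver(void)         /* stack tries to tx one packet */
    {
        if (outstanding >= max_queue) {
            errno = ENOBUFS;                /* refused on the driver's behalf */
            return -1;
        }
        outstanding++;                      /* undone when the driver calls tx_done() */
        return 0;
    }

    static void driver_tx_done(void)        /* driver hands the packet back */
    {
        outstanding--;
    }

    int main(void)
    {
        int i, sent = 0;

        for (i = 0; i < 150; i++) {
            if (hand_to_driver() == -1)
                break;                      /* never calling tx_done(): stuck at 100 */
            sent++;
        }
        printf("accepted %d packets before ENOBUFS\n", sent);

        driver_tx_done();                   /* one completion frees one slot */
        printf("next hand-off returns %d\n", hand_to_driver());
        return 0;
    }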

The fact that you’re seeing 100 RX packets in the stack is probably a
red herring. What is probably happening is that the stack is getting
the packet, swapping a few fields, and sending the same packet back
to the driver to tx; i.e. they are probably queuing in the driver, not
the stack.

One common cause of this is waiting for another packet to tx before
releasing those that have already been tx’d. You see the catch-22…
Or you may simply be leaking packets…
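
The usual fix is to reap completed descriptors on their own trigger (the
tx-done interrupt, or at the top of your tx entry point) rather than
waiting for the next packet to come down. A rough sketch, assuming a
PCNet-style descriptor ring; the dev fields, owned_by_host() and
pcnet_reap_tx() are hypothetical, not taken from the reference driver:

    /* Hand back every packet the hardware has finished with, so io-net's
     * counter drops without waiting for a new tx to arrive.              */
    static void pcnet_reap_tx(pcnet_dev_t *dev)
    {
        while (dev->tx_reap != dev->tx_head &&
               owned_by_host(&dev->tx_ring[dev->tx_reap])) {
            npkt_t *npkt = dev->tx_pkts[dev->tx_reap];

            dev->tx_pkts[dev->tx_reap] = NULL;
            dev->tx_reap = (dev->tx_reap + 1) % dev->num_tx;

            dev->ion->tx_done(dev->reg_hdl, npkt);  /* io-net decrements its counter */
            dev->q_len--;
        }
    }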

If you don’t like io-net doing the flow control for you, don’t
call the DCMD_IO_NET_MAX_QUEUE devctl at all (the default is 65535).
You’ll continue getting packets, but you should impose your own cap,
something like:

if (q_len >= max_pkts) {
    release_any_outstanding();        /* hand back any completed tx packets      */
    tickle_tx();                      /* kick the transmitter in case it stalled */
    npkt->flags |= _NPKT_NOT_TXED;    /* mark this packet as not transmitted     */
    ion->tx_done(reg_hdl, npkt);      /* return it to io-net                     */
    errno = ENOBUFS;
    return -1;
}
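
As I read it, that check belongs at the top of your transmit entry point,
before the packet is queued on the ring: setting _NPKT_NOT_TXED and handing
the packet straight back via tx_done() tells io-net it was never sent, and
the -1/ENOBUFS return is what the stack ultimately sees.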

-seanb


Hi Sean,

Thanks for the info; I believe it has enabled me to fix the
symptom rather than the cause! What I’m seeing now is the driver
running without any problem (about 3-6 packets queued).

After a period of time under heavy loading, the io-net subsystem crashes.
The core file, io-net.core, indicates that thread 6 has suffered a
segmentation fault (SIGNALLED-SIGSEGV, fltno=11).

Is there any way to identify what thread 6 actually is?

At this point there are no signs that anything is wrong with the
driver. However, I do occasionally see that my receive-complete routine
appears to be called twice for the same npkt. Is this normal? In
that case, what happens if I try to free the npkt again?

I should point out that io-net doesn’t crash when I see this event.

Cheers

Dave




Sorry for the late response. Been on vacation…

There’s nothing magical about thread number 6. Which
number a thread gets depends only on the order things
are started in. If you do a simple ‘pidin | grep io-net’
at priority 10 when things are relatively quiescent, you’ll
often see the stack’s thread reply-blocked at prio 20, whereas
the driver’s thread is often at 21. If you start the stack
first (io-net -ptcpip …), it often gets thread 6.

I can’t think of any case where your tx_done() routine should
be called twice on a packet.

Regarding your email about stack usage in npm-tcpip.so: it
is difficult to detect after the fact, and the symptoms can
be intermittent and inconsistent. The 6.2 stack is better
about catching this as it happens, and has a command-line
option to vary the stack size if you’re hitting the condition.

-seanb
