strange packets from io-net

I’ve been having a strange problem with packets that io-net is passing up to my
ncm. My ncm has a down-type of ‘en’ and an up-type of ‘vmac’ (our own type).

After a while (measured in number of packets received) I will get a packet that
appears to be offset by two bytes. Thereafter, it happens more and more
frequently. The problem goes away when I restart io-net. For example, here is
just the 802.3 header of an expected packet and of the actual packet received
from io-net:

  • expected:
    00 40 05 7c 8e fb 00 90 c2 c1 87 86 00 04 …
  • actual:
    00 40 00 40 05 7c 8e fb 00 90 c2 c1 87 86 00 04 …

Naturally, the length is also wrong: it reads 0x8786, which is really the last
two bytes of the source MAC address, instead of whatever length the real
packet has.
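To see where 0x8786 comes from, here is a small C sketch that parses the two header dumps the way a receiver would (6-byte destination MAC, 6-byte source MAC, 16-bit big-endian length). The function and array names are made up for illustration; only the byte values come from the post.

```c
#include <stdint.h>
#include <stdio.h>

#define ETH_ADDR_LEN 6   /* bytes per MAC address */

/* The 802.3 length field sits at offset 12, in network (big-endian) order. */
unsigned hdr_len_field(const uint8_t *frame)
{
    return ((unsigned)frame[12] << 8) | frame[13];
}

/* Print destination MAC, source MAC and length as a receiver would see them. */
void dump_hdr(const char *label, const uint8_t *frame)
{
    int i;

    printf("%s dst=", label);
    for (i = 0; i < ETH_ADDR_LEN; i++)
        printf("%02x", frame[i]);
    printf(" src=");
    for (i = ETH_ADDR_LEN; i < 2 * ETH_ADDR_LEN; i++)
        printf("%02x", frame[i]);
    printf(" len=0x%04x\n", hdr_len_field(frame));
}

/* Header bytes from the post: what Ethereal saw on the wire... */
const uint8_t expected_hdr[14] = {
    0x00, 0x40, 0x05, 0x7c, 0x8e, 0xfb,   /* destination MAC */
    0x00, 0x90, 0xc2, 0xc1, 0x87, 0x86,   /* source MAC */
    0x00, 0x04                            /* length */
};

/* ...and what arrived from io-net: the same bytes behind a stray "00 40". */
const uint8_t actual_hdr[16] = {
    0x00, 0x40,
    0x00, 0x40, 0x05, 0x7c, 0x8e, 0xfb,
    0x00, 0x90, 0xc2, 0xc1, 0x87, 0x86,
    0x00, 0x04
};
```

Calling dump_hdr("actual", actual_hdr) prints `actual dst=00400040057c src=8efb0090c2c1 len=0x8786`, while hdr_len_field(actual_hdr + 2), which skips the two stray bytes, gives back the expected length of 4.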

Ethereal sniffing on a different computer shows that the expected packets are
what is really out on the wire, so the problem lies between the wire and my
ncm.

I’ve been able to kludge around it with the following code, and this works
reliably. Of course that doesn’t change the fact that it’s a kludge…

    p = ni->iov_base;
    if (p[0] == 0x00 && p[1] == 0x40 && p[2] == 0x00 && p[3] == 0x40)
    {
        _802_3_hdr_t *hdr;
        ni->iov_base = ni->iov_base + 2;

        hdr = (_802_3_hdr_t*)ni->iov_base;
        ni->iov_len = sizeof(_802_3_hdr_t) + ntohs(hdr->len);

        log_warning("vmac_en::rx_up: skewampus packet detected, correcting");
    }

I recognize that hard-coding the MAC address does nothing to make it less
kludgey. It’s just a stop-gap measure.

So, is this a bug in io-net, the devn-rtl.so driver, or my code? I can’t see
anything in my code that would cause it, but it’s possible that I’m not
managing memory properly and that this is somehow confusing io-net or the en
driver. Has anyone seen anything like this before?

I neglected to mention that in these packets ni->iov_len is 1520. The
real packets are only 60 bytes.
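One MAC-independent way to flag such packets (a diagnostic sketch, not a fix) is to ask whether the length field is plausible at all. An 802.3 length can be at most 1500, and a frame shorter than 60 bytes (excluding the 4-byte FCS) is padded up to 60. For the expected header, 14 + 4 = 18, padded to 60, which matches the 60-byte packets; the shifted header claims 0x8786 = 34694 bytes, which cannot be a valid length. The helper names are hypothetical, and the sketch assumes the driver hands frames up without the FCS.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN      14    /* dst MAC + src MAC + length field */
#define ETH_MAX_PAYLOAD  1500  /* largest valid 802.3 length value */
#define ETH_MIN_FRAME    60    /* minimum frame size, excluding the 4-byte FCS */

/* Expected size of an 802.3 frame (FCS excluded) given its length field:
 * header plus payload, padded up to the 60-byte minimum.
 * Returns 0 if the field is not a plausible 802.3 length
 * (values above 1500 are not valid lengths; 0x0600 and up are EtherTypes). */
size_t expected_frame_len(const uint8_t *frame)
{
    size_t len = ((size_t)frame[12] << 8) | frame[13];
    size_t total;

    if (len > ETH_MAX_PAYLOAD)
        return 0;
    total = ETH_HDR_LEN + len;
    return total < ETH_MIN_FRAME ? ETH_MIN_FRAME : total;
}

/* Does a receive buffer of buf_len bytes plausibly hold one 802.3 frame? */
int frame_looks_sane(const uint8_t *frame, size_t buf_len)
{
    size_t want;

    if (buf_len < ETH_HDR_LEN)
        return 0;
    want = expected_frame_len(frame);
    return want != 0 && want <= buf_len;
}
```

In the converter, something like frame_looks_sane(p, ni->iov_len) could stand in for the hard-coded MAC comparison when logging suspect packets, without assuming anything about the addresses involved.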

Hans Fugal <fugalh@byu.edu> wrote:

[quoted text snipped]

It doesn’t ring a bell. If you just pass packets straight through
your filter, does it happen? If not, I’d suspect your filter. If
so, I’d suspect the driver. Try with another driver to verify.

-seanb

On Fri, 22 Aug 2003 18:49:16 +0000, Sean Boudreau wrote:

[quoted text snipped]

My converter just passes the packet straight through. My protocol module
sees the packets just as I pass them up from the converter. Following is the
rx_up() function of my converter in full. I plan to try another driver as
soon as we can get our hands on another NIC.

    int vmac_en_rx_up(npkt_t *npkt,
                      void *func_hdl,
                      int off,
                      int framlen_sub,
                      uint16_t cell,
                      uint16_t endpoint,
                      uint16_t iface)
    {
        net_iov_t *ni;
        vmac_en_hdl *hdl;
        uint8_t *p;

        ni = TAILQ_FIRST(&npkt->buffers)->net_iov;
        hdl = (vmac_en_hdl *) func_hdl;

        // skewampus length stop-gap
        p = ni->iov_base;
        if (p[0] == 0x00 && p[1] == 0x40 && p[2] == 0x00 && p[3] == 0x40)
        {
            vmac_hdr_t *hdr;
            ni->iov_base = ni->iov_base + 2;

            hdr = (vmac_hdr_t*)ni->iov_base;
            ni->iov_len = sizeof(vmac_hdr_t) + ntohs(hdr->len);

            log_warning("vmac_en::rx_up: skewampus length detected, correcting");
        }

        //log_debug2("vmac_en_rx_up iov_len: %d", ni->iov_len);

        if (hdl->ion->tx_up(hdl->reg_hdl, npkt, off, framlen_sub,
                            hdl->cell, hdl->endpoint, 0 /* iface */) < 0)
            log_perror("tx_up failed");
        if (hdl->ion->tx_done(hdl->reg_hdl, npkt) < 0)
            log_perror("tx_done failed");
        return 0;
    }

This may not be your case, but you are supposed to check “off”:

    p = ni->iov_base + off;

You probably also have to make sure that iov_base + off doesn’t cross
the iov’s boundary.

-xtang
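A sketch of the check xtang describes: compute the start of the data as iov_base + off and make sure the bytes you intend to read fit inside that iov. mock_net_iov_t is a stand-in with only the two fields used here, not io-net’s real net_iov_t, and data_at_off() is a hypothetical helper. Casting to uint8_t * before adding off avoids relying on arithmetic on a void pointer (a GCC extension), in case iov_base is declared void *.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for io-net's net_iov_t; NOT the real header definition. */
typedef struct {
    void   *iov_base;
    size_t  iov_len;
} mock_net_iov_t;

/* Return iov_base + off if `need` bytes starting there lie inside this iov,
 * otherwise NULL. Data spanning into the next iov is not handled here. */
uint8_t *data_at_off(const mock_net_iov_t *ni, int off, size_t need)
{
    if (off < 0 || (size_t)off > ni->iov_len)
        return NULL;                        /* offset itself is out of range */
    if (need > ni->iov_len - (size_t)off)
        return NULL;                        /* would run past the iov's end */
    return (uint8_t *)ni->iov_base + off;
}
```

Hans reports off == 0 in his case, so this check would not change anything there, but it could matter whenever a packet arrives with a nonzero off.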

Hans Fugal <fugalh@byu.edu> wrote in message
news:pan.2003.08.22.19.48.13.123607@byu.edu

[quoted text snipped]

On Fri, 22 Aug 2003 16:33:42 -0400, Xiaodan Tang wrote:

[quoted text snipped]

That’s good to be aware of, thanks. Unfortunately I checked and found that
it isn’t the problem here.

The values I am getting are:
off == 0
framlen_sub == 0
npkt->framelen == 0
npkt->tot_iov == 1
ni->iov_len == 1520

I’m a bit surprised by npkt->framelen being 0. Normally framelen is the
true length of the packet (most of the time that’s 60, in my situation).

OK, another possibility.

What else is running inside io-net? What “byte_pat” did you register for
your vmac <-> en converter? I was thinking that the packet you see might go
to another converter and be modified there before it reaches your converter.

-xtang


Hans Fugal <fugalh@byu.edu> wrote in message
news:pan.2003.08.22.21.28.08.34974@byu.edu

[quoted text snipped]

devn-rtl.so, my ncm (ncm-vmac_en.so) and npm (npm-vmac.so) are the only
modules running in io-net.

I do:
ion->reg_byte_pat(ncm_hdl->reg_hdl, NULL, NULL, NULL, _BYTE_PAT_ALL);

After some further investigation I found that my stop-gap was not really
helping: it isn’t the desired packet after all, but a copy of an older
packet (e.g., from buffer reuse). I now check for framelen == 0, but how
often this happens is still quite unnerving. I will be trying different
hardware today as well. I have come across another D-Link card that uses the
same driver, and a Tulip-based card; I will let you know the results with both.

Hans Fugal wrote:

[quoted text snipped]

Are you checking the flags to make sure _NPKT_MSG isn’t set? It
doesn’t seem like you should be getting a lot of those packets, but they
can sure cause you trouble if you don’t check for them.

Murf
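A sketch of the kind of filter Murf suggests, combined with the framelen == 0 check Hans mentions adding: classify a packet before treating its buffer as a frame. mock_npkt_t and MOCK_NPKT_MSG are placeholders; the real npkt_t layout and the numeric value of _NPKT_MSG come from the io-net headers and are not reproduced here.

```c
#include <stdint.h>

/* Placeholder flag bit; NOT the real _NPKT_MSG value from the io-net headers. */
#define MOCK_NPKT_MSG 0x1u

/* Stand-in for the two npkt_t members this sketch looks at. */
typedef struct {
    uint32_t flags;
    uint32_t framelen;
} mock_npkt_t;

enum rx_kind {
    RX_DATA,   /* looks like an ordinary received frame */
    RX_MSG,    /* message packet: do not parse it as a frame */
    RX_EMPTY   /* framelen == 0: the suspicious case from this thread */
};

/* Decide how an rx_up() handler might treat a packet before touching
 * its buffers. The message check comes first, as Murf suggests. */
enum rx_kind classify_rx(const mock_npkt_t *npkt)
{
    if (npkt->flags & MOCK_NPKT_MSG)
        return RX_MSG;
    if (npkt->framelen == 0)
        return RX_EMPTY;
    return RX_DATA;
}
```

In Hans’s case the problem packets turned out not to be message packets, so the message check alone would not have caught them.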

On Tue, 26 Aug 2003 11:32:59 -0500, John A. Murphy wrote:

[quoted text snipped]

Not in the ncm, but I am in the npm. These packets are not _NPKT_MSG
packets, and they arrive exactly when a regular packet should arrive (and
that regular packet never does).

Still working on testing with other NICs and getting some more informative
stats to report.

Hans Fugal <fugalh@byu.edu> wrote in message
news:pan.2003.08.22.19.48.13.123607@byu.edu…
On Fri, 22 Aug 2003 18:49:16 +0000, Sean Boudreau wrote:

[quoted text snipped]

    if (hdl->ion->tx_up(hdl->reg_hdl, npkt, off, framlen_sub,
                        hdl->cell, hdl->endpoint, 0 /* iface */) < 0)
        log_perror("tx_up failed");
    if (hdl->ion->tx_done(hdl->reg_hdl, npkt) < 0)
        log_perror("tx_done failed");
    return 0;

I think this is your problem. You are supposed to do:

    if (hdl->ion->tx_up(hdl->reg_hdl, npkt, off, framlen_sub, hdl->cell,
                        hdl->endpoint, 0) <= 0) {
        hdl->ion->tx_done(hdl->reg_hdl, npkt);
    }
    return 0;

i.e., you are supposed to call tx_done() only if tx_up() returns <= 0.
Your code probably double-freed an npkt.

-xtang
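The ownership rule xtang gives can be illustrated with a toy reference-count model. This is not io-net’s implementation: the types and functions are invented, and the model follows xtang’s rule (a successful tx_up() hands the converter’s reference to the modules above) rather than the documentation wording quoted in the next message. With the original pattern, the converter’s extra tx_done() drops the count to zero while the protocol module still holds the packet, so the buffer goes back to the driver early. That would be consistent with the stale, reused buffers described earlier, although the thread itself does not confirm the mechanism.

```c
/* Toy reference-count model; NOT io-net source. */
typedef struct {
    int ref_cnt;          /* references still outstanding */
    int upper_holders;    /* upper modules still using the data */
    int releases;         /* times the buffer went back to the driver */
    int released_in_use;  /* set if released while an upper module held it */
} toy_pkt_t;

/* Drop one reference; the buffer is released when the count reaches zero. */
static void toy_tx_done(toy_pkt_t *p)
{
    if (--p->ref_cnt == 0) {
        p->releases++;
        if (p->upper_holders > 0)
            p->released_in_use = 1;   /* driver may now reuse the buffer */
    }
}

/* Hand the packet to n upper modules. If n > 0, the caller's reference
 * becomes theirs. Returns how many modules accepted it. */
static int toy_tx_up(toy_pkt_t *p, int n)
{
    if (n > 0) {
        p->ref_cnt = n;
        p->upper_holders = n;
    }
    return n;
}

/* An upper module finishing with the packet. */
static void toy_upper_done(toy_pkt_t *p)
{
    p->upper_holders--;
    toy_tx_done(p);
}

/* One received packet through the converter. unconditional != 0 selects the
 * original pattern (always call tx_done); 0 selects the corrected one
 * (call tx_done only if tx_up returned <= 0). */
toy_pkt_t simulate(int n_upper, int unconditional)
{
    toy_pkt_t p = { 1, 0, 0, 0 };   /* arrives holding the converter's reference */
    int i, r;

    r = toy_tx_up(&p, n_upper);
    if (unconditional || r <= 0)
        toy_tx_done(&p);
    for (i = 0; i < r; i++)
        toy_upper_done(&p);
    return p;
}
```

simulate(1, 1) ends with released_in_use set and ref_cnt at -1, while simulate(1, 0) releases the buffer exactly once, after the upper module is done. When nobody accepts the packet (n_upper == 0), both patterns call tx_done() once, so in this model the bug only shows up when the packet is actually passed up.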

Thanks, that seems to have solved it. I should point out that this flies in
the face of the documentation [1], so I assume the documentation needs to be
updated.

  1. http://www.qnx.com/developer/docs/momentics621_docs/ddk_en/network/io_net_self_t.html#tx_done
    “For upward-headed packets, this function is called by each module
    (including the originator) when finished with the packet. The single
    tx_done() stored in the packet is called when the ref_cnt member goes to
    zero.”

On Wed, 27 Aug 2003 23:48:15 -0400, Xiaodan Tang wrote:

[quoted text snipped]