io-net crashes when running two ping processes

I have been debugging this problem for days with no luck. Any help from you
guys at QNX will be greatly appreciated.

I wrote a custom network driver (an up producer) whose type I call “bd”.
In the same shared object, I register an ip_bd convertor to convert “bd”
packets to IP and vice versa. The convertor is a pass-through (no conversion
is done, since the device driver transmits and receives IP packets over the
“custom” physical layer).

The driver works when a single ping application is running. The moment I
start a second ping while the first is running in the background, the
second ping crashes io-net. Using “pidin mem”, I found that the
instruction pointer at the time of the crash is inside the npm-tcpip.so
module. I don’t have the source for the stack, so there is no point
trying to dig deeper into exactly where the crash occurs.

I have disabled the receive path in the driver (commented out the thread
that waits for packets and sends them up). Hence, even if I ping a valid
destination, no ICMP reply will come in from the remote host. I have also
stubbed out most of the transmit code: all I do is accept the packet from
the upper layer, call tx_done() on it, and return TX_DOWN_OK. Despite all
these changes, I still see a crash when running two ping processes.

One noteworthy point: the tulip driver is also loaded and registers an
ethernet device, en0. The same two instances of the ping utility work
against en0 and do not crash io-net. This leads me to believe there is
something fishy about my driver. The main difference between the tulip
driver and mine is that I register both the up producer and the convertor
in the same shared object, while tulip uses the stock ip_en convertor.

My development environment is Windows with the QNX 6.1 Patch A SDK, and my
target is a PPC platform running QNX 6.1.

Once again, any help you can provide will be greatly appreciated. If
someone wishes to see a particular function, I would be glad to provide
source code stubs here.

Thanks

  • Murtaza

Try running with Qnet disabled…That might help

Sreekanth

“Murtaza” <murti@yahoo.com> wrote in message
news:b5g767$6t9$1@inn.qnx.com


Qnet was never running on this target board.

  • Murtaza

“Sreekanth” <nospam@nospam.com> wrote in message
news:b5ge6b$ds3$1@inn.qnx.com

Try running with Qnet disabled…That might help

Sreekanth


Nothing really stands out from that description. Post your
simplified transmit code I guess…

-seanb

Murtaza <murti@yahoo.com> wrote:


Below is part of the driver’s code.

Init structures:

// DLL entry
io_net_dll_entry_t io_net_dll_entry = {
    2,
    bdd_Init,
    NULL
};

// Up producer callbacks
io_net_registrant_funcs_t bddFuncs = {
    _IO_NET_REG_NFUNCS,
    NULL,
    bdd_Transmit,
    bdd_ReceiveComplete,
    bdd_Shutdown1,
    bdd_Shutdown2,
    bdd_Advertise,
    bdd_Devctl,
    bdd_FlushTxQueue,
    NULL,
};

// Producer registrant entry
io_net_registrant_t bddEntry = {
    _REG_PRODUCER_UP,
    "devn-bdd.so",
    "bd",
    NULL,
    NULL,
    &bddFuncs,
    0
};

// Convertor callbacks
io_net_registrant_funcs_t bdcFuncs = {
    _IO_NET_REG_NFUNCS,  // nfuncs
    bdc_RxUp,
    bdc_Transmit,
    bdc_ReceiveComplete,
    bdc_Shutdown1,
    bdc_Shutdown2,
    NULL,
    NULL,
    NULL,
    NULL,
};

// Convertor registrant entry
io_net_registrant_t bdcEntry = {
    _REG_CONVERTOR,
    "devn-bdd.so",
    "ip",
    "bd",
    NULL,
    &bdcFuncs,
    0,
};

// bdd_Init() -- the DLL entrypoint
int bdd_Init(void *dll_hdl, dispatch_t *dpp, io_net_self_t *ion, char *options)
{
    int ret;

    dpp = dpp;  // unused

    if ((ret = bdd_Detect(dll_hdl, ion, options)) != SUCCESS) {
        errno = ret;
        return FAILURE;
    }

    return SUCCESS;
}  // bdd_Init()

// bdd_Detect() -- for now, this function simply registers the convertor
// and one instance of the producer.
int bdd_Detect(void *dll_hdl, io_net_self_t *ion, char *options)
{
    int iface, totalInterfaces = 1;  // hardcoded to 1 for testing
    Nic_t *nic = NULL, *conv_nic = NULL;
    BDD_EXT *ext = NULL, *converterExt = NULL;
    uint16_t lan;
    char filename[20];

    // First we register our custom convertor with io-net
    if ((conv_nic = nic_create_dev(sizeof(BDD_EXT))) == NULL) {
        return ENODEV;
    }

    converterExt = (BDD_EXT *)(conv_nic->ext);
    memset((char *)converterExt, 0, sizeof(BDD_EXT));

    converterExt->ion = ion;
    converterExt->dll_hdl = dll_hdl;
    converterExt->verbose = 1;  // temp verbose
    converterExt->type = _REG_CONVERTOR;

    bdcEntry.func_hdl = (void *)conv_nic;

    if ((ion->reg(dll_hdl, &bdcEntry, &converterExt->reg_hdl,
                  &converterExt->cell, &lan)) != SUCCESS) {
        fprintf(stderr, "bdd: Unable to register the convertor with io-net\n");
        return FAILURE;
    }

    conv_nic->lan = lan;

    // Now register and advertise each device instance
    for (iface = 0; iface < totalInterfaces; iface++) {
        // Create a device structure and initialize it
        if ((nic = nic_create_dev(sizeof(BDD_EXT))) == NULL) {
            // If we can't create a device structure, unregister the
            // convertor and return ENODEV
            ion->dereg(converterExt->reg_hdl);
            return ENODEV;
        }

        // Fill in our private device structure
        ext = (BDD_EXT *)(nic->ext);
        memset((char *)ext, 0, sizeof(BDD_EXT));

        ext->type = _REG_PRODUCER_UP;
        ext->ion = ion;
        ext->dll_hdl = dll_hdl;
        // Save the convertor handle. We use this to send packets up to the
        // IP stack on the receive path, bypassing the convertor.
        ext->reg_convhdl = converterExt->reg_hdl;
        ext->verbose = 3;  // temp verbose
        ext->interfaceNum = iface;
        nic->media = NIC_MEDIA_CUSTOM;
        nic->phy = PHY_NOT_INSTALLED;
        nic->mtu = MAX_PACKET_SIZE;
        ext->fp = NULL;

        // Register and advertise each device to io-net.
        if ((bdd_RegisterDevice(nic, ion, dll_hdl)) != SUCCESS) {
            // Could not register our device: unregister the convertor
            ion->dereg(converterExt->reg_hdl);
            return ENODEV;
        }

        if ((bdd_Advertise(ext->reg_hdl, nic)) != SUCCESS) {
            // Could not advertise the device to upper layers:
            // unregister the convertor
            ion->dereg(converterExt->reg_hdl);
            return ENODEV;
        }
    }

    return SUCCESS;
}  // bdd_Detect()

// The stubbed-out producer transmit function. (Note: in my earlier paste
// the Error: label had drifted outside the closing brace; it belongs
// inside the function, as here.)
int bdd_Transmit(npkt_t *npkt, void *hdl)
{
    Nic_t *nic = (Nic_t *)hdl;
    BDD_EXT *ext = (BDD_EXT *)nic->ext;

    // If the link is down, return an error
    if (ext->linkFail == 1) {
        errno = ENOLINK;
        goto Error;
    }

    // Give the packet back to io-net (ionetTxDone is a #define for
    // ext->ion->tx_done())
    ionetTxDone(ext->reg_hdl, npkt);

    return TX_DOWN_OK;

Error:
    npkt->flags |= _NPKT_NOT_TXED;
    pthread_mutex_unlock(&ext->mutex);
    ionetTxDone(ext->reg_hdl, npkt);
    return TX_DOWN_FAILED;
}  // bdd_Transmit()

// The convertor's transmit function
int bdc_Transmit(npkt_t *npkt, void *hdl)
{
    int ret;
    Nic_t *nic = (Nic_t *)hdl;
    BDD_EXT *ext = (BDD_EXT *)nic->ext;

    // ionetTxDown is a #define for ext->ion->tx_down()
    ret = ionetTxDown(ext->reg_hdl, npkt);

    return ret;
}  // bdc_Transmit()

Aside from this problem, I have another, minor problem on the receive
path; I am not sure whether the two are related. When I send up an
advertise packet, it hits my convertor’s RxUp function and makes it to
the IP stack with no problem. However, when I send data packets up from my
receive thread, io-net’s tx_up() always returns 0 for those packets
(meaning no one above the producer accepted them). The data packets do
make it to the IP stack if I use the convertor’s registration handle
instead of the producer’s. Below is the convertor’s RxUp function.

int bdc_RxUp(npkt_t *npkt,
             void *func_hdl,
             int off,
             int framlen_sub,
             uint16_t cell,
             uint16_t endpoint,
             uint16_t iface)
{
    Nic_t *nic = (Nic_t *)func_hdl;
    BDD_EXT *ext = (BDD_EXT *)nic->ext;
    int ret;

    // ionetTxUp is a #define for ext->ion->tx_up()
    ret = ionetTxUp(ext->reg_hdl, npkt, off, framlen_sub, cell, endpoint,
                    iface);

    fprintf(stdout, "bdc_RxUp: ret = %x\n", ret);
    fflush(stdout);

    if (ret == FAILURE) {
        fprintf(stdout, "bdc_RxUp: Failed to send up the packet. Errno = %x\n",
                errno);
        return FAILURE;
    }
    else if (ret == 0) {
        // No one above us took the packet; need to call tx_done()
        ionetTxDone(ext->reg_hdl, npkt);
    }

    return EOK;
}  // bdc_RxUp()


Thanks for all your help.

  • Murtaza

“Sean Boudreau” <seanb@node25.ott.qnx.com> wrote in message
news:b5ip4c$6lq$1@nntp.qnx.com

Nothing really stands out from that description. Post your
simplified transmit code I guess…

-seanb


Murtaza <murti@yahoo.com> wrote:


This is probably because the convertor hasn’t done a
ion->reg_byte_pat(hdl, 0, 0, NULL, _BYTE_PAT_ALL) (assuming
you want all packets).

I don’t see anything else obviously wrong in your code.

-seanb

This is probably because the convertor hasn’t done a
ion->reg_byte_pat(hdl, 0, 0, NULL, _BYTE_PAT_ALL) (assuming
you want all packets).

You were correct. I added this call before reading your response. I am
able to receive and transmit packets via the converter.

I don’t see anything else obviously wrong in your code.

Regarding the crash with two ping processes (one running in the background
when the second is introduced): for debugging purposes, I added an
ion->tx_done() in the convertor’s transmit function, which stops the
packet flow at the convertor. The code looks like:

int bdc_Transmit(npkt_t *npkt, void *hdl)
{
    Nic_t *nic = (Nic_t *)hdl;
    BDD_EXT *ext = (BDD_EXT *)nic->ext;

    if ((ionetTxDone(ext->reg_hdl, npkt)) == -1) {
        fprintf(stdout, "bdc_Transmit: ionetTxDone() failed. Errno = %x\n",
                errno);
        fflush(stdout);
    }

    // ret = ionetTxDown(ext->reg_hdl, npkt);

    return 0;
}  // bdc_Transmit()

By doing so, I can run two ping processes without crashing io-net.
However, if I send the packet down by calling ionetTxDown() in the
convertor code above (and don’t call tx_done() in the convertor), io-net
crashes the moment the second ping process is introduced. The producer’s
transmit function simply calls tx_done() and returns TX_DOWN_OK. I used
to have both the convertor and producer code in one shared object; I have
recently separated them into their own .so files.

This is my latest finding. I’ll keep digging. :-) If anyone has any
clues or ideas, please share.

  • Murtaza