Routing Table Becoming Corrupt Over Time?

Hi All,
I’ve been running into a most perplexing problem recently with networking
on an embedded QNX system we’re using here at work. The Ethernet on this
system fails periodically, in such a way that it no longer transmits
packets, although packets can still be seen being received from the
network. When a ping is sent from the system, it returns a “no route to
host” message and no packets are transmitted. The failure happens at
random intervals and for no apparent reason. Nicinfo always shows one or
more aborted Tx Collision Errors (with older versions of the system, it
always failed as soon as one aborted Collision Error occurred).

I’ve tried manipulating the routing tables on one of these systems to
duplicate the problem. I found that to duplicate the issue I have to
delete the routing table entry for the subnet (i.e. 172.16.6.0); the
system then behaves the same as when it fails. I have not yet had a
chance to test this on a failed system, however, and would like to know
a little more about what I can do with a failed system before I try
anything.

I was wondering if anyone else has run into this sort of problem, and
whether they have any suggestions for how I could recover from it. I was
thinking a full refresh of the routing table might work, but I am
uncertain how I’d go about doing it programmatically. If anyone has
anything to offer, I’d greatly appreciate their input.

Regards,

Greg Buccini, E.I.T.
Electrical Engineer
Westronic Systems, Inc. - A Mediation Technology Company
Phone: (403) 250-8304 ext. 224
Fax: (403) 250-6711
Email: gbuccini@westronic.com
Website: http://www.westronic.com

Greg Buccini <gbuccini@westronic.com> wrote:

Is this just happening on IP or are you using QNET too?
(If just on IP maybe it’s worth getting QNET working too just to see if
this is happening at the driver layer or IP routing layer.)

What do ‘netstat -in’ and ‘netstat -rn’ show before and after the
network stops transmitting?

gbuccini@westronic.com sed in <blsp2v$esj$1@inn.qnx.com>:

> Hi All,
> I’ve been running into a most perplexing problem recently with regard to
> networking on an embedded QNX system we’re using here at work. The Ethernet

At the very least, you need to tell us:

  • the exact platform you are using
  • the exact version of the QNX you are using
  • any local tweaks or patches received from local sales rep

At first glance I couldn’t make out whether it’s an
embedded issue or just a plain TCP/IP primer issue.

kabe

Hello again,

> At least you have to say
>
>   • the exact platform you are using
>   • the exact version of the QNX you are using
>   • any local tweaks or patches received from local sales rep

We are using an x86 platform with version 6.2.0 of QNX. No local patches.

I have managed to catch a system in the failure state here in my office and
have included some of the information I took from it. It seems I was
mistaken about the message that ping returns after the failure: it is “No
buffer space available”, not “no route to host”. It does not display that
message immediately, however. For a while the ping command just runs
without any error message, although nicinfo shows that no packets are
transmitted. The system in question also neither sends nor responds to ARP
requests. We are not using any QNet services here. The information I
gathered from the system follows.

route show information (both before and after failure):

Internet:
Destination     Gateway            Flags
default         172.16.6.1         UG
127.0.0.1       127.0.0.1          UH
172.16.6.0      link#2             U
172.16.6.1      0:0:81:f3:32:bc    UH
172.16.6.69     0:e0:f4:11:55:d2   UH
172.16.6.214    0:4:76:37:81:9c    UH
172.16.6.229    0:80:c8:8a:e1:1a   UH
172.16.6.237    link#2             UH

nicinfo information (after failure):

RealTek 8139 Ethernet Controller
Physical Node ID … 00E0F4 1155D2
Current Physical Node ID … 00E0F4 1155D2
Media Rate … 10.00 Mb/s half-duplex UTP
MTU … 1514
Lan … 0
I/O Port Range … 0x1000 → 0x10FF
Hardware Interrupt … 0xB
Promiscuous … Disabled
Multicast … Enabled

Total Packets Txd OK … 1701170
Total Packets Txd Bad … 7
Total Packets Rxd OK … 2258654
Total Rx Errors … 0

Total Bytes Txd … 1641967242
Total Bytes Rxd … 1392306773

Tx Collision Errors … 16842
Tx Collisions Errors (aborted) … 7
Carrier Sense Lost on Tx … 0
FIFO Underruns During Tx … 0
Tx deferred … 0
Out of Window Collisions … 0
FIFO Overruns During Rx … 0
Alignment errors … 0
CRC errors … 0

nicinfo information (after the failure, and after ping attempts both from
the system and to it from outside):

RealTek 8139 Ethernet Controller
Physical Node ID … 00E0F4 1155D2
Current Physical Node ID … 00E0F4 1155D2
Media Rate … 10.00 Mb/s half-duplex UTP
MTU … 1514
Lan … 0
I/O Port Range … 0x1000 → 0x10FF
Hardware Interrupt … 0xB
Promiscuous … Disabled
Multicast … Enabled

Total Packets Txd OK … 1701170
Total Packets Txd Bad … 7
Total Packets Rxd OK … 2262401
Total Rx Errors … 0

Total Bytes Txd … 1641967242
Total Bytes Rxd … 1392628847

Tx Collision Errors … 16842
Tx Collisions Errors (aborted) … 7
Carrier Sense Lost on Tx … 0
FIFO Underruns During Tx … 0
Tx deferred … 0
Out of Window Collisions … 0
FIFO Overruns During Rx … 0
Alignment errors … 0
CRC errors … 0
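
Comparing the two nicinfo dumps by eye is error-prone, so here is a minimal sketch of how the counters could be diffed automatically. It assumes the dumps have been saved as text in the form shown above; the helper names `parse_counters` and `counter_deltas` are made up for illustration, and only a subset of the counters is copied in.

```python
import re

# Counter lines copied from the two nicinfo dumps above (subset only).
AFTER_FAILURE = """\
Total Packets Txd OK … 1701170
Total Packets Txd Bad … 7
Total Packets Rxd OK … 2258654
Total Bytes Txd … 1641967242
Total Bytes Rxd … 1392306773
Tx Collisions Errors (aborted) … 7
"""

AFTER_PINGS = """\
Total Packets Txd OK … 1701170
Total Packets Txd Bad … 7
Total Packets Rxd OK … 2262401
Total Bytes Txd … 1641967242
Total Bytes Rxd … 1392628847
Tx Collisions Errors (aborted) … 7
"""


def parse_counters(text):
    """Map each counter label to its integer value (hypothetical helper)."""
    counters = {}
    for line in text.splitlines():
        # Label, then an ellipsis or a run of dots as separator, then an integer.
        match = re.match(r"^(.*?)\s*(?:…|\.{2,})\s*(\d+)\s*$", line)
        if match:
            counters[match.group(1).strip()] = int(match.group(2))
    return counters


def counter_deltas(before, after):
    """Return how far each counter present in both snapshots has advanced."""
    return {name: after[name] - before[name] for name in before if name in after}


deltas = counter_deltas(parse_counters(AFTER_FAILURE), parse_counters(AFTER_PINGS))
for name, delta in deltas.items():
    print(f"{name}: {delta:+d}")
```

With the figures above, every Tx counter stays flat while the Rx packet and byte counters keep climbing, which matches the description of a system that still receives but no longer transmits.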

ping response (after failure):

ping 172.16.6.1

PING 172.16.6.1 (172.16.6.1): 56 data bytes
ping: sendto: No buffer space available
ping: wrote 172.16.6.1 64 chars, ret=-1
ping: sendto: No buffer space available
ping: wrote 172.16.6.1 64 chars, ret=-1
ping: sendto: No buffer space available
ping: wrote 172.16.6.1 64 chars, ret=-1
ping: sendto: No buffer space available
ping: wrote 172.16.6.1 64 chars, ret=-1
ping: sendto: No buffer space available
ping: wrote 172.16.6.1 64 chars, ret=-1
ping: sendto: No buffer space available
ping: wrote 172.16.6.1 64 chars, ret=-1
ping: sendto: No buffer space available
ping: wrote 172.16.6.1 64 chars, ret=-1

--- 172.16.6.1 ping statistics ---
47 packets transmitted, 0 packets received, 100% packet loss
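
The two messages mentioned in this thread correspond to different errno values: “No buffer space available” is the standard text for ENOBUFS, while “No route to host” is EHOSTUNREACH. A watchdog process could use that difference to tell a stalled transmit path from a missing route. The following is only a sketch under assumptions: the function names, the UDP probe, and the use of the discard port are all hypothetical, and nothing here has been run on QNX 6.2.0.

```python
import errno
import socket


def probe_transmit(host, port=9, payload=b"probe"):
    """Send one UDP datagram; return None on success, else the errno.

    Hypothetical helper: port 9 (discard) is an arbitrary choice.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.sendto(payload, (host, port))
        except OSError as exc:
            return exc.errno
    return None


def classify_send_error(err):
    """Translate the result of probe_transmit() into a coarse symptom."""
    if err is None:
        return "ok"
    if err == errno.ENOBUFS:
        # The stack could not queue the packet for sending,
        # which is the symptom shown in the ping output above.
        return "tx-stalled"
    if err in (errno.EHOSTUNREACH, errno.ENETUNREACH):
        # A routing problem, as in the deleted-subnet-route experiment.
        return "no-route"
    return "other"


# Example use from a watchdog loop (not executed here):
#   symptom = classify_send_error(probe_transmit("172.16.6.1"))
```

Note that the ping output above only reported the error after running silently for a while, so a single successful probe would not prove the interface is healthy; a watchdog would need to probe repeatedly.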

Hopefully this is enough information for you to offer some thoughts.
Let me know what you think.

Regards,

Greg Buccini, E.I.T.

Responded via e-mail.
