Hi everybody.
I’m having some trouble with Qnet.
My system consists of two x86 CPUs running QNX 6.3 SP3. Some processes on CPU#1 periodically access resource managers running on CPU#2 over Qnet. Using “nicinfo” and “sloginfo”, I can see that on CPU#2 a few errors appear continuously and keep growing, whereas on CPU#1 everything seems to be OK.
nicinfo on CPU#1:
RealTek 8139 Ethernet Controller
Physical Node ID … 00304F 51A19F
Current Physical Node ID … 00304F 51A19F
Current Operation Rate … 100.00 Mb/s full-duplex
Active Interface Type … MII
Active PHY address … 0
Maximum Transmittable data Unit … 1514
Maximum Receivable data Unit … 1514
Hardware Interrupt … 0xb
I/O Aperture … 0xec00 - 0xecff
Memory Aperture … 0xdfffff00 - 0xdfffffff
Promiscuous Mode … Off
Multicast Support … Enabled
Packets Transmitted OK … 343422
Bytes Transmitted OK … 39190221
Memory Allocation Failures on Transmit … 0
Packets Received OK … 321107
Bytes Received OK … 72174514
Memory Allocation Failures on Receive … 0
Single Collisions on Transmit … 0
Transmits aborted (excessive collisions) … 0
Transmit Underruns … 0
No Carrier on Transmit … 0
Receive Alignment errors … 0
Received packets with CRC errors … 0
Packets Dropped on receive … 0
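To compare the two dumps without reading every line by eye, a small script can pull out the error-type counters that are not zero. This is a minimal sketch, assuming the nicinfo output has been saved as text in the “Label … value” layout shown here (real nicinfo output uses a run of dots, which the pattern also accepts). The helper names and the keyword list that decides what counts as an “error” counter are my own assumptions, not something nicinfo defines.

```python
import re

# Hypothetical helper pattern: split each "Label … value" line of nicinfo output.
# Accepts either the "…" shown in this post or a run of two or more dots.
_LINE = re.compile(r"^(?P<label>.+?)\s*(?:…|\.{2,})\s*(?P<value>.+)$")

# Assumed keywords marking error-type counters; nicinfo itself does not
# classify its counters this way. "collison" matches the driver's spelling.
_ERROR_WORDS = ("error", "collision", "collison", "abort", "underrun",
                "dropped", "failure", "carrier")

def parse_nicinfo(text):
    """Return {label: value} for every 'Label … value' line in the text."""
    fields = {}
    for line in text.splitlines():
        m = _LINE.match(line.strip())
        if m:
            fields[m.group("label")] = m.group("value")
    return fields

def nonzero_errors(fields):
    """Return the error-type counters whose integer value is not zero."""
    bad = {}
    for label, value in fields.items():
        is_error = any(w in label.lower() for w in _ERROR_WORDS)
        if is_error and value.isdigit() and int(value) != 0:
            bad[label] = int(value)
    return bad
```

Fed the CPU#1 dump above, nonzero_errors() returns an empty dict; fed the CPU#2 dump below, it picks out the collision, aborted-transmit, underrun and CRC counters.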
nicinfo on CPU#2:
ns83815 : DP83815 Ethernet Controller
Physical Node ID … 0006D5 1098F3
Current Physical Node ID … 0006D5 1098F3
Current Operation Rate … 100.00 Mb/s full-duplex
Active Interface Type … MII
Active PHY address … 0
Maximum Transmittable data Unit … 1514
Maximum Receivable data Unit … 1514
Hardware Interrupt … 0xb
I/O Aperture … 0x1000 - 0x10ff
Promiscuous Mode … Off
Multicast Support … Enabled
Packets Transmitted OK … 3017967
Bytes Transmitted OK … 727686890
Packets Received OK … 3035888
Bytes Received OK … 289970721
Single Collisions on Transmit … 99
Multiple Collisions on Transmit … 141
Deferred Transmits … 0
Late Collision on Transmit errors … 0
Transmits aborted (excessive collisions) … 2
Transmits aborted (excessive deferrals) … 0
Transmit Underruns … 2
No Carrier on Transmit … 0
Receive Alignment errors … 0
Received packets with CRC errors … 34
Packets Dropped on receive … 0
Ethernet Headers out of range … 0
Oversized Packets received … 0
Short packets … 0
Total Frames experiencing Collison(s) … 240
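The raw CPU#2 counters are easier to judge as rates relative to the traffic volume. The sketch below copies the numbers from the dump above and scales each error counter per million packets in its direction; using the “OK” packet counts as the denominator is an approximation I chose, since failed packets are not included in them. Note that collisions are not expected at all on a link that is genuinely running full-duplex, so any nonzero collision count is worth a second look regardless of its rate.

```python
# Counters copied from the CPU#2 (ns83815) nicinfo dump above.
CPU2 = {
    "tx_ok": 3017967,             # "Packets Transmitted OK"
    "rx_ok": 3035888,             # "Packets Received OK"
    "tx_collision_frames": 240,   # "Total Frames experiencing Collison(s)"
    "tx_aborted_collisions": 2,   # "Transmits aborted (excessive collisions)"
    "tx_underruns": 2,            # "Transmit Underruns"
    "rx_crc_errors": 34,          # "Received packets with CRC errors"
}

def per_million(count, total):
    """Events per million packets; returns 0.0 if no packets were counted."""
    return 0.0 if total == 0 else count * 1_000_000 / total

def error_rates(c):
    """Scale each error counter by the OK packet count for its direction."""
    return {
        "tx_collision_frames": per_million(c["tx_collision_frames"], c["tx_ok"]),
        "tx_aborted_collisions": per_million(c["tx_aborted_collisions"], c["tx_ok"]),
        "tx_underruns": per_million(c["tx_underruns"], c["tx_ok"]),
        "rx_crc_errors": per_million(c["rx_crc_errors"], c["rx_ok"]),
    }

if __name__ == "__main__":
    for name, rate in error_rates(CPU2).items():
        print(f"{name:24s} {rate:8.2f} per million packets")
```

With these numbers, roughly 80 of every million transmitted frames saw a collision and roughly 11 of every million received frames had a CRC error.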
sloginfo on CPU#1 frequently reports messages like these (I think they are errors):
…
Feb 28 13:29:56 7 15 0 npm-qnet(L4): l4_rx_first_checks(): bad rxd pkt - hdr len 524 vs tot len 50
Feb 28 13:30:07 7 15 0 npm-qnet(L4): l4_rx_first_checks(): bad rxd pkt - hdr len 524 vs tot len 50
Feb 28 13:30:19 7 15 0 npm-qnet(L4): l4_rx_first_checks(): bad rxd pkt - hdr len 524 vs tot len 50
Feb 28 13:30:23 7 15 0 npm-qnet(L4): l4_rx_first_checks(): bad rxd pkt - hdr len 524 vs tot len 50
Feb 28 13:30:30 7 15 0 npm-qnet(L4): l4_rx_first_checks(): bad rxd pkt - hdr len 524 vs tot len 50
Feb 28 13:30:37 7 15 0 npm-qnet(L4): l4_rx_first_checks(): bad rxd pkt - hdr len 524 vs tot len 50
…
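To see how often these messages arrive and whether the lengths ever change, the lines can be parsed mechanically. This is a minimal sketch that only reads the message text as printed (“hdr len X vs tot len Y”); it does not assume anything about how npm-qnet computes those values internally. The helper names are hypothetical, and because these sloginfo lines carry no year, a placeholder year is supplied purely so the timestamps can be subtracted.

```python
import re
from datetime import datetime

# Matches the npm-qnet receive-check lines quoted above.
RX_BAD = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d{2}:\d{2}:\d{2}) .*"
    r"l4_rx_first_checks\(\): bad rxd pkt - "
    r"hdr len (?P<hdr>\d+) vs tot len (?P<tot>\d+)"
)

# The log lines have no year; 2000 is an arbitrary placeholder so strptime can
# build a full date. Only differences between timestamps are used.
PLACEHOLDER_YEAR = 2000

def parse_rx_bad(lines):
    """Return (timestamp, hdr_len, tot_len) for each matching log line."""
    events = []
    for line in lines:
        m = RX_BAD.match(line.strip())
        if m:
            ts = datetime.strptime(f"{PLACEHOLDER_YEAR} {m['stamp']}",
                                   "%Y %b %d %H:%M:%S")
            events.append((ts, int(m["hdr"]), int(m["tot"])))
    return events

def gaps_seconds(events):
    """Whole seconds between consecutive events."""
    return [int((b[0] - a[0]).total_seconds())
            for a, b in zip(events, events[1:])]
```

For the six lines shown, every message reports hdr len 524 vs tot len 50, and they arrive 4 to 12 seconds apart.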
sloginfo on CPU#2 frequently reports timeouts for nd 12 (nd 12 is CPU#1), as follows:
…
Feb 28 13:19:40 7 15 0 npm-qnet(L4): l4_tx_timeout(): timeout: nd 12 sc 8 dc 1 ss 1457224 tk 2666529 ct 2666531
Feb 28 13:19:52 7 15 0 npm-qnet(L4): l4_tx_timeout(): timeout: nd 12 sc 8 dc 1 ss 1457269 tk 2666590 ct 2666592
Feb 28 13:19:55 7 15 0 npm-qnet(L4): l4_tx_timeout(): timeout: nd 12 sc 8 dc 1 ss 1457282 tk 2666608 ct 2666610
Feb 28 13:20:03 7 15 0 npm-qnet(L4): l4_tx_timeout(): timeout: nd 12 sc 8 dc 1 ss 1457308 tk 2666644 ct 2666646
Feb 28 13:20:09 7 15 0 npm-qnet(L4): l4_tx_timeout(): timeout: nd 12 sc 8 dc 1 ss 1457333 tk 2666678 ct 2666680
Feb 28 13:20:39 7 15 0 npm-qnet(L4): l4_tx_timeout(): timeout: nd 12 sc 8 dc 1 ss 1457446 tk 2666827 ct 2666829
Feb 28 13:21:00 7 15 0 npm-qnet(L4): l4_tx_timeout(): timeout: nd 12 sc 8 dc 1 ss 1457524 tk 2666930 ct 2666932
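The timeout lines are a list of “name value” pairs, so they can be split into fields and compared across messages even without knowing what each field means. This is a minimal sketch; the helper names are hypothetical, and the meaning of sc, dc, ss, tk and ct is not documented in this post, so they are kept as raw integers rather than interpreted.

```python
import re
from collections import Counter

# Matches the npm-qnet transmit-timeout lines quoted above.
TX_TIMEOUT = re.compile(r"l4_tx_timeout\(\): timeout: (?P<fields>.*)$")
# Each field is printed as "<name> <integer>", e.g. "nd 12".
PAIR = re.compile(r"(\w+) (\d+)")

def parse_tx_timeouts(lines):
    """Return one {field: int} dict per l4_tx_timeout line."""
    records = []
    for line in lines:
        m = TX_TIMEOUT.search(line.strip())
        if m:
            records.append({k: int(v) for k, v in PAIR.findall(m["fields"])})
    return records

def summarize(records):
    """Count timeouts per nd value and collect the distinct ct - tk deltas."""
    per_node = Counter(r["nd"] for r in records)
    deltas = sorted({r["ct"] - r["tk"] for r in records})
    return per_node, deltas
```

For the seven lines shown, all timeouts are for nd 12 with sc 8 and dc 1, and ct is always exactly tk + 2.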
Sometimes the communication between the two CPUs degrades so badly that CPU#1 cannot even update the information it collects from CPU#2, perhaps because of these errors.
What do these system log messages mean?
I would like to know what is happening with Qnet.
Can you suggest any further checks I could run to track down the problem?
Thanks a lot.
Regards
ogr