Network driver gives up when using USB

Hi,
when using the USB intensively (i.e. copying files from a USB CD within a loop), the network driver (RTL8169) hangs. That means I can’t even ping another PC.

I did (for testing purposes) write a script that copies a 500 MB file from USB CD to HDD in a loop.
After some hours the Network is down.
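
For anyone who wants to reproduce this: the test script itself was not posted, so the following is only a minimal sketch of such a copy loop. It is written in Python on the assumption that a Python 3 interpreter is available on the test machine. The function name `copy_loop` and its parameters are made up for illustration.

```python
import shutil
import time
from pathlib import Path


def copy_loop(src, dst, iterations, pause_s=0.0):
    """Copy src over dst repeatedly and return the total number of bytes copied."""
    src, dst = Path(src), Path(dst)
    total = 0
    for i in range(iterations):
        # Overwrite the destination on every pass to keep the bus busy.
        shutil.copyfile(src, dst)
        total += dst.stat().st_size
        print(f"pass {i + 1}: {total} bytes copied so far", flush=True)
        if pause_s:
            time.sleep(pause_s)
    return total
```

It would be called with the path of the file on the USB CD and a destination on the HDD, for example `copy_loop("/fs/cd0/test.bin", "/tmp/test.bin", 1000)`. Both paths are hypothetical and depend on where the devices are mounted.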

Platform: AMD Geode LX800

It doesn’t seem to be a CPU load problem; a simple burn-in of the CPU didn’t stop the network. The IDE also seems to be OK; copying files from HDD to HDD in a loop hasn’t hung the network interface so far either.

Regards
Mr.Green

I have to correct it:
Now this problem also occurred when just using the IDE.
So it doesn’t seem to be a USB problem…

Still strange: the network and the IDE don’t use the same IRQ, the load on the IDE is high, but the packets on the network only arrive at a rate of ~ 10 Hz.

Are you using adaptive partitioning?

It might be useful to try the -p option to procnto to see if it helps.

We had a similar problem when trying adaptive partitioning that went away with the procnto -p option.

As far as I know we don’t use adaptive partitioning.
When running pidin arg, procnto is shown as started without any parameters, and the “aps show -l” command fails (because aps doesn’t seem to exist).
Does that mean adaptive partitioning is turned off?

You probably aren’t running AP if aps is not on your system.

If you run ‘pidin sched’ does any process have an ExtSched other than “System”?

On our systems the symptom was the network eventually failing. As I understand it, this is caused by some system calls never completing due to continuous kernel call preemption under high interrupt loads. You might try the -p option to procnto to rule out a similar problem. We were using APS at the time but I’m not sure the problem is exclusive to APS systems.

The ExtSched column is empty for every running process.

To use the -p option with procnto I have to modify the boot image, don’t I?
Or can I change the parameter while the system is running (I guess not :slight_smile:)?

Thanks!

Edit:
One minute ago the network died again, despite “procnto -p” :frowning:

It seems to be just a network problem; maybe the RTL8169 driver you are using is not the correct one or not the latest version. Is the machine connected to other nodes via qnet, or just TCP/IP?

And after the network goes down you should check the status of io-net and the driver… maybe a pidin -Pio-net (and variants) could help a lot, and ifconfig… what happened to your IP? Do you lose your address after the network goes down?

Also you can try ‘cat /proc/qnetstats’ and check ‘sloginfo’ to see if there is a problem…

I see just a network (maybe driver) problem… but can you tell us the networking status in a bit more detail, once when it is OK and then again after the crash?

I do not think this is a problem with AP… at least, not to me… Which QNX version are you running?

Regards,
Juan Manuel

Well it’s not the same problem we had, sorry to send you down the wrong path.

MrGreen,

Can you post a nicinfo when the problem occurs? Ideally one when everything is fine and another once the problem occurs.

When the problem happens, if you slay the network driver and io-net (or io-pkt) and restart it, does the problem go away? Or are you forced to reboot QNX entirely?

Knowing that will help narrow down the focus of the search.

Tim

Hi,
after the network goes down, the IP still remains the same.
I’ve added the verbosity parameter to io-net -d rtl…; when the interface goes down, sloginfo just says “Link Down”. No info why.
I’ve searched the Download section but the driver seems to be the latest revision. We also use the same driver on another platform (VIA), and so far it has never died there.

nicinfo output before it died:
RealTek unknown Ethernet Controller

Physical Node ID … 000FC9 035BD8
Current Physical Node ID … 000FC9 035BD8
Current Operation Rate … 100.00 Mb/s full-duplex
Active Interface Type … MII
Active PHY address … 0
Maximum Transmittable data Unit … 1514
Maximum Receivable data Unit … 1514
Hardware Interrupt … 0xb
I/O Aperture … 0xfc00 - 0xfcff
Memory Aperture … 0xeffff000 - 0xeffff0ff
Promiscuous Mode … Off
Multicast Support … Enabled

Packets Transmitted OK … 598
Bytes Transmitted OK … 32411
Memory Allocation Failures on Transmit … 0

Packets Received OK … 1177
Bytes Received OK … 124907
Broadcast Packets Received OK … 1
Multicast Packets Received OK … 1
Memory Allocation Failures on Receive … 0

Single Collisions on Transmit … 0
Transmits aborted (excessive collisions) … 0
Transmit Underruns … 0
No Carrier on Transmit … 0
Receive Alignment errors … 0
Received packets with CRC errors … 0
Packets Dropped on receive … 0

nicinfo after death:

RealTek unknown Ethernet Controller

Physical Node ID … 000FC9 035BD8
Current Physical Node ID … 000FC9 035BD8
Current Operation Rate … 100.00 Mb/s full-duplex
Active Interface Type … MII
Active PHY address … 0
Maximum Transmittable data Unit … 1514
Maximum Receivable data Unit … 1514
Hardware Interrupt … 0xb
I/O Aperture … 0xfc00 - 0xfcff
Memory Aperture … 0xeffff000 - 0xeffff0ff
Promiscuous Mode … Off
Multicast Support … Enabled

Packets Transmitted OK … 1598
Bytes Transmitted OK … 86763
Memory Allocation Failures on Transmit … 0

Packets Received OK … 3269
Bytes Received OK … 336946
Broadcast Packets Received OK … 9
Multicast Packets Received OK … 9
Memory Allocation Failures on Receive … 0

Single Collisions on Transmit … 0
Transmits aborted (excessive collisions) … 0
Transmit Underruns … 0
No Carrier on Transmit … 0
Receive Alignment errors … 0
Received packets with CRC errors … 0
Packets Dropped on receive … 0

I couldn’t see anything suspicious there.
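
In case it helps with comparing further dumps, here is a rough sketch that diffs two nicinfo outputs field by field. It assumes every line has the form "name, leader dots or ellipsis, value" as in the dumps above; the helper names `parse_nicinfo` and `changed_fields` are made up for this post and are not part of any QNX tool.

```python
import re

# A field line: name, then a leader ("…" or a run of dots), then the value.
_FIELD = re.compile(r"^(?P<name>.+?)\s*(?:\u2026|\.{2,})+\s*(?P<value>\S.*)$")


def parse_nicinfo(text):
    """Return a dict mapping each field name to its value as a string."""
    fields = {}
    for line in text.splitlines():
        match = _FIELD.match(line.strip())
        if match:
            fields[match.group("name")] = match.group("value").strip()
    return fields


def changed_fields(before_text, after_text):
    """Return {name: (before, after)} for fields whose value differs."""
    before = parse_nicinfo(before_text)
    after = parse_nicinfo(after_text)
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and before[name] != after[name]
    }
```

Comparing the two dumps above by hand gives the same picture: only the packet, byte, broadcast and multicast counters changed, and every error counter is still 0.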

When I slay io-net, then restart it and configure the interfaces with netmanager, it works fine again.

It looks as if the driver just “hangs up”.
We have software running that continuously processes the incoming packets (they arrive at a rate of 10 Hz). This software reports an error when it misses a packet. So when the driver hangs up, I can see that error.
When doing a simple ping -f after the interface has hung up, some packets seem to arrive. Some pings are lost, but some arrive. The error message disappears and the software processes these packets. After some seconds the interface hangs up again.

Conclusion: I actually think that the network driver goes into a kind of hung state, but it can be “released” (at least for some seconds) by doing a ping…

Maybe somebody has an idea :smiley:
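
To pin down when exactly the stalls begin and end, one could log packet arrival times and flag the gaps. The following is only a sketch of that idea, not how our software actually detects missing packets; the function name `find_gaps` and the tolerance factor of 1.5 are assumptions chosen for illustration.

```python
def find_gaps(arrival_times, period_s=0.1, tolerance=1.5):
    """Return (previous, current) arrival-time pairs that are too far apart.

    arrival_times: ascending timestamps in seconds.
    period_s:      expected spacing between packets (0.1 s for 10 Hz).
    tolerance:     a gap is reported when it exceeds period_s * tolerance.
    """
    limit = period_s * tolerance
    gaps = []
    for previous, current in zip(arrival_times, arrival_times[1:]):
        if current - previous > limit:
            gaps.append((previous, current))
    return gaps
```

The reported gap timestamps could then be lined up against the “Link Down” entries in sloginfo.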

Can you change the source frequency? For example, space the packet arrivals further apart (1 Hz)?

Perhaps the opposite is happening: your software runs into an error that hangs the network driver, and then you see the error…

Again, you should try ‘cat /proc/qnetstats’ and check ‘sloginfo’… and post the output.

Juan Manuel

No, the software is fine :slight_smile: It works on other PCs without problems.

/proc/qnetstats isn’t available;

and as I said, when the interface died, sloginfo just said “Link Down”. There was no error before it died.

I will try to run it with 1 Hz.

I’ve just set up another PC (VIA platform) with the same software and test setup; this PC uses the same IDE and Ethernet drivers (the Ethernet driver is rtl8169 on both machines, in the same version). It has run for more than 3 hours now without missing a packet.

Mr Green,

What is the date of the driver? Have you checked to see if there is a newer version?

This post on foundry27 indicates others have had similar issues and there was a driver update.

community.qnx.com/sf/discussion/ … _pagenum=1

What I find suspicious about your nicinfo is that it says “RealTek UNKNOWN Ethernet Controller”. That makes me think it didn’t quite identify the chip properly. What does a pci -vv show on your RealTek chip?

Tim

Hi,
I have now used the experimental driver and it works fine. It has run for a week without any failure.
I am just a bit worried about using an experimental driver in a customer’s product…

How likely is it that this “beta” driver will become an official update?

pci -vv doesn’t identify the chip properly.

Mr.Green

According to the QNX license you cannot use a beta or experimental driver in a product. If you want it to become official faster you have to go through sales and request it. That can be a long and expensive process.