Sigh, I’m back here again with issues moving to 6.3. It seems far more has changed than I thought, and obviously not always for the better.
This time it has to do with getting TCP/IP networking working reliably on our machine.
Here’s what I have hardware-wise: a card cage board with a SanDisk on the EIDE controller, 2 Intel Ethernet cards, a USB port, and a monitor port. All in all a very simple setup that works well under 6.1.
I’m at the stage in my porting from 6.1 where I have moved the card cage board from my boot rig (where I boot off a HD and mount the SanDisk) into the instrument card cage.
Now the problem I am seeing is that TCP/IP networking isn’t always coming up correctly. Sometimes it comes up after 3-5 minutes, and other times, no matter what I do, it just never comes up properly.
Here are the relevant lines from my sysinit file to start networking.
io-net -ptcpip -ppppmgr -dspeedo pci=0 -dspeedo pci=1 &
waitfor /dev/io-net 60
echo "Waiting for io-net to release sockets..."
io-net -ptcpip -ppppmgr -pqnet -dspeedo pci=0 -dspeedo pci=1 &
echo "Waiting for io-net to re-establish sockets..."
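One thing I’ve been wondering is whether my waitfor is checking the wrong thing, since /dev/io-net can show up before the tcpip stack is actually ready. A variant I’ve been considering (assuming npm-tcpip registers /dev/socket, which is how I understand it works) would wait on the stack itself:

```sh
io-net -ptcpip -ppppmgr -pqnet -dspeedo pci=0 -dspeedo pci=1 &
# wait for the tcpip stack to register, not just the io-net pathname
waitfor /dev/socket 60
```

If my understanding of /dev/socket is wrong, someone please correct me.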
Here’s the net.cfg that I have on my machine:
route 10.42.104.1 0.0.0.0 0.0.0.0
lookup file bind
Now here’s what happens.
- Booting takes a VERY long time to get past the EIDE driver even though I turned off DMA access. This is the diskboot line in my build file:
[pri=10o] PATH=/proc/boot diskboot -s -vvvv -b1 -D0 -odevc-con,-n1
This is strange because the only device on the EIDE is the SanDisk. Looking in sloginfo I see the SanDisk recognized, then a bunch of errors reported even though there are no other devices. Not sure if this has any bearing, but it does make booting take longer, which is unacceptable and needs to be fixed at some point.
My sysinit executes and I see the info printed to the screen about starting io-net, netmanager, etc. Even though further up in my sysinit file I have 'random' started, it always complains about having to use the pseudo-random number generator. Strange. Anyone know why?
When I get to a login shell, I log in and type ifconfig -a. At that point I see the 192.168.0.2 address assigned to en1 (2nd card) and no IP address for en0 (1st card). Both cards have their cables plugged in, card 1 is plugged into the corp net, and the routes are set right to get a DHCP address, since this works perfectly in my boot rig, just not in the real machine.
I try to ping the 192.168.0.2 address (my own card). Ping just sits there hung and I Ctrl-C out after 5 seconds. I try to ping 192.168.0.1, which is the GUI Windows box, and again it hangs. If I look at the lights on the card while attempting the ping, it certainly looks like packets are being transferred because the lights blink. If I actually just sit there and wait, ping will eventually respond after about 2 minutes with ping information. What’s strange is that it suddenly shows all the packets and reports 0% packet loss for the 2 minutes I’ve been waiting, so I wonder whether the responses have been sitting in a queue someplace for those 2 minutes. Of course, the very next ping command then takes another 2 minutes to respond.
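In case anyone wants specific numbers, I’m happy to gather more data. The checks I know of to run from the shell are roughly these (I’m assuming nicinfo wants the /dev/io-net path; correct me if that syntax is off):

```sh
ifconfig -a              # addresses assigned to en0/en1
netstat -rn              # routing table and default gateway
arp -a                   # are ARP replies coming back at all?
nicinfo /dev/io-net/en0  # driver-level rx/tx and error counters
```

The receive counters are what I’d watch, since the symptom looks like receive traffic is being held up somewhere while transmits go out fine.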
From the Windoze box I can ping 192.168.0.2, which means QNX set the card’s address correctly.
I have tried slaying io-net and inetd and re-typing the entire lines from the sysinit by hand. Everything starts without complaint, but once up I have no DHCP address on card 1, nor can I ping out card 2.
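Since re-typing the lines by hand is really just racing whatever is slow here, I’ve been sketching a little retry helper for sysinit so the restart at least waits on a condition instead of guessing. This is just a sketch; wait_for is my own helper name, nothing QNX-specific:

```shell
# wait_for CMD TIMEOUT: retry CMD once a second until it
# succeeds or TIMEOUT seconds have passed
wait_for() {
    cmd=$1
    timeout=$2
    i=0
    while [ "$i" -lt "$timeout" ]; do
        if eval "$cmd" >/dev/null 2>&1; then
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    return 1
}

# e.g. wait up to 120s for en0 to pick up its DHCP address:
# wait_for "ifconfig en0 | grep -q 'inet '" 120
wait_for "true" 5 && echo "helper works"
```

At least that way the sysinit wouldn’t carry on blindly when the stack takes its 3-4 minutes to sort itself out.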
What’s really strange is that this does occasionally work. Once in a while (say every 10-15 boots), after about 3-4 minutes QNX will find itself, I’ll get a DHCP address on card 1, and both card 1 and card 2 will work just fine, normally until the next re-boot.
Any ideas what could be going on here? I’m totally stumped, but I’m sure something is causing a driver someplace to get stuck not passing packets back to the O/S while transmitting outward just fine.