Network boot problems

Hi all

I’m experiencing some problems while trying to perform a network boot. My
set-up is as follows: two Geode CPU PC cards, both with dual Intel 82559
Ethernet controllers. The BIOS of the diskless boot client has been updated
by the board manufacturer to include Etherboot (etherboot.sourceforge.net)
and the client uses BOOTP for its boot requests. The server card has a hard
drive and runs a dhcpd v2.0 boot server process. I’m using QNX 4.25E from
the May 2001 CD release.

My server node is node #8 and the client is node #9. The netmap file looks
like this (updated to be the same on all 12 nodes in our network):

8 1 00D0C9 350375
8 2 00D0C9 350376
9 2 0020CE C78003

The build file for the boot client’s kernel image looks like this:

sys/boot
$ boot -v

sys/Proc32
$ Proc32 -l 9

sys/Slib32
$ Slib32

sys/Slib16
$ Slib16

/bin/Net
$ Net -E3 -T -n15 -m "8 2 00D0C9 350376"

/bin/Net.ether82557
$ Net.ether82557 -I0 -l2 -v

/bin/sinit
$ sinit -r //8/ TERM=qnx TZ=$(TZ)

The kernel is compiled with buildqnx -b 0x10000 build/ws.ether82557
images/kernel and then adapted to be used with Etherboot with mkQNXnbi -i
images/kernel -o /tftpboot/kernel
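
For readers checking a similar setup, it may be worth confirming that the netmap and the mapping passed to Net with -m agree, and that the two nodes share a logical network. A minimal Python sketch of that cross-check follows. It assumes each netmap line reads "logical node, logical network, physical address in two halves"; the function names are made up for illustration and this is not a QNX tool.

```python
def parse_netmap(text):
    """Parse lines of the assumed form '<node> <lan> <addr high> <addr low>'."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        node, lan = int(fields[0]), int(fields[1])
        mac = (fields[2] + fields[3]).upper()
        entries.append((node, lan, mac))
    return entries


def shared_lans(entries, node_a, node_b):
    """Return the logical networks that both nodes have an entry on."""
    lans_a = {lan for node, lan, _ in entries if node == node_a}
    lans_b = {lan for node, lan, _ in entries if node == node_b}
    return sorted(lans_a & lans_b)


# Values copied from the post.
NETMAP = """\
8 1 00D0C9 350375
8 2 00D0C9 350376
9 2 0020CE C78003
"""
BOOT_MAPPING = "8 2 00D0C9 350376"  # argument given to Net -m in the build file

entries = parse_netmap(NETMAP)
print(shared_lans(entries, 8, 9))                # networks common to server and client
print(parse_netmap(BOOT_MAPPING)[0] in entries)  # does the -m mapping match the netmap?
```

With the values from the post, the only network the two nodes share is logical net 2, and the -m mapping matches the second netmap line.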

Most of the time this works, but occasionally the boot fails for various
reasons. I never see any indication that tftp had a problem sending the
kernel to the client; my feeling is that the problems start after the QNX
kernel is launched on the client. Sometimes I get a kernel crash:

Version 425.L Feb 15 2001 Technical Support: +1 (613) 591-0941
Proc fault 1, ldt 100 sys/Proc32; fault e+0
cd:eip=5:10568 ss:esp=d:f7c0f84 efl=12246 ds=d es=d fs=0 gs=0
eax/0 ebx/f7c0fd0 ecx/668 edx/8 esi/5965 edi/3820 ebp/f7c0fb4
Stack (d:f7c0f84)
00000020 00015c35 000159e0 000163c0 00018319 000028f0 00000000 00000000
00000006 0000bf23 000139d0 000159e0 0f7c0fe8 00000000 00005965 0000001f
00000668 0f7c0fd0 00005836 00003820 0000000d 00000610 00003820 0000000d
00000002 00000001 0000594d 00005965 0f7c092d 00000668 00005998 00000000
Process Entry (addr 6050)
00000000 00000001 00000000 00000001 00000000 00000000 30020207 00001e1e
00005840 0100000d 00006108 ffffffff 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000501 000d0005 00007118 00000000 00000009
00000038 00000000 00000046 0000c140 00000000 00000000 00010090 00000000
00000000 00000000 00000000 ffff0001 00000000 00000000 00000000

Can anyone tell me the reason for this crash?
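
When several dumps like this are collected, it can help to know whether the fault hits the same address each time. Below is a rough Python sketch that pulls the register values out of the two register lines of a dump so they can be compared between crashes. The regular expression is inferred only from the layout of the dump shown here, not from QNX documentation, and the function names are illustrative.

```python
import re

# Matches "name=value" or "name/value" pairs, e.g. "ss:esp=d:f7c0f84" or
# "ebx/f7c0fd0". Inferred from the dump in this post (an assumption).
REG_RE = re.compile(r"([a-z]+(?::[a-z]+)?)[=/]([0-9a-f]+(?::[0-9a-f]+)?)")


def parse_registers(lines):
    """Collect register names and their hex strings from register lines."""
    regs = {}
    for line in lines:
        for name, value in REG_RE.findall(line):
            regs[name] = value
    return regs


def fault_address(regs):
    """Return the segment:offset recorded for eip, or None if it is missing."""
    for name, value in regs.items():
        if name.endswith(":eip"):
            return value
    return None


# The two register lines exactly as they appear in the post.
DUMP = [
    "cd:eip=5:10568 ss:esp=d:f7c0f84 efl=12246 ds=d es=d fs=0 gs=0",
    "eax/0 ebx/f7c0fd0 ecx/668 edx/8 esi/5965 edi/3820 ebp/f7c0fb4",
]

regs = parse_registers(DUMP)
print(fault_address(regs))
print(regs["ss:esp"])
```

If the value returned by fault_address is identical across crashes, the fault is at least reproducible at one location; if it varies, something more random (such as corrupted memory) may be involved.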

Other times it is as if there are big delays in the network traffic between
boot client and server even though their Ethernet controllers are connected
directly by a cable. These ‘delays’ will occur even if node #8 is
disconnected from logical net #1 and only one Net.ether82557 is running. I
added debug options to boot and Proc32:

boot: serial started
cpu 586,fpu 38469632,speed 587Mhz,box 20250624, bus 309

309MHZ 586/587 PCI bus boot modules:
sys/Proc32
sys/Slib32
sys/Slib16
/bin/Net
/bin/Net.ether82557
/bin/sinit
starting QNX…
Proc output to serial port 3f8 at 9600 baud
Unable to exec /bin/sh: No such process.

Other times the boot has stopped or failed partially. Here are some of the
error messages printed then:

  • Unable to exec /bin/sh: Input/output error
  • kbd: No such process (in sysinit I set the keyboard mapping)
  • Could not link shared object ‘rpc_so’ – Input/output error
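
Since the failures are intermittent, one way to see which symptom dominates is to capture the serial console output of many boots and count the messages. A small Python sketch of such a tally follows. The substrings come from the messages quoted in this post; the labels, function names, and sample captures are hypothetical.

```python
from collections import Counter

# Substrings taken from the messages quoted in this post.
PATTERNS = {
    "exec /bin/sh failed": "Unable to exec /bin/sh",
    "kbd failed": "kbd: No such process",
    "shared object link failed": "Could not link shared object",
    "Proc fault": "Proc fault",
}


def classify(log_text):
    """Return the failure labels whose substring appears in one boot log."""
    return {label for label, needle in PATTERNS.items() if needle in log_text}


def tally(logs):
    """Count failure labels over many logs; unmatched logs get their own label."""
    counts = Counter()
    for log in logs:
        labels = classify(log)
        counts.update(labels or {"no known failure"})
    return counts


# Hypothetical captures: the first two reuse messages from the post,
# the third stands in for a boot that printed none of them.
LOGS = [
    "starting QNX...\nUnable to exec /bin/sh: No such process.",
    "starting QNX...\nkbd: No such process",
    "starting QNX...\n",
]

for label, count in sorted(tally(LOGS).items()):
    print(label, count)
```

A tally like this over a few dozen boots would show whether the I/O errors and the "No such process" errors occur together or separately, which might help narrow down where the problem lies.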

Help?

Thanks in advance,
P-O Håkansson

Hi,

Does this happen on all the systems or just one in particular? If it happens
on one system in particular, try changing network cards. Also, as a
suggestion, try swapping the server’s network card to see if it might be
sending out bad packets.

We also have a Pre-Alpha version of Proc that might help solve this problem
too. I will see if I can send it to you.

Erick.


“P-O Håkansson” <par-olof.hakansson@gambro.com> wrote in
news:a5ld7u$sho$1@inn.qnx.com:


> The kernel is compiled with buildqnx -b 0x10000 build/ws.ether82557
> images/kernel and then adapted to be used with Etherboot with mkQNXnbi
> -i images/kernel -o /tftpboot/kernel

It should be noted that the program boot executes 16-bit code, which tends
to bomb when Etherboot boots it. I’ve gotten Neutrino to come up using
Etherboot, but with the caveat that IPL code can’t run since we’re in
32-bit mode.

Also, the mkQNXnbi utility isn’t supported by QSSL.

> Most of the time this works, but occasionally the boot fails for
> various reasons. I never see any indication that tftp had a problem
> sending the kernel to the client; my feeling is that the problems
> start after the QNX kernel is launched on the client. Sometimes I get
> a kernel crash:

Does it crash if you don’t use Etherboot? You mentioned that you got the
Etherboot feature after an upgrade.


> Version 425.L Feb 15 2001 Technical Support: +1 (613) 591-0941
> Proc fault 1, ldt 100 sys/Proc32; fault e+0
> cd:eip=5:10568 ss:esp=d:f7c0f84 efl=12246 ds=d es=d fs=0 gs=0
> eax/0 ebx/f7c0fd0 ecx/668 edx/8 esi/5965 edi/3820 ebp/f7c0fb4
> Stack (d:f7c0f84)
> 00000020 00015c35 000159e0 000163c0 00018319 000028f0 00000000 00000000
> 00000006 0000bf23 000139d0 000159e0 0f7c0fe8 00000000 00005965 0000001f
> 00000668 0f7c0fd0 00005836 00003820 0000000d 00000610 00003820 0000000d
> 00000002 00000001 0000594d 00005965 0f7c092d 00000668 00005998 00000000
> Process Entry (addr 6050)
> 00000000 00000001 00000000 00000001 00000000 00000000 30020207 00001e1e
> 00005840 0100000d 00006108 ffffffff 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000501 000d0005 00007118 00000000 00000009
> 00000038 00000000 00000046 0000c140 00000000 00000000 00010090 00000000
> 00000000 00000000 00000000 ffff0001 00000000 00000000 00000000
>
> Can anyone tell me the reason for this crash?

It seems to be bombing in a VC attach request. If you don’t start Net or your
Ethernet driver, do the crashes still occur? Perhaps Etherboot is leaving the
controller in a state not anticipated by the Ethernet driver.

> Other times it is as if there are big delays in the network traffic
> between boot client and server even though their Ethernet controllers
> are connected directly by a cable. These ‘delays’ will occur even if
> node #8 is disconnected from logical net #1 and only one Net.ether82557
> is running. I added debug options to boot and Proc32:
>
> boot: serial started
> cpu 586,fpu 38469632,speed 587Mhz,box 20250624, bus 309
>
> 309MHZ 586/587 PCI bus boot modules:
> sys/Proc32
> sys/Slib32
> sys/Slib16
> /bin/Net
> /bin/Net.ether82557
> /bin/sinit
> starting QNX…
> Proc output to serial port 3f8 at 9600 baud
> Unable to exec /bin/sh: No such process.

> Other times the boot has stopped or failed partially. Here are some of
> the error messages printed then:
>
>   • Unable to exec /bin/sh: Input/output error
>   • kbd: No such process (in sysinit I set the keyboard mapping)
>   • Could not link shared object ‘rpc_so’ – Input/output error

These could be symptoms of the protected mode confusion.

Cheers,
Adam

QNX Software Systems Ltd.
[ amallory@qnx.com ]

With a PC, I always felt limited by the software available.
On Unix, I am limited only by my knowledge.
–Peter J. Schoenster <pschon@baste.magibox.net>

Hi

Thanks for your reply.

“Adam Mallory” <amallory@qnx.com> wrote in message
news:Xns91C7A168298EBamalloryqnxcom@209.226.137.4

> It should be noted that the program boot executes 16-bit code, which tends
> to bomb when Etherboot boots it. I’ve gotten Neutrino to come up using
> Etherboot, but with the caveat that IPL code can’t run since we’re in
> 32-bit mode.

[snip]
> Does it crash if you don’t use Etherboot? You mentioned that you got the
> Etherboot feature after an upgrade.

My first attempt at network booting was with PXE in the BIOS, using GNU GRUB
to boot the client. This involved quite a few steps with tftp etc. and took
about 1 min 15 s for the client to boot. I then learnt that the supplier of
the PC boards we are using uses BOOTP for network booting. I got a new BIOS
with Etherboot from them, and that cut the boot time to 30 s (when it
worked…). I also used the special Net.ether82559 they supplied, not the
original QSSL Net.ether82557. I admit that I wrote Net.ether82557 in my post
to the newsgroup; I felt I was already using a bit too much unsupported
software to get a reply from QSSL, sorry :)

The PC boards have Intel 82559 ER chips, and since the board supplier
specially sent me their Net.ether82559 I thought I’d better use it; they
wouldn’t have sent it without a reason. I have now realized that their
Net.ether82559 was no good: it caused delays in our entire QNX network, and
it seems it confused nameloc. With the standard Net.ether82557 it seems to
work OK.

Would you recommend against using Etherboot then? Am I just lucky that it
happens to work now and should I go with the slower PXE/GRUB combination
instead?

Thanks for your help,
P-O

“P-O Håkansson” <par-olof.hakansson@gambro.com> wrote in
news:a625m9$1tg$1@inn.qnx.com:

> My first attempt at network booting was with PXE in the BIOS, using GNU
> GRUB to boot the client. This involved quite a few steps with tftp etc.
> and took about 1 min 15 s for the client to boot. I then learnt that the
> supplier of the PC boards we are using uses BOOTP for network booting.
> I got a new BIOS with Etherboot from them, and that cut the boot time
> to 30 s (when it worked…). I also used the special Net.ether82559 they
> supplied, not the original QSSL Net.ether82557. I admit that I wrote
> Net.ether82557 in my post to the newsgroup; I felt I was already using
> a bit too much unsupported software to get a reply from QSSL, sorry :)

Personally (and this is my opinion, not QSSL’s), if I can answer a question
or help out - I will, regardless of how much unsupported software you’re
using. But giving inaccurate information does you a disservice, and
usually draws issues out longer than necessary. IMHO, most technical
people aren’t interested in PR or policy when it comes between helping
someone and supporting someone else’s product - we just lend a hand when
possible. …

> The PC boards have Intel 82559 ER chips, and since the board supplier
> specially sent me their Net.ether82559 I thought I’d better use it;
> they wouldn’t have sent it without a reason. I have now realized that
> their Net.ether82559 was no good: it caused delays in our entire QNX
> network, and it seems it confused nameloc. With the standard
> Net.ether82557 it seems to work OK.

It could be version-itis: the driver may be a little dated with respect to
the Net interface, or there could be a bug. I’d stick with what works rather
than using the driver from the board supplier.

> Would you recommend against using Etherboot then? Am I just lucky that
> it happens to work now and should I go with the slower PXE/GRUB
> combination instead?

No, not at all - I’ve used etherboot for a few things. I was mainly
concerned that etherboot was your only way of uploading an image to the
board, and that you had no work around, thus being a show-stopping problem.


Cheers,
Adam
