Proc32.425N availability

Hello QNX support folks,

I worked with QSSL about a year ago to identify Proc faults in the private
interface between Proc32 and Net, and to isolate what turned out to be timer
table interactions under heavy interrupt load. At the time, I was told a
new version of Proc32 would soon be posted. In the interim, we’ve been
using a patch that will reset the system if Proc faults occur.

We’d like to get an actual fix to Proc32. Is Proc32.425N (or higher)
available? All I need is the kernel, proper; I can build my own boot images
and patch the disk myself.

Thanks in advance,

-Jim Parnell
WorldGate Communications, Inc.

Proc32 version N hasn’t been officially released. You’ll need to contact
your sales rep to get an experimental version of Proc that incorporates that
fix.

Cheers,
Adam

QNX Software Systems Ltd.
[ amallory@qnx.com ]

With a PC, I always felt limited by the software available.
On Unix, I am limited only by my knowledge.
–Peter J. Schoenster <pschon@baste.magibox.net>

Will it be released? I have a pretty heavy network load on our systems too.

Proc N will probably never be released; I’ve already made further fixes,
which will be in the next release, coming soon (honest).

That said, that particular problem isn’t really specific to networks; it’s
just that in his case he could generate a lot of interrupts. I’ve seen
further problems where some boards with many serial ports running in
interrupt mode at high baud rates do the exact same thing.

Again, if you feel the need for the experimental Proc, contact sales and
they’ll help you out.

Cheers,
Adam

QNX Software Systems Ltd.
[ amallory@qnx.com ]


Adam Mallory wrote:

Proc N will probably never be released; I’ve already made further fixes,
which will be in the next release, coming soon (honest).

That said, that particular problem isn’t really specific to networks; it’s
just that in his case he could generate a lot of interrupts. I’ve seen
further problems where some boards with many serial ports running in
interrupt mode at high baud rates do the exact same thing.

Again, if you feel the need for the experimental Proc, contact sales and
they’ll help you out.

We have just identified a problem at our company this week, where Net
is segment violating under heavy load conditions. We have 4 Intel
82559 cards all operating in full duplex mode (with an async protocol
that takes full advantage of full duplex) and an interrupt-driven
multiport serial card. We are also using the on-board real-time
clock to generate 512 interrupts per second, and the ticksize is 1 ms,
so we definitely have a lot of interrupts occurring. Is it possible
that this is related (I know Proc and Net are incestuous)?

btw: yes “if I feel the need for the experimental Proc” I’ll contact
sales. At this point I am simply wondering “if I feel the need”
since it is a Net crash rather than a Proc crash (although the
kernel is completely hosed after the Net crash).

Adam Mallory wrote:

You should post the SIGSEGV info as well as what dumper has to say.

The SIGSEGV is at 25:2A30 (which is in the driver for the second
instance of the 82557 driver). We have verified this isn’t a hardware
issue by testing on 8 different machines (all identical component
types). Out of four tests I ran looking specifically for variation
in the selector, I never saw it (it is always selector 25).

Oddly, the crash can be 100% repeatably reproduced by the simple
act of having our software make use of the serial ports across the
QNX LAN (yes - the ones that are on the multiport serial card).

Quick background on the app. The system is a redundant controller
that makes use of QNX networking for redundancy. Essentially what
we have is the following:

[ASCII diagram, layout lost in archiving: two controller nodes, each with
Ethernet connections labelled A and B (one side marked TCP/IP) and a
multiport serial card, joined by links between the two nodes.]

Our software has many levels of fault tolerance (it actually survives
the Net crash - the other node takes over). One of them is that if a
device out in the field is accessible via the “backup” node (i.e. the
non-controlling node), the controlling node will access the “backup”
node’s serial ports (by doing an open("//2/dev/ser")) in order to
establish communications with the device (the software is actually
dealing with a triple hardware fault at this point). This is when the
Net crash happens.

How many serial ports do you have running (@ what baud)?

We have 8 serial ports running at 19200 baud; however, due to on-board
buffering, there is a worst-case interrupt rate (which we are probably
hitting) of approx. 200 interrupts per second.

The netcards are probably generating upwards of 5000 interrupts per
second (400 Mbit/s aggregate). The RTC is exactly 512/sec; combined
with the 200/sec from the serial board and the 1000/sec due to
ticksize, we are looking at around 7000 interrupts/sec (not including
disk, which should be very low). The machines are 850 MHz Pentium IIIs.
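
As a rough check on the figures quoted here, the per-source rates can be summed and turned into an average time and cycle budget per interrupt. This is only a back-of-the-envelope sketch: the network figure is the poster's own estimate, disk interrupts are ignored, and the names `RATES`, `CPU_HZ` and `interrupt_budget` are made up for illustration.

```python
# Approximate interrupt rates (interrupts/second) as quoted in the post.
RATES = {
    "network cards (4 x 82559, estimate)": 5000,
    "real-time clock": 512,
    "multiport serial card (worst case)": 200,
    "system tick (1 ms ticksize)": 1000,
}
CPU_HZ = 850_000_000  # 850 MHz Pentium III


def interrupt_budget(rates, cpu_hz):
    """Return (total rate, average microseconds and CPU cycles per interrupt)."""
    total = sum(rates.values())
    return total, 1_000_000 / total, cpu_hz / total


total, usec, cycles = interrupt_budget(RATES, CPU_HZ)
print(f"total: {total} interrupts/s")
print(f"average spacing: {usec:.0f} us, roughly {cycles:.0f} CPU cycles")
```

The total comes to 6712 interrupts/sec, consistent with the "around 7000" figure. The spacing is only an average; bursts from the network cards could be much tighter.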

I have attached the result of “sin -PNet mem” before the crash occurs,
as well as the Net.dmp.

Thanks

Rennie

You should post the SIGSEGV info as well as what dumper has to say. It’s
possible they are related, but not directly, as Net makes use of its own
timers and does not use Proc’s. I suppose under heavy interrupt load it’s
possible that Net isn’t making forward progress fast enough to keep up with
the 3 NIC cards, and the queue gets blown (not having looked at Net). Does
tracelog have anything to say (up the severity logging for more verbosity)
during the time of the Net crash? In the end, it’s possible, so I think it’d
be worth at least trying Proc32 N.

How many serial ports do you have running (@ what baud)?

Cheers,
Adam

QNX Software Systems Ltd.
[ amallory@qnx.com ]


Adam Mallory wrote:

Forgive me for being obtuse, but the Net.selectors file you gave doesn’t
show 25:2A30 as being in the second instance of the 82557 driver.

My original wording was poor.

Referring to the following edited copy of Net.selectors (edited to
remove data extraneous to this conversation):

0160C000 is the Net.ether82557 code segment.

Net has four instances of the same code “mapped” into four different
selectors (15, 25, 35, and 45). The second instance (i.e. the second
Net.ether82557 driver to register with Net) is at Net’s selector 25.
Assuming that the kernel is correct when it reports a SIGSEGV at
25:XXXX inside Net, isn’t it reasonable to assume that Net’s selector
25 was active at the time Net SEGV’d? This means that the instruction
pointer was at address 0160C000 + 2A30 (0160EA30), does it not?

Of course, since all the drivers are the same, the physical address of
the code would be the same no matter what driver was being used (but
presumably the data selector would be 2D in this case - although there
is no way to confirm that).
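
The address arithmetic in this argument can be checked with a couple of lines. This is only a sketch: it assumes the second column of the Net.selectors listing is the segment base and that the faulting address is simply base plus offset; the function name `linear_address` is made up for illustration.

```python
def linear_address(segment_base: int, offset: int) -> int:
    """Base of the segment plus the offset reported in the fault (selector:offset)."""
    return segment_base + offset


base = 0x0160C000   # where the Net.ether82557 code is loaded, per the listing
offset = 0x2A30     # offset part of the reported fault at 25:2A30
print(f"{linear_address(base, offset):08X}")  # prints 0160EA30
```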

//2/bin/Net 61
0005 015F5000
0015 0160C000
0025 0160C000 002D 01714000
0035 0160C000
0045 0160C000
//2/bin/Net.ether82557 63
0005 0160C000
0015 016B9000
0025 01624000
//2/bin/Net.ether82557 64
0005 0160C000
0015 016F6000
0025 01708000
//2/bin/Net.ether82557 65
0005 0160C000
0015 0173F000
0025 01689000
//2/bin/Net.ether82557 66
0005 0160C000
0015 0177C000
0025 01695000
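
To see at a glance which selectors share the driver's code base, a listing like the one above can be grouped by base address. This is a hypothetical helper, not a QNX tool: it assumes each data line holds "selector base" hex pairs and that the number after the path is a process ID. Only the Net entry and the first driver instance are reproduced to keep it short.

```python
from collections import defaultdict

# Part of the edited Net.selectors listing as posted in the thread.
LISTING = """\
//2/bin/Net 61
0005 015F5000
0015 0160C000
0025 0160C000 002D 01714000
0035 0160C000
0045 0160C000
//2/bin/Net.ether82557 63
0005 0160C000
0015 016B9000
0025 01624000
"""


def selectors_by_base(listing):
    """Map base address -> list of (process path, assumed pid, selector)."""
    result = defaultdict(list)
    process = None
    for line in listing.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0].startswith("//"):
            # Header line: process path followed by a number (assumed pid).
            process = (tokens[0], int(tokens[1]))
            continue
        # Data line: one or more "selector base" pairs in hex.
        for sel, base in zip(tokens[0::2], tokens[1::2]):
            result[int(base, 16)].append((*process, int(sel, 16)))
    return result


table = selectors_by_base(LISTING)
for name, pid, sel in table[0x0160C000]:
    print(f"{name} (pid {pid}): selector {sel:04X}")
```

For the Net process this reports selectors 0015, 0025, 0035 and 0045 against base 0160C000, matching the four mapped driver instances described in the post.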

You shouldn’t get much in the way of variation of selector numbers, other
than another code segment perhaps.

Isn’t that what we’re talking about here (Net making calls into the
network driver code) ?

snip

I have attached the result of “sin -PNet mem” before the crash occurs,
as well as the Net.dmp.


The Net.dmp file has cs:ip of 55434673:736f7243, which is obviously wrong -
as well as an indication that dumper failed at the time.

True. In fact there is data from one of our configuration files in
there (which is truly bizarre since there is no file system corruption
after the system is rebooted).
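
One quick way to see that the reported cs:ip looks like text rather than an address is to print its bytes as ASCII. This is a sketch that assumes the two values were stored as little-endian 32-bit words, as on x86; `as_ascii` is a made-up helper name.

```python
import struct


def as_ascii(value: int) -> str:
    """Show a 32-bit value as the four bytes it occupies in little-endian memory."""
    raw = struct.pack("<I", value)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


cs, ip = 0x55434673, 0x736F7243   # cs:ip values reported from Net.dmp
print(as_ascii(ip), as_ascii(cs))  # prints: Cros sFCU
```

All eight bytes are printable characters, which fits the observation that file data ended up where the registers should be.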

Admittedly, I didn’t look at Net.dmp before I sent it. Like I said, the
kernel is hosed at the time, so I was quite surprised to get a Net.dmp
at all. Most times there is no Net.dmp file after the SIGSEGV (and
rebooting the machine), since (I assume) dumper doesn’t get a chance
to run after the kernel is hosed.

Is there anything else I can get you to help track the problem down ?
(obviously I probably cannot get a valid .dmp file at this point).

Rennie

Rennie Allen wrote:

0160C000 is the Net.ether82557 code segment.

Dang! bad wording again.

Try:

0160C000 is where the Net.ether82557 code is loaded.

Adam Mallory wrote:

What version of QNX4 are we talking about?

Net - 4.25C, Net.ether82557 - 4.25G.

There is a Net.speedo driver for the 82557/8/9 that was put out a while
ago; perhaps try that.

I’d love to. Where is it ? I checked the latest updates for QNX4 on
the website, and the latest is 4.25E (for which the release notes still
refer to Net.ether82557).

Have you spoken to support about your issue yet?

That’s next. Thanks for your help.

Rennie

The SIGSEGV is at 25:2A30 (which is in the driver for the second
instance of the 82557 driver). We have verified this isn’t a hardware
issue by testing on 8 different machines (all identical component
types). Out of four tests I ran looking specifically for variation
in the selector, I never saw it (it is always selector 25).

Forgive me for being obtuse, but the Net.selectors file you gave doesn’t
show 25:2A30 as being in the second instance of the 82557 driver. You
shouldn’t get much in the way of variation of selector numbers, other than
another code segment perhaps.

I have attached the result of “sin -PNet mem” before the crash occurs,
as well as the Net.dmp.

The Net.dmp file has cs:ip of 55434673:736f7243, which is obviously wrong -
as well as an indication that dumper failed at the time.

Cheers,
Adam

QNX Software Systems Ltd.
[ amallory@qnx.com ]


Net has four instances of the same code “mapped” into four different
selectors (15, 25, 35, and 45). The second instance (i.e. the second
Net.ether82557 driver to register with Net) is at Net’s selector 25.
Assuming that the kernel is correct when it reports a SIGSEGV at
25:XXXX inside Net, isn’t it reasonable to assume that Net’s selector
25 was active at the time Net SEGV’d? This means that the instruction
pointer was at address 0160C000 + 2A30 (0160EA30), does it not?

Yes, I know that the driver code and data segments are mapped into Net etc,
I think we just both misunderstood each other here.

Is there anything else I can get you to help track the problem down ?
(obviously I probably cannot get a valid .dmp file at this point).

What version of QNX4 are we talking about? There is a Net.speedo driver
for the 82557/8/9 that was put out a while ago; perhaps try that. Have
you spoken to support about your issue yet?

Cheers,
Adam

QNX Software Systems Ltd.
[ amallory@qnx.com ]


I’d love to. Where is it ? I checked the latest updates for QNX4 on
the website, and the latest is 4.25E (for which the release notes still
refer to Net.ether82557).

Have you spoken to support about your issue yet?

That’s next. Thanks for your help.

For that you’ll have to contact support to get a binary of both the driver
and the Proc.

Cheers,
Adam

QNX Software Systems Ltd.
[ amallory@qnx.com ]
