QNX crashing under high load

Hello,

Has anyone else had problems with QNX under high load?

I am running some data logging software, which at high sample rates will
load the CPU considerably, especially when dumping to disk. Regularly, the
system will hang and reboot. No errors.

Any help or similar experiences would be very much appreciated.

QNX OS version 6.2.1
mobo: x86 pcm5820

Thank you,
Robert Muil.

  1. 6.3 looks more stable than 6.2.1

  2. Do you see any message on the screen when it hangs?

Roman

“Robert Muil” <r.muil@crcmining.com.au> wrote in message
news:cdi7tu$bfr$1@inn.qnx.com

Hello,

Has anyone else had problems with QNX under high load?

I am running some data logging software, which at high sample rates will
load the CPU considerably, especially when dumping to disk. Regularly, the
system will hang and reboot. No errors.

A hardware problem or excessive heat is what comes to mind.

Heat is not an issue: the system is an embedded fan-less board designed for
industrial applications.

I have yet to try 6.3, but I am loath to jump headlong into the inevitable
troubles implicit in updating.

On this motherboard, there is no message at all. No system log. It just
hangs, waits about 2 minutes, then reboots.

Cheers,
Robert.

I think you’ll find it’s the CPU card at fault. The Geodes seem to
heavily utilise BIOS call-outs to maintain the chipset. These are
devastating on real-time performance.

Can you describe your ‘high load’? What is the system doing? Do you have a
lot of message passing going on? How is it organized? I mean, how many
senders/receivers do you have, how many channels are used, etc.?

– igor

Igor,

It is a simple application:

  • 2 driver processes each controlling an analog io card
  • 1 data logging process that reads from the drivers and logs to disk

The driver/logger interface is resource manager open/read. The data logging
process reads all channels from the drivers at 1kHz. When it has 10,000
samples, it dumps these to disk using the zlib gzio library.
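
The buffering scheme described above can be sketched roughly as follows. This is only an illustration in Python, not the original C code: Python's `gzip` module stands in for the zlib gzio calls, and the class name `SampleLogger`, the 16-bit sample format, and the placeholder sample values are all assumptions made for the example. At 1 kHz, a 10,000-sample block works out to one compressed dump about every 10 seconds.

```python
import gzip
import io
import struct

SAMPLE_RATE_HZ = 1000     # from the post: channels are read at 1 kHz
BLOCK_SAMPLES = 10_000    # from the post: dump after 10,000 samples


class SampleLogger:
    """Buffer samples in memory and write them out gzip-compressed in blocks.

    Hypothetical helper for illustration; not the poster's implementation.
    """

    def __init__(self, sink, block_samples=BLOCK_SAMPLES):
        self.sink = sink                  # any writable binary file object
        self.block_samples = block_samples
        self.buffer = []
        self.blocks_written = 0

    def add(self, value):
        """Store one sample and flush once a full block has accumulated."""
        self.buffer.append(value)
        if len(self.buffer) >= self.block_samples:
            self.flush()

    def flush(self):
        """Compress the buffered samples and write them as one gzip member."""
        if not self.buffer:
            return
        # Assumed format: little-endian signed 16-bit samples.
        raw = struct.pack(f"<{len(self.buffer)}h", *self.buffer)
        self.sink.write(gzip.compress(raw))
        self.buffer.clear()
        self.blocks_written += 1


def seconds_between_dumps(block_samples=BLOCK_SAMPLES, rate_hz=SAMPLE_RATE_HZ):
    """Time between disk dumps for a given block size and sample rate."""
    return block_samples / rate_hz


sink = io.BytesIO()                # in-memory stand-in for the log file
logger = SampleLogger(sink)
for i in range(25_000):            # 25 seconds of data at 1 kHz
    logger.add(i % 1000)           # placeholder sample values
logger.flush()                     # write the final partial block
restored = gzip.decompress(sink.getvalue())
```

In this sketch the flush runs in the same thread as the sampling loop, so no samples are taken while a block is being compressed and written. Whether the original logger is structured that way is not stated in the post.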




That description is a bit too vague to draw conclusions from. I can share one
experience though - make sure you don’t have a situation where a lot of
pulses are directed to a single channel. Split the load between different
channels if possible.


Igor,

I realise how vague it is - but unfortunately, that is how the problem has
presented. It doesn’t seem tied to any particular motherboard, nor to a
particular method I am using.

I will investigate further and have a look at the channels situation. What
do you define as a lot of pulses? I essentially have only 1, firing every
millisecond.

BTW, I have only just twigged to the fact that you are the author of ‘spin’.
Thank you indeed - it is an invaluable replacement for top, and I use it all
the time.

Robert.


Robert Muil <r.muil@crcmining.com.au> wrote:

I will investigate further and have a look at the channels situation. What
do you define as a lot of pulses? I essentially have only 1, firing every
millisecond.

If you fall behind on pulses, the queue can grow, and that can slow
the system down, badly.

-David


Please follow-up to newsgroup, rather than personal email.
David Gibbs
QNX Training Services
dagibbs@qnx.com

Now that could certainly explain the symptoms.

Any way to stop pulses queuing? In other words, a missed pulse is cancelled?

Cheers,
Rob.


Robert Muil <r.muil@crcmining.com.au> wrote:

Now that could certainly explain the symptoms.

Any way to stop pulses queuing? In other words, a missed pulse is cancelled?

No, no way to stop them being queued – they are guaranteed to be
delivered.

Well, with a single pulse queued to a process that is getting any reasonable
amount of CPU time, it shouldn’t happen.

That is, to see this sort of symptom with a single pulse to the server,
you’d have to miss (at least) 100s of thousands of pulses. It’s the
sort of thing that would happen if the pulse receiver is either getting no
CPU due to a pre-emption problem, or has a bug such that it never gets to
a MsgReceive() to get the pulses.

It actually occurs most often when debugging a system – somebody drops
a server with pulses being queued into the debugger, and then leaves it
there over lunch, or over night.

-David
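
As a rough back-of-the-envelope check on the figures above, the following sketch works out how long a receiver would have to go without servicing a 1 kHz pulse source to build up a given backlog. The function names are made up for the example, and the model is deliberately naive: it assumes every pulse occupies its own queue entry and ignores any coalescing the kernel may do.

```python
def pulses_missed(stall_seconds, pulse_rate_hz=1000):
    """Pulses queued while the receiver never reaches its receive call."""
    return int(stall_seconds * pulse_rate_hz)


def stall_needed(backlog, pulse_rate_hz=1000):
    """Seconds without servicing needed to accumulate `backlog` pulses."""
    return backlog / pulse_rate_hz


lunch_break = pulses_missed(60 * 60)        # one hour stopped in a debugger
overnight = pulses_missed(12 * 60 * 60)     # twelve hours stopped
threshold_stall = stall_needed(100_000)     # low end of "100s of thousands"

print(lunch_break, overnight, threshold_stall)
```

Under these assumptions a 1 ms pulse needs to go unserviced for on the order of 100 seconds before the backlog reaches the low end of the range mentioned, which fits the debugger-over-lunch scenario far better than a briefly busy logger.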


“David Gibbs” <dagibbs@qnx.com> wrote in message
news:cetqm8$439$1@inn.qnx.com

That is, to see this sort of symptom with a single pulse to the server,
you’d have to miss (at least) 100s of thousands of pulses. It’s the

I suspect you don’t have to fall behind that far. I’ve seen a drastic
slowdown effect with about 100 × 5 ms + 100 × 20 ms pulses going to a single
channel on an 800 MHz CPU. In fact, under severe load in that fashion, the
system does not just ‘slow down’. It essentially folds. Interrupts may be
serviced with HUGE latencies (on the order of SECONDS), or perhaps they are
just not being serviced at all for extended periods of time.

The problem is, the event queues need to be kept sorted (per channel). The
longer they get, the more time is spent on sorting. This is why splitting
the load among many channels helps considerably. In that particular case,
splitting the pulses among 100 channels resolved all performance problems.

– igor
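
The effect of splitting the load can be illustrated with a toy model. The sketch below is not the kernel's algorithm: it simply assumes that each channel keeps its pending pulses in a list ordered by priority (first-in, first-out among equal priorities) and that an insertion scans linearly from the head. The function names and the round-robin distribution over channels are assumptions made for the example.

```python
def enqueue_sorted(queue, priority):
    """Insert into a queue ordered by descending priority, FIFO among equals.

    Returns the number of entries examined by the linear scan.
    """
    steps = 0
    pos = 0
    while pos < len(queue) and queue[pos] >= priority:
        pos += 1
        steps += 1
    queue.insert(pos, priority)
    return steps


def total_scan_cost(num_pulses, num_channels, priority=10):
    """Queue `num_pulses` undelivered pulses round-robin over the channels."""
    queues = [[] for _ in range(num_channels)]
    cost = 0
    for i in range(num_pulses):
        cost += enqueue_sorted(queues[i % num_channels], priority)
    return cost


one_channel = total_scan_cost(1000, 1)
hundred_channels = total_scan_cost(1000, 100)

print(one_channel, hundred_channels)
```

In this model the total work grows with the square of the queue length, so spreading the same backlog over 100 channels cuts the scanning cost by roughly a factor of 100. The real kernel may well behave differently; the point is only that per-channel queue length, not total pulse count, drives the cost.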

“Robert Muil” <r.muil@crcmining.com.au> wrote in message
news:cen19i$ojc$1@inn.qnx.com

Igor,
BTW, I have only just twigged to the fact that you are the author of ‘spin’.
Thank you indeed - it is an invaluable replacement for top, and I use it all
the time.

:)
That’s because you did not comply with the license, that says ‘send me an
e-card if you find it useful’.
It’s in the use message ;)

Cheers
– igor


Igor Kovalenko wrote:

The problem is, the event queues need to be kept sorted (per channel).

… and keep in mind that pulses are real messages, which means the memory
requirement of the message queues can grow dramatically.

Sorting under memory constraints is not so easy :)

Regards

Armin



Igor Kovalenko <kovalenko@comcast.net> wrote:


I suspect you don’t have to fall behind that far. I’ve seen a drastic
slowdown effect with about 100 × 5 ms + 100 × 20 ms pulses going to a single
channel on an 800 MHz CPU.

That’s two pulses. If they are the same priority, it is a different, far
worse situation: the problem can (and will) occur far more quickly.
I’d expect that to exhibit symptoms at the 100s to 1000s of pulses level.
(The queueing has optimizations to allow folding of about 250 identical
pulses together, reducing queue length. If two different pulses of
the same priority are alternating (or, as in your example, 4 of type a,
1 of type b, 4 of type a, etc.), then due to the promise of ordered delivery
for pulses of the same priority they can’t be “folded” in the queue,
and the queue grows far more rapidly.)


In fact under severe load in that fashion, the system
does not just ‘slow down’. It essentially folds. Interrupts may be serviced
with HUGE latencies (on order of SECONDS), or perhaps they are just not
being serviced at all for extended periods of time.

Well, it keeps slowing down, until all CPU time is spent trying to
enqueue pulses.

The problem is, the event queues need to be kept sorted (per channel). The
longer they get, the more time is spent on sorting. This is why splitting
the load among many channels helps considerably. In that particular case,
splitting the pulses among 100 channels resolved all performance problems.

I’d have expected that one channel for each pulse (2 channels) would have
probably done it. 100 channels seems…overkill.

-David
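
The folding behaviour described above can be modelled in a few lines. This is a toy sketch, not the kernel's data structure: it assumes a new pulse is folded only into the entry at the tail of the queue, that the cap is exactly 250 (the post says "about 250"), and that all pulses share one priority. The names `enqueue_folded` and `queue_length` are made up for the example.

```python
FOLD_LIMIT = 250   # "about 250" per the post; the exact limit is an assumption


def enqueue_folded(queue, pulse):
    """Append a pulse, folding it into the tail entry when it is identical."""
    if queue and queue[-1][0] == pulse and queue[-1][1] < FOLD_LIMIT:
        queue[-1][1] += 1          # same pulse as the tail: bump its count
    else:
        queue.append([pulse, 1])   # different pulse, or tail entry is full


def queue_length(pulses):
    """Number of queue entries after enqueuing all pulses undelivered."""
    queue = []
    for pulse in pulses:
        enqueue_folded(queue, pulse)
    return len(queue)


identical = ["a"] * 10_000
mixed = (["a"] * 4 + ["b"]) * 2_000      # 4 of type a, 1 of type b, repeated

identical_len = queue_length(identical)
mixed_len = queue_length(mixed)
# One channel per pulse type: each queue only ever sees identical pulses.
split_len = queue_length(["a"] * 8_000) + queue_length(["b"] * 2_000)

print(identical_len, mixed_len, split_len)
```

With the same 10,000 undelivered pulses, the interleaved pattern produces a queue a hundred times longer than the identical case in this model, and giving each pulse type its own channel brings it back down, which is consistent with the suggestion that two channels might have been enough.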


You are wrong.

I sent you an e-card, probably over a year ago now.

Would you like another? :)

Robert.


Oh well, you got me ;)
Now I will have to maintain a database of those cards … all 7 of them :\
