30 of our engine stands hung up

Robert_Kindred1 · October 31, 2005, 3:05pm

We have a system where we use QNX boxes to run engines on stands for
standardized oil testing. The box has A/D, D/A, digital in/out, and
frequency cards with which it both controls and gathers data for the report.
Because our engineers are afraid of *nix, we do not have a keyboard or
display on the QNX box, but do all of the displaying on Windows XP boxes.
This also gives us Oracle database connectivity.

At any rate, when daylight savings time switched over last night at 2:00 AM,
all of our stands locked up. We don’t yet know if it was the network, the
Windows box, or the QNX box. What I would like to know is if anyone else
experienced something like this.

In the meantime, we are going to take an experimental box, set its time
back, and watch it to see what happens.

Please respond if anything happened to you,

Robert Kindred
RKindred@SwRI.edu

Tony1 · October 31, 2005, 3:35pm

On Mon, 31 Oct 2005 18:05:18 +0300, Robert Kindred <RKindred@SwRI.edu>
wrote:

At any rate, when daylight savings time switched over last night at 2:00
AM, all of our stands locked up. We don’t yet know if it was the
network, the Windows box, or the QNX box. What I would like to know is
if anyone else experienced something like this.

I’ve been using

TZ=MSK-3MDT,M3.5.0/2,M10.5.0/3

for years on a QNX4 box and have never had any glitches while networking it
to a Windows host.
(My protocols: Telnet+FTP (now deprecated) and SSH)
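
For readers unfamiliar with the format, the TZ value above is a POSIX rule string: standard zone MSK three hours east of UTC (POSIX writes the offset with the sign inverted), daylight zone MDT, starting on the last Sunday of March at 02:00 and ending on the last Sunday of October at 03:00. The C sketch below checks that reading by applying the rule to a fixed instant. The helper name local_hour_in_tz is hypothetical, and the sketch assumes a C library that provides setenv(), tzset(), and localtime_r().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper (not from the thread): interpret the UTC instant `t`
 * under the POSIX TZ rule string `tz` and return the local hour of day.
 * If is_dst is non-NULL it receives 1 when the daylight rule applies.
 * Note that this changes the process-wide TZ setting. */
int local_hour_in_tz(const char *tz, time_t t, int *is_dst)
{
    struct tm local;

    assert(tz != NULL);
    if (setenv("TZ", tz, 1) != 0)   /* overwrite any existing TZ */
        return -1;
    tzset();                        /* make the C library re-read TZ */
    if (localtime_r(&t, &local) == NULL)
        return -1;
    if (is_dst != NULL)
        *is_dst = local.tm_isdst > 0;
    return local.tm_hour;
}
```

With this rule string, 12:00 UTC on 1 January 2005 (1104580800) should come back as hour 15 with the daylight flag clear, and 12:00 UTC on 1 July 2005 (1120219200) as hour 16 with the flag set.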

Tony.

Kevin_Miller1 · October 31, 2005, 5:15pm

I too have been using QNX 4.24 and also 4.25 networked to Windows boxes for
years, and have never had a problem with the time switchover. We use both
FTP and socket connections, with QNX as the client and the Windows (NT,
2000) box as the host.

Robert_Kindred1 · November 1, 2005, 2:47pm

“Robert Kindred” <RKindred@SwRI.edu> wrote in message
news:dk5bjf$9ef$1@inn.qnx.com…
[]

Please respond if anything happened to you,

Hello, people, and thanks for the feedback. I thought I would let people
know that the problem is not with QNX, but with my Windows service. When I
know the details, I will post them here. Oddly enough, in the home-grown
messaging system that I mentioned, we have checksums, which cause the error
to be trapped, reported, and discarded. I noticed yesterday that as soon as
I bumped the clock forward again, the service began running smoothly
after reporting one more error.

My details:

QNX 6.3.0 (not using SP2 yet)
Microsoft Windows XP
Borland C++ Builder 5.0 (with SP1) (but planning to switch to Visual
Studio 2003)

Robert Kindred
RKindred@SwRI.edu

Oleg_Khamayko1 · November 2, 2005, 11:10am

Also please note that the rules for the date of switching from summer to
winter time are different in the USA and Europe. XP seems to use only the USA
rules - one week of one hour time difference is guaranteed for POSIX users…

Robert_Kindred1 · November 2, 2005, 3:43pm

“Robert Kindred” <RKindred@SwRI.edu> wrote in message
news:dk5bjf$9ef$1@inn.qnx.com…
[]

I just want to give everyone a heads up since I threw all of this into the mix.

I am not finished checking things, but I am pretty sure that there is no
problem with QNX. I did do a simple program check with BCB5, however. To
run the following program I stopped the Windows Time Service, and set my
computer time to October 30, 1:55:00 am. It doesn’t seem to do the DST jump
if you set it to 1:59:00 am. Here is the program:

//---------------------------------------------------------------------------

#include <vcl.h>
#pragma hdrstop
#include <stdio.h>
#include <time.h>

//---------------------------------------------------------------------------

#pragma argsused
int main(int argc, char* argv[])
{
    int lastTime(0);
    int thisTime;

    // Sample time() once a second and print it, the previous sample,
    // and the difference between the two.
    while(true) {
        Sleep(1000);
        thisTime = (int)time(NULL);
        printf("%d %d %d \n", thisTime, lastTime, thisTime - lastTime);
        lastTime = thisTime;
    }

    return 0; // Warning, unreachable code
}
//---------------------------------------------------------------------------

When I run this program, I get the following output:

1130655588 1130655587 1
1130655589 1130655588 1
1130655590 1130655589 1
1130655591 1130655590 1
1130655592 1130655591 1
1130655593 1130655592 1
1130655594 1130655593 1
1130655595 1130655594 1
1130655596 1130655595 1
1130655597 1130655596 1
1130655598 1130655597 1
1130655599 1130655598 1
1130652000 1130655599 -3599
1130652001 1130652000 1
1130652002 1130652001 1
1130652003 1130652002 1
1130652004 1130652003 1
1130652005 1130652004 1
1130652006 1130652005 1
1130652007 1130652006 1
1130652008 1130652007 1
1130652009 1130652008 1

This explains what I was seeing. I have about 14 threads in this Windows
service, and most of my threads idle on a condvar. Windows doesn’t have
condvars, so I use the pthreads-win32 library, downloadable from RedHat.
The way pthread_cond_timedwait works is that you give it an absolute time.
So, I would call time(NULL), add the seconds I wanted to time out on, and
make the above call. I looked into the pthreads-win32 code and found a place
where, if the absolute time is in the past, the subroutine returns
immediately with a timeout error. What this means is that all of my threads
began busy-waiting, totally bogging down the machine and making it look as
though it had stopped. This also explains why several of the machines we got
to last actually recovered and continued to run (after about an hour).
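
The failure mode described above can be illustrated with a little timespec arithmetic. This is only a sketch in C, not the pthreads-win32 code: the helper names deadline_after and deadline_passed are hypothetical, and it assumes the deadline is handed over as a struct timespec. A deadline built from a clock that reads an hour behind the clock the library consults is already expired when the wait is called, so the wait returns at once.

```c
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical helper: `now` advanced by `ms` milliseconds, with tv_nsec
 * normalized back into the range [0, 1e9). */
struct timespec deadline_after(struct timespec now, long ms)
{
    struct timespec d = now;

    assert(ms >= 0);
    d.tv_sec  += ms / 1000;
    d.tv_nsec += (ms % 1000) * 1000000L;
    if (d.tv_nsec >= NSEC_PER_SEC) {   /* carry into the seconds field */
        d.tv_sec  += 1;
        d.tv_nsec -= NSEC_PER_SEC;
    }
    return d;
}

/* Nonzero when `deadline` is at or before `now`, which is the case where a
 * timed wait reports a timeout without blocking. */
int deadline_passed(struct timespec deadline, struct timespec now)
{
    if (deadline.tv_sec != now.tv_sec)
        return deadline.tv_sec < now.tv_sec;
    return deadline.tv_nsec <= now.tv_nsec;
}
```

Taking 1130652000 from the program output as the value time(NULL) returned after the jump, and assuming (this is not confirmed in the thread) that the library clock read 1130655600 at that moment, a five-second deadline built from the former is 3595 seconds in the past, so deadline_passed() is true. A caller that checks for this case, or that sleeps briefly after an unexpectedly early timeout, would at least avoid spinning.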

By the way, this same program counts seconds continuously in Visual Studio
.NET 2003. I am going to try it in QNX.
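
The once-per-second sampling in the test program can also be turned into a check that reports how far the clock stepped. A minimal C sketch follows; the function name clock_step is hypothetical, and it assumes the caller knows how much time really elapsed between samples (for example from a sleep interval or a tick counter that is not affected by clock changes).

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: how far the wall clock stepped between two samples
 * that were really taken `expected_elapsed` seconds apart.
 * 0 means no step; a negative value means the clock went backwards. */
long clock_step(time_t previous, time_t current, long expected_elapsed)
{
    assert(expected_elapsed >= 0);
    return (long)(current - previous) - expected_elapsed;
}
```

Applied to the two samples around the jump in the output above, clock_step(1130655599, 1130652000, 1) gives -3600: the reported difference of -3599 is one real second plus a one-hour step backwards.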

Robert Kindred
RKindred@SwRI.edu

Robert_Kindred1 · November 2, 2005, 4:38pm

“Robert Kindred” <RKindred@SwRI.edu> wrote in message
news:dkamiq$9f5$1@inn.qnx.com…

“Robert Kindred” <RKindred@SwRI.edu> wrote in message
news:dk5bjf$9ef$1@inn.qnx.com…
[]
I am going to try it in QNX.

QNX works fine.

Robert Kindred
RKindred@SwRI.edu

David_Gibbs1 · November 4, 2005, 5:38pm

Robert Kindred <RKindred@swri.edu> wrote:

This explains what I was seeing. I have about 14 threads in this Windows
service, and most of my threads idle on a condvar. Windows doesn’t have
condvars, so I use the pthreads-win32 library, downloadable from RedHat.
The way pthread_cond_timedwait works is that you give it absolute time.

This is one of the stupidest “decisions” I’ve seen come out of the POSIX
spec. Why, oh why, would anyone want to wait until an ABSOLUTE time for
a timeout on a blocking synch call? (This is defined this way for at
least condvars, mutexes, and semaphores that I’ve looked at.) Almost
always you want to wait a relative time… if the operation took too
long, give up and try again… so you always grab a local time, add
your relative time to it, then wait for the absolute time. Ugh, ugh,
ugh, ugh, ugh. Now, at least on most UNIX systems (including QNX),
you won’t get bitten… because we store/use the time in seconds since
January 1st, 1970; and daylight savings is only an input/output
modification.

So, if you want to time-out mutexes/condvars/etc under QNX, I would
recommend using the kernel timeouts, TimerTimeout(), with a relative
time, rather than the (kinda sad) POSIX functions. (Cause, among
other things the POSIX function implementations under QNX just turn
that absolute time back into a relative time, then do a TimerTimeout()
with the relative time.)

-David

David Gibbs
QNX Training Services
dagibbs@qnx.com

Wojtek_Lerch1 · November 4, 2005, 7:48pm

“David Gibbs” <dagibbs@qnx.com> wrote in message
news:dkg6ah$dh2$3@inn.qnx.com…

Robert Kindred <RKindred@swri.edu> wrote:
The way pthread_cond_timedwait works is that you give it absolute time.

This is one of the stupidest “decisions” I’ve seen come out of the POSIX
spec. Why, oh why, would anyone want to wait until an ABSOLUTE time for
a timeout on a blocking synch call? (This is defined this way for at

Because it allows you to start the countdown at a chosen point before you
call the function, rather than an unspecified point after. The stupid part
is not that it’s absolute; the stupid part is that it uses the
user-adjustable CLOCK_REALTIME rather than CLOCK_MONOTONIC by default. But
since CLOCK_MONOTONIC is optional, it wouldn’t be a very good default… I
agree that it’s an inconvenience to have to call pthread_condattr_setclock()
and three or four other extra functions just because you want to be able to
do reliable timed waits on your condvar, but I think I could come up with a
few things in POSIX that are clearly stupider…
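
The extra calls mentioned above look roughly like this in C. This is a sketch under stated assumptions, not code from the thread: it assumes the platform provides CLOCK_MONOTONIC and pthread_condattr_setclock() (optional, as noted, and not established here for pthreads-win32), and the struct event wrapper and its function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical wrapper: a flag guarded by a mutex and a condvar whose
 * timed waits are measured on CLOCK_MONOTONIC instead of CLOCK_REALTIME. */
struct event {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             signaled;
};

int event_init(struct event *e)
{
    pthread_condattr_t attr;
    int rc;

    e->signaled = 0;
    rc = pthread_mutex_init(&e->lock, NULL);
    if (rc != 0)
        return rc;
    rc = pthread_condattr_init(&attr);
    if (rc == 0) {
        /* Select the clock that pthread_cond_timedwait() will use. */
        rc = pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
        if (rc == 0)
            rc = pthread_cond_init(&e->cond, &attr);
        pthread_condattr_destroy(&attr);
    }
    if (rc != 0)
        pthread_mutex_destroy(&e->lock);
    return rc;
}

/* Wait up to timeout_ms for the flag to be set.
 * Returns 0 if it is set, otherwise the error from the wait (ETIMEDOUT). */
int event_wait_ms(struct event *e, long timeout_ms)
{
    struct timespec deadline;
    int rc = 0;

    assert(timeout_ms >= 0);
    /* Absolute deadline on the same clock the condvar was created with. */
    clock_gettime(CLOCK_MONOTONIC, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&e->lock);
    while (!e->signaled && rc == 0)
        rc = pthread_cond_timedwait(&e->cond, &e->lock, &deadline);
    if (e->signaled)
        rc = 0;
    pthread_mutex_unlock(&e->lock);
    return rc;
}

void event_signal(struct event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = 1;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}
```

Because both the deadline and the wait use the monotonic clock, setting the wall clock back or forward should not shorten or lengthen the timeout.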

least condvars, mutexes, and semaphores that I’ve looked at.) Almost
always you want to wait a relative time… if the operation took too
long, give up and try again… so you always grab a local time, add
your relative time to it, then wait for the absolute time. Ugh, ugh,

Well, how would you write a timed condvar loop if you had to use a relative
timer? How would you write it if checking your condition could take a hard
to predict amount of time, for instance because it involved disk access or
ran at a relatively low priority?

ugh, ugh, ugh. Now, at least on most UNIX systems (including QNX),
you won’t get bitten… because we store/use the time in seconds since
January 1st, 1970; and daylight savings is only an input/output
modification.

It’s guaranteed on any POSIX system. Presumably, a program using the POSIX
threads API should be able to rely on it.

BTW Here’s the relevant text from the POSIX Rationale:

Timed Wait Semantics

An absolute time measure was chosen for specifying the timeout parameter for
two reasons. First, a relative time measure can be easily implemented on top
of a function that specifies absolute time, but there is a race condition
associated with specifying an absolute timeout on top of a function that
specifies relative timeouts. For example, assume that clock_gettime()
returns the current time and cond_relative_timed_wait() uses relative
timeouts:

clock_gettime(CLOCK_REALTIME, &now);
reltime = sleep_til_this_absolute_time - now;
cond_relative_timed_wait(c, m, &reltime);

If the thread is preempted between the first statement and the last
statement, the thread blocks for too long. Blocking, however, is irrelevant
if an absolute timeout is used. An absolute timeout also need not be
recomputed if it is used multiple times in a loop, such as that enclosing a
condition wait.
For cases when the system clock is advanced discontinuously by an operator,
it is expected that implementations process any timed wait expiring at an
intervening time as if that time had actually occurred.

Timed Condition Wait

The pthread_cond_timedwait() function allows an application to give up
waiting for a particular condition after a given amount of time. An example
of its use follows:

(void) pthread_mutex_lock(&t.mn);
t.waiters++;
clock_gettime(CLOCK_REALTIME, &ts);
ts.tv_sec += 5;
rc = 0;
while (! mypredicate(&t) && rc == 0)
    rc = pthread_cond_timedwait(&t.cond, &t.mn, &ts);
t.waiters--;
if (rc == 0) setmystate(&t);
(void) pthread_mutex_unlock(&t.mn);

By making the timeout parameter absolute, it does not need to be recomputed
each time the program checks its blocking predicate. If the timeout was
relative, it would have to be recomputed before each call. This would be
especially difficult since such code would need to take into account the
possibility of extra wakeups that result from extra broadcasts or signals on
the condition variable that occur before either the predicate is true or the
timeout is due.

http://www.opengroup.org/onlinepubs/009695399/functions/pthread_cond_timedwait.html
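
The subtraction in the rationale (reltime = sleep_til_this_absolute_time - now) is pseudocode, since struct timespec values cannot be subtracted with a plain minus sign. A relative-timeout API would need something like the following C sketch before every wait inside the loop. The name time_remaining is hypothetical and both inputs are assumed to be normalized.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: the time left until `deadline` as seen at `now`,
 * clamped to zero once the deadline has passed. Both arguments must have
 * tv_nsec in the range [0, 1e9). */
struct timespec time_remaining(struct timespec deadline, struct timespec now)
{
    struct timespec left;

    assert(deadline.tv_nsec >= 0 && deadline.tv_nsec < 1000000000L);
    assert(now.tv_nsec >= 0 && now.tv_nsec < 1000000000L);

    left.tv_sec  = deadline.tv_sec  - now.tv_sec;
    left.tv_nsec = deadline.tv_nsec - now.tv_nsec;
    if (left.tv_nsec < 0) {            /* borrow one second */
        left.tv_sec  -= 1;
        left.tv_nsec += 1000000000L;
    }
    if (left.tv_sec < 0) {             /* deadline already passed */
        left.tv_sec  = 0;
        left.tv_nsec = 0;
    }
    return left;
}
```

Even with such a helper, the window between reading the clock and actually blocking remains, which is the race the rationale describes. With an absolute deadline the recomputation is not needed at all.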

Robert_Kindred1 · November 7, 2005, 2:51pm

“David Gibbs” <dagibbs@qnx.com> wrote in message
news:dkg6ah$dh2$3@inn.qnx.com…

Robert Kindred <RKindred@swri.edu> wrote:

This explains what I was seeing.
[]
The way pthread_cond_timedwait works is that you give it absolute time.

This is one of the stupidest “decisions” I’ve seen come out of the POSIX
spec. Why, oh why, would anyone want to wait until an ABSOLUTE time for
a timeout on a blocking synch call? (This is defined this way for at

You got me. I always want to wait a given number of seconds. The best
guess I can make is that it is easier for either the library or the
operating system. With absolute times, you can take all of the expiration
times of all the requests and put them into a sorted list. Then, only the
one about to expire needs any attention. But, this makes it harder for the
user.

least condvars, mutexes, and semaphores that I’ve looked at.) Almost
always you want to wait a relative time… if the operation took too
long, give up and try again… so you always grab a local time, add
your relative time to it, then wait for the absolute time. Ugh, ugh,
ugh, ugh, ugh. Now, at least on most UNIX systems (including QNX),
you won’t get bitten… because we store/use the time in seconds since
January 1st, 1970; and daylight savings is only an input/output
modification.

So, if you want to time-out mutexes/condvars/etc under QNX, I would
recommend using the kernel timeouts, TimerTimeout(), with a relative
time, rather than the (kinda sad) POSIX functions. (Cause, among
other things the POSIX function implementations under QNX just turn
that absolute time back into a relative time, then do a TimerTimeout()
with the relative time.)

This sounds good, but I have this messageQueue code running on both Windows
and QNX. For this reason, I need libraries that are available to both.

John_Nagle1 · November 7, 2005, 10:07pm

Robert Kindred wrote:

“David Gibbs” <dagibbs@qnx.com> wrote in message
news:dkg6ah$dh2$3@inn.qnx.com…

Robert Kindred <RKindred@swri.edu> wrote:

This explains what I was seeing.

[]

The way pthread_cond_timedwait works is that you give it absolute time.

This is one of the stupidest “decisions” I’ve seen come out of the POSIX
spec. Why, oh why, would anyone want to wait until an ABSOLUTE time for
a timeout on a blocking synch call? (This is defined this way for at

You got me. I always want to wait a given number of seconds. The best
guess I can make is that it is easier for either the library or the
operating system. With absolute times, you can take all of the expiration
times of all the requests and put them into a sorted list. Then, only the
one about to expire needs any attention. But, this makes it harder for the
user.

If you want something to happen every N microseconds, absolute
timing is what you want.

However, CLOCK_MONOTONIC should be supported on all platforms
and the default. You don’t want your real-time timing to mess up
when someone sets the clock.
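
A periodic loop of that kind might be sketched in C as follows, assuming a platform that provides clock_nanosleep() with TIMER_ABSTIME on CLOCK_MONOTONIC. The function names and the callback signature are hypothetical. Each wake-up time is computed from the previous one rather than from "now", so the time spent in the callback does not accumulate as drift.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Advance *t by ns nanoseconds, keeping tv_nsec in the range [0, 1e9). */
void timespec_add_ns(struct timespec *t, long long ns)
{
    assert(ns >= 0);
    t->tv_sec  += ns / 1000000000LL;
    t->tv_nsec += ns % 1000000000LL;
    if (t->tv_nsec >= 1000000000L) {
        t->tv_sec  += 1;
        t->tv_nsec -= 1000000000L;
    }
}

/* Example callback: counts how many times it has been called. */
void count_tick(void *arg)
{
    ++*(int *)arg;
}

/* Call tick(arg) `count` times, period_ns apart, using absolute wake-up
 * times on CLOCK_MONOTONIC. Returns 0 on success or an errno value. */
int run_periodic(long long period_ns, int count,
                 void (*tick)(void *), void *arg)
{
    struct timespec next;
    int i, rc;

    assert(period_ns > 0);
    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return errno;
    for (i = 0; i < count; i++) {
        timespec_add_ns(&next, period_ns);
        do {   /* retry if a signal interrupts the sleep */
            rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        } while (rc == EINTR);
        if (rc != 0)
            return rc;
        tick(arg);
    }
    return 0;
}
```

For example, run_periodic(1000000LL, 5, count_tick, &n) with n starting at 0 calls the callback five times, one millisecond apart, and is unaffected by someone setting the wall clock in the meantime.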

John Nagle
Animats