Tick-tock

yyd_iris · July 9, 2007, 9:19pm

Hi,

I try the example in the QNX technical article:
“Tick-tock - Understanding the Neutrino micro kernel’s concept of time, Part II” by Mario Charest, with Brian Stecher’s assistance

The result does not match what the article describes.
Here is the result of 1ms: (expected to see elapse every 6.5 seconds)
Elapse 0.002000 at 7
Elapse 0.002000 at 13
Elapse 0.001279 at 20
Elapse 0.001999 at 20
Elapse 0.001266 at 21
Elapse 0.001250 at 22
Elapse 0.001236 at 23
Elapse 0.001220 at 24
Elapse 0.001205 at 25
Elapse 0.001191 at 26
Elapse 0.002000 at 26
Elapse 0.001175 at 27
Elapse 0.001162 at 28
Elapse 0.001146 at 29
Elapse 0.001131 at 30
Elapse 0.001115 at 31
Elapse 0.001101 at 32
Elapse 0.001085 at 33
Elapse 0.002000 at 33
Elapse 0.001071 at 34
Elapse 0.001056 at 35
Elapse 0.002000 at 39
Elapse 0.001999 at 46
Elapse 0.002000 at 52
Elapse 0.002000 at 59

Here is the result of 0.999847ms: (expected to see no printouts)
Elapse 0.001281 at 20
Elapse 0.001270 at 21
Elapse 0.001252 at 22
Elapse 0.001238 at 23
Elapse 0.001221 at 24
Elapse 0.001207 at 25
Elapse 0.001194 at 26
Elapse 0.001177 at 27
Elapse 0.001162 at 28
Elapse 0.001147 at 29
Elapse 0.001131 at 30
Elapse 0.001116 at 31
Elapse 0.001102 at 32
Elapse 0.001086 at 33
Elapse 0.001071 at 34
Elapse 0.001056 at 35
Elapse 0.001285 at 87
Elapse 0.001274 at 88
Elapse 0.001258 at 89
Elapse 0.001242 at 90

My computer is a PIII PC machine, running only QNX 6.3.
Any advice?

Thank you!
Iris

mario · July 10, 2007, 2:06am

Post the exact code you are using.

yyd_iris · July 10, 2007, 3:02am

I basically copy and paste the codes from the website.
two changes:

last_cycles is initialized with 0, instead of -1. The compiler complains about the comparison between uint_64 and negative numbers.
In the “printf” line, add “”

Here are the codes:
// — INCLUDE FILE

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/neutrino.h>
#include <sys/netmgr.h>
#include <sys/syspage.h>

// — FUNCTION (from Bottom to Top, and/or sorted)

int main( int argc, char *argv[] )
{
int pid;
int chid;
int pulse_id;
timer_t timer_id;
struct sigevent event;
struct itimerspec timer;
struct _clockperiod clkper;
struct _pulse pulse;
_uint64 last_cycles=0;
_uint64 current_cycles;
float cpu_freq;
time_t start;

// — Get CPU frequency in order to do precise time calculation
cpu_freq = SYSPAGE_ENTRY( qtime )->cycles_per_sec;

// — Set priority to max so we don’t get disrupted but anything
// — else then interrupts
{
struct sched_param param;
int ret;
param.sched_priority = sched_get_priority_max( SCHED_RR );
ret = sched_setscheduler( 0, SCHED_RR, &param);
assert ( ret != -1 );
}

// — Create channel to receive timer event
chid = ChannelCreate( 0 );
assert ( chid != -1 );

// — setup timer and timer event
event.sigev_notify = SIGEV_PULSE;
event.sigev_coid = ConnectAttach ( ND_LOCAL_NODE, 0, chid, 0, 0 );
event.sigev_priority = getprio(0);
event.sigev_code = 1023;
event.sigev_value.sival_ptr = (void*)pulse_id;

assert ( event.sigev_coid != -1 );

if ( timer_create( CLOCK_REALTIME, &event, &timer_id ) == -1 )
{
perror ( “can’t create timer” );
exit( EXIT_FAILURE );
}

// — change timer request to alter behavior
#if 1
timer.it_value.tv_sec = 0;
timer.it_value.tv_nsec = 1000000;
timer.it_interval.tv_sec = 0;
timer.it_interval.tv_nsec = 1000000;
#else
timer.it_value.tv_sec = 0;
timer.it_value.tv_nsec = 999847;
timer.it_interval.tv_sec = 0;
timer.it_interval.tv_nsec = 999847;
#endif

// — start timer
if ( timer_settime( timer_id, 0, &timer, NULL ) == -1 )
{
perror(“Can’t start timern”);
exit( EXIT_FAILURE );
}

// — set tick to 1ms otherwise if left to 10 ms default it
// — would take 65 seconds to demonstrate ;-)
clkper.nsec = 1000000;
clkper.fract = 0;
ClockPeriod ( CLOCK_REALTIME, &clkper, NULL, 0 ); // 1ms

// — keep track of time
start = time(NULL);
for( ;; )
{
// — wait for pulse
pid = MsgReceivePulse ( chid, &pulse, sizeof( pulse ), NULL );

  // --- should put pulse validation here ...
  current_cycles = ClockCycles();

  if ( last_cycles != 0 )    // --- don't print first iteration
    {
      // --- could get rid of timer by using more
      // --- clever timer setup
      float elapse = (current_cycles - last_cycles) / cpu_freq;

      // --- printf if request is 50us longer then requested...
      if ( elapse > .00105 )
        {
          printf("Elapse %f at %dn", elapse, time(NULL)-start );
        }
    }

  last_cycles = current_cycles;
}

}

yyd_iris · July 10, 2007, 3:18am

We have one important observation:
Those “big variance” printouts have patterns.
They happen every 68 seconds – activate 16 seconds and sleep 52 seconds, approximately.

I start to kill background processes one by one and see whether I can get rid of the big variances.
It does not work. Here is the processes left shown by the pidin command.
pid name
1 rldbuild/cdr/qnx6/tmp/target/qnx6/x86/boot/sys/procnto-instr
5 proc/boot/pci-bios
6 proc/boot/devb-eide
7 proc/boot/devc-con
180243 bin/sh
1949705 bin/pidin

procnto has 7 threads and devb-eide has 9 threads. Other processes have one thread.

I add the following codes into the end of the “for” loop:

// ------
if ( elapse < .00095 )
{
printf(“ELAPSE %f at %d\n”, elapse, time(NULL)-start );
}
// ------
I can see there are lots of "elapse"s smaller than 1ms, corresponding to the big variance.

mario · July 10, 2007, 12:53pm

It’s possible cpu_freq isn’t precise, it’s also possible the CPU clock is dynamically changing frequency ( some motherboard/chipset/cpu) can do that to save power.

maschoen · July 10, 2007, 5:55pm

There’s one more situation in which I’ve seen something like this. Some processors have a built in supervisor mode. This allows the processor to emulate hardware. The trouble is that it makes the processor non-realtime. There’s nothing the OS can do to control supervisor mode once the processor has booted.

yyd_iris · July 10, 2007, 9:30pm

Thanks mario and maschoen!

I disable serial, parallel, usb and floppy in BIOS.
The problem is solved.
It does not work by just killing driver processes in QNX.

mario · July 10, 2007, 9:52pm

What kind of PC is that. Can you try disabling one device at a time. My bet is USB, do you have any option to disable USB emulation.

yyd_iris · July 11, 2007, 4:03am

The machine is Dell Dimension 4100.
It is the Legacy USB that causes the problem.
I enable the serial and parallel ports, with the Legacy USB disabled in BIOS.
The Timer still works fine. If I enable the Legacy USB, the problem occurs again.

I do need the USB port in my application. I am working on enabling USB without affecting the Timer. Do you have any suggestion?

mario · July 11, 2007, 4:08am

Legacy USB is to support USB devices like keyboard/mouse and make them appears as non USB devices. QNX 6.3 supports USB mouse and keyboard so you shouldn’t need to enable legacy USB.

maschoen · July 11, 2007, 12:43pm

This does begin to sound like supervisor mode kicking in, although I can’t imagine why a keyboard and mouse would take that much cpu.

yyd_iris · July 12, 2007, 4:58am

It seems the Legacy USB generates interrupts from time to time, and it has a long ISR. Otherwise, it won’t affect my program that has the highest priority.

I also observe that the parallel port may affects the timer too. I need to run more experiments to verify it.

It makes me worry about other hardware equipped, such as serial port, USB devices and network card. I don’t know how badly the network driver affects the Timer, when I add network communication codes.
And what if I attach a serial device to the machine.

yyd_iris · July 12, 2007, 6:25pm

I have two ssh terminals to the PC. One runs the timer program, and the other runs emacs.
Page-up and page-down in emacs cause timer to report errors (over 1.05ms).
When I type pidin in this machine, I see 1-3 procnto threads and possible one devb-eide thread that have the highest priority 255. How do I figure out what these high-priority threads are doing?
Here is the pidin result that has 1 procnto thread with the highest priority:
bash-2.05a$ pidin
pid tid name prio STATE Blocked
1 1 procnto 0f READY
1 2 procnto 255r RECEIVE 1
1 3 procnto 10r RECEIVE 1
1 4 procnto 10r RECEIVE 1
1 5 procnto 10r RECEIVE 1
1 6 procnto 10r RECEIVE 1
1 7 procnto 10r RUNNING
1 8 procnto 10r RECEIVE 1
2 1 sbin/tinit 10o REPLY 1
3 1 proc/boot/slogger 10o RECEIVE 1
5 1 proc/boot/pci-bios 10o RECEIVE 1
6 1 roc/boot/devb-eide 10o SIGWAITINFO
6 2 roc/boot/devb-eide 21r RECEIVE 1
6 3 roc/boot/devb-eide 10o RECEIVE 7
6 4 roc/boot/devb-eide 10o RECEIVE 4
6 5 roc/boot/devb-eide 10o RECEIVE 4
6 6 roc/boot/devb-eide 10o RECEIVE 4
6 8 roc/boot/devb-eide 10o RECEIVE 4
7 1 proc/boot/devc-con 15o RECEIVE 1
4104 1 sbin/pipe 10o SIGWAITINFO
4104 2 sbin/pipe 10o RECEIVE 1
4104 3 sbin/pipe 10o RECEIVE 1
4104 4 sbin/pipe 10o RECEIVE 1
12292 1 sbin/mqueue 10o RECEIVE 1
69643 1 sbin/io-net 10o SIGWAITINFO
69643 2 sbin/io-net 10o RECEIVE 5
69643 3 sbin/io-net 10o RECEIVE 1
69643 4 sbin/io-net 10o RECEIVE 1
69643 5 sbin/io-net 10o RECEIVE 1
69643 6 sbin/io-net 21r RECEIVE 19
69643 7 sbin/io-net 10o RECEIVE 20
114697 1 sbin/devc-pty 10o RECEIVE 1
126986 1 usr/sbin/random 10o SIGWAITINFO
126986 2 usr/sbin/random 10o RECEIVE 1
126986 3 usr/sbin/random 10o NANOSLEEP
139276 1 usr/sbin/dumper 10o RECEIVE 1
163854 1 opt/sbin/sshd 10o SIGWAITINFO
167949 1 can-pci266-sja1000 10o RECEIVE 1
184335 1 bin/login 10o REPLY 7
184336 1 bin/login 10o REPLY 7
184337 1 bin/login 10o REPLY 7
184338 1 bin/login 10o REPLY 7
184339 1 opt/sbin/sshd 10o SIGWAITINFO
184340 1 bin/bash 10o REPLY 1
294933 1 opt/sbin/sshd 10o SIGWAITINFO
294934 1 bin/bash 10o REPLY 1
397335 1 opt/bin/bash 10o REPLY 1
417816 1 ./testQNXTimer 255o INTR
430105 1 bin/pidin 10o REPLY 1

In another QNX machine with similar settings, I see no thread with the highest priority. The timer seems work fine after Legacy USB disabled. Here is the pidin result:

pidin

 pid tid name               prio STATE       Blocked
   1   1 /sys/procnto-instr   0f READY
   1   2 /sys/procnto-instr  10r RECEIVE     1
   1   3 /sys/procnto-instr  10r RECEIVE     1
   1   4 /sys/procnto-instr  10r RECEIVE     1
   1   5 /sys/procnto-instr  10r RUNNING
   1   6 /sys/procnto-instr  10r RECEIVE     1
   1   7 /sys/procnto-instr  10r RECEIVE     1
   1   8 /sys/procnto-instr   9r RECEIVE     1
   2   1 sbin/tinit          10o REPLY       1
   3   1 proc/boot/slogger   10o RECEIVE     1
   5   1 proc/boot/pci-bios  10o RECEIVE     1
   6   1 roc/boot/devb-eide  10o SIGWAITINFO
   6   2 roc/boot/devb-eide  21r RECEIVE     1
   6   3 roc/boot/devb-eide  21r RECEIVE     4
   6   4 roc/boot/devb-eide  10o RECEIVE     10
   6   5 roc/boot/devb-eide  10o RECEIVE     7
   6   6 roc/boot/devb-eide  10o RECEIVE     7
   6   7 roc/boot/devb-eide  10o RECEIVE     7
   6   9 roc/boot/devb-eide  10o RECEIVE     7
   7   1 proc/boot/devc-con  15o RECEIVE     1
4104   2 sbin/pipe           10o RECEIVE     1
4104   3 sbin/pipe           10o RECEIVE     1
4104   4 sbin/pipe           10o RECEIVE     1
4104   5 sbin/pipe           10o RECEIVE     1

12292 1 sbin/mqueue 10o RECEIVE 1
69643 1 sbin/io-audio 10o SIGWAITINFO
69643 2 sbin/io-audio 10o RECEIVE 1
69643 3 sbin/io-audio 10o RECEIVE 1
69643 4 sbin/io-audio 10o RECEIVE 1
69643 5 sbin/io-audio 50r INTR
86028 1 sbin/devc-ser8250 10o RECEIVE 1
86030 1 sbin/devc-par 10o RECEIVE 1
86030 2 sbin/devc-par 9r CONDVAR 80519e8
86032 1 sbin/io-net 10o SIGWAITINFO
86032 2 sbin/io-net 10o RECEIVE 1
86032 3 sbin/io-net 9o RECEIVE 1
86032 4 sbin/io-net 10o RECEIVE 1
86032 5 sbin/io-net 10o RECEIVE 5
86032 6 sbin/io-net 21r RECEIVE 20
102418 1 usr/sbin/spooler 10o NANOSLEEP
114701 1 r/sbin/dhcp.client 10o NANOSLEEP
122889 1 sbin/devc-pty 10o RECEIVE 1
135178 1 usr/sbin/random 10o SIGWAITINFO
135178 2 usr/sbin/random 10o RECEIVE 1
135178 3 usr/sbin/random 10o NANOSLEEP
147471 1 usr/sbin/dumper 10o RECEIVE 1
172049 1 bin/sh 10o REPLY 7
172051 1 bin/login 10o REPLY 7
172052 1 bin/login 10o REPLY 7
172053 1 bin/login 10o REPLY 7
229399 1 opt/sbin/sshd 10o SIGWAITINFO
245782 1 opt/sbin/sshd 10o SIGWAITINFO
245784 1 bin/sh 10o SIGSUSPEND
327705 1 opt/sbin/sshd 10o SIGWAITINFO
327706 1 bin/sh 10o REPLY 122889
1900571 1 bin/pidin 10o REPLY 1

yyd_iris · July 13, 2007, 12:21pm

More information: when the machine starts up, only one procnto thread has the highest priority.
As the timer program runs, more procnto threads and devb-eide threads gain the highest priority.
I guess it is because of priority inversion avoidance.
I have not figured out what the first high-priority procnto does.

How can I manually modify the “procnto” thread priority from terminals? The command slay reports “cannot find procnto”. It works for other regular processes.
I know it is dangerous to modify procnto priority. I just want to make sure it is the priority of procnto that causes the problem.

maschoen · July 13, 2007, 12:57pm

You might be misunderstanding the output. First note that the procnto thread at priority 255r is in
receive mode. So it is not actually running. Other procnto threads are also in receive mode, with
their priority listed as 10r. 10r would be the default application priority. So what you are seeing is
the priority that the thread ran at the last time it was in READY mode, before it called MsgReceive().
That doesn’t tell you much. When MsgReceive() returns, the thread is in the priority of the calling
thread. Since no other thread runs at 255, it’s my guess that this thread last responded to an
event (probably an interrupt) that is considered very very important, maybe the timer interrupt.
The same thing might be happening in the 2nd pidin output, but by coincidence, this historical
priority is erased by the thread receiving a message since the timer interrupt occured.
You could verify this by repeatedly running “pidin | grep 255” on the 2nd computer.

All the procnto threads are receive blocked on coid 1, except for the one responding to pidin which of course
is READY, or it wouldn’t be running, so they all are equivalent.

Similary, devb-eide has threads receive blocked on more than one coid 1, and 4. These probably represent different devices, eg. hd0 and cd0 or fd0. The priority 21 probably again is a latent indicator that the thread was last handling an interrupt.

Since the problem occurs when you page up and down on Emacs, let me suggest that you might be having a problem related to the video card. In particular, I recall a computer whose cpu was put into supervisor mode for the purpose of emulating certain video modes. The computer had an LCD screen whose only native mode was 1024x768, but 640x480 and other modes were emulated using pixel stretching. This caused the cpu to disappear from the OS periodically for uncomfortably long periods of time. Repeating myself here, this made QNX a real time OS run on a non-real time cpu, the result being a non-real time system. The solution was to put the video card into the right video mode.

mario · July 13, 2007, 1:35pm

What type of PC are you using?

yyd_iris · July 13, 2007, 2:18pm

Thanks for the information!

I try “pidin | grep 255” on both machines. There is one high-priority procnto after many trials on the 2nd computer.
On the 1st computer, there are high-priority procnto(s) for all “pidin | grep 255”.
And there are high-priority devb-eide from time to time.
So there must be sth that keeps generating interrupts and invoking kernels (or use CPU supervisor mode).

Here are the difference of two computers:
The first one is a industrial computer: P4 CPU, internal video card, can bus, a driver card for force sensors QNX6.3 with service pack. I disable two serial, one parallel ports and usb in bios.
The second one is a DELL PC: PIII CPU, QNX6.3 without service pack.

Both computers start without photo. It means there are no io-graphics driver running.

I found in BIOS two settings related to video. They seem having the right setting.
Search for MDA resources-> No
Assign IRQ to PCI VGA → No
USB related BIOS setting:
ICH4, Device29 func0, USB host(Disabled)

yyd_iris · July 13, 2007, 2:55pm

The machine with timer problems is an industrial computer with a P4 2.4G CPU. It is pluged in a shared panel with ISA bus. The panel has other PCI/ISA slots. These slots are currently used for can bus, network card, force sensor driver card. There is another linux computer plugged and running on the panel too.

yyd_iris · July 13, 2007, 3:12pm

I remove can bus, force driver card, and network card.
It does not help.
I have not found problems with the video cards yet. I will keep finding.

maschoen · July 13, 2007, 6:17pm

Hey, all of a sudden this sounds a lot like a problem we talked about here a number of months ago.
Someone reported a latency problem, they were timing on a parallel port. Does anyone else remember?
Here’s what to try next. Turn off Photon and try your experiment again. If the problem goes away,
it is probably a video driver problem.