Kernel Thread Scheduler bug?

Pieter_Liefooghe · October 9, 2003, 8:54am

Hello,

We developed a small test program, called SMPtest, to find out the number of
threads that can execute simultaneously on a given machine.

The output we see from SMPtest on an x86 SMP machine running a QNX 6.2.1 NC
SMP kernel looks erratic to us.

We expect the output of our program to remain constant while the number of
threads is lower than or equal to the number of CPUs, and to increase linearly
with the number of threads otherwise.

The output we see, however, is this:

1 threads finished in 1.345794 sec
2 threads finished in 1.346794 sec
3 threads finished in 2.047686 sec
4 threads finished in 3.319492 sec
5 threads finished in 3.381483 sec
6 threads finished in 4.099373 sec
7 threads finished in 5.316186 sec
8 threads finished in 5.417171 sec
9 threads finished in 6.114064 sec
10 threads finished in 7.317881 sec
11 threads finished in 7.408866 sec
12 threads finished in 8.156752 sec
13 threads finished in 9.296577 sec
14 threads finished in 9.424558 sec
15 threads finished in 10.149447 sec

The machine contains 2 CPUs. The numbers are as expected for 1, 2, 3, 5, 6,
8, 9, 11, 12, 14 and 15 threads. They are not as expected for 4, 7, 10 and 13.
Notice that it takes almost as long to execute 4 threads as it takes to
execute 5!

We suspect there is some bug in the kernel’s thread scheduling that only
shows up for (4 + 3*n) threads running simultaneously.
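
As a rough cross-check of the expectation described above, the sketch below compares the reported times with an idealised model in which n identical CPU-bound threads on 2 CPUs take n/2 times the single-thread time. The function names and the 5 % tolerance are illustrative assumptions and are not part of SMPtest.

```c
#include <assert.h>
#include <stdio.h>

/* Times reported above; index i holds the run with i + 1 threads. */
static const double measured[15] = {
    1.345794, 1.346794, 2.047686, 3.319492, 3.381483,
    4.099373, 5.316186, 5.417171, 6.114064, 7.317881,
    7.408866, 8.156752, 9.296577, 9.424558, 10.149447
};

/* Ideal completion time for nthreads identical CPU-bound threads:
 * constant up to ncpus threads, then linear in the thread count. */
static double ideal_time(double t1, int nthreads, int ncpus)
{
    assert(nthreads > 0 && ncpus > 0);
    if (nthreads <= ncpus)
        return t1;
    return t1 * (double)nthreads / (double)ncpus;
}

/* Print every run next to the model and return how many runs exceed
 * the ideal time by more than `tolerance` (0.05 means 5 %). */
static int report_outliers(int ncpus, double tolerance)
{
    int outliers = 0;
    for (int i = 0; i < 15; i++) {
        double ideal = ideal_time(measured[0], i + 1, ncpus);
        double excess = (measured[i] - ideal) / ideal;
        int bad = excess > tolerance;
        printf("%2d threads: measured %9.6f s, ideal %9.6f s, excess %+5.1f%%%s\n",
               i + 1, measured[i], ideal, excess * 100.0,
               bad ? "  <-- outlier" : "");
        outliers += bad;
    }
    return outliers;
}
```

Calling `report_outliers(2, 0.05)` from `main` flags the 4, 7, 10 and 13 thread runs; all other runs stay within about 1.5 % of the model.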

You can download the source code from
http://telecom.vub.ac.be/download/SMPtest.tar.gz

Is this a known issue, and are there any patches available for it?

Machine details:

Dual Xeon 2.8GHz
Hyperthreading turned off
1GB RAM

OS details:

Installed from the Non-commercial downloadable ISO image (QNX Neutrino
6.2.1) and enabled the SMP kernel.

-- dr. ir. Pieter Liefooghe
Doctor Assistant
Vrije Universiteit Brussel (VUB)
INFO/TW
Tele.Com Group
Pleinlaan 2
1050 Brussel
Phone: +32-2-6292977
Fax: +32-2-6292870
pieter@info.vub.ac.be

Mario_Charest1 · October 9, 2003, 5:10pm

“Pieter Liefooghe” <pieter@info.vub.ac.be> wrote in message
news:bm36qi$1bv$2@inn.qnx.com…

> Hello,
>
> We developed a small test program, called SMPtest, to find out the number
> of threads that can execute simultaneously on a given machine.
>
> The output we see from SMPtest on an x86 SMP machine running a QNX 6.2.1 NC
> SMP kernel looks erratic to us.

A couple of comments:

In your program you are syncing the end of the threads, but not the beginning.
Thus your main program, which is itself a thread, will not be able to start
each thread at the same time.

You are not specifying how much RAM each thread uses, but this could have a
huge impact because of the cache. Xeons have big caches.

Why are you syncing the end of the threads? A pthread_join should be enough, no?

I find the time for 3 threads odd. If 1 thread takes 1.3 seconds, how come
with three threads each thread takes .6 seconds (2.06/3)? It's as though
with 3 threads each thread is twice as fast as a single thread.

The way you do the math for the time looks weird to me (if useconds < 0).
Are you sure this works?

The program that star


Pieter_Liefooghe · October 13, 2003, 9:25am

> A couple of comments:
>
> In your program you are syncing the end of the threads, but not the
> beginning. Thus your main program, which is itself a thread, will not be
> able to start each thread at the same time.

You're right, it is more logical to sync at the beginning. We modified the
code accordingly, but it doesn't change the results at all. The code can be
obtained from http://telecom.vub.ac.be/download/SMPtest_new.tar.gz

> You are not specifying how much RAM each thread uses, but this could have
> a huge impact because of the cache. Xeons have big caches.

The memory requirement is roughly (number of primes) * sizeof(int) for each
thread (cf. the calloc call). The cache influence, if present at all, doesn't
explain the strange "steps" at 4+3*n threads.

> Why are you syncing the end of the threads? A pthread_join should be
> enough, no?
>
> I find the time for 3 threads odd. If 1 thread takes 1.3 seconds, how come
> with three threads each thread takes .6 seconds (2.06/3)? It's as though
> with 3 threads each thread is twice as fast as a single thread.

The time for three threads is logical: you have three times the workload of
one thread, but you can only spread it over two CPUs, so the calculation
takes 3/2 times longer. The time for four threads, on the other hand, is not
logical, by the mere fact that it is almost equal to the five-thread case.

> The way you do the math for the time looks weird to me (if useconds < 0).

The math for useconds is basic: if you subtract two numbers and obtain a
negative value, you have to borrow from the more significant number. This is
exactly what we do, with a base of 1E6 us per second.
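
The borrow described above can be sketched as follows, assuming the two timestamps are `struct timeval` values such as those filled in by gettimeofday(). The helper names `elapsed` and `to_seconds` are illustrative and are not taken from SMPtest.

```c
#include <assert.h>
#include <sys/time.h>

#define USEC_PER_SEC 1000000L

/* Return end - start, assuming end is not earlier than start. */
static struct timeval elapsed(struct timeval start, struct timeval end)
{
    struct timeval d;
    d.tv_sec = end.tv_sec - start.tv_sec;
    d.tv_usec = end.tv_usec - start.tv_usec;
    if (d.tv_usec < 0) {
        /* Borrow one second and add it to the microsecond field. */
        d.tv_sec -= 1;
        d.tv_usec += USEC_PER_SEC;
    }
    assert(d.tv_sec >= 0);
    return d;
}

/* Convert a timeval to seconds as a double, e.g. for printing. */
static double to_seconds(struct timeval t)
{
    return (double)t.tv_sec + (double)t.tv_usec / (double)USEC_PER_SEC;
}
```

For example, going from 10.900000 s to 12.100000 s gives a raw microsecond difference of -800000; after the borrow the result is 1 s and 200000 us.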

> Are you sure this works?

YES, this works! Only the QNX scheduler is broken!!!

Bye,

Pieter