SMP and priorities

I'm puzzled by the way priorities seem to be handled on SMP systems
(with the RTP CD version).

I was trying to run a thread at the highest priority with FIFO
scheduling, assigned to one CPU, thinking that the availability of the
other CPU(s) would let that thread run with practically no interruption,
since other processes and the OS would still get CPU time elsewhere.
There were no other threads at priority 63f, and my 63f thread makes no
system calls during its loop, except possibly semaphore tests and posts.

It turns out that the 63f thread is disturbed by running other,
normal-priority threads. This does not happen on the uniprocessor
kernel: there, the 63f thread effectively locks the machine, which is
expected, and it is not affected by other processes. But with the SMP
kernel, this high-priority thread is slowed down (a lot!) by
low-priority processes.

I've included a test program that reproduces this: it sets itself to
priority 63f and loops for n iterations. It calculates the fastest and
slowest times through the loop. It accepts an optional argument, which
is a mask for the ThreadCtl(_NTO_TCTL_RUNMASK…) call.

Running this on a uniprocessor, you should get a fairly constant time.
Running this on the SMP kernel, I saw a 12-fold(!) increase in maximum
loop time (comparing the machine when quiet with the machine running
other processes, for example pidin, continually during the test run).

So, my question is: how can I get the 63f thread's priority honored on
SMP machines?

Thanks in advance.

===================================================================
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sched.h>
#include <sys/neutrino.h>
#include <sys/syspage.h>
#include <inttypes.h>
#include <limits.h>

#define LOOP_LIMIT 1000000

int main(int argc, char *argv[])
{
    unsigned long count, j;
    double fast, slow;
    struct sched_param mysched;
    uint64_t cps, currTime, cycleTime, maxTime, minTime, prevTime;

    mysched.sched_priority = sched_get_priority_max(SCHED_FIFO);

    if (sched_setscheduler(0, SCHED_FIFO, &mysched) == -1)
    {
        printf("Error changing scheduler to SCHED_FIFO, priority %d, error %d %s\n",
               mysched.sched_priority, errno, strerror(errno));
    }

    if (argc > 1)
    {
        int mask;

        mask = atoi(argv[1]);

        if (ThreadCtl(_NTO_TCTL_RUNMASK, (void *)mask) == -1)
        {
            printf("Could not assign to CPU mask %d (error %d %s)\n",
                   mask, errno, strerror(errno));
        }
    }

    for (count = 0; count < LOOP_LIMIT; ++count)
    {
        currTime = ClockCycles();

        if (count > 0)
        {
            if (prevTime < currTime)
            {
                cycleTime = currTime - prevTime;
            }
            else    /* the cycle counter wrapped */
            {
                cycleTime = ULONGLONG_MAX - prevTime + currTime;
            }
            if (cycleTime > maxTime)
            {
                maxTime = cycleTime;
            }
            if (count > 10 && cycleTime < minTime)
            {
                minTime = cycleTime;
            }
        }
        else
        {
            maxTime = 0;
            minTime = ULONGLONG_MAX;
            /* Filler junk */
            if (maxTime < minTime)
            {
                cycleTime = maxTime;
            }
            else
            {
                cps = minTime;
            }
        }
        /* Filler junk */
        for (j = 0; j < 1000; ++j)
        {
            cps = count * j;
        }

        prevTime = currTime;
    }

    cps = SYSPAGE_ENTRY(qtime)->cycles_per_sec;
    fast = (double)minTime / cps;
    slow = (double)maxTime / cps;
    printf("fastest %llu slowest %llu (cycles)\n", minTime, maxTime);
    printf("fastest %f slowest %f (seconds)\n", fast, slow);

    return 0;
}
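
A usage note, inferred from the code above: the optional argument is the
runmask bitmap, where bit n selects CPU n. So, for example, a
hypothetical invocation "test_sched 2" (mask 2) pins the thread to
CPU 1, "test_sched 1" pins it to CPU 0, and running it with no argument
leaves the default runmask (any CPU). The name test_sched is the one
used for this program later in the thread.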

"Francois Desruisseaux" <Francois.Desruisseaux@opal-rt.com> wrote in message
news:3A3F8201.FA77BEBE@opal-rt.com...

I'm puzzled by the way priorities seem to be handled on SMP systems
(with the RTP CD version).

I was trying to run a thread at the highest priority with FIFO
scheduling, assigned to one CPU, thinking that the availability of the
other CPU(s) would let that thread run with practically no interruption,
since other processes and the OS would still get CPU time elsewhere.
There were no other threads at priority 63f, and my 63f thread makes no
system calls during its loop, except possibly semaphore tests and posts.

First, do not use FIFO with SMP; it is evil. FIFO is almost useless
on SMP.

PS: Francois, I believe you're familiar enough with SMP to know about
this stuff, but I'm mentioning it in case others are reading. There
should be an article on SMP coming soon.

It turns out that the 63f thread is disturbed by running other,
normal-priority threads. This does not happen on the uniprocessor
kernel: there, the 63f thread effectively locks the machine, which is
expected, and it is not affected by other processes. But with the SMP
kernel, this high-priority thread is slowed down (a lot!) by
low-priority processes.

I've included a test program that reproduces this: it sets itself to
priority 63f and loops for n iterations. It calculates the fastest and
slowest times through the loop. It accepts an optional argument, which
is a mask for the ThreadCtl(_NTO_TCTL_RUNMASK…) call.

Running this on a uniprocessor, you should get a fairly constant time.
Running this on the SMP kernel, I saw a 12-fold(!) increase in maximum
loop time (comparing the machine when quiet with the machine running
other processes, for example pidin, continually during the test run).

I have tested your program on my machine and it's not slowed down at
all. I haven't tried pidin though; I tried it with ls -Rl (to make sure
it generated interrupts). I believe pidin can be disruptive since it
must be talking to the kernel running on each CPU.

You wouldn't see this on a single-CPU machine, since you can't run pidin ;-)

I actually ran two instances of your test program simultaneously,
forcing one onto CPU 1 and the other onto CPU 2, and they both ran at
exactly the same speed as on the uniprocessor kernel!

So, my question is: how can I get the 63f thread's priority honored on
SMP machines?

Try not running pidin?

I'm still puzzled by the difference between min (8000) and max (14000 in
my case). A quick test shows that the cycle time peaks every ~250 loops.
That means a blip every 3 ms, which doesn't make sense hardware-wise.
However, the code handling the uint64_t could be handling some values
differently (250 is close to the magic value of 256). Benchmarking is
intrusive ;-) I haven't tried compiling with optimization flags or using
floating point. That could be interesting.

[…example deleted…]

Mario Charest wrote:

"Francois Desruisseaux" <Francois.Desruisseaux@opal-rt.com> wrote in message
news:3A3F8201.FA77BEBE@opal-rt.com...
[cut]

First, do not use FIFO with SMP; it is evil. FIFO is almost useless
on SMP.

PS: Francois, I believe you're familiar enough with SMP to know about
this stuff, but I'm mentioning it in case others are reading. There
should be an article on SMP coming soon.

Well, FIFO shouldn't be used as a synchronization mechanism, since SMP
will allow something else to run and mess up shared data if that is the
only synchronization mechanism used. Other than that, FIFO and SMP
shouldn't be mutually exclusive.

I should be able to dedicate a CPU to a thread if the application needs
it.
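
To make the synchronization point concrete, here is a minimal sketch
(mine, not from the thread; the names are made up) of guarding shared
data with a mutex. On a uniprocessor, a SCHED_FIFO thread that never
blocks cannot be preempted by an equal- or lower-priority thread, so
unlocked access can appear safe; on SMP, another CPU can touch the data
at any time, so only a real lock helps:

#include <pthread.h>

static pthread_mutex_t data_lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_counter;

/* Safe on SMP: the mutex, not FIFO run-to-completion, guards the data. */
void bump_counter(void)
{
    pthread_mutex_lock(&data_lock);
    ++shared_counter;
    pthread_mutex_unlock(&data_lock);
}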

[cut]

I have tested your program on my machine and it's not slowed down at
all. I haven't tried pidin though; I tried it with ls -Rl (to make sure
it generated interrupts). I believe pidin can be disruptive since it
must be talking to the kernel running on each CPU.

You wouldn't see this on a single-CPU machine, since you can't run pidin ;-)

Right. And if you start a loop of pidins (or whatever) in a window, and
run test_sched in another, the others will stop while test_sched runs,
which is expected.


[cut]


So, my question is: how can I get the 63f thread's priority honored on
SMP machines?


Try not running pidin?

"I told the doctor it hurts when I raise my arm… So he told me not to
raise my arm." :-)

Actually, pidin was only a means to demonstrate the problem with my
(simplistic) test case. I’ve observed a slowdown with other programs, as
I’d observed with our product, where pidin was not in the picture.

What are you running on your machine: the updated or the CD version? I
went back to the CD version; if you're on the updated version, I'll try
it again after updating.


I'm still puzzled by the difference between min (8000) and max (14000 in
my case). A quick test shows that the cycle time peaks every ~250 loops.
That means a blip every 3 ms, which doesn't make sense hardware-wise.
However, the code handling the uint64_t could be handling some values
differently (250 is close to the magic value of 256). Benchmarking is
intrusive ;-) I haven't tried compiling with optimization flags or using
floating point. That could be interesting.

I haven't compiled this example with optimization, as I didn't want gcc
to realize I was just wasting its time with my write-only variables :-)
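
(A side note, not from the original thread: declaring the filler
variables volatile, e.g. "volatile uint64_t cps;", would keep the
compiler from eliminating the write-only stores, so the test could be
built with optimization enabled.)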


[…example deleted…]

"Francois Desruisseaux" <Francois.Desruisseaux@opal-rt.com> wrote in message
news:3A3FD9AC.24DB36DF@opal-rt.com...

[cut]


So, my question is: how can I get the 63f thread's priority honored on
SMP machines?


Try not running pidin?

That was a suggestion to determine if the problem was caused by pidin
specifically. Apparently in your case it's not.

"I told the doctor it hurts when I raise my arm… So he told me not to
raise my arm." :-)

I've been thinking about this and I'm a bit confused. The pidin request
should be processed at pidin's priority, so it should be blocked until
the test program completes. I'm probably missing something.

Actually, pidin was only a means to demonstrate the problem with my
(simplistic) test case. I've observed a slowdown with other programs, as
I'd observed with our product, where pidin was not in the picture.

What are you running on your machine: the updated or the CD version? I
went back to the CD version; if you're on the updated version, I'll try
it again after updating.

I'm not sure… I think I'm running the first release of beta Patch A.

I can confirm that Neutrino 2.1 ran beautifully on a quad machine.
Timing was so precise that we could see the effect of the cache on the
software. I can assure you CPUs 1, 2, and 3 were NOT disturbed by
activity on CPU 0.

So maybe try beta Patch A (as you apparently have access to it) or wait
for the official Patch A. Or NTO 2.1 ;-)

[cut]

Mario Charest <mcharest@void_zinformatic.com> wrote:

[cut]

I actually ran two instances of your test program simultaneously,
forcing one onto CPU 1 and the other onto CPU 2, and they both ran at
exactly the same speed as on the uniprocessor kernel!

With an SMP system we make all interrupts go to processor 0. Is it
possible, Francois, that your 63f thread is running on processor 0?
Since you are on SMP, you can run other things, and these other things
may cause interrupts - though admittedly that wouldn't be the case
with pidin :-). Mario, when you run it, possibly your 63f thread is
not running on processor 0, so you don't see this.
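
If interrupt routing is the culprit, here is a minimal sketch of the
workaround (my illustration, assuming a two-way box; mask 0x2 selects
CPU 1):

#include <stdio.h>
#include <sys/neutrino.h>

/* Keep a time-critical thread off CPU 0, where this SMP implementation
   delivers all interrupts. Runmask bit n selects CPU n. */
static int pin_off_cpu0(void)
{
    if (ThreadCtl(_NTO_TCTL_RUNMASK, (void *)0x2) == -1) {
        perror("ThreadCtl(_NTO_TCTL_RUNMASK)");
        return -1;
    }
    return 0;
}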

[cut]

Steven Dufresne wrote:

[cut]

With an SMP system we make all interrupts go to processor 0. Is it
possible, Francois, that your 63f thread is running on processor 0?
Since you are on SMP, you can run other things, and these other things
may cause interrupts - though admittedly that wouldn't be the case
with pidin :-). Mario, when you run it, possibly your 63f thread is
not running on processor 0, so you don't see this.

Steven, I had it running on CPU 1 originally. With the test program, I
also tried with CPU 0, and also not assigning it specifically.

Following Mario's comments, I tried RR scheduling, and saw no
improvement. When I had a make (and nothing else) running, I tried the
test program again, and the slowest loop took 201us instead of the 23us
or so when the machine is quiet, when assigned to CPU 1. The problem is
not specific to pidin.




[cut]

"Steven Dufresne" <stevend@qnx.com> wrote in message
news:91qdgf$89d$1@nntp.qnx.com...

With an SMP system we make all interrupts go to processor 0. Is it
possible, Francois, that your 63f thread is running on processor 0?
Since you are on SMP, you can run other things, and these other things
may cause interrupts - though admittedly that wouldn't be the case
with pidin :-). Mario, when you run it, possibly your 63f thread is
not running on processor 0, so you don't see this.

Francois's sample supports an argument to set the affinity mask. I tried
it on processor 0, on processor 1, and with no mask. I get the same
result in every case.

I know the affinity mask works because if I run two instances
simultaneously with affinity mask 2, each program takes twice as long to
complete. If I start them with affinity masks 2 and 1, each completes in
the same time as one instance would (confirming they got assigned to
different CPUs).
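
(To illustrate with hypothetical invocations of the test program posted
earlier: starting "test_sched 2" twice makes both instances compete for
CPU 1, so each takes roughly twice as long; starting "test_sched 2" and
"test_sched 1" puts them on CPU 1 and CPU 0 respectively, and each runs
at single-instance speed.)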

One of the first things I did was to change the priority to 63r instead
of 63f. I've made it a habit to stay away from FIFO in general; the
autonomous part of my brain took care of that ;-) That shouldn't make
any difference for this particular test, though.

Is the SMP kernel the same in the early beta Patch A and in the first
release?

-Mario
"People looking to serious, should be looking to Sirius"

Steven, I had it running on CPU 1 originally. With the test program, I
also tried with CPU 0, and also not assigning it specifically.

Following Mario's comments, I tried RR scheduling, and saw no
improvement. When I had a make (and nothing else) running, I tried the
test program again, and the slowest loop took 201us instead of the 23us
or so when the machine is quiet, when assigned to CPU 1. The problem is
not specific to pidin.

I wonder if this could be a cache issue. For some reason the hardware
would be invalidating the cache on processor 1 when "something" happens
on processor 0. That's a VERY wild guess…

Francois, what happens if you run the program simultaneously on each
processor?

[cut]

Mario Charest wrote:

"Francois Desruisseaux" <Francois.Desruisseaux@opal-rt.com> wrote in message
news:3A3FD9AC.24DB36DF@opal-rt.com...




[cut]


Try not running pidin?

That was a suggestion to determine if the problem was caused by pidin
specifically. Apparently in your case it's not.

And a valid suggestion it was, but I couldn't resist making fun of it
anyway :-)

"I told the doctor it hurts when I raise my arm… So he told me not to
raise my arm." :-)


I've been thinking about this and I'm a bit confused. The pidin request
should be processed at pidin's priority, so it should be blocked until
the test program completes. I'm probably missing something.

It shouldn’t have to block on an SMP machine, since there’s CPU time
available. At most, pidin may be a bad choice for an example on my part,
because it may have to lock process tables while it retrieves
information from them. But I’ve had the problem running make and the
test program, for another example, where the slowest loop time went from
23us to 201us with the test program assigned to CPU 1. So putting aside
the particularities of pidin’s operation, my hope is that an SMP RTOS
would keep the 63f thread running on its CPU, and be able to schedule
the other stuff on the other CPU, with no interference to the 63f
thread.


Actually, pidin was only a means to demonstrate the problem with my
(simplistic) test case. I've observed a slowdown with other programs, as
I'd observed with our product, where pidin was not in the picture.

What are you running on your machine: the updated or the CD version? I
went back to the CD version; if you're on the updated version, I'll try
it again after updating.

I'm not sure… I think I'm running the first release of beta Patch A.

I can confirm that Neutrino 2.1 ran beautifully on a quad machine.
Timing was so precise that we could see the effect of the cache on the
software. I can assure you CPUs 1, 2, and 3 were NOT disturbed by
activity on CPU 0.

So maybe try beta Patch A (as you apparently have access to it) or wait
for the official Patch A. Or NTO 2.1 ;-)

[cut]

Since you’ve not been able to confirm this problem on your machine, I
guess I’ll try installing updates to see how that goes…

Mario Charest wrote:

Steven, I had it running on CPU 1 originally. With the test program, I
also tried with CPU 0, and also not assigning it specifically.

Following Mario's comments, I tried RR scheduling, and saw no
improvement. When I had a make (and nothing else) running, I tried the
test program again, and the slowest loop took 201us instead of the 23us
or so when the machine is quiet, when assigned to CPU 1. The problem is
not specific to pidin.


I wonder if this could be a cache issue. For some reason the hardware
would be invalidating the cache on processor 1 when "something" happens
on processor 0. That's a VERY wild guess…

Francois, what happens if you run the program simultaneously on each
processor?

Running an instance simultaneously on each CPU, the results are good;
that is, fast times are 15us for each, and slow times are 25us, about
the same as running only one. Also, I ran a version of test_sched with
the priority and affinity stuff taken out (leaving the calculation loop
only) at the same time as test_sched, and the priority-63 test_sched ran
OK again.

If I were to make a wild guess of my own, I'm starting to suspect that
programs doing I/O will affect the 63f thread.

This seems to be the double-edged sword of SMP on PC architectures.
Low-priority programs can get in and make requests that may lock the
hardware for a while. In the RTLinux newsgroups, they found code in some
of the Linux video drivers that could lock a machine (the PCI bus,
actually, by jamming its FIFO full of commands) for 2 milliseconds or
more, so the real-time kernels (RTL or RTAI) running under Linux would
grind to a halt, same as everyone else, whenever the content of a window
was scrolled.

If anyone can get real-time SMP running, I expect QNX to be right in
there. But there may be some mobos that do better than others, or some
things to avoid altogether. Are there any tips in that area? I couldn't
find anything that specific in QDN.



[cut]

"Francois Desruisseaux" <Francois.Desruisseaux@opal-rt.com> wrote in message
news:3A40DC5A.1B555E79@opal-rt.com...

[cut]

Running an instance simultaneously on each CPU, the results are good;
that is, fast times are 15us for each, and slow times are 25us, about
the same as running only one. Also, I ran a version of test_sched with
the priority and affinity stuff taken out (leaving the calculation loop
only) at the same time as test_sched, and the priority-63 test_sched ran
OK again.

If I were to make a wild guess of my own, I'm starting to suspect that
programs doing I/O will affect the 63f thread.

What if you run your "other app" in text mode? The difference between
our setups could be related to graphics cards/drivers.

What's odd about the I/O theory is that your program seems to fit in the
cache (as indicated by the same numbers obtained when one or two
instances are running), so it shouldn't be I/O or memory bound.

I'm using a very low-end SMP mobo; I believe most of the magic is
handled by the chipset anyway (BX in my case).

Originally from Brian:

One thing to note is that ClockCycles() doesn't work if the thread
migrates from one CPU to another - the timestamp counters on x86 aren't
synchronized between the two.

But, except for your tests where you are not setting thread affinity,
this isn't it either.
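
A sketch of the safe pattern (my illustration; the mask value and the
timed work are placeholders): pin the thread to a single CPU before the
first ClockCycles() call, so both readings come from the same timestamp
counter:

#include <stdio.h>
#include <inttypes.h>
#include <sys/neutrino.h>
#include <sys/syspage.h>

int main(void)
{
    uint64_t start, elapsed;

    /* Pin to one CPU (here CPU 1, mask 0x2) so ClockCycles() deltas are
       taken from a single, monotonic timestamp counter. */
    if (ThreadCtl(_NTO_TCTL_RUNMASK, (void *)0x2) == -1) {
        perror("ThreadCtl");
        return 1;
    }

    start = ClockCycles();
    /* ... timed work goes here ... */
    elapsed = ClockCycles() - start;

    printf("%f seconds\n",
           (double)elapsed / SYSPAGE_ENTRY(qtime)->cycles_per_sec);
    return 0;
}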

Mario Charest wrote:

[cut]

What if you run your "other app" in text mode? The difference between
our setups could be related to graphics cards/drivers.

I’ve run these from telnet (frequently), from Phindows (occasionally),
or from local Photon (rarely). The timings with make and the
simultaneous test_sched’s were with telnet. So as far as I/O goes, I was
using the disk and the network.


What's odd about the I/O theory is that your program seems to fit in the
cache (as indicated by the same numbers obtained when one or two
instances are running), so it shouldn't be I/O or memory bound.

Right. It shouldn’t be affected. And it isn’t affected by another
program that is also CPU bound and in cache. I can’t explain it, but if
I try to find some common ground between the programs that do affect it,
that’s all I’ve come up with.




I'm using a very low-end SMP mobo; I believe most of the magic is
handled by the chipset anyway (BX in my case).

I may be getting somewhere in isolating what affects a 63f thread.

I tried running a program that writes/reads from disk while test_sched
ran on CPU 1, and I got fast/slow times of 15us and 36us. Not great, and
not drastic.

I tried running a program that does printf's in a loop while test_sched
ran on CPU 1. When running them from telnet windows, I got fast/slow
times of 15us and 27us. OK. Running in Phindows pterm windows, I got
15us and 36us. Not drastic.

Then, I ran the program listed below, which creates two semaphores and
one thread. Each thread waits on a semaphore that will be posted by the
other thread, in effect forcing a switch from one to the other on every
loop. These run at the default priority level and scheduling mode.

The slow times jumped up into the MILLIsecond range. Fast times were
15us; examples of slow times were 3017us, 5026us, and so on. test_sched
by itself still gets 15us to 26us.

So, is this specific to semaphores, or more generally to context
switching? Since I had problems during make, I fear it is more related
to scheduling generally. Please say I'm wrong.

===================================================================================
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

#define LOOP_LIMIT 1000000

sem_t sem1, sem2;

void *testThread(void *args)
{
    int i, rc;

    for (i = 0; i < LOOP_LIMIT; ++i)
    {
        rc = sem_wait(&sem2);
        if (-1 == rc)
        {
            printf("sem_wait error %d %s\n", errno, strerror(errno));
        }
        rc = sem_post(&sem1);
        if (-1 == rc)
        {
            printf("sem_post error %d %s\n", errno, strerror(errno));
        }
    }
    return (NULL);
}

int main(void)
{
    int i, rc;
    pthread_attr_t attr;
    pthread_t tid;

    rc = sem_init(&sem1, 0, 1);
    if (-1 == rc)
    {
        printf("sem_init error %d %s\n", errno, strerror(errno));
        return (1);
    }
    rc = sem_init(&sem2, 0, 0);
    if (-1 == rc)
    {
        printf("sem_init error %d %s\n", errno, strerror(errno));
        return (1);
    }

    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, 16000);
    pthread_attr_setstacklazy(&attr, PTHREAD_STACK_NOTLAZY);

    rc = pthread_create(&tid, &attr, testThread, NULL);
    pthread_attr_destroy(&attr);
    if (EOK != rc)
    {
        printf("pthread_create error %d %s\n", rc, strerror(rc));
        return (1);
    }

    for (i = 0; i < LOOP_LIMIT; ++i)
    {
        rc = sem_wait(&sem1);
        if (-1 == rc)
        {
            printf("sem_wait error %d %s\n", errno, strerror(errno));
        }
        rc = sem_post(&sem2);
        if (-1 == rc)
        {
            printf("sem_post error %d %s\n", errno, strerror(errno));
        }
    }

    pthread_join(tid, NULL);

    return (0);
}

Steven Dufresne wrote:

Originally from Brian:

One thing to note is that ClockCycles() doesn't work if the thread
migrates from one CPU to another - the timestamp counters on x86 aren't
synchronized between the two.

But, except for your tests where you are not setting thread affinity,
this isn't it either.

I just tried the semaphore test program on another machine, and
test_sched didn't mind!

The differences are the HW, and the other machine is running the CD RTP
version. I guess it's back to the CD version on my HW, to see if the CD
version is OK on it…

This is like nailing Jello to a tree :-)

Steven Dufresne wrote:

Originally from Brian:

One thing to note is that ClockCycles() doesn't work if the thread
migrates from one CPU to another - the timestamp counters on x86 aren't
synchronized between the two.

But, except for your tests where you are not setting thread affinity,
this isn't it either.

On my machine it seems qcc is the culprit. I wrote a test program to
exercise reads and writes on the HD, plus a program that does heavy
graphics, to no avail. Only with qcc could I reproduce Francois's
problem.

Is it possible that qcc's use of the swap file creates that side effect
on SMP? Loading data from the swap file into RAM must somehow affect the
caches of both CPUs, or at least invoke some kernel operation on both
CPUs?

Francois, my own tests show that mutexes are faster than semaphores.
Stick with semaphores if you can.
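
For comparison, here is a sketch (mine, not Mario's benchmark) of the
same ping-pong written with one mutex and condition variable instead of
two semaphores; the turn flag says whose move it is:

#include <stdio.h>
#include <pthread.h>

#define LOOP_LIMIT 1000000

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv   = PTHREAD_COND_INITIALIZER;
static int turn;                  /* 0 = main's turn, 1 = thread's turn */

/* Wait for our turn, then hand the turn to the other side and wake it;
   each call forces one context switch, like the sem_wait/sem_post pair. */
static void take_turn(int me)
{
    pthread_mutex_lock(&lock);
    while (turn != me)
        pthread_cond_wait(&cv, &lock);
    turn = !me;
    pthread_cond_signal(&cv);
    pthread_mutex_unlock(&lock);
}

static void *testThread(void *arg)
{
    int i;

    for (i = 0; i < LOOP_LIMIT; ++i)
        take_turn(1);
    return NULL;
}

int main(void)
{
    pthread_t tid;
    int i;

    if (pthread_create(&tid, NULL, testThread, NULL) != 0) {
        perror("pthread_create");
        return 1;
    }
    for (i = 0; i < LOOP_LIMIT; ++i)
        take_turn(0);
    pthread_join(tid, NULL);
    return 0;
}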


Mario Charest wrote:

On my machine it seems qcc is the culprit. I wrote a test program to
exercise reads and writes on the HD, plus a program that does heavy
graphics, to no avail. Only with qcc could I reproduce Francois's
problem.

Is it possible that qcc's use of the swap file creates that side effect
on SMP? Loading data from the swap file into RAM must somehow affect the
caches of both CPUs, or at least invoke some kernel operation on both
CPUs?

Francois, my own tests show that mutexes are faster than semaphores.
Stick with semaphores if you can.

I guess you meant stick with mutexes.


"Francois Desruisseaux" <Francois.Desruisseaux@opal-rt.com> wrote in message
news:3A4232F0.782970EA@opal-rt.com...

I just tried the semaphore test program on another machine, and
test_sched didn't mind!

The differences are the HW, and the other machine is running the CD RTP
version. I guess it's back to the CD version on my HW, to see if the CD
version is OK on it…

The same test on my HW with the CD version slows down badly as well. So
from the looks of it, the problem shows up when running on this HW. We
had a few of these machines, and those we're trying with Neutrino show
similar problems.

I'll try our software on the new machine, which ran the
semaphore/context-switching test without major problems, to see if
there's anything else lurking to get us.

(The old machines are ASUS P2B-D boards with a 440BX chipset. The new
machine's mobo is an MSI 694D Pro with a VIA 694X chipset.)


[cut]

"Francois Desruisseaux" <Francois.Desruisseaux@opal-rt.com> wrote in message
news:3A424E12.BE9BCDC0@opal-rt.com...

[cut]

Francois, my own tests show that mutexes are faster than semaphores.
Stick with semaphores if you can.

I guess you meant stick with mutexes.

Oops.

The same test on my HW with the CD version slows down badly as well. So
from the looks of it, the problem shows up when running on this HW. We
had a few of these machines, and those we're trying with Neutrino show
similar problems.

I'll try our software on the new machine, which ran the
semaphore/context-switching test without major problems, to see if
there's anything else lurking to get us.

(The old machines are ASUS P2B-D boards with a 440BX chipset. The new
machine's mobo is an MSI 694D Pro with a VIA 694X chipset.)

Does the BIOS say it's SMP 1.0 or 1.4? I'm not even sure if that is
important under QRTP. Maybe it's worth a BIOS upgrade.

[cut]