usleep taking too much time, small test case included

That’s not very desirable on common PC hardware due to the amount of PCI bus
time needed to adjust the timer. The APIC in some server x86 setups has
the required hardware built right into the CPU, which is exactly what is
needed to efficiently do dynamic timeouts.

BeOS didn’t use the standard ISA timer chip but used the APIC.

What they might be doing here is having a high-res system tick but only
servicing the full system lists at the usual 10 msecs. And even
staggering them to smooth out the processing burden.

No, they are using the standard HZ tick time for the kernel’s periodic
timer but are using the hardware available on modern systems to do
higher-res timers for the POSIX APIs.
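
For anyone curious what “adjusting the timer” means on a plain PC, a rough sketch of reprogramming the 8254 PIT for a one-shot countdown looks like this. The port numbers and mode byte are the standard ones; the helper name and the main() scaffolding are just for illustration, and in practice this lives in the kernel’s timer code.

/* Rough sketch only: program 8254 PIT channel 0 as a one-shot countdown.
 * Each reprogram costs several slow I/O cycles out across the bus. */
#include <stdint.h>
#include <hw/inout.h>       /* out8() */
#include <sys/neutrino.h>   /* ThreadCtl() */

#define PIT_CMD  0x43
#define PIT_CH0  0x40

static void pit_oneshot( uint16_t ticks )   /* ticks of the 1.19318 MHz clock */
{
    out8( PIT_CMD, 0x30 );                  /* channel 0, lobyte/hibyte, mode 0 */
    out8( PIT_CH0, ticks & 0xff );
    out8( PIT_CH0, ticks >> 8 );
}

int main( void )
{
    ThreadCtl( _NTO_TCTL_IO, 0 );           /* need I/O privilege for out8() */
    pit_oneshot( 1193 );                    /* roughly 1 ms */
    return 0;
}

Compare that with the local APIC timer, where the equivalent operation is a single register write inside the CPU.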

chris

Interesting, that sounds like the APIC is available on all CPUs that have it. I thought the BIOS/mobo had final say on availability of APIC, maybe that is just the interrupt bits.

Evan Hillas wrote:

Interesting, that sounds like the APIC is available on all CPUs that
have it. I thought the BIOS/mobo had final say on availability of APIC,
maybe that is just the interrupt bits.

Not having source access (BeOS/Palm), I always assumed it would fall
back to whatever timers existed on a given system.

chris

“Evan Hillas” <blarg@blarg.blarg> wrote in message
news:cpfqbt$2u0$1@inn.qnx.com

Interesting, that sounds like the APIC is available on all CPUs that have
it. I thought the BIOS/mobo had final say on availability of APIC, maybe
that is just the interrupt bits.

I thought APIC was a feature provided by the chipset and not by the CPU.

Mario Charest wrote:

“Evan Hillas” <blarg@blarg.blarg> wrote in message
news:cpfqbt$2u0$1@inn.qnx.com

Interesting, that sounds like the APIC is available on all CPUs that have
it. I thought the BIOS/mobo had final say on availability of APIC, maybe
that is just the interrupt bits.


I thought APIC was a feature provided by the chipset and not by the CPU.

Both, for interrupt servicing. There are the external APIC parts, and there is a bunch of memory-remappable registers sitting inside the PPro/P2/K7 and newer CPUs; one of them is a timer. They originally operated at front-side bus speed but may be deeper/faster for newer CPUs like the K8s.

This proximity to the processor core gives access-time improvements at least as good as external cache and hopefully better. However, if you compare the x86 against the PPC then even this setup looks slow for making adjustments. Can’t really beat a dedicated processor register. :-)

There could, in theory, be buffering/prefetching added to non-cached accesses, and as long as the order of reads and writes is maintained as a single explicit sequence it should work. Heh, then again, buffering writes and prefetching reads can’t be done while maintaining the sequence. Anyone know what the norm might be?
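
For reference, arming the local APIC timer for a one-shot timeout looks roughly like this. The register offsets (0x320 LVT timer, 0x380 initial count, 0x3E0 divide configuration) are from Intel’s manuals; the divide value and vector are example choices, the default physical base 0xFEE00000 is assumed, and on a real system this is kernel code working on a virtual mapping of that page.

/* Sketch only: arming the local APIC timer for a one-shot timeout. */
#include <stdint.h>

#define LVT_TIMER      0x320    /* local vector table entry for the timer */
#define INITIAL_COUNT  0x380    /* writing this starts the countdown      */
#define DIVIDE_CONFIG  0x3E0    /* divider applied to the bus clock       */

static volatile uint32_t *lapic;    /* kernel's mapping of the LAPIC page
                                       (default physical base 0xFEE00000) */

static void lapic_write( uint32_t off, uint32_t val )
{
    lapic[off / 4] = val;           /* one uncached store - no bus cycles out
                                       to an external timer chip */
}

static void arm_oneshot( uint32_t bus_ticks, uint8_t vector )
{
    lapic_write( DIVIDE_CONFIG, 0x3 );        /* divide the bus clock by 16        */
    lapic_write( LVT_TIMER, vector );         /* one-shot mode, interrupt unmasked */
    lapic_write( INITIAL_COUNT, bus_ticks );  /* countdown starts on this write    */
}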


Evan

“Chris McKillop” <cdm@killerstuff.net> wrote in message
news:cpe9ev$rmv$1@inn.qnx.com

That’s not very desirable on common PC hardware due to the amount of PCI bus
time needed to adjust the timer. The APIC in some server x86 setups has
the required hardware built right into the CPU, which is exactly what is
needed to efficiently do dynamic timeouts.

I’ve been thinking about that. Let’s say:

void foo( int inc ) {
    int cnt;

    for( cnt = 0; cnt < inc; cnt++ ) {
        outp( BASE, cnt );
        usleep( 1 );
    }
}

On an operating system that does support µs precision through dynamic timers, this
code would actually generate 1 million interrupts per second! This has gotta
hurt. Taking into account that usleep must set up the timer, then the ISR kicks
in, then the thread wakes up, this uses a lot of CPU cycles, not to mention
somewhat hurting real-time behavior, because even if this thread is low
priority it will have an impact on higher-priority threads. Depending on
processor speed and requested sleep time, it could actually be more
efficient to use nanospin(). Opinion?
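
In the spirit of the “small test case” in the subject line, a quick way to see what usleep(1) actually delivers is to time it with ClockCycles() and cycles_per_sec from the syspage (both standard QNX calls); the loop count is arbitrary and the port write is dropped:

/* Measure the real duration of usleep(1). */
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/neutrino.h>   /* ClockCycles() */
#include <sys/syspage.h>    /* SYSPAGE_ENTRY() */

int main( void )
{
    const int loops = 1000;
    uint64_t cps = SYSPAGE_ENTRY(qtime)->cycles_per_sec;
    uint64_t start = ClockCycles();
    int i;

    for( i = 0; i < loops; i++ )
        usleep( 1 );                /* ask for 1 us */

    printf( "average usleep(1): %.1f us\n",
            (double)(ClockCycles() - start) / cps * 1e6 / loops );
    return 0;
}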



BeOS didn’t use the standard ISA timer chip but used the APIC.


What they might be doing here is having a high-res system tick but only
servicing the full system lists at the usual 10 msecs. And even
staggering them to smooth out the processing burden.


No, they are using the standard HZ tick time for the kernel’s periodic
timer but are using the hardware available on modern systems to do
higher-res timers for the POSIX APIs.

chris

Mario Charest wrote:

I’ve been thinking about that. Let’s say:

void foo( int inc ) {
    int cnt;

    for( cnt = 0; cnt < inc; cnt++ ) {
        outp( BASE, cnt );
        usleep( 1 );
    }
}

On an operating system that does support µs precision through dynamic timers, this
code would actually generate 1 million interrupts per second! This has gotta
hurt. Taking into account that usleep must set up the timer, then the ISR kicks
in, then the thread wakes up, this uses a lot of CPU cycles, not to mention
somewhat hurting real-time behavior, because even if this thread is low
priority it will have an impact on higher-priority threads. Depending on
processor speed and requested sleep time, it could actually be more
efficient to use nanospin(). Opinion?

Yeah, that looks a tad rough. And nanosleep() can be worse, of course. There would need to be a threshold, based on CPU speed, below which the code path would busywait instead.
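
Something like the following is the sort of threshold I mean: spin for very short requests, hand anything longer to the kernel timer. The 50 us cutoff is a number pulled out of the air and would really have to come from calibrating against the CPU and the ticksize.

/* Sketch of a threshold-based delay: busy-wait for tiny requests, sleep
 * for longer ones.  The cutoff is illustrative only. */
#include <unistd.h>
#include <sys/neutrino.h>   /* nanospin_ns() */

#define SPIN_CUTOFF_US  50  /* made-up value; should come from calibration */

static void short_delay( unsigned usec )
{
    if( usec < SPIN_CUTOFF_US )
        nanospin_ns( (unsigned long)usec * 1000 );  /* burn cycles, no interrupt   */
    else
        usleep( usec );                             /* let the kernel timer do it  */
}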


Evan

Mario Charest wrote:

“Chris McKillop” <cdm@killerstuff.net> wrote in message
news:cpe9ev$rmv$1@inn.qnx.com

That’s not very desirable on common PC hardware due to the amount of PCI bus
time needed to adjust the timer. The APIC in some server x86 setups has
the required hardware built right into the CPU, which is exactly what is
needed to efficiently do dynamic timeouts.


I’ve been thinking about that. Let’s say:

void foo( int inc ) {
    int cnt;

    for( cnt = 0; cnt < inc; cnt++ ) {
        outp( BASE, cnt );
        usleep( 1 );
    }
}

nanospin_calibrate(), nanospin_ns() and so on are the way to go for small
(active) waits.
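
A minimal usage sketch of those calls (the argument to nanospin_calibrate() selects whether the spins will run with interrupts disabled; the 500 ns wait is just an example value):

/* Calibrate once at startup, then do short active waits. */
#include <stdio.h>
#include <sys/neutrino.h>

int main( void )
{
    if( nanospin_calibrate( 0 ) != 0 ) {     /* 0 = spins run with interrupts enabled */
        fprintf( stderr, "nanospin_calibrate failed\n" );
        return 1;
    }
    nanospin_ns( 500 );                      /* ~500 ns busy-wait, no timer interrupt */
    return 0;
}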

Regards

Armin



On an operating system that does support µs precision through dynamic timers, this
code would actually generate 1 million interrupts per second! This has gotta
hurt. Taking into account that usleep must set up the timer, then the ISR kicks
in, then the thread wakes up, this uses a lot of CPU cycles, not to mention
somewhat hurting real-time behavior, because even if this thread is low
priority it will have an impact on higher-priority threads. Depending on
processor speed and requested sleep time, it could actually be more
efficient to use nanospin(). Opinion?




BeOS didn’t use the standard ISA timer chip but used the APIC.


What they might be doing here is having a high-res system tick but only
servicing the full system lists at the usual 10 msecs. And even
staggering them to smooth out the processing burden.


No, they are using the standard HZ tick time for the kernel’s periodic
timer but are using the hardware available on modern systems to do
higher-res timers for the POSIX APIs.

chris

Evan Hillas wrote:

Yeah, that looks a tad rough. And nanosleep() can be worse, of
course. There would need to be a threshold, based on CPU speed, below which the
code path would busywait instead.

I guess, in keeping with the method, it would limit the resolution rather than busywaiting or giving you the requested period.


Evan

Armin Steinhoff wrote:

nanospin_calibrate(), nanospin_ns() and so on are the way to go for small
(active) waits.

What gets used and what should get used is not always the same. ;-)


Evan

How about multiple concurrent requests? Do they manage a queue of timeouts
and reprogram the timer when another thread requests a timeout that expires
before the current one would?

Marty Doane
Siemens L&A

“Chris McKillop” <cdm@killerstuff.net> wrote in message
news:cpcssn$rrm$1@inn.qnx.com

Marty Doane wrote:
You seem disappointed in QNX. Are there other OSes that do what you want
without requiring additional hardware? Which ones, and how do they work?

BeOS can do it, and so can the PalmOS 6 kernel (not surprising, given that the
same people built both of them). There are also patches to the Linux
kernel to add high-res timers (http://www.celinux.org). QNX takes the
traditional periodic timer approach for all timer-based APIs, while
these other systems are set up to program a timer to fire dynamically as
new timeout requests are made in the system. So there is no overhead from
the timers if no one is asking for a timeout. They may or may not also
have periodic timers going for the kernel itself - Linux does, PalmOS 6
does not.

chris

Marty Doane wrote:

How about multiple concurrent requests? Do they manage a queue of timeouts
and reprogram the timer when another thread requests a timeout that expires
before the current one would?

Yep: stack the first one and put the smaller value in the register, then when the newer/earlier request times out the older/later timeout is reloaded into the register.

As Mario has pointed out, this still needs to be limited in the fineness of interrupt generation, so it may still produce late timeouts.
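
In code form the bookkeeping is roughly this: a pending list sorted by expiry, with the one-shot hardware always armed for the delta to the earliest entry. The list, the time source and program_oneshot() are invented names, just to show the shape of it.

/* Sketch of dynamic-timeout bookkeeping.  The types, now() and
 * program_oneshot() are invented for illustration. */
#include <stdint.h>
#include <stddef.h>

struct timeout {
    uint64_t        expiry;     /* absolute time it should fire */
    struct timeout *next;
};

static struct timeout *pending;                 /* sorted, earliest first  */

extern uint64_t now( void );                    /* current absolute time   */
extern void program_oneshot( uint64_t delta );  /* arm the hardware timer  */

void add_timeout( struct timeout *t )
{
    struct timeout **pp = &pending;

    while( *pp && (*pp)->expiry <= t->expiry )  /* find the sorted position */
        pp = &(*pp)->next;
    t->next = *pp;
    *pp = t;

    if( pending == t )                          /* new earliest deadline:   */
        program_oneshot( t->expiry - now() );   /* reprogram the hardware   */
}

/* The timer ISR pops expired entries off the head and then re-arms for the
 * new head, i.e. the older/later timeout is reloaded into the register. */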


Evan

Evan Hillas wrote:

Marty Doane wrote:

How about multiple concurrent requests? Do they manage a queue of timeouts
and reprogram the timer when another thread requests a timeout that expires
before the current one would?


Yep: stack the first one and put the smaller value in the register, then
when the newer/earlier request times out the older/later timeout is
reloaded into the register.

As Mario has pointed out, this still needs to be limited in the fineness
of interrupt generation, so it may still produce late timeouts.

Or were you more interested in a particular OS?


Evan

No, just general education.

Thanks.

Marty Doane
Siemens L&A

“Evan Hillas” <blarg@blarg.blarg> wrote in message
news:cpkr40$p7g$1@inn.qnx.com

Evan Hillas wrote:
Marty Doane wrote:

How about multiple concurrent requests? Do they manage a queue of timeouts
and reprogram the timer when another thread requests a timeout that expires
before the current one would?


Yep: stack the first one and put the smaller value in the register, then
when the newer/earlier request times out the older/later timeout is
reloaded into the register.

As Mario has pointed out, this still needs to be limited in the fineness
of interrupt generation, so it may still produce late timeouts.


Or were you more interested in a particular OS?


Evan