ISR Misses Interrupts

David_Kuechenmeister · December 10, 2002, 2:39pm

Ok, I’ve got some results to post.

The IRQs in use are:
IRQ 0 heartbeat interrupt
IRQ 1 keyboard
IRQ 2 cascade to second interrupt controller
IRQ 3 COM2
IRQ 4 COM1
IRQ 5 XT Hard Disk
IRQ 7 My Timing Interrupt
IRQ 8 Real Time Clock
IRQ 9 Redirected IRQ 2, whatever is meant by that
IRQ 10 Free
IRQ 11 Serial Controller, Network Controller
IRQ 12 Display Controller
IRQ 13 Free
IRQ 14 IDE Controller
IRQ 15 Free
I reduced the priority of the handler thread to 15, as was suggested.
There are many more missed calls to the handler, now. The missed calls start
right away, as well, rather than taking several hours to deteriorate to this
condition. The latency of the ISR is steady at around 20 usec. Limitations
in the way the FPGA processes output limit the duration of the interrupt
pulse to 45 usec, even if it is ack’ed earlier than that. The timeout on the
pulse is at 250 usec, though, and it doesn’t extend beyond 45 usec after the
ISR runs.
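
The two pulse widths described above make it possible to tell an acked interrupt from a timed-out one in a logic analyzer capture. The following is only a sketch in C, assuming the pulse widths have already been measured in microseconds. The function names and the midpoint threshold are illustrative assumptions; only the 45 usec and 250 usec figures come from the post.

```c
#include <assert.h>

/* Figures from the post: an acked pulse lasts about 45 usec,
 * an unacked pulse times out at 250 usec. */
#define ACKED_PULSE_US   45u
#define TIMEOUT_PULSE_US 250u

typedef enum { PULSE_ACKED, PULSE_TIMED_OUT } pulse_result;

/* Classify one measured pulse width. The midpoint threshold is an
 * assumption made for illustration, not a value from the FPGA design. */
pulse_result classify_pulse(unsigned width_us)
{
    const unsigned threshold_us = (ACKED_PULSE_US + TIMEOUT_PULSE_US) / 2u;
    return (width_us < threshold_us) ? PULSE_ACKED : PULSE_TIMED_OUT;
}

/* Count how many pulses in a capture timed out, i.e. were never acked. */
unsigned count_timeouts(const unsigned *widths_us, unsigned n)
{
    unsigned timeouts = 0u;

    assert(widths_us != 0 || n == 0u);
    for (unsigned i = 0u; i < n; i++) {
        if (classify_pulse(widths_us[i]) == PULSE_TIMED_OUT)
            timeouts++;
    }
    return timeouts;
}
```

Running count_timeouts() over a long capture would give a rough count of how many interrupts were never acked during a trial.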

The scheduling latency for the handler thread is between 2 and 3 msec.
Sometimes this extends to between 4 and 6 msec, but at this point the
system is out of control, anyway, so it doesn’t much matter. At the point
where the scheduling latency spans one or more interrupts, I see that
sometimes the interrupt is ack’ed and sometimes it times out. There isn’t
any obvious pattern to the acking and timing out, except that the first
interrupt always seems to be acked.

The application is multi-threaded. I’ve followed one of the examples in
the help file that came with our Windows SDK. I have a main thread that
spawns the handler thread. The handler thread attaches to the interrupt and
blocks on InterruptWait.

I checked with our PC vendor and made sure the SMM features were disabled.
The fact that the first interrupt is ack’ed seems to verify that. I don’t
understand the long scheduling latency or the very non-deterministic
behavior that I see. If you have any additional suggestions, I’d appreciate
them.

Sincerely,
…dk

“Adam Mallory” <amallory@qnx.com> wrote in message
news:asqfdg$2vd$1@nntp.qnx.com…

David Kuechenmeister <david.kuechenmeister@viasat.com> wrote in message
news:asq6ei$gia$1@inn.qnx.com…
[problem details omitted]

Can you post a list of the IRQs in use in your system (i.e. NIC card, serial
ports, peripherals etc.)? Is there any other activity going on in the
system while the test is being performed (i.e. logging etc.)?

Also, modify your test to do the following:

-run your application doing the InterruptWait() at a priority band around 15
(not too high, just higher than most of the non-essential processes).
You’ll want to ensure you have a method of stopping the test (a high-priority
shell to slay it, etc.) later.

-make your application multithreaded (if it’s not already) with your main
code path in one thread and a busy/wait loop in the other.
eg: for (;;);

Let us know the results of the test.

Adam_Mallory1 · December 10, 2002, 3:49pm

The IRQs in use are:

IRQ 3 COM2
IRQ 4 COM1
IRQ 5 XT Hard Disk

^^^^^^^^^^^^^^
Are these devices in use during these trials? Do you have a parallel port
on this device?

I reduced the priority of the handler thread to 15, as was suggested.
There are many more missed calls to the handler, now. The missed calls start
right away, as well, rather than taking several hours to deteriorate to this
condition. The latency of the ISR is steady at around 20 usec. Limitations
in the way the FPGA processes output limit the duration of the interrupt
pulse to 45 usec, even if it is ack’ed earlier than that. The timeout on the
pulse is at 250 usec, though, and it doesn’t extend beyond 45 usec after the
ISR runs.

Did you add the busywait loop into this priority band as well? The idea
here is to avoid the system falling into idle for any reason. What priority
band does your application usually run at - what else is running in that
band or higher. Perhaps post the output of “pidin in”, “pidin arg”, “pidin”
so that we can all take a peek.

-Adam

Rennie_Allen2 · December 10, 2002, 4:13pm

David Kuechenmeister wrote:

Wouldn’t the more rapid onset of this condition, at a lower handler thread
priority, be more of an indication that something isn’t getting scheduled?

Not necessarily; it could simply be because your other threads are
now pre-empting your HRT thread (totally normal, expected behavior).

If SMM is going to interfere, it wouldn’t matter to what priority a
particular task was set, would it?

Well, SMM latency would simply be obscured by the latency you’ve
just introduced on purpose.

I’ve talked to the tech rep at WinSystems and read
the chapter in the Pentium programmers manual, but neither seemed to
indicate that I could control the mode any more than to disable the Power
Management and ACPI. Is there an undocumented register that controls this
feature?

Not in general, but I’m not sure what hardware you’re running on.
Certainly there could be all sorts of external (to the processor)
registers provided by the board mfg.

I don’t have any peripherals, aside from a monitor, keyboard, hard
disk, and NIC running on this system. Doesn’t SMM need some sort of hardware
request on the SMI line to activate?

The SMI must be asserted to enter SMM.

And if the guys that build the PC don’t
know what SMM does, who would?

I’m sure the guys that build the PC know what it does, I was
simply suggesting double checking, since 2ms is an outrageous
period of time, and very typical of SMM latency.

I can’t imagine any QNX software being involved in a
latency of this magnitude (even a buggy driver seems unlikely
to introduce a 2ms latency).

I’ve got another reply to Adam’s request for more info. Possibly that will
shed some more light on what I could be doing to the scheduler.

I’m very interested in finding out what problem you are having.

David_Kuechenmeister · December 10, 2002, 8:29pm

“Adam Mallory” <amallory@qnx.com> wrote in message
news:at528c$b7d$1@nntp.qnx.com…

The IRQs in use are:

IRQ 3 COM2
IRQ 4 COM1
IRQ 5 XT Hard Disk

^^^^^^^^^^^^^^
Are these devices in use during these trials?
No, these devices are not used.

Do you have a parallel port on this device?

Yes, it is set to IRQ7, but there is nothing attached. I need the parallel
port to write my timing pulses. This is only a PC, by the way. There’s
nothing custom about it. It just happens to be on a PC-104 board.

I reduced the priority of the handler thread to 15, as was suggested.
There are many more missed calls to the handler, now. The missed calls start
right away, as well, rather than taking several hours to deteriorate to this
condition. The latency of the ISR is steady at around 20 usec. Limitations
in the way the FPGA processes output limit the duration of the interrupt
pulse to 45 usec, even if it is ack’ed earlier than that. The timeout on the
pulse is at 250 usec, though, and it doesn’t extend beyond 45 usec after the
ISR runs.

Did you add the busywait loop into this priority band as well? The idea
here is to avoid the system falling into idle for any reason. What priority
band does your application usually run at - what else is running in that
band or higher. Perhaps post the output of “pidin in”, “pidin arg”, “pidin”
so that we can all take a peek.

My main() function is at priority 10. It starts the handler thread, then
blocks on a MsgReceive within a forever loop. The handler thread boosts its
own priority to 15, then blocks on InterruptWait in a forever loop.

Following are the results from the pidin command. My application processes
are the *.exe ones. Sorry about the crappy formatting.

Thanks,
…dk

pidin in :
CPU:X86 Processors:1 FreeMem:232Mb/255Mb BootTime:Dec 10 14:34:39 utc 2002
Processor1: 586 Intel 586 F5M8S1 167Mhz FPU

pidin arg:
pid Arguments
1 procnto
2 /sbin/tinit
3 slogger
12292 mqueue
5 pci-bios
6 devb-eide blk auto=partition dos exec=all cam quiet eide dma,ioport=0x1f0,irq=14 eide dma,ioport=0x170,irq=15
7 devc-con -n4
8 fs-pkg -a/pkgs/base/safe-config/etc/system/package/packages
4105 pipe
135178 random -t
45067 devc-pty -n 32
77836 devc-par -p0x378
114701 devc-ser8250 -u3 3e8,5 -u4 2e8,9
77838 spooler -d/dev/par1
77839 io-net -ptcpip -ppppmgr -pqnet
155664 inetd
94225 devc-ser8250 -u1 3f8,4 -u2 2f8,3
151570 portmap
114707 devb-fdc cam quiet blk auto=partition,cache=100k
167956 pdebug 8000
163861 /usr/sbin/routed /var/log/route.log
167958 pdebug 8010
167959 dumper -d /var/dumps
176152 syslogd
184345 /usr/local/bin/ntpd -g -l /var/log/ntp.log -p /var/log/ntp.pid
319514 fs-nfs2 192.168.2.1:/c/model3880/configurations/servoconfig /home/servo/config
233499 -sh
233500 login
233501 login
233502 login
880671 pidin arg
340000 inimgr.exe /home/servo/config/config3880.ini
385057 axisMgr.exe
430114 acquiProc.exe
475171 servoTest.exe
520228 axisIOProc.exe
565285 proc_timer.exe
610342 UserIOProc.01.exe
655399 pedIOProc.exe
700456 walkboxd.exe
745513 cmdProcessor.exe
794666 msgHandler.exe

pidin
pid tid name prio STATE Blocked
1 1 procnto 0f READY
1 2 procnto 15r RECEIVE 1
1 3 procnto 63r RECEIVE 1
1 4 procnto 63r RECEIVE 1
1 5 procnto 15r RECEIVE 1
1 6 procnto 10r RUNNING
1 7 procnto 15r RECEIVE 1
1 8 procnto 6r NANOSLEEP
1 9 procnto 10r RECEIVE 1
2 1 sbin/tinit 10o REPLY 1
3 1 proc/boot/slogger 10o RECEIVE 1
12292 1 sbin/mqueue 10o RECEIVE 1
5 1 proc/boot/pci-bios 10o RECEIVE 1
6 1 roc/boot/devb-eide 10o SIGWAITINFO
6 2 roc/boot/devb-eide 21r RECEIVE 1
6 3 roc/boot/devb-eide 10o RECEIVE 7
6 4 roc/boot/devb-eide 10o CONDVAR b0378fec
6 6 roc/boot/devb-eide 10o RECEIVE 4
6 7 roc/boot/devb-eide 10o RECEIVE 4
6 8 roc/boot/devb-eide 10o RECEIVE 4
7 1 /x86/sbin/devc-con 11o RECEIVE 1
8 1 .2/x86/sbin/fs-pkg 10o RECEIVE 1
8 2 .2/x86/sbin/fs-pkg 10o SIGWAITINFO
8 3 .2/x86/sbin/fs-pkg 10o RECEIVE 1
8 4 .2/x86/sbin/fs-pkg 10o RECEIVE 1
8 5 .2/x86/sbin/fs-pkg 10o RECEIVE 1
8 6 .2/x86/sbin/fs-pkg 10o RECEIVE 1
4105 1 sbin/pipe 10o RECEIVE 1
4105 2 sbin/pipe 10o RECEIVE 1
4105 3 sbin/pipe 10o RECEIVE 1
4105 4 sbin/pipe 10o RECEIVE 1
135178 1 usr/sbin/random 10o SIGWAITINFO
135178 2 usr/sbin/random 10o RECEIVE 1
135178 3 usr/sbin/random 10o NANOSLEEP
45067 1 sbin/devc-pty 10o RECEIVE 1
77836 1 sbin/devc-par 10o RECEIVE 1
77836 2 sbin/devc-par 9r CONDVAR 804f938
114701 1 sbin/devc-ser8250 24o RECEIVE 1
77838 1 usr/sbin/spooler 10o NANOSLEEP
77839 1 sbin/io-net 10o SIGWAITINFO
77839 2 sbin/io-net 10o RECEIVE 1
77839 3 sbin/io-net 10o RECEIVE 1
77839 6 sbin/io-net 33o RECEIVE 6
77839 7 sbin/io-net 10o RECEIVE 22
77839 8 sbin/io-net 21o RECEIVE 17
77839 9 sbin/io-net 10o RECEIVE 1
77839 10 sbin/io-net 18o RECEIVE 1
77839 13 sbin/io-net 10o CONDVAR 80c3174
155664 1 usr/sbin/inetd 10o SIGWAITINFO
94225 1 sbin/devc-ser8250 24o RECEIVE 1
151570 1 usr/bin/portmap 10o SIGWAITINFO
114707 1 sbin/devb-fdc 10o SIGWAITINFO
167956 1 usr/bin/pdebug 10o REPLY 77839
163861 1 usr/sbin/routed 10o SIGWAITINFO
167958 1 usr/bin/pdebug 10o REPLY 77839
167959 1 usr/sbin/dumper 10o RECEIVE 1
176152 1 usr/sbin/syslogd 10o SIGWAITINFO
176152 2 usr/sbin/syslogd 15o RECEIVE 1
176152 3 usr/sbin/syslogd 10o RECEIVE 1
176152 4 usr/sbin/syslogd 10o RECEIVE 1
184345 1 usr/local/bin/ntpd 10o SIGSUSPEND
319514 1 usr/sbin/fs-nfs2 10o RECEIVE 1
319514 2 usr/sbin/fs-nfs2 10o RECEIVE 1
319514 3 usr/sbin/fs-nfs2 10o RECEIVE 1
319514 4 usr/sbin/fs-nfs2 10o RECEIVE 1
319514 5 usr/sbin/fs-nfs2 10o RECEIVE 1
233499 1 bin/sh 10o SIGSUSPEND
233500 1 bin/login 10o REPLY 7
233501 1 bin/login 10o REPLY 7
233502 1 bin/login 10o REPLY 7
897055 1 bin/pidin 10o REPLY 1
340000 1 ./inimgr.exe 33o RECEIVE 1
385057 1 ./axisMgr.exe 10o CONDVAR b034b2e0
385057 2 ./axisMgr.exe 43o RECEIVE 1
385057 3 ./axisMgr.exe 43o RECEIVE 3
385057 4 ./axisMgr.exe 10o RECEIVE 5
385057 5 ./axisMgr.exe 10o DEAD
430114 1 ./acquiProc.exe 38o RECEIVE 1
475171 1 ./servoTest.exe 38o RECEIVE 3
475171 2 ./servoTest.exe 10o RECEIVE 1
520228 1 ./axisIOProc.exe 48o RECEIVE 1
565285 1 ./proc_timer.exe 10o RECEIVE 2
565285 2 ./proc_timer.exe 15o INTR
610342 1 /UserIOProc.01.exe 33o RECEIVE 1
655399 1 ./pedIOProc.exe 33o RECEIVE 1
700456 1 ./walkboxd.exe 10o RECEIVE 2
700456 2 ./walkboxd.exe 10o SEM 8058ba0
700456 3 ./walkboxd.exe 10o SEM 8058b98
745513 1 ./cmdProcessor.exe 10o CONDVAR b034b2e0
745513 2 ./cmdProcessor.exe 33o RECEIVE 3
745513 3 ./cmdProcessor.exe 10o SEM 8063b6c
745513 4 ./cmdProcessor.exe 43o RECEIVE 6
794666 1 ./msgHandler.exe 33o CONDVAR b034b2e0
794666 2 ./msgHandler.exe 33o JOIN 6
794666 3 ./msgHandler.exe 33o SEM 80a4b4c
794666 4 ./msgHandler.exe 33o JOIN 8
794666 5 ./msgHandler.exe 33o RECEIVE 4
794666 6 ./msgHandler.exe 33o REPLY 77839
794666 7 ./msgHandler.exe 33o NANOSLEEP
794666 8 ./msgHandler.exe 33o RECEIVE 7

David_Kuechenmeister · December 10, 2002, 9:45pm

I knocked the priority of the handler thread from max to 15, just above our
default of 10. Instead of taking hours to develop into the situation where
the scheduling latency is long and not deterministic, things get, in the
words of our control engineer, wacked-out, right away.

Wouldn’t the more rapid onset of this condition, at a lower handler thread
priority, be more of an indication that something isn’t getting scheduled?
If SMM is going to interfere, it wouldn’t matter to what priority a
particular task was set, would it?

Let me be the first to point out that I am completely in the dark about SMM.
Working on Motorola devices certainly didn’t prepare me for a CPU that has
this sort of behavior. I’ve talked to the tech rep at WinSystems and read
the chapter in the Pentium programmers manual, but neither seemed to
indicate that I could control the mode any more than to disable the Power
Management and ACPI. Is there an undocumented register that controls this
feature? I don’t have any peripherals, aside from a monitor, keyboard, hard
disk, and NIC running on this system. Doesn’t SMM need some sort of hardware
request on the SMI line to activate? And if the guys that build the PC don’t
know what SMM does, who would?

I’ve got another reply to Adam’s request for more info. Possibly that will
shed some more light on what I could be doing to the scheduler.

Thanks

“Rennie Allen” <rallen@csical.com> wrote in message
news:3DF5E847.7050003@csical.com…

David Kuechenmeister wrote:
Looks like SMM is represented by Power Management and ACPI in our BIOS.
That’s what our WinSystems tech rep told me, anyway. These are both
disabled, so I don’t think I’m going to have any SMM problems.

Power management is not all that is done with SMM, for instance
a common use is to provide legacy support for USB devices. One
chip many years ago, emulated an 8237 DMA controller in SMM.

Bottom line is, just because you disabled power management in
the BIOS it does not necessarily mean that SMM is disabled.

I’ll have to go back and work through the suggestions Adam made, now, since
I don’t think either of these features were enabled earlier.

I wouldn’t be too quick to eliminate SMM. SMM is about the
only thing I have ever run into that can cause this sort
of a delay.

I did get the ack done for the FPGA, so we should be able to get an
additional data point for how long it takes the ISR to ack the interrupt. If
I understand the basic problem with SMM, I shouldn’t even see the ISR ack
the interrupt when the system is in SMM mode.

That’s right, QNX doesn’t exist on the CPU when SMM is
active. Disabling SMM can be a daunting task (as Chris
McKillop alluded to).

Chris_McKillop1 · December 11, 2002, 1:31am

That’s right, QNX doesn’t exist on the CPU when SMM is
active. Disabling SMM can be a daunting task (as Chris
McKillop alluded to).

Yep, I have heard from people that customers have had to lift a trace
off their board to get it truly disabled.

chris

–
Chris McKillop <cdm@qnx.com> “The faster I go, the behinder I get.”
Software Engineer, QSSL – Lewis Carroll –
http://qnx.wox.org/

David_Kuechenmeister · December 11, 2002, 3:56pm

I hope the manufacturer is right, because space is pretty tight on a PC-104
board.

Thanks.

“Chris McKillop” <cdm@qnx.com> wrote in message
news:at64ic$3bh$2@nntp.qnx.com…

That’s right, QNX doesn’t exist on the CPU when SMM is
active. Disabling SMM can be a daunting task (as Chris
McKillop alluded to).

Yep, I have heard from people that customers have had to lift a trace
off their board to get it truly disabled.

chris

–
Chris McKillop <cdm@qnx.com> “The faster I go, the behinder I get.”
Software Engineer, QSSL – Lewis Carroll –
http://qnx.wox.org/

Adam_Mallory1 · December 11, 2002, 7:13pm

David Kuechenmeister <david.kuechenmeister@viasat.com> wrote in message
news:at5iao$c97$1@inn.qnx.com…

Yes, it is set to IRQ7, but there is nothing attached. I need the parallel
port to write my timing pulses. This is only a PC, by the way. There’s
nothing custom about it. It just happens to be on a PC-104 board.

If it’s on a PC-104 board, are you using a serial terminal to manipulate
the board or do you have VGA capability on that stack?

BTW, I’ve noticed it seems you’re running the full blown development
environment on your board - why?

I reduced the priority of the handler thread to 15, as was suggested.
There are many more missed calls to the handler, now. The missed calls start
right away, as well, rather than taking several hours to deteriorate to this
condition. The latency of the ISR is steady at around 20 usec.

Well now that I’ve seen what is running on your board (i.e. everything and
the kitchen sink) - I would expect the situation to be exacerbated, since
scheduling latency is definitely going to affect a prio 15 thread.

You should eliminate everything non-essential to running your system (i.e.
OS+drivers+your software). Things such as the package file system etc
should not be on your target. Then if all seems to work, start adding
things one at a time.

Sorry if you’ve already stated this, but can you post the hardware
model/manuf, as others may have had similar experiences.

-Adam

David_Kuechenmeister · December 16, 2002, 12:57pm

“Adam Mallory” <amallory@qnx.com> wrote in message
news:at82hq$c5g$1@nntp.qnx.com…

David Kuechenmeister <david.kuechenmeister@viasat.com> wrote in message
news:at5iao$c97$1@inn.qnx.com…

Yes, it is set to IRQ7, but there is nothing attached. I need the parallel
port to write my timing pulses. This is only a PC, by the way. There’s
nothing custom about it. It just happens to be on a PC-104 board.

If it’s on a PC-104 board, are you using a serial terminal to manipulate
the board or do you have VGA capability on that stack?

We have a VGA adapter card for the monitor.

BTW, I’ve noticed it seems you’re running the full blown development
environment on your board - why?

Lack of experience with QNX, mainly. This particular platform was also a
development board early in the project. Since it worked okay for a target,
we just never pared down the image.

I reduced the priority of the handler thread to 15, as was suggested.
There are many more missed calls to the handler, now. The missed calls start
right away, as well, rather than taking several hours to deteriorate to this
condition. The latency of the ISR is steady at around 20 usec.

Well now that I’ve seen what is running on your board (i.e. everything and
the kitchen sink) - I would expect the situation to be exacerbated, since
scheduling latency is definitely going to affect a prio 15 thread.

But why would the system run normally for half a day when the timing thread
is at 63, then start showing the same scheduling latency problem? The timing
thread, at 63, should pre-empt everything but the kernel, right?

You should eliminate everything non-essential to running your system (i.e.
OS+drivers+your software). Things such as the package file system etc
should not be on your target. Then if all seems to work, start adding
things one at a time.

I’ll do that this week. I need to make a Disk-On-Chip image for this system,
anyway. Is creating the image as trial and error as it appears? Start with a
few things, if it works, you might have too many, if it doesn’t work, you
definitely have too few? I have some other work that is easier to mark off
on a schedule to get done, too, so this might be a couple more days coming.

Sorry if you’ve already stated this, but can you post the hardware
model/manuf, as others may have had similar experiences.

The board is a WinSystems PPM-TX Pentium MMX 166 MHz.

David_Kuechenmeister · December 16, 2002, 1:01pm

“Rennie Allen” <rallen@csical.com> wrote in message
news:3DF612BE.6040505@csical.com…

David Kuechenmeister wrote:

Wouldn’t the more rapid onset of this condition, at a lower handler thread
priority, be more of an indication that something isn’t getting scheduled?

Not necessarily; it could simply be because your other threads are
now pre-empting your HRT thread (totally normal, expected behavior).

There’s a good example of putting my foot in my mouth. Duh!

I’ll call WinSystems again, to make sure that there isn’t another way to
enable SMM. Why do you think the onset of the SMM-like symptoms is so
delayed when the HRT thread is at the highest priority?

Rennie_Allen2 · December 16, 2002, 2:09pm

David Kuechenmeister wrote:

“Rennie Allen” <rallen@csical.com> wrote in message
news:3DF612BE.6040505@csical.com…

I’ll call WinSystems again, to make sure that there isn’t another way to
enable SMM. Why do you think the onset of the SMM-like symptoms is so
delayed when the HRT thread is at the highest priority?

I don’t know, but a lot of SMM stuff is timer related (power management).
Also, not all SMM functions cause that much latency; some may be happening
without being long enough to affect your app, but then one of the
high latency functions kicks in…

I’m not convinced it is SMM, but this is one horse you have to kick a
couple of times to make sure it’s dead.

Adam_Mallory1 · December 16, 2002, 5:13pm

Lack of experience with QNX, mainly. This particular platform was also a
development board early in the project. Since it worked okay for a target,
we just never pared down the image.

ok.

But why would the system run normally for half a day when the timing thread
is at 63, then start showing the same scheduling latency problem? The timing
thread, at 63, should pre-empt everything but the kernel, right?

That’s the problem we’re trying to solve - I’m not suggesting that it should
not work, but that there could be a side effect from a component that isn’t
really a requirement for your target.

It will if you’re running roundrobin, you could be pre-empting a procmgr
thread, but interrupt handling should still continue.

You should eliminate everything non-essential to running your system (i.e.
OS+drivers+your software). Things such as the package file system etc
should not be on your target. Then if all seems to work, start adding
things one at a time.

I’ll do that this week. I need to make a Disk-On-Chip image for this system,
anyway. Is creating the image as trial and error as it appears? Start with a
few things, if it works, you might have too many, if it doesn’t work, you
definitely have too few? I have some other work that is easier to mark off
on a schedule to get done, too, so this might be a couple more days coming.

I’m more interested in paring down the image, and if the issue fails to
show up, then it’s less likely an issue with the OS, and more of a
side effect from one of the components you’ve included. After that, it’s a
process of elimination to figure out which one might be the guilty party -
so not quite trial/error.

The board is a WinSystems PPM-TX Pentium MMX 166 MHz.

Have you explored the SMM issue any further (i.e. a trace on the board to
determine if something might be triggering it)?

-Adam

David_Kuechenmeister · December 16, 2002, 9:50pm

“Adam Mallory” <amallory@qnx.com> wrote in message
news:atl1d0$ngd$1@nntp.qnx.com…

It will if you’re running roundrobin, you could be pre-empting a procmgr
thread, but interrupt handling should still continue.

We’re using FIFO scheduling.

I’m more interested in paring down the image, and if the issue fails to
show up, then it’s less likely an issue with the OS, and more of a
side effect from one of the components you’ve included. After that, it’s a
process of elimination to figure out which one might be the guilty party -
so not quite trial/error.

Here is the pidin output from an image that contains enough to run our
system. If you have some suggestions on what might still be deleted, we’ll
give them a try, but this seems to be the minimum. This is running on a
M-systems DiskOnChip, thus the devb-doc process. I also had a telnet session
active into the board, so I could gather the pidin information.

The early results are that interrupt latency is still from 20 to 30 usec and
scheduling latency is nominally at 50 usec. There are still frequent
scheduling latency measurements of greater than 200 usec, approximately once
per second. There are very rare measurements exceeding 500 usec,
approximately one every four or five minutes. I don’t see anything greater
than 1 msec, yet. I’ll let you know in the morning about how that has
developed.
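
The thresholds mentioned above (200 usec, 500 usec, 1 msec) can be turned into a small tally so that overnight runs are easier to compare. This is only a sketch in C, assuming the scheduling latency samples are already available as an array of microsecond values. The type and function names are made up for illustration and are not part of any QNX API.

```c
#include <assert.h>
#include <stddef.h>

/* Tally of scheduling-latency samples, split at the thresholds
 * mentioned in the post: 200 usec, 500 usec and 1 msec. */
typedef struct {
    size_t nominal;   /* <= 200 usec */
    size_t gt_200;    /* > 200 usec and <= 500 usec */
    size_t gt_500;    /* > 500 usec and <= 1000 usec */
    size_t gt_1000;   /* > 1000 usec */
    unsigned max_us;  /* largest sample seen */
} latency_stats;

/* Walk the samples once and fill in the tally. */
latency_stats summarize_latency(const unsigned *samples_us, size_t n)
{
    latency_stats s = {0, 0, 0, 0, 0u};

    assert(samples_us != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        unsigned v = samples_us[i];

        if (v > 1000u)
            s.gt_1000++;
        else if (v > 500u)
            s.gt_500++;
        else if (v > 200u)
            s.gt_200++;
        else
            s.nominal++;

        if (v > s.max_us)
            s.max_us = v;
    }
    return s;
}
```

Logging one such summary per minute would show whether the long-latency buckets fill gradually or start filling only after some hours.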

I still plan to investigate the SMM a little further, but the real estate on
a PC-104 is so compact, that I don’t think I’ll be able to cut traces. I
think the SMI pin is one of the interior pins on the BGA, so it’s hard to
find, too. Maybe they have a test point or via for it.

pidin

pid tid name prio STATE Blocked
1 1 procnto 0f READY
1 2 procnto 10r RUNNING
1 3 procnto 63r RECEIVE 1
1 4 procnto 63r RECEIVE 1
1 5 procnto 10r REPLY 1
1 6 procnto 48r RECEIVE 1
1 7 procnto 48r RECEIVE 1
1 8 procnto 15r RECEIVE 1
4098 1 proc/boot/pci-bios 10r RECEIVE 1
4099 1 proc/boot/io-net 10r SIGWAITINFO
4099 2 proc/boot/io-net 18r RECEIVE 1
4099 3 proc/boot/io-net 10r RECEIVE 1
4099 4 proc/boot/io-net 10r RECEIVE 1
4099 5 proc/boot/io-net 21r RECEIVE 5
4099 6 proc/boot/io-net 10r RECEIVE 9
8196 1 proc/boot/devc-con 10r RECEIVE 1
16389 1 devb-doc 10r SIGWAITINFO
16389 2 devb-doc 21r RECEIVE 1
16389 3 devb-doc 15r RECEIVE 7
16389 4 devb-doc 10r CONDVAR b0378fec
16389 5 devb-doc 33r RECEIVE 4
16389 6 devb-doc 10r RECEIVE 4
16389 7 devb-doc 63r RECEIVE 4
16390 1 sbin/pipe 63r RECEIVE 1
16390 2 sbin/pipe 10r RECEIVE 1
16390 3 sbin/pipe 10r RECEIVE 1
16390 4 sbin/pipe 10r RECEIVE 1
16391 1 sbin/mqueue 10r RECEIVE 1
16392 1 /boot/devc-ser8250 10r RECEIVE 1
16393 1 sbin/devc-pty 10r RECEIVE 1
16394 1 proc/boot/inetd 10r SIGWAITINFO
20491 1 bin/sh 10r REPLY 8196
557068 1 proc/boot/telnetd 10r SIGWAITINFO
16397 1 sbin/routed 10r SIGWAITINFO
45070 1 proc/boot/fs-nfs2 10r RECEIVE 1
45070 2 proc/boot/fs-nfs2 10r RECEIVE 1
45070 3 proc/boot/fs-nfs2 10r RECEIVE 1
45070 4 proc/boot/fs-nfs2 10r RECEIVE 1
45070 5 proc/boot/fs-nfs2 33r RECEIVE 1
65551 1 rvo/bin/inimgr.exe 33r RECEIVE 1
110608 1 vo/bin/axisMgr.exe 10r CONDVAR b034b2e0
110608 2 vo/bin/axisMgr.exe 43r RECEIVE 1
110608 3 vo/bin/axisMgr.exe 43r RECEIVE 3
110608 4 vo/bin/axisMgr.exe 10r RECEIVE 5
110608 5 vo/bin/axisMgr.exe 10r DEAD
155665 1 /bin/acquiProc.exe 38r RECEIVE 1
200722 1 /bin/servoTest.exe 38r RECEIVE 3
200722 2 /bin/servoTest.exe 33r RECEIVE 1
245779 1 bin/axisIOProc.exe 48r RECEIVE 1
290836 1 bin/proc_timer.exe 10r RECEIVE 2
290836 2 bin/proc_timer.exe 63r INTR
335893 1 /UserIOProc.01.exe 33r RECEIVE 1
380950 1 /bin/pedIOProc.exe 33r RECEIVE 1
426007 1 o/bin/walkboxd.exe 10r RECEIVE 2
426007 2 o/bin/walkboxd.exe 10r SEM 8058ba0
426007 3 o/bin/walkboxd.exe 10r SEM 8058b98
471064 1 n/cmdProcessor.exe 10r CONDVAR b034b2e0
471064 2 n/cmdProcessor.exe 33r RECEIVE 3
471064 3 n/cmdProcessor.exe 10r SEM 8063b6c
471064 4 n/cmdProcessor.exe 43r RECEIVE 6
520217 1 bin/msgHandler.exe 33r CONDVAR b034b2e0
520217 2 bin/msgHandler.exe 33r JOIN 6
520217 3 bin/msgHandler.exe 33r SEM 80a512c
520217 4 bin/msgHandler.exe 33r JOIN 8
520217 5 bin/msgHandler.exe 33r RECEIVE 4
520217 6 bin/msgHandler.exe 33r REPLY 4099
520217 7 bin/msgHandler.exe 33r NANOSLEEP
520217 8 bin/msgHandler.exe 33r RECEIVE 7
557082 1 bin/sh 10r SIGSUSPEND
630811 1 bin/pidin 10r REPLY 1

pidin arg
pid Arguments
1 procnto
4098 pci-bios
4099 io-net -dspeedo -ptcpip
8196 devc-con -n2
16389 devb-doc blk automount=hd0t77:/
16390 pipe
16391 mqueue
16392 devc-ser8250
16393 devc-pty
16394 inetd
20491 sh
557068 telnetd
16397 routed
45070 fs-nfs2 192.168.2.1:/c/model3880/configurations/servoconfig /home/servo/config
65551 inimgr.exe /home/servo/config/config3880.ini
110608 axisMgr.exe
155665 acquiProc.exe
200722 servoTest.exe
245779 axisIOProc.exe
290836 proc_timer.exe
335893 UserIOProc.01.exe
380950 pedIOProc.exe
426007 walkboxd.exe
471064 cmdProcessor.exe
520217 msgHandler.exe
557082 -sh
639003 pidin arg

David_Kuechenmeister · December 17, 2002, 1:35pm

After an overnight run, I have the same kind of behavior that prompted this
thread. The scheduling latency can be as high as 6 msec, missing two
interrupts. I know that the interrupts aren’t acked, because I have the FPGA
generated interrupt timeout after 250 usec, whereas an acked interrupt is
reset after 45 usec.
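
The relationship between scheduling latency and missed interrupts can be written down as a simple model. This is a rough sketch in C under stated assumptions: interrupts arrive at a fixed period, and the handler thread services one interrupt per wakeup. The thread never states the interrupt period, so period_us is a hypothetical parameter, and the function name is made up for illustration.

```c
#include <assert.h>

/* Number of further periodic interrupts that fire while the handler
 * thread is still waiting to be scheduled for the first one.
 * Simplified model: interrupts arrive exactly every period_us and the
 * handler services only one interrupt per wakeup. */
unsigned interrupts_spanned(unsigned latency_us, unsigned period_us)
{
    assert(period_us > 0u);
    return latency_us / period_us;
}
```

With a placeholder period of 2500 usec, a 6000 usec latency spans two further interrupts, while the nominal 50 usec latency spans none. The real period would have to be substituted to draw any conclusion about this system.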

One interesting thing is how some of the process priorities have changed. I
know there will be changes in the priorities when messages are handled,
since the receiver floats to the sender’s priority, but these seem to be
more permanent. Following is a diff of the file I posted yesterday and one I
collected this morning. If you would like the entire file, let me know. I
don’t know what the significance of all the procnto or devb-doc threads is.
I would have figured one of each, but there are certainly more.

I’ve got another request in to WinSystems for information on their
implementation of SMM. It would be nice to watch the SMI pin. I don’t think
cutting it on a PGA is an option, tempting as that sounds.

diff -b pidin_1216.txt pidin_1217.txt

4c4
< 1 2 procnto 10r RUNNING
---
> 1 2 procnto 48r RECEIVE 1
8c8
< 1 6 procnto 48r RECEIVE 1
---
> 1 6 procnto 10r RUNNING
18c18
< 8196 1 proc/boot/devc-con 10r RECEIVE 1
---
> 8196 1 proc/boot/devc-con 33r RECEIVE 1
23,26c23,26
< 16389 5 devb-doc 33r RECEIVE 4
< 16389 6 devb-doc 10r RECEIVE 4
< 16389 7 devb-doc 63r RECEIVE 4
< 16390 1 sbin/pipe 63r RECEIVE 1
---
> 16389 5 devb-doc 63r RECEIVE 4
> 16389 6 devb-doc 63r RECEIVE 4
> 16389 7 devb-doc 10r RECEIVE 4
> 16390 1 sbin/pipe 10r RECEIVE 1
35c35
< 557068 1 proc/boot/telnetd 10r SIGWAITINFO
---
> 655372 1 proc/boot/telnetd 10r SIGWAITINFO
46c46
< 110608 4 vo/bin/axisMgr.exe 10r RECEIVE 5
---
> 110608 4 vo/bin/axisMgr.exe 33r RECEIVE 5
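
Comparing priorities across pidin snapshots is easier when the prio field is split into its number and its trailing letter. This is only a sketch in C, assuming the field always has the form seen in these listings (digits followed by one letter, such as 63r, 0f or 10o). To my understanding the letter is the scheduling policy, with f, r and o standing for FIFO, round-robin and other, but treat that reading as an assumption. The type and function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int  priority;  /* numeric priority, e.g. 63 */
    char policy;    /* trailing letter, e.g. 'r', 'f' or 'o' */
} pidin_prio;

/* Parse a pidin priority field such as "63r" or "0f".
 * Returns true and fills *out on success; returns false if the field
 * is not one or more digits followed by exactly one letter. */
bool parse_pidin_prio(const char *field, pidin_prio *out)
{
    const char *p = field;
    int value = 0;

    assert(field != NULL && out != NULL);
    if (!isdigit((unsigned char)*p))
        return false;
    while (isdigit((unsigned char)*p)) {
        value = value * 10 + (*p - '0');
        if (value > 255)   /* arbitrary sanity cap, also avoids overflow */
            return false;
        p++;
    }
    if (!isalpha((unsigned char)*p) || p[1] != '\0')
        return false;

    out->priority = value;
    out->policy = *p;
    return true;
}
```

Parsing the fourth column of each snapshot line this way would let a script flag every thread whose numeric priority differs between two runs, instead of relying on a line-by-line diff.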

David_Kuechenmeister · December 24, 2002, 1:21pm

It’s about time to bring this thread back up to the root level of Outlook
Express. Replies were getting buried about 10 levels deep.

I’ve noticed that there is still some SMI activity, despite carrying out the
instructions from WinSystems regarding Power Management and ACPI. I haven’t
been able to conclusively prove that it is always present when the
scheduling latency takes a big hit, though. Probably some more work on the
logic analyzer trigger sequence will allow me to prove that.

Unfortunately WinSystems is closed for the holidays and I can’t get any
advice about lifting pins or permanently pulling a voltage up or down on
that particular pin. I’ll post more after I find some more information.

Is there an Intel-compatible processor that doesn’t use SMM? AMD, maybe?

Thanks,
Dave Kuechenmeister

