[Re-posting this to qdn; on c.o.q I got only one reply …]
Hi QNXers,
we have a problem with a strange lockup in the following configuration:
a CompactPCI board with an AMD K6 CPU and a Tulip Ethernet chip; the
OS is QNX 4.25 and TCP runtime with the newest patches (installed from
quics /updates last week). It also has a custom-made output board
(designed specifically for this application by a third party) with
a PLX 9080 as its PCI interface. This board feeds the machine with
small chunks (256kB) of data which it reads from a large file (up to
several GB). This is the absolutely time-critical task. Our software
is designed as follows:
(1) ISR (IRQ is generated by the PLX when its output buffer is less than
    half-full):
      clear IRQ
      return irqproxy
(2) user mode driver:
      Receive(irqproxy)
      transfer 256kB of data from memory buffer to PLX via DMA
      if memory buffer less than half-full
        Trigger(bufferproxy)
(3) another task, made to de-couple (2) from harddisk delays:
      set up shared memory with (2)
      Receive(bufferproxy)
      read next large (5MB) chunk from HD into shm
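
To make the buffer accounting between (2) and (3) concrete, here is a
minimal sketch in plain C. It is not our actual code: the names
(feed_buffer, feed_buffer_consume, feed_buffer_refill) and the total
buffer size are assumptions made up for illustration, and the QNX calls
(Receive, Trigger, the DMA setup) are left out on purpose. Only the
256kB and 5MB chunk sizes come from the description above.

```c
#include <stddef.h>
#include <stdbool.h>

/* Chunk sizes from the description; BUF_CAPACITY is an assumed value. */
#define DMA_CHUNK    ((size_t)256 * 1024)        /* one DMA transfer to the PLX */
#define DISK_CHUNK   ((size_t)5 * 1024 * 1024)   /* one read from the harddisk  */
#define BUF_CAPACITY (4 * DISK_CHUNK)            /* hypothetical shm size       */

typedef struct {
    size_t fill;   /* bytes currently waiting in the shared memory buffer */
} feed_buffer;

/* Task (3): account for one disk chunk that was read into shared memory.
 * Returns false (and changes nothing) if the chunk would not fit. */
static bool feed_buffer_refill(feed_buffer *b)
{
    if (b->fill + DISK_CHUNK > BUF_CAPACITY)
        return false;
    b->fill += DISK_CHUNK;
    return true;
}

/* Driver (2): account for one DMA chunk handed to the PLX.
 * *need_refill is set when the buffer is below half-full afterwards;
 * that is the point where the real driver would Trigger(bufferproxy).
 * Returns false on underrun (less than one DMA chunk available). */
static bool feed_buffer_consume(feed_buffer *b, bool *need_refill)
{
    if (b->fill < DMA_CHUNK) {
        *need_refill = true;
        return false;
    }
    b->fill -= DMA_CHUNK;
    *need_refill = (b->fill < BUF_CAPACITY / 2);
    return true;
}
```

In this sketch an underrun is reported to the caller instead of being
silently ignored, so a test harness can tell a starved buffer apart from
a normal refill request.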
This works fine until there is high network load on this machine.
Currently we can provoke it with a flood ping (synthetic test) or by
ftp'ing a large file onto the machine (a task which the machine must
handle in reality: while one large file is sent out via the PLX, the
user downloads the next file to the harddisk!). Sooner or later (between
1s and 2 minutes; unfortunately it's only reproducible by trying long
enough…) the machine completely locks up (which means I can still type
e.g. 'sin' at the console prompt, but it never returns). So I can't
analyse the machine after the lockup, dumper doesn't write any file,
and I suppose even a dejaview via the network would hang.
We have tried nearly everything (well, at least everything I can think
of): setting the priority of (2) and (3) high (even above Proc32) or low
(below 10, hoping to get a shell back); decoupling (2) completely from
the harddisk (by cyclically reusing data from memory); putting small
loops into (1) (from several tests I had the impression that the OS
jumps into the ISR faster than the PLX can clear the IRQ on its local
side); clearing the PLX's complete Interrupt Control/Status register
(INTCSR); ignoring IRQs which come 'too soon'; etc.
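
For reference, the 'too soon' check is nothing more than a minimum-gap
filter. A rough sketch in plain C follows; the names (irq_filter,
irq_filter_accept) and the microsecond timestamps are assumptions for
illustration, not our real driver code, and where the timestamp comes
from is left open.

```c
#include <stdint.h>
#include <stdbool.h>

/* Minimum-gap filter: an IRQ is ignored if it arrives less than
 * min_gap_us after the last accepted one. All names are hypothetical. */
typedef struct {
    uint64_t min_gap_us;        /* smallest plausible gap between IRQs */
    uint64_t last_accepted_us;  /* timestamp of the last accepted IRQ  */
    bool     have_last;         /* false until the first IRQ is seen   */
} irq_filter;

static void irq_filter_init(irq_filter *f, uint64_t min_gap_us)
{
    f->min_gap_us = min_gap_us;
    f->last_accepted_us = 0;
    f->have_last = false;
}

/* Returns true if the IRQ at time now_us should be handled,
 * false if it came 'too soon' and should be ignored. */
static bool irq_filter_accept(irq_filter *f, uint64_t now_us)
{
    if (f->have_last && now_us - f->last_accepted_us < f->min_gap_us)
        return false;
    f->last_accepted_us = now_us;
    f->have_last = true;
    return true;
}
```

A rejected IRQ does not move the reference timestamp, so a burst of
early interrupts cannot push the next accepted one further and further
out.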
So finally my question is whether anybody has had the same problems with
a K6-based board / Tulip Ethernet / QNX 4 / TCP/RT / PLX9080, or has any
idea what I can test to track down the problem?
Any hints welcome & TIA,
Edelhard
s o f t w a r e m a n u f a k t u r — Software, that fits!
OO-Realtime Automation from Embedded-PCs up to distributed SMP Systems
info@software-manufaktur.de URL: http://www.software-manufaktur.de/
Fon: ++49+7073/50061-6, Fax: -5, Gaertnerstrasse 6, D-72119 Entringen