Bad scheduling latency after wireless LAN reconnection

I am developing inertial sensor software for a blimp. The software consists of about 7 simultaneously running processes that read data from sensors, communicate via the IPC mechanisms of QNX and compute stuff. As the computations are very expensive, the processor has a load of circa 90%.

To allow for monitoring from a base station, the processes continuously send some status information to a Linux-based computer via a wireless LAN router using UDP. Now this all works very nicely, even when the blimp sometimes moves too far away for the wireless to maintain a connection to the base station. The base station is blind then (of course), and the QNX processes just continue sending their data that will never arrive.

But when the blimp returns into the reception radius of the wireless LAN, something strange happens: something on the QNX machine seems to use up a lot of computation time, and my processes starve for a short time. As you might have guessed already, this failure to meet their realtime requirements must not happen.

My questions are:

Can anybody relate to that?

Does anybody have an explanation for what happens there?

As my claim that the reconnection of the wireless kills my realtime scheduling is only a conjecture: does anybody have an idea how to determine more precisely what is happening there?

Does anybody know where else I might inquire concerning this problem?

Does anybody have an idea for a fix? :)

Thanks a lot
uwe

Uwe,

First of all, how do you know your processes are starving for CPU time? Do you have a reliable way to measure that you aren’t meeting a schedule?
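To make "measuring a missed schedule" concrete, here is a minimal sketch of one way to do it: timestamp the start of every cycle with a monotonic clock and flag the cycles that started too late. It is written in Python only for brevity; on QNX the same idea would presumably use `clock_gettime(CLOCK_MONOTONIC, ...)` from C. The function names, the period and the tolerance are all made up for illustration and are not taken from Uwe's software.

```python
import time


def find_deadline_misses(timestamps, period, tolerance):
    """Return (index, lateness) for each cycle that started later than allowed.

    timestamps: monotonic start times of consecutive cycles, in seconds
    period:     intended time between cycle starts, in seconds
    tolerance:  how much lateness is still acceptable, in seconds
    """
    misses = []
    for i in range(1, len(timestamps)):
        interval = timestamps[i] - timestamps[i - 1]
        lateness = interval - period
        if lateness > tolerance:
            misses.append((i, lateness))
    return misses


def record_cycle_starts(cycles, period):
    """Run a simple periodic loop and record when each cycle actually starts."""
    stamps = []
    next_wakeup = time.monotonic()
    for _ in range(cycles):
        stamps.append(time.monotonic())
        # ... the real per-cycle work (sensor read, computation) would go here ...
        next_wakeup += period
        delay = next_wakeup - time.monotonic()
        if delay > 0:
            time.sleep(delay)
    return stamps


if __name__ == "__main__":
    # Hypothetical numbers: a 10 ms cycle with 2 ms of allowed lateness.
    stamps = record_cycle_starts(cycles=100, period=0.010)
    for index, lateness in find_deadline_misses(stamps, 0.010, 0.002):
        print(f"cycle {index} started {lateness * 1000:.2f} ms late")
```

If the recorded misses cluster around the moment the blimp comes back into range, that supports the theory; if they also show up at other times, the wireless link is probably not the only cause.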

A very easy way to test your theory that it’s the reconnection is to lower the priority of the wireless ethernet driver + io-net so that it’s lower than that of your processes.

If the problem goes away, then perhaps the driver/reconnection is the culprit.

But might it also be that, when you come back in range, the base station sends so many new commands/position updates that you simply bog down trying to service them all? Or is the data flow in one direction only (QNX/Blimp to Linux/Ground)?

Tim

From the description, I’m not convinced that there is a “reconnection” event. UDP is not very different from just sending out a raw packet. UDP is connectionless, and as far as the protocol is concerned, when the blimp is out of range, the base just thinks the blimp is not sending. Likewise the blimp believes its packets are getting through, at least as much as it ever can. So unless you implement a protocol above UDP which has some kind of ACK/NAK or timeout, there is no reconnection event.
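To illustrate the point about UDP, here is a small sketch in Python (chosen only for brevity; the BSD socket calls behave the same way from C). It sends a datagram to a local port that nobody is listening on, and the send still reports success. The helper names and the payload are invented for the example, and it uses localhost rather than a real wireless link.

```python
import socket


def send_status_unacknowledged(payload, address):
    """Send one UDP datagram and return the number of bytes handed to the stack.

    The return value only says the datagram was queued locally; it says
    nothing about whether anyone received it.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, address)


def unused_local_udp_port():
    """Pick a local UDP port that currently has no listener (demo helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as probe:
        probe.bind(("127.0.0.1", 0))
        return probe.getsockname()[1]


if __name__ == "__main__":
    port = unused_local_udp_port()
    sent = send_status_unacknowledged(b"status: ok", ("127.0.0.1", port))
    print(f"sendto() accepted {sent} bytes although nobody is listening")
```

So at the UDP level the sender cannot tell "out of range" from "in range". Anything that reacts to the link coming back would have to live below UDP (driver, link layer) or in an application protocol on top of it.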

Of course I’m thinking of wired connections, and maybe wireless has some additional traffic that I don’t know about. If that is the case and the problem, your CPU hog must be the wireless driver.

I like Tim’s theory about too many commands, but your description didn’t indicate that this would happen either.

Have you taken a Kernel Trace Log and looked at it in the System Profiler? It would show you which processes/threads actually use the CPU.

Hmm, these are all valid objections you are making here, and valuable information, too. Thanks a lot for that. I will do a kernel trace log ASAP to better determine the reason for the scheduling failure, and report back afterwards.

Thanks again
uwe