I’m sorry for the long delay in replying.
I tried disconnecting the Ethernet cable when ntpd was hogging the CPU, and it had no effect. Before that, I checked the Ethernet traffic using Wireshark, and it showed no NTP traffic during that time.
In all of the following tests, I was running ntpd at priority 9, so that I could usually use the console to check the system. There have been times when the console became unresponsive even when ntpd was at priority 9, and at those times I found the system would not respond to a telnet connection request, either. I don’t recall if I mentioned in my earlier posts that I am using “tinker panic 0” in the ntp.conf file, so that ntpd does not exit even if the error in synchronization is large. I am not running the debugger; I have never used it.
I tried various combinations of options in the ntp.conf file, such as changing iburst, burst, maxpoll and minpoll. None were effective.
I tried changing the QNX tick size. It had been set at 50 us (the clock interrupt is used for some I/O). I found that changing the tick size on the fly could trigger or cure the problem, but this apparently depended on some internal state of the system. For instance, on an overnight run with a tick size of 250 us, I found one of two boards had ntpd using 95% of the CPU. In setting up that run, I started ntpd after setting the tick size.
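For a sense of scale, the clock interrupt rate implied by each tick size mentioned in this thread can be worked out directly. This is only arithmetic on the tick period; the function name is made up for illustration, and nothing here measures actual CPU load on the board.

```python
def interrupts_per_second(tick_us):
    """Clock interrupts per second for a tick period given in microseconds."""
    return 1_000_000 / tick_us


# Tick sizes discussed in this thread: 50 us, 250 us and 1 ms.
for tick_us in (50, 250, 1000):
    rate = interrupts_per_second(tick_us)
    print(f"{tick_us:>5} us tick: {rate:,.0f} interrupts per second")
```

So going from a 50 us tick to a 1 ms tick cuts the clock interrupt rate by a factor of 20.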
Last night I tried setting up the network to use broadcast NTP packets instead of query-response. I used one MPC-5200 system as the broadcaster (it was synchronized to my desktop PC) and two others as broadcast clients. One of the clients is running a 1 ms tick, the other a 50 us tick. Over the course of this morning, the system with the 50 us tick has been experiencing transient episodes of ntpd hogging the CPU, but it recovers. That system is also losing and recovering synchronization, according to ntpq (its report shows either a space or an asterisk in front of the server). Loss of synch is correlated with ntpd hogging the CPU. The system I used as broadcaster did not show any evidence of hogging the CPU, but it is hard to be sure. I may run this again tonight with hogs running to log the usage on all three systems.
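The space-versus-asterisk check could be logged automatically rather than read by eye. The following is a minimal sketch, assuming the report is the usual "ntpq -p" peer table with the tally character in the first column of each peer line. The function names and the sample text (addresses, refid, numbers) are invented for illustration and are not output from these boards.

```python
def peer_tallies(ntpq_output):
    """Return (tally_character, remote) pairs from "ntpq -p" style text."""
    peers = []
    for line in ntpq_output.splitlines():
        fields = line.split()
        if not fields or line.startswith("="):
            continue  # blank line or the ==== separator
        if fields[:2] == ["remote", "refid"]:
            continue  # column header
        rest = line[1:].split()
        if not rest:
            continue  # nothing after the tally column
        peers.append((line[0], rest[0]))
    return peers


def is_synchronized(ntpq_output):
    """True if some peer carries the '*' tally (the selected system peer)."""
    return any(tally == "*" for tally, _ in peer_tallies(ntpq_output))


# Invented sample text, shaped like an "ntpq -p" report.
HEADER = (
    "     remote           refid      st t when poll reach   delay   offset  jitter\n"
    "==============================================================================\n"
)
IN_SYNC = HEADER + "*192.168.1.10    .LOCL.           1 b   33   64  377    0.412    0.125   0.031\n"
LOST_SYNC = HEADER + " 192.168.1.10    .LOCL.           1 b   33   64  377    0.412    0.125   0.031\n"

print(is_synchronized(IN_SYNC), is_synchronized(LOST_SYNC))
```

Run periodically alongside the CPU usage log, a check like this would put a timestamp on each loss of synchronization so it can be lined up against the hogging episodes.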
At this point I am thinking of two solutions. One is to use a longer tick (250 us to 1 ms will work for me) and set up another timer interrupt to run my I/O. I will have to devise a means of verifying that this really works, as I have had some hogging incidents with both of those tick sizes. The faint hope here is that the I/O interrupt (which is at most a few microseconds, but triggers I/O processing threads) is interfering with ntpd somehow, and that separating it from the tick interrupt will effect a cure. The other is to hack into ntpd, cutting out the stuff I don’t need (encryption, variable polling rate, selecting the best time server), and simplify the algorithm so that it cannot hog the CPU. That is an undesirable choice, but it seems like the option with the best chance of success.
-Jim