Is there a way to programmatically determine how much CPU time is being spent in procnto’s IDLE thread? We need to dynamically adjust DSP parameters based on system load, and monitoring the CPU consumption of the IDLE thread with an averaging algorithm seems like it would do the trick.
The hogs code seems to work only at the process level, not the thread level.
The alternative is to write our own Idle process to run at priority 1 but we’d rather not clutter up the system if we don’t have to.
Ken, try AP; it’s pure magic… really… Get everything working with AP (shouldn’t take more than a day). Price it out with your sales team; and then decide if it is worth it to do the work yourself.
It’s no skin off my nose if you don’t use it; just trying to make your life easier (I am at the point where I wonder how I ever lived without it).
But even with AP we would probably have to do something like this. If the user attempts to drive our meter at a frequency higher than it can handle, it needs to be throttled back by tweaking the DSP parameters (in this case the DSP would begin to skip samples and flash an icon to indicate the condition). Even with AP it could saturate the partition (and any unused cycles left over from other partitions), so we need to manage that problem.
Ken, this is what AP is perfect for. You’ll know when the system is overloaded (the kernel will trigger a pulse notifying you of this, see: SCHED_APS_ATTACH_EVENTS). You then throttle back the DSP parms.
Of course, AP doesn’t tweak the DSP parms for you, but it does everything else (e.g. throttles the offender back - so that your management code can run - then notifies you who it throttled). Your code (that receives the notification of the overload condition) then tweaks the DSP parms, to correct the misconfiguration “permanently” (or at least until the next time the user and/or software/hardware misbehaves). You can also collect .kev files, take core dumps of processes that are misbehaving, etc., etc…
It only takes a day to try AP on one of your problems; if you can’t overcome the temptation to pay the extra runtime cost after trying it out (there is no upfront cost - the source is on F27), then that means it’s worth the price.
ps: If you are referring to the old price for the TDK, that has now been changed to zero.
Mario, we’ll go into this with eyes open, thanks. I’ll email our sales rep to get a firm quote.
I’m still not convinced we need it, but my mind is open to the possibility. If nothing else it would be fun to play with since the cost of entry is cheap.
The 4.0.1 Momentics docs say this under SchedCtl() in the SCHED_APS_ATTACH_EVENTS section: “Overload notification isn’t implemented in this release”.
That might be an oversight, and assuming it is, this would give us about half of what we’re after. We want to slide the DSP parameters around so that maximum speed is always available given current loading. This means throttling down when the cpu is overloaded and also throttling up when it’s not. There are several situations when we would throttle up, some are user-initiated events and others are automatic.
AP might send a pulse when the system is overloaded, but it doesn’t look like it can be configured to send a pulse when it transitions back to a not-overloaded state. That would leave us needing to poll for the current state, which is sort of what I was proposing in the first place (by tracking the number of cycles used by the Idle thread).
Plus, we may want to set the specific thresholds where those events occur (not sure if that would be necessary or not).
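Something like this hysteresis check is what I have in mind for the polling side (just a sketch; the threshold values and names are made up, and on QNX the load figure would come from sampling idle time or AP stats):

```c
/* Hypothetical hysteresis detector: declare the system "overloaded" once
 * the smoothed CPU load rises above HIGH_WATER, and "recovered" only after
 * it falls back below LOW_WATER.  The gap between the two thresholds keeps
 * the DSP parameters from flapping when load hovers near a single cutoff. */
#define HIGH_WATER 0.90
#define LOW_WATER  0.75

typedef enum { STATE_NORMAL, STATE_OVERLOADED } load_state_t;

/* Returns the new state; *transition is set nonzero when the state changed,
 * which is the point where you would issue a pulse / tweak the DSP parms. */
static load_state_t update_state(load_state_t cur, double load, int *transition)
{
    *transition = 0;
    if (cur == STATE_NORMAL && load > HIGH_WATER) {
        *transition = 1;
        return STATE_OVERLOADED;
    }
    if (cur == STATE_OVERLOADED && load < LOW_WATER) {
        *transition = 1;
        return STATE_NORMAL;
    }
    return cur;
}
```

This gives us the throttle-down and throttle-up transitions from one sampled value, without any kernel support for a "not overloaded anymore" event.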
Ken, yes; I think for your app, settable thresholds would be nice to have. They are on the roadmap…
Now that you have provided this information, I agree that the bankruptcy event alone is not sufficient for a fully event-driven approach, and you will need to sample using the SCHED_APS_PARTITION_STATS function. I do think, though, that you may be underestimating the amount of work needed to get an idle-thread CPU-load implementation that behaves well (i.e. merely knowing how much idle time is available may not be a good indication of how much CPU your DSP thread should be able to use).
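The sampling arithmetic itself is small. Here is a sketch of just that part; on QNX the two samples would be filled in from SchedCtl(SCHED_APS_PARTITION_STATS, ...), which I have deliberately left out (the struct below is my own stand-in, not the real sched_aps type):

```c
#include <stdint.h>

/* Stand-in for the counters you would copy out of the AP partition stats:
 * cycles the partition consumed, and cycles elapsed in the averaging
 * windows, both free-running totals. */
typedef struct {
    uint64_t run_cycles;    /* cycles the partition has consumed so far */
    uint64_t window_cycles; /* total cycles elapsed so far */
} part_sample_t;

/* Percent of the CPU the partition used between two successive samples. */
static double partition_load_pct(const part_sample_t *prev,
                                 const part_sample_t *now)
{
    uint64_t used   = now->run_cycles    - prev->run_cycles;
    uint64_t window = now->window_cycles - prev->window_cycles;
    return window ? 100.0 * (double)used / (double)window : 0.0;
}
```

Your manager thread would call this once per sampling period and feed the result into whatever throttling decision it makes.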
If you place your DSP thread into a partition with (say) a 99% budget and 400ms of critical time, then have your manager sample the partition usage to perform the DSP “throttling”, you will have a very controlled and predictable result. And don’t forget that you can also put your watchdog thread in a 1% partition and be assured that when your DSP partition goes into bankruptcy you will still be able to service the watchdog and not get a false reset, yet if your software really does go down the toilet, you will get a reset. That last point is one of the most difficult behaviors to actually achieve…
We were surprised to learn about this price change last month too. Our local QNX rep stopped in to pay us a courtesy visit, and while he was here we chatted about AP and he wondered why we didn’t at least try it. I joked that the 50K cost was WAY outside our price range so there was no reason to even experiment.
He then replied that the cost had been changed to $0. Apparently the original price was set by marketing people (with no clue?) and at that price it sold about as well as you’d expect it would.
Bottom line: the new price is a 30% increase on the runtime license. So if you are buying in volume and get a reasonable price (say $100 a licence), then AP would cost you $130. Far more palatable, though once you pass about 1,600 runtime licences you are actually worse off under the new pricing model.
I understand your comment about cluttering up the system, but I think you could implement a priority 1 process that would replace IDLE’s function fairly easily. Here is how I would do it. Have two threads: one runs at priority 1 and spins in a loop, incrementing a 64-bit variable. The other thread runs at a very, very high priority, higher than any application process. It can have either a message-passing or I/O interface, and it can answer questions via a message like “how loaded am I” and, if need be, issue pulses when thresholds are passed, up or down.
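The core of that design is tiny. A sketch (portable C11 so it runs anywhere; on QNX you would additionally pin the spinner to priority 1 via pthread_setschedparam with SCHED_FIFO, which I have omitted so the sketch runs unprivileged):

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* The priority-1 "idle replacement": spins, counting iterations.  When the
 * system is busy this thread is starved, so the counter's growth rate is a
 * direct proxy for idle time. */
static atomic_uint_fast64_t spin_count;
static atomic_bool stop_flag;

static void *spinner(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop_flag))
        atomic_fetch_add(&spin_count, 1);
    return NULL;
}

/* The high-priority monitor samples this periodically; the per-interval
 * delta, divided by a calibrated "machine fully idle" rate, estimates the
 * fraction of the interval that was idle. */
static uint64_t sample_spins(void)
{
    return (uint64_t)atomic_load(&spin_count);
}
```

Calibrate the "fully idle" rate once at startup (sample the counter over an interval with nothing else running), and the monitor thread can then answer "how loaded am I" from two counter reads.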
Mitchell, that technique was my original thought but when the source to hogs was published it seemed to be a natural way to derive the needed information. The procfs_info structure has user and system time accumulators in nanoseconds so it would provide a direct measurement of time spent in Idle (except that Idle is a procnto thread and the hogs technique only provides process level info, which prompted this original discussion).
So a priority 1 process A that does nothing but loop could be written. Then a very high-priority process B could simply open A’s /proc entry and periodically query its procfs_info struct to directly read the CPU time consumed by process A (handling wraps as necessary). A moving-average algorithm could smooth things out; B would then issue pulses and make the loading info available as needed by the DSP thread.
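The wrap handling and smoothing I mentioned amount to almost nothing in code (sketch; doing the subtraction in unsigned 64-bit arithmetic absorbs a single counter wrap for free, and the alpha value is just an example):

```c
#include <stdint.h>

/* Delta between two successive readings of a free-running nanosecond
 * counter.  Unsigned subtraction is modular, so one wrap of the counter
 * between samples is handled automatically. */
static uint64_t cputime_delta(uint64_t prev, uint64_t now)
{
    return now - prev;
}

/* Exponential moving average to smooth the per-sample measurements:
 * alpha near 0 smooths heavily, alpha near 1 tracks quickly. */
static double ema_update(double avg, double sample, double alpha)
{
    return avg + alpha * (sample - avg);
}
```

B would feed each cputime_delta() result through ema_update() and compare the smoothed value against its thresholds before issuing a pulse.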
The low-priority process shouldn’t do a busy loop. The idle thread puts the processor to “sleep” with the halt instruction. If the idle thread(s) never get a chance to run because your custom “idle” process uses 100% of the CPU, the processor(s) will consume extra power and generate extra heat for no good reason.
Because I don’t like claiming that something is “easy” without some proof, I have attached a working DSPtuner program, as well as DSPSim (a simulator) that the DSP tuner “tunes”.
To use this, extract the two zip files into a Momentics workspace; then “Import” existing project into workspace (from inside the IDE).
On a VMware target with aps module installed:
aps create -b 90 DSP
on -X aps=DSP DSPSim &
DSPtuner
Of course, the DSP tuner could use a PID algorithm to tune DSPSim optimally, but that was left out for simplicity. DSPSim is just that: a simulator of your DSP thread/process, on which DSPtuner can operate.
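For reference, the controller piece that was left out could be as small as this (a PI variant rather than full PID; the gains and the knob it drives are illustrative, not tuned values):

```c
/* Minimal PI controller of the sort DSPtuner could use: it nudges a "DSP
 * rate" knob so that the measured CPU load converges on a setpoint. */
typedef struct {
    double kp, ki;   /* proportional and integral gains */
    double integral; /* accumulated error */
} pi_ctrl_t;

/* One control step: returns the adjustment to apply to the knob. */
static double pi_step(pi_ctrl_t *c, double setpoint, double measured)
{
    double err = setpoint - measured;
    c->integral += err;
    return c->kp * err + c->ki * c->integral;
}
```

Each sampling period you would call pi_step() with the target load and the measured load, and add the result to whatever DSP parameter controls the work rate. The integral term removes the steady-state error a plain proportional controller would leave.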
Mario, D’oh! thanks for the reminder, I forgot that Idle does a halt. We’re so far ahead of required battery life it may not matter from a product standpoint but there’s no point in wasting power (they’ve already discontinued the large battery option because nobody needed it).
Rennie, thanks for the code, it’s really appreciated and will help a lot. It does look pretty straightforward.
There is a concern about how much overhead AP will add, especially when it has to emulate a clock cycle counter. We are fighting for every cycle on this thing, and just by removing ClockTime() calls (also emulated on the pxa270) we gained a couple of percent in performance. We’re about 25% from our ultimate performance goals and we’re running out of things to optimize. In retrospect, some boneheaded hardware design decisions were made that really hurt. (Sure, blame it on the hardware.)
#
# $QNXLicenseA:
# Copyright 2007, QNX Software Systems. All Rights Reserved.
#
# You must obtain a written license from and pay applicable license fees to QNX
# Software Systems before you may reproduce, modify or distribute this software,
# or any work that includes all or part of this software. Free development
# licenses are available for evaluation and non-commercial purposes. For more
# information visit http://licensing.qnx.com or email licensing@qnx.com.
#
# This file may contain contributions from others. Please review this entire
# file for other proprietary rights or license notices, as well as the QNX
# Development Suite License Guide at http://licensing.qnx.com/license-guide/
# for other information.
# $
#
#include <asmoff.def>
.globl ClockCycles
.text
ClockCycles:
stmdb sp!, {r4,lr}
ldr r0, =_syspage_ptr
ldr r1, =qtimeptr
ldr ip, =callout_timer_value
ldr r0, [r0]
ldr r1, [r1]
/*
* Disable interrupts
*/
mrs r4, cpsr
orr r2, r4, #ARM_CPSR_I | ARM_CPSR_F
msr cpsr, r2
mov lr, pc
ldr pc, [ip]
ldr r2, =cycles
ldr lr, =last_cycles
ldmia r2, {r2,r3}
.ifdef VARIANT_le
adds r0, r0, r2
adc r1, r3, #0
.else
adds r1, r0, r3
adc r0, r2, #0
.endif
/*
* Adjust by timer_load if timestamp < last_cycles
*/
ldmia lr, {r2,r3}
.ifdef VARIANT_le
cmp r3, r1
bhi 0f
bne 1f
cmp r2, r0
bls 1f
0: ldr ip, =qtimeptr
ldr ip, [ip]
ldr ip, [ip, #TIMER_LOAD]
adds r0, r0, ip
adc r1, r1, #0
.else
cmp r2, r0
bhi 0f
bne 1f
cmp r3, r1
bls 1f
0: ldr ip, =qtimeptr
ldr ip, [ip]
ldr ip, [ip, #TIMER_LOAD]
adds r1, r1, ip
adc r0, r0, #0
.endif
/*
* Update last_cycles
*/
1: stmia lr, {r0,r1}
/*
* Restore interrupts and return
*/
msr cpsr, r4
ldmia sp!, {r4,pc}
Yeah, that is a lot bigger, but if you only call it 100 times/sec, it isn’t outrageous.
btw: the DSPtuner code has a few bugs, but you get the idea. Of course, you’d be tuning DSP parameters, not adjusting how much spinning is done, since AP automatically controls the execution of the QNX code. Your app would essentially be tapping into the AP data stream in order to adjust the load on an external processor to match the available capacity of the QNX system (which is actually a pretty interesting use case, since AP would be throttling a device external to QNX).
I’m really bummed. Due to hardware issues this feature can’t be implemented on the current version of our product. Because of the simplistic way the DSP was interfaced to the pxa270, the only way the DSP can adjust its threshold dynamically is by downloading new code to it. There isn’t a single bit of I/O or shared resource anywhere that can be used to indicate when it should start pulse sampling. It takes about 200ms to reconfigure the DSP with new firmware, and we can’t afford that sort of interruption, so without another board spin it’s a no-go.
However, we are reworking the product to increase performance. Part of that change is to dump the DSP in favor of an FPGA. The new design will provide for the feedback loop we want to implement. It’s probably six months away but the info in this thread will help a great deal.