On QNX, is it possible (or advisable) to use WBINVD, even from an interrupt handler? I have seen the post "How to invalidate CPU cache" on the boards (openqnx.com/PNphpBB2-viewtopic-t9557-.html); however, in my application the data isn't a single block of memory. If it is possible, I would really appreciate some direction on the code needed to get this working. If it isn't possible, what alternatives do I have for invalidating the cache on x86?
Background
I'm looking to evaluate the worst-case execution time (WCET) of the component parts of an algorithm, based on research papers from the University of York on timing schemas. The aim of this work is to obtain a pessimistic estimate of the time a piece of code takes to execute. The target architecture for my application is x86.
Currently I have a framework in place for measuring the execution time of a block of code. This code executes at maximum priority under the FIFO scheduler to minimize the variance introduced by context switches, and if required I can also disable interrupts for the duration of a test. All non-essential processes on the machine have been killed.
for (int i = 0; i < numCalls; i++) {
    SERIALIZE_CPU();               // CPUID with EAX = 0
    t1 = ClockCycles();            // RDTSC
    MyFunction();                  // Function to time
    SERIALIZE_CPU();
    times[i] = ClockCycles() - t1;
}
When performing multiple tests of the same algorithm, the first iteration is significantly slower than subsequent iterations. I assume these results indicate that the data enters the cache during the first iteration, making subsequent memory accesses significantly faster. To measure the worst-case execution time accurately, I need to ensure that the cache is in the worst possible state before commencing any measurement: that is, none of the data related to the function under test is in the cache. The approach recommended in the literature [1][2] is to flush the cache using the WBINVD instruction. I know that WBINVD is a ring-0 instruction, so it cannot be executed from a user-space inline assembler block.
As an interim measure I've written code that reads and writes a block of memory large enough (I think) to fill the cache, and unrelated to the memory used by the function under test. This appears to produce more consistent results, with significantly worse performance (in this case a good thing). However, the time this takes is significant compared to the time taken by some of my operations, and I'm not certain that everything in the cache has actually been evicted.
[1] Making Worst Case Execution Time Analysis for Hard Real-Time Tasks on State of the Art Processors Feasible (Petters and Farber)
[2] Estimation of Worst-Case Execution Time Using Statistical Analysis (Edgar)