Memory mapping behaviour in QNX

I’m working on porting our text-to-speech code to QNX (6.6). One challenge is to keep RAM consumption as low as possible, as our high-quality voices need access to around 200 Mbyte of data (stored in a single file).

We hoped that we could memory map this 200 Mbyte of data, but it looks like mmap() on QNX behaves differently than on other OSes. What we observe is that when a (4Kbyte ?) page is memory mapped, it never gets released before the file is unmapped. Since access to this 200 Mbyte data block is fairly random (it depends on the input text), we end up with the complete 200 Mbyte of data resident in RAM within a few seconds of generated speech.

Could this behaviour be confirmed, and if so, do you know if there are other ways (maybe unique to QNX) by which old pages are released back to the OS (we have to play nicely with other processes that are running like media player, navigation, GUI, etc.)?

A second observation is that memory mapping a big file takes several seconds. This is unexpected as well, since I thought one of the advantages/purposes of memory mapping is to skip the long load time of reading data into RAM. We use this line of code to mmap() the data file:

pFileData = mmap(0, cFileData, PROT_READ, MAP_SHARED, hFile, 0);

I think PROT_READ and MAP_SHARED are not so special as to cause long mapping times (on Linux and iOS, mmap() returns immediately).
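
For completeness, here is a minimal sketch of the surrounding code, assuming hFile comes from open() and cFileData from fstat(); the function name and path handling are invented for illustration:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sketch only: open and map the whole voice-data file read-only.
 * The names mirror the snippet above; the path is illustrative. */
static void *map_voice_file(const char *path, size_t *cFileData)
{
    int hFile = open(path, O_RDONLY);
    if (hFile == -1) { perror("open"); return NULL; }

    struct stat st;
    if (fstat(hFile, &st) == -1) { perror("fstat"); close(hFile); return NULL; }
    *cFileData = (size_t)st.st_size;                /* ~200 Mbyte in our case */

    void *pFileData = mmap(0, *cFileData, PROT_READ, MAP_SHARED, hFile, 0);
    if (pFileData == MAP_FAILED) { perror("mmap"); pFileData = NULL; }

    close(hFile);   /* the mapping remains valid after closing the descriptor */
    return pFileData;
}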

Any insights on this memory mapping behaviour on QNX would be greatly appreciated.

I don’t know whether you are correct about the behavior you describe, but I suspect so. You might want to check whether the POSIX standard prescribes anything about this. QNX is supposed to be POSIX compliant, although there are some gotchas with POSIX: some parts of POSIX are optional, so you can still call your OS POSIX compliant without them.

On the other hand, writing some code to do what you want in QNX, let’s say an interface that looks like this:

void *ptr = get_address_of_voice_in_memory(void *at_offset);

using a resource manager that implements the behavior you are describing, is maybe a step away from trivial.

A nicer version might be:

void *ptr = get_address_of_voice_in_memory(enum my_prompt prompt);

And you could add features like how long to keep the prompt in memory after you are done with it.
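
Something like the following header sketch; the enum values, function names and the keep-alive parameter are all invented here just to illustrate the shape of such an interface:

/* Hypothetical interface sketch - none of these names exist in QNX or in
 * the TTS code; they only show the idea of hiding the storage behind an
 * accessor that is allowed to discard data again later. */
enum my_prompt { PROMPT_GREETING, PROMPT_TURN_LEFT, PROMPT_TURN_RIGHT };

/* Return a pointer to the prompt data, loading it on demand. */
void *get_address_of_voice_in_memory(enum my_prompt prompt);

/* Tell the backing store the prompt is no longer needed; keep_cached_ms
 * says how long it may stay resident before being dropped. */
void release_voice(enum my_prompt prompt, unsigned keep_cached_ms);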

What we observe is that when a (4Kbyte ?) page is memory mapped, it never gets released before the file is unmapped.
This is consistent with the way QNX loads binaries and libraries. For example, every binary has a 4 GB virtual memory map, but on load only a fraction of that map is populated - for example, the heap and stack are NOT assigned all the memory reserved in the memory map, only a minimal subset. As the stack and/or heap grows, more and more 4K pages are assigned to the virtual memory map. These pages are never freed once assigned - until the process ends.

that memory mapping a big file takes several seconds.
Memory mapping a file follows the method above, i.e. portions of the memory map are only loaded “on demand”. A read “scan” of the entire file would map everything.

never gets released before the file is unmapped
You asked for the file to be mapped - so there is no reason it should be unmapped until you request it. It sounds like what you want is a disk cache - not mmap(). If you want least recently used RAM to be freed, don’t map it - just depend on the block cache.
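
In other words, something along these lines - a sketch that relies on pread() and lets the filesystem cache decide which blocks stay in RAM (names invented):

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Sketch: read len bytes of voice data at offset into buf, with no mapping
 * involved - the OS block cache keeps recently used blocks in RAM and is
 * free to evict them under memory pressure. */
static int read_voice_data(int fd, off_t offset, void *buf, size_t len)
{
    ssize_t n = pread(fd, buf, len, offset);    /* seek + read in one call */
    if (n < 0 || (size_t)n != len) {
        perror("pread");
        return -1;
    }
    return 0;
}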

play nicely with other processes that are running like media player, navigation, GUI, etc.
With a system that complex, I am surprised you are concerned about 200 MB - it is a small fraction of the RAM a system like yours would possess.

Hi,

Thanks for the feedback.

POSIX: I read the POSIX documentation and it mentions that the implementation is left to the implementer. So, it’s not unexpected then that we see different behaviour.

Disk cache: We would indeed like to use memory mapping as a form of disk caching (which seems to work for us on Linux and iOS). Converting our code to fall back on an fseek/fread combo is not trivial: code that assumes it can read from memory through pointers has to be changed everywhere to call functions that read the data from disk instead. I’ve already converted portions of the code that way, but that results in hundreds of thousands of file I/O accesses per second, which would require writing a caching mechanism to reduce disk access to mere thousands per second. Even if we succeeded in reducing our memory requirements that way, the integrator will complain about the very high number of file I/O accesses. We hoped that by using memory mapping the OS could handle this for us in a much more efficient/faster way and at the same time double as a disk cache.
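
For context, the caching mechanism mentioned above would be something along these lines; the block size, cache size and direct-mapped layout below are arbitrary choices for the sketch, not what we actually implemented:

#include <sys/types.h>
#include <unistd.h>

#define BLK_SIZE 4096                 /* cache the file in 4 KB blocks      */
#define N_SLOTS  2048                 /* 8 MB of cache - arbitrary figure   */

struct slot { off_t blk; int valid; char data[BLK_SIZE]; };
static struct slot cache[N_SLOTS];

/* Direct-mapped cache in front of pread(): hot accesses are served from
 * RAM, only misses go to disk. Error handling omitted; the returned
 * pointer is only valid until its slot is reused. */
static const char *cached_byte(int fd, off_t offset)
{
    off_t        blk = offset / BLK_SIZE;
    struct slot *s   = &cache[blk % N_SLOTS];

    if (!s->valid || s->blk != blk) {
        pread(fd, s->data, BLK_SIZE, blk * BLK_SIZE);
        s->blk   = blk;
        s->valid = 1;
    }
    return s->data + (offset % BLK_SIZE);
}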

200MB: Unfortunately, it’s a big concern. We’re only allowed by the company that integrates all components to consume 50 MB (we might stretch it to 60 MB).

Koen

POSIX: …it’s not unexpected then that we see different behaviour.
Right. Linux and iOS (BSD) are much more optimized for database access. Remember, those kernels are tens of times larger than the microkernel - lots more special-purpose code.

the integrator will complain about the very high number of file I/O accesses
This is likely true and possibly a worse criticism.

One suggestion - simply periodically close and re-establish the memory mapping. This will release all the allocated blocks from the virtual memory map and force it to “start over”. Of course, this is a trade-off between memory and performance. One word of caution - memory release/allocation can disrupt real-time scheduling as much time is spent in the kernel. (Safety-critical systems strive to avoid ALL dynamic memory allocation.)
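
In code, the suggestion amounts to something like this (reusing the names from the original snippet; a sketch, not tested on QNX):

#include <sys/mman.h>

/* Sketch: drop every page accumulated so far and start over with a fresh
 * mapping. Any pointers into the old mapping become invalid here. */
static void *remap_voice_file(void *pFileData, size_t cFileData, int hFile)
{
    munmap(pFileData, cFileData);       /* releases the accumulated pages */
    return mmap(0, cFileData, PROT_READ, MAP_SHARED, hFile, 0);
}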

Thanks for the suggestion. I’ve been thinking along the same lines, but I face practical problems (as always), I think. If I unmap and remap the data file, there is no guarantee that the data will be mapped to the same address (unless MAP_FIXED were used?). If there is no guarantee that the data is mapped to the same address again, then I would need some way to synchronise all the pointers in our code so that they point into the new address range.
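
To illustrate what that would mean in practice (an invented sketch, not our actual data structures): every absolute pointer into the mapping would have to be rebased after a remap, unless the code were reworked to store offsets from the mapping base instead:

#include <stddef.h>

/* Invented sketch of the pointer problem - not the real TTS structures. */
struct voice_unit {
    const short *samples;      /* absolute pointer into the mapping:      */
    size_t       n_samples;    /* becomes dangling as soon as we remap    */
};

struct voice_unit_rel {
    size_t offset;             /* offset from the mapping base instead:   */
    size_t n_samples;          /* survives a remap, but every access now  */
};                             /* has to go through base + offset         */

static inline const short *unit_samples(const void *base,
                                        const struct voice_unit_rel *u)
{
    return (const short *)((const char *)base + u->offset);
}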

In the meantime, I found some low-hanging fruit in our code where I could keep ~70 Mbyte of data on disk and access it via an fseek/fread combo. This gives me a rough idea of how often we access that portion of the data. So far I count ~6,000,000 data accesses translated to fseek/fread for 13 seconds of speech. Even if I implement an efficient cache, this will still translate to thousands of file I/O operations per second.

there is no guarantee that the data will be mapped to the same address
See the mmap() docs. You should be able to get it remapped at the original location.

void * mmap( void *where_i_want_it, … )

“The argument where_i_want_it is used as a hint to the system to where you want the object placed. If possible, the object will be placed at the address requested.”

The first call, where_i_want_it would be zero. On subsequent calls, where_i_want_it would be set to the original mapped value.
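
In code, the idea is roughly this (a sketch; since the address is only a hint without MAP_FIXED, the result still has to be checked):

#include <sys/mman.h>

/* Sketch: re-establish the mapping at the previous base address. */
static void *remap_at_old_base(void *old_base, size_t cFileData, int hFile)
{
    munmap(old_base, cFileData);
    void *p = mmap(old_base, cFileData, PROT_READ, MAP_SHARED, hFile, 0);
    if (p != old_base) {
        /* hint was not honoured (or mmap failed): pointers into the old
         * mapping are no longer valid and would need to be rebased */
    }
    return p;
}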

If possible, the object will be placed at the address requested
The “if possible” makes this not workable, I’m afraid. I would be surprised if we could always get the same address once the system has run for a few hours (e.g. a long car trip) with plenty of activity from other processes.

Guess we’ll have to choose between two evils: high memory usage or a very high amount of file I/O.

If you unmap and re-map in succession, you should have no issue getting the same address.