Regarding ThreadCtl runmask

lullaby · February 26, 2013, 6:24am

Hi all,

Regarding my previous query in openqnx.com/phpbbforum/viewt … =7&t=13625
My code segment looks like this:

cpu_run_mask = 0x1;
ThreadCtl(_NTO_TCTL_RUNMASK_GET_AND_SET, (void *)&cpu_run_mask);
ClockCycles();
Write to disk and perform other calculation…
ClockCycles();
ThreadCtl(_NTO_TCTL_RUNMASK_GET_AND_SET, (void *)&cpu_run_mask);

I have a question. The ThreadCtl help page says: “By default, a thread’s runmask is set to all ones, which allows it to run on any available processor. A value of 0x01 would, for example, force the thread to run only on the first processor.”
My question is: if the first processor is not available for some reason, what happens when ThreadCtl is called to lock the thread to the first processor?
Does this cause a system hang?
What happens if I replace the cpu_run_mask with the default value 0xFFFF (all ones)? Does this mean the operations inside the ThreadCtl block run on whichever processor is available at that time? Or is it equivalent to not locking the thread at all? I mean, if the runmask is set to the default all-ones value, the thread can run on any CPU, so no processor affinity is enforced.
I am not clear on the concept of the runmask. Please help. I am also trying to analyse a system hang issue.

Thanks,
Lullaby

maschoen · February 26, 2013, 8:32pm

I don’t understand why you are calling ThreadCtl() a second time. The first call limits your thread to the first processor. The second call does the same thing.

Your use of the word “lock” already suggests a confusion. Yes, you are locking the thread to cpu 1. This is not the same as a mutex.

No, it would only cause your thread to hang.

Where are you doing the replacement?

This is really confusing, are you saying:

ThreadCtl( (Not real code) 0xffff)
Thread-block
ThreadCtl( 0xffff)

???
This would do nothing.

See, here the word “locking” is getting you into trouble.

Most programs would do one of two things with respect to cpu affinity.

Leave the default 0xffff
Set it at startup once

While there might be a hardware reason, in general, jacking around the affinity makes no sense.

This is making something very simple, very confusing. You set the affinity mask to limit the thread to specific processors. That is all it does.
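To make the bit meaning concrete, here is a minimal sketch in plain C (nothing QNX-specific is called). The helper names runmask_for_cpu, runmask_allows and runmask_print are made up for illustration. The only assumption is the one from the ThreadCtl help text quoted earlier in the thread: bit 0 of the mask stands for the first processor, and a set bit means the thread may run there.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Runmask with only the bit for one CPU set (CPU 0 is the "first processor"). */
unsigned runmask_for_cpu(unsigned cpu)
{
    assert(cpu < sizeof(unsigned) * CHAR_BIT);  /* one bit per CPU in this sketch */
    return 1u << cpu;
}

/* Returns 1 if a thread with this runmask may run on the given CPU, else 0. */
int runmask_allows(unsigned runmask, unsigned cpu)
{
    assert(cpu < sizeof(unsigned) * CHAR_BIT);
    return (int)((runmask >> cpu) & 1u);
}

/* Print which of the first ncpus processors the runmask permits. */
void runmask_print(unsigned runmask, unsigned ncpus)
{
    for (unsigned cpu = 0; cpu < ncpus; cpu++)
        printf("cpu %u: %s\n", cpu,
               runmask_allows(runmask, cpu) ? "allowed" : "excluded");
}
```

Calling runmask_print(0x1, 4) on a quad-core box would list only cpu 0 as allowed, while runmask_print(0xFFFF, 4) lists all four, which is why setting an all-ones mask restricts nothing.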

Well now something that makes sense. Read my comments above and the documentation on ThreadCtl().
It doesn’t sound like cpu affinity is something you want or need to do. In fact, it should be needed rarely.
I’d be curious to know why you think you need it.

lullaby · February 27, 2013, 4:53am

Hi,
Thank you for the detailed reply. Please see the answers to your questions below:-

[Lullaby] >> I took the term “locking” from the QNX help page when I read the following. Maybe my interpretation is wrong; please correct me.

I need to calculate the execution time of the write system call. For that I call ClockCycles() before and after the write() call. Since my multithreaded application runs on a multi-core machine, I implemented this thread locking mechanism. When I read about ThreadCtl, I interpreted it as something like a mutex lock-unlock. Also, please read the following extract from the ThreadCtl help page:

So my code looks somewhat like this:
cpu_run_mask = 0x1;
while(1){ …
ThreadCtl(_NTO_TCTL_RUNMASK_GET_AND_SET, (void *)&cpu_run_mask);
ClockCycles();
Write to disk and perform other calculation…
ClockCycles();
ThreadCtl(_NTO_TCTL_RUNMASK_GET_AND_SET, (void *)&cpu_run_mask);
…
}

I mean that with the first call I am locking the thread to the first processor, and with the second I am unlocking it.

[Lullaby] >> In our case, when my application runs overnight on a quad-core machine, the whole system hangs. Even the system time is not updated. I have yet to find the cause. While analysing this issue, I came to suspect the ThreadCtl() function.

I mean, I just set cpu_run_mask to 0xFF and build and run the application again.
The code looks like this:
cpu_run_mask = 0xFF;
ThreadCtl(_NTO_TCTL_RUNMASK_GET_AND_SET, (void *)&cpu_run_mask);
ClockCycles();
Write to disk and perform other calculation…
ClockCycles();
ThreadCtl(_NTO_TCTL_RUNMASK_GET_AND_SET, (void *)&cpu_run_mask);

[Lullaby] >> Yes, I mean the same. In my code, the above ThreadCtl block is in a while loop, as I need to read ClockCycles() around every write operation. Could this cause my system to hang after a long run? My interpretation was that the operations inside a ThreadCtl block run on any processor available at that time if the runmask is set to 0xFF.
If that is not so, could you please clarify a bit? Are you saying that the second call is not needed, and that if ThreadCtl is called once at the start of a thread, that thread will run on only one processor (no matter whether 0x1 or 0xFF is given as the runmask)? Is my new understanding correct? If so, can I rewrite my code as:

On thread startup,
cpu_run_mask = 0xFF;
ThreadCtl(_NTO_TCTL_RUNMASK_GET_AND_SET, (void *)&cpu_run_mask);
:
:
:
while(1)
{
ClockCycles();
Write to disk and perform other calculation…
ClockCycles();
:
:
}

and there is no need to call ThreadCtl() again. Is that right?

So are you saying that if I call
cpu_run_mask = 0xFF;
ThreadCtl(_NTO_TCTL_RUNMASK_GET_AND_SET, (void *)&cpu_run_mask);
it is the same as not locking the thread?
Or could you please tell me whether, for my specific requirement of reading the clock cycles in every loop iteration, the rewritten code above with ThreadCtl() only at startup would work?

By now, I think my requirement is pretty clear to you. Could you please tell me whether my interpretation is wrong?

Thanks,
Lullaby

maschoen · February 27, 2013, 8:49am

Well, in spite of what the documentation says, ThreadCtl() is merely locking the thread to a CPU. It’s nothing like a mutex lock.

Yes, if you want to use ClockCycles() on a multiprocessor, it would be a good thing to lock the thread to a single cpu.
You will be measuring the time between the call and return of the system write().
This measurement will be fairly pointless if you understand how the whole thing works.
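For reference, a rough sketch of the "set the runmask once at startup, then read ClockCycles() around the call" pattern. It assumes the QNX Neutrino calls ThreadCtl(_NTO_TCTL_RUNMASK, ...), ClockCycles() and SYSPAGE_ENTRY(qtime)->cycles_per_sec behave as documented. The wrapper names (pin_calling_thread, cycles_now, cycles_per_second, cycles_to_seconds) are hypothetical, and on a non-QNX host the code falls back to clock_gettime() purely so that it still compiles and runs.

```c
#ifndef __QNXNTO__
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() in the fallback path */
#endif

#include <assert.h>
#include <stdint.h>
#include <time.h>

#ifdef __QNXNTO__
#include <sys/neutrino.h>
#include <sys/syspage.h>
#endif

/* Restrict the calling thread to the CPUs named in runmask.
 * Returns -1 on failure. Outside QNX this is a stub that returns 0. */
int pin_calling_thread(unsigned runmask)
{
    assert(runmask != 0);   /* an empty mask would leave no CPU to run on */
#ifdef __QNXNTO__
    /* _NTO_TCTL_RUNMASK takes the mask value itself, cast to a pointer. */
    return ThreadCtl(_NTO_TCTL_RUNMASK, (void *)(uintptr_t)runmask);
#else
    return 0;
#endif
}

/* Current value of a free-running counter. */
uint64_t cycles_now(void)
{
#ifdef __QNXNTO__
    return ClockCycles();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);   /* fallback: nanoseconds as "cycles" */
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}

/* Counter increments per second. */
uint64_t cycles_per_second(void)
{
#ifdef __QNXNTO__
    return SYSPAGE_ENTRY(qtime)->cycles_per_sec;
#else
    return 1000000000u;
#endif
}

/* Convert a pair of counter readings into elapsed seconds. */
double cycles_to_seconds(uint64_t start, uint64_t end)
{
    return (double)(end - start) / (double)cycles_per_second();
}
```

The idea would be to call pin_calling_thread(0x1) once when the thread starts, then take cycles_now() before and after each write() inside the loop, with no further ThreadCtl() calls.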

If you open a file normally and write to it in this manner, then most likely the system will just copy the data to a cache and return. So in that case you will be measuring mostly the message passing data rate from one process (yours) to another (the file system). I say mostly because if the amount of data is small, then the overhead will dominate the measurement. Is that what you want to measure?

Of course if the cache is full, which could happen, then what happens is a little different. The file system will need to flush some “stale” data to disk, not necessarily yours, to free up space in the cache. It probably will not be flushing the same amount of data to disk as you have requested in your write. In this case, your measurement will be almost meaningless.

There are some file settings that will force the data to disk immediately, but if you measure that, you are not measuring the real throughput of the file system.
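One such setting, as an assumption on my part about what is meant here, is the POSIX O_SYNC open flag, which asks that each write() return only after the data has reached the device. A small sketch, with a hypothetical helper name open_for_timing:

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Open path for writing. If sync_each_write is nonzero, add O_SYNC so that
 * every write() waits for the data to be committed before returning.
 * Returns the file descriptor, or -1 on error. */
int open_for_timing(const char *path, int sync_each_write)
{
    assert(path != NULL);
    int flags = O_WRONLY | O_CREAT | O_TRUNC;
    if (sync_each_write)
        flags |= O_SYNC;
    return open(path, flags, 0644);
}
```

Timing write() on a descriptor opened this way measures the synchronous path only, which is the caveat raised above: it is not how the file system normally operates.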

In other words, it’s fairly hard to measure the speed of the file system, because it operates asynchronously.

This all sounds suspiciously like a long and drawn out discussion that took place a few weeks ago in which we tried to tell the poster that the only way to get a reasonable average I/O speed measurement would be to write a large amount of data. If you do that, there is no need to use ClockCycles(). The regular time functions will give you more than enough accuracy, and you can throw away the whole ThreadCtl() strategy.
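A minimal sketch of that approach, using only POSIX calls (open, write, fsync, clock_gettime). The function name measure_write_throughput and its parameters are made up for illustration, and the final fsync() is an assumption added so that cached data is counted in the elapsed time. Treat it as a starting point, not a benchmark tool.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Write total_bytes to path in chunks of chunk_bytes, flush to disk, and
 * return the average rate in bytes per second. Returns -1.0 on any error. */
double measure_write_throughput(const char *path, size_t total_bytes,
                                size_t chunk_bytes)
{
    assert(path != NULL && total_bytes > 0 && chunk_bytes > 0);

    char *buf = malloc(chunk_bytes);
    if (buf == NULL)
        return -1.0;
    memset(buf, 0xA5, chunk_bytes);          /* arbitrary fill pattern */

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        free(buf);
        return -1.0;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    size_t written = 0;
    int ok = 1;
    while (ok && written < total_bytes) {
        size_t want = total_bytes - written;
        if (want > chunk_bytes)
            want = chunk_bytes;
        ssize_t n = write(fd, buf, want);    /* short writes just loop again */
        if (n <= 0)
            ok = 0;
        else
            written += (size_t)n;
    }
    if (ok && fsync(fd) != 0)                /* include the flush in the timing */
        ok = 0;

    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);
    free(buf);
    if (!ok)
        return -1.0;

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (secs <= 0.0)
        return -1.0;                         /* clock too coarse to measure */
    return (double)written / secs;
}
```

The larger total_bytes is relative to the file system cache, the closer the result should get to the sustained rate, and no runmask or ClockCycles() handling is involved.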