Locking a mutex fails occasionally with ENOENT (2)

queBurro · October 21, 2013, 2:58pm

Hi,
I’m using pthread_mutex_timedlock to lock a mutex and occasionally my app is failing with errno == 2.
Any ideas? I’ve got my timeout set at 10 seconds (seems too long to me) but I’m not getting an ETIMEDOUT.

Cheers

Tim · October 22, 2013, 3:30pm

We’d need to see your code to help you more.

An errno of 2 indicates ENOENT (no such file or directory), which isn’t a listed error code for this function.

Which version of QNX are you using? Are you using the SMP kernel? Are you creating/destroying the mutex you are using or only creating it one time? What about threads that are using it? Without this kind of information we are only wildly guessing.

Tim

queBurro · October 22, 2013, 4:55pm

Yup, I appreciate that. I’m really not doing anything weird, though; see the following snippet.

[code]
pthread_mutex_t mxControlConnection = PTHREAD_MUTEX_INITIALIZER; //< declared like this once on startup.

LockMutex(&mxControlConnection, func); //called like this

bool LockMutex(pthread_mutex_t *pMutex, const char *aInfo)
{
	struct timespec absoluteTimeout;
	clock_gettime(CLOCK_REALTIME, &absoluteTimeout);
	absoluteTimeout.tv_sec += 3;

	if ((errno = pthread_mutex_timedlock(pMutex, &absoluteTimeout)) != EOK)
	{
		switch (errno)
		{
			case EAGAIN:
				//write to slogger etc.
[/code]
In general, that mutex stays around until I close the program, so I don’t usually call pthread_mutex_destroy(). Is this bad practice?

Also, I’m using 6.4.1 with the instrumented kernel, and I’ve got no idea why I get errno 2 when it’s not listed as a possible return value.

Note: the host that’s playing up is showing the following, and I don’t seem to be able to kill process 663588 with kill -15 or kill -9. Any ideas?

[code]
# pidin | grep XXX
663588   1 tmp/XXX_g   35r MUTEX   (0xb0376614) 663588-01 #0
663588   2 tmp/XXX_g   20r REPLY   249877
663588   3 tmp/XXX_g   25r MUTEX   (0xb0376614) 663588-01 #0
663588   4 tmp/XXX_g   30r MUTEX   (0xb0376614) 663588-01 #0
663588   5 tmp/XXX_g   35r MUTEX   (0xb0376614) 663588-01 #0
[/code]

Cheers

I’m only creating the mutex once.

maschoen · October 22, 2013, 6:07pm

Well, ENOENT is not a documented errno for pthread_mutex_timedlock(). The first thing to suspect is that the errno is not related to the problem. If that doesn’t seem to be the case, I would do the following:

[code]
int ret;

…

ret = pthread_mutex_timedlock(&mutex, &timeout);
if (ret == ENOENT)
{
	/* Unexpected error: see whether unlocking and retrying clears it. */
	fprintf(stderr, "Debugging...\n");
	pthread_mutex_unlock(&mutex);
	ret = pthread_mutex_timedlock(&mutex, &timeout);
	if (ret == 0)
	{
		fprintf(stderr, "Unlocking fixed it\n");
	}
}
[/code]

mario · October 22, 2013, 6:33pm

Does the mutex live in shared memory? How was it created/allocated?

queBurro · October 23, 2013, 2:45pm

I haven’t seen the problem today, but I’ve gone with a strategy (as per maschoen’s comment) of destroying the mutex and recreating it if I get an ENOENT.

Thanks

Tim · October 23, 2013, 8:05pm

This line doesn’t look right:

   if ((errno = pthread_mutex_timedlock(pMutex, &absoluteTimeout)) != EOK)

You are assigning the result of pthread_mutex_timedlock() to errno and then checking that against EOK. The problem is that errno is a GLOBAL variable in your process, shared between ALL your threads. So if any other function call in any other thread sets errno between the time this code (errno = pthread_mutex_timedlock(pMutex, &absoluteTimeout)) executes and the time this code (!= EOK) executes, you are going to get an invalid result. This can often happen on a multi-core system where threads run concurrently.

You should rewrite your line to look like the one maschoen showed:

[code]
int ret = pthread_mutex_timedlock(&mutex, &timeout);
if (ret != EOK)
{
	// handle the error code in ret
}
[/code]

Tim
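A minimal sketch of queBurro’s LockMutex() rewritten along these lines, assuming a POSIX system. The result of pthread_mutex_timedlock() is kept in a local variable and errno is never assigned. Two assumptions: logging goes to stderr as a stand-in for slogger, and the comparison uses 0 rather than the QNX-specific EOK (which QNX defines as 0), so the sketch also builds off QNX. The 3-second deadline is taken from the original snippet.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Sketch: lock with a 3-second deadline, never writing to errno.
 * stderr stands in for slogger here (assumption). */
bool LockMutex(pthread_mutex_t *pMutex, const char *aInfo)
{
    struct timespec absoluteTimeout;
    int ret;

    assert(pMutex != NULL);

    /* pthread_mutex_timedlock() expects an absolute CLOCK_REALTIME
     * deadline, so build "now + 3 seconds". */
    if (clock_gettime(CLOCK_REALTIME, &absoluteTimeout) != 0) {
        /* clock_gettime() returns -1 and sets errno, so reading it is correct here. */
        fprintf(stderr, "%s: clock_gettime failed: %s\n", aInfo, strerror(errno));
        return false;
    }
    absoluteTimeout.tv_sec += 3;

    /* The pthread functions return the error number directly. */
    ret = pthread_mutex_timedlock(pMutex, &absoluteTimeout);
    if (ret != 0) {
        switch (ret) {
        case ETIMEDOUT:
            fprintf(stderr, "%s: timed out waiting for mutex\n", aInfo);
            break;
        default:
            fprintf(stderr, "%s: pthread_mutex_timedlock failed: %d (%s)\n",
                    aInfo, ret, strerror(ret));
            break;
        }
        return false;
    }
    return true;
}
```

Because the deadline is absolute and based on CLOCK_REALTIME, setting the system clock while a thread waits changes how long it actually waits.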

queBurro · October 23, 2013, 8:37pm

Aaahhhh! Good spot.
I’ll change that tomorrow. It’s old legacy code, etc., but I hadn’t even thought of that. I’ll do a sweep of the lot to see if it’s been done elsewhere too. Cheers

mario · October 23, 2013, 9:50pm

errno is thread-safe; the docs say:

“Each thread in a multi-threaded program has its own error value in its thread local storage. No matter which thread you’re in, you can simply refer to errno — it’s defined in such a way that it refers to the correct variable for the thread. For more information, see “Local storage for private data” in the documentation for ThreadCreate().”

Otherwise it would simply be useless.
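mario’s point can be checked directly. A minimal sketch, assuming a C11 compiler and POSIX threads: C11 specifies that errno has thread-local storage duration, so a worker thread that writes errno touches its own object, not the caller’s. The worker compares its &errno with the caller’s while both threads are alive. The struct and function names are hypothetical, for illustration only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical probe shared between the caller and a worker thread. */
struct errno_probe {
    int *caller_errno;   /* &errno as seen by the calling thread */
    int same_object;     /* set by the worker: 1 if its &errno matches */
    int worker_value;    /* errno as read back inside the worker */
};

static void *probe_errno(void *arg)
{
    struct errno_probe *p = arg;

    errno = EINVAL;                               /* only this thread's errno */
    p->worker_value = errno;
    p->same_object = (&errno == p->caller_errno); /* caller is still alive here */
    return NULL;
}

/* Returns 1 if the worker had its own errno object, 0 if it shared the
 * caller's, or -1 if the thread could not be created or joined. */
static int errno_is_per_thread(struct errno_probe *p)
{
    pthread_t tid;

    assert(p != NULL);
    p->caller_errno = &errno;   /* errno is an lvalue, so it has an address */
    if (pthread_create(&tid, NULL, probe_errno, p) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return !p->same_object;
}
```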

maschoen · October 23, 2013, 11:42pm

I have to admit that I found this hard to believe until I checked the definition of errno:

#define errno (*_CSTD __get_errno_ptr())

Thanks Mario.
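The macro maschoen quotes turns errno into a function call that returns a pointer, which is then dereferenced; that is what lets each thread see its own variable. The sketch below mimics that shape with hypothetical names (my_errno, my_get_errno_ptr) and uses C11 _Thread_local for the per-thread storage. That storage choice is an assumption, not a description of how QNX implements __get_errno_ptr().

```c
#include <assert.h>

/* Hypothetical per-thread storage; each thread gets its own copy. */
static _Thread_local int my_errno_storage;

/* Hypothetical accessor, playing the role __get_errno_ptr() plays for errno. */
static int *my_get_errno_ptr(void)
{
    return &my_errno_storage;
}

/* Expands to a modifiable lvalue, so both "my_errno = 5;" and
 * "if (my_errno == 5)" work, just as they do with errno. */
#define my_errno (*my_get_errno_ptr())
```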

queBurro · October 24, 2013, 8:50am

Thanks.
So, summarising: although I can get away with overwriting errno’s value (because it’s thread-safe), I shouldn’t, because it’s bad practice? Best practice seems to be to zero it (set it to EOK) before a library call, but not to set it to a non-zero value as I’ve done above.

qnx.org.uk/developers/docs/6 … errno.html
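The “zero it before the call” practice matters for functions whose return value cannot signal failure on its own. A minimal sketch using standard C strtol(), which returns LONG_MAX or LONG_MIN both for legitimate values and on overflow, so errno is the only way to tell them apart; parse_long() is a hypothetical helper.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parse a whole string as a decimal long. */
static bool parse_long(const char *text, long *out)
{
    char *end;
    long value;

    assert(text != NULL && out != NULL);
    errno = 0;                        /* clear before the call... */
    value = strtol(text, &end, 10);
    if (errno == ERANGE)              /* ...then inspect it afterwards */
        return false;
    if (end == text || *end != '\0')  /* no digits, or trailing junk */
        return false;
    *out = value;
    return true;
}
```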

maschoen · October 24, 2013, 3:25pm

Awk! Yes, I missed this totally. You should not be assigning a value to errno. Otherwise its value might be whatever you assigned, which may not be related to its documented meaning.

mario · October 25, 2013, 2:54pm

In this case you’re OK, as the value the function returns is itself an errno value, but most functions don’t work that way. They return an error indicator, such as -1 or a null pointer, and THEN you can examine errno.

QNX did add some specific functions that return the errno value directly; these functions have _r as a suffix.
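The two conventions mario describes, sketched side by side on a POSIX system: pthread_mutex_trylock() returns the error number itself, while open() returns -1 and only then makes errno meaningful. The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

/* Convention 1: pthread functions return the error number directly,
 * so the return value is checked and errno is not consulted. */
static int try_lock_result(pthread_mutex_t *m)
{
    assert(m != NULL);
    return pthread_mutex_trylock(m);   /* 0 on success, EBUSY if already held */
}

/* Convention 2: open() returns -1 on failure, and only then is errno
 * meaningful. Returns that errno value, or 0 if the open succeeded. */
static int open_error(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return errno;                  /* read errno only after the failure */
    close(fd);
    return 0;
}
```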