Copying a mqd_t

JohnRRocha · December 14, 2006, 10:18pm

Greetings all,

I have a question about duplicating a message queue descriptor.

I say duplicating loosely, becuase I am not refering to the dup() command. Rather I am having a discussion with a co-worker about code he wrote that does the following:

[color=brown]mqd_t m1;
mqd_t m2;

m1 = mq_open(…) /* to create and open the queue for RW */

m2 = m1 /* since mqd_t’s are castable to an integer this “works” */

Now he passes m2 to a new thread. And uses m1 for thread1 and m2 for thread2.

For some reason this just strikes me as wrong. I don’t know, and I cannot articulate it to him.

It seems to “work”, but I’m wondering if we are opening ourselves up for some interesting issues later.

Is this type of coding advisable? If not, please teach me why so I can do more than react with my gut. If it is fine, then please teach me why this is not a problem so I can accept this.

Thanks in advance for your help.

-=John

Tim · December 15, 2006, 3:15pm

John,

This is a classic case of both multi-threaded coding errors and resource leaks that can cause program crashes/insidious bugs.

My first question would by why your co-worker needs to do this duplication since both threads can share the copy of m1 if it’s not a local variable. If it is a local variable it should not be. That aside here is at least one pitfall:

If one of the threads (say thread 1), closes the queue, it will be closed for thread 2. Except thread 2 won’t know it’s closed and will continue to attempt to read/write to the queue which at this point will generate errors (which your code may not even be checking for).

In other words what your co-worker has done is analagous to the following code:

char *tmp1 = (char *)malloc(10); // Allocate 10 bytes of memory
char *tmp2 = tmp1; // A 2nd pointer now points to same memory

pictorially it looks like this:

tmp1-------->(10 bytes of ram)
tmp2-----------^

Now if the code does

free(tmp1); // Free the memory for tmp1

However tmp2 now points to freed memory. So anything tmp2 does is now hugely undefined. This may cause overwrites of the next person to use that memory (insidious bugs) or a straight crash (if your lucky).

There is no easy way to protect the queue via a semaphore so that both aren’t trying to read from it at the same time (or write to it at the same time).

Ideally in multi-threaded code you should have only 1 descriptor (m1). Access to this for read or write should be protected via a semaphore. If it’s not, all kinds of issues will crop up when you least expect it.

Tim

micro · December 15, 2006, 4:24pm

Since

typedef int mqd_t;

is defined in

mqueue.h,

you can safely assing as it were an integer (what it is).

its just a Number which acts as a “name” or “describer” for the Message Queue

–

And Tim, you allocate something, but mqd_t is just a integer, not more
There wonÂ´t be any evil outcomings or spurious bugs

But i go with you, as far he doesnÂ´t need the second mqd_t, he is good with just m1

Tim · December 15, 2006, 4:37pm

Micro,

It’s NOT just an integer. It’s a file descriptor. It may look like an integer but it represents something more. It’s the same thing as doing an ‘open()’ call. It’s highly dangerous to think it’s just an integer.

You might forget to ‘close’ it and leak descriptors if you treat it as a plain integer. Or you might close it twice (fine in this case but might be disasterous in other cases like memory freeing).

If you call mq_close(m1) in one thread I guarantee you that you closed the pipe to mqueue in the other thread.

Plus this kind of casual error leads to the memory kind I pointed out in my other post.

Plus John doesn’t describe what this app is doing. But my guess is that since 2 threads are needing access to the same queue (he’s opening it R/W) that one thread is writing to it and another is reading from it. So this is being used as a primitive way to exchange data between 2 threads. There are FAR faster and better solution in QNX than doing this. If on the other hand those 2 threads are both reading from the same queue, then a whole lot of strange behavior is going to result.

Tim

micro · December 15, 2006, 4:41pm

Yes, this is true, but this also happens if you just have m1
If you close it in Thread1, it is also closed int Thread2, same if m1 = m2
So there is not really a difference

I guessed the open and close were done on beginning and on end in just one thread, otherwise there would be a
m2 = mq_open too.

edit:
The thing about it is with mq_close() m1 does not get reset, there is still the same number in as before (at least the docu says so).
So if i have just m1=5 (for example) or m1=m2=5
and after mq_close i still have m1=5 or m1=m2=5
makes no diffrence

JohnRRocha · December 15, 2006, 5:58pm

We are using this code as a way to transfer data between two threads. The code is supose to be “as portable as possile” since it needs to run on QNX, RT-Linux and Solaris platforms. The project has been developing using POSIX functions and methodologies. However, our primary and first target is QNX.

Moreover, the queue is really being used a pipe rather than as a queue. One end is stuffing the data into M1 and the other is reading the data from M2.

During design reviews it was brought up that for inter-thread communiation we could just use mutex controlled linked lists or memory to transfer the data between threads. However, the overwhelming response was “why reinvent the wheel. We would pretty much end up creating another messaging system that is just like MQs … so use MQs.” Being a new member to the team I didn not press much beyond that.

I understand that this is analagous to two pointers referencing the same chunk of memory. From that regard it isn’t as scary because you can create access (wrapper) functions to these queues so that you can ensure the queue identifer is closed in only one location and check for premature closures on reads, etc.

I think I was more worried about race condition scenarios. Thread1 writing to m1 racing with Thread2 reading from m2 “at the same time”. I know the POSIX books say that the read and write are “thread safe” However, the example are always given with separate, distinct message queue identifiers for the two ends.

Tim · December 15, 2006, 7:10pm

Micro,

Here’s a little test program for you to run that illustrates the folly of what is being done


#include <iostream>
#include <sys/neutrino.h>
#include <sys/netmgr.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h> // sleep
#include <iostream>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <pthread.h>
#include <fcntl.h>
#include <errno.h>

void *receiveThread(void *obj)
{
	int *fd = (int*)obj; // The shared fd
	
	// Loop writing every 1/4 second
	while (1)
	{
		delay(250);
		write (*fd, "This should go in file1\n", 24);
	}
	
	return NULL;
}

int main ()
{
	pthread_t t1;
	int fd;

	// Open file 1
	fd = open("file1", O_CREAT | O_RDWR);
	printf ("fd is %d\n", fd);
	
	pthread_attr_t threadAttributes;
    struct sched_param threadSchedParam;
    pthread_attr_init(&threadAttributes);

    // Set the scheduling priority and the policy to round robin
    pthread_attr_setinheritsched(&threadAttributes, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&threadAttributes, SCHED_RR);
    threadSchedParam.sched_priority = 10;
    pthread_attr_setschedparam(&threadAttributes, &threadSchedParam);
	
    pthread_attr_setguardsize(&threadAttributes, 4096);
	pthread_attr_setstacksize(&threadAttributes, 32678);
	pthread_attr_setstacklazy(&threadAttributes, PTHREAD_STACK_NOTLAZY); 
	
    // Create the 2nd thread
    if (pthread_create(&t1, &threadAttributes, receiveThread, &fd) != EOK)
    {
		printf ("pthread_create failed\n");
		exit(-1);
    }
	
	// Wait 3 seconds for the other thread to write a bit.
	sleep(3);
	
	// Close the file
	close(fd);
	
	// Now open another file
	fd = open("file2", O_CREAT | O_RDWR);
	
	// Write some stuff to the 2nd file.
	int loopCounter = 0;
	while (loopCounter < 6)
	{
		delay(500);
		write (fd, "This should go in file2\n", 24);
		loopCounter++;
	}
	
	close(fd);
	  
	// Stop the receive thread
	pthread_cancel(t1);
	
	return 0;
}

The main thread opens a file and passes the descriptor to a thread it spawns (just like John does). Then the spawned thread writes to that file every 1/4 second. Meanwhile the main thread waits a bit then closes the file then opens another file into which it writes. Of course the O/S reassigns the same fd so now thread 1 writes to file 2 instead of file1. This is the output I get:

File1

This should go in file1
This should go in file1
This should go in file1
This should go in file1
This should go in file1
This should go in file1
This should go in file1
This should go in file1
This should go in file1
This should go in file1
This should go in file1

File2

This should go in file1
This should go in file1
This should go in file2
This should go in file1
This should go in file1
This should go in file2
This should go in file1
This should go in file1
This should go in file2
This should go in file1
This should go in file1
This should go in file2
This should go in file1
This should go in file1
This should go in file2
This should go in file1
This should go in file1
This should go in file2

This is NOT the behaviour I wanted nor that anyone likely wants.

BTW, this actually happened on a medium sized project I was on (about 8 developers). A thread was spawned which inherited a duplicate fd. The thread was only supposed to write out data once in a rare while (error stuff). Meanwhile the main code eventually closed the fd (after lots of processing and coding by several developers) and a 3rd thread opened it’s own file and got the same descriptor as the spawned thread. So once in a VERY rare while a the 3rd thread would have it’s file corrupted with garbage from thread 1. It took a LONG time to track down because it happened very infrequently and the guy who wrote the code for thread 3 was tearing his hair out trying to figure out how he was corrupting his file when it wasn’t even his problem. The lesson learned in multi-threading is DON’T duplicate pointers to file handles/memory etc. Make a special class (or struct) globally available and protect it with semaphore use. All our issues could have been avoided if the file name was passed instead of the fd. So sometimes an integer is not an integer especially when it’s a file descriptor

Tim

Tim · December 15, 2006, 7:20pm

John,

While Mqueue can function as a pipe it’s not a particularly fast one (under QNX or any other O/S for that matter). The reason is your effectively writing to another processes address space twice. So every message sent forces the kernel to copy data to the mqueue address space, wake up mqueue (denying your code the cpu) which then wakes up your read thread and again the kernel copies the data back to your address space then wakes up the read thread.

So if speed is an issue (and if it’s not, why are you using QNX) your FAR better off spending the extra time and creating your own mutex controlled linked lists (if your using C++ it’s virtually already done as part of the standard library).

Even if you stay with mqueue, I’d HIGHLY recommend your code change to pass the name of the queue as the thread argument and have the 2nd thread open it’s own mqueue access for the reasons I illustrated in the sample program above. Otherwise if one thread closes and opens another mqueue the other thread could suddenly find itself reading and writing to the wrong queue.

Tim

mario · December 15, 2006, 9:32pm

Mqueue to exchange data between thread, wow I’ve never seen that before. Talk about a waste resources. One advantage of threads is that they can exchange data MUCH faster then between processes. In fact, using mqueue, makes it even worst by having the data go through an external process, ouch… Might as well use TCPIP

Not that QNX6.3.2 (or SP3) offer a faster version of mqueue that uses async messages, that would greatly improved the performance.

I want also to add my voice, assuming mqd_t is an integer is asking for trouble. Next revision of OS MIGHT change mqd_t to be a structure or even a pointer. That being said it will probably never be changed because OS maker, POSIX people , compiler writer, etc know that people don’t always follow the rule ;-)

JohnRRocha · December 15, 2006, 11:53pm

Thanks folks.

I agree with all that you have stated, but it is nice to now have some more background to help articulate my concerns with my peers.

Many thanks!

-=John