semaphore

There are two processes, a and b, in my system. I found that they stop working after several days. I used pidin to check, and the result is below:

a 10r sem 5689cc0
b 25r sem 5689cc0

I don’t call sem_wait() or sem_post() in them, but by mistake I defined

sem_t *sem = sem_open("test", O_CREAT, S_IRWXG|S_IRWXO|S_IRWXU, 1);

in both a and b.

Could that cause the processes to block on the semaphore?

Thank you.

If you claim that you do not call sem_wait() or sem_post() in either of the processes, then the processes are not blocked on semaphore “test” but on another semaphore. Maybe you are using a library which has an internal semaphore, and you are handling the library incorrectly.

Can I check which library causes it? By the way, what does the value after “sem” in the pidin output, such as 5689cc0, mean? Does it give me any information?

Check? If you don’t have the source code, then maybe in a debugger; I don’t know…

(Using 6.2.1 for the tests…)

About that sem_open() call:
You are using O_CREAT in both of the processes. I have tested that the second sem_open() call doesn’t fail, but it also doesn’t do what you might expect it to do. The semaphore is opened, but the “counter” keeps its value from the previous calls; it isn’t reset to the value specified in the sem_open() call. Maybe the semaphore is now in an undefined state. However, even if I make the semaphore block in both processes, they only become REPLY blocked on the mqueue process, as expected.

(note: you have to use sem_close() to close a semaphore created using sem_open(); calling sem_init()/sem_destroy() on a semaphore created using sem_open() may result in undefined behaviour)
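The O_CREAT behaviour described above can be checked with a small test. This is only a sketch: the helper reopen_value() and the semaphore name used with it are made up for illustration, and it relies on the POSIX rule that O_CREAT without O_EXCL ignores the mode and value arguments when the named semaphore already exists.

```c
#include <assert.h>
#include <fcntl.h>      /* O_CREAT */
#include <semaphore.h>
#include <stddef.h>     /* NULL */
#include <sys/stat.h>   /* S_IRUSR, S_IWUSR */

/* Hypothetical helper: create the named semaphore with first_value, open it
 * again with O_CREAT and second_value, and return the counter seen through
 * the second handle. Returns -1 on any failure. */
int reopen_value(const char *name, unsigned first_value, unsigned second_value)
{
    assert(name != NULL);
    sem_unlink(name);   /* start clean; an error just means it did not exist */

    sem_t *first = sem_open(name, O_CREAT, S_IRUSR | S_IWUSR, first_value);
    if (first == SEM_FAILED)
        return -1;

    /* Second O_CREAT open, as the other process would do. The semaphore
     * already exists, so second_value is ignored. */
    sem_t *second = sem_open(name, O_CREAT, S_IRUSR | S_IWUSR, second_value);
    int value = -1;
    if (second != SEM_FAILED) {
        if (sem_getvalue(second, &value) != 0)
            value = -1;
        sem_close(second);   /* sem_close(), not sem_destroy() */
    }

    sem_close(first);
    sem_unlink(name);        /* remove the name so later runs start fresh */
    return value;
}
```

Calling reopen_value("/demo_sem", 3, 7) should give back 3 rather than 7. If a fresh counter is really needed, one option is to sem_unlink() the name first, or to pass O_CREAT|O_EXCL so that the second open fails instead of silently reusing the old state.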

“SEM 5689cc0”:
In pidin output, the “STATE” and “Blocked” columns in this case mean that the process is blocked on a semaphore which is mapped into its virtual address space at address 0x05689cc0.

sem_t sem;
sem_init(&sem, 0, 0);
printf("semaphore %p\n", &sem);
sem_wait(&sem);

prints “semaphore 8047b4c” and pidin reports “SEM 8047b4c” in my case

sem_t *sem;
sem = sem_open("test", O_CREAT, S_IRWXG|S_IRWXO|S_IRWXU, 0);
sem_wait(sem);

(did I mention already that calling something “test” may in some cases cause very unreliable results?)
pidin now reports “REPLY 12292” where 12292 is the PID (on my system) of the mqueue process (message queue and named semaphore manager)
(which reminds me to ask you if mqueue is running in your system because sem_open() will fail and subsequent sem_*() calls on such semaphore structure will fail if it’s not running…)
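Whatever the cause of a failed sem_open(), it is worth checking the return value before using the handle. Below is a minimal sketch of such a check; check_sem_open() is a hypothetical helper, and which errno value you get when mqueue is not running is not something this sketch assumes.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: try to open an existing named semaphore and report
 * a failure instead of carrying on with an invalid handle.
 * Returns 0 on success, otherwise the errno value set by sem_open(). */
int check_sem_open(const char *name)
{
    assert(name != NULL);

    sem_t *sem = sem_open(name, 0);   /* no O_CREAT: it must already exist */
    if (sem == SEM_FAILED) {
        int err = errno;              /* save errno before calling anything else */
        fprintf(stderr, "sem_open(%s) failed: %s\n", name, strerror(err));
        return err;
    }

    sem_close(sem);
    return 0;
}
```

A non-zero result means the handle must not be passed to sem_wait() or sem_post().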

bottom line: the “sem 5689cc0” state of both processes would suggest improper usage of the sem_*() functions

It can’t be one of the QNX libraries. As far as I know, there is nothing in the libraries that would operate beyond the scope of a process, only between threads. It has to be a 3rd-party library or your own code.

That should be fairly simple to find. When the processes are blocked, attach the debugger and use backtrace to find the sequence of functions that led to this.

  • Mario

Sorry, can you tell me how to attach the debugger and how to use backtrace to find the sequence of functions?

This is well explained in the online docs under the GDB section.

Sorry, please tell me the steps. I have read the docs, but I don’t know how to do it.

Which part don’t you understand???

To attach to a running process you start gdb with the ‘--pid=pid’ option. You obtain the pid of your process by using pidin.

Once in the debugger you can see the stack trace in several ways. One of which is ‘show stack’ then using ‘up’ and ‘down’ to move up/down the stack frames.

Tim

Tim, thanks for your help. I did as you said.
1) gdb -pid=pid
2) OK, it shows me gdb attached to the pid, but when I use “show stack” or “stack”, it says the command is wrong. Then I tried “backtrace” as Mario said, and it gives me this result:
#0 xxxxx ??()
#1 xxxxx ??()
I want to know which function causes the “sem” state. Is this useful?

You’re not quite there yet with #2.

You’re not adding the name of your program to the gdb line.

In other words if you just did:

gdb -pid=pid

Then you should have gotten a message saying the program was already running and did you want to kill it. When you said no, and then did the backtrace you will get the message you printed below because gdb didn’t attach to anything.

You need to add the name of your program to the gdb line as in:

gdb -pid=pid foo

Then the backtrace command will show the stack calls.

Please tell me that you are compiling with the -g option, else you won’t be getting much debug info for gdb :)

Tim

I have done as you told me, but I can’t see any difference. For example, the process is named file:
1)gdb -pid=pid file
2)backtrace
#0 xxxxx ??()
#1 xxxxx ??()

By the way, I have compiled “file” with the -g option.

I can’t imagine what you’re doing wrong.

For example on my system I have a jed editor running on another console at pid=123456.

I can do ‘gdb -pid=123456 jed’ and attach to the editor. Then I can do a backtrace and see it’s sitting in SignalWaitInfo().

The name of the file is not meant to be your source file but rather the name of your executable.

What you’re showing is normally associated with a program crash where the stack is corrupted. Since your program is still running, it means you’re not attaching properly.

Tim

Tim, thank you. I have found my mistake: I used “gdb -pid=pid file” and there’s another “file” under /bin.
I now use “gdb -pid=pid /exe/file” and it’s OK. The result of backtrace is below:
#0 0xb032a2ae in ??()
#1 0x08064087 in mco_sem_p()
#2 0x08059416 in mco_async_event_wait()
#3 0x0804de42 in main at file.c :135
I ran “pidin” and it shows “file” in sem with 566bd2c.
Could you tell me what’s wrong with my program?

Tim, please help me.

I did help you get the debugger attached properly.

Others were helping you with the semaphore aspect and I was hoping they’d chime in after you posted your debugger info.

While I’ve used semaphores many times, I don’t recognize the ‘mco_’ family of functions, and a quick search of the header files in /usr/include doesn’t find them there either. Are you using any 3rd-party code/libraries?

Also, without seeing your code it’s hard to know what’s going on. For example, what does your code look like at line 135 (where you’re stuck)? You’re doing an async wait for something, but I don’t know what that something is. Can you post the code around that line?

Tim

Yes, I use 3rd-party code from extremedb, a realtime database. At line 135 I call one of its functions, mco_asyn_wait(). Do you think it causes the semaphore block?