spawn() hangs system

We are running QNX 6.1 Patch B on PowerPCs with a custom BSP.

We have ported QNX to two custom boards, one based on the MPC7410 PPC and
another based on the MPC755. The MPC7410 system is running fine. The new
port to the MPC755 has a nasty problem. Anytime spawn() is invoked, the
entire QNX system hangs. All processes stop, regardless of priority. This
system hang doesn’t happen on the MPC7410.

It looks like spawn() alone is the problem: we can start and kill large
processes from the ksh shell just fine. Other than this spawn() problem,
both systems run great.

Both systems have MPC107 controllers, 128 MB of SDRAM, and the same Ethernet
controller. The MPC7410 system has 2 MB of external L2 cache, the MPC755
system has 1 MB of external L2. We believe that memory and cache integrity
are OK.

What could spawn() be doing to take down the whole kernel on the MPC755?

Here is a simple example program that runs fine on the MPC7410, but
completely hangs QNX on the MPC755:

#include <stdio.h>
#include <spawn.h>

int main(int argc, char** argv)
{
    char* path = "/bin/ls";

    printf("About to spawn %s\n", path);
    fflush(stdout);
    spawn(path, 0, NULL, NULL, NULL, NULL);
    printf("Spawn is done\n");
    return 0;
}

Here are the PPC registers on the MPC755 board seen by typical applications:

MSR = 0000.9932

HID0 = 0010.C0A4

HID1 = 8000.0000

L2CR = BB00.0060



We’ve submitted a request to QNX support for help on this. However, if
anyone has any thoughts regarding this problem then please share.

Thanks

Murtaza Amiji

Murtaza wrote:

#include <stdio.h>
#include <spawn.h>

int main(int argc, char** argv)
{
    char* path = "/bin/ls";

    printf("About to spawn %s\n", path);
    fflush(stdout);
    spawn(path, 0, NULL, NULL, NULL, NULL);
    printf("Spawn is done\n");
    return 0;
}

From the docs:

argv
A pointer to an argument vector. The value in argv[0] should point to
the filename of program being loaded, but can be NULL if no arguments
are being passed. The last member of argv must be a NULL pointer. The
value of argv can’t be NULL.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Argv is the second to last parameter for spawn. You have it set to NULL.

Rennie

Like Rennie said, you really should read the spawn() docs carefully.
spawn() is the lowest-level function; there are other helper functions
that make life easier. In your case, you probably just need spawnl().

spawnl(P_NOWAIT, path, path, NULL);

-xtang


You were right about the argv. I changed it to make argv a non-NULL pointer
and got this particular piece of code to work. However, the underlying
problem is different. We run slinger on the target machine, which serves a
web page that lets a user run certain commands (such as ps) from a web
client via a CGI script. All this works like a charm on our MPC7410 PPC. On
the MPC755 system, when one of the commands is invoked from the web page,
the CGI script loads the command (by spawning it as a child process) and
displays most of the output on the web page. However, just before it
finishes displaying the output, the Neutrino kernel crashes and resets our
target system. We think the crash occurs around the spawn() call made when
the CGI script loads the process requested by the web client.

- Murtaza

At first I thought you were complaining that spawn() is not working.
Another developer at QNX actually sent me an email, and after I re-read
your first post, it turns out you think spawn() brings down your system.

There is NEVER a reason for the kernel to crash, so I will let the kernel
experts investigate. (I assume you have already sent all the necessary
information to support.)

-xtang


Yes, we are not 100% sure that it is spawn(), but we think the crash is
caused by it.

When I run the small piece of code I included in the original post (the one
that spawns /bin/ls), it crashes the kernel and reboots the target when a
NULL argv is passed. I would have thought that it would only crash its own
process instead of bringing down the kernel with it. Maybe this is
something the kernel guys can look into.

- Murtaza

Xiaodan Tang wrote:

At first I thought you were complaining that spawn() is not working.
Another developer at QNX actually sent me an email, and after I re-read
your first post, it turns out you think spawn() brings down your system.

There is NEVER a reason for the kernel to crash, so I will let the kernel
experts investigate. (I assume you have already sent all the necessary
information to support.)

-xtang

I agree the kernel should never crash or hang, and this particular
instance is somewhat embarrassing, in that a simple NULL pointer
check would be sufficient.

That said, there is also no excuse for calling spawn incorrectly.

Rennie

Murtaza wrote:

Yes, we are not 100% sure that it is spawn(), but we think the crash is
caused by it.

When I run the small piece of code I included in the original post (the one
that spawns /bin/ls), it crashes the kernel and reboots the target when a
NULL argv is passed. I would have thought that it would only crash its own
process instead of bringing down the kernel with it. Maybe this is
something the kernel guys can look into.

Actually, I would think that rather than crash the process, errno should
simply be set to EINVAL (invalid argument)…

Rennie

Murtaza <murti@yahoo.com> wrote:

We have ported QNX to two custom boards, one based on the MPC7410 PPC and
another based on the MPC755. The MPC7410 system is running fine. The new
port to the MPC755 has a nasty problem. Anytime spawn() is invoked, the
entire QNX system hangs. All processes stop, regardless of priority. This
system hang doesn’t happen on the MPC7410.

Have you done any tests to ensure that you are setting up DRAM properly?
(timings and locations, etc). Might be worthwhile to write a little test
app that allocates memory in pagesize chunks and then memset()'s it to 0
until there is no free memory in the system and see if you get any crashes
or hangs.

chris


Chris McKillop <cdm@qnx.com> “The faster I go, the behinder I get.”
Software Engineer, QSSL – Lewis Carroll –
http://qnx.wox.org/

This was resolved.

It turned out to be a problem with the customer's debug callouts.

Best regards,

Erick

