Inter-node spawn returned "ESRCH".

I’m trying to spawn a process on another node, and get
an error of ESRCH (“No such process”). Why?
What does that mean? That’s not a documented error code for “spawn”.
The other node is up, and shows in /net, so that’s not the
problem.

First, it looks like you have to give a pathname to
“spawn” that’s valid on the destination machine; “spawn”
doesn’t carry the executable across the net. Is that
right?

Second, what permissions do you need to spawn a process
on a remote machine? Currently, QNET is configured to
give the machines access to each other as “nobody”, which
allows access to public files only. Is that sufficient?

None of this seems to be documented.


John Nagle

As a first test, can you use “on -f node -u nobody” and “on -n node -u nobody”
to start a process on the remote machine? If so, then the problem is probably
in how you set up the inheritance structure you pass to spawn().

As for the location of the process’s root: unless you do a chroot() to
the remote machine (i.e., chroot("/net/foo")), the binary will be searched for
and found on the local machine (this is “on -n” behavior vs. “on -f”).
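
For reference, here is a rough sketch of what a remote spawn() call might look like. It is not taken from any QNX source; it assumes the usual Neutrino declarations in <spawn.h> and <sys/netmgr.h> (struct inheritance, SPAWN_SETND, netmgr_strtond()), and the helper names remote_path() and spawn_on_node() are made up for illustration. The node name "foo" is the same placeholder as above. Off QNX the spawn part is stubbed out so the file still compiles.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

#ifdef __QNXNTO__
#include <spawn.h>
#include <sys/netmgr.h>
#endif

/* Hypothetical helper: build "/net/<node><path>", the name under which a
 * file on a remote node is visible locally through QNET.
 * Returns 0 on success, -1 if the buffer is too small. */
int remote_path(char *buf, size_t len, const char *node, const char *path)
{
    int n = snprintf(buf, len, "/net/%s%s", node, path);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: ask for `path` to be started on `node`.
 * Returns the child's pid, or -1 with errno set. */
pid_t spawn_on_node(const char *node, const char *path, char *const argv[])
{
#ifdef __QNXNTO__
    struct inheritance inherit;
    int nd = netmgr_strtond(node, NULL);   /* node name -> node descriptor */

    if (nd == -1)
        return -1;                         /* errno set by netmgr_strtond() */

    memset(&inherit, 0, sizeof inherit);
    inherit.flags = SPAWN_SETND;           /* honor the nd member below */
    inherit.nd = nd;

    /* Assumption: no fd remapping (fd_count 0) and a NULL envp, which
     * should make the child inherit the parent's environment. */
    return spawn(path, 0, NULL, &inherit, argv, NULL);
#else
    (void)node; (void)path; (void)argv;
    errno = ENOSYS;                        /* stub for non-QNX builds */
    return -1;
#endif
}
```

If spawn_on_node("foo", "/bin/ls", argv) fails, printing strerror(errno) right after the call at least shows which error you are getting. remote_path() is only there to show how a file that lives on the other node is named from the local side; whether you want that name or the plain local one depends on the root issue described above.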

chris


Chris McKillop <cdm@qnx.com>
Software Engineer, QSSL
http://qnx.wox.org/
“The faster I go, the behinder I get.” – Lewis Carroll –

Chris McKillop wrote:

As a first test, can you use “on -f node -u nobody” and “on -n node -u nobody”
to start a process on the remote machine?

Only if I run as root. So this is some kind of permissions problem.

QNET is being started by

mount -Tio-net -o "maproot=99,mapany=99" npm-qnet.so

which has the effect that all file accesses to other machines appear
as “nobody” (which has a UID of 99 on both machines). Any user on QNET
can thus access public files on the other machines, but that’s all.
That’s what we wanted, and it seems to be a standard QNET setup.

But “on” with a named node won’t work unless invoked by root.
Even when I run locally as “nobody”, I can’t start a process as “nobody” remotely.

“nobody” has an /etc/passwd entry like this:

nobody:x:99:99:Nobody:/:

So what do I need to do?


John Nagle