How to rm /dev/name/local/names?

I have a problem that’s quite predictable on our system. One of the
programs goes away without removing its name from /dev/name/local. When it
attempts to restart, it refuses to attach because the name is already
there. How can I rm the name?

For more details:

We are having some communication problems, so I created a program (commtest)
that tests each running program (name_open and name_close) and makes a system
call to check whether the remote node is still there.

The running programs (about 12) are talking to the remote okay. If commtest
finds a running program down (slay, crash, whatever), it brings it back
up with a system call. If the network is broken, only a warning is printed.
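
The check-and-restart step described above might be sketched roughly as
follows. This is not the actual commtest code. It assumes "system call"
means the C library's system(), and the names probe_fn, check_and_restart,
probe_up and probe_down are made up for illustration. On QNX Neutrino a
real probe could wrap name_open()/name_close() from <sys/dispatch.h>, as
shown inside the #ifdef.

```c
#include <stdlib.h>

/* Hypothetical probe type: returns 1 if the named program answers, 0 if not. */
typedef int (*probe_fn)(const char *name);

#ifdef __QNXNTO__
#include <sys/dispatch.h>
/* QNX-only probe: the name counts as "up" if name_open() succeeds.
 * A successful open only shows that the name can be opened. */
int probe_by_name(const char *name)
{
    int coid = name_open(name, 0);
    if (coid == -1)
        return 0;
    name_close(coid);
    return 1;
}
#endif

/* Stand-in probes so the logic can be tried off-target. */
int probe_up(const char *name)   { (void)name; return 1; }
int probe_down(const char *name) { (void)name; return 0; }

/*
 * Probe one program; if it is down, run restart_cmd through system().
 * Returns 0 if the program was already up, 1 if the restart command
 * exited with status 0, and -1 if the restart command failed.
 */
int check_and_restart(const char *name, probe_fn probe, const char *restart_cmd)
{
    if (probe(name))
        return 0;

    int status = system(restart_cmd);
    return (status == 0) ? 1 : -1;
}
```

Passing the probe in as a function pointer keeps the restart logic
separate from the QNX-specific name lookup, which makes it easier to
exercise on a development host.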

I can slay -f several running programs and commtest brings them back up
okay.

I can remove the network cable and slay programs, and commtest brings them
up okay.

When I plug the network cable back in, the warnings stop.

But if I then slay the programs, not all of them come up. commtest tries all
of them, but some name_attach calls fail. pidin shows that the program
attempting to run isn’t running, but the attached name is still shown in
/dev/name/local. It’s not the same program every time, and sometimes it’s
more than one. About 10% of the time all programs come back up.

Also, after reconnecting the network the remote programs can _connect to the
downed program but MsgSend fails.

So, I’m thinking I should rm the name before bringing the program back up.
But how?

Any help is appreciated.
Ken

> So, I’m thinking I should rm the name before bringing the program back up.
> But how?

I’ve never tried this, but can you rm the name from the shell when it
fails to de-register after the program is slayed?

chris


Chris McKillop <cdm@qnx.com> “The faster I go, the behinder I get.”
Software Engineer, QSSL – Lewis Carroll –
http://qnx.wox.org/

If a name exists in /dev/name/local/ and you think the program that
registered it is gone, you can run “ls /proc/mount/dev/name/local/name/”
to find out which pid registered that name. You can then use pidin to check
whether that process still exists.

If that pid does not exist, something is seriously wrong with the “procnto”
pathname manager.

-xtang



“Chris McKillop” <cdm@qnx.com> wrote in message
news:bv6j5c$90k$2@nntp.qnx.com

> > So, I’m thinking I should rm the name before bringing the program back
> > up. But how?
>
> I’ve never tried, can you rm the name from the shell in the case of it
> failing to de-register when the program is slayed?

Uh, no. That’s the problem. rm -f /dev/name/local/ gives a Can’t
unlink : no such name or device.

Same thing you get on a valid name.

Is there a device driver that has to be slayed? Like io-net when the network
goes down?

I noticed that when the last name is detached in /dev/name/local, the local
and name dirs disappear. What’s controlling that?

I’ve played around with the problem some more and noted a few new things:

  1. Unplugging and replugging the network cable isn’t the only way for this
    to happen, just the quickest. Slaying some programs (and having commtest
    bring them back up) after many cycles (15+) will do it too.

  2. Slay doesn’t send a message to the program so I can’t guarantee a safe
    exit (name_detach, etc).

  3. Waiting a full second before starting a slayed program doesn’t make a
    difference.

  4. Seems like it happens quicker when network traffic is higher. Maybe slay
    happens in between send and reply.

I’m trying kill to see if I can limit it to just one program.



I looked into /proc/mount/dev/name/local and that list is the same as
/dev/name/local. That is, when a program dies but doesn’t detach its name,
it shows up in both. The /proc/mount… creates a directory with the
attached name. Inside that is a read-only file with the name 0, pid, 02,
12. The pid is that of the program that was slayed but hasn’t released its
name. The file and directory cannot be removed.
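
The stale-name check described above could be automated along these
lines. This is a sketch that assumes the entry name really has the form
"0,pid,02,12" with the pid as the second comma-separated field in
decimal, as it appears in the listing. The function names pid_from_entry
and pid_is_alive are made up for illustration. The liveness test uses the
POSIX kill(pid, 0) probe, which sends no signal.

```c
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Pull the pid out of an entry name such as "0,1234,02,12".
 * Assumption: the pid is the second comma-separated field, in decimal.
 * Returns the pid, or -1 if the name does not match that shape.
 */
long pid_from_entry(const char *entry)
{
    const char *comma = strchr(entry, ',');
    char *end;
    long pid;

    if (comma == NULL)
        return -1;

    errno = 0;
    pid = strtol(comma + 1, &end, 10);
    if (errno != 0 || end == comma + 1 || pid <= 0)
        return -1;                       /* no digits, overflow, or bad value */
    if (*end != ',' && *end != '\0')
        return -1;                       /* trailing junk after the number */
    return pid;
}

/*
 * Returns 1 if a process with this pid exists, 0 if it does not.
 * kill() with signal 0 performs the existence check without
 * delivering a signal; EPERM means the process exists but we are
 * not allowed to signal it.
 */
int pid_is_alive(long pid)
{
    if (kill((pid_t)pid, 0) == 0)
        return 1;
    return errno == EPERM;
}
```

If pid_from_entry() yields a pid for which pid_is_alive() returns 0 while
the name is still listed, that matches the stale-name situation described
in this post.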

Kill has the same problem as slay over time.

procnto isn’t all that easy for me to understand. It looks like it’s the
first to load then starts 12-15 threads. I know it’s a long shot but can I
kill one of those to restart the pathname manager?

I’ll try to create a simple example and post it here or maybe call tech
support. It does look serious.

Thanks,
Ken



Ken Price <kprice@harscotrack.com> wrote in message
news:bv8g53$rte$1@inn.qnx.com

> I looked into /proc/mount/dev/name/local and that list is the same as
> /dev/name/local. That is, when a program dies but doesn’t detach its
> name, it shows up in both. The /proc/mount… creates a directory with the
> attached name. Inside that is a read-only file with the name 0, pid, 02,
> 12. The pid is that of the program that was slayed but hasn’t released its
> name. The file and directory cannot be removed.

So did the process that registered the name (and that you slayed) actually
die? After the slay, could you run pidin to confirm?

If a process registers a name and then dies, the pathname manager in procnto
should notice this and delete the entry it registered. If it failed to do so,
that’s a more serious problem.

It could also be that your process, for whatever reason, doesn’t want to die
even if you slay it :) That’s what we want to find out.

-xtang



“Xiaodan Tang” <xtang@qnx.com> wrote in message
news:bv8pq7$6an$1@inn.qnx.com


> So did the process that registered the name (and that you slayed) actually
> die? After the slay, could you run pidin to confirm?

pidin and sin do not show the program after I slay it. It shows up only in
/dev/name/local. “slay name” and “kill pid” return “unable to find” and “no
such process” errors. The pid I see before slaying the program is the same
as in /proc/mount/dev/name/local//0,pid,02,12.

> If a process registers a name and then dies, the pathname manager in
> procnto should notice this and delete the entry it registered. If it
> failed to do so, that’s a more serious problem.
>
> It could also be that your process, for whatever reason, doesn’t want to
> die even if you slay it :) That’s what we want to find out.

If it’s running, I would think pidin would show it. I trust pidin more than
/dev/name/local. I’ve seen the “don’t want to die” scenario before, but that
shows up as a zombie or something.


procnto is only in the boot image and is called without arguments.
Considering the problems I had in getting a working boot image, the problem
may be here.
Does this look right?

[virtual=x86,./bios +compress] boot = {
startup-bios
PATH=./:/proc/boot:/bin:/sbin:/usr/bin:/usr/sbin
LD_LIBRARY_PATH=/proc/boot:/bin:/lib:/lib/dll:/usr/lib procnto
}
