Virtual Circuits in QNX2

Hi there,

We are running a 12-node QNX2 system in a ‘cluster’ configuration, with node
1 having overall control of the system. There are approximately 74 tasks
running on node 1, with around 50 tasks running on the other nodes. There is
extensive inter-node communication, and over time one of the tasks generates
an excessive number of virtual circuits (VCs), which eventually use up all
available tasks and crash the system.

Should the OS be cleaning up these VCs, or should they be killed from within
the application? Is there any way of killing these VCs without crashing the
task that created them?

Any info on the creation/killing of VCs would be appreciated.

Thanks,

Russ Bilbey

Do you know how you are creating the VCs?
Any time you have node-to-node communication, a virtual circuit is created on
each side.
This can be file access, queue access, explicit task send/receive, etc.

Most of our VC use comes from calling name_locate() to get a tid so we can
send a message to a specific task.
We also access queues on other nodes.
Whenever you do a name_locate() across the net, a VC is created on each side.
Once you have finished with the VC, it should be removed using vc_release().
If you use name_locate() and then send to the tid, and the send fails, you
should release the VC and then relocate the task tid. We had a problem at one
time because this wasn’t being done.
This may be the problem you’re seeing: a failure occurs, and recovery
relocates the task without clearing the previous VC.
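
To make the release-then-relocate point concrete, here is a minimal sketch in C. It only models the bookkeeping: the `sim_` functions, the 8-slot table, and `vcs_after_failures()` are hypothetical stand-ins invented for illustration, not the QNX2 library calls (the thread does not give the real signatures of name_locate() or vc_release()). It assumes each locate takes one slot and each release frees one.

```c
#include <string.h>

#define MAX_SLOTS 8      /* tiny stand-in for the node's task table */
#define NO_TID    (-1)

static int vc_slot[MAX_SLOTS];  /* 1 = slot held by a VC, 0 = free */
static int fail_next_send;      /* set to 1 to make the next send fail */

/* Stand-in for name_locate(): takes a free slot and returns it as a tid. */
static int sim_name_locate(const char *name)
{
    (void)name;
    for (int i = 0; i < MAX_SLOTS; i++) {
        if (!vc_slot[i]) {
            vc_slot[i] = 1;
            return i;
        }
    }
    return NO_TID;              /* table is full */
}

/* Stand-in for vc_release(): frees the slot behind a tid. */
static void sim_vc_release(int tid)
{
    if (tid >= 0 && tid < MAX_SLOTS)
        vc_slot[tid] = 0;
}

/* Stand-in for a send: fails on a bad tid or when a failure is injected. */
static int sim_send(int tid, const char *msg)
{
    (void)msg;
    if (tid < 0 || tid >= MAX_SLOTS || !vc_slot[tid])
        return -1;
    if (fail_next_send) {
        fail_next_send = 0;
        return -1;
    }
    return 0;
}

static int vcs_in_use(void)
{
    int n = 0;
    for (int i = 0; i < MAX_SLOTS; i++)
        n += vc_slot[i];
    return n;
}

/* On a failed send, optionally release the stale VC, then relocate and retry. */
static int send_with_recovery(int *tid, const char *name, const char *msg,
                              int release_first)
{
    if (sim_send(*tid, msg) == 0)
        return 0;
    if (release_first)
        sim_vc_release(*tid);   /* the step that was being skipped */
    *tid = sim_name_locate(name);
    return sim_send(*tid, msg);
}

/* Inject `failures` send failures and report how many slots are held after. */
int vcs_after_failures(int failures, int release_first)
{
    memset(vc_slot, 0, sizeof vc_slot);
    fail_next_send = 0;
    int tid = sim_name_locate("server");
    for (int i = 0; i < failures; i++) {
        fail_next_send = 1;
        send_with_recovery(&tid, "server", "ping", release_first);
    }
    return vcs_in_use();
}
```

In this model, calling `vcs_after_failures(5, 0)` leaves six slots held (the original plus one per failure), while `vcs_after_failures(5, 1)` stays at one. Without the release, enough failures fill the whole table, which matches the symptom described.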

We’ve had other similar problems when nodes were rebooted, leaving VCs open
because of the way the alive command was used in the sysinit files.

If this doesn’t help, maybe more info on your application would help.

ms…



Thanks Mike, this is great info.

I’m only really starting the digging here, so I can’t fill you in too much. The
main problem is that this is someone else’s legacy code and no one here
really knows how it all works. Until recently, we thought that the OS was
running out of tasks (real tasks as opposed to pseudo-tasks, in this case
VCs), i.e. the available tasks would hit 0 and then the system crashed. We now
know that the ‘real’ tasks are stable at about 74 and the VC count is going
through the roof.

I look forward to your comments once I’ve dug a little deeper.

Many thanks once again.

Russ.



The job of the poller is to check the integrity of both sides of a VC and to
tear it down if it is no longer valid.

I have some code that illustrates doing this on my web site.

Go to staff.qnx.com/~randy/qnx2
and look in the vc_check/ subdir.

It is possible to get a half-ended VC if a node goes up and down without
the poller detecting it. One solution is to run the poller on each node in
standalone mode. Or look at the vc_check/ stuff to see how you can walk the
task table and remove a VC.
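
As a rough illustration of the walk-the-table idea (this is not the vc_check/ code, which isn't reproduced in this thread), here is a small C sketch. The `struct slot` layout, the `remote_alive_fn` callback, and the rule that node 3 is down are all assumptions made up for the example; the real QNX2 task table looks different.

```c
#include <stddef.h>

/* Hypothetical slot kinds; a VC occupies a slot just like a real task. */
enum slot_kind { SLOT_FREE, SLOT_TASK, SLOT_VC };

struct slot {
    enum slot_kind kind;
    int remote_node;    /* meaningful only for SLOT_VC */
    int remote_tid;     /* meaningful only for SLOT_VC */
};

/* Hypothetical check: returns nonzero if the far end of a VC still exists. */
typedef int (*remote_alive_fn)(int node, int tid);

/* Walk the table and free every VC whose far end is gone. Returns the count. */
size_t reap_stale_vcs(struct slot *table, size_t n, remote_alive_fn alive)
{
    size_t reaped = 0;
    for (size_t i = 0; i < n; i++) {
        if (table[i].kind != SLOT_VC)
            continue;               /* never touch real tasks or free slots */
        if (!alive(table[i].remote_node, table[i].remote_tid)) {
            table[i].kind = SLOT_FREE;
            reaped++;
        }
    }
    return reaped;
}

/* Count slots of one kind, e.g. to compare real tasks against VCs. */
size_t count_kind(const struct slot *table, size_t n, enum slot_kind kind)
{
    size_t c = 0;
    for (size_t i = 0; i < n; i++)
        if (table[i].kind == kind)
            c++;
    return c;
}

/* Example liveness rule for a demo: pretend node 3 went down. */
int node3_is_down(int node, int tid)
{
    (void)tid;
    return node != 3;
}
```

With a table holding one real task, one VC to node 2, and two VCs to node 3, `reap_stale_vcs(table, 4, node3_is_down)` frees the two node-3 entries and leaves the task and the node-2 VC alone. A real check would have to ask the remote node whether its end of the VC still exists, which is the part the poller does.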


Randy Martin randy@qnx.com
Manager of FAE Group, North America
QNX Software Systems www.qnx.com
175 Terence Matthews Crescent, Kanata, Ontario, Canada K2M 1W8
Tel: 613-591-0931 Fax: 613-591-3579

Previously, Randy Martin wrote in qdn.public.qnx2:

it is possible to get a half-ended vc if a node goes up and down without
the poller detecting it.

Randy,

Note that it is quite possible to get a half-ended VC without either
node going down. This is due to a bug that appears with faster CPUs.


Mitchell Schoenbrun --------- maschoen@pobox.com

true… the notes on the site describe some of these things… thanks for the
detail…

Randy Martin randy@qnx.com

Randy,

Thanks for the VC info, it has proved invaluable.

One of the header files on your staff site (task.h) does not appear to
download. Any chance you could either make it available or e-mail it/post
it to this group?

Many thanks,

Russ.