Watcom execl() command problems.

It goes like this.

We have an application that we built with QNX 4.24. We have used
this application for several years, running with Photon 1.12 and
X Windows. Due to the updating of QNX, we have decided to move to the
newer QNX 4.25 and Photon 1.14. I have made minor modifications to the
directory structure and changed some define names. I have tested the
newly built code on the QNX 4.24/Photon 1.12 system and it works just
fine. On the QNX 4.25/Photon 1.14 system the code fails. While debugging,
I was able to determine that the failure is located at the call
‘execl(PATH, application, arg1, arg2, NULL)’. I have been able to debug
up to this point. Once this point is reached, an attempt to start the
new application/process is made and the debug session finishes. The
display screen goes blank and nothing else happens. At this point I have
to do an ‘ALT-DEL-SHIFT-BACKSPACE’ to get back to a working system, but
I’m no longer in a Photon session. On the 4.24/1.12 system the
application continues correctly; on the 4.25/1.14 system the application
appears to lock up. I have compared the data being loaded prior to the
‘execl’ call on both systems. The data is the same in both.
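
A minimal sketch, not the application's actual code: one way to make a failing execl() visible is to report errno in the child and let the parent collect the exit status. It assumes a POSIX fork()/execl()/waitpid() environment; the helper name `spawn_and_wait` and the exit code 127 for a failed exec are illustrative choices, not anything taken from the application described above.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork, exec `path` in the child, wait in the parent.
 * Returns the child's exit status, 127 if execl() failed in the child,
 * or -1 if fork()/waitpid() failed or the child was killed by a signal. */
int spawn_and_wait(const char *path, const char *arg0,
                   const char *arg1, const char *arg2)
{
    pid_t pid = fork();
    int status;

    if (pid == -1) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        execl(path, arg0, arg1, arg2, (char *)NULL);
        /* execl() only returns on failure. */
        fprintf(stderr, "execl(%s): %s\n", path, strerror(errno));
        _exit(127);
    }
    if (waitpid(pid, &status, 0) == -1) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A call such as `spawn_and_wait(PATH, application, arg1, arg2)` would then print the errno text and return 127 if the exec itself failed, which helps separate an exec failure from a problem inside the newly started program.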

I guess at this point in our testing I’m not sure where to go. I guess
the first question would be: has QNX made modifications to the ‘execl’
function? Or has QNX made modifications to the OS that would cause this
type of problem? Could there be a setup step for some part of QNX 4.25
that I have missed? Is there any further testing I could do to find
out where the error may lie? Is there more information I can give you?

If you could help me out here that would be great.

Hope to hear from someone soon.

Can you run “sin” from another node after the screen goes blank (but
before exiting from Photon)?

After you get back to text mode, are there any error messages on the
console?


Wojtek Lerch (wojtek@qnx.com) QNX Software Systems Ltd.

Wojtek Lerch wrote:

Can you run “sin” from another node after the screen goes blank (but
before exiting from Photon)?

YES;

Tried a telnet from another station to the station running the wd
application to the point of the execl() command. Do the execl() command the
telnet cannot connect to the remote site. At the bottom of the wd screen is
‘task complete’. Exit the wd session screen goes blank. No display of any
error condition. Do a sin from remote station, the initial process is held
and the child is (zombie) dead.

This may indicate the select() command of the child process timed out and
tried to exit back to parent. Failed to find parent and Photon locks up!!
Could there be a problem with the select() command?


After you get back to text mode, are there any error messages on the
console?

No errors of any sort at the main or remote stations.


Wojtek Lerch (wojtek@qnx.com) QNX Software Systems Ltd.

John Parsons <parsonsj@esi.com> wrote:

Wojtek Lerch wrote:

Can you run “sin” from another node after the screen goes blank (but
before exiting from Photon)?

Tried a telnet from another station to the station running the wd
application to the point of the execl() command. Do the execl() command the
telnet cannot connect to the remote site. At the bottom of the wd screen is
‘task complete’. Exit the wd session screen goes blank. No display of any
error condition. Do a sin from remote station, the initial process is held
and the child is (zombie) dead.

Does Photon behave normally until you exit from wd?

By “blank”, do you mean the screen is completely black? Is there a mouse
cursor? Can you move it?

Could you post the complete output from sin? Run it twice – just before
and just after exiting from wd. This will help us figure out what
exactly is happening to various pieces of Photon.

This may indicate the select() command of the child process timed out and
tried to exit back to parent. Failed to find parent and Photon locks up!!
Could there be a problem with the select() command?

I don’t think so. I just did some experimenting, and managed to
reproduce a similar problem in a simpler situation that does not involve
Photon or a select() call.

My simple test case involves debugging a program that has two lines in
its main():

execlp( "echo", "echo", "Hello", NULL );
perror("exec");

This is what happens when I tell WD to execute the exec() line:

WD says “TASK COMPLETED”.

A “sin” from another node lists both my test program and “echo”,
both REPLY-blocked on Proc. According to sin, the “echo” does not
have a parent and the test program does not have a child.

If I exit from WD now, the “echo” runs as if nothing happened and
everything returns to normal. But if I wait a while before exiting,
I see a SIGSEGV message from my test program, followed by the output
from “echo”. A few times I also saw an “exec: Interrupted function
call” message preceding the SIGSEGV.

While the WD is still there, any attempt to start a new process on
the same machine (e.g. by typing “sin” in a shell) makes the parent
hang. Sin (from another node) shows that it’s REPLY-blocked on Proc
and has no children. When I exit from WD, the child runs and
everything returns to normal.

In short, I can confirm that strange things happen when you try to
debug an exec() call, but they don’t seem to have much to do with
Photon. Now, since I’m just a Photon person, I’ll just let the
Proc/kernel folks take over from here…


Wojtek Lerch (wojtek@qnx.com) QNX Software Systems Ltd.

Wojtek Lerch wrote:

John Parsons <parsonsj@esi.com> wrote:
Wojtek Lerch wrote:

Can you run “sin” from another node after the screen goes blank (but
before exiting from Photon)?

Tried a telnet from another station to the station running the wd
application to the point of the execl() command. Do the execl() command the
telnet cannot connect to the remote site. At the bottom of the wd screen is
‘task complete’. Exit the wd session screen goes blank. No display of any
error condition. Do a sin from remote station, the initial process is held
and the child is (zombie) dead.

Does Photon behave normally until you exit from wd?

Photon behaves as normal to the point of failure of the testing process.

By “blank”, do you mean the screen is completely black? Is there a mouse
cursor? Can you move it?

By “blank”, I mean the screen is completely black, no movement of anything because
there is nothing there. From a remote station telnet session using the sin
command, it would appear that chipsbios.ms and the display driver have been closed.
They do not show up in the list of active processes.

Could you post the complete output from sin? Run it twice – just before
and just after exiting from wd. This will help us figure out what
exactly is happening to various pieces of Photon.

There are four sin output files attached.

sinbefore - Is during the debug session just before the execl() is executed.
sintaskcomplete - Is right after the execl() command is executed and the screen
goes blank.
blankscreen - Is when the blank screen is displayed.
sinafterreset - Is after the reset process (alt-del-shift-backspace) and I have
returned to a normal screen.


This may indicate the select() command of the child process timed out and
tried to exit back to parent. Failed to find parent and Photon locks up!!
Could there be a problem with the select() command?

I don’t think so. I just did some experimenting, and managed to
reproduce a similar problem in a simpler situation that does not involve
Photon or a select() call.

Please note that the problem is present outside of the wd. The application fails
during normal run time. I’m using wd only to determine the problem.


Wojtek Lerch (wojtek@qnx.com) QNX Software Systems Ltd.

John Parsons <parsonsj@esi.com> wrote:

Wojtek Lerch wrote:
John Parsons <parsonsj@esi.com> wrote:
Tried a telnet from another station to the station running the wd
application to the point of the execl() command. Do the execl() command the
telnet cannot connect to the remote site. At the bottom of the wd screen is
‘task complete’. Exit the wd session screen goes blank. No display of any
error condition. Do a sin from remote station, the initial process is held
and the child is (zombie) dead.

Does Photon behave normally until you exit from wd?

Photon behaves as normal to the point of failure of the testing process.

My point of view is that as long as we’re still investigating this, the
testing process has not failed. :wink: But I’ll assume you meant “yes”.

By “blank”, do you mean the screen is completely black? Is there a mouse
cursor? Can you move it?

By “blank”, I mean the screen is completely black, no movement of anything because
there is nothing there. From a remote station telnet session using the sin
command, it would appear that chipsbios.ms and the display driver have been closed.
They do not show up in the list of active processes.

Is there a traceinfo entry mentioning them?

Could you post the complete output from sin? Run it twice – just before
and just after exiting from wd. This will help us figure out what
exactly is happening to various pieces of Photon.

There are four sin output files attached.

sinbefore - Is during the debug session just before the execl() is executed.

Why am I seeing two sets of WD and “userintfapp” in there? Were you
running two WD sessions? Are they related? It would make it easier to
analyze the output from “sin” if there were as few irrelevant processes
running on your machine as possible.

sintaskcomplete - Is right after the execl() command is executed and the screen
goes blank.

Don’t you mean after execl() but before the screen goes blank? The
graphics driver is still running at this point.

BTW, the second WD is not, and neither is the second “userintfapp”. Did
you exit from the second WD without executing the execl() call?

blankscreen - Is when the blank screen is displayed.

Yes, the graphics driver and the mode switcher are missing from this
one, and process 1122 “diagapp” has turned into a zombie. And these are
the only differences between the two logs – both the “userintfapp” and
the WD are still running. This doesn’t seem consistent with my
assumption that it’s exiting from WD that makes the screen go blank.
What exactly did you do between the previous “sin” and this one?
Or does the graphics driver die simply because you’re letting “diagapp”
run for a while, without having to touch the keyboard or the mouse?

sinafterreset - Is after the reset process (alt-del-shift-backspace) and I have
returned to a normal screen.

I can’t explain how the Ctrl-Shift-Alt-Bkspace can do anything to the
screen if the mode switcher is already dead. But it does bring you back
to text mode, with the shell prompt and any previous shell commands and
their output visible, correct?

Please note that the problem is present outside of the wd. The application fails
during normal run time. I’m using wd only to determine the problem.

OK, so we really seem to have two problems here:

One is the WD problem I described before. It’s not quite clear to
me how much it has to do with what you’re describing.

The other problem is that the “diagapp” seems to kill your graphics
driver.

Can you tell me more about this “diagapp” program:

Can you run it from a pterm, or does it have to be execed by
“userintfapp”? If you run it from a pterm, does it also cause
problems? Can you run it under the debugger?

Does it do any drawing on the screen? Is it using any Pg calls, or
is all the drawing done by widgets (other than PtRaw)?

Is “userintfapp” also a Photon application?


Wojtek Lerch (wojtek@qnx.com) QNX Software Systems Ltd.

Wojtek Lerch wrote:

John Parsons <parsonsj@esi.com> wrote:
Wojtek Lerch wrote:
John Parsons <parsonsj@esi.com> wrote:
Tried a telnet from another station to the station running the wd
application to the point of the execl() command. Do the execl() command the
telnet cannot connect to the remote site. At the bottom of the wd screen is
‘task complete’. Exit the wd session screen goes blank. No display of any
error condition. Do a sin from remote station, the initial process is held
and the child is (zombie) dead.

Does Photon behave normally until you exit from wd?

Photon behaves as normal to the point of failure of the testing process.

My point of view is that as long as we’re still investigating this, the
testing process has not failed. :wink: But I’ll assume you meant “yes”.

Okay maybe we should try this one again. I do not exit from WD. During WD the screen
goes blank.

By “blank”, do you mean the screen is completely black? Is there a mouse
cursor? Can you move it?

By “blank”, I mean the screen is completely black, no movement of anything because
there is nothing there. From a remote station telnet session using the sin
command, it would appear that chipsbios.ms and the display driver have been closed.
They do not show up in the list of active processes.

Is there a traceinfo entry mentioning them?

Not sure what you are asking!

Could you post the complete output from sin? Run it twice – just before
and just after exiting from wd. This will help us figure out what
exactly is happening to various pieces of Photon.

There are four sin output files attached.

sinbefore - Is during the debug session just before the execl() is executed.

Why am I seeing two sets of WD and “userintfapp” in there? Were you
running two WD sessions? Are they related? It would make it easier to
analyze the output from “sin” if there were as few irrelevant processes
running on your machine as possible.

The first session is the initial start of testing. The second is a fork() just prior to the
execl(). So yes, there are two WD sessions running. I have got as few processes running
as I know how at this time.

sintaskcomplete - Is right after the execl() command is executed and the screen
goes blank.

Don’t you mean after execl() but before the screen goes blank? The
graphics driver is still running at this point.

I mean that after I execute the execl() command the WD screen shows task complete at the
bottom of the screen. At the remote station the sin command does nothing until I exit
the WD at the host.

BTW, the second WD is not, and neither is the second “userintfapp”. Did
you exit from the second WD without executing the execl() call?

You exit the second WD after the execl() command is executed and the task complete
message is displayed.

blankscreen - Is when the blank screen is displayed.

Yes, the graphics driver and the mode switcher are missing from this
one, and process 1122 “diagapp” has turned into a zombie. And these are
the only differences between the two logs – both the “userintfapp” and
the WD are still running. This doesn’t seem consistent with my
assumption that it’s exiting from WD that makes the screen go blank.
What exactly did you do between the previous “sin” and this one?
Or does the graphics driver die simply because you’re letting “diagapp”
run for a while, without having to touch the keyboard or the mouse?

I exited the WD after execl(). This allowed the ‘sintaskcomplete’ to complete. The
screen goes blank and then I did a blankscreen. The reason was to show the difference
just before and just after the blank screen condition.

sinafterreset - Is after the reset process (alt-del-shift-backspace) and I have
returned to a normal screen.

I can’t explain how the Ctrl-Shift-Alt-Bkspace can do anything to the
screen if the mode switcher is already dead. But it does bring you back
to text mode, with the shell prompt and any previous shell commands and
their output visible, correct?

Yes!

Please note that the problem is present outside of the wd. The application fails
during normal run time. I’m using wd only to determine the problem.

OK, so we really seem to have two problems here:

One is the WD problem I described before. It’s not quite clear to
me how much it has to do with what you’re describing.

The other problem is that the “diagapp” seems to kill your graphics
driver.

Can you tell me more about this “diagapp” program:

diagapp is a diagnostic application that could possibly kill chipsbios.ms and Pg.chips
so that a new display screen size could be generated. Is there a way to stop the
execl() so that a debug session could be started to determine where in diagapp the
failure occurs?

Can you run it from a pterm, or does it have to be execed by
“userintfapp”? If you run it from a pterm, does it also cause
problems? Can you run it under the debugger?

You need to run userintfapp to set the correct params for the test to be run.

Does it do any drawing on the screen? Is it using any Pg calls, or
is all the drawing done by widgets (other than PtRaw)?

It could at different points, but I have no way to determine if I ever get close to
any of that code.

Is “userintfapp” also a Photon application?

It needs to be run under Photon to run the testing.

Boy, this is a lot of fun, ain’t it!!


Wojtek Lerch (wojtek@qnx.com) QNX Software Systems Ltd.

John Parsons <parsonsj@esi.com> wrote:

Wojtek Lerch wrote:

John Parsons <parsonsj@esi.com> wrote:
Wojtek Lerch wrote:
John Parsons <parsonsj@esi.com> wrote:
Tried a telnet from another station to the station running the wd
application to the point of the execl() command. Do the execl() command the
telnet cannot connect to the remote site. At the bottom of the wd screen is
‘task complete’. Exit the wd session screen goes blank. No display of any
error condition. Do a sin from remote station, the initial process is held
and the child is (zombie) dead.

Does Photon behave normally until you exit from wd?

Photon behaves as normal to the point of failure of the testing process.

My point of view is that as long as we’re still investigating this, the
testing process has not failed. :wink: But I’ll assume you meant “yes”.

Okay maybe we should try this one again. I do not exit from WD. During WD the screen
goes blank.

Is that immediately after executing the exec() call?
A few seconds after the exec()?
A random amount of time after the exec()?
When you press a key after the exec()?
When you just sit and wait long enough?
None of the above?

What I am trying to ask is whether the screen going blank seems to be an
immediate reaction to something that you do, or does it always happen a
fixed amount of time after something you do, or does it perhaps seem to
happen after a completely unpredictable amount of time while you just
sit and stare at the monitor?

BTW Forgive me if I sound a bit impatient, but you must understand that
all I know about your software is what you have told me. Try to be
careful what you say and how you say it – English is not my first
language and I may not interpret things like “Exit the wd session screen
goes blank” the way you intended. And we don’t want to waste time
trying to investigate things that don’t exist, do we? :wink:

By “blank”, do you mean the screen is completely black? Is there a mouse
cursor? Can you move it?

By “blank”, I mean the screen is completely black, no movement of anything because
there is nothing there. From a remote station telnet session using the sin
command, it would appear that chipsbios.ms and the display driver have been closed.
They do not show up in the list of active processes.

Is there a traceinfo entry mentioning them?

Not sure what you are asking!

“traceinfo” is a QNX utility that gives you a log of important events
that happened in your system recently. Crashes are listed among them.
If your graphics driver crashes, there will be an entry in the traceinfo
log.

There are four sin output files attached.

sinbefore - Is during the debug session just before the execl() is executed.

Why am I seeing two sets of WD and “userintfapp” in there? Were you
running two WD sessions? Are they related? It would make it easier to
analyze the output from “sin” if there were as few irrelevant processes
running on your machine as possible.

The first session is the initial start of testing. The second is a fork() just prior to the
execl(). So yes, there are two WD sessions running. I have got as few processes running
as I know how at this time.

sintaskcomplete - Is right after the execl() command is executed and the screen
goes blank.

Don’t you mean after execl() but before the screen goes blank? The
graphics driver is still running at this point.

I mean that after I execute the execl() command the WD screen shows task complete at the
bottom of the screen. At the remote station the sin command does nothing until I exit
the WD at the host.

That’s because you’re running it locally in a telnet session, and the WD
problem that I mentioned before prevents “sin” from running. But if you
run "sin -n " from another node, it will work immediately. This
machine is connected to a QNX network, isn’t it?

BTW, the second WD is not, and neither is the second “userintfapp”. Did
you exit from the second WD without executing the execl() call?

You exit the second WD after the execl() command is executed and the task complete
message is displayed.

Now this is getting a bit too complicated for me. From what you have
said so far, this is how I imagine what is happening:

You start “wd userintfapp” in a pterm and run it until it forks.

Then, you find the forked child’s pid and attach a new WD to it; you
leave the first WD alone and only play with the second WD from now
on. When you’re talking about “the” WD, you’re referring to this
second WD and not the first one.

Under the second WD, you let the child call exec(). WD says “task
terminated”. From now on, Photon behaves more or less normally
until ________, but you can’t run things like
“sin” in your telnet session until you exit from the (second) WD
(that’s what I call “the WD/Proc problem”).

Once you exit from the (second) WD, “sin” runs. From its output, I
can see that the graphics driver is still alive at this point, and
“diagapp” is doing some file I/O.

After ________, the screen turns blank. When you
run sin, it shows that “diagapp” has died and turned into a zombie,
and that neither the graphics driver nor the mode switcher is running.

If the above is correct, could you fill in the blanks? If not, could you
please give me the exact scenario in at least as much detail as the
above?

blankscreen - Is when the blank screen is displayed.

Yes, the graphics driver and the mode switcher are missing from this
one, and process 1122 “diagapp” has turned into a zombie. And these are
the only differences between the two logs – both the “userintfapp” and
the WD are still running. This doesn’t seem consistent with my
assumption that it’s exiting from WD that makes the screen go blank.
What exactly did you do between the previous “sin” and this one?
Or does the graphics driver die simply because you’re letting “diagapp”
run for a while, without having to touch the keyboard or the mouse?

I exited the WD after execl(). This allowed the ‘sintaskcomplete’ to complete. The
screen goes blank and then I did a blankscreen. The reason was to show the difference
just before and just after the blank screen condition.

Didn’t you just say that the screen goes blank “during WD”?
You’re not just trying to confuse me, are you? :wink:

sinafterreset - Is after the reset process (alt-del-shift-backspace) and I have
returned to a normal screen.

I can’t explain how the Ctrl-Shift-Alt-Bkspace can do anything to the
screen if the mode switcher is already dead. But it does bring you back
to text mode, with the shell prompt and any previous shell commands and
their output visible, correct?

Yes!

There is only one explanation I can think of at this point: the blank
screen is already in text mode, but you’re looking at an empty console.
The Ctrl-Alt-Shift-BkSp shuts down your Photon, but also causes a
text-mode console switch to your first console.

To see whether that is the case, run a shell on every text-mode console
before starting Photon. This way, there will be something on every
text-mode console, and you’ll be able to distinguish between a
graphics-mode blank screen and text mode.

BTW, if you run “crttrap start” from your telnet session when the screen
is blank, it should restart the graphics driver. It might be
interesting to see what is going on in Photon before you kill it…

Can you tell me more about this “diagapp” program:

diagapp is a diagnostic application that could possibly kill chipsbios.ms and Pg.chips
so that a new display screen size could be generated. Is there a way to stop the

Uh… Isn’t it then possible that your “diagapp” indeed kills
chipsbios.ms and Pg.chips, and then dies? Why didn’t you mention before
that it can do that?

execl() so that a debug session could be started to determine where in diagapp the
failure occurs?

You could have a command-line option or environment variable causing
“diagapp” to call raise(SIGSTOP) at startup – this will let you attach
a WD to it. Or, you could have a command-line option to “userintfapp”
that makes it run “wd diagapp …” instead of just “diagapp …”.

Can you run it from a pterm, or does it have to be execed by
“userintfapp”? If you run it from a pterm, does it also cause
problems? Can you run it under the debugger?

You need to run userintfapp to set the correct params for the test to be run.

Does it do any drawing on the screen? Is it using any Pg calls, or
is all the drawing done by widgets (other than PtRaw)?

It could at different points, but I have no way to determine if I ever get close to
any of that code.

You could fprintf some messages to a file… Or run diagapp under WD
the way I described above.
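
A minimal sketch of the fprintf idea, using only standard C. The helper name `trace_line`, the log path and the messages are hypothetical; the file is opened and closed on every call so that each line is handed to the filesystem before the program continues.

```c
#include <stdio.h>

/* Hypothetical helper: append one line to a trace file and close it
 * right away, so the line has been written out before the program
 * moves on to whatever might kill it.
 * Returns 0 on success, -1 on failure. */
int trace_line(const char *log_path, const char *message)
{
    FILE *fp = fopen(log_path, "a");
    if (fp == NULL)
        return -1;
    if (fprintf(fp, "%s\n", message) < 0) {
        fclose(fp);
        return -1;
    }
    return fclose(fp) == 0 ? 0 : -1;
}
```

Calls such as `trace_line("/tmp/diagapp.trace", "about to change display size")` placed around the suspicious parts of diagapp would show how far it got before the screen went blank.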

Is “userintfapp” also a Photon application?

It needs to be run under Photon to run the testing.

What I meant was does it make any Photon library calls – does it
perhaps create any widgets? This shouldn’t really matter, unless there’s
a bug somewhere that makes the graphics driver crash when a Photon app
execs another Photon app. But we don’t know at this point whether the
driver crashes or gets killed, do we…


Wojtek Lerch (wojtek@qnx.com) QNX Software Systems Ltd.

Wojtek Lerch wrote:

John Parsons <parsonsj@esi.com> wrote:

Wojtek Lerch wrote:

John Parsons <parsonsj@esi.com> wrote:
Wojtek Lerch wrote:
John Parsons <parsonsj@esi.com> wrote:
Tried a telnet from another station to the station running the wd
application to the point of the execl() command. Do the execl() command the
telnet cannot connect to the remote site. At the bottom of the wd screen is
‘task complete’. Exit the wd session screen goes blank. No display of any
error condition. Do a sin from remote station, the initial process is held
and the child is (zombie) dead.

Does Photon behave normally until you exit from wd?

Photon behaves as normal to the point of failure of the testing process.

My point of view is that as long as we’re still investigating this, the
testing process has not failed. ;-) But I’ll assume you meant “yes”.

Okay maybe we should try this one again. I do not exit from WD. During WD the screen
goes blank.

Is that immediately after executing the exec() call?

It seems so, but on further investigation I believe that the intended program is getting to
some point and either failing or locking the system. Not sure how to prove either.

A few seconds after the exec()?
A random amount of time after the exec()?
When you press a key after the exec()?
When you just sit and wait long enough?
None of the above?

None of the above.

What I am trying to ask is whether the screen going blank seems to be an
immediate reaction to something that you do, or does it always happen a
fixed amount of time after something you do, or does it perhaps seem to
happen after a completely unpredictable amount of time while you just
sit and stare at the monitor?

It always happens a fixed amount of time after the execl() command, which is why I started with
this command as the problem. After the execl() command I’m unable to debug any further.


BTW Forgive me if I sound a bit impatient, but you must understand that
all I know about your software is what you have told me. Try to be
careful what you say and how you say it – English is not my first
language and I may not interpret things like “Exit the wd session screen
goes blank” the way you intended. And we don’t want to waste time
trying to investigate things that don’t exist, do we? ;-)

Don’t worry about it. I’m sure we will figure this out. I’ll try to improve how I describe
the different items. English as a written language is very difficult at the best of times.
Trying to be technical with correct English so that someone else understands ain’t easy
sometimes. ;-)

By “blank”, do you mean the screen is completely black? Is there a mouse
cursor? Can you move it?

By “blank”, I mean the screen is completely black, with no movement of anything because
there is nothing there. From a telnet session on a remote station, the sin command
suggests that chipsbios.ms and the display driver have been closed.
They do not show up in the list of active processes.

Is there a traceinfo entry mentioning them?

Not sure what you are asking!

“traceinfo” is a QNX utility that gives you a log of important events
that happened in your system recently. Crashes are listed among them.
If your graphics driver crashes, there will be an entry in the traceinfo
log.

That would be nice. How do I look at this information from traceinfo?


There are four sin output files attached.

sinbefore - Is during the debug session just before the execl() is executed.

Why am I seeing two sets of WD and “userintfapp” in there? Were you
running two WD sessions? Are they related? It would make it easier to
analyze the output from “sin” if there were as few irrelevant processes
running on your machine as possible.

The first session is initial start of testing. The second is a fork() just prior to the
execl(). So yes, there are two WD sessions running. I have as few processes running
as I know how at this time.

sintaskcomplete - Is right after the execl() command is executed and the screen
goes blank.

Don’t you mean after execl() but before the screen goes blank? The
graphics driver is still running at this point.

I mean that after I execute the execl() command the WD screen shows task complete at the
bottom of the screen. At the remote station the sin command does nothing until I exit
the WD at the host.

That’s because you’re running it locally in a telnet session, and the WD
problem that I mentioned before prevents “sin” from running. But if you
run "sin -n " from another node, it will work immediately. This
machine is connected to a QNX network, isn’t it?

BTW But the second WD is not, neither is the second “userintfapp”. Did
you exit from the second WD without executing the execl() call?

You exit the second WD after the execl() command is executed and the task complete
message is displayed.

Now this is getting a bit too complicated for me. From what you have
said so far, this is how I imagine what is happening:

You start “wd userintfapp” in a pterm and run it until it forks.

Then, you find the forked child’s pid and attach a new WD to it; you
leave the first WD alone and only play with the second WD from now
on. When you’re talking about “the” WD, you’re referring to this
second WD and not the first one.

Yes, you are correct so far. A helpful hint for future reference: WD1 (first wd session),
WD2 (second wd session).

Under the second WD, you let the child call exec(). WD says “task
terminated”. From now on, Photon behaves more or less normally
until ______ ,

I exit WD2, because the task is said to be complete. In the QNX4.24 world when I get the task
complete and exit WD2 the diagapp continues to run and the diagnostics works correctly. It
would seem that on the QNX4.25 system diagapp fails for some reason. I cannot debug past the
execl() so it shows up as the problem. The problem may be in something that the diagapp is
trying to do.

but you can’t run things like
“sin” in your telnet session until you exit from the (second) WD
(that’s what I call “the WD/Proc problem”).

Once you exit from the (second) WD, “sin” runs. From its output, I
can see that the graphics driver is still alive at this point, and
“diagapp” is doing some file I/O.

This could be true, but I see nothing at this point.

After ______ ,

Not sure what to insert at this point. It does appear that “diagapp has died and turned into
a zombie, and that neither the graphics driver is running”. At this point I’m lost as to
what to do next. I do not have a graphics driver, the screen is blank and diagapp is dead.

the screen turns blank. When you
run sin, it shows that “diagapp” has died and turned into a zombie,
and that neither the graphics driver is running.

If the above is correct, could you fill in the blanks? If not, could you
please give me the exact scenario in at least as much detail as the
above?

blankscreen - Is when the blank screen is displayed.

Yes, the graphics driver and the mode switcher are missing from this
one, and process 1122 “diagapp” has turned into a zombie. And these are
the only differences between the two logs – both the “userintfapp” and
the WD are still running. This doesn’t seem consistent with my
assumption that it’s exiting from WD that makes the screen go blank.
What exactly did you do between the previous “sin” and this one?
Or does the graphics driver die simply because you’re letting “diagapp”
run for a while, without having to touch the keyboard or the mouse?

I exited the WD after execl(). This allowed the ‘sintaskcomplete’ to complete. The
screen goes blank and then I did a blankscreen. The reason was to show the difference
just before and just after the blank screen condition.

Didn’t you just say that the screen goes blank “during WD”?
You’re not just trying to confuse me, are you? ;-)

Sorry, you’re correct. I exit WD2, then the screen goes blank. :-)

sinafterreset - Is after the reset process (alt-del-shift-backspace) and I have
returned to a normal screen.

I can’t explain how the Ctrl-Shift-Alt-Bkspace can do anything to the
screen if the mode switcher is already dead. But it does bring you back
to text mode, with the shell prompt and any previous shell commands and
their output visible, correct?

Yes!

There is only one explanation I can think of at this point: the blank
screen is already in text mode, but you’re looking at an empty console.
The Ctrl-Alt-Shift-BkSp shuts down your Photon, but also causes a
text-mode console switch to your first console.

To see whether that is the case, run a shell on every text-mode console
before starting Photon. This way, there will be something on every
text-mode console, and you’ll be able to distinguish between a
graphics-mode blank screen and text mode.

BTW If you run “crttrap start” from your telnet session when the screen
is blank, it should restart the graphics driver. It might be
interesting to see what is going on in Photon before you kill it…

I tried the “crttrap start” idea and got Photon back up. A sin command shows that
diagapp is dead, so this is where the problem must be. Diagapp should continue to run until
the completion of the diagnostic test. How do I start a WD3 after the execl() so I can find
out where diagapp is being killed?


Can you tell me more about this “diagapp” program:

diagapp is a diagnostic application that could possibly kill chipsbios.ms and Pg.chips
so that a new display screen size could be generated. Is there a way to stop the

Uh… Isn’t it then possible that your “diagapp” indeed kills
chipsbios.ms and Pg.chips, and then dies? Why didn’t you mention before
that it can do that?

Yes, it is possible. I did not mention it because I was not aware of where the problem was or
is. You must remember that it works fine on a QNX4.24 system and does not work on QNX4.25,
so I’m trying to determine where in the whole application the failure may be. This application
is proven customer code of very large size.

execl() so that a debug session could be started to determine where in diagapp the
failure occurs?

You could have a command-line option or environment variable causing
“diagapp” to call raise(SIGSTOP) at startup – this will let you attach
a WD to it.

I do not know how to do this.


Or, you could have a command-line option to “userintfapp”
that makes it run “wd diagapp …” instead of just “diagapp …”.

I do not think this would be possible under the present structure of the code.

Can you run it from a pterm, or does it have to be execed by
“userintfapp”? If you run it from a pterm, does it also cause
problems? Can you run it under the debugger?

You need to run userintfapp to set the correct params for the test to be run.

Does it do any drawing on the screen? Is it using any Pg calls, or
is all the drawing done by widgets (other than PtRaw)?

It could at different points, but I have no way to determine if I ever get close to
any of that code.

You could fprintf some messages to a file… Or run diagapp under WD
the way I described above.

Is “userintfapp” also a Photon application?

It needs to be run under Photon to run the testing.

What I meant was does it make any Photon library calls – does it
perhaps create any widgets?

NO.

This shouldn’t really matter, unless there’s
a bug somewhere that makes the graphics driver crash when a Photon app
execs another Photon app. But we don’t know at this point whether the
driver crashes or gets killed, do we…

If I get into diagapp far enough it is possible that the driver is getting killed.

By the way, I have inherited this code from people that are no longer at the company. So
there are large parts of this code that I do not know very well. This makes it hard to know
just what diagapp does at what time.

Sure hope this helps. Sounds to me we have narrowed it down to something in diagapp.


Wojtek Lerch (wojtek@qnx.com) QNX Software Systems Ltd.

Previously, John Parsons wrote in qdn.public.qnx4:

By “blank”, I mean the screen is completely black, with no movement of anything because
there is nothing there.

I have seen something similar before with Photon and WD. If you are
running wd against a photon app you must be EXTREMELY CAREFUL that you
DO NOT single-step into any of the photon shared library functions. As
soon as you single-step into one of them (which essentially sets a
breakpoint inside the shared lib) you impact ALL photon applications.
This always resulted in a blank screen for me, and a locked system that
needed to be rebooted.

Hope that helps.

Cheers,
Camz.


Martin Zimmerman camz@passageway.com
Camz Software Enterprises www.passageway.com/camz/qnx/
QNX Programming & Consulting

John Parsons <parsonsj@esi.com> wrote:

Wojtek Lerch wrote:
John Parsons <parsonsj@esi.com> wrote:
Wojtek Lerch wrote:
John Parsons <parsonsj@esi.com> wrote:
Wojtek Lerch wrote:
John Parsons <parsonsj@esi.com> wrote:
Tried a telnet from another station to the station running the wd
application to the point of the execl() command. Do the execl() command the
telnet cannot connect to the remote site. At the bottom of the wd screen is
‘task complete’. Exit the wd session screen goes blank. No display of any
error condition. Do a sin from remote station, the initial process is held
and the child is (zombie) dead.

What I am trying to ask is whether the screen going blank seems to be an
immediate reaction to something that you do, or does it always happen a
fixed amount of time after something you do, or does it perhaps seem to
happen after a completely unpredictable amount of time while you just
sit and stare at the monitor?

It always happens a fixed amount of time after the execl() command, which is why I started with
this command as the problem. After the execl() command I’m unable to debug any further.

Of course not. Were you expecting WD to automatically load “diagapp”
for you? I don’t think I have ever thought of trying something like
that, but it wasn’t possible in QNX 4.24 either, was it?

Is there a traceinfo entry mentioning them?
Not sure what you are asking!
“traceinfo” is a QNX utility that gives you a log of important events
that happened in your system recently. Crashes are listed among them.
If your graphics driver crashes, there will be an entry in the traceinfo
log.

That would be nice. How do I look at this information from traceinfo?

“traceinfo” is a program. Just run it.

Under the second WD, you let the child call exec(). WD says “task
terminated”. From now on, Photon behaves more or less normally
until ______ ,

I exit WD2, because the task is said to be complete. In the QNX4.24 world when I get the task
complete and exit WD2 the diagapp continues to run and the diagnostics works correctly. It
would seem that on the QNX4.25 system diagapp fails for some reason. I cannot debug past the
execl() so it shows up as the problem. The problem may be in something that the diagapp is
trying to do.

Yes. My suspicion is that it kills your driver and then either exits or
crashes.

but you can’t run things like
“sin” in your telnet session until you exit from the (second) WD
(that’s what I call “the WD/Proc problem”).

Once you exit from the (second) WD, “sin” runs. From its output, I
can see that the graphics driver is still alive at this point, and
“diagapp” is doing some file I/O.

This could be true, but I see nothing at this point.



After ______ ,

Not sure what to insert at this point. It does appear that "diagapp has died and turned into

Something along the lines of “After I execute the execl() in WD2 and
wait for about ten seconds” would suffice…

You could have a command-line option or environment variable causing
“diagapp” to call raise(SIGSTOP) at startup – this will let you attach
a WD to it.
I do not know how to do this.
Or, you could have a command-line option to “userintfapp”
that makes it run “wd diagapp …” instead of just “diagapp …”.

I do not think this would be possible under the present structure of the code.

Would it be impossible to simply replace the call

execl( path, "diagapp", options, NULL );

with

execlp( "wd", "wd", path, options, NULL );

This way, userintfapp will run the WD3 for you.

If you want to be able to turn this change on and off without
recompiling userintfapp, an environment variable should do the trick:

if ( getenv( "RUN_DIAGAPP_IN_WD" ) )
    execlp( "wd", "wd", path, options, NULL );
else
    execl( path, "diagapp", options, NULL );

Once you have “diagapp” in WD3, set a breakpoint on the code that kills
the driver and let it run…
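For completeness, here is a slightly fuller sketch of the same switch. It is only an illustration: the function names are hypothetical, a single options string is assumed as in the snippet above, and execvp()/execv() are used in place of execlp()/execl() so that the argument list can be built and inspected separately.

```c
#include <stdlib.h>
#include <unistd.h>

/*
 * Fills argv (room for 4 entries) and returns the program to exec.
 * With use_debugger set, the result is "wd <path> <options>";
 * otherwise it is <path> run with argv[0] set to "diagapp".
 */
const char *build_diag_command(const char *path, const char *options,
                               int use_debugger, const char *argv[4])
{
    int i = 0;

    if (use_debugger) {
        argv[i++] = "wd";
        argv[i++] = path;
    } else {
        argv[i++] = "diagapp";
    }
    argv[i++] = options;
    argv[i] = NULL;

    return use_debugger ? "wd" : path;
}

/* Replaces the current process image; returns -1 only if the exec fails. */
int launch_diagapp(const char *path, const char *options)
{
    const char *argv[4];
    int use_debugger = getenv("RUN_DIAGAPP_IN_WD") != NULL;
    const char *prog = build_diag_command(path, options, use_debugger, argv);

    if (use_debugger)
        execvp(prog, (char *const *)argv);   /* searches PATH for wd */
    else
        execv(prog, (char *const *)argv);    /* path is used as given */

    return -1;
}
```

Checking the return value of launch_diagapp() and printing errno when it is -1 would also make a failed exec visible instead of silent.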

If I get into diagapp far enough it is possible that the driver is getting killed.

That’s my theory of what happens…

Sure hope this helps. Sounds to me we have narrowed it down to something in diagapp.

It certainly looks that way.


Wojtek Lerch (wojtek@qnx.com) QNX Software Systems Ltd.

Wojtek Lerch wrote:

John Parsons <parsonsj@esi.com> wrote:
Wojtek Lerch wrote:
John Parsons <parsonsj@esi.com> wrote:
Wojtek Lerch wrote:
John Parsons <parsonsj@esi.com> wrote:
Wojtek Lerch wrote:
John Parsons <parsonsj@esi.com> wrote:
Tried a telnet from another station to the station running the wd
application to the point of the execl() command. Do the execl() command the
telnet cannot connect to the remote site. At the bottom of the wd screen is
‘task complete’. Exit the wd session screen goes blank. No display of any
error condition. Do a sin from remote station, the initial process is held
and the child is (zombie) dead.

What I am trying to ask is whether the screen going blank seems to be an
immediate reaction to something that you do, or does it always happen a
fixed amount of time after something you do, or does it perhaps seem to
happen after a completely unpredictable amount of time while you just
sit and stare at the monitor?

It always happens a fixed amount of time after the execl() command, which is why I started with
this command as the problem. After the execl() command I’m unable to debug any further.

Of course not. Were you expecting WD to automatically load “diagapp”
for you? I don’t think I have ever thought of trying something like
that, but it wasn’t possible in QNX 4.24 either, was it?

I never tried it in 4.24, but I would think that it wasn’t possible.

Is there a traceinfo entry mentioning them?
Not sure what you are asking!
“traceinfo” is a QNX utility that gives you a log of important events
that happened in your system recently. Crashes are listed among them.
If your graphics driver crashes, there will be an entry in the traceinfo
log.

That would be nice. How do I look at this information from traceinfo?

“traceinfo” is a program. Just run it.

Cool. I like that one, I’ll have to work with it to decode what it is telling me.


Under the second WD, you let the child call exec(). WD says “task
terminated”. From now on, Photon behaves more or less normally
until ______ ,

I exit WD2, because the task is said to be complete. In the QNX4.24 world when I get the task
complete and exit WD2 the diagapp continues to run and the diagnostics works correctly. It
would seem that on the QNX4.25 system diagapp fails for some reason. I cannot debug past the
execl() so it shows up as the problem. The problem may be in something that the diagapp is
trying to do.

Yes. My suspicion is that it kills your driver and then either exits or
crashes.

Yes, I’m almost sure now that it does this.

but you can’t run things like
“sin” in your telnet session until you exit from the (second) WD
(that’s what I call “the WD/Proc problem”).

Once you exit from the (second) WD, “sin” runs. From its output, I
can see that the graphics driver is still alive at this point, and
“diagapp” is doing some file I/O.

This could be true, but I see nothing at this point.



After ______ ,

Not sure what to insert at this point. It does appear that "diagapp has died and turned into

Something along the lines of “After I execute the execl() in WD2 and
wait for about ten seconds” would suffice…

Okay, “After I execute the execl() the screen goes blank almost immediately”.

Oh by the way, I have continued to play with debugging the problem further. If I run crttrap start
from the telnet session I get the photon screen back as I have said. I end up back at WD1.
Pressing the ‘F5’ key will run WD1 to ‘task complete’. On the pterm screen there is an indication
of error conditions as follows.


Wojtek,

While writing the above, I realized that the error condition being displayed was the answer
to the problem. On a QNX4.24 system, ‘chipsbios.ms’ and ‘Pg.chips’ live in ‘/usr/photon/bin/crt’
and ‘/usr/photon/bin’. On a QNX4.25 system, chipsbios.ms and Pg.chips live in
‘/qnx4/graphics/drivers’. In the code I have been working on, someone had hard-coded the location of
chipsbios.ms and Pg.chips, so the program could not find them when moved from one O/S to the other. By
correcting the location, the diagnostics now work.
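One way to avoid this kind of breakage in future, sketched here as an assumption rather than as the fix that was actually applied, is to probe a list of candidate directories at run time instead of hard-coding a single one. The function name is hypothetical; the directories in the usage note are the ones mentioned above.

```c
#include <string.h>
#include <unistd.h>

/*
 * Looks for an executable called `name` in each directory of the
 * NULL-terminated list `dirs`. On success the full path is written to
 * `out` and returned; if nothing is found, NULL is returned.
 */
const char *find_in_dirs(const char *name, const char *const dirs[],
                         char *out, size_t outlen)
{
    size_t i;

    for (i = 0; dirs[i] != NULL; i++) {
        /* directory + '/' + name + terminating NUL must fit in out */
        if (strlen(dirs[i]) + 1 + strlen(name) + 1 > outlen)
            continue;

        strcpy(out, dirs[i]);
        strcat(out, "/");
        strcat(out, name);

        if (access(out, X_OK) == 0)
            return out;
    }
    return NULL;
}
```

A caller could pass a list such as { "/qnx4/graphics/drivers", "/usr/photon/bin/crt", "/usr/photon/bin", NULL } and report a clear error when NULL comes back, so a missing driver is reported instead of producing a blank screen.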

I would like to thank you for all your help in this matter. I’m still finding ways to debug code on
a QNX system and your help has been great.

With best Regards
John