memory leak detection

Hello,
I have a nasty bug that appears to kill Proc32. My system (QNX 4.25
Patch E) can run fine for at least 18 hours, but then gets into a
condition where the OS dies if:

  1. My program enters a certain portion of code. That code includes a
    malloc but the malloc seems to successfully complete. It might be
    croaking on a timed select() call or an interrupt handler, although both
    are handled many, many times without any problem.
  2. I run the “sysmon” utility on another console.
  3. I log in on another console.

I suspect that there’s some kind of memory leak, so I was hoping that
someone could tell me about tools for finding leaks in QNX 4.25. Any
help would be very much appreciated.

Thanks,

Mark Faust

P.S. I have the QNX 2000 conference notes that Ian Zagorskih mentioned
in the devtools news group on 7/23, but I have not fully digested them
yet.

Mark Faust <mark_faust@sri.com> wrote:

Hello,
I have a nasty bug that appears to kill Proc32. My system (QNX 4.25
Patch E) can run fine for at least 18 hours, but then gets into a
condition where the OS dies if:

The OS dies… do you get a Proc register dump? If so, can you post it?
If not, can you describe the symptoms of the condition after failure?

-David

QNX Training Services
http://www.qnx.com/support/training/
Please follow up in this newsgroup if you have further questions.

The OS dies… do you get a Proc register dump? If so, can you post it.
If not, can you describe the symptoms of the condition after failure.

I do get a Proc register dump. I will have to disable the watchdog
timer and wait for another crash before I can post the dump from one of
the current batch of crashes, but here’s what I wrote down from an
earlier dump (this one occurred when I tried to login):

ver 4.25L
ldt 0 fault e+0
cs:eip=f0:89c1 ss:esp=f8:1460 ef1=12097
ds=f8 es=f8 fs=0 gs=0
Stack(f8.1424)

Thanks

Mark Faust <mark_faust@sri.com> wrote:

The OS dies… do you get a Proc register dump? If so, can you post it.
If not, can you describe the symptoms of the condition after failure.

I do get a Proc register dump. I will have to disable the watchdog
timer and wait for another crash before I can post the dump from one of
the current batch of crashes, but here’s what I wrote down from an
earlier dump (this one occurred when I tried to login):

I always like to clarify: when a customer says “The OS dies” it can
mean anything from a Proc dump to a process spinning at a high priority…
some of which are OS issues, and some of which are the OS doing exactly
what it is supposed to, but not what was expected.

A Proc register dump definitely qualifies as “OS died” from our
point of view.

ver 4.25L
ldt 0 fault e+0
cs:eip=f0:89c1 ss:esp=f8:1460 ef1=12097
ds=f8 es=f8 fs=0 gs=0
Stack(f8.1424)

A full dump next time would probably be helpful… I don’t read Proc
dumps myself… but someone might be able to tell you something about
that one…

-David

It may take a while to get another Proc dump. In the meantime, do you
have any suggestions for my original question about tools for finding
memory leaks? I am looking for purpose-designed tools, or for
combinations of regular system utilities that can help identify
leaks. Anything would help.

Thanks,

Mark

There are lots of malloc replacements available on the net for detecting
leaks. You could also write your own, which wouldn’t be too hard: guard
areas, reference counts on malloc/free, and so on.
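The home-grown approach mentioned above (guard areas plus a count of mallocs against frees) can be sketched roughly as follows. This is a minimal illustration in portable C, not a QNX-specific tool; the names `dbg_malloc`, `dbg_free`, `dbg_check`, `dbg_live_blocks` and `dbg_live_bytes` are hypothetical and do not come from any library mentioned in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical debug allocator: all names here are illustrative only. */
#define GUARD_SIZE  8      /* bytes of guard pattern on each side of a block */
#define GUARD_BYTE  0xA5   /* pattern written into the guard areas */
#define HEADER_SIZE 32     /* multiple of 16, keeps the user pointer aligned */

static long   live_blocks = 0;  /* dbg_malloc calls minus dbg_free calls */
static size_t live_bytes  = 0;  /* bytes currently handed out */

/* Layout: [size | padding | front guard][user data][rear guard] */
void *dbg_malloc(size_t size)
{
    unsigned char *raw = malloc(HEADER_SIZE + size + GUARD_SIZE);
    if (raw == NULL)
        return NULL;
    memcpy(raw, &size, sizeof size);                      /* remember the size */
    memset(raw + HEADER_SIZE - GUARD_SIZE, GUARD_BYTE, GUARD_SIZE);
    memset(raw + HEADER_SIZE + size, GUARD_BYTE, GUARD_SIZE);
    live_blocks++;
    live_bytes += size;
    return raw + HEADER_SIZE;
}

/* Returns 0 if both guards are intact; bit 0 = underrun, bit 1 = overrun. */
int dbg_check(const void *ptr)
{
    const unsigned char *user = ptr;
    const unsigned char *raw = user - HEADER_SIZE;
    size_t size, i;
    int damaged = 0;

    memcpy(&size, raw, sizeof size);
    for (i = 0; i < GUARD_SIZE; i++) {
        if (raw[HEADER_SIZE - GUARD_SIZE + i] != GUARD_BYTE)
            damaged |= 1;
        if (user[size + i] != GUARD_BYTE)
            damaged |= 2;
    }
    return damaged;
}

void dbg_free(void *ptr)
{
    unsigned char *raw;
    size_t size;

    if (ptr == NULL)
        return;
    assert(live_blocks > 0);   /* fires on a free with no matching malloc */
    if (dbg_check(ptr) != 0)
        fprintf(stderr, "dbg_free: guard area damaged at %p\n", ptr);
    raw = (unsigned char *)ptr - HEADER_SIZE;
    memcpy(&size, raw, sizeof size);
    live_blocks--;
    live_bytes -= size;
    free(raw);
}

long   dbg_live_blocks(void) { return live_blocks; }
size_t dbg_live_bytes(void)  { return live_bytes; }
```

Logging `dbg_live_blocks()` and `dbg_live_bytes()` at intervals over a long run would show whether the counts climb steadily, which is the usual sign of a leak; a nonzero `dbg_check()` result points at a buffer overrun or underrun instead.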

The Proc dump you sent in doesn’t seem to be bombing in Proc itself (which
is strange), and it looks to be on a reference (read) to a non-existent page.
It may be related to your interrupt handler and what it’s attempting to
touch. Does the system still crash within 18 hours without your application
running? If you change your interrupt handler to just return a proxy (or
nothing), touching nothing outside of itself and doing nothing else, does it
still crash?
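For readers wondering how a read of a non-existent page can be read off the dump: assuming the `fault e+0` line means CPU exception 0xE (page fault) with error code 0, which is an interpretation and not something the thread confirms, the standard x86 page-fault error code bits decode as in the sketch below. The `pf_decode` and `pf_print` names are made up for illustration.

```c
#include <stdio.h>

/* Low three bits of the x86 page-fault (exception 0xE) error code. */
#define PF_PROTECTION 0x1u  /* 0 = page not present, 1 = protection violation */
#define PF_WRITE      0x2u  /* 0 = read access,      1 = write access */
#define PF_USER       0x4u  /* 0 = supervisor mode,  1 = user mode */

typedef struct {
    int protection;  /* nonzero: protection violation; zero: page not present */
    int write;       /* nonzero: write access; zero: read access */
    int user;        /* nonzero: user mode; zero: supervisor mode */
} pf_info;

/* Hypothetical helper: split an error code into its three flags. */
pf_info pf_decode(unsigned err)
{
    pf_info info;
    info.protection = (err & PF_PROTECTION) != 0;
    info.write      = (err & PF_WRITE) != 0;
    info.user       = (err & PF_USER) != 0;
    return info;
}

/* Hypothetical helper: print a one-line description of an error code. */
void pf_print(unsigned err)
{
    pf_info info = pf_decode(err);
    printf("page fault, error code %x: %s, %s access, %s mode\n",
           err,
           info.protection ? "protection violation" : "page not present",
           info.write ? "write" : "read",
           info.user ? "user" : "supervisor");
}
```

Under that assumption, an error code of 0 has all three bits clear, which corresponds to a read of a page that is not present, made from supervisor mode.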

Cheers,
Adam

QNX Software Systems Ltd.
[ amallory@qnx.com ]

With a PC, I always felt limited by the software available.
On Unix, I am limited only by my knowledge.
–Peter J. Schoenster <pschon@baste.magibox.net>

The Proc dump you sent in doesn’t seem to be bombing in Proc itself (which
is strange), and it looks to be on a reference (read) to a non-existent page.

It may be related to your interrupt handler and what it’s attempting to
touch. Does the system still crash within 18 hours without your application
running?

Even if the application is running, the system doesn’t crash unless you
do something, like try to log in on a different console or run a program
(it was the sysmon utility, to be precise). Sometimes the application
can go for days and not crash even if we repeatedly subject it to the
action that we originally thought was the cause of the crash.

If you change your interrupt handler to just return a proxy (or nothing), touching
nothing outside of itself, and doing nothing else, does it still crash?

Here’s the code for the interrupt handler. The counter value was just
there for debugging, so we can get rid of that and do the test that you
suggest.

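extern pid_t pidInterruptProxy;        // proxy triggered on each interrupt
extern volatile unsigned counter;      // debug only: counts interrupts seen

// The hardware interrupt handler
#pragma off( check_stack );
pid_t far DIOInterruptHandler()
{
    counter++;                         // debug only, safe to remove
    return( pidInterruptProxy );       // returning the proxy triggers it
} // end of DIOInterruptHandler()
#pragma on( check_stack );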

A full dump next time would probably be helpful… I don’t read Proc
dumps myself… but someone might be able to tell you something about
that one…

Here’s a full dump from our latest crash

Version 425.L Feb 15 2001
Proc fault 1, ldt 100 sys/Proc32; fault e+0
cs:eip=5:89c1 ss:esp=d:f7c0f50 ef1=12097 ds=d es=d fs=0 gs=0
eax/44b2 ebx/3d496bd3 ecx/48dc4fd edx/1 esi/0 edi/1 ebp/f7c0f5c
Stack (d:f7c0f50)
20d32a0a 048dc4fd ffffff01 0f7c0fa0 00003820 0000a3b6 00008a48 00003820
0000a3b6 00008000 20d32a0a 00008c04 0001dc20 00000000 00000000 00000000
3d496db3 00000001 00000001 00000001 0f7c0fb8 00000000 00005965 0001dc20
00008fd5 00000015 0f7c0fe8 00005965 0000000b 00003822 0f7c0fd0 000057aa
Process entry (addr 6050)
00000000 00000001 00000000 00000001 00000000 00000000 30020207 00001e1e
00005840 0100000d 00006108 ffffffff 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000501 000d0005 00007118 00000000 000000cc
00000022 00000000 00000034 0000c140 00000000 00000000 0001e490 00000000
00000000 00000000 00000000 ffff0001 00000000 00000000 00000000

This looks like a timer-related issue, although the top portion of the
dump doesn’t match what you posted previously. The segment selector
makes more sense now. You should contact technical support (if you have
a plan) or your sales rep so you can try out Proc32 version N, which
addresses some problems with the timers (insertion, deletion, etc.).

Cheers,
Adam

Thank you very much. Is Proc32 version N a beta version? Version L
came with Patch E and is the latest released version that I know about.

Mark

“Mark Faust” <mark_faust@sri.com> wrote in message
news:3D49E916.6938CDE6@sri.com

Thank you very much. Is Proc32 version N a beta version? Version L
came with Patch E and is the latest released version that I know about.

Proc32 version N is a beta version, and hasn’t been released.


Cheers,
Adam
