Nasty looking Photon bug

I’ve just spent two days tracking down a very nasty
bug that caused a fairly mature program to crash.
The interesting fact is that while the program runs
fine under Photon 1.13, it fails under 1.14.
I’ve also isolated the problem to being in one
of the dynamic libraries, phlib_render_11, phlib_s11
or Aplib_s11. Most likely is is the Aplib_s11 as
you will see.

I’ve distilled the problem down to a very time
workable program which is uploaded to my
quics “maschoen” account as the file edit.pax.F.

Here is the jist of the problem. The following
subroutine is called. The widget data that “button”
points to is saved after the ApCreateWidget() call.
After the second ApCreateWidget() call is made, the
save data is compared with “button” again and found
to be different.

I hope that this is actually something simple and that
someone can point out my dumb mistake, or else report
that a fix is already on the way.

Thanks



void init_widgets()
{
ApDBase_t *widget_dbase;

if ((widget_dbase = ApOpenDBase( ABM_button_picture )) == NULL)
{
fprintf(stderr,“edit: ApOpenDBase open failed 1\n”);
exit(-1);
}

if ((button = ApCreateWidget(widget_dbase,“main_button”,0,0,0, NULL))
== NULL)
{
fprintf(stderr,“ApCreateWidget failed 1\n”);
exit(-1);
}

if ((label1=ApCreateWidget(widget_dbase,“but_label1_lbl”,0,0,0,NULL))
== NULL)
{
fprintf(stderr,“ApCreateWidget failed 2”);
exit(-1);
}
}

not sure if this is related, but something to always watch
for is that your parent widget is set before you create a
new widget… either from a database or not.

e.g.

PtSetParentWidget( parent );

widget = ApCreateWidget( blah…)

i’ve found that unless you explicitly set a parent then you could
have behaviour that you don’t expect.

the technique you are using (a widget database) is so so common that i
would hope there would be no bugs in there… at least none that i have
ever seen. but likely in all of our code we ensure that a parent widget
is always set.

Mitchell Schoenbrun <maschoen@pobox.com> wrote:

I’ve just spent two days tracking down a very nasty
bug that caused a fairly mature program to crash.
The interesting fact is that while the program runs
fine under Photon 1.13, it fails under 1.14.
I’ve also isolated the problem to being in one
of the dynamic libraries, phlib_render_11, phlib_s11
or Aplib_s11. Most likely is is the Aplib_s11 as
you will see.

I’ve distilled the problem down to a very time
workable program which is uploaded to my
quics “maschoen” account as the file edit.pax.F.

Here is the jist of the problem. The following
subroutine is called. The widget data that “button”
points to is saved after the ApCreateWidget() call.
After the second ApCreateWidget() call is made, the
save data is compared with “button” again and found
to be different.

I hope that this is actually something simple and that
someone can point out my dumb mistake, or else report
that a fix is already on the way.

Thanks



void init_widgets()
{
ApDBase_t *widget_dbase;

if ((widget_dbase = ApOpenDBase( ABM_button_picture )) == NULL)
{
fprintf(stderr,“edit: ApOpenDBase open failed 1\n”);
exit(-1);
}

if ((button = ApCreateWidget(widget_dbase,“main_button”,0,0,0, NULL))
== NULL)
{
fprintf(stderr,“ApCreateWidget failed 1\n”);
exit(-1);
}

if ((label1=ApCreateWidget(widget_dbase,“but_label1_lbl”,0,0,0,NULL))
== NULL)
{
fprintf(stderr,“ApCreateWidget failed 2”);
exit(-1);
}
}


Randy Martin randy@qnx.com
Manager of FAE Group, North America
QNX Software Systems www.qnx.com
175 Terence Matthews Crescent, Kanata, Ontario, Canada K2M 1W8
Tel: 613-591-0931 Fax: 613-591-3579

Mitchell Schoenbrun <maschoen@pobox.com> wrote:

I’ve just spent two days tracking down a very nasty
bug that caused a fairly mature program to crash.
The interesting fact is that while the program runs
fine under Photon 1.13, it fails under 1.14.
I’ve also isolated the problem to being in one
of the dynamic libraries, phlib_render_11, phlib_s11
or Aplib_s11. Most likely is is the Aplib_s11 as
you will see.

I’ve distilled the problem down to a very time
workable program which is uploaded to my
quics “maschoen” account as the file edit.pax.F.

Here is the jist of the problem. The following
subroutine is called. The widget data that “button”
points to is saved after the ApCreateWidget() call.
After the second ApCreateWidget() call is made, the
save data is compared with “button” again and found
to be different.

When you’re creating the second widget, it becomes the brother of the
first one, and the library changes button->brother from NULL to the new
widget’s address.

I hope that this is actually something simple and that
someone can point out my dumb mistake, or else report
that a fix is already on the way.

Your mistake was to assume that the data that a widget pointer points to
doesn’t change unless you set that widget’s resources.

void init_widgets()
{
ApDBase_t *widget_dbase;

if ((widget_dbase = ApOpenDBase( ABM_button_picture )) == NULL)
{
fprintf(stderr,“edit: ApOpenDBase open failed 1\n”);
exit(-1);
}

if ((button = ApCreateWidget(widget_dbase,“main_button”,0,0,0, NULL))
== NULL)
{
fprintf(stderr,“ApCreateWidget failed 1\n”);
exit(-1);
}

if ((label1=ApCreateWidget(widget_dbase,“but_label1_lbl”,0,0,0,NULL))
== NULL)
{
fprintf(stderr,“ApCreateWidget failed 2”);
exit(-1);
}
}


Wojtek Lerch (wojtek@qnx.com) QNX Software Systems Ltd.

Previously, Randy Martin wrote in qdn.public.qnx4.photon:

not sure if this is related, but something to always watch
for is that your parent widget is set before you create a
new widget… either from a database or not.

e.g.

PtSetParentWidget( parent );

Randy,

Thanks for the feedback. The PtSetParentWidget()
occurs in a routine before this is called. I agree
that it is unlikely that the code is generally bad.
I suspect rather that something about the way the
widgets are configured in the database might cause
this. Of course there is no way for me to track this
down which is why I posted it.

The problem is confirmed on at least three
computers of each type. 3 with 1.13 work and
3 with 1.14 fail. That is for the full fledged
bug. I haven’t tested the diminitive version as
extensively yet.

Mitchell

Mitchell Schoenbrun --------- maschoen@pobox.com

Previously, Wojtek Lerch wrote in qdn.public.qnx4.photon:

When you’re creating the second widget, it becomes the brother of the
first one, and the library changes button->brother from NULL to the new
widget’s address.

Dang, back to the drawing board. I’m still seeing the problem
occur only in 1.14 and not in 1.13. I’ll be back. :frowning:.

Mitchell Schoenbrun --------- maschoen@pobox.com

Mitchell Schoenbrun <maschoen@pobox.com> wrote:

Previously, Wojtek Lerch wrote in qdn.public.qnx4.photon:

When you’re creating the second widget, it becomes the brother of the
first one, and the library changes button->brother from NULL to the new
widget’s address.

Dang, back to the drawing board. I’m still seeing the problem
occur only in 1.14 and not in 1.13. I’ll be back. > :frowning:> .

I really have no idea why you didn’t see it in 1.13…


Wojtek Lerch (wojtek@qnx.com) QNX Software Systems Ltd.

Previously, Wojtek Lerch wrote in qdn.public.qnx4.photon:

I really have no idea why you didn’t see it in 1.13…

I did not test this program in 1.13. Stupid of me I guess.
The real program is huge and is part of an ensamble of
programs with lots of associated .bmp files. All told
the thing weighs in at around 7Meg.

I’ve distilled it down to a single program of about 200
lines, and three .bmp files. I’ve uploaded it to my
account on QUICS “maschoen”.

It aborts in the ozone somewhere, not in my code. It
works perfectly in 1.13, but fails under both 1.14 and
1.14A.

I’d like the make the problem even more obvious, but the
failure is somewhat delicate in the current abreviated
state. There are any number of minor things that can be
removed from the code that will cause the abort to go away.
There probably over a dozen different single lines that can
be removed that cause the problem to hide. I’ve tried
almost every combination to see what could and what could
not be removed as to make it as easy as possible for you
to track down.

I’ve spent three days getting this down to this state. If
only it were an error on my part I’d be quite happy but
I don’t think I have any way to find it at this point.

Thank you for any assistance that you can render.

Mitchell

Mitchell Schoenbrun --------- maschoen@pobox.com

Once compiled, you run the program and press the OK button.
Under 1.13 nothing happens, which is good. Under 1.14 the
program crashes and the window disappears.


Previously, Mitchell Schoenbrun wrote in qdn.public.qnx4.photon:

Previously, Wojtek Lerch wrote in qdn.public.qnx4.photon:

I really have no idea why you didn’t see it in 1.13…

I did not test this program in 1.13. Stupid of me I guess.
The real program is huge and is part of an ensamble of
programs with lots of associated .bmp files. All told
the thing weighs in at around 7Meg.

I’ve distilled it down to a single program of about 200
lines, and three .bmp files. I’ve uploaded it to my
account on QUICS “maschoen”.

It aborts in the ozone somewhere, not in my code. It
works perfectly in 1.13, but fails under both 1.14 and
1.14A.

I’d like the make the problem even more obvious, but the
failure is somewhat delicate in the current abreviated
state. There are any number of minor things that can be
removed from the code that will cause the abort to go away.
There probably over a dozen different single lines that can
be removed that cause the problem to hide. I’ve tried
almost every combination to see what could and what could
not be removed as to make it as easy as possible for you
to track down.

I’ve spent three days getting this down to this state. If
only it were an error on my part I’d be quite happy but
I don’t think I have any way to find it at this point.

Thank you for any assistance that you can render.

Mitchell

Mitchell Schoenbrun --------- > maschoen@pobox.com


Mitchell Schoenbrun --------- maschoen@pobox.com

Mitchell Schoenbrun <maschoen@pobox.com> wrote:

Once compiled, you run the program and press the OK button.
Under 1.13 nothing happens, which is good. Under 1.14 the
program crashes and the window disappears.

It seems that the crash can be prevented by adding

back_button_ready_img->flags &= ~Ph_RELEASE_TRANSPARENCY_MASK;

after

PhMakeTransBitmap(back_button_ready_img,Pg_WHITE);

I think the problem is that you’re giving this image to the same widget
several times. The second time around, the widget frees the transparecy
bitmap, and the next time it tries to free it again, which corrupts the
heap.

Again, I’m not sure why it didn’t crash on 1.13. From what I’ve heard,
Watcom’s free() usually tolerates attempts to free the same memory twice
– maybe the pattern of malloc() and free() calls that the 1.14 library
makes in this case is just unlucky…


Wojtek Lerch (wojtek@qnx.com) QNX Software Systems Ltd.

That apparently was the problem. Thank you for the assistance.
Might I suggest that a reasonable kindness on the part of
the API routines would be to only try to release the
transparency bitmap once?

Previously, Wojtek Lerch wrote in qdn.public.qnx4.photon:

Mitchell Schoenbrun <> maschoen@pobox.com> > wrote:
Once compiled, you run the program and press the OK button.
Under 1.13 nothing happens, which is good. Under 1.14 the
program crashes and the window disappears.

It seems that the crash can be prevented by adding

back_button_ready_img->flags &= ~Ph_RELEASE_TRANSPARENCY_MASK;

after

PhMakeTransBitmap(back_button_ready_img,Pg_WHITE);

I think the problem is that you’re giving this image to the same widget
several times. The second time around, the widget frees the transparecy
bitmap, and the next time it tries to free it again, which corrupts the
heap.

Again, I’m not sure why it didn’t crash on 1.13. From what I’ve heard,
Watcom’s free() usually tolerates attempts to free the same memory twice
– maybe the pattern of malloc() and free() calls that the 1.14 library
makes in this case is just unlucky…


Wojtek Lerch (> wojtek@qnx.com> ) QNX Software Systems Ltd.


Mitchell Schoenbrun --------- maschoen@pobox.com

Mitchell Schoenbrun <maschoen@pobox.com> wrote:

Might I suggest that a reasonable kindness on the part of
the API routines would be to only try to release the
transparency bitmap once?

The RELEASE flags define a manual method of controlling which copy of a
PhImage_t structure will free the various pieces of image data when
given to PhReleaseImage(). The API defines it to be manual, and there’s
no way of changing that without inventing a new API, incompatible with
the existing PhImage_t structure. I agree that it would be nicer to
have some kind of a reference-counting mechanism allowing a widget to
know whether the image is still in use by other widgets. Unfortunately,
there’s no way PhImage_t can be that mechanism.

The way things are, it’s your responsibility to avoid creating two
copies of a PhImage_t that have the same REEASE flag set. If you are
creating multiple copies – for instance, by giving the image to
different widgets, or to the same widgets several times – you have to
make sure that the one that has the flag is the last to be destroyed (or
replaced by a new image). If you have no way of knowing that, clear the
flag for all widgets, and call PhReleaseImage() all are gone. Or just
keep the image forever – but make sure that this doesn’t create a
serious memory leak.


\

Wojtek Lerch (wojtek@qnx.com) QNX Software Systems Ltd.

Previously, Wojtek Lerch wrote in qdn.public.qnx4.photon:

So where is it that I’m calling PhReleaseImage()? I’m not
doing it directly from my code.


Mitchell Schoenbrun --------- maschoen@pobox.com

Previously, Wojtek Lerch wrote in qdn.public.qnx4.photon:

Mitchell Schoenbrun <> maschoen@pobox.com> > wrote:
Might I suggest that a reasonable kindness on the part of
the API routines would be to only try to release the
transparency bitmap once?

The RELEASE flags define a manual method of controlling which copy of a
PhImage_t structure will free the various pieces of image data when
given to PhReleaseImage(). The API defines it to be manual, and there’s
no way of changing that without inventing a new API, incompatible with
the existing PhImage_t structure.

I think that there is an easy solution here, Wojtek. When
PhReleaseImage releases any memory, it also sets the appropriate
pointer in the PhImage_t structure to NULL. Now, at least, it will
behave as if there is no transparency mask, and there will be no
crash. Same goes for any other image data, I would think. Am I
missing something?

Andrew

Mitchell Schoenbrun <maschoen@pobox.com> wrote:

Previously, Wojtek Lerch wrote in qdn.public.qnx4.photon:

So where is it that I’m calling PhReleaseImage()? I’m not
doing it directly from my code.

The widget does it when it’s being destroyed, or when you give it a new
image. Essentially, the RELEASE flags are a way of telling the widget,
“oh, and by the way, free these when you no longer need the image”.

\

Wojtek Lerch (wojtek@qnx.com) QNX Software Systems Ltd.

Andrew Thomas <Andrew@cogent.ca> wrote:

Previously, Wojtek Lerch wrote in qdn.public.qnx4.photon:
Mitchell Schoenbrun <> maschoen@pobox.com> > wrote:
Might I suggest that a reasonable kindness on the part of
the API routines would be to only try to release the
transparency bitmap once?

The RELEASE flags define a manual method of controlling which copy of a
PhImage_t structure will free the various pieces of image data when
given to PhReleaseImage(). The API defines it to be manual, and there’s
no way of changing that without inventing a new API, incompatible with
the existing PhImage_t structure.

I think that there is an easy solution here, Wojtek. When
PhReleaseImage releases any memory, it also sets the appropriate
pointer in the PhImage_t structure to NULL. Now, at least, it will
behave as if there is no transparency mask, and there will be no
crash. Same goes for any other image data, I would think. Am I
missing something?

Yes. Each widget allocates its own private copy of the PhImage_t
structure. (That’s how each can have a different set of the RELEASE
flags.) Even though PhReleaseImage() does set its pointers to NULL,
it’s basically a waste of time in this case – it doesn’t help other
widgets, and the structure is about to get freed anyway…

\

Wojtek Lerch (wojtek@qnx.com) QNX Software Systems Ltd.

Well if there is no programatic way to prevent this
very troublesome type of crash, I think it should be
made up for in documenation. While I don’t doubt that
this potential condition is mentioned somewhere, I
think it should be more prominent.


Previously, Wojtek Lerch wrote in qdn.public.qnx4.photon:

Mitchell Schoenbrun <> maschoen@pobox.com> > wrote:
Previously, Wojtek Lerch wrote in qdn.public.qnx4.photon:

So where is it that I’m calling PhReleaseImage()? I’m not
doing it directly from my code.

The widget does it when it’s being destroyed, or when you give it a new
image. Essentially, the RELEASE flags are a way of telling the widget,
“oh, and by the way, free these when you no longer need the image”.

\

Wojtek Lerch (> wojtek@qnx.com> ) QNX Software Systems Ltd.


Mitchell Schoenbrun --------- maschoen@pobox.com

Mitchell Schoenbrun <maschoen@pobox.com> wrote:

Well if there is no programatic way to prevent this
very troublesome type of crash, I think it should be
made up for in documenation. While I don’t doubt that
this potential condition is mentioned somewhere, I
think it should be more prominent.

It might be too late to change this in 1.14, but in Photon 2, the
description of the Pt_ARG_LABEL_IMAGE resource says:

Set the flags member of the PhImage_t structure to:

Ph_RELEASE_IMAGE | Ph_RELEASE_PALETTE |
Ph_RELEASE_TRANSPARENCY_MASK | Ph_RELEASE_GHOST_BITMAP

before providing the image to the widget. If you do this, the memory
used for the image is released when image is replaced or the widget
is unrealized or destroyed.

BTW I think “unrealized” is a mistake…


Wojtek Lerch (wojtek@qnx.com) QNX Software Systems Ltd.