Compiler generates bad code... or debugger leading me astray

We are now seriously evaluating porting our QNX4 software to QNX6. I
have ported most of it, but now (when I am actually trying to run it) I
am having a very odd problem.

Here is the actual snippet of code in question:

— snip
 1  event_msg.class_num = i;
 2  LLSCptr = Low_level_class_ptr[event_msg.class_num].low_level_sub_class_ptr;
 3  if ( LLSCptr != NULL)
 4  {
 5      unsigned short j;
 6      for (j = 0; j < Low_level_class_ptr[event_msg.class_num].num_sub_class; j++)
 7      {
 8          event_msg.sub_class_num = j;
 9          LLDEVptr = LLSCptr->low_level_device_ptr;
10          {
— snip

If I step through with the debugger, I can print the value of
“Low_level_class_ptr[event_msg.class_num].low_level_sub_class_ptr”, and
it is correct; however, the variable LLSCptr does not contain the correct
value if I print its value on line 3 (immediately after assignment).

LLSCptr is on the stack, Low_level_class_ptr array is static.

Prior to assignment LLSCptr contains NULL. After assignment it contains
a nonsense address (which of course causes the program to crash on
line 9).

This program is single threaded (so another thread clobbering the stack
is not possible).

Any ideas on why a simple assignment doesn’t work ?

Rennie

David Bacon wrote:

The generic suggestion about optimization does not appear likely to me to be
applicable here, since the variable clearly exists and got something
assigned to it.

I agree.

Nevertheless, optimization could be at the root of some bad
code generation.

That’s what I thought also. I should have mentioned, that I already had
tried with optimizations off (-O0). Maybe -O0 isn’t right ? Please
note that this is C code being compiled with the C++ compiler (for
historical reasons). QCC is used to invoke the compiler.

Could you post the corresponding assembly code, with and without
optimization?

Well, since it doesn’t work without optimizations either, I haven’t
produced assembly with -O0. Here is the assembly with default
optimizations (I’ll follow up later with the assembly with no
optimizations). It looks like LLSCptr is at -44 on the stack, and that the
value of Low_level_class_ptr[].low_level_sub_class_ptr is stored in eax.
I don’t know what the “sall” opcode is for, though; if it has buggered
eax, then the index operation that supposedly reloads eax with the value
in question may be wrong.

Line 2 {
movzwl -24(%ebp),%eax
movl %eax,%edx
movl %edx,%eax
sall $5,%eax
movl Low_level_class_ptr,%edx
movl 28(%edx,%eax),%eax
movl %eax,-44(%ebp)
}

Line 3 {
cmpl $0,-44(%ebp)
je .L876
}

Line 2 is the relevant line.

Thanks…

Rennie

“Rennie Allen” <rallen@csical.com> wrote in message
news:3E50A0A1.5050409@csical.com

We are now seriously evaluating porting our QNX4 software to QNX6. I
have ported most of it, but now (when I am actually trying to run it) I
am having a very odd problem.

Have you disabled optimisation (-Od, I believe)?

It’s quite possible the LLSCptr variable does not exist and has been
optimized out.

The generic suggestion about optimization does not appear likely to me to be
applicable here, since the variable clearly exists and got something
assigned to it. Nevertheless, optimization could be at the root of some bad
code generation.

Could you post the corresponding assembly code, with and without
optimization? (I think it’s -O0 for gcc to explicitly turn off
optimization, by the way.)

dB

“David Bacon” <dbacon@qnx.com> wrote in message
news:b2re0g$qni$1@nntp.qnx.com

The generic suggestion about optimization does not appear likely to me to
be
applicable here, since the variable clearly exists and got something
assigned to it.

From what I’ve seen of compilers, just because a variable is in your code
does not mean it will exist in real life; the compiler may decide to do:

LLDEVptr = Low_level_class_ptr[event_msg.class_num].low_level_sub_class_ptr->low_level_device_ptr;

and get rid of LLSCptr. That would make sense if there are no more registers
available.


Your clock seems to be running very slow, or else I’ve forgotten what time
zone I’m in. :-)

Anyway, I take it from the assembly code that event_msg.class_num is an
unsigned short, and that the elements of Low_level_class_ptr are 32 bytes
wide (“sall” is clearly a “shift left” instruction). Not too much
optimization there, judging from the first two “movl” instructions, so
optimization isn’t the issue. The low_level_sub_class_ptr would be the last
field in the struct or class describing those array elements, and the long
and the short of it is that there appears to be nothing whatsoever wrong
with the generated code if the foregoing inferences are all correct. (I
agree with your “Looks like…” statement, of course.) You presumably have
your own reasons for indexing the array with event_msg.class_num rather than
with i; it might be interesting to see what the debugger has to say about
the value of Low_level_class_ptr[i].low_level_sub_class_ptr.

Could you possibly post a self-contained program which demonstrates the
problem? A code generation or debugger bug provoked by such innocent code
would be of great interest to many! Thanks.

dB

David Bacon wrote:

Your clock seems to be running very slow, or else I’ve forgotten
what time zone I’m in. :-)

Hmmm, my post shows here as being at 14:37 PST (which is when it was).

Anyway, I take it from the assembly code that event_msg.class_num is
an unsigned short, and that the elements of Low_level_class_ptr are
32 bytes wide (“sall” is clearly a “shift left” instruction).

Excuse my ignorance, I have almost 0 gnu assembler time…

Not too much optimization there, judging from the first two “movl”
instructions, so optimization isn’t the issue.

That’s the conclusion I came to (particularly when -O0 produced the same
behaviour).

The low_level_sub_class_ptr would be the last field in the struct or
class describing those array elements, and the long and the short
of it is that there appears to be nothing whatsoever wrong with the
generated code if the foregoing inferences are all correct. (I agree
with your “Looks like…” statement, of course.) You presumably
have your own reasons for indexing the array with
event_msg.class_num rather than with i;

Not my reasons; this is code I have inherited - could just as easily be
indexed with i (event_msg.class_num needs to be initialized by the same
loop). The program this comes from is about 180,000 lines of code, and
this is the only place where there appears to be an issue, and it is
100% repeatable (even the garbage value that is stored in LLSCptr is
always the same).

it might be interesting to see what the debugger has to say about the
value of Low_level_class_ptr[i].low_level_sub_class_ptr.

The debugger prints the same (correct) value for this array indexed
with either i or event_msg.class_num or the immediate value 0 (which is
what both i and event_msg.class_num are when the problem occurs). The
problem is that the code that assigns the value to the stack location
-44 doesn’t work (why it doesn’t work is a mystery to me at this point).

Here is a screenshot from an actual debug session (the code is inside a
function called SendAllDevicesOnline):

Breakpoint 4, SendAllDevicesOnline () at Online.cpp:201
201 for (i = 0; i < HIGH_LEVEL_DEVICE_CLASS; i++)
(gdb) n
203 event_msg.class_num = i;
(gdb) p event_msg.class_num
$1 = 20
(gdb) n
204 LLSCptr = Low_level_class_ptr[ event_msg.class_num].low_level_sub_class_ptr;
(gdb) p event_msg.class_num
$2 = 0
(gdb) p Low_level_class_ptr[event_msg.class_num].low_level_sub_class_ptr
$3 = (struct low_level_sub_class_struct *) 0x80acac8
(gdb) n
205 if ( LLSCptr != NULL)
(gdb) p LLSCptr
$4 = (struct low_level_sub_class_struct *) 0x1080a
(gdb)
------------------------------------------------------------------------

The bogus value for LLSCptr is always 0x1080a. 0x80acac8 is correct (at
least I can safely de-reference it).

Could you possibly post a self-contained program which demonstrates
the problem. A code generation or debugger bug provoked by such
innocent code would be of great interest to many! Thanks

I could provide you the whole program, but I doubt that I could snip
this code out of the environment and have it behave this way, since (as
you imply) this would have been caught long ago.

With the whole program, you could reproduce the above debug session in
less than 5 minutes. The question is: how can line 204 be executed and yet
“LLSCptr != Low_level_class_ptr[0].low_level_sub_class_ptr”
be true on line 205?

FWIW: here are the local declarations from the top of SendAllDevicesOnline:

struct ucos__s_event event_msg;
struct low_level_device_struct *LLDEVptr;
struct low_level_sub_class_struct *LLSCptr;
struct high_level_device_struct *HLDEVptr;
unsigned short i;

Rennie

Inspection of your mail headers reveals a Date line with “+0000” rather than
the “-0800” I would expect from someone in PST. So it looked from my point
of view as though you had replied at 9:37am EST to a message I wrote here on
a snowy afternoon!

Actually I guessed on the meaning of “sall” and then did a quick doublecheck
using Google.

What you have posted so far is insufficient for the solution of this
mystery. There must be something else going on. Without posting the whole
180,000 lines, could you perhaps at least send along the source and the “-S”
output of gcc for the function or method that contains this code? It might
be helpful to have the declarations of the “struct low_level_…” structures
handy too. Thanks

dB

David Bacon wrote:

Inspection of your mail headers reveals a Date line with “+0000” rather than
the “-0800” I would expect from someone in PST. So it looked from my point
of view as though you had replied at 9:37am EST to a message I wrote here on
a snowy afternoon!

Ahh, you’re right. Somehow my TZ got trashed. Should be fixed now.

Actually I guessed on the meaning of “sall” and then did a quick doublecheck
using Google.

Never would have guessed, rotl just seems so much more intuitive for
shift left than sall :-)

What you have posted so far is insufficient for the solution of this
mystery. There must be something else going on.

Yeah. Any ideas? I would like to focus in some direction, but I can’t
even imagine what could cause this problem (besides bad code generation
somewhere along the line). I mean, if the index into memory is correct
and there is a mov that puts it on the stack, how can it not be there?

Without posting the whole
180,000 lines, could you perhaps at least send along the source and the “-S”
output of gcc for the function or method that contains this code? It might
be helpful to have the declarations of the “struct low_level_…” structures
handy too. Thanks

Sure, I have attached the .s file, and the appropriate header.

Thanks again.

rallen@csical.com sed in <3E51EFA0.9010605@csical.com>:

(gdb) p Low_level_class_ptr[event_msg.class_num].low_level_sub_class_ptr
$3 = (struct low_level_sub_class_struct *) 0x80acac8
(gdb) n
205 if ( LLSCptr != NULL)
(gdb) p LLSCptr
$4 = (struct low_level_sub_class_struct *) 0x1080a

The lower 16 bits of 0x1080a are the high 16 bits of 0x80acac8.
It could be an alignment disagreement between memory and program
(especially when mixing compilers).

Digging further, the struct low_level_class_struct COULD have
different alignment between compilers
(yes, the full post of the header file DID help):

struct low_level_class_struct
{                                          /* offset                */
    unsigned short class_num;              /* packed  0  aligned  0 */
    char name[20];                         /* packed  2  aligned  4 */
    unsigned short type;                   /* packed 22  aligned 24 */
    unsigned short num_sub_class;          /* packed 24  aligned 26 */
    struct low_level_sub_class_struct *low_level_sub_class_ptr;
                                           /* packed 26  aligned 28 */
};

So if the Low_level_class_ptr[] is filled with packed compiler code
and read by aligned compiler code, you can explain the debug session.
(Both parties are correct in their respect; now who to blame?)

      v(code’s idea)
      0a 08 01 00
c8 ca 0a 08
^(gdb & Low_level_class_ptr’s idea)

I ran into a similar case porting GRUB.

kabe

kabe@sra-tohoku.co.jp wrote:

So if the Low_level_class_ptr[] is filled with packed compiler code
and read by aligned compiler code, you can explain the debug session.
(Both parties are correct in their respect; now who to blame?)

Each invocation of the compiler for each module has -fpack-struct specified
(which is why I didn’t consider this a potential problem). I thought I had
done several “make cleans” since introducing the -fpack-struct flag, but
evidently I had not.

Being a build management idiot is preferable to a compiler bug any day :-)

Thanks for spending the time to look at this.

Rennie

Rennie Allen wrote:

(which is why I didn’t consider this a potential problem). I thought I had
done several “make cleans” since introducing the -fpack-struct flag; but
evidently I had not.

Being a build management idiot is preferable to a compiler bug any day :-)

Thanks for spending the time to look at this.

I spoke too soon. The -fpack-struct was being passed to each file; however,
I have removed the -fpack-struct entirely, as there is a pack(1) directive
in the header anyway. It still does not work. I do believe that there
is a bug in the compiler with respect to the pragma pack() directive.

Can anyone confirm/deny this?

Rennie

No, I think you spoke too soon of a compiler bug… :-)

I think Kabe got onto the right trail here. Assuming pack(1) means “to hell
with alignment and all the performance consequences of unaligned access”,
and pack() means “restore the default”, that header file you provided is
first doing maximally tight packing and then (right at the end) returning to
normal.

The discrepancy in what GDB sees may well be a bug in GDB or in GCC’s symbol
output; if it were me, I’d revert to the mighty printf statement in
diagnosing that section of code…

dB

David Bacon wrote:

The discrepancy in what GDB sees may well be a bug in GDB or in GCC’s symbol
output; if it were me, I’d revert to the mighty printf statement in
diagnosing that section of code…

Hmmmm, I’m still on the fence here. Now when I printf these 2 values they
are indeed both the same, but they are both the invalid value (from the
debugger output). This shows consistency and explains the crash, but
what about the correct value I was able to access in the debugger? It
still sounds like an alignment issue, but an alignment issue that I can’t
do anything about (which sounds remarkably like a bug to me).

I found the following with a google; sounds familiar…


This is the mail archive of the gcc-prs@gcc.gnu.org mailing list for the
GCC project.

other/4957: #pragma pack(1) … #pragma pack(4) does not restore the
original alignment

  • From: igusarov at akella dot com
  • To: gcc-gnats at gcc dot gnu dot org
  • Date: 27 Nov 2001 16:29:12 -0000
  • Subject: other/4957: #pragma pack(1) … #pragma pack(4) does not
    restore the original alignment
  • Reply-to: igusarov at akella dot com

Number: 4957
Category: other
Synopsis: #pragma pack(1) … #pragma pack(4) does not restore the original alignment
Confidential: no
Severity: critical
Priority: medium
Responsible: unassigned
State: open
Class: wrong-code
Submitter-Id: net
Arrival-Date: Tue Nov 27 08:36:00 PST 2001
Closed-Date:
Last-Modified:
Originator: Igor A. Goussarov
Release: Reading specs from /usr/lib/gcc-lib/i386-unknown-freebsd4.4/3.0.1/specs Configured with: ./configure
--prefix=/usr Thread model: posix gcc version 3.0.1
Organization:
Environment:
i386-FreeBSD 4.4-RELEASE, kernel was rebuilt to include firewall support and certain devices… This shouldn’t affect gcc?
Description:
The program is using packed structures to represent network
packets. The definition of these structures looks like

#pragma pack(1)
struct TNetCommand_Foo
{
// …
};
#pragma pack(4)

i.e. the alignment is set to 1 byte, then the structure is
defined, then the alignment is set back to 4 bytes.
Each cpp file includes several headers with such code.
I emphasize that each header does *restore* the alignment
after it had set it to 1. The problem is the following:
if a given cpp file includes approximately 4 such headers,
the definition of structures in the headers included after
these ones are treated as if alignment was not properly
restored.

As the result, the linked executable file is completely
inoperative because some translation units treat this
structure as if it was packed, while the others - as if it
wasn’t.

Rennie Allen wrote:

David Bacon wrote:

The discrepancy in what GDB sees may well be a bug in GDB or in GCC’s
symbol
output; if it were me, I’d revert to the mighty printf statement in
diagnosing that section of code…


Hmmmm, I’m still on the fence here. Now when I printf these 2 values they
are indeed both the same, but they are both the invalid value (from the
debugger output). This shows consistency and explains the crash, but
what about the correct value I was able to access in the debugger? It
still sounds like an alignment issue, but an alignment issue that I can’t
do anything about (which sounds remarkably like a bug to me).

I have confirmed that the problem is alignment. After re-compiling everything
in the system (there is far more than just this process) with the pack
directives omitted, everything works fine. The problem is that these
data structures are stored on flash, and we want to maintain binary
compatibility with our QNX4 product (it is used to run big 24/7 processes,
and with binary compatibility we will be able to convert a customer to
the new product without shutting the plant down).

If there is something wrong with how the pragma pack directive is being
used, then I would very much like to know; otherwise, consider this a
bug report.

Thanks for everyone’s assistance.

Rennie