Why does GCC pass a multiple of 16 bytes on the stack?

Hi all,
I’m looking at the code gcc (qcc) generates for QNX. I’ve noticed that
the compiler always pushes a multiple of 16 bytes onto the stack when
passing parameters. As there is no guarantee of alignment, I’m wondering
why the compiler does this. Anyone know the reason? GCC even does this
on user-written functions, not just system calls.
Thanks for any info you have,
Randy Hyde

As long as you aren’t writing assembly routines, the compiler should be able
to enforce stack alignment.

chris


Chris McKillop <cdm@qnx.com> “The faster I go, the behinder I get.”
Software Engineer, QSSL – Lewis Carroll –
http://qnx.wox.org/

“Chris McKillop” <cdm@qnx.com> wrote in message
news:bo9h6g$9i7$3@nntp.qnx.com

As long as you aren’t writing assembly routines, the compiler should be able
to enforce stack alignment.

Well, guess what I’m doing :-)

I’m still curious, though: what is the point of 16-byte alignment? And, quite
frankly, the compiler doesn’t seem to be preserving 16-byte alignment. Consider
this code:

#include <stdlib.h>   /* declares exit() and malloc() */

void *myalloc( int n, int j, int k, int l, int m )
{
    int i;      /* unused locals */
    char b;
    short c;

    exit( 1 );
    return (void *)malloc( n );
}

Here’s the x86 assembly output:

myalloc:
// return address = 4 bytes

pushl %ebp
// ebp takes it to 8 bytes

movl %esp,%ebp
subl $24,%esp
// allocation takes it to 44 bytes


addl $-12,%esp // to round parameter list size to 16 bytes?
pushl $1
call exit

addl $16,%esp // Okay, I should have compiled with -O2 :-)

addl $-12,%esp // Another alignment to 16 bytes?
movl 8(%ebp),%eax
pushl %eax
call malloc
addl $16,%esp
movl %eax,%eax
movl %eax,%edx
movl %edx,%eax
jmp .L2
.align 4
.L2:
leave
ret


I just don’t see the reason here because the stack is not aligned on a
boundary of 16 bytes to begin with, so there is no way that dropping this
stack by a multiple of 16 bytes will maintain such alignment.

Are there going to be any ill effects if you don’t reserve a multiple of 16
bytes for a parameter list?
Thanks,
Randy Hyde

Duh, I just noticed that $24 is 24 decimal, not hex.
Still wondering why the compiler enforces 16-byte alignment, though.
Cheers,
Randy Hyde

Randall Hyde <randall.nospam.hyde@ustraffic.net> wrote:

Duh, I just noticed that $24 is 24 decimal, not hex.
Still wondering why the compiler enforces 16-byte alignment, though.

quad-word alignment provides access improvements on modern x86 CPUs.

chris



“Chris McKillop” <cdm@qnx.com> wrote in message
news:bodvd2$890$4@nntp.qnx.com

Randall Hyde <randall.nospam.hyde@ustraffic.net> wrote:
Duh, I just noticed that $24 is 24 decimal, not hex.
Still wondering why the compiler enforces 16-byte alignment, though.


quad-word alignment provides access improvements on modern x86 CPUs.

Oh, I didn’t know that. I did some tests a while ago and, curiously, in some
cases byte alignment proved faster, depending on the size of the object
(struct). With byte alignment the object became much smaller and made better
use of the cache, ending up significantly faster than the aligned version.


“Chris McKillop” <cdm@qnx.com> wrote in message
news:bodvd2$890$4@nntp.qnx.com

Randall Hyde <randall.nospam.hyde@ustraffic.net> wrote:
Duh, I just noticed that $24 is 24 decimal, not hex.
Still wondering why the compiler enforces 16-byte alignment, though.


quad-word alignment provides access improvements on modern x86 CPUs.

chris

Except for the fact that you wind up burning up cache lines by having so many
“holes” in the stack area. I’d like to see the research that suggests that
passing parameter blocks that are multiples of 16 bytes improves performance
in the average case across all modern x86 CPUs.

However, if the 16-byte block is strictly for performance purposes, then that
means I can safely ignore it, correct?
Cheers,
Randy Hyde

“Mario Charest” postmaster@127.0.0.1 wrote in message
news:bog3lp$lmq$1@inn.qnx.com

Oh, I didn’t know that. I did some tests a while ago and, curiously, in some
cases byte alignment proved faster, depending on the size of the object
(struct). With byte alignment the object became much smaller and made better
use of the cache, ending up significantly faster than the aligned version.

This has been my experience as well. As cache lines are typically 32-64 bytes
long on modern x86 CPUs, wasting an average of eight bytes per function call
seems like a good way to use up cache lines. Even with a four-way set
associative cache, I could easily see some of the nested function calls in the
standard library winding up making several cache lines unavailable. Given
memory speeds, the cost of a cache miss is going to blow away the occasional
savings of not having any misaligned (across a cache line) accesses.
Cheers,
Randy Hyde