Linux Users' Group of Davis
Page last updated:
2002 Feb 27 23:32

The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

Re: [vox-tech] another gcc question


#define N 5

const int NN = 5;

int main()
{
  int i = 0;
  int m = 0;
  int n = 5;

  /* loop bodies inferred from the "movl $0,-8(%ebp)" stores in sample.s */
  for (i=0 ; i < n ; i=i+1)
    m = 0;

  for (i=0 ; i<N ; i=i+1)
    m = 0;

  for (i=0 ; i<NN ; i=i+1)
    m = 0;

  return 0;
}
$ gcc -funroll-all-loops -S sample.c
I see (sample.s):
	.file	"sample.c"
	.version	"01.01"
.globl NN
.section	.rodata
	.align 4
	.type	 NN,@object
	.size	 NN,4
NN:
	.long 5
.text
	.align 4
.globl main
	.type	 main,@function
main:
	pushl %ebp
	movl %esp,%ebp
	subl $24,%esp
	movl $0,-4(%ebp)
	movl $0,-8(%ebp)
	movl $5,-12(%ebp)
	movl $0,-4(%ebp)
	.p2align 4,,7
.L3:
	movl -4(%ebp),%eax
	cmpl -12(%ebp),%eax
	jl .L6
	jmp .L4
	.p2align 4,,7
.L6:
	movl $0,-8(%ebp)
	incl -4(%ebp)
	jmp .L3
	.p2align 4,,7
.L4:
	movl $0,-4(%ebp)
	.p2align 4,,7
.L7:
	cmpl $4,-4(%ebp)
	jle .L10
	jmp .L8
	.p2align 4,,7
.L10:
	movl $0,-8(%ebp)
	incl -4(%ebp)
	jmp .L7
	.p2align 4,,7
.L8:
	movl $0,-4(%ebp)
	.p2align 4,,7
.L11:
	movl -4(%ebp),%eax
	cmpl NN,%eax
	jl .L14
	jmp .L12
	.p2align 4,,7
.L14:
	movl $0,-8(%ebp)
	incl -4(%ebp)
	jmp .L11
	.p2align 4,,7
.L12:
	xorl %eax,%eax
	jmp .L2
	.p2align 4,,7
.L2:
	leave
	ret
.Lfe1:
	.size	 main,.Lfe1-main
	.ident	"GCC: (GNU) 2.95.2 20000220 (Debian GNU/Linux)"

When I inspect the above, I see that the loops are still there.
-12(%ebp) (the third 32-bit slot below %ebp) is set to 5, and -4(%ebp) is
incremented (incl) until the compare (cmpl) finds it no longer less than
-12(%ebp).

The labels show the loops too: each loop ends with a jmp back to a label
above it. I count three on a quick scan.

This leads me to believe the generated asm code is not unrolled, if I
understand what unrolling is supposed to do. I would guess that unrolling
a loop means replacing the looped asm structure with straight-line code:
faster because the compares are dropped, but larger by up to n times the
original loop, where n = number of iterations. (I could be wrong on this;
please say so if I am.)

Also, a diff on the output of the two .s files (one attempt with
-funroll-all-loops, and the other with -funroll-loops) shows no
difference.
I bet different values for n, N, and NN would lead to a different
4-byte offset from (%ebp) for each value if other runs/trials were
made.
No time to examine this in more detail right now.

I am still rather new to Intel assembly, but this will change.


On Wed, 27 Feb 2002, Rod Roark wrote:
> You can compile with -S and then look at the assembler output file.
> -- Rod
>    http://www.sunsetsystems.com/
> On Wednesday 27 February 2002 09:31, Peter Jay Salzman wrote:
> > another optimization question:
> >
> >    int n = 5;
> >    for (i=0; i<n; ++i)
> >
> > can gcc unroll this loop the way it can (for instance)
> >
> >    #define N 5
> >    for (i=0; i<N; ++i)
> >
> > if it can't, what about
> >
> >    const int n = 5;
> > 	for (i=0; i<n; ++i)
> >
> > pete
