[RFC] Flow: Handle CLOBBERs like SETs if the reg stays live

2005-10-14 Thread Andreas Krebbel
I first posted this under a somewhat misleading subject (some might say
completely wrong). I hope this one is an improvement ;-)

Hello,

> Just as an FYI, Kenny and I have replaced the global liveness analyzer
> with one from df.c, and removed the need for make_accurate_live_analysis
> (ie df.c now does the partial avail liveness stuff).
> 
> We are currently in the process of changing all the other users of life
> analysis to use the df.c liveness, which should solve this problem
> (since they will all then use the accurate life analysis).

Good to hear! Unfortunately this will come too late to fix the bug in gcc 
4.1. What about the following workaround? This one could be fine for 
the 4.1.0 release:

Recall the situation after reload: r1 is used for an uninitialized variable,
so it is live from the beginning up to the read of that variable.
The special liveness analyzer in global alloc does not consider uninitialized
quantities to conflict with others, so it also assigns r1 to a local pseudo.

bb 1 start: r1 dead
  insn 1: r1 = 0;
  insn 2: use r1; clobber r1; REG_DEAD (r1) !!!
bb 1 end: r1 live

Due to the REG_DEAD note regrename assumes it is safe to rename r1 locally to r5
resulting in:

bb 1 start: r1 live !!!
  insn 1: r5 = 0;
  insn 2: use r5; clobber r5; REG_DEAD (r5)
bb 1 end: r1 live

Because live at start has changed due to a local change an assertion in flow is
triggered.

The cause of the problem is that we already enter regrename with wrong liveness
info, which in turn is caused by using a different liveness analyzer in global
alloc than elsewhere.  We have the register r1 live at the end of the basic
block, but the last rtx writing this value is a clobber.  This situation should
only occur if we had overlapping live ranges which were assigned to the same
hard reg.  Under normal circumstances this should be considered a bug and an
assertion would be the right choice.  But knowing that this can only be caused
by the global alloc liveness analyzer, the flow analyzer should avoid emitting
a REG_DEAD note.  That way the value of the uninitialized variable (which is
undefined anyway) would be clobbered, which should not cause any trouble.

The attached patch modifies mark_set_1 to consider a CLOBBER a normal SET if the
clobbered register is live afterwards.

So, as an RFC: what do you (or anybody else) think about the attached patch for
mainline gcc?

Bootstrapped on i686, s390 and s390x without testsuite regressions.

This patch fixes the 920501-4.c regression on s390x.

Bye,

-Andreas-


2005-10-13  Andreas Krebbel  <[EMAIL PROTECTED]>

	* flow.c (mark_set_1): Handle CLOBBERs like SETs if the register
	is live afterwards.


Index: gcc/flow.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/flow.c,v
retrieving revision 1.635
diff -p -c -r1.635 flow.c
*** gcc/flow.c  22 Aug 2005 16:58:46 -  1.635
--- gcc/flow.c  13 Oct 2005 09:25:51 -
*************** mark_set_1 (struct propagate_block_info 
*** 2819,2825 ****
                else
                  SET_REGNO_REG_SET (pbi->local_set, i);
              }
!           if (code != CLOBBER)
              SET_REGNO_REG_SET (pbi->new_set, i);
  
            some_was_live |= needed_regno;
--- 2819,2825 ----
                else
                  SET_REGNO_REG_SET (pbi->local_set, i);
              }
!           if (code != CLOBBER || needed_regno)
              SET_REGNO_REG_SET (pbi->new_set, i);
  
            some_was_live |= needed_regno;
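
To illustrate the effect of the one-line change, here is a toy model in C. It
is not GCC code: emits_reg_dead, write_kind and the patched flag are
hypothetical names, and the sketch assumes a simplified rule, namely that an
insn which reads and writes the same register gets a REG_DEAD note unless the
register is recorded in new_set for that insn.

```c
#include <assert.h>
#include <stdbool.h>

/* How the toy insn writes its register. */
enum write_kind { WRITE_SET, WRITE_CLOBBER };

/* Toy model (not GCC code): would a REG_DEAD note be attached to an insn
   that both reads and writes one register?

   live_after - the register is live after the insn (needed_regno above)
   patched    - apply the condition proposed in the patch */
static bool emits_reg_dead(enum write_kind kind, bool live_after, bool patched)
{
    assert(kind == WRITE_SET || kind == WRITE_CLOBBER);

    /* Mirrors the condition guarding SET_REGNO_REG_SET (pbi->new_set, i). */
    bool in_new_set = (kind != WRITE_CLOBBER) || (patched && live_after);

    /* The write kills the register, so the read sees it as dead below the
       insn; the note is suppressed only if the register counts as set. */
    return !in_new_set;
}
```

For insn 2 in the example (use r1; clobber r1, with r1 live at the end of the
block), the unpatched rule yields a REG_DEAD note and the patched rule does
not, which is the behaviour the patch is after.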


Re: Update on GCC moving to svn

2005-10-14 Thread Joseph S. Myers
On Fri, 14 Oct 2005, Kaveh R. Ghazi wrote:

> their corresponding svn commands posted on the website?  Hmm, I guess
> we would need to update these pages to the svn equivalents which would
> pretty much cover the basics of a how-to guide:
> 
> http://gcc.gnu.org/cvs.html
> http://gcc.gnu.org/cvswrite.html

I understand patches for at least some of (contrib, maintainer-scripts, 
wwwdocs) to convert to svn are ready.  Could whichever patches are ready 
be posted to gcc-patches?  That way people can test them, and see if 
anything still needs converting.  (In the case of 
maintainer-scripts/gcc_release, all of the mainline, 4.0 and 3.4 versions 
are relevant as each branch's version is used for releases from that 
branch.  In the case of contrib/gcc_update each active branch will need 
its version patched but the same patch should work in each case.  
Otherwise I think only mainline versions are relevant.)

-- 
Joseph S. Myers   http://www.srcf.ucam.org/~jsm28/gcc/
[EMAIL PROTECTED] (personal mail)
[EMAIL PROTECTED] (CodeSourcery mail)
[EMAIL PROTECTED] (Bugzilla assignments and CCs)


Re: Update on GCC moving to svn

2005-10-14 Thread Daniel Berlin
(Sorry for the copy-and-paste; my regular email is down till Sunday afternoon.
Hopefully notes won't screw this email up too badly.)


>Has any thought been put into helping the 200+ people with write
>access migrate? 

Of course.
There is a basic how-to guide for CVS users on the wiki, at
http://gcc.gnu.org/wiki/SvnHelp

It has been there since February, and the repository it references has been
around for the same amount of time.

Once the new test repo is ready (as I've mentioned before, the Apple branches
take a while to convert; it looks like another 5 hours are still left), I will
update it with the gcc.gnu.org location.

You'll also note that for those wanting to test anonymous rsync, rsync 
gcc.gnu.org:: now shows a gcc-svn to rsync from.
(You can start now if you want, i've been rsync'ing it from my machine as 
it converts.  Obviously, until it's completely finished, the repo you get 
won't be complete, but you won't need to retransfer anything that you 
rsync now, since all the existing revisions are immutable)

> I.e. a quick how-to guide for simple cvs actions and
>their corresponding svn commands posted on the website?  Hmm, I guess
>we would need to update these pages to the svn equivalents which would
>pretty much cover the basics of a how-to guide:

>http://gcc.gnu.org/cvs.html
>http://gcc.gnu.org/cvswrite.html

I plan on updating these pages, using the current text on the wiki (it was
written by a combination of Giovanni and me, as far as the edit history shows).


>Also, we have a bunch of mirrors sites.  Are these updated through
>cvs?  If so, what are we doing about that?

We have one mirror site that mirrors our CVS.  Its maintainer was one of the
people I copied on the last status email.

>Thanks,
>--Kaveh



Re: Successfull build of gcc-4.0.2 on mips-sgi-irix6.5

2005-10-14 Thread Albert Chin
On Wed, Oct 12, 2005 at 02:29:56PM +0200, Rainer Emrich wrote:
> Compiler version: 4.0.2
> Platform: mips-sgi-irix6.5
> configure flags:
> --prefix=/SCRATCH/gcc-build/IRIX64/mips-sgi-irix6.5/install
> --with-gnu-as
> --with-as=/SCRATCH/gcc-build/IRIX64/mips-sgi-irix6.5/install/bin/as
> --with-gnu-ld
> --with-ld=/SCRATCH/gcc-build/IRIX64/mips-sgi-irix6.5/install/bin/ld
> --disable-shared --enable-threads=posix --enable-haifa --disable-nls
> --disable-libmudflap --with-gmp=/appl/shared/gnu/IRIX64/mips-sgi-irix6.5
> --with-mpfr=/appl/shared/gnu/IRIX64/mips-sgi-irix6.5
> --enable-languages=c,ada,c++,f95,objc

Can the resulting GCC build Emacs _and_ XEmacs?

-- 
albert chin ([EMAIL PROTECTED])


Passing va_args...

2005-10-14 Thread Kalaky
Hello,

Once I saw a gcc macro that passes variable arguments to another
variable-argument function. Example:

function_1 (int z, ...);
function_2 (int z, ...)
{
 return function_1 (z, MACRO);
}

Does anyone remember the macro name ?

TIA


Re: Linking of object files from different compilers for ARM

2005-10-14 Thread Daniel Jacobowitz
On Fri, Oct 14, 2005 at 09:49:42AM +0300, Yaroslav Karulin wrote:
>   Hello!
> 
>   I have two files: foo.c and main.c. foo.c is compiled with RVTC 2.2 
> compiler. main.c is compiled with gcc compiler (configured with 
> --target=arm-elf). I cannot link them together using gcc linker.
>   But it's possible to link files if I use CodeSourcery version of gcc.
> CodeSourcery guys writes that they have added full EABI support and hope 
> to submit it to the gcc 4.1.
>   So, the question is what's the difference between CodeSourcery's 
> version of gcc and FSF version? And is EABI support really submitted to 
> the gcc 4.1?

The difference is that it's configured for an EABI target, not an ELF
(legacy) target.  Build an arm-none-eabi compiler instead of an arm-elf
compiler and it should work.

-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: Successfull build of gcc-4.0.2 on mips-sgi-irix6.5

2005-10-14 Thread Rainer Emrich

Albert Chin wrote:
> On Wed, Oct 12, 2005 at 02:29:56PM +0200, Rainer Emrich wrote:
> 
>>Compiler version: 4.0.2
>>Platform: mips-sgi-irix6.5
>>configure flags:
>>--prefix=/SCRATCH/gcc-build/IRIX64/mips-sgi-irix6.5/install
>>--with-gnu-as
>>--with-as=/SCRATCH/gcc-build/IRIX64/mips-sgi-irix6.5/install/bin/as
>>--with-gnu-ld
>>--with-ld=/SCRATCH/gcc-build/IRIX64/mips-sgi-irix6.5/install/bin/ld
>>--disable-shared --enable-threads=posix --enable-haifa --disable-nls
>>--disable-libmudflap --with-gmp=/appl/shared/gnu/IRIX64/mips-sgi-irix6.5
>>--with-mpfr=/appl/shared/gnu/IRIX64/mips-sgi-irix6.5
>>--enable-languages=c,ada,c++,f95,objc
> 
> 
> Can the resulting GCC build Emacs _and_ XEmacs?
> 

Didn't check. Perhaps I have the time to check at the beginning of November.

Rainer

-- 
Rainer Emrich
TECOSIM GmbH
Im Eichsfeld 3
65428 Rüsselsheim

Phone: +49(0)6142/8272 12
Mobile: +49(0)163/56 949 20
Fax.:   +49(0)6142/8272 49
Web: www.tecosim.com


Re: Passing va_args...

2005-10-14 Thread Kalaky
Obviously, no. __VA_ARGS__ is an identifier for variadic macros.

I am looking for some way to pass variable arguments to another
function that receives variable arguments without using va_list.

On 10/14/05, Jairo Balart <[EMAIL PROTECTED]> wrote:
> are you looking for __VA_ARGS__?
>
> Regards,
> Jairo
>
> On Friday 14 October 2005 15:19, Kalaky wrote:
> > Hello,
> >
> > Once I saw a gcc macro that passes variables arguments to another
> > variable argument function..example:
> >
> > function_1 (int z, ...);
> > function_2 (int z, ...);
> > {
> > return function_1 (z, MACRO);
> > }
> >
> > Does anyone remember the macro name ?
> >
> > TIA
>


Re: Passing va_args...

2005-10-14 Thread Andreas Schwab
Kalaky <[EMAIL PROTECTED]> writes:

> I am looking for some way to pass variable arguments to another
> function that receives variable arguments without using va_list.

This is impossible.
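
For comparison, the portable route that standard C does offer goes through
va_list, in the same way printf is paired with vprintf. The following is a
minimal sketch of that pattern; sum_v, sum and sum_plus_one are hypothetical
names standing in for function_1 and function_2 from the question.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical worker: adds up `count` int arguments read from a va_list. */
static int sum_v(int count, va_list ap)
{
    int total = 0;
    assert(count >= 0);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    return total;
}

/* Variadic front end, playing the role of function_1. */
static int sum(int count, ...)
{
    va_list ap;
    va_start(ap, count);
    int total = sum_v(count, ap);
    va_end(ap);
    return total;
}

/* A second variadic function that forwards its arguments to the same
   worker, playing the role of function_2. */
static int sum_plus_one(int count, ...)
{
    va_list ap;
    va_start(ap, count);
    int total = sum_v(count, ap) + 1;
    va_end(ap);
    return total;
}
```

The forwarding function cannot call sum directly with its own "..."; it has to
call the va_list variant, which is why the worker is split out.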

Andreas.

-- 
Andreas Schwab, SuSE Labs, [EMAIL PROTECTED]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: Passing va_args...

2005-10-14 Thread Kean Johnston

>> I am looking for some way to pass variable arguments to another
>> function that receives variable arguments without using va_list.
>
> This is impossible.


USL C has a very neat construct called '&...' which was designed
for exactly this purpose. One day when I have idle cycles (yeah right)
I will look into adding this as an extension.

Kean


bitmaps in gcc

2005-10-14 Thread Brian Makin

In reference to this on the wiki.

Bitmaps, also called sparse bit sets, are implemented
using a linked list with a cache. This is probably not
the most time-efficient representation, and it is not
unusual for bitmap functions to show up high on the
execution profile. Bitmaps are used for many things,
such as for live register sets at the entry and exit
of basic blocks in RTL, or for a great number of data
flow problems. See bitmap.c (and sbitmap.c for GCC's
simple bitmap implementation).
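
To make the wiki description concrete, here is a minimal sketch of a sparse bit
set kept as a sorted linked list with a one-element cache. It is not the code
in bitmap.c; the names (sparse_bitmap, elt, toy_bitmap_set, toy_bitmap_test,
toy_bitmap_free) and the 64-bit element size are assumptions made for
illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define BITS_PER_ELT 64u

/* One chunk of the set, covering bits [index * 64, index * 64 + 63]. */
struct elt {
    struct elt *next;
    unsigned index;
    unsigned long long bits;
};

struct sparse_bitmap {
    struct elt *first;   /* list sorted by ascending index */
    struct elt *cache;   /* most recently found element, or NULL */
};

/* Find the element that holds `bit`, creating it if `create` is set. */
static struct elt *find_elt(struct sparse_bitmap *bm, unsigned bit, bool create)
{
    unsigned index = bit / BITS_PER_ELT;
    if (bm->cache && bm->cache->index == index)
        return bm->cache;                 /* cache hit: no list walk */

    struct elt **link = &bm->first;
    while (*link && (*link)->index < index)
        link = &(*link)->next;

    if (*link && (*link)->index == index)
        return bm->cache = *link;
    if (!create)
        return NULL;

    struct elt *e = calloc(1, sizeof *e);
    assert(e != NULL);
    e->index = index;
    e->next = *link;                      /* splice in, keeping the order */
    *link = e;
    return bm->cache = e;
}

static void toy_bitmap_set(struct sparse_bitmap *bm, unsigned bit)
{
    find_elt(bm, bit, true)->bits |= 1ULL << (bit % BITS_PER_ELT);
}

static bool toy_bitmap_test(struct sparse_bitmap *bm, unsigned bit)
{
    struct elt *e = find_elt(bm, bit, false);
    return e && ((e->bits >> (bit % BITS_PER_ELT)) & 1ULL);
}

static void toy_bitmap_free(struct sparse_bitmap *bm)
{
    while (bm->first) {
        struct elt *next = bm->first->next;
        free(bm->first);
        bm->first = next;
    }
    bm->cache = NULL;
}
```

Accesses that stay within one element are cheap thanks to the cache, while
scattered accesses walk the list each time, which is one plausible reason such
a structure can show up in profiles.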

Can someone point me to a testcase where bitmap
functions show up high on the profile?


Can anyone give me some background on the use of
bitmaps in gcc?

Are they assumed to be sparse?
How critical is the memory consumption of bitsets?
What operations are the most speed critical?
Would it be desirable to merge bitmap and sbitmap into
one datastructure?
Anyone have good ideas for improvements?

Anything else anyone would want to add?

I think I may take a look at this.  Once I figure out
the requirements maybe we can speed it up a bit.


Brian N. Makin









Re: bitmaps in gcc

2005-10-14 Thread Ian Lance Taylor
Brian Makin <[EMAIL PROTECTED]> writes:

> Bitmaps, also called sparse bit sets, are implemented
> using a linked list with a cache. This is probably not
> the most time-efficient representation, and it is not
> unusual for bitmap functions to show up high on the
> execution profile. Bitmaps are used for many things,
> such as for live register sets at the entry and exit
> of basic blocks in RTL, or for a great number of data
> flow problems. See bitmap.c (and sbitmap.c for GCC's
> simple bitmap implementation).
> 
> Can someone point me to a testcase where bitmap
> functions show up high on the profile?

PR 8361.

Ian


Severe problems with vectorizing stuff in 4.0.3 HEAD

2005-10-14 Thread Kean Johnston

All,

I am getting a lot of test suite failures with almost all of
the vect/* tests. I am using pr18400.c from the test suite
as an example here, because it's about the smallest one I
can find. Here is what is generated at -O2:

    .file   "pr18400.c"
    .version "01.01"
    .text
    .align 16
    .globl  sig_ill_handler
    .type   sig_ill_handler, @function
sig_ill_handler:
    pushl   %ebp
    movl    %esp, %ebp
    subl    $20, %esp
    pushl   $0
    call    exit
    .size   sig_ill_handler, .-sig_ill_handler
    .align 16
    .globl  check_vect
    .type   check_vect, @function
check_vect:
    pushl   %ebp
    movl    %esp, %ebp
    subl    $16, %esp
    pushl   $sig_ill_handler
    pushl   $4
    call    signal
/APP
    .byte 0xf2,0x0f,0x10,0xc0
/NO_APP
    popl    %eax
    popl    %edx
    pushl   $0
    pushl   $4
    call    signal
    addl    $16, %esp
    leave
    ret
    .size   check_vect, .-check_vect
    .section .rodata
    .align 32
    .type   C.0.1905, @object
    .size   C.0.1905, 32
C.0.1905:
    .long   0
    .long   3
    .long   6
    .long   9
    .long   12
    .long   15
    .long   18
    .long   21
    .text
    .align 16
    .globl  main1
    .type   main1, @function
main1:
    pushl   %ebp
    movl    $8, %ecx
    movl    %esp, %ebp
    pushl   %edi
    cld
    pushl   %esi
    leal    -40(%ebp), %edi
    subl    $64, %esp
    movl    $C.0.1905, %esi
    rep
    movsl
    xorl    %edx, %edx
    leal    -40(%ebp), %esi
    leal    -72(%ebp), %ecx
    .align 16
.L6:
    leal    0(,%edx,4), %eax
    addl    $4, %edx
    cmpl    $8, %edx
*** At this point, the registers have the following values:
*** %eax = 0,  %ecx = 0x8047d84,  %edx = 4,  %ebx = 0x8047dec
*** %esi = 0x8047da4, %edi = 0x8047dc4, %ebp = 0x8047dcc
*** This is guaranteed to cause a SIGSEGV, and it does, because
*** %esi is not aligned on a 16-byte boundary. But ... see below ...
    movdqa  (%esi,%eax), %xmm0
    movdqa  %xmm0, (%ecx,%eax)
    jne     .L6
    movb    $1, %dl
    .align 16
.L8:
    movl    -4(%ecx,%edx,4), %eax
    cmpl    -4(%esi,%edx,4), %eax
    jne     .L18
    incl    %edx
    cmpl    $9, %edx
    jne     .L8
    addl    $64, %esp
    xorl    %eax, %eax
    popl    %esi
    popl    %edi
    popl    %ebp
    ret
.L18:
    call    abort
    .size   main1, .-main1
    .align 16
    .globl  main
    .type   main, @function
main:
    pushl   %ebp
    movl    %esp, %ebp
    pushl   %ecx
    pushl   %ecx
    andl    $-16, %esp
    subl    $16, %esp
    call    check_vect
    leave
*** Looks like it was trying to align the stack on a 16-byte
*** boundary here. But on entry into main1(), it's doing 3
*** pushes at the beginning. Thus the offsets into the stack
*** (like the leal -40(%ebp), %edi close to the top of main1)
*** appear to be incorrectly calculated.
    jmp     main1
    .size   main, .-main
    .ident  "GCC: (GNU) 4.0.3 20051013 (prerelease)"


That's the first problem. I then compiled with -O6, and got this:
    .file   "pr18400.c"
    .version "01.01"
    .text
    .align 16
    .globl  sig_ill_handler
    .type   sig_ill_handler, @function
sig_ill_handler:
    pushl   %ebp
    movl    %esp, %ebp
    subl    $20, %esp
    pushl   $0
    call    exit
    .size   sig_ill_handler, .-sig_ill_handler
    .align 16
    .globl  check_vect
    .type   check_vect, @function
check_vect:
    pushl   %ebp
    movl    %esp, %ebp
    subl    $16, %esp
    pushl   $sig_ill_handler
    pushl   $4
    call    signal
/APP
    .byte 0xf2,0x0f,0x10,0xc0
/NO_APP
    popl    %eax
    popl    %edx
    pushl   $0
    pushl   $4
    call    signal
    addl    $16, %esp
    leave
    ret
    .size   check_vect, .-check_vect
    .section .rodata
    .align 32
    .type   C.0.1905, @object
    .size   C.0.1905, 32
C.0.1905:
    .long   0
    .long   3
    .long   6
    .long   9
    .long   12
    .long   15
    .long   18
    .long   21
    .text
    .align 16
    .globl  main1
    .type   main1, @function
main1:
    pushl   %ebp
    movl    $8, %ecx
    movl    %esp, %ebp
    pushl   %edi
    cld
    pushl   %esi
    leal    -40(%ebp), %edi
    subl    $64, %esp
    movl    $C.0.1905, %esi
    rep
    movsl
    xorl    %edx, %edx
    leal    -40(%ebp), %esi
    leal    -72(%ebp), %ecx
    .align 16
.L6:
    leal    0(,%edx,4), %eax
    addl    $4

Re: Severe problems with vectorizing stuff in 4.0.3 HEAD

2005-10-14 Thread Andrew Pinski


On Oct 14, 2005, at 3:11 PM, Kean Johnston wrote:


> All,
>
> I am getting a lot of test suite failures with almost all of
> the vect/* tests. I am using pr18400.c from the test suite
> as an example here, because it's about the smallest one I
> can find. Here is what is generated at -O2:



Can you just fix your OS instead?

-- Pinski



Re: Severe problems with vectorizing stuff in 4.0.3 HEAD

2005-10-14 Thread Daniel Jacobowitz
On Fri, Oct 14, 2005 at 03:13:55PM -0400, Andrew Pinski wrote:
> 
> On Oct 14, 2005, at 3:11 PM, Kean Johnston wrote:
> 
> >All,
> >
> >I am getting a lot of test suite failures with almost all of
> >the vect/* tests. I am using pr18400.c from the test suite
> >as an example here, because it's about the smallest one I
> >can find. Here is what is generated at -O2:
> 
> 
> Can you just fix your OS instead?

Could you possibly give an even less helpful response?  Please, make an
effort to be polite and helpful if you're going to answer questions.

If you're saying that Kean's assumptions about the incoming stack
alignment are wrong, why?  What should it be?

Note, I don't know or care about the answer.  I'm just annoyed at your
tone.

-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: many libgcc in MacOS X 10.4 build

2005-10-14 Thread Mike Stump

On Oct 13, 2005, at 8:21 PM, Jack Howarth wrote:
> What exactly are all of the new libgcc versions created when building
> the current gcc cvs on MacOS X 10.4?

They allow targeting different OS versions with one compiler, from the doc:


@item [EMAIL PROTECTED]
The earliest version of MacOS X that this executable will run on
is @var{version}.  Typical values of @var{version} include @code{10.1},
@code{10.2}, and @code{10.3.9}.

The default for this option is to make choices that seem to be most
useful.

The ppc64 one is for 64-bit code...



Re: Severe problems with vectorizing stuff in 4.0.3 HEAD

2005-10-14 Thread Andrew Pinski


On Oct 14, 2005, at 3:11 PM, Kean Johnston wrote:


> All,
>
> I am getting a lot of test suite failures with almost all of
> the vect/* tests. I am using pr18400.c from the test suite
> as an example here, because it's about the smallest one I
> can find. Here is what is generated at -O2:


Can you try -fno-optimize-sibling-calls and see if that works?
If so, then the problem is that sibling calls should not be
done in main.  The way to make the testcase correct anyway is to
put "return 0;" after the call to main1, since right now the
return value of main could be anything.

Thanks,
Andrew Pinski



Re: Severe problems with vectorizing stuff in 4.0.3 HEAD

2005-10-14 Thread Kean Johnston

> Can you just fix your OS instead?

My OS is just fine, thank you very much. You disappoint me.
I expected better from you.

It is most likely of absolutely no consequence to you, and
this has nothing to do with GCC so this is the very last I
will say on this subject to you, but you really did hurt me
with your comment. Hurt *me*, specifically, the person trying
to help out with GCC. If you are upset at what other people
in my company are doing, I would humbly request you take
it up with them and leave me out of it. Thank you.

Kean


Re: Severe problems with vectorizing stuff in 4.0.3 HEAD

2005-10-14 Thread Kean Johnston

> Can you try -fno-optimize-sibling-calls and see if that works?

Yes, it did, thank you.

> If so, then the problem is that we sibling calls should not be
> done in main.  To fix the testcase anyways to be correct is to
> put "return 0;" after the call to main1.  Since right now the
> return of main could be anything.

I don't think the test case is incorrect. In a nutshell,
it has:

int main1() {
  ... stuff ...

  return 0;
}

int main(void)
{
  return main1();
}

That should be perfectly valid, and return 0 from main1
which subsequently returns 0 from main.

What does the fact that -fno-optimize-sibling-calls worked
indicate really? Without that option something really does
seem to be mis-calculating the stack offsets by 4. What may
be of interest here is that aside from the vect/* tests,
the only other test that is failing is sibcall-6.

Kean


Re: Severe problems with vectorizing stuff in 4.0.3 HEAD

2005-10-14 Thread Andrew Pinski


On Oct 14, 2005, at 3:43 PM, Kean Johnston wrote:


> What does the fact that -fno-optimize-sibling-calls worked
> indicate really? Without that option something really does
> seem to be mis-calculating the stack offsets by 4. What may
> be of interest here is that aside from the vect/* tests,
> the only other test that is failing is sibcall-6.


It indicates that sibling call optimization in main should
be disabled for targets that need to raise the stack alignment;
otherwise you get a lower stack alignment than the one
that is required.  You have to look to see what changed
between 3.4.0 and 4.0.0 that caused this, since it is a
regression.  I think the issue is that we are detecting sibling calls
at the tree level but not rejecting them when expanding.  So you
have to look at the expand functions for that.

Also main1 should be marked as noinline to make sure that
the test works correctly at -O3 and above.

The reason nobody noticed this before is that most x86 OSes
nowadays align the stack going into main to 16 bytes,
which is what my comment about fixing your OS was about; it was
more of a joke than anything else.

sibcall-6 is a different issue, really and unrelated to the
current problem you are looking into.


-- Pinski



Re: Severe problems with vectorizing stuff in 4.0.3 HEAD

2005-10-14 Thread Andrew Pinski


On Oct 14, 2005, at 3:55 PM, Andrew Pinski wrote:

> On Oct 14, 2005, at 3:43 PM, Kean Johnston wrote:
>
>> What does the fact that -fno-optimize-sibling-calls worked
>> indicate really? Without that option something really does
>> seem to be mis-calculating the stack offsets by 4. What may
>> be of interest here is that aside from the vect/* tests,
>> the only other test that is failing is sibcall-6.
>
> It indicated that sibling calling optimization in main should
> be disabled for targets that need to up the stack alignment,
> otherwise you get the stack alignment of a lower one than
> that is required.  You have to look to see what changed
> between 3.4.0 and 4.0.0 that caused this since it is a
> regression.  I think the issue is that we are detecting them
> at the tree level but not rejecting them when expanding.  So you
> have to look at the expand functions for that.


Note I filed PR 24374 for this problem since it is a regression
from 3.4.0.

Thanks,
Andrew Pinski



Re: Severe problems with vectorizing stuff in 4.0.3 HEAD

2005-10-14 Thread Kean Johnston

> It indicated that sibling calling optimization in main should
> be disabled for targets that need to up the stack alignment,
> otherwise you get the stack alignment of a lower one than

While that may be true, I think the problem is broader.

I took out the main1() function and put it into a separate
file, and compiled just that. So now there is no carnal
knowledge of main or its stack alignment. The generated
code for this stand-alone main1() makes no attempt to
align the stack or the stack variables it is going to be
passing to the movdqa instruction. Unless that's what you
mean by:

> that is required.  You have to look to see what changed
> between 3.4.0 and 4.0.0 that caused this since it is a
> regression.  I think the issue is that we are detecting them
> at the tree level but not rejecting them when expanding.  So you
> have to look at the expand functions for that.

You're using internals verbiage that's beyond me :) I'm a
simple porter, I have very little understanding of the actual
internals of GCC.

> The reason why nobody notices this before is because most x86 OS's
> now a days align their stack going into main as 16byte aligned
> which was what my comment about fixing your OS was about, it was
> more of a joke rather than anything else.

OK, I apologise, Andrew. I took it as a SCO-bash. My bad.

However, I don't think the stack being aligned on a 16-byte
boundary into main will help, unless GCC is assuming (and I
don't see how it possibly could) that every function would
likewise be aligned. The fact that a stand-alone version of
main1() was not correctly aligned leads me to believe that
the real error is that gcc is not making an attempt to
align the stack variables for use by the alignment-sensitive
vector insns.

Also, when you say "stack going into main is 16 byte aligned",
what specifically do you mean? That it's 16-byte aligned before
the call to main() itself? That at the first insn in main, most
likely a push %ebp, it's 16-byte aligned (i.e. does the call
to main from crt1.o have to take the push of the return address
into account)?

Kean

PS, here is the generated assembly for main1() as a stand-alone
function, nothing else defined in the .c file:

    .file   "foo.c"
    .version "01.01"
    .section .rodata
    .align 32
    .type   C.0.1458, @object
    .size   C.0.1458, 32
C.0.1458:
    .long   0
    .long   3
    .long   6
    .long   9
    .long   12
    .long   15
    .long   18
    .long   21
    .text
    .align 16
    .globl  main1
    .type   main1, @function
main1:
    pushl   %ebp
    movl    $8, %ecx
    movl    %esp, %ebp
    pushl   %edi
    cld
    pushl   %esi
    leal    -40(%ebp), %edi
    subl    $64, %esp
    movl    $C.0.1458, %esi
    rep
    movsl
    xorl    %edx, %edx
    leal    -40(%ebp), %esi
    leal    -72(%ebp), %ecx
    .align 16
.L2:
    leal    0(,%edx,4), %eax
    addl    $4, %edx
    cmpl    $8, %edx
    movdqa  (%esi,%eax), %xmm0
    movdqa  %xmm0, (%ecx,%eax)
    jne     .L2
    movb    $1, %dl
    .align 16
.L4:
    movl    -4(%ecx,%edx,4), %eax
    cmpl    -4(%esi,%edx,4), %eax
    jne     .L14
    incl    %edx
    cmpl    $9, %edx
    jne     .L4
    addl    $64, %esp
    xorl    %eax, %eax
    popl    %esi
    popl    %edi
    popl    %ebp
    ret
.L14:
    call    abort
    .size   main1, .-main1
    .ident  "GCC: (GNU) 4.0.3 20051013 (prerelease)"

# cat foo.c
#define N 8

int main1 ()
{
  int b[N] = {0,3,6,9,12,15,18,21};
  int a[N];
  int i;

  for (i = 0; i < N; i++)
    {
      a[i] = b[i];
    }

  /* check results:  */
  for (i = 0; i < N; i++)
    {
      if (a[i] != b[i])
        abort ();
    }

  return 0;
}


Re: bitmaps in gcc

2005-10-14 Thread Richard Henderson
On Fri, Oct 14, 2005 at 11:27:15AM -0700, Brian Makin wrote:
> Can anyone give me some background on the use of
> bitmaps in gcc?

There's plenty of material in the list archives.  This is not
the first time that this topic has come up.


r~


Re: Severe problems with vectorizing stuff in 4.0.3 HEAD

2005-10-14 Thread Daniel Jacobowitz
On Fri, Oct 14, 2005 at 01:43:03PM -0700, Kean Johnston wrote:
> >It indicated that sibling calling optimization in main should
> >be disabled for targets that need to up the stack alignment,
> >otherwise you get the stack alignment of a lower one than
> While that may be true, I think the problem is broader.
> 
> I took out the main1() function and put it into a separate
> file, and compiled just that. So now there is no carnal
> knowledge of main or its stack alignment. The generated
> code for this stand-alone main1() makes no attempt to
> align the stack or the stack variables it is going to be
> passing to the movdqa instruction.

Main is supposed to emit special code to handle aligning the stack, if
the OS doesn't do it already.  Other functions don't get this
treatment.

It may be that you need to define something you haven't already for
your port if you aren't getting the compensation code in main.

-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: Severe problems with vectorizing stuff in 4.0.3 HEAD

2005-10-14 Thread Richard Henderson
On Fri, Oct 14, 2005 at 01:43:03PM -0700, Kean Johnston wrote:
> However, I don't think the stack being aligned on a 16-byte
> boundary into main will help, unless GCC is assuming (and I
> don't see how it possibly could) that every function would
> likewise be aligned.

Yes, in fact that's *exactly* what GCC is assuming.
And it will be true for all code that GCC generates.

> Also, when you say "stack going into main is 16 byte aligned",
> what specifically do you mean?

That the first argument to main is 16-byte aligned.  This implies
that the return address from main is at addr % 16 == 12.

If you're able to fix this in your OS, don't forget to do the
same with the pthread entry function and signal handlers.


r~


Re: Severe problems with vectorizing stuff in 4.0.3 HEAD

2005-10-14 Thread Jason Molenda
On Fri, Oct 14, 2005 at 01:43:03PM -0700, Kean Johnston wrote:

> Also, when you say "stack going into main is 16 byte aligned",
> what specifically do you mean? That it's 16-byte aligned before
> the call to main() itself? That at the first insn in main, most
> likely a push %ebp, it's 16-byte aligned (i.e. does the call
> to main from crt1.o have to take the push of the return address
> into account)?


The stack alignment is computed before the saved-EIP is pushed on
the stack by a CALL instruction.  So on function entry, the ESP has
already been decremented by 4 off of its 16-byte alignment.
Conventionally the EBP is pushed, making the ESP 8 bytes off its
16-byte alignment.
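
The arithmetic can be checked with a small sketch. The helper names push32 and
bytes_below_boundary are hypothetical, and the model only tracks the numeric
value of the stack pointer on 32-bit x86, where a CALL or a push moves it down
by 4 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* A push or a CALL on 32-bit x86 moves the stack pointer down by 4 bytes. */
static uint32_t push32(uint32_t esp)
{
    assert(esp >= 4u);
    return esp - 4u;
}

/* How far esp sits below the 16-byte boundary at or above it
   (0 means esp itself is 16-byte aligned). */
static unsigned bytes_below_boundary(uint32_t esp)
{
    return (unsigned)((16u - esp % 16u) % 16u);
}
```

Starting from a 16-byte aligned value, one push32 models the CALL and leaves
esp % 16 == 12 (4 bytes off); a second models the push of EBP and leaves the
stack pointer 8 bytes off.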

If your ABI does not require 16-byte stack frame alignment, aligning
it correctly in main() will not fix all the problems unless you can
recompile all of the code (and fix all the hand-written assembly)
on the entire system.  If you're 16-byte aligned and you call into
a library that only requires 4-byte alignment (the traditional SysV
x86 ABI--Pinski says it's been updated to require 16-byte alignment
but I don't know when that happened) and that library function calls
into a newly-gcc-recompiled function, you can crash over there
because of a misaligned operation.

J


Re: Severe problems with vectorizing stuff in 4.0.3 HEAD

2005-10-14 Thread Kean Johnston

> Yes, in fact that's *exactly* what GCC is assuming.
> And it will be true for all code that GCC generates.

How can that possibly ever work? Is the assumption then
that the only code GCC will ever work with is code that
GCC compiled? In effect what this implies is that GCC
is re-defining the ABI. It also means it is impossible
for GCC to inter-operate with vendor supplied libraries
like libc. If I use a libc function that has a callback,
like ftw() or bsearch() or qsort(), then I cannot have
them call a function that was compiled with gcc, because
no ABI previously defined has made it a requirement for
every stack frame to be 16-byte aligned.

Kean


Re: Severe problems with vectorizing stuff in 4.0.3 HEAD

2005-10-14 Thread Andrew Pinski


On Oct 14, 2005, at 4:43 PM, Kean Johnston wrote:


>> It indicated that sibling calling optimization in main should
>> be disabled for targets that need to up the stack alignment,
>> otherwise you get the stack alignment of a lower one than
>
> While that may be true, I think the problem is broader.


This is the way it is designed.  GCC assumes at every entry
point except for main, that the stack has been aligned to 16bytes.

There was a bug in bugzila about this before.  And powers (not me)
decided that this was by design and that it is the OS which needs
to make sure that the stack is 16byte aligned even though the
original SYSV ABI does not require this.  Since SYSV ABI
does not take into account SSE and it does not even mention them
at all.  I had wished that someone would have updated the SYSV ABI
(like they did for altivec) to talk about SSE/SSE2 and how arguments
are passed.  Instead GCC just follows what Intel's compiler does
(which is known to have changed).

Almost all other x86 OS's have changed to take the bigger alignment
into account which is why some powers have decided this is by design
that we only change the alignment when invoking main.

I assume you will have the same issues with threads and their stack
alignment.

Maybe it is time to update the SYSV abi for SSE and have SCO
(and other companies) update their OS for the new ABI.

Note: I am not saying that GCC should not be fixed for the case
of main which needs to be fixed at least until everyone agrees
on an ABI which takes SSE into account.

Luckily, x86_64 does not have this trouble.


Here is the current problem with main: we realign the stack
to 16-byte alignment, then we call the checking function. Then we leave
main (destroying the stack alignment) and jump to main1
(this is what is meant by sibling call optimization).  The problem
is that we should not have jumped directly to main1 but made it a call,
which keeps the stack aligned correctly.
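
The shape of code that runs into this can be sketched as follows. This is
only an illustration: `check_env`, `main1` and `entry` are hypothetical
stand-ins (the real case is main itself), and whether the compiler actually
turns the last call into a jump depends on target and optimization level.
With GCC, -fno-optimize-sibling-calls turns the optimization off.

```c
#include <assert.h>

/* Hypothetical stand-in for the checking function called first. */
static int checks_run = 0;

__attribute__((noinline)) static void check_env(void)
{
    checks_run++;
}

/* Hypothetical stand-in for main1: the function that does the real work
   and relies on the stack alignment set up by its caller. */
__attribute__((noinline)) static int main1(int argc)
{
    assert(checks_run > 0);
    return argc > 0 ? 0 : 1;
}

/* Plays the role of main: the call to main1 is in tail position, so an
   optimizing compiler may tear down this frame and jump to main1
   instead of calling it. */
int entry(int argc)
{
    check_env();
    return main1(argc);
}
```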

-- Pinski 



Re: Severe problems with vectorizing stuff in 4.0.3 HEAD

2005-10-14 Thread Richard Henderson
On Fri, Oct 14, 2005 at 02:14:10PM -0700, Kean Johnston wrote:
> How can that possibly ever work? Is the assumption then
> that the only code GCC will ever work with is code that
> GCC compiled? In effect what this implies is that GCC
> is re-defining the ABI.

Yes.  You can thank Intel for this.

With the introduction of SSE1, something had to change in order
to satisfy hardware constraints.  Intel initially proposed some
scheme that performed dynamic stack alignment in functions that 
use SSE1 instructions, and multiple entry points to avoid
redundant realignments.

This turned out to be horribly complex, and in many cases 
resulted in yet another register being unavailable to user code,
leaving at times only 4 (!) with pic+alloca+sse.

This kind of solution was unacceptably costly, so we simply 
changed the default code generation scheme to maintain proper
stack alignment at all times.

GCC code will interoperate with other compilers if you don't use
the 128-bit vector modes, but if you do, then we *require* that
the stack be maintained aligned.

This has been the status quo since 1998 or 1999, and is unlikely
to change.

> It also means it is impossible
> for GCC to inter-operate with vendor supplied libraries
> like libc.

Yep.  Too bad, so sad.

If you've users that actually need to use SSE in callbacks from
bsearch/qsort or the like, then they'll have to write

  int __attribute__((noinline))
  real_callback(const void *a, const void *b)
  {
    ...
  }

  int callback(const void *a, const void *b)
  {
    void * volatile x = __builtin_alloca(1);
    return real_callback (a, b);
  }
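
A filled-in version of the wrapper above, used with qsort(), might look like
the sketch below. The integer comparison and the `sort_ints` helper are
assumptions added for illustration; the elided body of real_callback in the
original is where the SSE code would go. Presumably the alloca is there to
force callback to set up its own, realigned frame before the inner call, but
that reading is an inference and not something the message states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* The comparison proper. Kept out of line so that it is entered through
   a call made by GCC-compiled code (the wrapper below). */
static int __attribute__((noinline))
real_callback(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* The function actually handed to libc. */
static int callback(const void *a, const void *b)
{
    void * volatile x = __builtin_alloca(1);
    (void)x;                      /* silence the unused-variable warning */
    return real_callback(a, b);
}

/* Hypothetical helper: sort an array of int in ascending order. */
void sort_ints(int *v, size_t n)
{
    assert(v != NULL || n == 0);
    qsort(v, n, sizeof v[0], callback);
}
```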



r~


RE: bitmaps in gcc

2005-10-14 Thread Meissner, Michael
One of the classic places that sparse bitmaps were used in GCC in the
past is in register allocation phase, where you essentially have a 2D
sparse matrix with # of basic blocks on one axis and pseudo register
number on the other axis.  When you are compiling very large functions,
the number of basic blocks and the number of pseudo registers are very
large, and if the table wasn't compressed (most registers aren't live
past a single basic block) it was very significant.  I haven't looked at
this area in the last two years or so when I wasn't working on GCC, so
it might have changed.  Unfortunately I don't recall whether we were
using compressed bitmaps before I wrote the original versions of the
compressed bit vectors, but the idea was to encapsulate everything
within macros so it could be changed in the future.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of
Brian Makin
Sent: Friday, October 14, 2005 1:27 PM
To: gcc@gcc.gnu.org
Subject: bitmaps in gcc


In reference to this on the wiki.

Bitmaps, also called sparse bit sets, are implemented using a linked
list with a cache. This is probably not the most time-efficient
representation, and it is not unusual for bitmap functions to show up
high on the execution profile. Bitmaps are used for many things, such as
for live register sets at the entry and exit of basic blocks in RTL, or
for a great number of data flow problems. See bitmap.c (and sbitmap.c
for GCC's simple bitmap implementation).
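
A much simplified sketch of the "linked list with a cache" idea is given
below. It is not the code from bitmap.c: the names (`sparse_bitmap`,
`chunk`, `sb_set`, `sb_test`, `sb_free`) and the one-word chunk size are
made up for illustration. Each list node covers a fixed range of bit
numbers, nodes exist only for ranges that contain a set bit, and the most
recently used node is remembered so that repeated accesses to nearby bits
skip the list walk.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define CHUNK_BITS (sizeof(unsigned long) * CHAR_BIT)

/* One node covers bits [index * CHUNK_BITS, (index + 1) * CHUNK_BITS). */
struct chunk {
    struct chunk *next;
    unsigned long index;
    unsigned long bits;
};

struct sparse_bitmap {
    struct chunk *first;   /* list sorted by ascending index */
    struct chunk *cache;   /* most recently used node, or NULL */
};

/* Find the node for `index`; allocate it if `create` is nonzero. */
static struct chunk *find_chunk(struct sparse_bitmap *bm,
                                unsigned long index, int create)
{
    if (bm->cache && bm->cache->index == index)
        return bm->cache;                  /* cache hit, no list walk */

    struct chunk **link = &bm->first;
    while (*link && (*link)->index < index)
        link = &(*link)->next;

    if (!*link || (*link)->index != index) {
        if (!create)
            return NULL;
        struct chunk *c = calloc(1, sizeof *c);
        assert(c != NULL);
        c->index = index;
        c->next = *link;                   /* splice in, keeping order */
        *link = c;
    }
    bm->cache = *link;
    return *link;
}

void sb_set(struct sparse_bitmap *bm, unsigned long bit)
{
    struct chunk *c = find_chunk(bm, bit / CHUNK_BITS, 1);
    c->bits |= 1UL << (bit % CHUNK_BITS);
}

int sb_test(struct sparse_bitmap *bm, unsigned long bit)
{
    struct chunk *c = find_chunk(bm, bit / CHUNK_BITS, 0);
    return c != NULL && ((c->bits >> (bit % CHUNK_BITS)) & 1UL);
}

void sb_free(struct sparse_bitmap *bm)
{
    struct chunk *c = bm->first;
    while (c) {
        struct chunk *next = c->next;
        free(c);
        c = next;
    }
    bm->first = bm->cache = NULL;
}
```

The list walk on a cache miss is what makes lookups linear in the number of
allocated nodes, which is one plausible reason such functions can show up
in profiles.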

Can someone point me to a testcase where bitmap functions show up high
on the profile?


Can anyone give me some background on the use of bitmaps in gcc?

Are they assumed to be sparse?
How critical is the memory consumption of bitsets?
What operations are the most speed critical?
Would it be desirable to merge bitmap and sbitmap into one
datastructure?
Anyone have good ideas for improvements?

Anything else anyone would want to add?

I think I may take a look at this.  Once I figure out the requirements
maybe we can speed it up a bit.


Brian N. Makin











Re: Severe problems with vectorizing stuff in 4.0.3 HEAD

2005-10-14 Thread Kean Johnston

Yes.  You can thank Intel for this.

Thank you Intel :)



With the introduction of SSE1, something had to change in order
to satisfy hardware constraints.  Intel initially proposed some
scheme that performed dynamic stack alignment in functions that 
use SSE1 instructions, and multiple entry points to avoid
redundant realignments.

Ok I am no compiler expert,  so this may be totally impossible,
and if so I'd appreciate an education, but this is what I
instinctively thought of when first thinking about this problem.

There are a very limited number of instructions that require
16-byte alignment. The two main places you have to worry about
that alignment are when passing arguments to a function, and
local stack variables. I guess the compiler is safe to assume
that if you are using normal memory, say from a malloc() to
hold the alignment-sensitive data that you have done your
own alignment. So let's take the first case. You have some code
that is going to be passing some vector parameter, something
that is alignment-conscious. I am assuming the compiler knows
that. Before pushing the data onto the stack, the compiler
could arrange things such that those parameters would be
neatly aligned on a 16-byte boundary. The only assumption
that would need to hold true is that the called function was
also compiled with gcc. Using imaginary data types here, where
int128_t is alignment-sensitive, suppose we had:

  int func (int128_t x, int y, int128_t z) {
  }

  int otherfunc (void) {
    int128_t foo = 123;
    int128_t bar = 234;
    return func(foo, 0, bar);
  }

When generating the call to func(), could gcc not align the
stack to 16 bytes, such that the first argument is properly
aligned? You then push the next argument, a simple 4-byte
int, and then re-align to the next 16-byte boundary for the
third argument.  The code in func() could take this scheme
into account, and know the exact offsets into the stack that
it needs to get to the args.
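
The offsets such a scheme would produce can be worked out with a small
sketch. It only models the arithmetic, measured from a 16-byte-aligned
base; push order and stack direction on a real target are not modelled, and
`align_up` and `layout_args` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Round `off` up to the next multiple of `align` (a power of two). */
size_t align_up(size_t off, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (off + align - 1) & ~(align - 1);
}

/* Store in offs[i] the offset of argument i, given each argument's size
   and required alignment. Returns the total size of the argument area. */
size_t layout_args(const size_t *size, const size_t *align, size_t n,
                   size_t *offs)
{
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        off = align_up(off, align[i]);   /* insert padding if needed */
        offs[i] = off;
        off += size[i];
    }
    return off;
}
```

For func(int128_t x, int y, int128_t z), with 16-byte int128_t and a 4-byte
int, this gives offsets 0, 16 and 32: twelve bytes of padding follow y so
that z lands on a 16-byte boundary again.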

The second case is where you have function-local variables
that are alignment-constrained. In this case, wouldn't
simple analysis of the function contents determine whether
or not any alignment-specific insns are being used and,
if so, automatically align the stack to 16 bytes?
That way, only those functions that actually use
such insns pay the (small) penalty of rounding up the stack.
Of course, all of that could be conditional on targets
that don't always align functions on a 16-byte boundary.

This seems far less invasive than redefining an ABI.


GCC code will interoperate with other compilers if you don't use
the 128-bit vector modes, but if you do, then we *require* that
the stack be maintained aligned.


I think, and I may be wrong here, but I think if I simply
make sure that entry to main is correctly aligned, then
the majority of code will just work. Assuming I am compiling
some program with gcc, if main is correctly aligned, and all
gcc code goes to lengths to ensure that alignment, then the
only time it can get *out* of alignment is if gcc code has
made a call to non-aligned libc code, which in turn makes
calls back into gcc code (a la qsort, ftw, etc). Those cases
are relatively rare. The only other time it's likely to be
an issue is with signal delivery, and I am pretty certain I
can persuade the kernel folks to ensure that the stack frame
is always aligned to 16 bytes when that happens.

Kean


Re: Severe problems with vectorizing stuff in 4.0.3 HEAD

2005-10-14 Thread Richard Henderson
On Fri, Oct 14, 2005 at 04:35:48PM -0700, Kean Johnston wrote:
> This seems far less invasive than redefining an ABI.

No, it isn't less invasive.

Your first case is not too difficult.  No more difficult, really,
than supporting alloca.  Indeed, this is more or less exactly the
code we emit in main.  (Which got bypassed with the tailcall.)

Your second case results in a variable displacement between the
local stack frame and the incoming function arguments.  This means
we need an extra register to hold this displacement (or, equivalently,
a pointer to the base of the arguments).  Combine this with -fpic
and alloca and 4 of the 8 general registers are consumed.  At this
point, lots of code stops compiling.
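
The variable displacement can be seen from the arithmetic alone. In the
sketch below (hypothetical helper names, a downward-growing stack, and the
incoming stack pointer modelled as a plain integer), the distance between
the realigned frame and the incoming arguments depends on the value the
stack pointer happened to have on entry, so it is not a compile-time
constant.

```c
#include <assert.h>
#include <stdint.h>

/* Round `addr` down to a multiple of `align` (a power of two). */
uintptr_t align_down(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return addr & ~(align - 1);
}

/* Distance from the realigned stack pointer back up to the incoming
   stack pointer, next to which the arguments live. */
uintptr_t arg_displacement(uintptr_t incoming_sp)
{
    return incoming_sp - align_down(incoming_sp, 16);
}
```

With a 4-byte-aligned incoming stack pointer the result can be 0, 4, 8 or
12, which is why something has to hold either the displacement or a pointer
to the arguments at run time.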

You're coming into this discussion about 6 years late.

> I think, and I may be wrong here, but I think if I simply
> make sure that entry to main is correctly aligned, then
> the majority of code will just work.

Yes.  Which is exactly why the dynamic alignment solutions were
rejected.  There's simply not enough gain.

> The only other time it's likely to be
> an issue is with signal delivery...

And the code in the thread libraries that calls the thread start
routine.


r~


Re: Severe problems with vectorizing stuff in 4.0.3 HEAD

2005-10-14 Thread Robert Dewar

Kean Johnston wrote:


Ok I am no compiler expert,  so this may be totally impossible,
and if so I'd appreciate an education, but this is what I
instinctively thought of when first thinking about this problem.


Note that, never mind SSE1, even conventional 8-byte fpt, while
not requiring alignment for correctness, sure does for efficiency.