date:20050221

Re: [RFA:] change back name of initial rtl dump suffix to ".rtl".

2005-02-21 Thread Paolo Bonzini

> > Ah, that's triggered by -fdump-rtl-expand-detailed (it is revision 2.28, 
> > for which I could not find an entry on gcc-patches).
>
> Do you know of a reason why that isn't on by default?

Because -fdump-rtl-expand-detailed includes *two* copies of the RTL: one 
lacks the prologue and epilogue but is interleaved with trees, the other 
is the standard -fdump-rtl-expand dump.

> > ISTR the name change was to avoid a switch named -fdump-rtl-rtl.
> To invent an option name alias and use a minor repetition in it
> as a reason for changing the old behavior is Bad.

It is not merely an option name alias.  It came together with a redesign 
of the way RTL dumps work, to integrate their management with tree dumps 
and to allow (in the future) to have various levels of detail in the RTL 
dumps as well.

I never had a problem with the rename because I use fname*.00.* (or the 
analogous completion key sequence) to invoke an editor on the RTL dump 
(or in general any other dump).

Paolo

Re: [RFA:] change back name of initial rtl dump suffix to ".rtl".

2005-02-21 Thread Hans-Peter Nilsson

> Date: Mon, 21 Feb 2005 11:19:59 +0100
> From: Paolo Bonzini <[EMAIL PROTECTED]>

> > Do you know of a reason why that isn't on by default?
> 
> Because -fdump-rtl-expand-detailed includes *two* copies of the RTL: one 
> lacks the prologue and epilogue but is interleaved with trees, the other 
> is the standard -fdump-rtl-expand dump.

That's not really a compelling reason to make it optional; we're
talking dumps that are supposed to be maximally informational!
(But maybe a reason to make it just one dump, the interleaved one.)

brgds, H-P

Re: How does g++ implement error handling?

2005-02-21 Thread Jonathan Wakely

On Mon, Feb 21, 2005 at 12:32:52PM +0800, Euler Herbert wrote:

> Hi everyone, 

Hi,

> I am trying to find out how g++ implements error handling. 
> My environment is CPU AMD Athlon(tm) XP 2000+, Debian GNU/Linux 
> 3.1, GCC 3.4.3-6, and Glibc 2.3.2.ds1-20. After studying the 
> assembly code generated by g++, I understand the beginning 
> and the end of the process. That is, a 'throw expression' calls 
> __cxa_allocate_exception, then the constructor of the 
> exception class, and then __cxa_throw. __cxa_throw calls 
> _Unwind_RaiseException. But this function is too complex for 
> me to get an outline of the real process. However, 
> I found that uw_install_context (which is a macro) is called 
> if _Unwind_RaiseException succeeds, and the body of 
> uw_install_context contains a call to longjmp. So there should be 
> a setjmp somewhere, but I did not find it. Could somebody tell me the 
> outline of the unwinding process? 

I think this behaviour is defined by this spec:
http://www.codesourcery.com/cxx-abi/abi-eh.html

Which is part of the C++ ABI spec:
http://www.codesourcery.com/cxx-abi/

Hope that helps,

jon


-- 
"A woman drove me to drink, I never had the courtesy to thank her."
- W.C. Fields

Re: [RFA:] change back name of initial rtl dump suffix to ".rtl".

2005-02-21 Thread Steven Bosscher

On Feb 21, 2005 11:40 AM, Hans-Peter Nilsson <[EMAIL PROTECTED]> wrote:

> > Date: Mon, 21 Feb 2005 11:19:59 +0100
> > From: Paolo Bonzini <[EMAIL PROTECTED]>
> 
> > > Do you know of a reason why that isn't on by default?
> > 
> > Because -fdump-rtl-expand-detailed includes *two* copies of the RTL: one 
> > lacks the prologue and epilogue but is interleaved with trees, the other 
> > is the standard -fdump-rtl-expand dump.
> 
> That's not really a compelling reason to make it optional; we're
> talking dumps that are supposed to be maximally informational!

No, they're supposed to provide just the information that is
useful to whoever is looking at the dumps.  Sometimes the more
elaborate dumps are useful, most of the time they are not.

Gr.
Steven

Question regarding c++ constructors

2005-02-21 Thread Mile Davidovic

Hello all

I have a stupid question regarding constructors and destructors in C++. 
After compilation there are two assembler functions (__ZN4testC2Ev,
__ZN4testC1Ev)
that represent the constructor; the same goes for the destructor (__ZN4testD2Ev,
__ZN4testD1Ev). 
The functions are completely identical. 
What is the reason for this compiler behaviour? 

Here is a small sample:

class test 
{
public:
    test();
    ~test();
};

test::test()
{
}

test::~test()
{
}

Here is output after objdump:

SYMBOL TABLE:
[  2](sec  1)(fl 0x00)(ty  20)(scl   2) (nx 1) 0x0000 __ZN4testC2Ev
[  4](sec  1)(fl 0x00)(ty  20)(scl   2) (nx 0) 0x0006 __ZN4testC1Ev
[  5](sec  1)(fl 0x00)(ty  20)(scl   2) (nx 0) 0x000c __ZN4testD2Ev
[  6](sec  1)(fl 0x00)(ty  20)(scl   2) (nx 0) 0x0012 __ZN4testD1Ev


Disassembly of section .text:

0000 <__ZN4testC2Ev>:
   0:   55      push   %ebp
   1:   89 e5   mov    %esp,%ebp
   3:   5d      pop    %ebp
   4:   c3      ret
   5:   90      nop

0006 <__ZN4testC1Ev>:
   6:   55      push   %ebp
   7:   89 e5   mov    %esp,%ebp
   9:   5d      pop    %ebp
   a:   c3      ret
   b:   90      nop

000c <__ZN4testD2Ev>:
   c:   55      push   %ebp
   d:   89 e5   mov    %esp,%ebp
   f:   5d      pop    %ebp
  10:   c3      ret
  11:   90      nop

0012 <__ZN4testD1Ev>:
  12:   55      push   %ebp
  13:   89 e5   mov    %esp,%ebp
  15:   5d      pop    %ebp
  16:   c3      ret



 


Thanks in advance
Mile

Re: Question regarding c++ constructors

2005-02-21 Thread Jonathan Wakely

On Mon, Feb 21, 2005 at 12:45:51PM +0100, Mile Davidovic wrote:

> Hello all
> 
> I have a stupid question regarding constructors and destructors in C++. 
> After compilation there are two assembler functions (__ZN4testC2Ev,
> __ZN4testC1Ev)
> that represent the constructor; the same goes for the destructor (__ZN4testD2Ev,
> __ZN4testD1Ev). 
> The functions are completely identical. 
> What is the reason for this compiler behaviour? 

I think this is the same question as answered here:
http://gcc.gnu.org/ml/gcc-help/2004-11/msg00160.html

I thought there was something about this on one of the
http://gcc.gnu.org/gcc-3.X/changes.html pages (for some X)
But I can't find it.

Hope that helps,

jon


-- 
"Fanaticism consists in redoubling your efforts when you have forgotten 
 your aims."
- George Santayana

Re: [RFA:] change back name of initial rtl dump suffix to ".rtl".

2005-02-21 Thread Gabriel Dos Reis

Paolo Bonzini <[EMAIL PROTECTED]> writes:

| > > ISTR the name change was to avoid a switch named -fdump-rtl-rtl.
| > To invent an option name alias and use a minor repetition in it
| > as a reason for changing the old behavior is Bad.
| 
| It is not merely an option name alias.  It came together with a
| redesign of the way RTL dumps work, to integrate their management with
| tree dumps and to allow (in the future) to have various levels of
| detail in the RTL dumps as well.

maybe an option with argument?

-fdump-rtl=detailed
-fdump-rtl=classic  # same as -fdump-rtl


You won't need to rename option.

-- Gaby

Re: MMX built-ins performance oddities

2005-02-21 Thread Prakash Punnoor

Paolo Bonzini wrote:
> > - vector version is about 3% faster than above instead of 10% slower -
> > wow!
> > So why is gcc 4.0 producing worse code when using intel style
> > intrinsics and why isn't the union version using builtins as fast as
> > using the vector version?
>
> I can answer why unions are slower: that's because they are spilled to
> memory on every assignment -- GCC 4.0 knows how to replace structs with
> different scalar variables (one per item), but not unions.  GCC 3.4 knew
> about none of these possibilities.

Ok, thanks for the explanation, though I am not sure whether I understood
everything.

> About why vectors are faster, well, a lot of the vector support has been
> rewritten in GCC 4.0 so that may be the case.
> I do not know exactly why builtins are still slower, but you may want to
> create a PR and add me on the CC list ([EMAIL PROTECTED]).

I have some good news: my tests yesterday were flawed. Somehow the compiler
wasn't completely switched. It seems gcc 3.4.3 was still used, but I don't
understand why it behaved differently after gcc 4 was installed. Perhaps
something in the script for switching compilers is wrong.

Nevertheless, today I did it correctly and gcc 4 just blew me away. I didn't
test the unions version, only built-ins vs Intel style, and this time they were
pretty much identical and, what is more, a hell of a lot faster than the code
generated by gcc 3.4.3. The devs did really good work.

Some numbers:
Fastest runs with gcc 3.4.3: about 2.9 sec (slowest when using Intel style:
3.1 sec)
gcc 4.0: <2.4 sec
Just astonishing!

So I guess that, considering gcc 4.0, this report can be forgotten, but gcc
3.4.3 has serious issues. It even miscompiles SSE code. Are there plans to fix
this, or won't time be spent on it anymore? I'd rather see gcc 4.0 getting into
a wonderful stable state. :-)

Sorry for my mistake.
--
Prakash Punnoor
formerly known as Prakash K. Cheemplavam



Re: moving v16sf reg with multiple sub-regs

2005-02-21 Thread Dylan Cuthbert

Further investigation.

If I remove the define_expand for movv16sf and throw in a dummy define_insn 
that supports reg<->reg mem<->reg reg<->mem, then the redundant move is 
optimized away.

But of course, the store load and move all use 4 instructions each so this 
produces inefficient code.

Any idea how I can get the same removal of redundant temporaries and still 
get the multiple instructions for each operation interspersed nicely?

Dylan


"Dylan Cuthbert" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> Hi there,
>
> I have implemented a move of a v16sf type like this because it is held by 
> 4 v4sf registers:
>
> --- snip ---
>
> (define_expand "movv16sf"
>   [(set (match_operand:V16SF 0 "nonimmediate_operand" "")
>         (match_operand:V16SF 1 "general_operand" ""))]
>   ""
>   "  if ((reload_in_progress | reload_completed) == 0
>          && !register_operand (operands[0], V16SFmode)
>          && !nonmemory_operand (operands[1], V16SFmode))
>        operands[1] = force_reg (V16SFmode, operands[1]);
>
>      move_v16sf( operands );
>      DONE;
>   ")
>
> --- end snip ---
>
>
> and in the config's .c file:
>
>
> --- snip ---
>
> void
> move_v16sf (operands )
>      rtx operands[];
> {
>   rtx op0 = operands[0];
>   rtx op1 = operands[1];
>   enum rtx_code code0 = GET_CODE (operands[0]);
>   enum rtx_code code1 = GET_CODE (operands[1]);
>   int subreg_offset0 = 0;
>   int subreg_offset1 = 0;
>   enum delay_type delay = DELAY_NONE;
>
>   if (code0 == REG)
>     {
>       int regno0 = REGNO (op0) + subreg_offset0;
>
>       if (code1 == REG)
>         {
>           int regno1 = REGNO (op1) + subreg_offset1;
>
>           /* Just in case, don't do anything for assigning a register
>              to itself, unless we are filling a delay slot.  */
>           if (regno0 == regno1 && set_nomacro == 0) return;
>
>           emit_move_insn( gen_rtx_SUBREG (V4SFmode, op0, 0  ), gen_rtx_SUBREG( V4SFmode, op1, 0   ) );
>           emit_move_insn( gen_rtx_SUBREG (V4SFmode, op0, 16 ), gen_rtx_SUBREG( V4SFmode, op1, 16  ) );
>           emit_move_insn( gen_rtx_SUBREG (V4SFmode, op0, 32 ), gen_rtx_SUBREG( V4SFmode, op1, 32  ) );
>           emit_move_insn( gen_rtx_SUBREG (V4SFmode, op0, 48 ), gen_rtx_SUBREG( V4SFmode, op1, 48  ) );
>         }
>       else if (code1 == MEM)
>         {
>           rtx src_reg;
>
>           src_reg = copy_addr_to_reg ( XEXP (op1,0) );
>
>           emit_move_insn( gen_rtx_SUBREG (V4SFmode, op0, 0  ), gen_rtx_MEM( V4SFmode, src_reg ) );
>           emit_move_insn( gen_rtx_SUBREG (V4SFmode, op0, 16 ), gen_rtx_MEM( V4SFmode, plus_constant( src_reg, 16 ) ) );
>           emit_move_insn( gen_rtx_SUBREG (V4SFmode, op0, 32 ), gen_rtx_MEM( V4SFmode, plus_constant( src_reg, 32 ) ) );
>           emit_move_insn( gen_rtx_SUBREG (V4SFmode, op0, 48 ), gen_rtx_MEM( V4SFmode, plus_constant( src_reg, 48 ) ) );
>         }
>
>     }
>
>   else if (code0 == MEM)
>     {
>       if (code1 == REG)
>         {
>           rtx dest_reg;
>
>           dest_reg = copy_addr_to_reg ( XEXP (op0,0) );
>
>           emit_move_insn( gen_rtx_MEM( V4SFmode, dest_reg ), gen_rtx_SUBREG (V4SFmode, op1, 0  ) );
>           emit_move_insn( gen_rtx_MEM( V4SFmode, plus_constant( dest_reg, 16) ), gen_rtx_SUBREG (V4SFmode, op1, 16 ) );
>           emit_move_insn( gen_rtx_MEM( V4SFmode, plus_constant( dest_reg, 32) ), gen_rtx_SUBREG (V4SFmode, op1, 32 ) );
>           emit_move_insn( gen_rtx_MEM( V4SFmode, plus_constant( dest_reg, 48) ), gen_rtx_SUBREG (V4SFmode, op1, 48 ) );
>         }
>     }
>
> }
> --- end snip ---
>
>
> This works ok, but it produces inefficient code, here some sample source 
> code:
>
> --- snip ---
>
> typedef int v4 __attribute__((mode(V4SF)));
> typedef int m4 __attribute__((mode(V16SF)));
>
> v4 vec1, vec2;
> m4 frog;
>
> int main( int argc, char* argv[] )
> {
>     m4 blob;
>
>     asm( "some_instruction %0,%1,%2,%3" : "=&j" (blob): "j" (vec1), "j" (vec2), "j" (frog) );
>     asm( "some_instruction2 %0,%1" : "=&j" (frog) : "j" (blob) );
>
>     return 0;
> }
>
> --- end snip ---
>
> where j is the register class for v4sf and v16sf types.
> This produces a move of the v16sf type between the two asm instructions, 
> when it doesn't need to, does anyone have any ideas why this move isn't 
> eliminated?
>
> #APP
>    some_instruction r10,r22,r20,r00
> #NO_APP
>    move r00,r10
>    move r01,r11
>    move r02,r12
>    move r03,r13
> #APP
>    some_instruction2 r10, r00
>
>
> r10 isn't needed to be preserved (it isn't written out) but it seems to be 
> making a copy anyway.  Worse, if "blob" is defined in global space like 
> "frog", then it also writes out r10 to memory when it shouldn't.
>
>
> Any ideas app

Does backend need to worry about overlap?

2005-02-21 Thread Andy Hutchinson

If I have an RTL pattern such as:
(SET (MEM...) (MEM...))
(define_insn in backend target.md)
do I need to guard against the possibility that the two operands 
overlap? Or does the front/middle end take care of any C/C++ language 
specific needs here? (perhaps by using a register as an intermediate)

Thank you

RFC -- CSE compile-time stupidity

2005-02-21 Thread Jeffrey A Law


Sigh.  I can't wait for this code to become less critical, both in terms
of runtime and compile time performance.  There are so many things in 
cse.c that are just plain bad.

cse.c has a fair amount of complexity in its hash table management
code to avoid spending too much time invalidating/removing items
in the hash table when a value of an expression changes.

Specifically we have REG_TICK, REG_IN_TABLE and other fields to track 
when entries are valid.  It's fairly common to have code which looks
something like this (from mention_regs):

  for (i = regno; i < endregno; i++)
    {
      if (REG_IN_TABLE (i) >= 0 && REG_IN_TABLE (i) != REG_TICK (i))
        remove_invalid_refs (i);

      REG_IN_TABLE (i) = REG_TICK (i);
      SUBREG_TICKED (i) = -1;
    }


Looks pretty simple and straightforward.  Let's dig a little deeper into
those accessor macros...

/* Get the number of times this register has been updated in this
   basic block.  */

#define REG_TICK(N) (get_cse_reg_info (N)->reg_tick)

/* Get the point at which REG was recorded in the table.  */

#define REG_IN_TABLE(N) (get_cse_reg_info (N)->reg_in_table)

/* Get the SUBREG set at the last increment to REG_TICK (-1 if not a
   SUBREG).  */

#define SUBREG_TICKED(N) (get_cse_reg_info (N)->subreg_ticked)

Hmmm, it's starting to get interesting -- each macro has a function call
to get_cse_reg_info, which looks like:


static inline struct cse_reg_info *
get_cse_reg_info (unsigned int regno)
{
  struct cse_reg_info *p = &cse_reg_info_table[regno];

  /* If this entry has not been initialized, go ahead and initialize
     it.  */
  if (p->timestamp != cse_reg_info_timestamp)
    get_cse_reg_info_1 (regno);

  return p;
}



OUCH OUCH OUCH.  Yes, this is inlined, but that's of little/no value
with the function call to get_cse_reg_info_1.  Let me show why by
looking at the tree dumps for the conditional above:


  if (REG_IN_TABLE (i) >= 0 && REG_IN_TABLE (i) != REG_TICK (i))
    remove_invalid_refs (i);

  # BLOCK 5
  # PRED: 4 [50.0%]  (true,exec)
:;
  D.20813_497 = regno_328 * 20;
  ivtmp.200_122 = (struct cse_reg_info *) D.20813_497;
  # SUCC: 6 [100.0%]  (fallthru)
OK.  We're just getting the offset of the entry within the table
that we want.

  # BLOCK 6
  # PRED: 21 [89.0%]  (dfs_back,true,exec) 5 [100.0%]  (fallthru)
  # ivtmp.200_499 = PHI ;
  # i_33 = PHI ;
:;
  cse_reg_info_table.99_335 = cse_reg_info_table;
  p_336 = cse_reg_info_table.99_335 + ivtmp.200_499;
  D.20520_337 = p_336->timestamp;
  cse_reg_info_timestamp.100_338 = cse_reg_info_timestamp;
  if (D.20520_337 != cse_reg_info_timestamp.100_338) goto ; else
goto ;
  # SUCC: 7 [51.2%]  (true,exec) 8 [48.8%]  (false,exec)
Check and see if the entry's timestamp indicates the entry is
valid/initialized.

  # BLOCK 7
  # PRED: 6 [51.2%]  (true,exec)
:;
  get_cse_reg_info_1 (i_33);
  # SUCC: 8 [100.0%]  (fallthru,exec)
Initialize this entry (if necessary).

  # BLOCK 8
  # PRED: 6 [48.8%]  (false,exec) 7 [100.0%]  (fallthru,exec)
:;
  D.20361_341 = p_336->reg_in_table;
  if (D.20361_341 >= 0) goto ; else goto ;
  # SUCC: 9 [79.0%]  (true,exec) 15 [21.0%]  (false,exec)
Load REG_IN_TABLE for this register.  So far so good.


  # BLOCK 9
  # PRED: 8 [79.0%]  (true,exec)
:;
  cse_reg_info_table.99_383 = cse_reg_info_table;
  p_384 = cse_reg_info_table.99_383 + ivtmp.200_499;
  D.20529_385 = p_384->timestamp;
  cse_reg_info_timestamp.100_386 = cse_reg_info_timestamp;
  if (D.20529_385 != cse_reg_info_timestamp.100_386) goto ; else
goto ;
  # SUCC: 10 [51.2%]  (true,exec) 11 [48.8%]  (false,exec)
Totally useless -- we're verifying that this entry is valid again.
There's no way this entry can be invalid at this point.  So, let's
see.  3 loads, 1 comparison and 1 arithmetic operation we don't need.

  # BLOCK 10
  # PRED: 9 [51.2%]  (true,exec)
:;
  get_cse_reg_info_1 (i_33);
  # SUCC: 11 [100.0%]  (fallthru,exec)
This can't ever be reached at runtime because there's no way
the entry would need initialization again.

  # BLOCK 11
  # PRED: 9 [48.8%]  (false,exec) 10 [100.0%]  (fallthru,exec)
:;
  D.20363_389 = p_384->reg_in_table;
  cse_reg_info_table.99_393 = cse_reg_info_table;
  p_394 = cse_reg_info_table.99_393 + ivtmp.200_499;
  D.20538_395 = p_394->timestamp;
  cse_reg_info_timestamp.100_396 = cse_reg_info_timestamp;
  if (D.20538_395 != cse_reg_info_timestamp.100_396) goto ; else
goto ;
  # SUCC: 12 [51.2%]  (true,exec) 13 [48.8%]  (false,exec)
Totally useless.  We don't need to load the REG_IN_TABLE field
again -- we already loaded it in block #8 above.  But the calls
(which can never be reached at runtime) must be considered as
potentially modifying REG_IN_TABLE.  Ugh.  Worse yet we then proceed
to verify this entry is valid AGAIN.  4 loads, 1 comparison and 1
arithmetic operation that we don't need.

  # BLOCK 12
  # PRED: 11 [51.2%]  (true,exec)
:;
  get_cse_reg_info_1 (i_33);
  # SUCC: 13 [100.0%]  (fallthru,exec)
Another call that can't be reach

Re: Will people install gfortran in 4.0? [was Re: Shipping gmp and mpfr with gcc-4.0?]

2005-02-21 Thread Gerald Pfeifer

On Sat, 19 Feb 2005, Bradley Lucier wrote:
> But I think that in many installations people simply won't dance through 
> these hoops and gfortran will not be installed in 4.0.

To add a concrete example, unlike g77 in earlier versions of GCC, gfortran 
is not and will not be part of the standard gcc40 port in FreeBSD.

Gerald

Re: Will people install gfortran in 4.0?

2005-02-21 Thread Tobias Schlüter

(added [EMAIL PROTECTED] to the CC list, as that is the appropriate list for
this discussion)

Gerald Pfeifer wrote:
> To add a concrete example, unlike g77 in earlier versions of GCC, gfortran 
> is not and will not be part of the standard gcc40 port in FreeBSD.

Do you have a pointer to where this decision is explained?

Thanks,
- Tobi

Re: Will people install gfortran in 4.0?

2005-02-21 Thread Steve Kargl

On Mon, Feb 21, 2005 at 06:35:43PM +0100, Tobias Schlüter wrote:
> (added [EMAIL PROTECTED] to the CC list, as that is the appropriate list for
> this discussion)
> 
> Gerald Pfeifer wrote:
> > To add a concrete example, unlike g77 in earlier versions of GCC, gfortran 
> > is not and will not be part of the standard gcc40 port in FreeBSD.
> 
> Do you have a pointer to where this decision is explained?
> 

Why?  I build gfortran routinely on both i386-*-freebsd and
amd64-*-freebsd.  gfortran has many Fortran 95 features that
g77 will never have.  I can also state that on some of my
Fortran 77 codes, gfortran-generated binaries outperform
the g77 binaries.

-- 
Steve

Re: RFC -- CSE compile-time stupidity

2005-02-21 Thread Steven Bosscher

On Feb 21, 2005 06:22 PM, Jeffrey A Law <[EMAIL PROTECTED]> wrote:
> I realize that we are trying to write clean, easy to read code by using
> a level of abstraction, but the abstraction is really getting in the
> way of achieving reasonable compile-time performance.

I don't think this is a problem with abstraction in general, but more
with very lame attempts at abstraction, like this one, in the critical
paths.  Looks like just ill design in this case.

> Comments?

Nice catch!  I just hope we don't have too many of these ;-)

Gr.
Steven

Re: RFC -- CSE compile-time stupidity

2005-02-21 Thread Kazu Hirata

Hi Jeff,

> Fixing cse.c to not use the accessor macros for REG_IN_TABLE, REG_TICK
> and SUBREG_TICKED saves about 1% compilation time for the components
> of cc1.  Yes, that's a net 1% improvement by dropping the abstraction
> layer.

Yes, I've noticed the problem.  In my defense, the code in question
was even worse before I touched it. :-) With the old code, every time
we access a cse_reg_info entry that is different from the last access,
we were generating a function call.  Nowadays, we avoid calls to
get_cse_reg_info_1 95% of the time.
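
For readers following along, here is a minimal sketch of the timestamp idea referred to above. It is not the cse.c code: the names `reg_info`, `get_entry`, `init_entry`, `start_new_block` and the fixed table size are made up for illustration, and counter wraparound is ignored. The point it shows is that the common case is one compare, the out-of-line call happens only for stale entries, and invalidating the whole table is a single increment.

```c
/* Hypothetical stand-in for cse_reg_info; field names follow the thread.  */
struct reg_info
{
  unsigned int timestamp;   /* Epoch in which the entry was last initialized.  */
  int reg_tick;
  int reg_in_table;
};

#define NREGS 8

/* Static storage is zeroed, so every entry starts out stale (0 != 1).  */
static struct reg_info table[NREGS];
static unsigned int current_timestamp = 1;
static unsigned int slow_path_calls;   /* Counts out-of-line initializations.  */

/* Slow path: bring a stale entry up to date.  */
static void
init_entry (unsigned int regno)
{
  table[regno].timestamp = current_timestamp;
  table[regno].reg_tick = 1;
  table[regno].reg_in_table = -1;
  slow_path_calls++;
}

/* Fast path: one compare, and a call only when the entry is stale.  */
static inline struct reg_info *
get_entry (unsigned int regno)
{
  struct reg_info *p = &table[regno];

  if (p->timestamp != current_timestamp)
    init_entry (regno);

  return p;
}

/* Invalidate every entry in O(1) by starting a new epoch.  */
static void
start_new_block (void)
{
  current_timestamp++;
}
```

Calling `get_entry` twice on the same register within one epoch takes the slow path only the first time; after `start_new_block ()` the next access reinitializes the entry.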

Of course, it's tough to beat the performance of your explicit
initialization approach, but here are a couple of things that I have
thought about while keeping some abstraction layer.

The first thought is to expose the timestamp update to the user of
those macros that you mentioned.

/* Find a cse_reg_info entry for REGNO.  */

static inline struct cse_reg_info *
get_cse_reg_info (unsigned int regno)
{
  struct cse_reg_info *p = &cse_reg_info_table[regno];

  /* If this entry has not been initialized, go ahead and initialize
     it.  */
  if (p->timestamp != cse_reg_info_timestamp)
    {
      get_cse_reg_info_1 (regno);
      p->timestamp = cse_reg_info_timestamp;  /* <- Look! */
    }

  return p;
}

This way, DOM may be able to do jump threading to some extent and
remove a lot of the timestamp checks.  Of course, jump threading
opportunities are blocked when we have a non-pure/const function call
like so:

  for (i = regno; i < endregno; i++)
    {
      if (REG_IN_TABLE (i) >= 0 && REG_IN_TABLE (i) != REG_TICK (i))
        remove_invalid_refs (i);   /* <- Look! */

      REG_IN_TABLE (i) = REG_TICK (i);
      SUBREG_TICKED (i) = -1;
    }

The second thought is to initialize all of the cse_reg_info entries at the
beginning of cse_main.  Set aside a bitmap with as many bits as
max_regs.  Whenever we use one of these accessor macros for register
k, set a bit k saying "cse_reg_info_table[k] is in use."  This way,
when we are done with a basic block, we can walk the bitmap and
reinitialize those that are used.  Again, a good optimizer should be
able to eliminate most of these bit sets, but a non-pure/const
function call will block the cleanup opportunities.  Of course, this
bitmap walk is far more expensive than cse_reg_info_timestamp++.
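
The bitmap idea in the paragraph above could look roughly like the following. This is only a sketch under assumptions: it uses a plain byte array instead of GCC's sbitmap, a fixed `NREGS`, and hypothetical names (`reg_info`, `init_all`, `get_entry`, `end_block`), so it should not be read as the patch itself.

```c
#include <string.h>

#define NREGS 64

/* Hypothetical stand-in for cse_reg_info.  */
struct reg_info
{
  int reg_tick;
  int reg_in_table;
};

static struct reg_info table[NREGS];
static unsigned char in_use[(NREGS + 7) / 8];   /* One bit per register.  */

static void
reset_entry (unsigned int regno)
{
  table[regno].reg_tick = 1;
  table[regno].reg_in_table = -1;
}

/* Initialize every entry once, up front.  */
static void
init_all (void)
{
  unsigned int i;

  for (i = 0; i < NREGS; i++)
    reset_entry (i);
  memset (in_use, 0, sizeof in_use);
}

/* Accessor: no validity check, only a note that the entry was touched.  */
static inline struct reg_info *
get_entry (unsigned int regno)
{
  in_use[regno / 8] |= (unsigned char) (1u << (regno % 8));
  return &table[regno];
}

/* End of a basic block: walk the bitmap, reinitialize the touched
   entries, and return how many were reset.  */
static unsigned int
end_block (void)
{
  unsigned int i, n = 0;

  for (i = 0; i < NREGS; i++)
    if (in_use[i / 8] & (1u << (i % 8)))
      {
        reset_entry (i);
        n++;
      }
  memset (in_use, 0, sizeof in_use);
  return n;
}
```

The accessor no longer branches, but `end_block` has to visit every bit, which is the cost being compared against a single timestamp increment.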

Kazu Hirata

Re: RFC -- CSE compile-time stupidity

2005-02-21 Thread Jeffrey A Law

On Mon, 2005-02-21 at 19:05 +0100, Steven Bosscher wrote:
> On Feb 21, 2005 06:22 PM, Jeffrey A Law <[EMAIL PROTECTED]> wrote:
> > I realize that we are trying to write clean, easy to read code by using
> > a level of abstraction, but the abstraction is really getting in the
> > way of achieving reasonable compile-time performance.
> 
> I don't think this is a problem with abstraction in general, but more
> with very lame attempts at abstraction, like this one, in the critical
> paths.  Looks like just ill design in this case.
IMHO this kind of style is a fundamental problem when using abstraction
layers and lazy allocation/initialization of data structures
(regardless of whether the abstraction comes from macros or 
inlined functions). 

It's particularly problematical in languages which provide the ability
to hide the underlying implementation and force all accesses to go 
through methods.  If the class doesn't provide methods which assume
the underlying data has already been allocated/initialized/checked,
then performance suffers (Having dealt with this problem in large C++
codebases in my past, it's something I know to look for...)
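
To make the point concrete, a small sketch in C, assuming a made-up `reg_info` table and a hypothetical checked accessor `get_checked` (none of these names come from cse.c). The first function goes through the checked accessor for every field access, as the macros do; the second validates once and then works through the pointer.

```c
/* Hypothetical stand-in for cse_reg_info.  */
struct reg_info
{
  unsigned int timestamp;
  int reg_tick;
  int reg_in_table;
  int subreg_ticked;
};

#define NREGS 8

static struct reg_info table[NREGS];
static unsigned int current_timestamp = 1;
static unsigned int checks;   /* Number of validity checks performed.  */

/* Checked accessor: validates (and lazily initializes) on every call.  */
static struct reg_info *
get_checked (unsigned int regno)
{
  struct reg_info *p = &table[regno];

  checks++;
  if (p->timestamp != current_timestamp)
    {
      p->timestamp = current_timestamp;
      p->reg_tick = 1;
      p->reg_in_table = -1;
      p->subreg_ticked = -1;
    }
  return p;
}

/* Every field access pays for a check.  Returns nonzero if the caller
   would have to remove invalid references.  */
static int
update_via_accessors (unsigned int regno)
{
  int stale = (get_checked (regno)->reg_in_table >= 0
               && (get_checked (regno)->reg_in_table
                   != get_checked (regno)->reg_tick));

  get_checked (regno)->reg_in_table = get_checked (regno)->reg_tick;
  get_checked (regno)->subreg_ticked = -1;
  return stale;
}

/* Validate once, then use the pointer directly.  */
static int
update_hoisted (unsigned int regno)
{
  struct reg_info *p = get_checked (regno);
  int stale = p->reg_in_table >= 0 && p->reg_in_table != p->reg_tick;

  p->reg_in_table = p->reg_tick;
  p->subreg_ticked = -1;
  return stale;
}
```

Both functions compute the same result; the accessor-only version performs the validity check up to six times per register, the hoisted one exactly once.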

> 
> > Comments?
> 
> Nice catch!  I just hope we don't have too many of these ;-)
Well, even within cse.c we still have all the accesses to REG_QTY
which have the same basic structure.  So code like this will exhibit
the same behavior:

  if (REG_P (x)
      && REGNO_QTY_VALID_P (REGNO (x)))
    {
      int x_q = REG_QTY (REGNO (x));
      struct qty_table_elem *x_ent = &qty_table[x_q];

      if (GET_MODE (x) == x_ent->mode
          && x_ent->const_rtx != NULL_RTX)
        return 0;
    }

And when you start looking at everywhere we use REG_QTY/REGNO_QTY_VALID_P
you start to cry.  I'll be experimenting with dropping the abstraction
layer for those too to see if there's any noticeable improvement.
cse.c is kinda odd in that its profile is so bloody flat -- which
is sometimes a good indicator that we're burning too much time just
trying to access key data structures.

We certainly have other places which are potential problem areas.  For
example get_stmt_ann, get_var_ann and friends all do the same kind of
thing.

jeff

Re: RFC -- CSE compile-time stupidity

2005-02-21 Thread Jeffrey A Law

On Mon, 2005-02-21 at 13:26 -0500, Kazu Hirata wrote:
> Hi Jeff,
> 
> > Fixing cse.c to not use the accessor macros for REG_IN_TABLE, REG_TICK
> > and SUBREG_TICKED saves about 1% compilation time for the components
> > of cc1.  Yes, that's a net 1% improvement by dropping the abstraction
> > layer.
> 
> Yes, I've noticed the problem.  In my defense, the code in question
> was even worse before I touched it. :-) With the old code, every time
> we access a cse_reg_info entry that is different from the last access,
> we were generating a function call.  Nowadays, we avoid calls to
> get_cse_reg_info_1 95% of the time.
I'm not assigning any blame :-)  


> Of course, it's tough to beat the performance of your explicit
> initialization approach, but here are a couple of things that I have
> thought about while keeping some abstraction layer.
> 
> The first thought is to expose the timestamp update to the user of
> those macros that you mentioned.
> 
> /* Find a cse_reg_info entry for REGNO.  */
> 
> static inline struct cse_reg_info *
> get_cse_reg_info (unsigned int regno)
> {
>   struct cse_reg_info *p = &cse_reg_info_table[regno];
> 
>   /* If this entry has not been initialized, go ahead and initialize
>      it.  */
>   if (p->timestamp != cse_reg_info_timestamp)
>     {
>       get_cse_reg_info_1 (regno);
>       p->timestamp = cse_reg_info_timestamp;  /* <- Look! */
>     }
> 
>   return p;
> }
> 
> This way, DOM may be able to do jump threading to some extent and
> remove a lot of the timestamp checks.  Of course, jump threading
> opportunities are blocked when we have a non-pure/const function call
> like so:
Nope.  That won't solve the problem, even within a conditional like
REG_IN_TABLE (i) >= 0 && REG_IN_TABLE (i) != REG_TICK (i)

You can effectively get this code by changing the order of statements
in get_cse_reg_info_1 and inlining it.  The problem is we're going to
have two codepaths (one where we initialize, one where we do not) which
merge so that we can test that in_table >= 0, i.e.:
  # BLOCK 6
  # PRED: 21 [89.0%]  (dfs_back,true,exec) 5 [100.0%]  (fallthru)
  # ivtmp.206_69 = PHI ;
  # ivtmp.204_820 = PHI ;
  # i_15 = PHI ;
:;
  cse_reg_info_table.98_525 = cse_reg_info_table;
  p_526 = cse_reg_info_table.98_525 + ivtmp.204_820;
  D.20578_527 = p_526->timestamp;
  cse_reg_info_timestamp.99_528 = cse_reg_info_timestamp;
  if (D.20578_527 != cse_reg_info_timestamp.99_528) goto ; else goto
;
  # SUCC: 7 [71.0%]  (true,exec) 8 [29.0%]  (false,exec)

  # BLOCK 7
  # PRED: 6 [71.0%]  (true,exec)
:;
  p_526->reg_tick = 1;
  p_526->reg_in_table = -1;
  p_526->subreg_ticked = 0;
  D.20589_121 = (int) ivtmp.206_69;
  p_526->reg_qty = D.20589_121;
  cse_reg_info_timestamp.97_733 = cse_reg_info_timestamp;
  p_526->timestamp = cse_reg_info_timestamp.97_733;
  # SUCC: 8 [100.0%]  (fallthru,exec)

  # BLOCK 8
  # PRED: 6 [29.0%]  (false,exec) 7 [100.0%]  (fallthru,exec)
:;
  D.20419_531 = p_526->reg_in_table;
  if (D.20419_531 >= 0) goto ; else goto ;
  # SUCC: 9 [79.0%]  (true,exec) 15 [21.0%]  (false,exec)

  # BLOCK 9
  # PRED: 8 [79.0%]  (true,exec)
:;
  D.20597_641 = p_526->timestamp;
  cse_reg_info_timestamp.99_642 = cse_reg_info_timestamp;
  if (D.20597_641 != cse_reg_info_timestamp.99_642) goto ; else
goto ;
  # SUCC: 10 [71.0%]  (true,exec) 11 [29.0%]  (false,exec)

You'll see the assignment to cse_reg_info[regno]->timestamp in block #7.
Then the merge point in block #8 to test the value of REG_IN_TABLE.  We
then check the timestamp again in block #9.  Block #7 is not a 
direct predecessor of block #9, so jump threading isn't going to help.


Jeff

Re: RFC -- CSE compile-time stupidity

2005-02-21 Thread Kazu Hirata

Hi Jeff,

> The second thought is to initialize all of the cse_reg_info entries at the
> beginning of cse_main.  Set aside a bitmap with as many bits as
> max_regs.  Whenever we use one of these accessor macros for register
> k, set a bit k saying "cse_reg_info_table[k] is in use."  This way,
> when we are done with a basic block, we can walk the bitmap and
> reinitialize those that are used.  Again, a good optimizer should be
> able to eliminate most of these bit sets, but a non-pure/const
> function call will block the cleanup opportunities.  Of course, this
> bitmap walk is far more expensive than cse_reg_info_timestamp++.

I just tried this.  It comes very close to the current timestamp
approach but loses by 0.3% or so in compile time.  I'm attaching a
patch for completeness.

Kazu Hirata

Index: cse.c
===
RCS file: /cvs/gcc/gcc/gcc/cse.c,v
retrieving revision 1.343
diff -c -d -p -r1.343 cse.c
*** cse.c   21 Feb 2005 02:02:19 -  1.343
--- cse.c   21 Feb 2005 19:41:04 -
*** struct cse_reg_info
*** 328,346 
  /* A table of cse_reg_info indexed by register numbers.  */
  struct cse_reg_info *cse_reg_info_table;
  
! /* The size of the above table.  */
! static unsigned int cse_reg_info_table_size;
! 
! /* The index of the first entry that has not been initialized.  */
! static unsigned int cse_reg_info_table_first_uninitialized;
! 
! /* The timestamp at the beginning of the current run of
!cse_basic_block.  We increment this variable at the beginning of
!the current run of cse_basic_block.  The timestamp field of a
!cse_reg_info entry matches the value of this variable if and only
!if the entry has been initialized during the current run of
!cse_basic_block.  */
! static unsigned int cse_reg_info_timestamp;
  
  /* A HARD_REG_SET containing all the hard registers for which there is
 currently a REG expression in the hash table.  Note the difference
--- 328,334 
  /* A table of cse_reg_info indexed by register numbers.  */
  struct cse_reg_info *cse_reg_info_table;
  
! static sbitmap cse_reg_info_init_bitmap;
  
  /* A HARD_REG_SET containing all the hard registers for which there is
 currently a REG expression in the hash table.  Note the difference
*** notreg_cost (rtx x, enum rtx_code outer)
*** 841,908 
  static void
  init_cse_reg_info (unsigned int nregs)
  {
!   /* Do we need to grow the table?  */
!   if (nregs > cse_reg_info_table_size)
! {
!   unsigned int new_size;
! 
!   if (cse_reg_info_table_size < 2048)
!   {
! /* Compute a new size that is a power of 2 and no smaller
!than the large of NREGS and 64.  */
! new_size = (cse_reg_info_table_size
! ? cse_reg_info_table_size : 64);
! 
! while (new_size < nregs)
!   new_size *= 2;
!   }
!   else
!   {
! /* If we need a big table, allocate just enough to hold
!NREGS registers.  */
! new_size = nregs;
!   }
  
!   /* Reallocate the table with NEW_SIZE entries.  */
!   if (cse_reg_info_table)
!   free (cse_reg_info_table);
!   cse_reg_info_table = xmalloc (sizeof (struct cse_reg_info)
!* new_size);
!   cse_reg_info_table_size = new_size;
!   cse_reg_info_table_first_uninitialized = 0;
! }
  
!   /* Do we have all of the first NREGS entries initialized?  */
!   if (cse_reg_info_table_first_uninitialized < nregs)
  {
!   unsigned int old_timestamp = cse_reg_info_timestamp - 1;
!   unsigned int i;
! 
!   /* Put the old timestamp on newly allocated entries so that they
!will all be considered out of date.  We do not touch those
!entries beyond the first NREGS entries to be nice to the
!virtual memory.  */
!   for (i = cse_reg_info_table_first_uninitialized; i < nregs; i++)
!   cse_reg_info_table[i].timestamp = old_timestamp;
! 
!   cse_reg_info_table_first_uninitialized = nregs;
  }
  }
  
! /* Given REGNO, initialize the cse_reg_info entry for REGNO.  */
  
  static void
! get_cse_reg_info_1 (unsigned int regno)
  {
!   /* Set TIMESTAMP field to CSE_REG_INFO_TIMESTAMP so that this
!  entry will be considered to have been initialized.  */
!   cse_reg_info_table[regno].timestamp = cse_reg_info_timestamp;
! 
!   /* Initialize the rest of the entry.  */
!   cse_reg_info_table[regno].reg_tick = 1;
!   cse_reg_info_table[regno].reg_in_table = -1;
!   cse_reg_info_table[regno].subreg_ticked = -1;
!   cse_reg_info_table[regno].reg_qty = -regno - 1;
  }
  
  /* Find a cse_reg_info entry for REGNO.  */
--- 829,857 
  static void
  init_cse_reg_info (unsigned int nregs)
  {
!   unsigned int i;
  
!   cse_reg_info_table = xmalloc (sizeof (struct cse_reg_info) * nregs);
  
!   for (i = 0; i < nregs; i++)
  {
!   cse_reg_info_table[i].reg_tick = 1;
!   cse_reg_info_t

Re: GCC 4.0 Status Report (2005-02-03)

2005-02-21 Thread Matt Rice

sorry for the cross-posting but this proposal will
require some synergy between the three projects

> * Project Title
Frameworks for GNU Toolchain

> * Project Contributors
Matt Rice

> * Dependencies
Framework support in libc/bfd/binutils

> * Delivery Date
gcc portion is done, but blocked by the above.

> * Description
>  What will you be doing?

[GCC]
Enabling the existing target framework code for darwin
for other platforms,
which adds a new header search mechanism for the
preprocessor
and passes options to ld analogous to -L and -l but
with a different library search mechanism

2 new options would be added to gcc, -F and -framework

-F is a combination of -I and -L which requires a
specific directory structure,

for example #include 

a simple comparison of the difference between the
header search mechanism
with -I/adir 
/adir/foo/foo.h

with -F/adir
/adir/foo.framework/Headers/foo.h

-framework is not used by gcc other than to pass it to
the linker, along with -F,
which is also used for searching for library files.

[Binutils]

so when given the options "-F/adir -framework foo"
ld will look for /adir/foo.framework/foo
if "foo" is a symlink it will attempt to get the
realpath() of the file
the symlink should be inside the "foo.framework"
directory or a subdirectory of it.

for example "foo.framework/foo -> Versions/2/foo"

this creates a versioning method so that different
binaries can be linked to different versions
of a framework.

this will create an entry much like DT_NEEDED 
to "foo.framework/Versions/2/foo"

[1]: the ELF docs say:
"If a shared object name has one or more slash (/)
characters anywhere in the name, such as /usr/lib/lib2
or directory/file, the dynamic linker uses that string
directly as the path name. If the name has no slashes,
such as lib1, three facilities specify shared object
path searching."

which means it can't be used to search for a library
linked relatively to a search path.

making a DT_FRAMEWORK_NEEDED entry of
"foo.framework/Versions/2/foo"


[Glibc]

will apply the DT_FRAMEWORK_NEEDED entry to the
framework search path (analogous to LD_LIBRARY_PATH
etc.)
looking for /adir/foo.framework/Versions/2/foo
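
The path rules described above can be sketched in C.  This is only an
illustration, not code from the proposal: the helper names
framework_header_path and framework_library_path are hypothetical, and
the assumption that the header is requested as "foo/foo.h" follows the
Darwin convention rather than anything spelled out in this message.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: map an include of the form "<name>/<header>"
   under a -F directory to <fdir>/<name>.framework/Headers/<header>.
   Returns 0 on success, -1 if the include has no framework component
   or the output buffer is too small.  */
static int
framework_header_path (const char *fdir, const char *include,
                       char *out, size_t outsz)
{
  const char *slash = strchr (include, '/');
  int n;

  if (slash == NULL || slash == include || slash[1] == '\0')
    return -1;

  n = snprintf (out, outsz, "%s/%.*s.framework/Headers/%s",
                fdir, (int) (slash - include), include, slash + 1);
  return (n < 0 || (size_t) n >= outsz) ? -1 : 0;
}

/* Hypothetical helper: the file ld would look for when given
   "-F<fdir> -framework <name>", before any symlink is resolved.  */
static int
framework_library_path (const char *fdir, const char *name,
                        char *out, size_t outsz)
{
  int n = snprintf (out, outsz, "%s/%s.framework/%s", fdir, name, name);

  return (n < 0 || (size_t) n >= outsz) ? -1 : 0;
}
```

With "-F/adir", the first helper turns "foo/foo.h" into
/adir/foo.framework/Headers/foo.h and the second yields
/adir/foo.framework/foo; resolving the symlink to Versions/2/foo is
left out of the sketch.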



more information on frameworks available at

http://developer.apple.com/documentation/MacOSX/Conceptual/BPFrameworks/index.html

>  What parts of the compiler will be modified?

machine-specific code enabling parts of
gcc/config/darwin-c.c for other platforms
adding specs for new ld options


>  What will the benefits of your change be?  (Be specific -- for
>  example, if you will be improving code quality, indicate what kind
>  of code, and, if possible, how great the improvement will be.)

be able to use a relocatable, self-contained directory
which holds both shared library and header files,
with simpler usage than lots of -I/-L or symlinks,
easy cp -R installation and a shared library
versioning mechanism.

>  What risks do you foresee?  (If you say "none", you'll be asked to
>  resubmit...)  What will you be doing to mitigate those risks?

gcc:
 sending options to ld it doesn't understand,
 messing up the existing header search path.






Re: RFC -- CSE compile-time stupidity

2005-02-21 Thread Jeffrey A Law

On Mon, 2005-02-21 at 14:42 -0500, Kazu Hirata wrote:
> Hi Jeff,
> 
> > The second thought is to initialize all of cse_reg_info entries at the
> > beginning of cse_main.  Set aside a bitmap with as many bits as
> > max_regs.  Whenever we use one of these accessor macros for register
> > k, set a bit k saying "cse_reg_info_table[k] is in use."  This way,
> > when we are done with a basic block, we can walk the bitmap and
> > reinitialize those that are used.  Again, a good optimizer should be
> > able to eliminate most of these bit sets, but a non-pure/const
> > function call will block the cleanup opportunities.  Of course, this
> > bitmap walk is far more expensive than cse_reg_info_timestamp++.
> 
> I just tried this.  It comes very close to the current timestamp
> approach but loses by 0.3% or so in compile time.  I'm attaching a
> patch for completeness.
Thanks.  I suspect this won't cut the mustard either -- the SET_BIT
twiddles memory, which, due to lameness in our aliasing code, is
likely going to cause us to miss the CSE opportunities that are
so important here.

Beyond that, I suspect we get killed zeroing the sbitmap every time we
start a new cse block.  I may play with it though.  Who knows, maybe
there'll be something that we can do with it.

jeff
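
For readers following the thread, the timestamp approach that the bitmap
patch is being compared against can be sketched roughly as follows.
This is a toy model under stated assumptions, not the code in cse.c:
the names toy_reg_info, toy_table_init, toy_new_block and
toy_get_reg_info are hypothetical, only two of the real fields are
modelled, and wraparound of the timestamp is ignored.

```c
#include <stdlib.h>

/* Toy stand-in for struct cse_reg_info.  */
struct toy_reg_info
{
  unsigned int timestamp;
  int reg_tick;
  int reg_in_table;
};

static struct toy_reg_info *toy_table;
static unsigned int toy_timestamp = 1;

/* Allocate NREGS entries and stamp them with an out-of-date timestamp
   so that every entry is considered uninitialized.  */
static void
toy_table_init (unsigned int nregs)
{
  unsigned int i;

  free (toy_table);
  toy_table = malloc (sizeof *toy_table * nregs);
  if (toy_table == NULL)
    abort ();
  for (i = 0; i < nregs; i++)
    toy_table[i].timestamp = toy_timestamp - 1;
}

/* Starting a new block invalidates every entry with one increment;
   no walk over the table (or over a bitmap) is needed.  */
static void
toy_new_block (void)
{
  toy_timestamp++;
}

/* Return the entry for REGNO, reinitializing it lazily if it was last
   touched in an earlier block.  */
static struct toy_reg_info *
toy_get_reg_info (unsigned int regno)
{
  struct toy_reg_info *p = &toy_table[regno];

  if (p->timestamp != toy_timestamp)
    {
      p->timestamp = toy_timestamp;
      p->reg_tick = 1;
      p->reg_in_table = -1;
    }
  return p;
}
```

The cost that the thread is worried about is the timestamp comparison on
every access; the bitmap variant moves that cost to the end of each
block instead.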

Re: Will people install gfortran in 4.0?

2005-02-21 Thread Gerald Pfeifer

On Mon, 21 Feb 2005, Tobias Schlüter wrote:
>> To add a concrete example, unlike g77 in earlier versions of GCC, gfortran 
>> is not and will not be part of the standard gcc40 port in FreeBSD.
> Do you have a pointer to where this decision is explained?

That took place in private e-mail, but I happen to be the maintainer
of FreeBSD's lang/gcc40 port, so I should be able to recall the relevant 
points. ;-)

On Mon, 21 Feb 2005, Steve Kargl wrote:
> Why?  I build gfortran routinely on both i386-*-freebsd and 
> amd64-*-freebsd.  gfortran has many Fortran 95 features that g77 will 
> never have.  I can also state that on some of my Fortran 77 codes, 
> gfortran-generated binaries outperform the g77 binaries.

Well, I certainly don't deny that, and in fact the FreeBSD lang/gcc40
port has an option to enable gfortran (just set WITH_FORTRAN=something)
and there is a lang/gfortran port which does just that.

I'm fully aware of your development/testing on FreeBSD, and didn't want
to imply that gfortran won't work on that platform -- just that the
gcc40 port there hasn't enabled it by default.

Gerald
-- 
Gerald Pfeifer (Jerry)   [EMAIL PROTECTED]   http://www.pfeifer.com/gerald/

Re: Will people install gfortran in 4.0?

2005-02-21 Thread Tobias Schlüter

Gerald Pfeifer wrote:
> On Mon, 21 Feb 2005, Tobias Schlüter wrote:
> 
>>>To add a concrete example, unlike g77 in earlier versions of GCC, gfortran 
>>>is not and will not be part of the standard gcc40 port in FreeBSD.
>>
>>Do you have a pointer to where this decision is explained?
> 
> That took place in private e-mail, but I happen to be that maintainer
> of FreeBSD's lang/gcc40 port, so I should be able to recall the relevant 
> points. ;-)

Out of curiosity, what are these points?  Will FreeBSD be shipping g77
instead, or is Fortran simply not important to FreeBSD?

- Tobi

Re: Will people install gfortran in 4.0?

2005-02-21 Thread Steve Kargl

On Tue, Feb 22, 2005 at 12:17:05AM +0100, Gerald Pfeifer wrote:
> On Mon, 21 Feb 2005, Tobias Schlüter wrote:
> >> To add a concrete example, unlike g77 in earlier versions of GCC, gfortran 
> >> is not and will not be part of the standard gcc40 port in FreeBSD.
> > Do you have a pointer to where this decision is explained?
> 
> That took place in private e-mail, but I happen to be that maintainer
> of FreeBSD's lang/gcc40 port, so I should be able to recall the relevant 
> points. ;-)
> 
> On Mon, 21 Feb 2005, Steve Kargl wrote:
> > Why?  I build gfortran routinely of both i386-*-freebsd and 
> > amd64-*-freebsd.  gfortran has many Fortran 95 features that g77 will 
> > never have.  I can also state that on some of my Fortran 77 codes, 
> > gfortran generated binaries out performs the g77 binaries.
> 
> Well, I certainly don't deny that, and in fact the FreeBSD lang/gcc40
> port has an option to enable gfortran (just set WITH_FORTRAN=something)
> and there is a lang/gfortran port which does just that.
> 
> I'm fully aware of your development/testing on FreeBSD, and didn't want
> to imply to gfortran won't work on that platform -- just that the default
> gcc40 port there hasn't enabled it by default.
> 

If there's a WITH_FORTRAN option, then I'm fine with how you
intend to maintain the gcc4.0 port.   One other item to keep
in mind: gfortran is the *only* Fortran 95 compiler available on 
non-i386 FreeBSD.  One can use NAG or Intel's compiler to 
generate an IA32 binary that runs on amd64, but it can't
take advantage of more than 4 GB of memory.

-- 
Steve

Re: Will people install gfortran in 4.0?

2005-02-21 Thread Steve Kargl

On Tue, Feb 22, 2005 at 12:22:37AM +0100, Tobias Schlüter wrote:
> Gerald Pfeifer wrote:
> > On Mon, 21 Feb 2005, Tobias Schlüter wrote:
> > 
> >>>To add a concrete example, unlike g77 in earlier versions of GCC, gfortran 
> >>>is not and will not be part of the standard gcc40 port in FreeBSD.
> >>
> >>Do you have a pointer to where this decision is explained?
> > 
> > That took place in private e-mail, but I happen to be that maintainer
> > of FreeBSD's lang/gcc40 port, so I should be able to recall the relevant 
> > points. ;-)
> 
> Out of curiosity, what are these points?  Will FreeBSD be shipping g77
> instead, or is Fortran simply not important to FreeBSD?
> 

FreeBSD is fairly conservative with updating the system compiler.
It is currently
kargl[219] gcc -v
Using built-in specs.
Configured with: FreeBSD/i386 system compiler
Thread model: posix
gcc version 3.4.2 [FreeBSD] 20040728

I don't expect FreeBSD to move to gcc 4.x for the system 
compiler for several years. 

Also, note FreeBSD is a complete, bundled operating system
(unlike Linux which is a kernel that each system vendor 
bundles with its own userland).

-- 
Steve

Re: RFC -- CSE compile-time stupidity

2005-02-21 Thread Mark Mitchell

Jeffrey A Law wrote:
> Sigh.  I can't wait for this code to become less critical, both in terms
> of runtime and compile time performance.  There are so many things in 
> cse.c that are just plain bad
> 
> cse.c has a fair amount of complexity in its hash table management
> code to avoid spending too much time invalidating/removing items
> in the hash table when a value of an expression changes.
> Specifically we have REG_TICK, REG_IN_TABLE and other fields to track 
> when entries are valid.  It's fairly common to have code which looks
> something like this (from mention_regs):
> 
> /* Get the number of times this register has been updated in this
>    basic block.  */
> #define REG_TICK(N) (get_cse_reg_info (N)->reg_tick)
> /* Get the point at which REG was recorded in the table.  */
> #define REG_IN_TABLE(N) (get_cse_reg_info (N)->reg_in_table)
> /* Get the SUBREG set at the last increment to REG_TICK (-1 if not a
>    SUBREG).  */
> #define SUBREG_TICKED(N) (get_cse_reg_info (N)->subreg_ticked)

I don't see anything wrong with getting rid of these macros, and having 
the callers do:

  info = get_cse_reg_info (N);
  info->reg_tick ... info->reg_in_table ...
--
Mark Mitchell
CodeSourcery, LLC
[EMAIL PROTECTED]
(916) 791-8304
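
The difference between the two styles can be made concrete with a small
self-contained model.  This is a sketch under assumptions, not cse.c
itself: the cached_* names, the lookup counter and the staleness test
are hypothetical stand-ins chosen to show how many lookups each style
performs.

```c
/* Toy stand-in for struct cse_reg_info.  */
struct cached_reg_info
{
  int reg_tick;
  int reg_in_table;
};

static struct cached_reg_info cached_infos[16];
static unsigned int cached_lookup_calls;

/* Stand-in for get_cse_reg_info; counts how often it is called.  */
static struct cached_reg_info *
cached_get_reg_info (unsigned int regno)
{
  cached_lookup_calls++;
  return &cached_infos[regno];
}

#define TOY_REG_TICK(N) (cached_get_reg_info (N)->reg_tick)
#define TOY_REG_IN_TABLE(N) (cached_get_reg_info (N)->reg_in_table)

/* Macro style: every use of a macro repeats the lookup.  */
static int
stale_with_macros (unsigned int regno)
{
  return (TOY_REG_IN_TABLE (regno) >= 0
          && TOY_REG_IN_TABLE (regno) != TOY_REG_TICK (regno));
}

/* Suggested style: look the entry up once, then use plain field
   accesses on the returned pointer.  */
static int
stale_with_pointer (unsigned int regno)
{
  struct cached_reg_info *info = cached_get_reg_info (regno);

  return info->reg_in_table >= 0 && info->reg_in_table != info->reg_tick;
}
```

In this model the macro version performs up to three lookups per test
and the pointer version exactly one, without relying on the optimizer
to merge the calls.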

Re: computation_cost too blind to hard regs

2005-02-21 Thread DJ Delorie


> On Jan 19, 2005, at 22:48, Richard Henderson wrote:
> > On Wed, Jan 19, 2005 at 09:19:58PM -0500, DJ Delorie wrote:
> >> -  int regno = 0;
> >> +  int regno = FIRST_PSEUDO_REGISTER;
> >
> > To be totally safe, you need LAST_VIRTUAL_REGISTER+1.
> 
> Also, this definitely needs a comment.

Ok, I'm back.  How's this?

* tree-ssa-loop-ivopts.c (computation_cost): Start register
numbering at LAST_VIRTUAL_REGISTER+1 to avoid possibly using hard
registers in unsupported ways.
* expmed.c (init_expmed): Likewise.


Index: expmed.c
===
RCS file: /cvs/gcc/gcc/gcc/expmed.c,v
retrieving revision 1.224
diff -p -U3 -r1.224 expmed.c
--- expmed.c 18 Feb 2005 12:06:00 -  1.224
+++ expmed.c 22 Feb 2005 00:35:19 -
@@ -145,7 +145,8 @@ init_expmed (void)
   memset (&all, 0, sizeof all);
 
   PUT_CODE (&all.reg, REG);
-  REGNO (&all.reg) = 1;
+  /* Avoid using hard regs in ways which may be unsupported.  */
+  REGNO (&all.reg) = LAST_VIRTUAL_REGISTER + 1;
 
   PUT_CODE (&all.plus, PLUS);
   XEXP (&all.plus, 0) = &all.reg;
Index: tree-ssa-loop-ivopts.c
===
RCS file: /cvs/gcc/gcc/gcc/tree-ssa-loop-ivopts.c,v
retrieving revision 2.48
diff -p -U3 -r2.48 tree-ssa-loop-ivopts.c
--- tree-ssa-loop-ivopts.c  17 Feb 2005 16:19:45 -  2.48
+++ tree-ssa-loop-ivopts.c  22 Feb 2005 00:35:19 -
@@ -2427,7 +2427,8 @@ computation_cost (tree expr)
   rtx seq, rslt;
   tree type = TREE_TYPE (expr);
   unsigned cost;
-  int regno = 0;
+  /* Avoid using hard regs in ways which may be unsupported.  */
+  int regno = LAST_VIRTUAL_REGISTER + 1;
 
   walk_tree (&expr, prepare_decl_rtl, &regno, NULL);
   start_sequence ();
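
Why LAST_VIRTUAL_REGISTER + 1 rather than FIRST_PSEUDO_REGISTER is the
safe starting point can be shown with a toy model of the register
number space.  The constants below are hypothetical (real values are
target-specific, and the number of virtual registers is an assumption);
the only property relied on is that the virtual registers occupy the
numbers directly after the hard registers.

```c
/* Hypothetical layout for illustration only.  */
enum
{
  TOY_FIRST_PSEUDO_REGISTER = 32,  /* hard registers are 0..31 */
  TOY_FIRST_VIRTUAL_REGISTER = TOY_FIRST_PSEUDO_REGISTER,
  TOY_LAST_VIRTUAL_REGISTER = TOY_FIRST_VIRTUAL_REGISTER + 4
};

enum toy_reg_kind
{
  TOY_HARD,     /* real machine register */
  TOY_VIRTUAL,  /* placeholder such as the virtual frame pointer */
  TOY_PSEUDO    /* ordinary pseudo, safe for cost estimation */
};

/* Classify a register number in the toy layout.  */
static enum toy_reg_kind
toy_classify (int regno)
{
  if (regno < TOY_FIRST_PSEUDO_REGISTER)
    return TOY_HARD;
  if (regno <= TOY_LAST_VIRTUAL_REGISTER)
    return TOY_VIRTUAL;
  return TOY_PSEUDO;
}
```

In this layout the old starting values 0 and 1 are hard registers,
FIRST_PSEUDO_REGISTER itself is a virtual register, and only
LAST_VIRTUAL_REGISTER + 1 is an ordinary pseudo.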

RFC: objc_msgSend efficiency patch

2005-02-21 Thread Dale Johannesen

Simple Objective C programs such as
#include 
void foo(void) {
  Object *o;
  [o++ free];
}
result in calling objc_msgSend indirectly through a pointer, instead
of directly as they did in 3.3.  This seems to happen only at low
optimization levels; still, it's a performance regression.  The reason
is that the gimplifier puts the result of the OBJ_TYPE_REF into a temp
due to the postincrement.
I'm pretty sure this is unnecessary for ObjC, like this:

Index: gimplify.c
===
RCS file: /cvs/gcc/gcc/gcc/gimplify.c,v
retrieving revision 1.1.2.146.2.17
diff -u -d -b -w -p -r1.1.2.146.2.17 gimplify.c
--- gimplify.c  27 Jan 2005 01:04:53 -  1.1.2.146.2.17
+++ gimplify.c  21 Feb 2005 20:34:39 -
@@ -3871,7 +3871,13 @@ gimplify_expr (tree *expr_p, tree *pre_p
     case OBJ_TYPE_REF:
       {
         enum gimplify_status r0, r1;
-        r0 = gimplify_expr (&OBJ_TYPE_REF_OBJECT (*expr_p), pre_p, post_p,
+        /* Postincrements in OBJ_TYPE_REF_OBJECT don't affect the
+           value of the OBJ_TYPE_REF, so force them to be emitted
+           during subexpression evaluation rather than after the
+           OBJ_TYPE_REF. This permits objc_msgSend calls in Objective
+           C to use direct rather than indirect calls when the
+           object expression has a postincrement.  */
+        r0 = gimplify_expr (&OBJ_TYPE_REF_OBJECT (*expr_p), pre_p, NULL,
                             is_gimple_val, fb_rvalue);
         r1 = gimplify_expr (&OBJ_TYPE_REF_EXPR (*expr_p), pre_p, post_p,
                             is_gimple_val, fb_rvalue);

However, I'm not sure whether the comment is accurate for C++.  Can
somebody rule on that?  Thanks.

Re: RFC: objc_msgSend efficiency patch

2005-02-21 Thread Mark Mitchell

Dale Johannesen wrote:
> +        /* Postincrements in OBJ_TYPE_REF_OBJECT don't affect the
> +           value of the OBJ_TYPE_REF, so force them to be emitted
> +           during subexpression evaluation rather than after the
> +           OBJ_TYPE_REF. This permits objc_msgSend calls in Objective
> +           C to use direct rather than indirect calls when the
> +           object expression has a postincrement.  */
> +        r0 = gimplify_expr (&OBJ_TYPE_REF_OBJECT (*expr_p), pre_p, NULL,
>                              is_gimple_val, fb_rvalue);
>          r1 = gimplify_expr (&OBJ_TYPE_REF_EXPR (*expr_p), pre_p, post_p,
>                              is_gimple_val, fb_rvalue);

I think that change is unsafe.  I fixed a bug a while back where the 
gimplifier was unable to insert the postincrement inline; in some 
circumstances, I believe it will abort if you don't pass down post_p.

--
Mark Mitchell
CodeSourcery, LLC
[EMAIL PROTECTED]
(916) 791-8304

Re: RFC: objc_msgSend efficiency patch

2005-02-21 Thread Dale Johannesen

On Feb 21, 2005, at 5:21 PM, Mark Mitchell wrote:
> Dale Johannesen wrote:
>> +        /* Postincrements in OBJ_TYPE_REF_OBJECT don't affect the
>> +           value of the OBJ_TYPE_REF, so force them to be emitted
>> +           during subexpression evaluation rather than after the
>> +           OBJ_TYPE_REF. This permits objc_msgSend calls in Objective
>> +           C to use direct rather than indirect calls when the
>> +           object expression has a postincrement.  */
>> +        r0 = gimplify_expr (&OBJ_TYPE_REF_OBJECT (*expr_p), pre_p, NULL,
>>                              is_gimple_val, fb_rvalue);
>>          r1 = gimplify_expr (&OBJ_TYPE_REF_EXPR (*expr_p), pre_p, post_p,
>>                              is_gimple_val, fb_rvalue);
>
> I think that change is unsafe.  I fixed a bug a while back where the 
> gimplifier was unable to insert the postincrement inline; in some 
> circumstances, I believe it will abort if you don't pass down post_p.

OK, thanks.  I can look into doing this in the ObjC-specific hook,
unless you have a better idea?

Re: RFC: objc_msgSend efficiency patch

2005-02-21 Thread Mark Mitchell

Dale Johannesen wrote:
> OK, thanks.  I can look into doing this in the ObjC-specific hook,
> unless you have a better idea?

I've looked harder.
See PR19148 for details about my patch.  I think the key is that you not 
recursively call gimplify_expr with fb_lvalue.  Since we're calling 
recursively with fb_rvalue that particular case is probably OK.

But, I'm not sure that C++ would never put a postincrement in there in a 
way that would screw things up.  I don't think that performance of 
unoptimized code in Objective-C is a worthwhile benefit in Stage 3 when 
measured against the risk of wrong code in C++.  So, it seems to me that 
this should probably be done as a special case in the Objective-C front 
end, if it makes sense there.  Or, someone who knows more than I should 
state definitively that the problematic case cannot appear in any 
language using OBJ_TYPE_REF, and then modify the OBJ_TYPE_REF 
documentation to indicate that.

--
Mark Mitchell
CodeSourcery, LLC
[EMAIL PROTECTED]
(916) 791-8304

Re: computation_cost too blind to hard regs

2005-02-21 Thread Richard Henderson

On Mon, Feb 21, 2005 at 07:36:33PM -0500, DJ Delorie wrote:
>   * tree-ssa-loop-ivopts.c (computation_cost): Start register
>   numbering at LAST_VIRTUAL_REGISTER+1 to avoid possibly using hard
>   registers in unsupported ways.
>   * expmed.c (init_expmed): Likewise.

Ok.


r~

Re: Will people install gfortran in 4.0?

2005-02-21 Thread Marcin Dalecki

On 2005-02-22, at 00:25, Steve Kargl wrote:
> On Tue, Feb 22, 2005 at 12:22:37AM +0100, Tobias Schlüter wrote:
>> Gerald Pfeifer wrote:
>>> On Mon, 21 Feb 2005, Tobias Schlüter wrote:
>>>>> To add a concrete example, unlike g77 in earlier versions of GCC,
>>>>> gfortran is not and will not be part of the standard gcc40 port
>>>>> in FreeBSD.
>>>> Do you have a pointer to where this decision is explained?
>>> That took place in private e-mail, but I happen to be that maintainer
>>> of FreeBSD's lang/gcc40 port, so I should be able to recall the
>>> relevant points. ;-)
>> Out of curiosity, what are these points?  Will FreeBSD be shipping g77
>> instead, or is Fortran simply not important to FreeBSD?
> FreeBSD is fairly conservative with updating the system compiler.

That is fairly wrong. FreeBSD current has usually been ahead of the Linux
gang in terms of the compiler for about 5 years. They used to be more
conservative, but this changed around the gcc 3.0 time. Nowadays they
are usually ahead of the Linux bunch. I think it would not be too
surprising to see them integrate 4.0 very quickly after it turns out to
be not too buggy.

Re: moving v16sf reg with multiple sub-regs

2005-02-21 Thread James E Wilson

Dylan Cuthbert wrote:
> asm( "some_instruction %0,%1,%2,%3" : "=&j" (blob): "j" (vec1), "j" 
> (vec2), "j" (frog) );
> asm( "some_instruction2 %0,%1" : "=&j" (frog) : "j" (blob) );

It is the goal of the register allocator to use as few registers as
possible, which means that we will try to use the same register for
input and output here.  Until we get to reload, where we see the early
clobber (&), and then are forced to add a copy so that the instruction
has separate input and output registers.

Early clobbers are bad.  Don't ever use them unless you have to.  Just
because the instruction operates on pieces of the input does not mean &
is necessary.  You only add the & if the input and output operands must
be in different non-overlapping registers.

This is just a guess.  Try compiling with -da and looking at the register
assignments in the .lreg and .greg files, and also at what reload did.
It is possible that there could be something else going on.
-- 
Jim Wilson, GNU Tools Support, http://www.SpecifixInc.com
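
As a rough illustration of when "&" is and is not needed, here is a
sketch using ordinary x86-64 integer instructions rather than the
vector instructions from the original question.  The function names are
hypothetical, and the plain C fallback is there only so the example
builds on other targets.

```c
/* Returns a + 2*b.  The template is three instructions long and
   writes the output in the first one, while input b is still needed
   by the later ones.  The output therefore must not share a register
   with an input, which is exactly the case "&" exists for.  */
static long
add_twice (long a, long b)
{
#if defined(__GNUC__) && defined(__x86_64__)
  long out;

  __asm__ ("mov %1, %0\n\t"
           "add %2, %0\n\t"
           "add %2, %0"
           : "=&r" (out)
           : "r" (a), "r" (b)
           : "cc");
  return out;
#else
  return a + 2 * b;
#endif
}

/* Returns a + b.  A single instruction reads its inputs before it
   writes its result, so no early clobber is needed and the register
   allocator stays free to reuse registers.  */
static long
add_once (long a, long b)
{
#if defined(__GNUC__) && defined(__x86_64__)
  long out = a;

  __asm__ ("add %1, %0" : "+r" (out) : "r" (b) : "cc");
  return out;
#else
  return a + b;
#endif
}
```

Dropping the "&" from add_twice would allow the compiler to place out
and b in the same register, in which case the first mov would destroy b
before it is added.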

Re: Will people install gfortran in 4.0?

2005-02-21 Thread Steve Kargl

On Tue, Feb 22, 2005 at 02:58:42AM +0100, Marcin Dalecki wrote:
> On 2005-02-22, at 00:25, Steve Kargl wrote:
> >
> >FreeBSD is fairly conservative with updating the system compiler.
> 
> That is fairly wrong. Usually FreeBSD current is ahead of the Linux
> gang in terms of the compiler since about 5 years. They used to be more
> conservative but this changed since about gcc 3.0 time. Nowadays they 
> are usually ahead of the Linux bunch. I think it would not be too surprising
> to see them integrate 4.0 very quickly after it turns out to be not too 
> buggy.

As one of the few people who argued with David O'Brien on 
the inclusion of g77 into FreeBSD when g77 finally was
included in a gcc distribution, I can assure you that FreeBSD 
will be conservative in switching to gcc 4.x.  There is
simply too much internals churn for the people who maintain
gcc within FreeBSD to jump on gcc 4.x. 

-- 
Steve

Re: Propagation of "pointer" attribute on registers

2005-02-21 Thread James E Wilson

Pat Haugen wrote:
> Is there a reason REG_POINTER isn't propagated to the target register for
> rtl insns of the form "reg_x = regP_y + reg_z", where regP_y is a reg
> marked as REG_POINTER?

Probably historical oversight.
However, note that the hppa-hp-hpux port relies on the REG_POINTER 
settings being conservatively correct.  This is because it uses 
segmented addresses, and in a reg+reg address, it requires knowing which 
one is the pointer when emitting assembly language.  If REG_POINTER is 
set for things that aren't actually pointer values, then the 
hppa-hp-hpux port can break.  I'd think it would be safe to add the 
reg=reg+reg case here, but it should be checked.

The aliasing code could perhaps be using the newer REG_ATTRS field 
instead of REG_POINTER.  REG_EXPR didn't exist when alias.c was written, 
and may help here.  Or maybe not.  The REG_EXPR case in 
reg_scan_mark_refs is even more restrictive than the REG_POINTER code; 
it doesn't handle REG+CONST.
--
Jim Wilson, GNU Tools Support, http://www.SpecifixInc.com

Re: gcc #pragma pack(push[,n]) target spesific ?

2005-02-21 Thread James E Wilson

Matthew J Fletcher wrote:
> Is this correct?  Comments in the appropriate config.h files indicate that 
> HANDLE_PRAGMA_PACK_PUSH_POP is mostly used for wine compatibility, but is 
> there any reason why HANDLE_PRAGMA_PACK_PUSH_POP is not present on all 
> targets?  Also, there is not a configure option.

The pragma documentation states that we discouraged use of pragmas in 
the past, because of language design problems fixed in the ISO C99 
standard.  Because of this, pragmas are only enabled for targets that 
require them, for compatibility with other compilers.

As you mentioned yourself, HANDLE_PRAGMA_PACK_PUSH_POP is for win32 
compatibility, and the *-elf targets don't require win32 compatibility, 
generally.  The only targets this is enabled for are ones that have 
required use of it.

It might be reasonable to change the policy, and enable it for more 
targets, or perhaps just for the embedded elf ones.  It would also be 
reasonable to improve the docs.  I'd suggest filing a bug report into 
bugzilla if you haven't already.
--
Jim Wilson, GNU Tools Support, http://www.SpecifixInc.com
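
For reference, the construct under discussion looks like this.  The
sketch assumes a compiler and target on which #pragma pack(push, n) and
#pragma pack(pop) are enabled, which, as the thread notes, was not the
case for every GCC target at the time; the struct names are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Default layout: the compiler may insert padding after tag.  */
struct natural_layout
{
  uint8_t tag;
  uint32_t value;
};

/* Save the current packing, then pack to 1-byte boundaries.  */
#pragma pack(push, 1)
struct packed_layout
{
  uint8_t tag;
  uint32_t value;
};
/* Restore whatever packing was in effect before the push.  */
#pragma pack(pop)

/* Declared after the pop, so it gets the default layout again.  */
struct after_pop_layout
{
  uint8_t tag;
  uint32_t value;
};
```

The push/pop pair is what lets a header change packing for its own
structures without disturbing the packing of whatever includes it.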

Re: TYPE_MIN_VALUE and pointer types

2005-02-21 Thread James E Wilson

Richard Guenther wrote:
> this is because I'm feeding int_fits_type_p a (constant) pointer type:

The comment at the top of int_fits_type_p says it is supposed to be an 
integer type, which seems clear enough.

This question might make more sense if you explain what you are doing, 
and why you need to check whether a constant fits in a pointer type.
--
Jim Wilson, GNU Tools Support, http://www.SpecifixInc.com

interworking problem

2005-02-21 Thread aram bharathi

hi,
   Thanks for identifying the error.
   I went through the a.out file after this error. It is full of ARM
instructions, apart from the main function (which is compiled with
-mthumb). Where should I make changes to fix this error?
   I have added one new add instruction in Thumb mode, and written a sample
program that makes gcc emit the new instruction. gcc works fine, and when I
assemble the function it emits the correct opcode. But when I give the same
function to ld, the output is full of ARM instructions instead of Thumb
instructions. Which routine is responsible for this?
Thanks in advance.
- Original Message -
From: "Ian Lance Taylor" 
To: "aram bharathi" <[EMAIL PROTECTED]>
Subject: Re: adding new instruction
Date: 12 Feb 2005 21:06:07 -0500

> 
> "aram bharathi" <[EMAIL PROTECTED]> writes:
> 
> >   i like to add a new instruction based on thumb ISA. i have 
> > added the instruction in both as and gcc. both of them are 
> > working correctly. but when i call ld it shows an error like
> >
> > /home/.../arm-elf-ld : 
> > /home/../arm-elf/lib/libc.a(printf.o)(printf): warning : 
> > interworking not enabled
> > first occurance : /tmp/cc00zhyh.o : thumb call to arm
> > /tmp//cc00zhyh.o(.text+0x4e>: In function 'main'
> > new.c:internal error: dangerous error
> >
> > whether i have to change anything in the ld. i have searched for 
> > the ld source file but i counldnt get one in the ld folder. which 
> > file has to modified first and what kind of changes are needed.
> 
> The source code for that error is in the bfd directory.  In general,
> if you want to link ARM and Thumb code together, you should compile
> all your code with the -mthumb-interwork option.  See the
> documentation.
> 
> Ian


Re: __LDBL_MAX__ exceeds range of 'long double'

2005-02-21 Thread James E Wilson

Jonathan Wakely wrote:
> $ gcc4x -c bug.c -pedantic -save-temps
> bug.c: In function 'main':
> bug.c:1: error: floating constant exceeds range of 'long double'

This is easy enough to explain.  Grepping for the message shows that it 
is printed from c-lex.c, and is only printed when -pedantic. 
-save-temps means we run cc1 twice, once to produce the .i file, and 
once to read the .i file.  So the problem is either writing the .i file 
or reading the .i file.

Looking further, I noticed that FreeBSD sets the rounding mode to 
53-bits.  real.c knows this, and computes values appropriately, i.e. 
96-bit values with 53-bits of fraction.  Unfortunately, this value isn't 
getting propagated properly to the C preprocessor when using 
-save-temps/-E, and this problem exists because 
init_adjust_machine_modes is run in the wrong place.  It is run from 
backend_init, but it needs to be run always, as we need it to compute 
the FP predefined macros correctly.

The value really is wrong, even though it is the same as gcc-3.4.  It 
has 11 too many bits of fraction.

Targets using paired-double long double formats are probably also wrong. 
Anything that overrides the normal IEEE FP formats will be wrong.

Attached is a mostly untested patch to fix this.  It works for your 
testcase.  And it now prints a different correct value of __LDBL_MAX__ 
in the .i file.
--
Jim Wilson, GNU Tools Support, http://www.SpecifixInc.com

2005-02-21  James E Wilson  <[EMAIL PROTECTED]>

* toplev.c (backend_init): Don't call init_adjust_machine_modes here.
(do_compile): Do call it here.

Index: toplev.c
===
RCS file: /cvs/gcc/gcc/gcc/toplev.c,v
retrieving revision 1.942
diff -p -p -r1.942 toplev.c
*** toplev.c 12 Feb 2005 00:26:52 -  1.942
--- toplev.c 22 Feb 2005 04:58:04 -
*** process_options (void)
*** 1954,1961 
  static void
  backend_init (void)
  {
-   init_adjust_machine_modes ();
- 
init_emit_once (debug_info_level == DINFO_LEVEL_NORMAL
  || debug_info_level == DINFO_LEVEL_VERBOSE
  #ifdef VMS_DEBUGGING_INFO
--- 1954,1959 
*** do_compile (void)
*** 2092,2097 
--- 2090,2100 
/* Don't do any more if an error has already occurred.  */
if (!errorcount)
  {
+   /* This must be run always, because it is needed to compute the FP
+predefined macros, such as __LDBL_MAX__, for targets using non
+default FP formats.  */
+   init_adjust_machine_modes ();
+ 
/* Set up the back-end if requested.  */
if (!no_backend)
backend_init ();

Re: How does g++ implement error handling?

2005-02-21 Thread Aaron W. LaFramboise

Jonathan Wakely wrote:

> On Mon, Feb 21, 2005 at 12:32:52PM +0800, Euler Herbert wrote:

>>be somewhere, but I did no find it. Could somebody tell me the 
>>outline of the unwinding process? 

> I think this behaviour is defined by this spec:
> http://www.codesourcery.com/cxx-abi/abi-eh.html

What's awful about this is that the interesting portion, "Level III",
is basically empty.

I'm also interested in any overview-level information about the Dwarf2
unwinding mechanism.

Aaron W. LaFramboise
