Re: CSE & compare/branch template problem

2009-12-23 Thread Paolo Bonzini

On 12/21/2009 08:10 PM, Richard Henderson wrote:

(define_insn_and_split "*cmp"
   [(set (match_operand:SI 0 "register_operand" "=r")
 (lt:SI (match_operand:SI 1 "register_operand" "r")
(match_operand:SI 2 "register_operand" "r")))]
   ""
   "cmp %0,%1,%2\;andi $0,$0,1"
   ""
   [(set (match_dup 0)
 (unspec:SI [(match_dup 1) (match_dup 2)] UNSPEC_CMP))
(set (match_dup 0) (and:SI (match_dup 0) (const_int 1)))]
   "")


It's actually the MSB that is affected, and the entire register is set 
to zero if a == b.  Basically cmp/cmpu prepare rD so that a signed 
compare-with-zero-and-branch will do the requested conditional branch.
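
A minimal C sketch of the contract described above, under loudly stated assumptions: model_cmp and model_cmpu are hypothetical stand-ins for the cmp/cmpu instructions, and they reproduce only the two properties mentioned here (the sign of rD carries the ordering, and rD is zero when a == b). The remaining result bits of the real instructions are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of "cmp rD,a,b": only the sign and zero-ness of the
   result are meaningful here; the real instruction's other bits are ignored. */
static int32_t model_cmp(int32_t a, int32_t b)
{
    return (a < b) ? -1 : (a > b) ? 1 : 0;
}

/* Hypothetical model of "cmpu rD,a,b": same contract, unsigned ordering. */
static int32_t model_cmpu(uint32_t a, uint32_t b)
{
    return (a < b) ? -1 : (a > b) ? 1 : 0;
}

/* A signed "a < b" branch: signed test of the cmp result against zero. */
static bool branch_lt_taken(int32_t a, int32_t b)
{
    return model_cmp(a, b) < 0;
}

/* An unsigned "a < b" branch: the same signed test, but on the cmpu result. */
static bool branch_ltu_taken(uint32_t a, uint32_t b)
{
    return model_cmpu(a, b) < 0;
}

/* Small usage check: the branch itself never needs an unsigned test. */
void check_cmp_model(void)
{
    assert(branch_ltu_taken(1u, 0xFFFFFFFFu));  /* 1 <u 0xFFFFFFFF */
    assert(!branch_lt_taken(1, -1));            /* 1 < -1 is false */
    assert(model_cmp(5, 5) == 0);               /* equal operands give zero */
}
```

The point of the sketch is only that signedness is decided by which compare is emitted, so the branch pattern can always be a signed comparison with zero.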


So, branches are easy, but cstores are tricky.  Something like this 
should work; indeed you do not need any CC mode:


;; cbranch expander, possibly use cmp/cmpu to make operand 0 into a
;; signed comparison with zero
(define_expand "cbranchsi4"
[(set (pc)
  (if_then_else
(match_operator 0 "ordered_comparison_operator"
 [(match_operand:SI 1 "register_operand" "")
  (match_operand:SI 2 "register_operand_or_0" "")])
(label_ref (match_operand 3 ""))
(pc)))]
"enum rtx_code scode =
   signed_condition (GET_CODE (operands[0]));
 if (operands[2] != const0_rtx || scode != GET_CODE (operands[0]))
   {
 rtx reg = gen_reg_rtx (SImode);
 if (scode != GET_CODE (operands[0]))
   emit_insn (gen_cmpusi (reg, operands[1], operands[2]));
 else
   emit_insn (gen_cmpsi (reg, operands[1], operands[2]));

 operands[1] = reg;
 operands[2] = const0_rtx;
 operands[0] = gen_rtx_fmt_ee (scode, SImode, reg, const0_rtx);
   }")

;; branch instructions do a signed comparison with 0 (needs
;; a predicate signed_comparison_operator), you could also
;; write a pattern for indirect conditional branches
(define_insn "*branch"
[(set (pc)
  (if_then_else
(match_operator 0 "signed_comparison_operator"
 [(match_operand:SI 1 "register_operand" "")
  (const_int 0)])
(label_ref (match_operand 2 ""))
(pc)))]
""
"b%0i %1,%2"
"")

;; unspecs for cmp/cmpu
(define_insn "cmpsi"
[(set (match_operand:SI 0 "register_operand" "=r")
  (unspec
[(match_operand:SI 1 "register_operand" "r")
 (match_operand:SI 2 "register_operand" "r")] UNSPEC_CMP))]
""
"cmp %0,%1,%2"
"")

(define_insn "cmpusi"
[(set (match_operand:SI 0 "register_operand" "=r")
  (unspec
[(match_operand:SI 1 "register_operand" "r")
 (match_operand:SI 2 "register_operand" "r")] UNSPEC_CMPU))]
""
"cmpu %0,%1,%2"
"")

;; these are used for cstore tricks when the old contents of rD are
;; significant
(define_insn "*cmpsi4"
[(set (match_operand:SI 0 "register_operand" "+r")
  (unspec
[(match_dup 0)
 (match_operand:SI 1 "register_operand" "r")
 (match_operand:SI 2 "register_operand" "r")] UNSPEC_CMP))]
""
"cmp %0,%1,%2"
"")

(define_insn "*cmpusi4"
[(set (match_operand:SI 0 "register_operand" "+r")
  (unspec
[(match_dup 0)
 (match_operand:SI 1 "register_operand" "r")
 (match_operand:SI 2 "register_operand" "r")] UNSPEC_CMPU))]
""
"cmpu %0,%1,%2"
"")

;; some cstore patterns: cstoresi4 should canonicalize lt/ltu to gt/gtu,
;; as should CANONICALIZE_COMPARISON.
;;
;; common code takes care of ge/geu/le/leu as long as the rtx_costs say
;; it's profitable.  Same for a != b for nonzero b.
;;
;;   ...
;;   if (GET_CODE (operands[1]) == LT || GET_CODE (operands[1]) == LTU)
;; {
;;   operands[1] =
;; gen_rtx_fmt_ee (swap_condition (GET_CODE (operands[1])),
;; SImode, operands[3], operands[2]);
;;   operands[2] = XEXP (operands[1], 0);
;;   operands[3] = XEXP (operands[1], 1);
;; }
;;   else if (GET_CODE (operands[1]) == NE && operands[3] != const0_rtx)
;; FAIL;
;;

;; preset rD to 1 to implement a == b
(define_insn_and_split "eqsi3"
[(set (match_operand:SI 0 "register_operand" "=&r")
(eq:SI (match_operand:SI 1 "register_operand" "r")
   (match_operand:SI 2 "register_operand" "r")))]
""
""
""
[(set (match_dup 0) (const_int 1))
 (set (match_dup 0) (unspec:SI [(match_dup 0)
(match_dup 1) (match_dup 2)]
   UNSPEC_CMP))
 (set (match_dup 0) (and:SI (match_dup 0) (const_int 1)))]
"")

;; use a GTU 0 to implement a != 0.  but cmpu does not accept immediates
(define_insn_and_split "nesi3"
[(set (match_operand:SI 0 "register_operand" "=&r")
(ne:SI (match_operand:SI 1 "register_operand" "r")
   (const_int 0)))]
""
""
""
[(set (match_dup 0) (const_int 0))
 (set (match_dup 0) (unspec

Re: Question on PR36873

2009-12-23 Thread Jie Zhang

On 12/23/2009 02:43 PM, Jie Zhang wrote:

Hi,

We just got a similar problem on Blackfin GCC recently. Let me take the
test code from the bug as an example:


I reduced the test case to a simpler one:

$ cat foo.c
unsigned int
foo (volatile unsigned short *p)
{
  return *p;
}

In the tree dump "foo.c.126t.optimized", GCC refuses to eliminate D.1256 
because the first statement contains a volatile operand:


  D.1256 ={v} *p;
  return (unsigned int) D.1256;

I'm not familiar with the trees. Is it possible to replace D.1256 and 
have something like below?


  return (unsigned int) {v} *p;

I experimented a little. It seems {v} is lost when SSA names are replaced 
during the out-of-SSA transform. Can anyone tell me whether it's possible to 
do the replacement but still keep {v}? Or should I find another way to do 
that? Or is it wrong to do this optimization?
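
For reference, the two forms discussed above can be written at the source level as follows. This is only a sketch: fake_register is a hypothetical stand-in for whatever *p points at, and the code shows that both forms perform exactly one volatile read and return the same value; it says nothing about how the tree passes handle {v}.

```c
#include <assert.h>

/* Hypothetical stand-in for a device register; not part of the original test. */
static volatile unsigned short fake_register = 0xBEEF;

/* Mirrors "D.1256 ={v} *p; return (unsigned int) D.1256;": one volatile load
   into a temporary, then a widening conversion of the non-volatile copy. */
static unsigned int read_with_temp(volatile unsigned short *p)
{
    unsigned short tmp = *p;
    return (unsigned int) tmp;
}

/* Mirrors "return (unsigned int) {v} *p;": same single load, no named temporary. */
static unsigned int read_direct(volatile unsigned short *p)
{
    return (unsigned int) *p;
}

/* Both forms must observe the same value from the same single access. */
void check_volatile_reads(void)
{
    assert(read_with_temp(&fake_register) == 0xBEEFu);
    assert(read_direct(&fake_register) == read_with_temp(&fake_register));
}
```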


Thanks,

Jie


Re: Which optimizer should remove redundant subreg of sign_extension?

2009-12-23 Thread Paolo Bonzini

On 12/22/2009 07:24 PM, Jeff Law wrote:

On 12/22/09 11:16, Andrew Hutchinson wrote:

I came across this RTL on AVR in combine dump (part of va-arg-9.c test)

(set (reg:QI 25 r25 [+1 ])
(subreg:QI (sign_extend:HI (reg:QI 49)) 1))

The sign extension is completely redundant - the upper part of
register is not used elsewhere
- but the RTL remains unchanged through all the optimizers and
sign_extension appears in final code.

Which RTL optimisation should be taking care of this? Propagation?
It would help me look in the right place to understand and perhaps fix
the issue.

I suspect the presence of hard register is why it does not get
removed. (the hard register is the function return value)

I'd look at combine, though I think it's more concerned with determining
that an extension is redundant because the bits already have the proper
value rather than the bits not being used later. It might be the case
that you can extend what's already in combine to do what you want.


I think that if you add the simplification to simplify-rtx.c's 
simplify_subreg, combine should pick it up automagically.


Paolo


Re: Question on PR36873

2009-12-23 Thread Dave Korn
Jie Zhang wrote:

> typedef unsigned short u16;
> typedef unsigned int u32;
> 
> u32 a(volatile u16* off) {
> return *off;
> }

> mingw32-gcc-4.3.0.exe -c -O2 -fomit-frame-pointer -mtune=core2 test.c
> 
> it produces:
>  <_a>:
>    0:   8b 44 24 04             mov    0x4(%esp),%eax
>    4:   0f b7 00                movzwl (%eax),%eax
>    7:   0f b7 c0                movzwl %ax,%eax  <== The redundant insn
>    a:   c3                      ret

  How does it look at the RTL level?  I wonder if this situation is similar to
the one being discussed in the other current thread "Which optimizer should
remove redundant subreg of sign_extension?"
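
Why the second movzwl in the quoted disassembly is redundant can be shown with a small C model. The two helper functions below are hypothetical models of the two instructions, and sample_halfword is made-up test data; the sketch only demonstrates that zero-extending an already zero-extended value changes nothing.

```c
#include <assert.h>
#include <stdint.h>

static volatile uint16_t sample_halfword = 0xABCD;  /* hypothetical test data */

/* Models "movzwl (%eax),%eax": load 16 bits and zero-extend them to 32 bits. */
static uint32_t load_zero_extend16(const volatile uint16_t *p)
{
    return (uint32_t) *p;
}

/* Models "movzwl %ax,%eax": zero-extend the low 16 bits of a 32-bit register. */
static uint32_t zero_extend_low16(uint32_t reg)
{
    return reg & 0xFFFFu;
}

/* The second extension is a no-op on the result of the first. */
void check_second_extension_is_noop(void)
{
    uint32_t once = load_zero_extend16(&sample_halfword);
    assert(zero_extend_low16(once) == once);
}
```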

cheers,
  DaveK



Re: How to implement pattens with more that 30 alternatives

2009-12-23 Thread Richard Earnshaw

On Wed, 2009-12-23 at 10:11 +0530, Mohamed Shafi wrote:
> 2009/12/22 Richard Earnshaw :
> >
> > On Mon, 2009-12-21 at 18:44 +, Paul Brook wrote:
> >> > > > I am doing a port in GCC 4.4.0 for a 32 bit target. As a part of
> >> > > > scheduling framework i have to write the move patterns with more
> >> > > > clarity, so that i could control the scheduling with the help of
> >> > > > attributes. Rewriting the pattern resulted in movsi pattern with 41
> >> > > > alternatives :(
> >> > >
> >> > > Use rtl expressions instead of alternatives. e.g. arm.md:arith_shiftsi
> >> >
> >> > Or use the more modern iterators approach.
> >>
> >> Aren't iterators for generating multiple insns (e.g. movsi and movdi) from the
> >> same pattern, whereas in this case we have a single insn that needs to accept
> >> many different operand combinations?
> >
> > Yes, but that is often better, I suspect, than having too fancy a
> > pattern that breaks the optimization simplifications that genrecog does.
> >
> > Note that the attributes that were requested could be made part of the
> > iterator as well, using a mode_attribute.
> >
>   I can't find a back-end that does this. Can you show me a example?

I think the mips port is currently the most comprehensive example for
use of iterators.

R.



Re: Question on PR36873

2009-12-23 Thread Jie Zhang

On 12/23/2009 06:12 PM, Dave Korn wrote:

Jie Zhang wrote:


typedef unsigned short u16;
typedef unsigned int u32;

u32 a(volatile u16* off) {
 return *off;
}



mingw32-gcc-4.3.0.exe -c -O2 -fomit-frame-pointer -mtune=core2 test.c

it produces:
<_a>:
   0:   8b 44 24 04             mov    0x4(%esp),%eax
   4:   0f b7 00                movzwl (%eax),%eax
   7:   0f b7 c0                movzwl %ax,%eax  <== The redundant insn
   a:   c3                      ret


   How does it look at the RTL level?  I wonder if this situation is similar to
the one being discussed in the other current thread "Which optimizer should
remove redundant subreg of sign_extension?"


With my native GCC on Debian AMD64 unstable, in t.c.128r.expand:

(insn 6 5 7 3 t.c:5 (set (reg:HI 58 [ D.1595 ])
(mem/v:HI (reg/v/f:DI 60 [ off ]) [2 S2 A16])) -1 (nil))

(insn 7 6 8 3 t.c:5 (set (reg:SI 61)
(zero_extend:SI (reg:HI 58 [ D.1595 ]))) -1 (nil))

In t.c.201r.shorten:

(insn:TI 6 3 7 t.c:5 (set (reg:HI 0 ax [orig:58 D.1595 ] [58])
(mem/v:HI (reg/v/f:DI 5 di [orig:60 off ] [60]) [2 S2 A16])) 53 
{*movhi_1} (expr_list:REG_DEAD (reg/v/f:DI 5 di [orig:60 off ] [60])

(nil)))

(insn:TI 7 6 18 t.c:5 (set (reg:SI 0 ax [orig:61 D.1595 ] [61])
(zero_extend:SI (reg:HI 0 ax [orig:58 D.1595 ] [58]))) 114 
{*zero_extendhisi2_movzwl} (nil))


There is a volatile flag on the mem operand. If there were no such flag, I 
think one of the RTL passes might combine them. It looks similar to the 
issue in the thread you mentioned, but the cause is different.



Regards,
Jie


Unnecessary PRE optimization

2009-12-23 Thread Bingfeng Mei
Hello,
I encountered an issue with the PRE optimization, which created worse
code than no optimization.

This is the test function: 

void foo(int *data, int *m_v4w, int num)
{
  int i;
  int m0;
  for( i=0; i

Re: Unnecessary PRE optimization

2009-12-23 Thread Steven Bosscher
On Wed, Dec 23, 2009 at 12:49 PM, Bingfeng Mei  wrote:
> Hello,
> I encounter an issue with PRE optimization, which created worse

Is this at -O2 or -O3?

Ciao!
Steven


RE: Unnecessary PRE optimization

2009-12-23 Thread Bingfeng Mei
-O2 

> -Original Message-
> From: Steven Bosscher [mailto:stevenb@gmail.com] 
> Sent: 23 December 2009 12:01
> To: Bingfeng Mei
> Cc: gcc@gcc.gnu.org; dber...@dberlin.org
> Subject: Re: Unnecessary PRE optimization
> 
> On Wed, Dec 23, 2009 at 12:49 PM, Bingfeng Mei 
>  wrote:
> > Hello,
> > I encounter an issue with PRE optimization, which created worse
> 
> Is this at -O2 or -O3?
> 
> Ciao!
> Steven
> 
> 


How should I prototype cpp_define in target patch?

2009-12-23 Thread Andrew Hutchinson

I want to post patch for

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42457

The code moved out to -c.c file ALREADY uses:

builtin_define_std
cpp_define

Both from c-cppbuiltin.c. These have no prototypes defined in gcc.

So of course there are warnings emitted.

Is this OK?
Should I locally define prototypes?
Something else?

Andy



Re: Unnecessary PRE optimization

2009-12-23 Thread Paolo Bonzini

On 12/23/2009 01:01 PM, Steven Bosscher wrote:

On Wed, Dec 23, 2009 at 12:49 PM, Bingfeng Mei  wrote:

Hello,
I encounter an issue with PRE optimization, which created worse


Is this at -O2 or -O3?


I think this could be fixed if fwprop propagated addresses into loops; 
it doesn't because it made performance worse on x86.  The real reason is 
"address_cost on x86 sucks and nobody knows how to fix it exactly", but 
the performance hit was bad enough that we (Steven Bosscher and I) 
decided to put that hack into fwprop.


Paolo


Re: Unnecessary PRE optimization

2009-12-23 Thread Joern Rennecke

Quoting Paolo Bonzini :


On 12/23/2009 01:01 PM, Steven Bosscher wrote:

On Wed, Dec 23, 2009 at 12:49 PM, Bingfeng Mei  wrote:

Hello,
I encounter an issue with PRE optimization, which created worse


Is this at -O2 or -O3?


I think this could be fixed if fwprop propagated addresses into loops;
it doesn't because it made performance worse on x86.  The real reason
is "address_cost on x86 sucks and nobody knows how to fix it exactly",


And nobody has bothered to put in a bug report into bugzilla to describe
what they see malfunctioning... or if they did, it doesn't mention
address_cost.


but the performance hit was bad enough that we (Steven Bosscher and I)
decided to put that hack into fwprop.


So if this is only useful for a limited set of targets, why isn't it
controlled by an option or a target hook so that it is only turned on
on the targets where it is deemed to make sense overall?


Re: Unnecessary PRE optimization

2009-12-23 Thread Paolo Bonzini

On 12/23/2009 03:05 PM, Joern Rennecke wrote:


So if this is only useful for a limited set of targets, why isn't it
controlled by an option or a target hook so that it is only turned on
on the targets where it is deemed to make sense overall?


Well, this optimization is basically the opposite of loop-invariant 
motion, so there is some merit in not doing it: you could also tweak 
loop-invariant motion to not hoist unnecessarily (and increase register 
pressure unnecessarily) instead of relying on fwprop undoing it.  Indeed 
in trunk loop-invariant motion is not hoisting cheap addresses anymore. 
 Bingfeng is seeing the problem because in his case PRE is doing the 
hoisting rather than LIM.
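
As an illustration of the trade-off (not Bingfeng's test case, which is not reproduced here), the two loop shapes look roughly like this in C. FIELD_OFFSET and sample are hypothetical; which form is cheaper depends on the target's addressing modes and register pressure, and that is exactly what the address cost is supposed to express.

```c
#include <assert.h>
#include <stddef.h>

enum { FIELD_OFFSET = 4 };  /* hypothetical constant displacement */

static const int sample[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };  /* hypothetical data */

/* Hoisted form: data + FIELD_OFFSET is computed once before the loop and
   stays live in a register for the whole loop. */
static long sum_hoisted(const int *data, size_t n)
{
    const int *base = data + FIELD_OFFSET;
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += base[i];
    return sum;
}

/* Propagated form: the constant offset is folded back into each access,
   which a base+index+displacement addressing mode can absorb. */
static long sum_propagated(const int *data, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[i + FIELD_OFFSET];
    return sum;
}

/* Both forms compute the same result; only the generated code differs. */
void check_both_forms_agree(void)
{
    assert(sum_hoisted(sample, 4) == 26);
    assert(sum_propagated(sample, 4) == sum_hoisted(sample, 4));
}
```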


Paolo


RE: Unnecessary PRE optimization

2009-12-23 Thread Bingfeng Mei
Do you mean that if TARGET_ADDRESS_COST (non-x86) is defined properly, 
this should be fixed?  Or does it require an extra patch?

Bingfeng

> -Original Message-
> From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On 
> Behalf Of Paolo Bonzini
> Sent: 23 December 2009 13:28
> To: Steven Bosscher
> Cc: Bingfeng Mei; gcc@gcc.gnu.org; dber...@dberlin.org
> Subject: Re: Unnecessary PRE optimization
> 
> On 12/23/2009 01:01 PM, Steven Bosscher wrote:
> > On Wed, Dec 23, 2009 at 12:49 PM, Bingfeng 
> Mei  wrote:
> >> Hello,
> >> I encounter an issue with PRE optimization, which created worse
> >
> > Is this at -O2 or -O3?
> 
> I think this could be fixed if fwprop propagated addresses 
> into loops; 
> it doesn't because it made performance worse on x86.  The 
> real reason is 
> "address_cost on x86 sucks and nobody knows how to fix it 
> exactly", but 
> the performance hit was bad enough that we (Steven Bosscher and I) 
> decided to put that hack into fwprop.
> 
> Paolo
> 
> 


Re: Preserving order of variable declarations

2009-12-23 Thread Diego Novillo
On Tue, Dec 22, 2009 at 11:47, Aravinda  wrote:

> Is this the only way it can be done? Or can I have
> (non-temporary) variables inserted whose order can always be preserved?

Not really, the compiler will reorder variables in the stack in almost
arbitrary ways.  Wrapping them in structs is the safest approach.
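
A small sketch of the struct approach, assuming nothing beyond standard C: struct ordered_vars and its members are hypothetical names. The language guarantees that members are laid out at increasing addresses in declaration order, which is the property separate stack variables do not have.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical group of variables whose relative order must be kept. */
struct ordered_vars {
    int  first;
    char second[16];
    int  third;
};

/* C lays out struct members at increasing addresses in declaration order,
   so these offsets are ordered; separate locals carry no such guarantee. */
static int order_is_preserved(void)
{
    return offsetof(struct ordered_vars, first) < offsetof(struct ordered_vars, second)
        && offsetof(struct ordered_vars, second) < offsetof(struct ordered_vars, third);
}

void check_struct_order(void)
{
    assert(order_is_preserved());
}
```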


Diego.


Re: Unnecessary PRE optimization

2009-12-23 Thread Paolo Bonzini

On 12/23/2009 03:27 PM, Bingfeng Mei wrote:

Do you mean that if TARGET_ADDRESS_COST (non-x86) is defined properly,
this should be fixed?  Or does it require an extra patch?


No, if TARGET_ADDRESS_COST was fixed for x86 (and of course defined 
properly for your target), we could fix this very easily.


Paolo


target hooks / plugins

2009-12-23 Thread Joern Rennecke

Target hooks would often be interesting for plugins to modify.  And
some proposed new plugin callbacks would also be interesting to have as
target hooks.  Therefore, I would like target hooks to become writeable
by plugins, and make it much easier to add new target hooks in the GCC
sources.

Right now, to make a new target hook, you have to add a new field in
target.h, define a new default in target-def.h, place the new macro
in exactly the right position there of the right initializer macro,
describe the new hook in tm.texi, and if you need a new function with
a bunch of parameters returning a constant, you have to add this to
hooks.c .

I would like to be able to do all this by adding a single entry in a
new definition file; and the information should also be usable by
plugin sources so that they can automatically make wrappers for all
function-type hooks.
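
One common way to get this "single entry" property in C is the X-macro technique, sketched below. Everything here is hypothetical: the list macro, the three hook names, and the struct are for illustration only, and a real implementation would presumably keep the list in its own definition file and #include it several times rather than use a list macro.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical single list of hooks: one entry per hook, nothing else to edit. */
#define DEMO_HOOK_LIST(DEFHOOK)                       \
    DEFHOOK(open_paren,  const char *, "(")           \
    DEFHOOK(close_paren, const char *, ")")           \
    DEFHOOK(byte_op,     const char *, "\t.byte\t")

/* Expansion 1: the struct members. */
#define AS_FIELD(NAME, TYPE, INIT) TYPE NAME;
struct demo_hooks { DEMO_HOOK_LIST(AS_FIELD) };

/* Expansion 2: the default initializer, automatically in matching order. */
#define AS_INIT(NAME, TYPE, INIT) INIT,
static const struct demo_hooks demo_defaults = { DEMO_HOOK_LIST(AS_INIT) };

/* Expansion 3: a name table, usable for documentation or plugin wrappers. */
#define AS_NAME(NAME, TYPE, INIT) #NAME,
static const char *const demo_hook_names[] = { DEMO_HOOK_LIST(AS_NAME) };

void check_hook_list(void)
{
    assert(strcmp(demo_defaults.open_paren, "(") == 0);
    assert(strcmp(demo_defaults.close_paren, ")") == 0);
    assert(strcmp(demo_hook_names[2], "byte_op") == 0);
    assert(sizeof demo_hook_names / sizeof demo_hook_names[0] == 3);
}
```

Because the field order and the initializer order come from the same list, the "exactly the right position" requirement disappears by construction.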

Most of the ICI unroll_parameter_handler / graphite_parameter_handler
callbacks can then be made into target hooks.


RE: Unnecessary PRE optimization

2009-12-23 Thread Bingfeng Mei
It seems that just commenting out this check in fwprop.c should work.
 
 /* Do not propagate loop invariant definitions inside the loop.  */
/*  if (DF_REF_BB (def)->loop_father != DF_REF_BB (use)->loop_father)
return;*/

Bingfeng

> -Original Message-
> From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On 
> Behalf Of Paolo Bonzini
> Sent: 23 December 2009 15:01
> To: Bingfeng Mei
> Cc: Steven Bosscher; gcc@gcc.gnu.org; dber...@dberlin.org
> Subject: Re: Unnecessary PRE optimization
> 
> On 12/23/2009 03:27 PM, Bingfeng Mei wrote:
> > Do you mean that if TARGET_ADDRESS_COST (non-x86) is defined properly,
> > this should be fixed?  Or does it require an extra patch?
> 
> No, if TARGET_ADDRESS_COST was fixed for x86 (and of course defined 
> properly for your target), we could fix this very easily.
> 
> Paolo
> 
> 


Re: How should I prototype cpp_define in target patch?

2009-12-23 Thread Joseph S. Myers
On Wed, 23 Dec 2009, Andrew Hutchinson wrote:

> builtin_define_std
> cpp_define
> 
> Both from c-cppbuiltin.c. These have no prototypes defined in gcc.

They do have prototypes, in c-common.h and cpplib.h.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Unnecessary PRE optimization

2009-12-23 Thread Ramana Radhakrishnan

On Wed, 2009-12-23 at 16:00 +0100, Paolo Bonzini wrote:
> On 12/23/2009 03:27 PM, Bingfeng Mei wrote:
> > Do you mean that if TARGET_ADDRESS_COST (non-x86) is defined properly,
> > this should be fixed?  Or does it require an extra patch?
> 
> No, if TARGET_ADDRESS_COST was fixed for x86 (and of course defined 
> properly for your target), we could fix this very easily.

This problem appears to affect the ARM port as well - I can see this
being useful for the ARM port and might force us to look at
TARGET_ADDRESS_COST a little more carefully - so if you're happy to post
the fwprop patch I'm happy to test for performance on the ARM and look
at ADDRESS_COST carefully on the ARM.

cheers
Ramana
> 
> Paolo



Re: Unnecessary PRE optimization

2009-12-23 Thread Paolo Bonzini

On 12/23/2009 04:19 PM, Bingfeng Mei wrote:

It seems that just commenting out this check in fwprop.c should work.


Yes, but it would pessimize x86.

Paolo


Re: How should I prototype cpp_define in target patch?

2009-12-23 Thread Andrew Hutchinson

Doh!


Joseph S. Myers wrote:

On Wed, 23 Dec 2009, Andrew Hutchinson wrote:

  

builtin_define_std
cpp_define

Both from c-cppbuiltin.c. These have no prototypes defined in gcc.



They do have prototypes, in c-common.h and cpplib.h.

  


Re: Unnecessary PRE optimization

2009-12-23 Thread H.J. Lu
On Wed, Dec 23, 2009 at 8:41 AM, Paolo Bonzini  wrote:
> On 12/23/2009 04:19 PM, Bingfeng Mei wrote:
>>
>> It seems that just commenting out this check in fwprop.c should work.
>
> Yes, but it would pessimize x86.
>

Is there a bug open for x86? Can't we make it target dependent, something
like

 /* Do not propagate loop invariant definitions inside the loop.  */
 if (targetm.foobar
&& DF_REF_BB (def)->loop_father != DF_REF_BB (use)->loop_father)
   return;
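
A self-contained sketch of the same gating idea, using made-up types so that it compiles outside GCC: struct demo_target, its flag, and struct demo_ref are all hypothetical and stand in for targetm.foobar and the DF references above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are real GCC types or hooks. */
struct demo_target { bool keep_invariants_out_of_loops; };
struct demo_ref    { int loop_id; };

static const struct demo_target x86_like   = { true };
static const struct demo_target other_like = { false };
static const struct demo_ref outside_loop  = { 0 };
static const struct demo_ref inside_loop   = { 1 };

/* Returns true when propagation should be skipped: only when the target asks
   for it and the definition and the use live in different loops. */
static bool skip_propagation(const struct demo_target *t,
                             const struct demo_ref *def,
                             const struct demo_ref *use)
{
    return t->keep_invariants_out_of_loops && def->loop_id != use->loop_id;
}

void check_target_gate(void)
{
    assert(skip_propagation(&x86_like, &outside_loop, &inside_loop));
    assert(!skip_propagation(&other_like, &outside_loop, &inside_loop));
    assert(!skip_propagation(&x86_like, &inside_loop, &inside_loop));
}
```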


-- 
H.J.


Re: Unnecessary PRE optimization

2009-12-23 Thread Paolo Bonzini

On 12/23/2009 06:47 PM, H.J. Lu wrote:

On Wed, Dec 23, 2009 at 8:41 AM, Paolo Bonzini  wrote:

On 12/23/2009 04:19 PM, Bingfeng Mei wrote:


It seems that just commenting out this check in fwprop.c should work.


Yes, but it would pessimize x86.



Is there a bug open for x86? Can't we make it target dependent, something
like

  /* Do not propagate loop invariant definitions inside the loop.  */
  if (targetm.foobar
 &&  DF_REF_BB (def)->loop_father != DF_REF_BB (use)->loop_father)
return;


I'll open a bug.  The solution is to actually understand what the 
address costs are on x86 (apparently it's not true that the more complex 
addressing modes are always better, probably because of instruction 
sizes), not to add a target macro.


Paolo


Re: Unnecessary PRE optimization

2009-12-23 Thread H.J. Lu
On Wed, Dec 23, 2009 at 10:06 AM, Paolo Bonzini  wrote:
> On 12/23/2009 06:47 PM, H.J. Lu wrote:
>>
>> On Wed, Dec 23, 2009 at 8:41 AM, Paolo Bonzini  wrote:
>>>
>>> On 12/23/2009 04:19 PM, Bingfeng Mei wrote:

 It seems that just commenting out this check in fwprop.c should work.
>>>
>>> Yes, but it would pessimize x86.
>>>
>>
>> Is there a bug open for x86? Can't we make it target dependent, something
>> like
>>
>>  /* Do not propagate loop invariant definitions inside the loop.  */
>>  if (targetm.foobar
>>     &&  DF_REF_BB (def)->loop_father != DF_REF_BB (use)->loop_father)
>>    return;
>
> I'll open a bug.  The solution is to actually understand what the address
> costs are on x86 (apparently it's not true that the more complex addressing
> modes are always better, probably because of instruction sizes), not to add
> a target macro.
>

I will ask around to see if there are any guidelines for this
after the bug is open.

Thanks.


-- 
H.J.


Re: Unnecessary PRE optimization

2009-12-23 Thread Xinliang David Li
A similar situation happens in non-loop contexts as well. PRE commoned
the address computation without knowing about the existence of advanced
addressing modes, which results in an unnecessary address computation
instruction.  The forward substitution code makes local heuristic decisions
and looks at each use individually -- it does not know if the propagation
will happen for all uses and therefore expose a DCE opportunity -- so a
precise cost estimation is not available. Even so, for such cases, a
simple change of 'gain > 0' into 'gain >= 0' in
should_replace_address_p can do the job.
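
To make the effect of that one-character change concrete, here is a sketch with plain integer costs. These two functions are hypothetical and are not the real should_replace_address_p; they only isolate the comparison being discussed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical version of the current test: replace only on a strict gain. */
static bool replace_if_strictly_cheaper(int old_cost, int new_cost)
{
    int gain = old_cost - new_cost;
    return gain > 0;
}

/* Hypothetical version of the proposed test: also replace on a tie, so the
   now-unused address computation can be removed by DCE. */
static bool replace_if_not_more_expensive(int old_cost, int new_cost)
{
    int gain = old_cost - new_cost;
    return gain >= 0;
}

void check_gain_tests(void)
{
    assert(!replace_if_strictly_cheaper(3, 3));    /* tie: not replaced today */
    assert(replace_if_not_more_expensive(3, 3));   /* tie: replaced with >=  */
    assert(!replace_if_not_more_expensive(3, 4));  /* worse: never replaced  */
}
```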

For the LIM case discussed in this thread, it is trickier to estimate the
cost of forward substitution without knowing the register pressure --
forward prop MAY increase the live range of the propagated value
(RHS), even though in this case it does not, and it actually shrinks
the LR of the LHS temps, thus reducing register pressure overall.  I
submitted a live range shrink (LRS) patch a while back, but it
was not accepted.  This address computation propagation can be easily
implemented in the LRS pass with precise knowledge of the change in
register pressure.

In general it will be tricky for later passes to clean up the mess.
The fundamental problem is that the address computation is exposed to
PRE prematurely (for a given target) at the GIMPLE level. In this case,
if the INDIRECT_REFs were expressed as MEM_REFs, such a problem might be
avoided.  A similar issue (for ARM) is reported in bug 40956.

Thanks,

David



On Wed, Dec 23, 2009 at 10:06 AM, Paolo Bonzini  wrote:
>
> On 12/23/2009 06:47 PM, H.J. Lu wrote:
>>
>> On Wed, Dec 23, 2009 at 8:41 AM, Paolo Bonzini  wrote:
>>>
>>> On 12/23/2009 04:19 PM, Bingfeng Mei wrote:

 It seems that just commenting out this check in fwprop.c should work.
>>>
>>> Yes, but it would pessimize x86.
>>>
>>
>> Is there a bug open for x86? Can't we make it target dependent, something
>> like
>>
>>  /* Do not propagate loop invariant definitions inside the loop.  */
>>  if (targetm.foobar
>>     &&  DF_REF_BB (def)->loop_father != DF_REF_BB (use)->loop_father)
>>    return;
>
> I'll open a bug.  The solution is to actually understand what the address 
> costs are on x86 (apparently it's not true that the more complex addressing 
> modes are always better, probably because of instruction sizes), not to add a 
> target macro.
>
> Paolo


Re: Which optimizer should remove redundant subreg of sign_extension?

2009-12-23 Thread Andrew Hutchinson


Paolo Bonzini wrote:


I think that if you add the simplification to simplify-rtx.c's 
simplify_subreg, combine should pick it up automagically.


Paolo



There we have it! Apparently this optimization is already performed - so 
I will have to dig further into why it does not happen.


simplify_subreg()

/* If we're requesting the lowpart of a zero or sign extension,
there are three possibilities.  If the outermode is the same
as the origmode, we can omit both the extension and the subreg.
If the outermode is not larger than the origmode, we can apply
the truncation without the extension.  Finally, if the outermode
is larger than the origmode, but both are integer modes, we
can just extend to the appropriate mode.  */
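
The three cases in that comment can be checked with ordinary C integer conversions. This is a sketch under stated assumptions: 8, 16 and 32 bit C types stand in for QImode, HImode and SImode, a cast to an unsigned type stands in for taking the low part, and the function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Case 1: outermode == origmode (8 bits).  The low 8 bits of the extended
   value are the original value, so extension and subreg both vanish. */
static int case_same_mode(int8_t x)
{
    return (uint8_t)(int32_t) x == (uint8_t) x;
}

/* Case 2: outermode (8 bits) narrower than origmode (16 bits).  Truncating
   the original gives the same bits as truncating the extended value. */
static int case_narrower(int16_t x)
{
    return (uint8_t)(int32_t) x == (uint8_t) x;
}

/* Case 3: outermode (16 bits) wider than origmode (8 bits).  The low part of
   the 32-bit extension equals a direct extension to 16 bits. */
static int case_wider(int8_t x)
{
    return (uint16_t)(int32_t) x == (uint16_t)(int16_t) x;
}

void check_lowpart_cases(void)
{
    assert(case_same_mode(-3));
    assert(case_narrower(-300));
    assert(case_wider(-3));
}
```

Note that all three cases concern the low part; the AVR insn under discussion takes byte 1 of the extension, which these cases do not cover.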




tree check: expected SSA_NAME, have var_decl

2009-12-23 Thread Aravinda
Hi,

After the tree-loop pass, I need to perform some analysis on the loop
and insert a "gimple function call statement" before the loop body.
The function call looks like,

foo(a, 1, 1); for a loop that looks like,

for (i = 0; i < 20; i ++) {
a[i]++;
}

When compiled with -O2, after the tree loop init, the loops have SSA
names for variables. Based on the analysis I do on the loop, I get a
TREE_NODE for the variable 'a', and two integer_one_node s.
I have trouble constructing the gimple_call_statement and adding it
before the loop_body. I am running into 'expected ssa_name have
var_decl' error. I am not sure I can use make_ssa_name for the
TREE_NODE of variable 'a' since I am yet to build a gimple_stmt that
will contain this variable.

How could I insert a call statement after a loop analysis ?

Thanks,
Aravinda


Approval as AVR maintainer

2009-12-23 Thread Andrew Hutchinson

How does one get to be maintainer of a port?

Specifically AVR port - so that I do not need to get approval to commit 
changes. The time it takes now is rather longer than getting approval on 
other parts of GCC.


The process does not seem to be written down anywhere - but I am sure 
someone will correct me if I am wrong.


Seasonal Greetings

Andy



RE: Approval as AVR maintainer

2009-12-23 Thread Weddington, Eric
 

> -Original Message-
> From: Andrew Hutchinson [mailto:andrewhutchin...@cox.net] 
> Sent: Wednesday, December 23, 2009 3:25 PM
> To: Denis Chertykov; Anatoly Sokolov; GCC Development; 
> Weddington, Eric
> Subject: Approval as AVR maintainer
> 
> How does one get to be maintainer of port?
> 
> Specifically AVR port - so that I do not need to get approval 
> to commit 
> changes. The time it takes now is rather longer than getting 
> approval on 
> other parts of GCC.

AFAIK, you ask the current maintainers of the AVR port, Denis and Anatoly, 
which you have just done.

I haven't asked yet, but I'm also interested.


Re: target hooks / plugins

2009-12-23 Thread Joern Rennecke

Quoting Joern Rennecke :

Right now, to make a new target hook, you have to add a new field in
target.h, define a new default in target-def.h, place the new macro
in exactly the right position there of the right initializer macro,
describe the new hook in tm.texi, and if you need a new function with
a bunch of parameters returning a constant, you have to add this to
hooks.c .

I would like to be able to do all this by adding a single entry in a
new definition file; and the information should also be usable by
plugin sources so that they can automatically make wrappers for all
function-type hooks.


I've attached what I have so far.
There is an issue that the struct gcc_target member names don't always agree
with the TARGET_* macro names.

Should I rather change one or the other to make them agree, or add an extra
parameter in the definition file macros to specify these names independently?
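
The second option (an extra name parameter) can be combined with derivation as the default, so that only the mismatching hooks need to spell out a name. The following C sketch illustrates that policy; struct hook_desc, hook_macro_name and the override string are all hypothetical and not part of the attached patch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical descriptor: member name plus an optional explicit macro name. */
struct hook_desc {
    const char *member;          /* struct member name, e.g. "byte_op" */
    const char *macro_override;  /* NULL when the macro name is derivable */
};

/* Writes the macro name into buf: the override if present, otherwise
   prefix + upper-cased member name.  Output is truncated to fit size. */
static void hook_macro_name(const struct hook_desc *h, const char *prefix,
                            char *buf, size_t size)
{
    assert(size > 0);
    if (h->macro_override != NULL) {
        snprintf(buf, size, "%s", h->macro_override);
        return;
    }
    snprintf(buf, size, "%s", prefix);
    size_t n = strlen(buf);
    for (const char *p = h->member; *p != '\0' && n + 1 < size; p++)
        buf[n++] = (char) toupper((unsigned char) *p);
    buf[n] = '\0';
}

void check_hook_macro_names(void)
{
    char buf[64];
    struct hook_desc derived = { "byte_op", NULL };
    struct hook_desc explicit_name = { "integer", "TARGET_DEMO_EXPLICIT_NAME" };

    hook_macro_name(&derived, "TARGET_ASM_", buf, sizeof buf);
    assert(strcmp(buf, "TARGET_ASM_BYTE_OP") == 0);

    hook_macro_name(&explicit_name, "TARGET_ASM_", buf, sizeof buf);
    assert(strcmp(buf, "TARGET_DEMO_EXPLICIT_NAME") == 0);
}
```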


Index: target.def
===
--- target.def  (revision 0)
+++ target.def  (revision 0)
@@ -0,0 +1,263 @@
+/* Target hook definitions.
+   Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009
+   Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; see the file COPYING3.  If not see
+   .
+
+   In other words, you are welcome to use, share and improve this program.
+   You are forbidden to forbid anyone else to use, share and improve
+   what you give them.   Help stamp out software-hoarding!  */
+
+/* The following macros should be provided by the including file:
+
+   DEFHOOK(NAME, DOC, TYPE, PARAMS, INIT): Define a function-valued hook.
+   DEFHOOKPOD(NAME, DOC, TYPE, INIT): Define a piece-of-data 'hook'.  */
+
+/* Defaults for optional macros:
+   DEFHOOKPODX(NAME, TYPE, INIT): Like DEFHOOKPOD, but share documentation
+   with the previous 'hook'.  */
+#ifndef DEFHOOKPODX
+#define DEFHOOKPODX(NAME, TYPE, INIT) DEFHOOKPOD (NAME, 0, TYPE, INIT)
+#endif
+   
+/* HOOKSTRUCT(FRAGMENT): Declarator fragments to encapsulate all the
+   members into a struct gcc_target, which in turn contains several
+   sub-structs.  */
+#ifndef HOOKSTRUCT
+#define HOOKSTRUCT(FRAGMENT)
+#endif
+
+HOOKSTRUCT (struct gcc_target {)
+
+/* Functions that output assembler for the target.  */
+#define HOOK_PREFIX "TARGET_ASM_"
+HOOKSTRUCT (struct asm_out {)
+
+DEFHOOKPOD
+(open_paren,
+"These target hooks are C string constants, describing the syntax in the\
+ assembler for grouping arithmetic expressions.  If not overridden, they\
+ default to normal parentheses, which is correct for most assemblers.",
+ const char *, "(")
+DEFHOOKPODX (close_paren, const char *, ")")
+
+DEFHOOKPOD
+(byte_op,
+"@deftypevrx {Target Hook} {const char *} TARGET_ASM_ALIGNED_HI_OP\n"
+"@deftypevrx {Target Hook} {const char *} TARGET_ASM_ALIGNED_SI_OP\n"
+"@deftypevrx {Target Hook} {const char *} TARGET_ASM_ALIGNED_DI_OP\n"
+"@deftypevrx {Target Hook} {const char *} TARGET_ASM_ALIGNED_TI_OP\n"
+"@deftypevrx {Target Hook} {const char *} TARGET_ASM_UNALIGNED_HI_OP\n"
+"@deftypevrx {Target Hook} {const char *} TARGET_ASM_UNALIGNED_SI_OP\n"
+"@deftypevrx {Target Hook} {const char *} TARGET_ASM_UNALIGNED_DI_OP\n"
+"@deftypevrx {Target Hook} {const char *} TARGET_ASM_UNALIGNED_TI_OP\n"
+"These hooks specify assembly directives for creating certain kinds\
+ of integer object.  The @code{TARGET_ASM_BYTE_OP} directive creates a\
+ byte-sized object, the @code{TARGET_ASM_ALIGNED_HI_OP} one creates an\
+ aligned two-byte object, and so on.  Any of the hooks may be\
+ @code{NULL}, indicating that no suitable directive is available.\n\n"
+
+"The compiler will print these strings at the start of a new line,\
+ followed immediately by the object's initial value.  In most cases,\
+ the string should contain a tab, a pseudo-op, and then another tab.",
+ const char *, "\t.byte\t")
+DEFHOOKPOD
+(aligned_op, "*", struct asm_int_op,
+ ({ TARGET_ASM_ALIGNED_HI_OP, TARGET_ASM_ALIGNED_SI_OP,
+TARGET_ASM_ALIGNED_DI_OP, TARGET_ASM_ALIGNED_TI_OP }) )
+DEFHOOKPOD
+(unaligned_op, "*", struct asm_int_op,
+ ({ TARGET_ASM_UNALIGNED_HI_OP, TARGET_ASM_UNALIGNED_SI_OP,
+TARGET_ASM_UNALIGNED_DI_OP, TARGET_ASM_UNALIGNED_TI_OP }) )
+/* Assembler instructions for creating various kinds of integer object.  */
+
+DEFHOOK
+(integer,
+"The @code{assemble_integer} function uses this hook to output an\
+ integer object.  @var{x} is the ob