Reloading an auto-increment addresses

2011-02-11 Thread Mohamed Shafi
Hello all,

I am porting GCC 4.5.1 to a private target. For one particular test, the
reload pass is being asked to reload the following instruction:

(insn 45 175 46 11 pr20601-1.c:90 (set (reg/f:PQI 3 g3 [70])
(mem/f:PQI (pre_inc:PQI (reg/f:PQI 1 g1 [orig:55 prephitmp.16
] [55])) [2 S1 A32])) 9 {movpqi_op} (expr_list:REG_INC (reg/f:PQI 1 g1
[orig:55 prephitmp.16 ] [55])
(nil)))

The address in this instruction is invalid: the base address must always
be held in an address register. The instruction gets reloaded in the
following manner:

(insn 175 43 202 11 pr20601-1.c:90 (set (reg/f:PQI 1 g1 [orig:55
prephitmp.16 ] [55])
(reg/f:PQI 12 as0 [orig:49 e.4 ] [49])) 9 {movpqi_op} (nil))

(insn 202 175 203 11 pr20601-1.c:90 (set (reg/f:PQI 1 g1 [orig:55
prephitmp.16 ] [55])
(plus:PQI (reg/f:PQI 1 g1 [orig:55 prephitmp.16 ] [55])
(const_int 1 [0x1]))) 14 {addpqi3} (nil))

(insn 203 202 45 11 pr20601-1.c:90 (set (reg:PQI 28 a0)
(reg/f:PQI 1 g1 [orig:55 prephitmp.16 ] [55])) 9 {movpqi_op} (nil))

(insn 45 203 46 11 pr20601-1.c:90 (set (reg/f:PQI 3 g3 [70])
(mem/f:PQI (reg:PQI 28 a0) [2 S1 A32])) 9 {movpqi_op} (nil))

The issue with this reload is that there is no move operation between
GP registers and address registers, so insn 203 is invalid. I am
catching these kinds of moves with secondary reloads, but auto-increment
addressing modes are not handled there. If I try to handle them in
TARGET_SECONDARY_RELOAD, I get an assert failure from
reload1.c:emit_input_reload_insns() due to the following code:

  /* Auto-increment addresses must be reloaded in a special way.  */
  if (rl->out && ! rl->out_reg)
    {
      /* We are not going to bother supporting the case where a
         incremented register can't be copied directly from
         OLDEQUIV since this seems highly unlikely.  */
      gcc_assert (rl->secondary_in_reload < 0);

How can I overcome this failure?  Can someone suggest a solution?

Thanks for the help.

Regards,
Shafi


Re: Reloading an auto-increment addresses

2011-02-11 Thread Paulo J. Matos



On 11/02/11 09:46, Mohamed Shafi wrote:
> How can i overcome this failure?  Can some one suggest a solution?

Have you defined TARGET_LEGITIMATE_ADDRESS_P and also BASE_REG_CLASS
correctly for your target?


Re: Reloading an auto-increment addresses

2011-02-11 Thread Mohamed Shafi
On 11 February 2011 15:28, Paulo J. Matos  wrote:
>
>
> On 11/02/11 09:46, Mohamed Shafi wrote:
>>
>> How can i overcome this failure?  Can some one suggest a solution?
>>
>
>
> Have you defined TARGET_LEGITIMATE_ADDRESS_P and also BASE_REG_CLASS
> correctly for your target?
>
>

Yes, I have. The register allocator is allocating the wrong registers for
the base registers. This is probably because address registers cannot be
saved and restored directly; a secondary reload is required. There is
also the restriction that there is no move operation between the address
registers; that also requires a secondary reload. (I know it's weird.)
I am trying to figure out why the register allocator is not assigning a
base register. But even then, reload could be asked to reload an
auto-increment address.

Shafi


Pruning for torture tests?

2011-02-11 Thread IainS

Hi,
On Darwin, we have a number of gcc.c-torture fails reported for both  
ppc and i386 which are bogus (nothing to do with gcc - but simply  
warning output from a system tool).


For dg-based tests these are pruned - I wonder if it would be worth  
adding a prune capability to the torture suites?


opinions?
Iain



Volatile memory is not general operand

2011-02-11 Thread Paulo J. Matos

Hi,

I just noticed something very surprising. There's a clause in 
general_operand (recog.c):


  if (! volatile_ok && MEM_VOLATILE_P (op))
    return 0;

Oh... so, a MEM_VOLATILE_P is _not_ a general operand? Why? This is also 
not referred to in the documentation of general operand so it kind of 
caught me by surprise.


Before gimplification I have: INT_PRIORITIESD.10391[iD.10456] ={v} 0;

This is converted to a couple of insns that combine tries to combine into:
(set (mem/s/v:QImode ...) (const_int 0))

I have a
(set (match_operand:QImode "nonimmediate_operand" "")
 (match_operand:QImode "general_operand" ""))

However, the insn doesn't match because nonimmediate_operand returns false
for (mem/s/v ...).


Why is this? And what can I do to match this set? I guess I can only 
create another pattern. I am just amazed that a volatile memory is not a 
general_operand.


Cheers,

PMatos



Re: Volatile memory is not general operand

2011-02-11 Thread Eric Botcazou
> I just noticed something very surprising. There's a clause in
> general_operand (recog.c):
>
>  if (! volatile_ok && MEM_VOLATILE_P (op))
>  return 0;
>
> Oh... so, a MEM_VOLATILE_P is _not_ a general operand? Why? This is also
> not referred to in the documentation of general operand so it kind of
> caught me by surprise.

It's more the other way around: a volatile MEM is a general operand unless
volatile operands have been explicitly disabled via init_recog_no_volatile.
Some passes, like combine, don't track the volatileness of operands
precisely, so they disable their manipulation altogether to avoid
generating wrong code.

-- 
Eric Botcazou
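
The interaction between the quoted check and init_recog_no_volatile can be
sketched with a small stand-alone model. This is not GCC source: struct
operand and every model_* name are hypothetical stand-ins, and only the
volatile check of general_operand is modelled. The assumption is that a
pass such as combine clears the flag while it runs and that the flag is
set again afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an RTL operand; not GCC's rtx type.  */
struct operand {
  bool is_mem;       /* operand is a memory reference */
  bool is_volatile;  /* models MEM_VOLATILE_P */
};

/* Models the recog.c flag tested in the quoted snippet.  */
static bool volatile_ok = true;

/* Model of what a pass like combine requests: reject volatile MEMs.  */
static void model_init_recog_no_volatile(void) { volatile_ok = false; }

/* Model of re-enabling volatile MEMs for later passes (assumption).  */
static void model_init_recog(void) { volatile_ok = true; }

/* Only the volatile check is modelled; everything else is accepted.  */
static bool model_general_operand(const struct operand *op)
{
  assert(op != NULL);
  if (op->is_mem && !volatile_ok && op->is_volatile)
    return false;
  return true;
}
```

With the flag set, a volatile memory operand is accepted; after
model_init_recog_no_volatile () the same operand is rejected, which is the
situation a (mem/s/v ...) destination runs into while combine is matching.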


Scheduling automaton question

2011-02-11 Thread Bernd Schmidt
Suppose I have two insns, one reserving (A|B|C), and the other reserving
A. I'm observing that when the first one is scheduled in an otherwise
empty state, it reserves the A unit and blocks the second one from being
scheduled in the same cycle. This is a problem when there's an
anti-dependence of cost 0 between the two instructions.

Vlad - two questions. Is this behaviour what you would expect to happen,
and how much work do you think would be involved to fix it (i.e. make
the first one transition to a state where we can still reserve any two
out of the three units)?


Bernd


Re: Volatile memory is not general operand

2011-02-11 Thread Paulo J. Matos

On 11/02/11 12:03, Eric Botcazou wrote:
>>   if (! volatile_ok && MEM_VOLATILE_P (op))
>>     return 0;
>
> It's more of the other way around: MEM_VOLATILE_P is a general operand unless
> explicitly requested via init_recog_no_volatile.  Some passes, like combine,
> don't track the volatileness of operands precisely, so they disable their
> manipulation altogether to avoid generating wrong code.

But the piece of code I quoted above is in general_operand. A (mem/v ...)
will never be a general_operand, will it?


Without that, the insn

(set (mem/s/v:QImode ...) (const_int 0))

is never matched by

(set (match_operand:QImode "nonimmediate_operand" "")
 (match_operand:QImode "general_operand" ""))



Re: Scheduling automaton question

2011-02-11 Thread Alexander Monakov


On Fri, 11 Feb 2011, Bernd Schmidt wrote:

> Suppose I have two insns, one reserving (A|B|C), and the other reserving
> A. I'm observing that when the first one is scheduled in an otherwise
> empty state, it reserves the A unit and blocks the second one from being
> scheduled in the same cycle. This is a problem when there's an
> anti-dependence of cost 0 between the two instructions.
> 
> Vlad - two questions. Is this behaviour what you would expect to happen,
> and how much work do you think would be involved to fix it (i.e. make
> the first one transition to a state where we can still reserve any two
> out of the three units)?

Could you please clarify a bit: would the modified behavior match what your
target CPU does?  The current behavior matches CPUs without lookahead in
instruction dispatch: the first insn goes to the first matching execution
unit (A), the second has to wait.

Alexander


Re: Scheduling automaton question

2011-02-11 Thread Bernd Schmidt
On 02/11/2011 02:13 PM, Alexander Monakov wrote:
> Could you please clarify a bit: would the modified behavior match what your
> target CPU does?  The current behavior matches CPUs without lookahead in
> instruction dispatch: the first insn goes to the first matching execution
> unit (A), the second has to wait.

The CPU I'm working on needs to specify explicitly which unit an insn is
using, but to generate optimal code that assignment must be made _after_
scheduling all the insns in a given cycle.


Bernd



Re: Volatile memory is not general operand

2011-02-11 Thread Michael Matz
Hi,

On Fri, 11 Feb 2011, Paulo J. Matos wrote:

> On 11/02/11 12:03, Eric Botcazou wrote:
> > >   if (! volatile_ok && MEM_VOLATILE_P (op))
> > >     return 0;
> >
> > It's more of the other way around: MEM_VOLATILE_P is a general operand
> > unless explicitly requested via init_recog_no_volatile.  Some passes, like
> > combine, don't track the volatileness of operands precisely, so they
> > disable their manipulation altogether to avoid generating wrong code.
>
> But the piece of code I quoted above is in general_operand. A (mem/v ...)
> will never be a general_operand, will it?

The piece of code you quoted also is conditional on volatile_ok.  Connect 
that with what Eric said.


Ciao,
Michael.


Re: Scheduling automaton question

2011-02-11 Thread Alex Turjan
Hi,
As far as I can tell, the scheduler does not currently support what you
need.

I was confronted with a similar problem and solved it by implementing the
TARGET_SCHED_DFA_NEW_CYCLE hook. Inside the function that implements this
hook, I choose/set the insn reservation that makes it possible to fit as
many other insns as possible in the same cycle.

During this process I also update the ready list with insns that become
ready as a result of scheduling the current insn (as in your example: insns
that are anti-dependent on the current insn and which can therefore be
scheduled in the current cycle). Thus the best insn reservation makes it
possible to schedule anti-dependent insns of cost zero in the same cycle by
avoiding resource conflicts.

Alex


few changes in the gcc mainline sources


Re: Volatile memory is not general operand

2011-02-11 Thread Paulo J. Matos

On 11/02/11 13:56, Michael Matz wrote:
> The piece of code you quoted also is conditional on volatile_ok.  Connect
> that with what Eric said.

Thanks Michael, I guess I should sleep before asking anything else. Now
I understand what Eric said.


RE: Vector permutation only deals with # of vector elements same as mask?

2011-02-11 Thread Bingfeng Mei
Thanks. Another question. Is there any plan to vectorize
loops like the following one?

    for (i=127; i>=0; i--) {
      x[i] = y[i] + z[i];
    }

I found that GCC trunk still cannot handle a negative step
for stores. Even if it could, it wouldn't be efficient, since it
would introduce redundant permutations on load and store.

Cheers,
Bingfeng
> -----Original Message-----
> From: Ira Rosen [mailto:i...@il.ibm.com]
> Sent: 10 February 2011 17:22
> To: Bingfeng Mei
> Cc: gcc@gcc.gnu.org
> Subject: Re: Vector permutation only deals with # of vector elements
> same as mask?
> 
> 
> Hi,
> 
> "Bingfeng Mei"  wrote on 10/02/2011 05:35:45 PM:
> >
> > Hi,
> > I noticed that vector permutation gets more use in GCC
> > 4.6, which is great. It is used to handle negative step
> > by reversing vector elements now.
> >
> > However, after reading the related code, I understood
> > that it only works when the # of vector elements is
> > the same as that of mask vector in the following code.
> >
> > perm_mask_for_reverse (tree-vect-stmts.c)
> > ...
> >   mask_type = get_vectype_for_scalar_type (mask_element_type);
> >   nunits = TYPE_VECTOR_SUBPARTS (vectype);
> >   if (!mask_type
> >       || TYPE_VECTOR_SUBPARTS (vectype) != TYPE_VECTOR_SUBPARTS (mask_type))
> >     return NULL;
> > ...
> >
> > For PowerPC altivec, the mask_type is V16QI. It means that
> > compiler can only permute V16QI type.  But given the capability of
> > altivec vperm instruction, it can permute any 128-bit type
> > (V8HI, V4SI, etc). We just need convert in/out V16QI from
> > given types and a bit more extra work in producing mask.
> >
> > Do I understand correctly or miss something here?
> 
> Yes, you are right. The support of reverse access is somewhat limited.
> Please see vect_transform_slp_perm_load() in tree-vect-slp.c for an
> example of all type permutation support.
> 
> But, anyway, reverse accesses are not supported for altivec's load
> realignment scheme.
> 
> Ira
> 
> >
> > Thanks,
> > Bingfeng Mei
> >
> >
> >
> >
> 
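
The "bit more extra work in producing mask" mentioned in the quoted message
can be sketched as follows. This is only an illustration, not code from
tree-vect-stmts.c: build_reverse_byte_mask and permute_bytes are
hypothetical helpers, the selectors index bytes of a single 16-byte source
in memory order, and target details such as endianness or the two-input
form of a real permute instruction are ignored.

```c
#include <assert.h>
#include <stddef.h>

#define VECTOR_BYTES 16  /* one 128-bit vector */

/* Fill MASK with byte selectors that reverse the order of the elements
   of ELEM_SIZE bytes while keeping the bytes inside each element in
   place.  ELEM_SIZE 1 models V16QI, 2 models V8HI, 4 models V4SI.  */
static void build_reverse_byte_mask(unsigned char mask[VECTOR_BYTES],
                                    size_t elem_size)
{
  assert(elem_size > 0 && VECTOR_BYTES % elem_size == 0);
  size_t nunits = VECTOR_BYTES / elem_size;
  for (size_t i = 0; i < nunits; i++)
    for (size_t b = 0; b < elem_size; b++)
      mask[i * elem_size + b] =
          (unsigned char)((nunits - 1 - i) * elem_size + b);
}

/* Apply a byte permutation: dst[i] = src[mask[i]].  */
static void permute_bytes(unsigned char dst[VECTOR_BYTES],
                          const unsigned char src[VECTOR_BYTES],
                          const unsigned char mask[VECTOR_BYTES])
{
  for (size_t i = 0; i < VECTOR_BYTES; i++)
    dst[i] = src[mask[i]];
}
```

For 4-byte elements the mask starts 12, 13, 14, 15, 8, 9, ... so the last
element moves to the front with its bytes in their original order; for
1-byte elements it degenerates to the plain 15, 14, ..., 0 reversal.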




Re: Vector permutation only deals with # of vector elements same as mask?

2011-02-11 Thread Tim Prince

On 2/11/2011 7:30 AM, Bingfeng Mei wrote:
> Thanks. Another question. Is there any plan to vectorize
> the loops like the following ones?
>
>     for (i=127; i>=0; i--) {
>       x[i] = y[i] + z[i];
>     }

When I last tried, the Sun compilers could vectorize such loops
efficiently (for fairly short loops), with appropriate data definitions.
The Sun compilers didn't peel for alignment, to improve performance on
longer loops, as gcc and others do.  For a case with no data overlaps
(float * __restrict__ x, y, z, or Fortran), loop reversal can do
the job.  gcc has some loop reversal machinery, but I haven't seen it
used for vectorization.  In a simple case like this, some might argue
there's no reason to write a backward loop when it could easily be
reversed in source code, and compilers have been seen to make mistakes
in reversal.


--
Tim Prince
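
The source-level reversal described above can be illustrated with a minimal
sketch. The function names are hypothetical, and the restrict qualifiers
encode the "no data overlaps" assumption under which the two loops compute
the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Backward loop, as in the question.  */
static void add_backward(float *restrict x, const float *restrict y,
                         const float *restrict z, int n)
{
  assert(n >= 0);
  for (int i = n - 1; i >= 0; i--)
    x[i] = y[i] + z[i];
}

/* The same computation with the iteration order reversed by hand.
   Because x, y and z do not overlap, each x[i] depends only on y[i]
   and z[i], so the order of iterations does not matter.  */
static void add_forward(float *restrict x, const float *restrict y,
                        const float *restrict z, int n)
{
  assert(n >= 0);
  for (int i = 0; i < n; i++)
    x[i] = y[i] + z[i];
}
```

If x could alias y or z, the two loops would no longer be interchangeable,
which is why the reversal needs the no-overlap guarantee.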


Re: Scheduling automaton question

2011-02-11 Thread Frédéric RISS
Le vendredi 11 février 2011 à 13:33 +0100, Bernd Schmidt a écrit :
> Suppose I have two insns, one reserving (A|B|C), and the other reserving
> A. I'm observing that when the first one is scheduled in an otherwise
> empty state, it reserves the A unit and blocks the second one from being
> scheduled in the same cycle. This is a problem when there's an
> anti-dependence of cost 0 between the two instructions.

If you generate a NDFA ( using '(automata_option "ndfa")' ) it should
allow you to schedule both instructions together as this should try all
functional unit alternatives. 

Fred
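
The difference between the two kinds of automaton can be sketched with a toy
model. This is not the genautomata implementation: units are bits in a mask,
struct insn_model and both functions are hypothetical, and each insn is
assumed to occupy exactly one unit for the cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { UNIT_A = 1, UNIT_B = 2, UNIT_C = 4 };

struct insn_model {
  const unsigned *alts;  /* alternative unit reservations, in order */
  size_t n_alts;
};

/* Deterministic model: each insn commits to its first free alternative
   and never revisits that choice.  */
static bool fits_first_alternative(const struct insn_model *insns, size_t n)
{
  assert(insns != NULL || n == 0);
  unsigned busy = 0;
  for (size_t i = 0; i < n; i++) {
    size_t a = 0;
    while (a < insns[i].n_alts && (busy & insns[i].alts[a]))
      a++;
    if (a == insns[i].n_alts)
      return false;
    busy |= insns[i].alts[a];
  }
  return true;
}

/* Nondeterministic model: succeed if any combination of alternatives
   lets all insns issue in the same cycle.  */
static bool fits_any_alternative(const struct insn_model *insns, size_t n,
                                 unsigned busy)
{
  if (n == 0)
    return true;
  for (size_t a = 0; a < insns[0].n_alts; a++) {
    unsigned units = insns[0].alts[a];
    if (!(busy & units)
        && fits_any_alternative(insns + 1, n - 1, busy | units))
      return true;
  }
  return false;
}
```

For the pair from the question, (A|B|C) followed by A, the first function
reports a conflict because the first insn takes A, while the second finds
the assignment where the first insn takes B and leaves A free.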



Re: Scheduling automaton question

2011-02-11 Thread Bernd Schmidt
On 02/11/2011 07:43 PM, Frédéric RISS wrote:
> Le vendredi 11 février 2011 à 13:33 +0100, Bernd Schmidt a écrit :
>> Suppose I have two insns, one reserving (A|B|C), and the other reserving
>> A. I'm observing that when the first one is scheduled in an otherwise
>> empty state, it reserves the A unit and blocks the second one from being
>> scheduled in the same cycle. This is a problem when there's an
>> anti-dependence of cost 0 between the two instructions.
> 
> If you generate a NDFA ( using '(automata_option "ndfa")' ) it should
> allow you to schedule both instructions together as this should try all
> functional unit alternatives. 

Ah, that seems to be exactly what I was looking for. Thanks!
I'd expect this won't work too well with define_query_cpu_unit, so I'll
need another method to assign units after scheduling.


Bernd



Re: loop hoisting fails

2011-02-11 Thread Ian Lance Taylor
"Paulo J. Matos"  writes:

> On 10/02/11 16:04, Ian Lance Taylor wrote:
>> Bother.  I've encountered that problem before and I think I used a
>> sledgehammer (a local patch).  It's definitely a bug that gcse doesn't
>> consider costs.
>>
>
> I think I might try also patching my local gcc. I guess the trick is
> to check for the cost of the alternative before making the replacement
> in gcse, right? Is it possible to have an idea of how you did it?

My case was somewhat different.  I think I just patched gcse_constant_p.

Ian


Re: GCC 4.6 performance regressions

2011-02-11 Thread Quentin Neill
On Thu, Feb 10, 2011 at 3:13 AM, Jonathan Wakely  wrote:
> On 10 February 2011 05:18, Quentin Neill wrote:
>> On Wed, Feb 9, 2011 at 2:42 AM, Jonathan Wakely wrote:
>>> On 9 February 2011 08:34, Sebastian Pop wrote:
>>>> For example x264 defines CFLAGS="-O4 -ffast-math $CFLAGS", and so
>>>> building this benchmark with CFLAGS="-O2" would have no effect.
>>>
>>> Why not?
>>>
>>> Ignoring the fact -O3 is the highest level for GCC, the manual says:
>>> "If you use multiple -O options, with or without level numbers, the
>>> last such option is the one that is effective."
>>> http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
>>>
>>> And CFLAGS="-fno-fast-math -O2" would cancel the effects of -ffast-math too.
>>>
>> Because the makefile can override CFLAGS in the environment (or in a
>> make variable) at any time, and GCC wouldn't even see it.
>
> Yes, I know how make works, but the example Sebastian gave is a case
> where CFLAGS from the command-line or environment will be appended to
> the make variable, allowing the default compiler flags to be
> overridden.  Obviously not all makefiles are written that way
> (although most aren't written like your example either) but I was
> referring to a specific case.

Sorry for assuming.

I assumed make variables, I looked at
CFLAGS="-O4 -ffast-math $CFLAGS"
and in my mind I saw
CFLAGS="-O4 -ffast-math $(CFLAGS)"

(If it's env var it should be CFLAGS="-O4 -ffast-math $$CFLAGS")
-- 
Quentin
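
The rule quoted from the manual ("the last such option is the one that is
effective") can be sketched as below. This only models that one rule and is
not how GCC parses its command line; last_opt_level is a hypothetical
helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the last argument that starts with "-O", or NULL if there is
   none.  Later options override earlier ones.  */
static const char *last_opt_level(const char *const *args, size_t n)
{
  assert(args != NULL || n == 0);
  const char *last = NULL;
  for (size_t i = 0; i < n; i++)
    if (strncmp(args[i], "-O", 2) == 0)
      last = args[i];
  return last;
}
```

When the user's flags are appended after the makefile's defaults, as in
"-O4 -ffast-math -O2", the user's -O2 is the last one and wins. When the
makefile ignores the user's value entirely, only "-O4 -ffast-math" reaches
the compiler and -O4 stays in effect.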


Re: Scheduling automaton question

2011-02-11 Thread Vladimir Makarov

On 02/11/2011 07:33 AM, Bernd Schmidt wrote:
> Suppose I have two insns, one reserving (A|B|C), and the other reserving
> A. I'm observing that when the first one is scheduled in an otherwise
> empty state, it reserves the A unit and blocks the second one from being
> scheduled in the same cycle. This is a problem when there's an
> anti-dependence of cost 0 between the two instructions.
>
> Vlad - two questions. Is this behaviour what you would expect to happen,
> and how much work do you think would be involved to fix it (i.e. make
> the first one transition to a state where we can still reserve any two
> out of the three units)?

The scheduler can do what you want, but only if there is no dependence
between the instructions.  The first cycle multi-pass insn scheduler will
try different orders of ready insns to choose the best one.  But if there
is a dependence between the insns, only one insn will be in the ready list.
I think it would be a lot of work to implement the same for dependent insns
(and it would probably make the scheduler much slower).


Still, there are other solutions to your problem.  The first one, proposed
by Frederic RISS, is to use NDFA (*nondeterministic* automata).  Itanium
partially uses such automata for an analogous problem.


Another solution (it might not work in some cases) is to use a different
order of alternatives (e.g. C|B|A instead of A|B|C).  In this case the
generated automaton permits issuing the first and then the 2nd insn,
because a deterministic automaton tries the first alternative first.  So
the state without reservations will go to a state with reservation C for
the first insn, the state with reservation C will go to a state with
reservation B for the 1st insn, etc.
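
The effect of reordering the alternatives can be sketched with a toy model
of the "first alternative first" rule. This is not generated automaton
code: units are bits in a mask and first_free_unit is a hypothetical
helper.

```c
#include <assert.h>
#include <stddef.h>

enum { UNIT_NONE = 0, UNIT_A = 1, UNIT_B = 2, UNIT_C = 4 };

/* Return the first unit in ORDER that is not set in BUSY, or UNIT_NONE
   if every alternative is already taken.  */
static unsigned first_free_unit(const unsigned *order, size_t n,
                                unsigned busy)
{
  assert(order != NULL);
  for (size_t i = 0; i < n; i++)
    if (!(busy & order[i]))
      return order[i];
  return UNIT_NONE;
}
```

With the order A|B|C the first insn takes A from an empty state, so an insn
that can only use A finds nothing free. With C|B|A the first insn takes C
and A remains available. As noted above, this only helps when the
contended unit can be placed last.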





Re: Target deprecations for 4.6

2011-02-11 Thread Nathan Froyd
On Fri, Jan 28, 2011 at 01:11:10AM +, Joseph S. Myers wrote:
> Here is a concrete list I propose for deprecation in 4.6; please send
> any other suggestions...

score-* doesn't have a maintainer and score-elf couldn't build libgcc
last I checked (it was also mentioned in your previous message).

crx-*?  crx-elf can't build libgcc, and hasn't been able to for a while.

-Nathan


Re: Target deprecations for 4.6

2011-02-11 Thread Joseph S. Myers
On Fri, 11 Feb 2011, Nathan Froyd wrote:

> On Fri, Jan 28, 2011 at 01:11:10AM +, Joseph S. Myers wrote:
> > Here is a concrete list I propose for deprecation in 4.6; please send
> > any other suggestions...
> 
> score-* doesn't have a maintainer and score-elf couldn't build libgcc
> last I checked (it was also mentioned in your previous message).
> 
> crx-*?  crx-elf can't built libgcc, and hasn't been able to for a while.

Since the main deprecation patches are now in, feel free to send further 
patches (to config.gcc and the release notes).

I can't quite figure out what the score people are up to, but it doesn't 
appear to involve a simultaneously maintained set of upstream components 
that are usable together in their current upstream forms; they got Linux 
kernel support upstream in 2009 (and don't seem to have maintained it much 
since then), some time after they got GCC support upstream (and then 
stopped maintaining it).

There may also be subtargets or target-specific features worth considering 
for deprecation (e.g. ARM -mwords-little-endian was mentioned in a 
previous cleanup discussion).

-- 
Joseph S. Myers
jos...@codesourcery.com