date:20120504

Re: Paradoxical subreg reload issue

2012-05-04 Thread Aurelien Buhrig

On 03/05/2012 14:14, Aurelien Buhrig wrote:
> On 02/05/2012 21:36, Eric Botcazou wrote:
>>> I have an issue (gcc 4.6.3, private backend) when reloading operands of
>>> this insn:
>>> (set (subreg:SI (reg:QI 21 [ iftmp.1 ]) 0)
>>>  (lshiftrt:SI (reg/v:SI 24 [ w ]) (const_int 31 [0x1f]))
>>>
>>> The register 21 is reloaded into
>>> (reg:QI 0 r0 [orig:21 iftmp.1 ] [21]), which is a HI-wide hw register.
>>> Since it is a BIG_ENDIAN target, the SI subreg regno is then -1.
>>>
>>> Note that word_mode is SImode, whereas the class r0 belongs to is
>>> HI-wide. I don't know if this matters when reloading.
>>>
>>> I have no idea how to debug this, if it is a backend or a reload bug.
>>
>> RA/reload is known to have issues with word-mode paradoxical subregs on 
>> big-endian machines.  For example, on SPARC 64-bit, we run into similar 
>> problems for FP regs, which are 32-bit.  Likewise on HP-PA 64-bit I think.
>>
>> So we have kludges in the back-end:
>>
>> /* Defines invalid mode changes.  Borrowed from the PA port.
>>
>>    SImode loads to floating-point registers are not zero-extended.
>>    The definition for LOAD_EXTEND_OP specifies that integer loads
>>    narrower than BITS_PER_WORD will be zero-extended.  As a result,
>>    we inhibit changes from SImode unless they are to a mode that is
>>    identical in size.
>>
>>    Likewise for SFmode, since word-mode paradoxical subregs are
>>    problematic on big-endian architectures.  */
>>
>> #define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)	\
>>   (TARGET_ARCH64					\
>>    && GET_MODE_SIZE (FROM) == 4			\
>>    && GET_MODE_SIZE (TO) != 4				\
>>    ? reg_classes_intersect_p (CLASS, FP_REGS) : 0)
>>
> 
> 
> I modified CANNOT_CHANGE_MODE_CLASS as you suggested. But strange as it
> may seem, it has no effect on such a reload, and I can't find a way to
> make it work...
> 
> BTW, has this bug already been filed?
> Do you have an idea how deep this bug is in reload, how complex it
> is, and which part of reload it is related to?
> 
> Thanks,
> Aurélien


It seems the paradoxical subreg information is not available from either
ira_allocno_t or ira_object_t when choosing the hw reg.

I finally changed word_mode to the smallest hw reg size (HImode) to work
around this bug.
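To see where the -1 comes from: for a paradoxical subreg on a big-endian target, GCC offsets the hard register number backwards so that the narrow value lands in the low-order part of the wider value. The sketch below is a simplified model of that computation, loosely following the paradoxical-subreg case of subreg_get_info in rtlanal.c as we understand it. It folds BYTES_BIG_ENDIAN and WORDS_BIG_ENDIAN into a single big_endian flag, assumes QImode is 1 byte and each hard register in r0's class holds 2 bytes, and the function names are hypothetical.

```c
#include <stdbool.h>

/* Hard registers needed for a MODE_SIZE-byte value when each hard
   register holds REG_SIZE bytes (rounded up).  */
static int hard_regs_needed(int mode_size, int reg_size)
{
  return (mode_size + reg_size - 1) / reg_size;
}

/* Simplified model: hard regno of (subreg:OUTER (reg:INNER INNER_REGNO) 0)
   when OUTER is wider than INNER (a paradoxical subreg).  On big-endian
   the offset is negative, so the inner value ends up in the low-order
   (last) hard register of the wider value.  */
static int paradoxical_subreg_regno(int inner_regno, int inner_size,
                                    int outer_size, int reg_size,
                                    bool big_endian)
{
  int nregs_inner = hard_regs_needed(inner_size, reg_size);
  int nregs_outer = hard_regs_needed(outer_size, reg_size);
  int offset = big_endian ? nregs_inner - nregs_outer : 0;
  return inner_regno + offset;
}
```

With a 1-byte QImode value in the 2-byte r0 (regno 0) and a 4-byte SImode subreg, the model gives 0 + (1 - 2) = -1, the regno reported in the original message; on a little-endian target the offset would be 0.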

Aurélien

Re: Paradoxical subreg reload issue

2012-05-04 Thread Eric Botcazou

> I modified CANNOT_CHANGE_MODE_CLASS as you suggested. But strange as it
> may seem, it has no effect on such a reload, and I can't find a way to
> make it work...

The macro is mainly used by the RA; it's not clear whether reload honors it.

> BTW, has this bug already been filed?

In its general form, I'm not sure.  It only occurs for register files that are 
somewhat irregular, and on big-endian, so this is very specific.

> Do you have an idea how deep this bug is in reload, how complex it
> is, and which part of reload it is related to?

It's the RA and reload, at least.  I guess this should be reasonably fixable by 
specialists, but there probably has been a lack of real incentive to do so.

-- 
Eric Botcazou

Register constraints + and =

2012-05-04 Thread Paulo J. Matos


Hi,

I was just trying to understand exactly what constraint modifiers + and 
= mean. I have read the manual but I am uncertain about their meaning in 
the context of the following rule (without any modifiers):


Expand generates:

(define_insn_and_split "movmem_long"
  [(set (match_operand:QI 2 "register_operand" "d,c") (const_int 0))
   (set (mem:BLK (match_operand:QI 0 "register_operand" "d,c"))
        (mem:BLK (match_operand:QI 1 "register_operand" "x,c")))
   (set (match_dup 0) (plus:QI (match_dup 0) (match_dup 2)))
   (set (match_dup 1) (plus:QI (match_dup 1) (match_dup 2)))
   (clobber (match_scratch:QI 3 "w,w"))]
  "!TARGET_NO_BLOCK_COPY"
  "#"
  "&& reload_completed"
  [(const_int 0)]
{
  /* Swap operands 0 and 2 through the scratch when they are in the
     wrong order for bc2.  */
  if ((which_alternative == 0 && REGNO (operands[2]) == RAH)
      || which_alternative == 1)
    {
      emit_move_insn (operands[3], operands[0]);
      emit_move_insn (operands[0], operands[2]);
      emit_move_insn (operands[2], operands[3]);
    }
  emit_insn (gen_bc2 ());
  DONE;
})

From what I understand + is for input/output operands, = for output 
only operands. Since in the above rule (a parallel) all operands are 
written to and read from, does this mean all their constraints should
start with +? Or does this only apply per set within each parallel (which
doesn't seem to make much sense)?


Cheers,

--
PMatos

type argument in FUNCTION_ARG macro

2012-05-04 Thread BELBACHIR Selim

Hi,

I'm working on an architecture where the calling convention depends on the type 
of the parameter (i.e. pointers are passed into $C regs and non-pointers are 
passed into $R regs).  I've implemented this difference by using the 
POINTER_TYPE_P() macro on the 'type' argument of the FUNCTION_ARG macro.

I'm having a problem with this approach with calls to libgcc function like  
_Unwind_SjLj_Register(struct foo * ). As this function is invoked as a 
library function, the 'type' argument to the FUNCTION_ARG() macro is NULL.  
Thus, the pointer parameter is not passed as a pointer, but the function body
expects a pointer.

Any ideas on how to get around this problem?

Regards,

   Selim

Re: clear_cache on Alpha architecture not implemented?

2012-05-04 Thread Camm Maguire

Greetings, and thanks for this very helpful synopsis.

I'm wondering if there is a simple configure-time test to detect when
this has been fixed.  If I simply avoided __builtin___clear_cache on
alpha, ppc, ppc64, and ia64, where it is in fact a no-op, would this suffice?

Take care,

Richard Henderson  writes:

> On 05/03/2012 10:51 AM, Camm Maguire wrote:
>> The goal was to exercise the very helpful gcc __builtin___clear_cache
>> support, and to avoid having to maintain our own assembler for all the
>> different cpus in this regard.  Clearly, it is easy to revert this on a
>> per architecture basis if absolutely necessary.  If gcc does or does not
>> plan on fixing this, please let me know so gcl can adjust as needed.
>
> While we can probably fix this, you should know that __builtin___clear_cache
> is highly tied to the implementation of trampolines for the target.  Thus
> there are at least 3 targets that do not handle this "properly":
>
> For alpha, we emit imb directly during the trampoline_init target hook.
>
> For powerpc32, the libgcc routine __clear_cache is unimplemented, but the
> cache flushing for trampolines is inside the __trampoline_setup routine.
>
> For powerpc64 and ia64, the ABI for function calls allows trampolines to
> be implemented without emitting any insns, and thus the icache need not be
> flushed at all.  And thus we never bothered implementing 
> __builtin___clear_cache.
>
> So, the fact of the matter is that you can't reliably use this builtin for
> arbitrary targets for any gcc version up to 4.7.  Feel free to submit an
> enhancement request via bugzilla so that we can remember to address this
> for gcc 4.8.
>

-- 
Camm Maguire                                    c...@maguirefamily.org
==
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah

Re: Register constraints + and =

2012-05-04 Thread Ian Lance Taylor

"Paulo J. Matos"  writes:

> From what I understand + is for input/output operands, = for output
> only operands. Since in the above rule (a parallel) all operands are
> written to and read to, does this mean all their constraints should
> start with +? Or this only applies per set within each parallel (which
> doesn't seem to make much sense)?

I agree that there is something wrong here.  I agree that as written
the constraints for operands 0, 1, and 2 should have a '+'.

That said, a '+' constraint is most useful for a pattern that expands
into multiple instructions.  I think this would be better written along
the lines of

  (set (match_operand:QI 2 "register_operand" "=d,c") (const_int 0))
  (set (mem:BLK (match_operand:QI 3 "register_operand" "0,0"))
       (mem:BLK (match_operand:QI 4 "register_operand" "1,1")))
  (set (match_operand:QI 0 "register_operand" "=d,c")
       (plus:QI (match_dup 3)
                (match_operand:QI 5 "register_operand" "2,2")))
  (set (match_operand:QI 1 "register_operand" "=x,c")
       (plus:QI (match_dup 4) (match_dup 5)))
  (clobber (match_scratch:QI 6 "=w,w"))

Also it looks like it might be possible to add a third alternative such
that that alternative does not require the match_scratch.

Ian

Re: type argument in FUNCTION_ARG macro

2012-05-04 Thread amylaar


Quoting BELBACHIR Selim :

> Any ideas on how to get around this problem?

You can look at the name of library functions.

RE: type argument in FUNCTION_ARG macro

2012-05-04 Thread BELBACHIR Selim

That's the only option ? Is there a  more general method to do this ?


-Original Message-
From: amyl...@spamcop.net [mailto:amyl...@spamcop.net] 
Sent: Friday, 4 May 2012 15:48
To: BELBACHIR Selim
Cc: gcc@gcc.gnu.org
Subject: Re: type argument in FUNCTION_ARG macro


Re: clear_cache on Alpha architecture not implemented?

2012-05-04 Thread Camm Maguire

Greetings!  As a followup, I should note that sh4 and hppa are also
broken.  I currently have this in configure.in

case $use in
  sh4*) ;; #FIXME
  hppa*) ;; #FIXME
  *)
    AC_MSG_CHECKING(__builtin___clear_cache)
    AC_TRY_COMPILE([],
                   [void *v,*ve;
                    __builtin___clear_cache(v,ve);
                   ],
                   [AC_DEFINE(HAVE_BUILTIN_CLEAR_CACHE)
                    AC_MSG_RESULT(yes)],
                   AC_MSG_RESULT(no));;
esac
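For reference, the body that AC_TRY_COMPILE compiles amounts to the stand-alone C below (the function name is hypothetical). As noted in the thread, a successful compile only shows that the builtin exists; on targets where it expands to nothing it still compiles, so this kind of probe cannot tell a working flush from a no-op.

```c
/* Stand-alone equivalent of the configure probe: it only checks that
   __builtin___clear_cache is accepted by the compiler.  It says nothing
   about whether the instruction cache is actually flushed.  */
static int probe_clear_cache(void)
{
  char buf[64];
  __builtin___clear_cache(buf, buf + sizeof buf);
  return 0;
}
```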

Perhaps I should just add alpha, powerpc, ppc64, and ia64 to the
exception list and revisit this in the future.

Take care,



-- 
Camm Maguire                                    c...@maguirefamily.org
==
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah

Re: type argument in FUNCTION_ARG macro

2012-05-04 Thread Ian Lance Taylor

BELBACHIR Selim  writes:

> I'm working on an architecture where the calling convention depends on the 
> type of the parameter (i.e. pointers are passed into $C regs and non-pointers 
> are passed into $R regs).  I've implemented this difference by using the 
> POINTER_TYPE_P() macro on the 'type' argument of the FUNCTION_ARG macro.
>
> I'm having a problem with this approach with calls to libgcc function like  
> _Unwind_SjLj_Register(struct foo * ). As this function is invoked as a 
> library function, the 'type' argument to the FUNCTION_ARG() macro is NULL.  
> Thus, the pointer parameter is not passed as a pointer, but the function body
> expects a pointer.  
>
> Any ideas on how to get around this problem?

If possible, avoid using SJLJ exceptions.  DWARF exceptions are better.

I see three ways to go.

Change the middle-end to avoid using emit_library_call when calling
_Unwind_SjLj_Register.  There is no particular reason for making this a
special library call.  But this is probably a bit painful to implement.

Define INIT_CUMULATIVE_LIBCALL_ARGS for your target, and check the
function.  If it's _Unwind_SjLj_Register, apply special handling.  This
option is nice because you only have to change your backend.

Change the implementation of _Unwind_SjLj_Register to take a parameter
of type uintptr_t, and cast it to struct SjLj_Function_Context *.
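A minimal sketch of the third option, assuming the callee simply links the context into a list: because the parameter is now an integer, a type-based convention puts it in an $R register on both sides, matching what the typeless libcall does. The struct is a stand-in with invented fields, not libgcc's actual SjLj_Function_Context, and register_context is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the real context structure; these fields are invented.  */
struct sjlj_context_standin {
  struct sjlj_context_standin *prev;
  int call_site;
};

static struct sjlj_context_standin *chain_head;

/* The argument arrives as an integer (an $R register under the
   type-based convention), and is converted back to a pointer here.  */
static void register_context(uintptr_t raw)
{
  struct sjlj_context_standin *fc = (struct sjlj_context_standin *) raw;
  fc->prev = chain_head;
  chain_head = fc;
}
```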

Ian

Re: clear_cache on Alpha architecture not implemented?

2012-05-04 Thread Camm Maguire

Greetings!  Last followup:

Nothing here should affect any kfreebsd_amd64 machine, right?  My
understanding is that these do not require cache flushing.  Thus the
failure:

https://buildd.debian.org/status/fetch.php?pkg=acl2&arch=kfreebsd-amd64&ver=4.3-1&stamp=1326315213

mv *saved_acl2.gcl saved_acl2
/usr/bin/make mini-proveall
make[1]: Entering directory `/build/buildd-acl2_4.3-1-kfreebsd-amd64-SScGlk/acl2-4.3'
Aborted
make[1]: *** [mini-proveall] Error 134
make[1]: Leaving directory `/build/buildd-acl2_4.3-1-kfreebsd-amd64-SScGlk/acl2-4.3'
make: *** [debian/mini-proveall.out] Error 2

which has shown up on

  * Host name: fasch.debian.org
  * Architecture: kfreebsd-amd64
  * Distribution: Debian GNU/Linux
  * Sponsor:
    + Greek Research and Technology Network (GRNET) (hosting)
    + Nordic Gaming (hardware)
  * Processor: PowerEdge 1855

and 

  * Host name: fano.debian.org
  * Architecture: kfreebsd-amd64
  * Distribution: Debian GNU/kFreeBSD
  * Sponsor:
    + University of British Columbia - Department of Electrical and Computer Engineering (hosting)
    + Hewlett-Packard (hardware)


but is not reproducible on

  * Host name: asdfasdf.debian.net
  * Architecture: kfreebsd-amd64
  * Distribution: Debian GNU/kFreeBSD
  * Sponsor:
    + ETH Zurich - Department of Physics (hosting+hw)
  * Processor: AMD Sempron 3000+ 1600 MHz

is unrelated, right?

Thanks so much!

-- 
Camm Maguire                                    c...@maguirefamily.org
==
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah

RE: type argument in FUNCTION_ARG macro

2012-05-04 Thread BELBACHIR Selim

Ok thanks, I'll keep on with plan B (INIT_CUMULATIVE_LIBCALL_ARGS with special 
libcall handling)
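A toy model of this plan, in plain C rather than GCC's internal interfaces: when the argument has a type, classify it by whether it is a pointer; when the type is missing (a library call), fall back on the library function's name, which the back end would have to record when INIT_CUMULATIVE_LIBCALL_ARGS runs. All names are hypothetical, the list holds only the one function mentioned in this thread, and it assumes every argument of a listed libcall is a pointer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Which register file an argument goes in (hypothetical names).  */
enum arg_class { ARG_R_REG, ARG_C_REG };

/* What the back end would remember about the call being set up.
   NULL means an ordinary call with full type information.  */
struct call_info {
  const char *libcall_name;
};

/* Library functions whose arguments are known to be pointers.  */
static const char *const pointer_arg_libcalls[] = {
  "_Unwind_SjLj_Register",
};

static bool libcall_takes_pointer(const char *name)
{
  size_t n = sizeof pointer_arg_libcalls / sizeof pointer_arg_libcalls[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp(name, pointer_arg_libcalls[i]) == 0)
      return true;
  return false;
}

/* HAVE_TYPE says whether the argument's type is known; IS_POINTER
   (the POINTER_TYPE_P test) is only meaningful when it is.  */
static enum arg_class choose_arg_class(const struct call_info *call,
                                       bool have_type, bool is_pointer)
{
  if (have_type)
    return is_pointer ? ARG_C_REG : ARG_R_REG;
  if (call->libcall_name != NULL && libcall_takes_pointer(call->libcall_name))
    return ARG_C_REG;
  return ARG_R_REG;
}
```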

Selim

-Original Message-
From: Ian Lance Taylor [mailto:i...@google.com] 
Sent: Friday, 4 May 2012 15:58
To: BELBACHIR Selim
Cc: gcc@gcc.gnu.org
Subject: Re: type argument in FUNCTION_ARG macro


Re: clear_cache on Alpha architecture not implemented?

2012-05-04 Thread Richard Henderson

On 05/04/12 07:07, Camm Maguire wrote:
> Nothing here should affect any kfreebsd_amd64 machine, right?

Correct.


r~

Re: Register constraints + and =

2012-05-04 Thread Paulo J. Matos


On 04/05/12 14:44, Ian Lance Taylor wrote:

Thanks for your suggestion, I use it in my discussion below.

Unfortunately, this is all in preparation for my block copy instruction bc2.
The bc2 instruction takes the source address in register RXL, the destination
in RAH, and the count in RAL. After bc2, RAL is set to 0, RXL is RXL + RAL
(source address plus count) and RAH is RAH + RAL (destination address
plus count). This is specified as:

(define_insn "bc2"
  [(set (reg:QI RAL) (const_int 0))
   (set (mem:BLK (reg:QI RAH)) (mem:BLK (reg:QI RXL)))
   (set (reg:QI RXL) (plus:QI (reg:QI RXL) (reg:QI RAL)))
   (set (reg:QI RAH) (plus:QI (reg:QI RAH) (reg:QI RAL)))]
  "!TARGET_NO_BLOCK_COPY"
  "bc2")

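A minimal C model of bc2 as described above. It assumes word-addressed memory in which one QImode unit is a 16-bit word, and a forward copy; both are assumptions, and the struct and function names are hypothetical. Every SET in the RTL PARALLEL reads its inputs before any output is written, so RXL and RAH advance by the original count even though RAL ends up 0.

```c
#include <stdint.h>

/* Hypothetical register file for the model.  */
struct bc2_regs {
  uint16_t ral;  /* count */
  uint16_t rah;  /* destination address */
  uint16_t rxl;  /* source address */
};

/* Copy RAL words from MEM[RXL] to MEM[RAH], then set RAL = 0,
   RXL = RXL + count, RAH = RAH + count, using the count as it was
   before the instruction.  */
static void bc2_model(struct bc2_regs *r, uint16_t *mem)
{
  uint16_t count = r->ral, src = r->rxl, dst = r->rah;
  for (uint16_t i = 0; i < count; i++)
    mem[dst + i] = mem[src + i];
  r->ral = 0;
  r->rxl = (uint16_t) (src + count);
  r->rah = (uint16_t) (dst + count);
}
```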
Unfortunately, and due to problems with GCC47 RA (for reasons mentioned 
in thread "GCC47 movmem breaks RA, GCC46 RA is fine") I am having 
trouble getting this to work as well as it did with GCC46. In GCC46 I 
simply generated, during expand, the correct move insns and then 
expanded the bc2 insn (with the hardcoded registers).


In GCC47 that breaks the RA, so I am attempting to get the RA to
understand that indeed, there's only one register each of the values can
end up in. So now I have something like this:

(define_expand "movmemqi"
  [(set (match_operand:BLK 0 "memory_operand"); destination
(match_operand:BLK 1 "memory_operand"))   ; source
   (use (match_operand:QI 2 "general_operand"))   ; count
   (match_operand 3 "" "")]
  "!TARGET_NO_BLOCK_COPY && !reload_completed"
{ xap_expand_movmemqi(operands[0], operands[1], operands[2]); DONE; })

(define_insn_and_split "movmem_long"
  [(set (match_operand:QI 2 "register_operand" "=d") (const_int 0))
   (set (mem:BLK (match_operand:QI 3 "register_operand" "0"))
(mem:BLK (match_operand:QI 4 "register_operand" "1")))
   (set (match_operand:QI 0 "register_operand" "=d")
(plus:QI (match_dup 3)
 (match_operand:QI 5 "register_operand" "2")))
   (set (match_operand:QI 1 "register_operand" "=x")
(plus:QI (match_dup 4) (match_dup 5)))
   (clobber (match_scratch:QI 6 "=w"))]
  "!TARGET_NO_BLOCK_COPY"
  "#"
  "&& reload_completed"
  [(const_int 0)]
{
  if (REGNO (operands[2]) == RAH)
    {
      emit_move_insn (operands[6], operands[0]);
      emit_move_insn (operands[0], operands[2]);
      emit_move_insn (operands[2], operands[6]);
    }
  emit_insn(gen_bc2());
  DONE;
})

xap_expand_movmemqi issues a couple of move_insn if they are required 
and then does a gen_movmem_long.


I am playing a trick with GCC here... The constraint 'd' applies to
register class DATA_REGS, which contains 2 registers: RAL and RAH. When
GCC allocates them in bc2, one will be assigned to operand0 and the other
to operand2. If operand2 ends up with RAH, then I use the scratch to swap
operand2 with operand0. The annoying part is that sometimes the RA ends up
allocating a scratch which I won't use, because it was smart enough to
allocate everything in order (operand2 with RAL and operand0 with RAH).
As a side note, constraint 'x' refers to class ADDR_REGS, which only has
register RXL.
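The fix-up in the split can be modeled in plain C as follows, with hypothetical register numbers and SCRATCH standing for operand 6. Since operands 0 and 2 are both in DATA_REGS, if the count landed in RAH the three moves through the scratch exchange the two values; otherwise nothing is emitted and the scratch goes unused.

```c
/* Hypothetical hard register numbers; SCRATCH stands for operand 6.  */
enum { RAL, RAH, RXL, SCRATCH, NUM_REGS };

/* HARD holds the current register contents.  COUNT_REG and DEST_REG are
   the hard registers the RA chose for operands 2 and 0.  Returns the
   number of moves the split would emit before gen_bc2.  */
static int fixup_for_bc2(int hard[NUM_REGS], int count_reg, int dest_reg)
{
  if (count_reg != RAH)
    return 0;                          /* already in bc2's order */
  hard[SCRATCH] = hard[dest_reg];      /* operands[6] = operands[0] */
  hard[dest_reg] = hard[count_reg];    /* operands[0] = operands[2] */
  hard[count_reg] = hard[SCRATCH];     /* operands[2] = operands[6] */
  return 3;
}
```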


This solution, however, still causes RA spills which GCC46 didn't cause,
but it is better than the GCC46 solution of hardcoding the registers
straight from the expand phase.


Another thing I tried but whose results are terrible (libgcc doesn't 
even compile due to register spill errors) is to have one class just 
for RAL, one just for RAH and use those constraints, thereby avoiding 
the scratch altogether. Turns out GCC doesn't seem to like that.


If you have any further suggestions or ideas that I can try out, please 
let me know.


--
PMatos

Re: Register constraints + and =

2012-05-04 Thread Paul_Koning


On May 4, 2012, at 9:44 AM, Ian Lance Taylor wrote:

> I agree that there is something wrong here.  I agree that as written
> the constraints for operands 0, 1, and 2 should have a '+'.

I thought that the "operand" in a mem:BLK is the pointer to the block, not the 
block itself.  So if the instruction(s) generated don't touch the pointer -- a 
likely answer for a block-move instruction -- then the operand would be 
read-only.  Is that the right interpretation?

What I ended up doing in pdp11.md is to add "clobber" clauses for the operands, 
because the generated code is typically a block-copy loop that steps the 
pointer registers through the buffer.  It appeared to do the right thing, but 
I'll admit it was more of a "try this until it works" type of thing rather than 
a deep understanding of the precisely correct interpretation.

paul

Re: Register constraints + and =

2012-05-04 Thread Ian Lance Taylor

 writes:

> I thought that the "operand" in a mem:BLK is the pointer to the block,
> not the block itself.  So if the instruction(s) generated don't touch
> the pointer -- a likely answer for a block-move instruction -- then
> the operand would be read-only.  Is that the right interpretation?

Yes.

But many block move instructions do in fact touch the pointer, in that
they update the registers pointing to the starts of the blocks to point
to the ends after the instruction completes.

Ian

Re: Register constraints + and =

2012-05-04 Thread Paul_Koning

On May 4, 2012, at 11:39 AM, Ian Lance Taylor wrote:

>  writes:
> 
>> I thought that the "operand" in a mem:BLK is the pointer to the block,
>> not the block itself.  So if the instruction(s) generated don't touch
>> the pointer -- a likely answer for a block-move instruction -- then
>> the operand would be read-only.  Is that the right interpretation?
> 
> Yes.
> 
> But many block move instructions do in fact touch the pointer, in that
> they update the registers pointing to the starts of the blocks to point
> to the ends after the instruction completes.

I interpreted + to mean that the operand is written with a value known to the 
compiler, as opposed to clobber which means that the value is not known (or not 
one that can be described to the compiler).   So I take it that for mem:BLK a + 
operand is interpreted as final value == end of the buffer?  Or byte after the 
buffer?

paul

Re: clear_cache on Alpha architecture not implemented?

2012-05-04 Thread Richard Henderson

On 05/04/12 06:39, Camm Maguire wrote:
> I'm wondering if there is a simple configure-time test to detect when
> this has been fixed.  If I simply avoided __builtin___clear_cache on
> alpha, ppc, ppc64, and ia64, where it is in fact a no-op, would this suffice?

I can't think of any simple, portable test.

The only reliable test would be to actually attempt to flush a cache,
with some detectable way to see this didn't happen.  This tends to get 
highly target specific quickly...

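A pattern that would at least apply to RISC targets with 32-bit instruction
words might be

  /* Placeholders: substitute the target's encodings of these insns.  */
  unsigned int test_routine[2] = {
    INSN_MOV_1_V0,  /* "mov 1, v0" */
    INSN_RET        /* "ret" */
  };
  #define call_test ((int (*)(void))test_routine)
  int main()
  {
    call_test();  // make sure the routine is in icache
    test_routine[0] = INSN_MOV_0_V0;  /* "mov 0, v0" */
    __builtin___clear_cache((char *)test_routine, (char *)(test_routine + 2));
    return call_test();
  }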

for target-dependent values of those instructions.

r~

Re: Register constraints + and =

2012-05-04 Thread Ian Lance Taylor

 writes:

> I interpreted + to mean that the operand is written with a value known to the 
> compiler, as opposed to clobber which means that the value is not known (or 
> not one that can be described to the compiler).   So I take it that for 
> mem:BLK a + operand is interpreted as final value == end of the buffer?  Or 
> byte after the buffer?

Hmmm.  I don't really know what you mean, so there is some sort of
communication difficulty.

A '+' in a constraint for an operand means that the operand is both read
and written by the instruction.  It's relatively unusual to find such a
constraint in a GCC backend.  In a GCC backend, it's more common to
write the instruction as a PARALLEL with one insn that sets the operand,
another insn that uses the operand, and a matching constraint to put
both operands in the same register.

A '+' in a constraint doesn't say anything at all about what value the
register has after the insn.  That is expressed only in the RTL.

The place where you often see '+' in a constraint is in an asm
instruction.  The asm instruction could also use a matching constraint,
but there is generally less point since the asm instruction can't say
anything about the value the register will have after the asm executes.
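The asm case can be seen with two equivalent ways of telling GCC that an asm reads and rewrites a register: a single "+r" operand, or an "=r" output plus an input tied to it with the matching constraint "0". The asm template is empty, so the value passes through unchanged. This is a minimal sketch for a GCC-compatible compiler; the function names are hypothetical.

```c
/* One operand that is both read and written by the asm.  */
static int via_plus(int x)
{
  __asm__ ("" : "+r" (x));
  return x;
}

/* Output operand 0, plus an input forced into the same register by
   the matching constraint "0".  */
static int via_matching(int x)
{
  int y;
  __asm__ ("" : "=r" (y) : "0" (x));
  return y;
}
```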

Comparing '+' in a constraint and CLOBBER doesn't make sense.  The '+'
tells the register allocator something about which registers it may use.
In particular, an operand with a '+' constraint may not be placed in a
register that holds either an input or an output operand.  An operand
with an '=' constraint, on the other hand, may be placed in the same
register as an input operand.

Constraints like '+' matter to the register allocator and reload.  RTL
constructs like CLOBBER matter to the RTL optimizers.  They are
different categories of things.

Ian

Re: Register constraints + and =

2012-05-04 Thread Paul_Koning


On May 4, 2012, at 1:52 PM, Ian Lance Taylor wrote:

> A '+' in a constraint doesn't say anything at all about what value the
> register has after the insn.  That is expressed only in the RTL.

Thanks, that helps.

What I was trying to describe is the handling of a memcpy operation in the .md 
file, where the operands are the memory pointers and (in my case) I want to 
tell the machinery that the registers it's using to pass in the addresses no 
longer have those addresses in them on completion.  So I put in clobbers to say 
that.  What I really wanted to do is express that the pointer registers, on 
completion, point just past the buffer, so the optimizer could take advantage 
of that, but it wasn't clear how one would do that.

paul

Re: Register constraints + and =

2012-05-04 Thread Ian Lance Taylor

 writes:

> What I was trying to describe is the handling of a memcpy operation in the 
> .md file, where the operands are the memory pointers and (in my case) I want 
> to tell the machinery that the registers it's using to pass in the addresses 
> no longer have those addresses in them on completion.  So I put in clobbers 
> to say that.  What I really wanted to do is express that the pointer 
> registers, on completion, point just past the buffer, so the optimizer could 
> take advantage of that, but it wasn't clear how one would do that.

The i386 rep_movqi insn is an example:

(define_insn "*rep_movqi"
  [(set (match_operand:P 2 "register_operand" "=c") (const_int 0))
   (set (match_operand:P 0 "register_operand" "=D")
        (plus:P (match_operand:P 3 "register_operand" "0")
                (match_operand:P 5 "register_operand" "2")))
   (set (match_operand:P 1 "register_operand" "=S")
        (plus:P (match_operand:P 4 "register_operand" "1") (match_dup 5)))
   (set (mem:BLK (match_dup 3))
        (mem:BLK (match_dup 4)))
   (use (match_dup 5))]
  "!(fixed_regs[CX_REG] || fixed_regs[SI_REG] || fixed_regs[DI_REG])"
  "%^rep{%;} movsb"
  [(set_attr "type" "str")
   (set_attr "prefix_rep" "1")
   (set_attr "memory" "both")
   (set_attr "mode" "QI")])

Note that since this is a define_insn, the four SETs in the pattern are
run in parallel.  The input operands are 3 (destination pointer), 4
(source pointer), 5 (count).  The output operands are 0 (pointer past
destination block), 1 (pointer past source block), 2 (set to zero).
Matching constraints are used to put each input operand in the same
register as the corresponding output operand.

Ian
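The following is a small behavioural model in C++ of what the PARALLEL in *rep_movqi describes. It is only a sketch: the struct and function names are hypothetical, it is not GCC code, and it assumes the direction flag is clear so the copy runs forward. Because every output is computed from the input values, the destination and source registers end up pointing one byte past their blocks (pointer + count), and the count register ends up zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical register state for the three registers the pattern uses.
struct RepMovsbRegs {
    const std::uint8_t* si;  // source pointer:      operand 4 in, operand 1 out
    std::uint8_t* di;        // destination pointer: operand 3 in, operand 0 out
    std::size_t cx;          // byte count:          operand 5 in, operand 2 out
};

// Behavioural model of the PARALLEL in *rep_movqi. All right-hand sides
// use the input values; the outputs are then produced together.
RepMovsbRegs rep_movsb_model(const RepMovsbRegs& in) {
    // (set (mem:BLK (match_dup 3)) (mem:BLK (match_dup 4))):
    // forward byte copy, assuming the direction flag is clear.
    for (std::size_t k = 0; k < in.cx; ++k)
        in.di[k] = in.si[k];

    RepMovsbRegs out;
    out.di = in.di + in.cx;  // operand 0 = operand 3 + operand 5
    out.si = in.si + in.cx;  // operand 1 = operand 4 + operand 5
    out.cx = 0;              // operand 2 = (const_int 0)
    return out;
}
```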

Re: Why does lower-subreg mark copied pseudos as "decomposable"?

2012-05-04 Thread Andrew Stubbs

On 19/04/12 17:36, Andrew Stubbs wrote:
> On 18/04/12 21:47, Richard Sandiford wrote:
>> I still prefer the idea of disabling in the first pass. It'll need to
>> be tested on something like non-NEON ARM to see whether it makes things
>> worse or better there. (I think size testing would be fine.)
>
> I'll have a go, and see what happens.

So far I've found that many examples give smaller code with this change,
and a few give larger code. On average, however, it appears to give
better code, size-wise. This is on ARM when NEON is not enabled; when
NEON is enabled the results are far better, as expected.

I did have a small example that showed much worse register allocation,
but I can't reproduce that with the latest trunk.

Most of the size reductions can be explained by the use of 64-bit loads
and stores rather than pairs of 32-bit accesses.

In Thumb mode, one cause of size increases appears to be that, although
there are no more instructions, it has used 32-bit opcodes rather than
16-bit ones; this is unfortunate.

Otherwise, it's very difficult to identify where the tiny size increases
come from.

As an example, I compiled (a slightly old copy of) gcc/expmed.c, which
contains a lot of 64-bit operations, and compared the output sizes at
-O2. Of 43 functions, 37 showed no change whatsoever, 5 showed a
reduction (21 bytes on average), and 1 function showed a 20-byte increase.

The end result is that I'm going to try to produce a proper patch to post.

Andrew

Re: Why doesn't GCC generate conditional move for COND_EXPR?

2012-05-04 Thread Andrew Pinski

On Tue, Oct 25, 2011 at 4:28 AM, Bingfeng Mei  wrote:
> Thanks, Andrew. I also implemented a quick patch on our port (based on GCC
> 4.5).
> It now produces better code for our applications. Maybe eliminating
> control flow at an earlier stage helps other optimizing passes. Currently, the
> tree if-conversion pass is not turned on by default (only with tree
> vectorization or some other passes). Maybe it is worth making it the default
> at -O2 (for processors that support conditional move)?

I just committed to the trunk the patch that expands COND_EXPR to a
conditional move.  I have more patches which do what ifcvt does,
but in phiopt (which seems better in general, as ifcvt works only on
loops).  I hope to post those patches in the coming weeks.

Thanks,
Andrew Pinski

>
> Cheers,
> Bingfeng
>
>> -----Original Message-----
>> From: Andrew Pinski [mailto:pins...@gmail.com]
>> Sent: 24 October 2011 17:20
>> To: Richard Guenther
>> Cc: Bingfeng Mei; gcc@gcc.gnu.org
>> Subject: Re: Why doesn't GCC generate conditional move for COND_EXPR?
>>
>> On Mon, Oct 24, 2011 at 7:00 AM, Richard Guenther
>>  wrote:
>> > On Mon, Oct 24, 2011 at 2:55 PM, Bingfeng Mei wrote:
>> >> Hello,
>> >> I noticed that COND_EXPR is not expanded to conditional move
>> >> as MIN_EXPR/MAX_EXPR are (assuming movmodecc is available).
>> >> I wonder why not?
>> >>
>> >> I have a loop that fails tree vectorization but still contains
>> >> COND_EXPR from the tree ifcvt pass. In the end, the generated code
>> >> is worse than when I don't turn -ftree-vectorize on.  This
>> >> is on our private port.
>> >
>> > Because nobody has touched COND_EXPR expansion in ages.
>>
>> I have a patch which I will be submitting next week or so that does
>> this expansion correctly.  In fact I have a few patches which improve
>> the generation of COND_EXPR in simple cases (in PHI-OPT).
>>
>> Thanks,
>> Andrew Pinski
>
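For readers less familiar with if-conversion, the sketch below shows at source level what it aims for. The branchy function and the COND_EXPR-style select compute the same value, but the second has no control flow, so on targets with a movmodecc pattern it is a candidate for a conditional move. Whether a given compiler actually emits one depends on the target, flags and cost model. The function names are hypothetical, and the mask version is just a portable branch-free equivalent for comparison.

```cpp
#include <cassert>

// Branchy form: control flow, as the source is written.
int select_branch(bool c, int a, int b) {
    if (c)
        return a;
    return b;
}

// If-converted form: a single select (a COND_EXPR in GIMPLE), which a
// target with a conditional-move pattern may expand without a jump.
int select_cond(bool c, int a, int b) {
    return c ? a : b;
}

// Branch-free arithmetic equivalent: build an all-ones or all-zeros mask
// from c and blend a and b with it.
int select_mask(bool c, int a, int b) {
    unsigned mask = 0u - static_cast<unsigned>(c);  // c ? all ones : 0
    unsigned ua = static_cast<unsigned>(a);
    unsigned ub = static_cast<unsigned>(b);
    return static_cast<int>((ua & mask) | (ub & ~mask));
}
```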

gcc-4.6-20120504 is now available

2012-05-04 Thread gccadmin

Snapshot gcc-4.6-20120504 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.6-20120504/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.6 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_6-branch 
revision 187183

You'll find:

 gcc-4.6-20120504.tar.bz2 Complete GCC

  MD5=ab4fd68586e9809b76307cdda08b5608
  SHA1=4fccc9ca61d0df00f7923691eaf8d4f6d00697ac

Diffs from 4.6-20120427 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.6
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.

small problem with auto (c++0x)

2012-05-04 Thread Amit Markel

Hello, please consider the little program below, which was compiled with GCC 4.7.0.

At the line containing the comment /* auto */, when using auto instead of
vector<int>, the expected result, [3][6][9][12][15], is computed
as [6][12][15][15][15] for some reason. Although the fault is quite
likely mine, I think it'd be interesting either way.

#include <iostream>
#include <thread>
#include <vector>

using namespace std;

#define NUM_OF_THREADS 5

int data[NUM_OF_THREADS] = {0};

void add(int* dest, const vector<int>& arr) {
    *dest = 0;
    for ( const auto& x : arr ) {
        *dest += x;
    }
}

int main(void) {
    vector<thread> threads;
    // data[i] := i + i+1 + i+2
    for ( int i=0; i<NUM_OF_THREADS; ++i ) {
        /*              auto */ vector<int> arr = {i,i+1,i+2};
        threads.push_back( thread{add,data+i,arr} );
    }

    for ( auto& thread : threads ) {
        thread.join();
    }

    for ( const auto& dat : data ) {
        cout << "[" << dat << "]";
    } cout << endl;

    return 0;
}

Amit Markel

Re: small problem with auto (c++0x)

2012-05-04 Thread Jonathan Wakely

On 5 May 2012 00:04, Amit Markel wrote:
> Hello, please consider the little program below which was compiled on GCC 
> 4.7.0.

Your question is inappropriate for this list, which is for discussing
development of GCC, not development with GCC.  Your question would be
appropriate on the gcc-help list; please take any follow-up there, thanks.

> /*              auto */ vector<int> arr = {i,i+1,i+2};

With 'auto' here the type is not vector<int>, it's
std::initializer_list<int>, which is a lightweight utility type that
refers to a temporary array. An initializer_list is intended to be
used only as a temporary or other short-lived object.

In your code the initializer_list gets copied into the std::thread,
but then the temporary array it refers to goes out of scope at the end
of each loop iteration. When the add() function runs, a std::vector is
constructed from the initializer_list, copying the data from an array
that has already gone out of scope.
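A minimal sketch of the deduction described above and of the usual fix, assuming C++11 or later. The helper names are hypothetical, and a deferred std::function stands in for the thread so the example needs no threading library. Each task owns its own std::vector copy of the elements, so it remains valid after the loop variable is gone.

```cpp
#include <cassert>
#include <functional>
#include <initializer_list>
#include <numeric>
#include <type_traits>
#include <vector>

// 'auto x = {...}' deduces std::initializer_list<int>, not std::vector<int>.
// The list merely refers to a backing array whose lifetime ends with 'il'.
void check_auto_deduction() {
    auto il = {1, 2, 3};
    static_assert(std::is_same<decltype(il), std::initializer_list<int>>::value,
                  "auto with = {...} deduces std::initializer_list");
    (void)il;
}

// Hypothetical helper: builds a task that runs later, the way a std::thread
// body would. The lambda captures 'values' by value, so the task owns a
// copy of the elements instead of pointing at the caller's storage.
std::function<int()> make_sum_task(std::vector<int> values) {
    return [values]() { return std::accumulate(values.begin(), values.end(), 0); };
}

// Mirrors the loop in the original program: result[i] = i + (i+1) + (i+2).
std::vector<int> run_sum_tasks(int n) {
    std::vector<std::function<int()>> tasks;
    for (int i = 0; i < n; ++i) {
        std::vector<int> arr = {i, i + 1, i + 2};  // explicit type, not auto
        tasks.push_back(make_sum_task(arr));
    }  // each 'arr' is destroyed here, but every task holds its own copy
    std::vector<int> results;
    for (auto& task : tasks)
        results.push_back(task());
    return results;
}
```

Passing a std::vector<int> to std::thread works the same way: the thread stores its own decayed copy of each argument, so the elements outlive the loop iteration that created them.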
