Re: Paradoxical subreg reload issue
On 03/05/2012 14:14, Aurelien Buhrig wrote:
> On 02/05/2012 21:36, Eric Botcazou wrote:
>>> I have an issue (gcc 4.6.3, private backend) when reloading operands of
>>> this insn:
>>> (set (subreg:SI (reg:QI 21 [ iftmp.1 ]) 0)
>>>      (lshiftrt:SI (reg/v:SI 24 [ w ]) (const_int 31 [0x1f])))
>>>
>>> The register 21 is reloaded into
>>> (reg:QI 0 r0 [orig:21 iftmp.1 ] [21]), which is a HI-wide hw register.
>>> Since it is a BIG_ENDIAN target, the SI subreg regno is then -1.
>>>
>>> Note that word_mode is SImode, whereas the class r0 belongs to is
>>> HI-wide. I don't know if this matters when reloading.
>>>
>>> I have no idea how to debug this, if it is a backend or a reload bug.
>>
>> RA/reload is known to have issues with word-mode paradoxical subregs on
>> big-endian machines. For example, on SPARC 64-bit, we run into similar
>> problems for FP regs, which are 32-bit. Likewise on HP-PA 64-bit I think.
>>
>> So we have kludges in the back-end:
>>
>> /* Defines invalid mode changes. Borrowed from the PA port.
>>
>>    SImode loads to floating-point registers are not zero-extended.
>>    The definition for LOAD_EXTEND_OP specifies that integer loads
>>    narrower than BITS_PER_WORD will be zero-extended. As a result,
>>    we inhibit changes from SImode unless they are to a mode that is
>>    identical in size.
>>
>>    Likewise for SFmode, since word-mode paradoxical subregs are
>>    problematic on big-endian architectures. */
>>
>> #define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)   \
>>   (TARGET_ARCH64                                    \
>>    && GET_MODE_SIZE (FROM) == 4                     \
>>    && GET_MODE_SIZE (TO) != 4                       \
>>    ? reg_classes_intersect_p (CLASS, FP_REGS) : 0)
>
> I modified CANNOT_CHANGE_MODE_CLASS as you suggested. But strange as it
> may seem, it has no effect on such a reload, and I can't find a way to
> make it work...
>
> BTW, has this bug already been filed?
> Do you have an idea how deep is this bug in the reload, how complex it
> is and which part in the reload it is related to?
>
> Thanks,
> Aurélien

It seems the paradoxical subreg information is available from neither
ira_allocno_t nor ira_object_t when the hw reg is chosen. I finally
changed word_mode to the smallest hw reg size (HImode) to work around
this bug.

Aurélien
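To see why the SI subreg's register number comes out as -1, here is a simplified C model of the big-endian placement rule for a paradoxical subreg of a value held in hard registers. It assumes 2-byte hard registers (as for r0 above) and ignores the many special cases handled by GCC's real subreg machinery; the function names are hypothetical, and this is a sketch of the arithmetic, not GCC's implementation.

```c
#include <stdbool.h>

/* Hypothetical helper: hard registers needed for a mode of `bytes` bytes
   when each hard register holds `reg_bytes` bytes. */
static int nregs_for(int bytes, int reg_bytes)
{
    return (bytes + reg_bytes - 1) / reg_bytes;
}

/* Simplified model: first hard register of (subreg:OUTER (reg:INNER r) 0)
   when OUTER is wider than INNER.  On a big-endian target the inner value
   must be the low part of the outer value, i.e. it occupies the *last*
   registers of the outer group, so the group starts before inner_regno. */
static int paradoxical_subreg_regno(int inner_regno, int inner_bytes,
                                    int outer_bytes, int reg_bytes,
                                    bool big_endian)
{
    int inner_n = nregs_for(inner_bytes, reg_bytes);
    int outer_n = nregs_for(outer_bytes, reg_bytes);

    if (!big_endian)
        return inner_regno;              /* low part is the first register */
    return inner_regno - (outer_n - inner_n);
}
```

For QImode (1 byte) in r0 viewed as SImode (4 bytes) with 2-byte registers, the outer value needs two registers and the inner one, so on big-endian the group would have to start at r0 - 1 = -1, which matches the symptom reported above. Making word_mode HImode avoids building such a two-register paradoxical subreg in the first place.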
Re: Paradoxical subreg reload issue
> I modified CANNOT_CHANGE_MODE_CLASS as you suggested. But strange as it
> may seem, it has no effect on such a reload, and I can't find a way to
> make it work...

The macro is mainly used by the RA; it's not clear that reload honours it.

> BTW, has this bug already been filed?

In its general form, I'm not sure. It only occurs for register files that
are somewhat irregular, and on big-endian, so this is very specific.

> Do you have an idea how deep is this bug in the reload, how complex it
> is and which part in the reload it is related to?

It's the RA and reload, at least. I guess this should be reasonably fixable
by specialists, but there probably has been a lack of real incentive to do so.

--
Eric Botcazou
Register constraints + and =
Hi,

I was just trying to understand exactly what the constraint modifiers + and
= mean. I have read the manual but I am uncertain about their meaning in the
context of the following rule (without any modifiers), which expand
generates:

(define_insn_and_split "movmem_long"
  [(set (match_operand:QI 2 "register_operand" "d,c") (const_int 0))
   (set (mem:BLK (match_operand:QI 0 "register_operand" "d,c"))
        (mem:BLK (match_operand:QI 1 "register_operand" "x,c")))
   (set (match_dup 0) (plus:QI (match_dup 0) (match_dup 2)))
   (set (match_dup 1) (plus:QI (match_dup 1) (match_dup 2)))
   (clobber (match_scratch:QI 3 "w,w"))]
  "!TARGET_NO_BLOCK_COPY"
  "#"
  "&& reload_completed"
  [(const_int 0)]
{
  if ((which_alternative == 0 && REGNO (operands[2]) == RAH)
      || which_alternative == 1)
    {
      emit_move_insn (operands[3], operands[0]);
      emit_move_insn (operands[0], operands[2]);
      emit_move_insn (operands[2], operands[3]);
    }
  emit_insn (gen_bc2 ());
  DONE;
})

From what I understand, + is for input/output operands and = for output-only
operands. Since in the above rule (a parallel) all operands are both written
to and read from, does this mean all their constraints should start with +?
Or does this only apply per set within each parallel (which doesn't seem to
make much sense)?

Cheers,
--
PMatos
type argument in FUNCTION_ARG macro
Hi,

I'm working on an architecture where the calling convention depends on the
type of the parameter (i.e. pointers are passed in $C regs and non-pointers
in $R regs). I've implemented this distinction by using the POINTER_TYPE_P()
macro on the 'type' argument of the FUNCTION_ARG macro.

I'm having a problem with this approach with calls to libgcc functions such
as _Unwind_SjLj_Register(struct foo *). As this function is invoked as a
library function, the 'type' argument to the FUNCTION_ARG() macro is NULL.
Thus, the pointer parameter is not passed as a pointer, but the function
body expects a pointer.

Any ideas on how to get around this problem?

Regards,
Selim
Re: clear_cache on Alpha architecture not implemented?
Greetings, and thanks for this very helpful synopsis.

I'm wondering if there is a simple configure-time test to detect when this
has been fixed. If I simply avoided using __builtin___clear_cache on alpha,
ppc, ppc64 and ia64, assuming it is in fact a no-op there, would this
suffice?

Take care,

Richard Henderson writes:

> On 05/03/2012 10:51 AM, Camm Maguire wrote:
>> The goal was to exercise the very helpful gcc __builtin___clear_cache
>> support, and to avoid having to maintain our own assembler for all the
>> different cpus in this regard. Clearly, it is easy to revert this on a
>> per architecture basis if absolutely necessary. If gcc does or does not
>> plan on fixing this, please let me know so gcl can adjust as needed.
>
> While we can probably fix this, you should know that __builtin_clear_cache
> is highly tied to the implementation of trampolines for the target. Thus
> there are at least 3 targets that do not handle this "properly":
>
> For alpha, we emit imb directly during the trampoline_init target hook.
>
> For powerpc32, the libgcc routine __clear_cache is unimplemented, but the
> cache flushing for trampolines is inside the __trampoline_setup routine.
>
> For powerpc64 and ia64, the ABI for function calls allows trampolines to
> be implemented without emitting any insns, and thus the icache need not be
> flushed at all. And thus we never bothered implementing
> __builtin_clear_cache.
>
> So, the fact of the matter is that you can't reliably use this builtin for
> arbitrary targets for any gcc version up to 4.7. Feel free to submit an
> enhancement request via bugzilla so that we can remember to address this
> for gcc 4.8.

--
Camm Maguire            c...@maguirefamily.org
==
"The earth is but one country, and mankind its citizens."  -- Baha'u'llah
Re: Register constraints + and =
"Paulo J. Matos" writes:

> From what I understand + is for input/output operands, = for output
> only operands. Since in the above rule (a parallel) all operands are
> written to and read to, does this mean all their constraints should
> start with +? Or this only applies per set within each parallel (which
> doesn't seem to make much sense)?

I agree that there is something wrong here. I agree that as written the
constraints for operands 0, 1, and 2 should have a '+'.

That said, a '+' constraint is most useful for a pattern that expands into
multiple instructions. I think this would be better written along the
lines of

  (set (match_operand:QI 2 "register_operand" "=d,c") (const_int 0))
  (set (mem:BLK (match_operand:QI 3 "register_operand" "0,0"))
       (mem:BLK (match_operand:QI 4 "register_operand" "1,1")))
  (set (match_operand:QI 0 "register_operand" "=d,c")
       (plus:QI (match_dup 3)
                (match_operand:QI 5 "register_operand" "2,2")))
  (set (match_operand:QI 1 "register_operand" "=x,c")
       (plus:QI (match_dup 4) (match_dup 5)))
  (clobber (match_scratch:QI 6 "=w,w"))

Also it looks like it might be possible to add a third alternative such
that that alternative does not require the match_scratch.

Ian
Re: type argument in FUNCTION_ARG macro
Quoting BELBACHIR Selim:

> Any ideas on how to get around this problem?

You can look at the names of the library functions.
RE: type argument in FUNCTION_ARG macro
Is that the only option? Is there a more general way to do this?

-----Original Message-----
From: amyl...@spamcop.net [mailto:amyl...@spamcop.net]
Sent: Friday, 4 May 2012 15:48
To: BELBACHIR Selim
Cc: gcc@gcc.gnu.org
Subject: Re: type argument in FUNCTION_ARG macro

Quoting BELBACHIR Selim:

> Any ideas on how to get around this problem?

You can look at the name of library functions.
Re: clear_cache on Alpha architecture not implemented?
Greetings!

As a followup, I should note that sh4 and hppa are also broken. I currently
have this in configure.in:

case $use in
  sh4*) ;; #FIXME
  hppa*) ;; #FIXME
  *)
    AC_MSG_CHECKING(__builtin___clear_cache)
    AC_TRY_COMPILE([],
                   [void *v,*ve; __builtin___clear_cache(v,ve); ],
                   [AC_DEFINE(HAVE_BUILTIN_CLEAR_CACHE) AC_MSG_RESULT(yes)],
                   AC_MSG_RESULT(no));;
esac

Perhaps I should just add alpha, powerpc, ppc64, and ia64 to the exception
list and deal with this in the future if memory serves.

Take care,
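A sketch of the same exclusion list done in the C source rather than in configure.in, using GCC's predefined target macros. The targets excluded are the ones named in this thread (alpha, powerpc/ppc64, ia64, sh, hppa); the macro GCL_USE_BUILTIN_CLEAR_CACHE and the function gcl_flush_icache are hypothetical names, and what to do on an excluded target (for example, falling back to hand-written flush code) is left to the caller.

```c
/* Targets reported in this thread where __builtin___clear_cache could not
   be relied on (gcc <= 4.7): alpha, 32-bit powerpc, ppc64, ia64, sh, hppa. */
#if defined(__alpha__) || defined(__powerpc__) || defined(__powerpc64__) \
    || defined(__ia64__) || defined(__sh__) || defined(__hppa__)
#  define GCL_USE_BUILTIN_CLEAR_CACHE 0
#else
#  define GCL_USE_BUILTIN_CLEAR_CACHE 1
#endif

/* Hypothetical wrapper: returns 1 if the builtin was used, 0 if the
   caller must flush the instruction cache some other way. */
static int gcl_flush_icache(char *begin, char *end)
{
#if GCL_USE_BUILTIN_CLEAR_CACHE
    __builtin___clear_cache(begin, end);
    return 1;
#else
    (void) begin;
    (void) end;
    return 0;
#endif
}
```

Keying on compiler-defined macros keeps the decision next to the code that depends on it, but like the configure.in version it only reflects what is known today; it cannot detect a future gcc that fixes these targets.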
Re: type argument in FUNCTION_ARG macro
BELBACHIR Selim writes:

> I'm working on an architecture where the calling convention depends on the
> type of the parameter (i.e. pointers are passed into $C regs and non-pointers
> are passed into $R regs). I've implemented this difference by using the
> POINTER_TYPE_P() macro on the 'type' argument of the FUNCTION_ARG macro.
>
> I'm having a problem with this approach with calls to libgcc function like
> _Unwind_SjLj_Register(struct foo * ). As this function is invoked as a
> library function, the 'type' argument to the FUNCTION_ARG() macro is NULL.
> Thus, the pointer parameter is not passed as pointer but the function body
> expects a pointer.
>
> Any ideas on how to get around this problem?

If possible, avoid using SJLJ exceptions. DWARF exceptions are better.

I see three ways to go.

Change the middle-end to avoid using emit_library_call when calling
_Unwind_SjLj_Register. There is no particular reason for making this a
special library call. But this is probably a bit painful to implement.

Define INIT_CUMULATIVE_LIBCALL_ARGS for your target, and check the
function. If it's _Unwind_SjLj_Register, apply special handling. This
option is nice because you only have to change your backend.

Change the implementation of _Unwind_SjLj_Register to take a parameter of
type uintptr_t, and cast it to struct SjLj_Function_Context *.

Ian
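A minimal sketch of the third option: the callee takes an integer parameter, so a libcall that passes the argument in a non-pointer ($R) register still matches what the callee expects, and the callee casts it back to the context pointer. The struct and function names below are simplified stand-ins invented for illustration, not libgcc's actual declarations.

```c
#include <stdint.h>

/* Stand-in for libgcc's struct SjLj_Function_Context (heavily simplified). */
struct sjlj_context_model {
    struct sjlj_context_model *prev;
    int call_site;
};

static struct sjlj_context_model *fc_head;   /* top of the context chain */

/* Hypothetical replacement for _Unwind_SjLj_Register: the parameter is a
   plain integer, so its register class agrees with what a libcall passes
   when the 'type' seen by FUNCTION_ARG is NULL. */
static void sjlj_register_model(uintptr_t fc_bits)
{
    struct sjlj_context_model *fc = (struct sjlj_context_model *) fc_bits;

    fc->prev = fc_head;   /* push onto the chain */
    fc_head = fc;
}
```

The cost of this approach is a change to libgcc's source, whereas the second option keeps every change inside the backend.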
Re: clear_cache on Alpha architecture not implemented?
Greetings! Last followup:

Nothing here should affect any kfreebsd_amd64 machine, right? My
understanding is that these do not require cache flushing. So the
following failure should be unrelated:

https://buildd.debian.org/status/fetch.php?pkg=acl2&arch=kfreebsd-amd64&ver=4.3-1&stamp=1326315213

mv *saved_acl2.gcl saved_acl2
/usr/bin/make mini-proveall
make[1]: Entering directory `/build/buildd-acl2_4.3-1-kfreebsd-amd64-SScGlk/acl2-4.3'
Aborted
make[1]: *** [mini-proveall] Error 134
make[1]: Leaving directory `/build/buildd-acl2_4.3-1-kfreebsd-amd64-SScGlk/acl2-4.3'
make: *** [debian/mini-proveall.out] Error 2

It has shown up on

 * Host name: fasch.debian.org
 * Architecture: kfreebsd-amd64
 * Distribution: Debian GNU/Linux
 * Sponsor:
   + Greek Research and Technology Network (GRNET) (hosting)
   + Nordic Gaming (hardware)
 * Processor: PowerEdge 1855

and

 * Host name: fano.debian.org
 * Architecture: kfreebsd-amd64
 * Distribution: Debian GNU/kFreeBSD
 * Sponsor:
   + University of British Columbia - Department of Electrical and Computer Engineering (hosting)
   + Hewlett-Packard (hardware)

but is not reproducible on

 * Host name: asdfasdf.debian.net
 * Architecture: kfreebsd-amd64
 * Distribution: Debian GNU/kFreeBSD
 * Sponsor:
   + ETH Zurich - Department of Physics (hosting+hw)
 * Processor: AMD Sempron 3000+ 1600 MHz

Is that right, i.e. is it unrelated?

Thanks so much!
RE: type argument in FUNCTION_ARG macro
OK, thanks. I'll go ahead with plan B (INIT_CUMULATIVE_LIBCALL_ARGS with
special libcall handling).

Selim

-----Original Message-----
From: Ian Lance Taylor [mailto:i...@google.com]
Sent: Friday, 4 May 2012 15:58
To: BELBACHIR Selim
Cc: gcc@gcc.gnu.org
Subject: Re: type argument in FUNCTION_ARG macro

Define INIT_CUMULATIVE_LIBCALL_ARGS for your target, and check the
function. If it's _Unwind_SjLj_Register, apply special handling. This
option is nice because you only have to change your backend.
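A standalone model of plan B: when the type is unavailable (the libcall case), the backend falls back to a list of known libcall names whose first argument is a pointer. In a real port the name would come from the LIBNAME argument of INIT_CUMULATIVE_LIBCALL_ARGS and be recorded in the port's CUMULATIVE_ARGS; every struct, enum and function name below is hypothetical, and listing _Unwind_SjLj_Unregister alongside _Unwind_SjLj_Register is an assumption to check against the port.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum reg_file { REG_R, REG_C };   /* $R (non-pointer) vs $C (pointer) regs */

/* Hypothetical stand-in for the port's CUMULATIVE_ARGS. */
struct cum_args_model {
    bool pointer_first_arg;   /* libcall known to take a pointer first */
    int  argno;               /* index of the next argument */
};

/* Libcalls whose first argument is a pointer (assumed list; extend as needed). */
static const char *const pointer_libcalls[] = {
    "_Unwind_SjLj_Register",
    "_Unwind_SjLj_Unregister",
};

/* Models what INIT_CUMULATIVE_LIBCALL_ARGS would record from LIBNAME. */
static void init_libcall_args_model(struct cum_args_model *cum,
                                    const char *libname)
{
    size_t n = sizeof pointer_libcalls / sizeof pointer_libcalls[0];

    cum->argno = 0;
    cum->pointer_first_arg = false;
    for (size_t i = 0; i < n; i++)
        if (libname != NULL && strcmp(libname, pointer_libcalls[i]) == 0)
            cum->pointer_first_arg = true;
}

/* Models the FUNCTION_ARG decision.  is_pointer_type is -1 when 'type' is
   NULL (libcall), otherwise 0 or 1 from POINTER_TYPE_P. */
static enum reg_file choose_reg_file(const struct cum_args_model *cum,
                                     int is_pointer_type)
{
    if (is_pointer_type >= 0)
        return is_pointer_type ? REG_C : REG_R;
    return (cum->argno == 0 && cum->pointer_first_arg) ? REG_C : REG_R;
}
```

Normal calls still use the type, as before; the name table is consulted only when the type is missing, so it stays small and limited to the libcalls that actually take pointers.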
Re: clear_cache on Alpha architecture not implemented?
On 05/04/12 07:07, Camm Maguire wrote: > Nothing here should affect any kfreebsd_amd64 machine, right? Correct. r~
Re: Register constraints + and =
On 04/05/12 14:44, Ian Lance Taylor wrote:
> I agree that there is something wrong here. I agree that as written
> the constraints for operands 0, 1, and 2 should have a '+'.
>
> That said, a '+' constraint is most useful for a pattern that expands
> into multiple instructions. I think this would be better written along
> the lines of
>
>   (set (match_operand:QI 2 "register_operand" "=d,c") (const_int 0))
>   (set (mem:BLK (match_operand:QI 3 "register_operand" "0,0"))
>        (mem:BLK (match_operand:QI 4 "register_operand" "1,1")))
>   (set (match_operand:QI 0 "register_operand" "=d,c")
>        (plus:QI (match_dup 3)
>                 (match_operand:QI 5 "register_operand" "2,2")))
>   (set (match_operand:QI 1 "register_operand" "=x,c")
>        (plus:QI (match_dup 4) (match_dup 5)))
>   (clobber (match_scratch:QI 6 "=w,w"))
>
> Also it looks like it might be possible to add a third alternative such
> that that alternative does not require the match_scratch.
>
> Ian

Thanks for your suggestion; I use it in my discussion below.
Unfortunately this is in preparation for my block copy instruction bc2.

The bc2 instruction takes the source address in register RXL, the
destination in RAH, and the count in RAL. After bc2, RAL is set to 0, RXL
is RXL + RAL (source address plus count) and RAH is RAH + RAL (destination
address plus count). This is specified as:

(define_insn "bc2"
  [(set (reg:QI RAL) (const_int 0))
   (set (mem:BLK (reg:QI RAH)) (mem:BLK (reg:QI RXL)))
   (set (reg:QI RXL) (plus:QI (reg:QI RXL) (reg:QI RAL)))
   (set (reg:QI RAH) (plus:QI (reg:QI RAH) (reg:QI RAL)))]
  "!TARGET_NO_BLOCK_COPY"
  "bc2")

Unfortunately, due to problems with the GCC47 RA (for the reasons mentioned
in the thread "GCC47 movmem breaks RA, GCC46 RA is fine"), I am having
trouble getting this to work as well as it did with GCC46. In GCC46 I
simply generated, during expand, the correct move insns and then expanded
the bc2 insn (with the hardcoded registers). In GCC47 that breaks the RA,
so I am attempting to get the RA to understand that there is indeed only
one register each of the values can end up in.

So now I have:

(define_expand "movmemqi"
  [(set (match_operand:BLK 0 "memory_operand")    ; destination
        (match_operand:BLK 1 "memory_operand"))   ; source
   (use (match_operand:QI 2 "general_operand"))   ; count
   (match_operand 3 "" "")]
  "!TARGET_NO_BLOCK_COPY && !reload_completed"
{
  xap_expand_movmemqi (operands[0], operands[1], operands[2]);
  DONE;
})

(define_insn_and_split "movmem_long"
  [(set (match_operand:QI 2 "register_operand" "=d") (const_int 0))
   (set (mem:BLK (match_operand:QI 3 "register_operand" "0"))
        (mem:BLK (match_operand:QI 4 "register_operand" "1")))
   (set (match_operand:QI 0 "register_operand" "=d")
        (plus:QI (match_dup 3)
                 (match_operand:QI 5 "register_operand" "2")))
   (set (match_operand:QI 1 "register_operand" "=x")
        (plus:QI (match_dup 4) (match_dup 5)))
   (clobber (match_scratch:QI 6 "=w"))]
  "!TARGET_NO_BLOCK_COPY"
  "#"
  "&& reload_completed"
  [(const_int 0)]
{
  if (REGNO (operands[2]) == RAH)
    {
      emit_move_insn (operands[6], operands[0]);
      emit_move_insn (operands[0], operands[2]);
      emit_move_insn (operands[2], operands[6]);
    }
  emit_insn (gen_bc2 ());
  DONE;
})

xap_expand_movmemqi issues a couple of move insns if they are required and
then calls gen_movmem_long.

I am playing a trick with GCC here... The constraint 'd' applies to
register class DATA_REGS, which contains two registers: RAL and RAH. When
the RA allocates them, one will be assigned to operand 0 and the other to
operand 2. If operand 2 ends up in RAH, then I use the scratch to swap
operand 2 with operand 0. The annoying part is that the RA sometimes ends
up allocating a scratch which I won't use, because it was smart enough to
allocate everything in order (operand 2 in RAL and operand 0 in RAH). As a
side note, constraint 'x' refers to class ADDR_REGS, which only has
register RXL.

This solution, however, still causes RA spills which GCC46 didn't produce,
but it is better than the GCC46 solution of hardcoding the registers
straight from the expand phase.

Another thing I tried, with terrible results (libgcc doesn't even compile
due to register spill errors), is to have one class just for RAL and one
just for RAH and use those constraints, thereby avoiding the scratch
altogether. It turns out GCC doesn't seem to like that.

If you have any further suggestions or ideas that I can try out, please
let me know.

--
PMatos
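A small C model of what the movmem_long split does after reload, assuming the register roles described in Paulo's message (RAL = count, RAH = destination, RXL = source) and a forward byte-by-byte copy for bc2 (the real copy direction and unit size are assumptions). Registers are modelled as array slots; the enum and function names are illustrative only. Note that every SET in the bc2 parallel reads the old register values, so both pointers advance by the original count before RAL becomes 0.

```c
enum { RAL, RAH, RXL, RW, NREGS };   /* RW stands for the 'w' scratch */

/* Model of bc2 as a parallel: compute everything from the old values. */
static void bc2_model(unsigned regs[NREGS], unsigned char *mem)
{
    unsigned count = regs[RAL];

    for (unsigned i = 0; i < count; i++)        /* assumed forward copy */
        mem[regs[RAH] + i] = mem[regs[RXL] + i];
    regs[RXL] += count;   /* source pointer: one past the source block */
    regs[RAH] += count;   /* destination pointer: one past the dest block */
    regs[RAL] = 0;        /* count register cleared */
}

/* Model of the split: dest_reg and count_reg are the DATA_REGS the RA chose
   for operands 0 and 2.  If the count landed in RAH, swap through the
   scratch so that RAL holds the count and RAH the destination. */
static void movmem_long_split_model(unsigned regs[NREGS], unsigned char *mem,
                                    int dest_reg, int count_reg)
{
    if (count_reg == RAH) {
        regs[RW] = regs[dest_reg];          /* scratch = destination     */
        regs[dest_reg] = regs[count_reg];   /* RAL = count               */
        regs[count_reg] = regs[RW];         /* RAH = destination         */
    }
    bc2_model(regs, mem);
}
```

The model makes the cost visible: the scratch is only touched on the swap path, which is why an allocation that already puts the count in RAL leaves the allocated scratch unused.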
Re: Register constraints + and =
On May 4, 2012, at 9:44 AM, Ian Lance Taylor wrote:

> I agree that there is something wrong here. I agree that as written
> the constraints for operands 0, 1, and 2 should have a '+'.

I thought that the "operand" in a mem:BLK is the pointer to the block, not
the block itself. So if the instruction(s) generated don't touch the
pointer -- a likely answer for a block-move instruction -- then the operand
would be read-only. Is that the right interpretation?

What I ended up doing in pdp11.md is to add "clobber" clauses for the
operands, because the generated code is typically a block-copy loop that
steps the pointer registers through the buffer. It appeared to do the
right thing, but I'll admit it was more of a "try this until it works"
type of thing rather than a deep understanding of the precisely correct
interpretation.

paul
Re: Register constraints + and =
writes: > I thought that the "operand" in a mem:BLK is the pointer to the block, > not the block itself. So if the instruction(s) generated don't touch > the pointer -- a likely answer for a block-move instruction -- then > the operand would be read-only. Is that the right interpretation? Yes. But many block move instructions do in fact touch the pointer, in that they update the registers pointing to the starts of the blocks to point to the ends after the instruction completes. Ian
Re: Register constraints + and =
On May 4, 2012, at 11:39 AM, Ian Lance Taylor wrote: > writes: > >> I thought that the "operand" in a mem:BLK is the pointer to the block, >> not the block itself. So if the instruction(s) generated don't touch >> the pointer -- a likely answer for a block-move instruction -- then >> the operand would be read-only. Is that the right interpretation? > > Yes. > > But many block move instructions do in fact touch the pointer, in that > they update the registers pointing to the starts of the blocks to point > to the ends after the instruction completes. I interpreted + to mean that the operand is written with a value known to the compiler, as opposed to clobber which means that the value is not known (or not one that can be described to the compiler). So I take it that for mem:BLK a + operand is interpreted as final value == end of the buffer? Or byte after the buffer? paul
Re: clear_cache on Alpha architecture not implemented?
On 05/04/12 06:39, Camm Maguire wrote:
> I'm wondering if there is a simple configure time test to detect when
> this has been fixed. If I just aborted using __builtin___clear_cache if
> it is in fact a noop on alpha, ppc, ppc64, and ia64, would this suffice?

I can't think of any simple, portable test. The only reliable test would
be to actually attempt to flush a cache, with some detectable way to see
that this didn't happen. This tends to get highly target-specific
quickly...

A pattern that would at least apply to 32-bit insn word RISC might be

  /* MOV_1_V0, MOV_0_V0 and RET are placeholders for the target's
     encodings of "mov 1, v0", "mov 0, v0" and "ret".  */
  int test_routine[2] = { MOV_1_V0, RET };
  #define call_test ((int (*)(void)) test_routine)

  int main ()
  {
    call_test ();          /* make sure the routine is in the icache */
    test_routine[0] = MOV_0_V0;
    __builtin___clear_cache ((char *) test_routine,
                             (char *) (test_routine + 2));
    return call_test ();   /* 0 if the flush worked, 1 if stale */
  }

for target-dependent values of those instructions.

r~
Re: Register constraints + and =
writes:

> On May 4, 2012, at 11:39 AM, Ian Lance Taylor wrote:
>
>> writes:
>>
>>> I thought that the "operand" in a mem:BLK is the pointer to the block,
>>> not the block itself. So if the instruction(s) generated don't touch
>>> the pointer -- a likely answer for a block-move instruction -- then
>>> the operand would be read-only. Is that the right interpretation?
>>
>> Yes.
>>
>> But many block move instructions do in fact touch the pointer, in that
>> they update the registers pointing to the starts of the blocks to point
>> to the ends after the instruction completes.
>
> I interpreted + to mean that the operand is written with a value known to the
> compiler, as opposed to clobber which means that the value is not known (or
> not one that can be described to the compiler). So I take it that for
> mem:BLK a + operand is interpreted as final value == end of the buffer? Or
> byte after the buffer?

Hmmm. I don't really know what you mean, so there is some sort of
communication difficulty.

A '+' in a constraint for an operand means that the operand is both read
and written by the instruction. It's relatively unusual to find such a
constraint in a GCC backend. In a GCC backend, it's more common to write
the instruction as a PARALLEL with one insn that sets the operand, another
insn that uses the operand, and a matching constraint to put both operands
in the same register.

A '+' in a constraint doesn't say anything at all about what value the
register has after the insn. That is expressed only in the RTL.

The place where you often see '+' in a constraint is in an asm
instruction. The asm instruction could also use a matching constraint, but
there is generally less point since the asm instruction can't say anything
about the value the register will have after the asm executes.

Comparing '+' in a constraint and CLOBBER doesn't make sense. The '+'
tells the register allocator something about which registers it may use.
In particular, an operand with a '+' constraint may not be placed in a
register that holds either an input or an output operand. An operand with
an '=' constraint, on the other hand, may be placed in the same register
as an input operand.

Constraints like '+' matter to the register allocator and reload. RTL
constructs like CLOBBER matter to the RTL optimizers. They are different
categories of things.

Ian
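Ian's point about asm statements can be seen with GCC extended asm. Both functions below tell the compiler that x is read and then overwritten in the same register: the first with a '+' operand, the second with an output-only '=' operand plus the matching constraint "0". The empty asm template emits no instruction, so the value comes back unchanged, which keeps this sketch portable across targets; the function names are illustrative.

```c
/* '+': a single operand that is both read and written. */
static int touch_plus(int x)
{
    __asm__ ("" : "+r" (x));
    return x;
}

/* Same effect: '=' output plus matching constraint "0", which forces the
   input into the same register as output operand 0. */
static int touch_matching(int x)
{
    int y;

    __asm__ ("" : "=r" (y) : "0" (x));
    return y;
}
```

As Ian notes, neither form says anything about the value left in the register; in an asm that is entirely up to the instruction text, and in a backend pattern it is expressed by the RTL.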
Re: Register constraints + and =
On May 4, 2012, at 1:52 PM, Ian Lance Taylor wrote:

> Constraints like '+' matter to the register allocator and reload. RTL
> constructs like CLOBBER matter to the RTL optimizers. They are
> different categories of things.

Thanks, that helps.

What I was trying to describe is the handling of a memcpy operation in the
.md file, where the operands are the memory pointers and (in my case) I
want to tell the machinery that the registers it's using to pass in the
addresses no longer have those addresses in them on completion. So I put
in clobbers to say that. What I really wanted to do is express that the
pointer registers, on completion, point just past the buffer, so the
optimizer could take advantage of that, but it wasn't clear how one would
do that.

paul
Re: Register constraints + and =
writes:

> What I was trying to describe is the handling of a memcpy operation in the .md file, where the operands are the memory pointers and (in my case) I want to tell the machinery that the registers it's using to pass in the addresses no longer have those addresses in them on completion. So I put in clobbers to say that. What I really wanted to do is express that the pointer registers, on completion, point just past the buffer, so the optimizer could take advantage of that, but it wasn't clear how one would do that.

The i386 rep_movqi insn is an example:

(define_insn "*rep_movqi"
  [(set (match_operand:P 2 "register_operand" "=c") (const_int 0))
   (set (match_operand:P 0 "register_operand" "=D")
        (plus:P (match_operand:P 3 "register_operand" "0")
                (match_operand:P 5 "register_operand" "2")))
   (set (match_operand:P 1 "register_operand" "=S")
        (plus:P (match_operand:P 4 "register_operand" "1")
                (match_dup 5)))
   (set (mem:BLK (match_dup 3))
        (mem:BLK (match_dup 4)))
   (use (match_dup 5))]
  "!(fixed_regs[CX_REG] || fixed_regs[SI_REG] || fixed_regs[DI_REG])"
  "%^rep{%;} movsb"
  [(set_attr "type" "str")
   (set_attr "prefix_rep" "1")
   (set_attr "memory" "both")
   (set_attr "mode" "QI")])

Note that since this is a define_insn, the four SETs in the pattern are run in parallel. The input operands are 3 (destination pointer), 4 (source pointer), 5 (count). The output operands are 0 (pointer past destination block), 1 (pointer past source block), 2 (set to zero). Matching constraints are used to put each input operand in the same register as the corresponding output operand.

Ian
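To make the parallel semantics concrete, here is a hypothetical C++ model (not GCC code) of the register and memory effects the "*rep_movqi" pattern describes. The struct and function names are invented for illustration, and it assumes non-overlapping buffers so that memcpy can stand in for the forward byte copy that rep movsb performs. Every right-hand side reads the input state, as all SETs in a PARALLEL do, and the final pointer values are exactly what Paul wanted the optimizers to know, which a CLOBBER cannot express.

```cpp
#include <cstddef>
#include <cstring>

// Register state touched by the pattern; field names follow the x86
// registers selected by the constraints "=D", "=S" and "=c".
struct RepMovState {
    unsigned char *di;        // in: operand 3, out: operand 0 (destination)
    const unsigned char *si;  // in: operand 4, out: operand 1 (source)
    std::size_t cx;           // in: operand 5, out: operand 2 (byte count)
};

// Assumes the buffers do not overlap (memcpy requirement).
RepMovState rep_movsb(const RepMovState &in) {
    std::memcpy(in.di, in.si, in.cx);  // (set (mem:BLK 3) (mem:BLK 4))
    RepMovState out;
    out.di = in.di + in.cx;  // op 0 = op 3 + op 5: just past the destination
    out.si = in.si + in.cx;  // op 1 = op 4 + op 5: just past the source
    out.cx = 0;              // op 2 = 0
    return out;
}
```

Because the outputs are described as values computed from the inputs, later passes can reuse the updated pointers instead of treating the registers as garbage.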
Re: Why does lower-subreg mark copied pseudos as "decomposable"?
On 19/04/12 17:36, Andrew Stubbs wrote: On 18/04/12 21:47, Richard Sandiford wrote: I still prefer the idea of disabling in the first pass. It'll need to be tested on something like non-NEON ARM to see whether it makes things worse or better there. (I think size testing would be fine.) I'll have a go, and see what happens. So far I've found that many examples give smaller code with this change, and a few give larger code. However, on average it appears to give better code, size-wise. This is on ARM when NEON is not enabled; when NEON is enabled the results are far better, as expected. I did have a small example that showed much worse register allocation, but I can't reproduce that with the latest trunk. Most of the size reductions can be explained by use of 64-bit loads and stores, rather than pairs of 32-bit accesses. In Thumb mode, one cause of size increases appears to be that the number of instructions is unchanged but 32-bit opcodes are used rather than 16-bit ones; this is unfortunate. Otherwise, it's very difficult to identify where the tiny size increases come from. As an example, I compiled (a slightly old copy of) gcc/expmed.c, which contains a lot of 64-bit operations, and compared the output sizes at -O2. Of 43 functions, 37 showed no change whatsoever, 5 showed a reduction (21 bytes on average), and 1 function showed a 20-byte increase. The upshot is that I'm going to try to produce a proper patch to post. Andrew
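The decomposition under discussion can be pictured at the source level. The sketch below is plain C++, not GCC's RTL, and its names are hypothetical. It shows a 64-bit value handled as two independent 32-bit words, roughly what lower-subreg does to a DImode pseudo on a 32-bit target when every use can be done word by word. Keeping the value whole instead is what leaves room for the 64-bit loads and stores mentioned above.

```cpp
#include <cstdint>

// The two word-sized halves of a 64-bit value (its low and high subregs).
struct Words {
    std::uint32_t lo, hi;
};

Words split(std::uint64_t v) {
    return {static_cast<std::uint32_t>(v), static_cast<std::uint32_t>(v >> 32)};
}

std::uint64_t join(Words w) {
    return (static_cast<std::uint64_t>(w.hi) << 32) | w.lo;
}

// Undecomposed form: one double-word operation on the whole value.
std::uint64_t xor64(std::uint64_t a, std::uint64_t b) { return a ^ b; }

// Decomposed form: two independent word operations, whose halves the
// register allocator may then place in unrelated registers.
Words xor_words(Words a, Words b) { return {a.lo ^ b.lo, a.hi ^ b.hi}; }
```

Both forms compute the same result; the trade-off is purely in how well the halves can be allocated versus how cheaply the whole value can be loaded and stored.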
Re: Why doesn't GCC generate conditional move for COND_EXPR?
On Tue, Oct 25, 2011 at 4:28 AM, Bingfeng Mei wrote: > Thanks, Andrew. I also implemented a quick patch on our port (based on GCC > 4.5). > I noticed it produced better code now for our applications. Maybe eliminating > control flow in earlier stage helps other optimizing passes. Currently, tree > if-conversion pass is not turned on by default (only with tree vectorization > or some other passes). Maybe it is worth to make it default at -O2 (for those > processors support conditional move)? I just committed to the trunk the patch which expands COND_EXPR to a conditional move. I have more patches which do what ifcvt does but in phiopt (which seems better in general, as ifcvt works only over loops). I hope to post those patches in the coming weeks. Thanks, Andrew Pinski > > Cheers, > Bingfeng > >> -----Original Message----- >> From: Andrew Pinski [mailto:pins...@gmail.com] >> Sent: 24 October 2011 17:20 >> To: Richard Guenther >> Cc: Bingfeng Mei; gcc@gcc.gnu.org >> Subject: Re: Why doesn't GCC generate conditional move for COND_EXPR? >> >> On Mon, Oct 24, 2011 at 7:00 AM, Richard Guenther >> wrote: >> > On Mon, Oct 24, 2011 at 2:55 PM, Bingfeng Mei >> wrote: >> >> Hello, >> >> I noticed that COND_EXPR is not expanded to conditional move >> >> as MIN_EXPR/MAX_EXPR are (assuming movmodecc is available). >> >> I wonder why not? >> >> >> >> I have some loop that fails tree vectorization, but still contains >> >> COND_EXPR from tree ifcvt pass. In the end, the generated code >> >> is worse than if I don't turned -ftree-vectorize on. This >> >> is on our private port. >> > >> > Because nobody touched COND_EXPR expansion since ages. >> >> I have a patch which I will be submitting next week or so that does >> this expansion correctly. In fact I have a few patches which improves >> the generation of COND_EXPR in simple cases (in PHI-OPT). >> >> Thanks, >> Andrew Pinski >
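For readers unfamiliar with the transformation, the sketch below shows, in source terms only, the shape of what if-conversion does to a loop body: the if/else diamond becomes a single select, which is roughly what tree ifcvt leaves behind as a COND_EXPR. The function names are hypothetical, and whether the select is finally expanded to a conditional move depends on the compiler version and on the target providing a mov<mode>cc pattern; the C++ source itself guarantees nothing about the generated code. A plain select is used rather than a min/max, since the thread notes MIN_EXPR/MAX_EXPR were already expanded to conditional moves.

```cpp
#include <cstddef>

// Branchy loop body: control flow inside the loop.
int sum_select_branch(const bool *c, const int *x, const int *y, std::size_t n) {
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (c[i])
            sum += x[i];
        else
            sum += y[i];
    }
    return sum;
}

// If-converted shape: the diamond is replaced by one select per iteration,
// a candidate for expansion to a conditional move.
int sum_select_ifcvt(const bool *c, const int *x, const int *y, std::size_t n) {
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += c[i] ? x[i] : y[i];
    return sum;
}
```

The two functions compute the same value; the point of the conversion is that the second form has no branch left inside the loop for a vectorizer or a conditional-move expander to deal with.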
gcc-4.6-20120504 is now available
Snapshot gcc-4.6-20120504 is now available on
ftp://gcc.gnu.org/pub/gcc/snapshots/4.6-20120504/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.6 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_6-branch revision 187183

You'll find:

 gcc-4.6-20120504.tar.bz2    Complete GCC

  MD5=ab4fd68586e9809b76307cdda08b5608
  SHA1=4fccc9ca61d0df00f7923691eaf8d4f6d00697ac

Diffs from 4.6-20120427 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption, the LATEST-4.6
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.
small problem with auto (c++0x)
Hello, please consider the little program below, which was compiled with GCC 4.7.0. On the line containing the comment /* auto */, when using auto instead of vector<int>, the expected result [3][6][9][12][15] is instead computed as [6][12][15][15][15] for some reason. The fault is probably mine, but I think it'd be interesting either way.

#include <iostream>
#include <vector>
#include <thread>
using namespace std;

#define NUM_OF_THREADS 5
int data[NUM_OF_THREADS] = {0};

void add(int* dest, const vector<int>& arr)
{
    *dest = 0;
    for ( const auto& x : arr ) {
        *dest += x;
    }
}

int main(void)
{
    vector<thread> threads;
    // data[i] := i + i+1 + i+2
    for ( int i=0; i<NUM_OF_THREADS; ++i ) {
        /* auto */ vector<int> arr = {i,i+1,i+2};
        threads.push_back( thread{add,data+i,arr} );
    }
    for ( auto& thread : threads ) {
        thread.join();
    }
    for ( const auto& dat : data ) {
        cout << "[" << dat << "]";
    }
    cout << endl;
    return 0;
}

Amit Markel
Re: small problem with auto (c++0x)
On 5 May 2012 00:04, Amit Markel wrote: > Hello, please consider the little program below which was compiled on GCC > 4.7.0. Your question is inappropriate on this list, which is for discussing development of GCC, not with GCC. Your question would be appropriate on the gcc-help list; please take any follow-up there, thanks. > /* auto */ vector<int> arr = {i,i+1,i+2}; With 'auto' here the type is not vector<int>, it's std::initializer_list<int>, which is a lightweight utility type that refers to a temporary array. An initializer_list is intended to be used only as a temporary or other short-lived type. In your code the initializer_list gets copied into the std::thread, but then the temporary array it uses goes out of scope. When the add() function runs, a std::vector<int> is constructed from the initializer_list, copying the data from the invalid array that has already gone out of scope.
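The difference can be checked without threads. In the sketch below, a static_assert confirms that 'auto' deduces std::initializer_list<int> from a braced list. The function run_deferred is a hypothetical stand-in for the threaded program: it queues work now and runs it later, like a std::thread that has not been scheduled yet. Because each task captures a std::vector<int> by value, it owns a copy of its elements, so nothing refers to an array that has already gone away. std::thread likewise copies its arguments, which is why passing a vector<int> (the non-auto line) works.

```cpp
#include <functional>
#include <initializer_list>
#include <type_traits>
#include <vector>

// 'auto' with a braced initializer deduces std::initializer_list<int>.
auto deduced = {1, 2, 3};
static_assert(std::is_same<decltype(deduced), std::initializer_list<int>>::value,
              "auto deduces initializer_list from a braced list");

// Queue one task per element, then run them all after the loop has ended.
std::vector<int> run_deferred(int n) {
    std::vector<int> data(n, 0);
    std::vector<std::function<void()>> tasks;
    for (int i = 0; i < n; ++i) {
        std::vector<int> arr = {i, i + 1, i + 2};  // owns its elements
        // Capture arr by value: the task keeps its own copy of the data.
        tasks.push_back([&data, i, arr] {
            data[i] = 0;
            for (int x : arr) data[i] += x;
        });
    }
    for (auto &task : tasks) task();  // each loop's arr is out of scope here
    return data;
}
```

Keeping the explicit vector<int> (or writing auto arr = vector<int>{i,i+1,i+2};) gives the intended [3][6][9][12][15].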