Re: pa indirect_jump instruction
Trevor Saunders writes:
> On Tue, Jun 30, 2015 at 09:53:31PM +0100, Richard Sandiford wrote:
>> I have a series of patches to convert all non-optab instructions to
>> the target-insns.def interface. config-list.mk showed up one problem
>> though. The pa indirect_jump pattern is:
>>
>> ;;; Hope this is only within a function...
>> (define_insn "indirect_jump"
>>   [(set (pc) (match_operand 0 "register_operand" "r"))]
>>   "GET_MODE (operands[0]) == word_mode"
>>   "bv%* %%r0(%0)"
>>   [(set_attr "type" "branch")
>>    (set_attr "length" "4")])
>>
>> so the C condition depends on operands[], which isn't usually allowed
>> for named patterns. We get away with it at the moment because we only
>> test for the existence of HAVE_indirect_jump, not its value:
>
> yeah, I hit this a while ago and filed bug 66114. It looks like I had
> trouble with fr30 too, is that fixed now?

Hmm, seems not. The fr30 build stopped earlier for me due to a warning turned error. I suppose I should really have fixed all the warnings shown by config-list.mk before doing this stuff...

Thanks,
Richard
Re: pa indirect_jump instruction
On Sun, Jul 05, 2015 at 09:11:23AM +0100, Richard Sandiford wrote:
> Trevor Saunders writes:
> > On Tue, Jun 30, 2015 at 09:53:31PM +0100, Richard Sandiford wrote:
> >> I have a series of patches to convert all non-optab instructions to
> >> the target-insns.def interface. config-list.mk showed up one problem
> >> though. The pa indirect_jump pattern is:
> >>
> >> ;;; Hope this is only within a function...
> >> (define_insn "indirect_jump"
> >>   [(set (pc) (match_operand 0 "register_operand" "r"))]
> >>   "GET_MODE (operands[0]) == word_mode"
> >>   "bv%* %%r0(%0)"
> >>   [(set_attr "type" "branch")
> >>    (set_attr "length" "4")])
> >>
> >> so the C condition depends on operands[], which isn't usually allowed
> >> for named patterns. We get away with it at the moment because we only
> >> test for the existence of HAVE_indirect_jump, not its value:
> >
> > yeah, I hit this a while ago and filed bug 66114. It looks like I had
> > trouble with fr30 too, is that fixed now?
>
> Hmm, seems not. The fr30 build stopped earlier for me due to a warning
> turned error. I suppose I should really have fixed all the warnings shown
> by config-list.mk before doing this stuff...

yeah, that's certainly a problem worth working on, but there's certainly something to be said for not going too far down the yak shaving rabbit hole.

Trev

> Thanks,
> Richard
Allocation of hotness of data structure with respect to the top of stack.
All:

I am wondering whether allocating hot data structures closer to the top of the stack increases the performance of the application. The data structures are classified as hot or cold and sorted in decreasing order of hotness, and the hottest data structures are allocated closest to the top of the stack.

Loads and stores that access these stack-allocated data structures would be faster when the hot data structures sit closer to the top of the stack. Based on this, the code for the loads and stores is generated with the stack offsets that result from allocating in decreasing order of hotness.

Thoughts?

Thanks & Regards
Ajit
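The ordering step described above can be sketched in C as follows. This is only an illustration of the proposal, not anything GCC does today: the `stack_slot` structure, the `hotness` counts (assumed to come from some profile) and the `assign_offsets` helper are all hypothetical names, and alignment and target-specific frame layout are ignored.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical description of one stack-allocated object. */
struct stack_slot {
    const char *name;   /* label, for illustration only */
    unsigned hotness;   /* assumed access count, e.g. from profiling */
    size_t size;        /* size in bytes */
    size_t offset;      /* filled in: distance from the top of the stack */
};

/* qsort comparator: decreasing order of hotness. */
int by_hotness_desc(const void *a, const void *b)
{
    const struct stack_slot *x = a;
    const struct stack_slot *y = b;
    if (x->hotness != y->hotness)
        return (x->hotness < y->hotness) ? 1 : -1;
    return 0;
}

/* Sort the slots hottest-first and hand out consecutive offsets,
   so the hottest object gets offset 0.  Alignment is ignored here. */
void assign_offsets(struct stack_slot *slots, size_t n)
{
    size_t next = 0;
    qsort(slots, n, sizeof slots[0], by_hotness_desc);
    for (size_t i = 0; i < n; i++) {
        slots[i].offset = next;
        next += slots[i].size;
    }
}
```

Whether smaller offsets actually make loads and stores faster depends on the target (for example, on whether short displacement encodings exist), which is the open question in the message.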
Reduction Pattern ( Vectorization or Parallelization)
All:

Scalar and array reduction patterns can be identified when the result of a commutative update is applied to the same scalar or array variable on the LHS with +, *, min or max. Identifying the reduction pattern through such commutative updates helps in vectorization or parallelization.

Consider the following code:

for (j = 0; j <= N; j++)
{
    y = d[j];
    for (i = 0; i < 8; i++)
        X(a[i]) = X(a[i]) + c[i] * y;
}

Fig(1).

In the above code, the updates to X are commutative updates with + with respect to the outer loop, so they can be identified in GCC as a reduction pattern with respect to the outer loop. I am wondering whether this can be identified as a reduction pattern that can be turned into vectorized code, because X is indexed by another array and thus the access to X is not an affine expression.

Can the above code be identified as a reduction pattern and transformed into vectorized or parallelized code?

Thoughts?

Thanks & Regards
Ajit
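For reference, the loop in Fig(1) can be written as compilable C along the following lines. This is a sketch under assumptions: `X(a[i])` is read as the array access `X[a[i]]`, the element types are chosen arbitrarily, and the function name `indirect_reduction` is hypothetical.

```c
/* Loop nest from Fig(1).  d must have n + 1 elements because the
   outer loop runs for j = 0 .. n inclusive; a and c have 8 elements. */
void indirect_reduction(double *X, const int *a, const double *c,
                        const double *d, int n)
{
    for (int j = 0; j <= n; j++) {
        double y = d[j];
        /* X is indexed through a[], so the subscript is not an affine
           function of i.  If a[] contains duplicate values, several
           inner iterations update the same element of X. */
        for (int i = 0; i < 8; i++)
            X[a[i]] = X[a[i]] + c[i] * y;
    }
}
```

The possible duplicates in `a[]` are what make the inner loop awkward to vectorize directly: two lanes may need to update the same element of `X`.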
Re: Live on Exit renaming.
On Sat, Jul 4, 2015 at 3:45 PM, Ajit Kumar Agarwal wrote: > I am not sure why the above optimization is not implemented in GCC. -fsplit-ivs-in-unroller Ciao! Steven
gcc-6-20150705 is now available
Snapshot gcc-6-20150705 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/6-20150705/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 6 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 225437 You'll find: gcc-6-20150705.tar.bz2 Complete GCC MD5=c2ac14a399dc81a20e649d6064d9dd2f SHA1=a22d5325a5c0d615a52b526bdd3141a772c4522c Diffs from 6-20150628 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-6 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
Re: Live on Exit renaming.
On Mon, Jul 6, 2015 at 6:02 AM, Steven Bosscher wrote:
> On Sat, Jul 4, 2015 at 3:45 PM, Ajit Kumar Agarwal wrote:
>> I am not sure why the above optimization is not implemented in GCC.
>
> -fsplit-ivs-in-unroller

And things might have changed. Given that GCC does IVO on GIMPLE and unrolling on RTL, there is an inconsistency between the two optimizers, since IVO takes the register pressure of IVs into consideration and assumes each IV will take a single register. At least for some cases, splitting the live range of IVs results in bad code. See PR29256 for more information. As described in the comment there, I am actually going to do some experiments disabling this transformation to see what happens.

Thanks,
bin

> Ciao!
> Steven
RE: Live on Exit renaming.
-----Original Message-----
From: Bin.Cheng [mailto:amker.ch...@gmail.com]
Sent: Monday, July 06, 2015 7:04 AM
To: Steven Bosscher
Cc: Ajit Kumar Agarwal; l...@redhat.com; Richard Biener; gcc@gcc.gnu.org; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: Live on Exit renaming.

On Mon, Jul 6, 2015 at 6:02 AM, Steven Bosscher wrote:
> On Sat, Jul 4, 2015 at 3:45 PM, Ajit Kumar Agarwal wrote:
>> I am not sure why the above optimization is not implemented in GCC.
>
> -fsplit-ivs-in-unroller

>> And thing might have changed. Given the condition GCC does IVO on gimple,
>> unrolling on RTL, there is inconsistency between the two optimizer since
>> IVO takes register pressure of IVs into consideration and assumes IVs will
>> take single registers. At least for some cases, splitting live range of
>> IVs results in bad code. See PR29256 for more information. As described in
>> the comment, actually I am going to do some experiments disabling such
>> transformation to see what happens.

The above optimization is implemented as a part of the unroller in GIMPLE. There is an unroller pass in RTL which does not have support for this optimization. Shouldn't the -fsplit-ivs-in-unroller optimization be implemented in the RTL unroller pass? I am looking at implementing the -fsplit-ivs-in-unroller optimization in the RTL unroller pass.

Thanks & Regards
Ajit

Thanks,
bin
>
> Ciao!
> Steven
Re: Live on Exit renaming.
On Mon, Jul 6, 2015 at 12:02 PM, Ajit Kumar Agarwal wrote: > > > -Original Message- > From: Bin.Cheng [mailto:amker.ch...@gmail.com] > Sent: Monday, July 06, 2015 7:04 AM > To: Steven Bosscher > Cc: Ajit Kumar Agarwal; l...@redhat.com; Richard Biener; gcc@gcc.gnu.org; > Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala > Subject: Re: Live on Exit renaming. > > On Mon, Jul 6, 2015 at 6:02 AM, Steven Bosscher wrote: >> On Sat, Jul 4, 2015 at 3:45 PM, Ajit Kumar Agarwal wrote: >>> I am not sure why the above optimization is not implemented in GCC. >> >> -fsplit-ivs-in-unroller > >>>And thing might have changed. Given the condition GCC does IVO on gimple, >>>unrolling on RTL, there is inconsistency between the two optimizer since IVO >takes register pressure of IVs into consideration and assumes IVs will >>>take single registers. At least for some cases, splitting live range of IVs >>>results in bad >>code. See PR29256 for more information. As described in >>>the comment, actually I am going to do some experiments disabling such >>>transformation to see >>what happens. > > The above optimization is implemented as a part of unroller in gimple. There > is an unroller pass in rtl which does not have support for this As far as I understand, fsplit-ivs-in-unroller is a transformation in RTL unroller. Thanks, bin > optimization. Shouldn't be the fsplit-ivs-in-unroller optimization > implemented in the unroller pass of rtl. I am looking at the implementation > perspective for implementing the fsplit-ivs-in-unroller optimizations in the > unroller rtl pass. > > Thanks & Regards > Ajit > > Thanks, > bin >> >> Ciao! >> Steven
RE: Live on Exit renaming.
-----Original Message-----
From: Bin.Cheng [mailto:amker.ch...@gmail.com]
Sent: Monday, July 06, 2015 10:26 AM
To: Ajit Kumar Agarwal
Cc: Steven Bosscher; l...@redhat.com; Richard Biener; gcc@gcc.gnu.org; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: Live on Exit renaming.

On Mon, Jul 6, 2015 at 12:02 PM, Ajit Kumar Agarwal wrote:
> -----Original Message-----
> From: Bin.Cheng [mailto:amker.ch...@gmail.com]
> Sent: Monday, July 06, 2015 7:04 AM
> To: Steven Bosscher
> Cc: Ajit Kumar Agarwal; l...@redhat.com; Richard Biener;
> gcc@gcc.gnu.org; Vinod Kathail; Shail Aditya Gupta; Vidhumouli
> Hunsigida; Nagaraju Mekala
> Subject: Re: Live on Exit renaming.
>
> On Mon, Jul 6, 2015 at 6:02 AM, Steven Bosscher wrote:
>> On Sat, Jul 4, 2015 at 3:45 PM, Ajit Kumar Agarwal wrote:
>>> I am not sure why the above optimization is not implemented in GCC.
>>
>> -fsplit-ivs-in-unroller
>
>>> And thing might have changed. Given the condition GCC does IVO on gimple,
>>> unrolling on RTL, there is inconsistency between the two optimizer since
>>> IVO takes register pressure of IVs into consideration and assumes IVs will
>>> take single registers. At least for some cases, splitting live range of
>>> IVs results in bad code. See PR29256 for more information. As described in
>>> the comment, actually I am going to do some experiments disabling such
>>> transformation to see what happens.
>
> The above optimization is implemented as a part of unroller in gimple.
> There is an unroller pass in rtl which does not have support for this

>> As far as I understand, fsplit-ivs-in-unroller is a transformation in RTL
>> unroller.

My mistake. Yes, you are right: -fsplit-ivs-in-unroller is a transformation in the RTL unroller. IVO on GIMPLE doesn't take unrolling into consideration and assumes that a single register is assigned to each IV candidate.

My thinking is that splitting IVs at RTL in the unroller removes the long dependency chains, which lets iterations overlap and helps the register allocator, and there is a chance to move independent code that gets exposed by split-ivs-in-unroller.

You have mentioned that splitting IV candidates results in bad code. I could see only the positive side of this optimization. Could you please elaborate on the negative side of the -fsplit-ivs-in-unroller optimization, since you have mentioned that it results in bad code in some cases?

Thanks & Regards
Ajit

Thanks,
bin
> optimization. Shouldn't be the fsplit-ivs-in-unroller optimization
> implemented in the unroller pass of rtl. I am looking at the implementation
> perspective for implementing the fsplit-ivs-in-unroller optimizations in the
> unroller rtl pass.
>
> Thanks & Regards
> Ajit
>
> Thanks,
> bin
>>
>> Ciao!
>> Steven
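The dependency-chain argument above can be pictured at the source level with the following C sketch. It is only an analogy: the real transformation works on RTL inside the unroller, and the function names `sum_chained` and `sum_split` are hypothetical. Both functions assume that n is a multiple of 4.

```c
#include <stddef.h>

/* Unrolled by 4 with a single induction variable: each increment
   depends on the previous one, forming one long chain per iteration. */
long sum_chained(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    while (i < n) {
        s += a[i]; i++;
        s += a[i]; i++;
        s += a[i]; i++;
        s += a[i]; i++;
    }
    return s;
}

/* Same loop with the induction variable "split": the later values are
   expressed from the value at the start of the iteration, so the copies
   no longer depend on each other. */
long sum_split(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i += 4) {
        size_t i1 = i + 1, i2 = i + 2, i3 = i + 3;
        s += a[i];
        s += a[i1];
        s += a[i2];
        s += a[i3];
    }
    return s;
}
```

The split form has several index values live at once instead of one, which is the register-pressure concern raised earlier in the thread.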
Re: Possible issue with ARC gcc 4.8
On Friday 03 July 2015 07:15 PM, Richard Biener wrote: > On Fri, Jul 3, 2015 at 3:10 PM, Vineet Gupta > wrote: >> Hi, >> >> I have the following test case (reduced from Linux kernel sources) and it >> seems >> gcc is optimizing away the first loop iteration. >> >> arc-linux-gcc -c -O2 star-9000857057.c -fno-branch-count-reg --save-temps >> -mA7 >> >> --->8- >> static inline int __test_bit(unsigned int nr, const volatile unsigned long >> *addr) >> { >> unsigned long mask; >> >> addr += nr >> 5; >> #if 0 >> nr &= 0x1f; >> #endif >> mask = 1UL << nr; >> return ((mask & *addr) != 0); >> } >> >> int foo (int a, unsigned long *p) >> { >> int i; >> for (i = 63; i>=0; i--) >> { >> if (!(__test_bit(i, p))) >>continue; >> a += i; >> } >> return a; >> } >> --->8- >> >> gcc generates following >> >> --->8- >> .global foo >> .type foo, @function >> foo: >> ld_s r2,[r1,4] < dead code >> mov_s r2,63 >> .align 4 >> .L2: >> sub r2,r2,1<-SUB first >> cmp r2,-1 >> jeq.d [blink] >> lsr r3,r2,5 <- BUG: first @mask is (1 << 62) NOT (1 << 63) >> .align 2 >> .L4: >> ld.as r3,[r1,r3] >> bbit0.nd r3,r2,@.L2 >> add_s r0,r0,r2 >> sub r2,r2,1 >> cmp r2,-1 >> bne.d @.L4 >> lsr r3,r2,5 >> j_s [blink] >> .size foo, .-foo >> .ident "GCC: (ARCv2 ISA Linux uClibc toolchain >> arc-2015.06-rc1-21-g21b2c4b83dfa) >> 4.8.4" >> --->8- >> >> For initial 32 loop operations, this test is effectively doing 64 bit >> operation, >> e.g. (1 << 63) in 32 bit regime. Is this supposed to be undefined, truncated >> to >> zero or port specific. >> >> If it is truncate to zero then generated code below is not correct as it >> needs to >> elide not just the first iteration (corresponding to i = 63) but 63..32 >> >> Further ARCompact ISA provides that instructions involving bitpos operands >> BSET, >> BCLR, LSL can any number whatsoever, but core will only use the lower 5 bits >> (so >> clamping the bitpos to 0..31 w/o need for doing that in code. >> >> So is this a gcc bug, or some spec misinterpretation,. 
> It is the C language standard that says that shifts like this invoke
> undefined behavior.

Right, but the compiler is a program nevertheless and it knows what to do when it sees 1 << 62. It's not like there is an uninitialized variable or something which will provide unexpected behaviour.

More importantly, the question is can ports define a specific behaviour for such cases and whether that would be sufficient to guarantee the semantics.

The point being ARC ISA provides a neat feature where core only considers lower 5 bits of bitpos operands. Thus we can make such behaviour not only deterministic in the context of ARC, but also optimal, eliding the need for doing specific masking/clamping to 5 bits.

-Vineet
Re: Live on Exit renaming.
On Mon, Jul 6, 2015 at 1:16 PM, Ajit Kumar Agarwal wrote: > > > -Original Message- > From: Bin.Cheng [mailto:amker.ch...@gmail.com] > Sent: Monday, July 06, 2015 10:26 AM > To: Ajit Kumar Agarwal > Cc: Steven Bosscher; l...@redhat.com; Richard Biener; gcc@gcc.gnu.org; Vinod > Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala > Subject: Re: Live on Exit renaming. > > On Mon, Jul 6, 2015 at 12:02 PM, Ajit Kumar Agarwal > wrote: >> >> >> -Original Message- >> From: Bin.Cheng [mailto:amker.ch...@gmail.com] >> Sent: Monday, July 06, 2015 7:04 AM >> To: Steven Bosscher >> Cc: Ajit Kumar Agarwal; l...@redhat.com; Richard Biener; >> gcc@gcc.gnu.org; Vinod Kathail; Shail Aditya Gupta; Vidhumouli >> Hunsigida; Nagaraju Mekala >> Subject: Re: Live on Exit renaming. >> >> On Mon, Jul 6, 2015 at 6:02 AM, Steven Bosscher >> wrote: >>> On Sat, Jul 4, 2015 at 3:45 PM, Ajit Kumar Agarwal wrote: I am not sure why the above optimization is not implemented in GCC. >>> >>> -fsplit-ivs-in-unroller >> And thing might have changed. Given the condition GCC does IVO on gimple, unrolling on RTL, there is inconsistency between the two optimizer since IVO >>takes register pressure of IVs into consideration and assumes IVs will take single registers. At least for some cases, splitting live range of IVs results in bad >>code. See PR29256 for more information. As described in the comment, actually I am going to do some experiments disabling such transformation to see >>what happens. >> >> The above optimization is implemented as a part of unroller in gimple. >> There is an unroller pass in rtl which does not have support for this >>>As far as I understand, fsplit-ivs-in-unroller is a transformation in RTL >>>unroller. > > My mistake. Yes you are right. The fsplit-ivs-in-unroller is a transformation > in RTL unroller. > IVO on gimple doesn't take unrolling into consideration and assume to assign > single register for IV candidates. 
My thinking is that > Splitting IVs at RTL with the unroller removes the long dependent chains and > thus makes the overlapping iterations and better > Register allocators and there is a chance of movement of independent code > that got exposes with split-ivs-in-unroller. > > You have mentioned that splitting of IV candidate reults in bad code. I > could see only the positive end of this optimizations. > Could you please elaborate on the negative end of the fsplit-ivs-in-unroller > optimizations as you have mentioned that it results > In bad code in some cases. I had pointed to PR29256 in previous message. I also saw such examples in different benchmarks, and the situation is even worse on targets supporting auto-increment addressing mode. Thanks, bin > > Thanks & Regards > Ajit > > Thanks, > bin >> optimization. Shouldn't be the fsplit-ivs-in-unroller optimization >> implemented in the unroller pass of rtl. I am looking at the implementation >> perspective for implementing the fsplit-ivs-in-unroller optimizations in the >> unroller rtl pass. >> >> Thanks & Regards >> Ajit >> >> Thanks, >> bin >>> >>> Ciao! >>> Steven
Re: Possible issue with ARC gcc 4.8
On Mon, 6 Jul 2015, Vineet Gupta wrote:

>> It is the C language standard that says that shifts like this invoke
>> undefined behavior.
>
> Right, but the compiler is a program nevertheless and it knows what to do
> when it sees 1 << 62
> It's not like there is an uninitialized variable or something which will
> provide unexpected behaviour.
> More importantly, the question is can ports define a specific behaviour for
> such cases and whether that would be sufficient to guarantee the semantics.
>
> The point being ARC ISA provides a neat feature where core only considers
> lower 5 bits of bitpos operands. Thus we can make such behaviour not only
> deterministic in the context of ARC, but also optimal, eliding the need for
> doing specific masking/clamping to 5 bits.

IMO, writing a << (b & 31) instead of a << b has only advantages. It documents the behavior you are expecting. It makes the code standard-conformant and portable. And the back-ends can provide patterns for exactly this so they generate a single insn (the same as for a << b).

When I see x << 1024, 0 is the only value that makes sense to me, and I'd much rather get undefined behavior (detected by sanitizers) than silently get 'x' back.

--
Marc Glisse
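As a small sketch of the a << (b & 31) idiom, assuming a 32-bit operand (the helper name `shl_masked` is hypothetical):

```c
#include <stdint.h>

/* Shift left with the count explicitly reduced to 0..31, so the
   expression is well defined in C for every value of b. */
uint32_t shl_masked(uint32_t a, unsigned b)
{
    return a << (b & 31);
}
```

With this form a count of 33 behaves like a count of 1 on every target, rather than depending on what the hardware shifter happens to do with out-of-range counts.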
Re: Possible issue with ARC gcc 4.8
On Mon, Jul 6, 2015 at 7:30 AM, Vineet Gupta wrote: > On Friday 03 July 2015 07:15 PM, Richard Biener wrote: >> On Fri, Jul 3, 2015 at 3:10 PM, Vineet Gupta >> wrote: >>> Hi, >>> >>> I have the following test case (reduced from Linux kernel sources) and it >>> seems >>> gcc is optimizing away the first loop iteration. >>> >>> arc-linux-gcc -c -O2 star-9000857057.c -fno-branch-count-reg --save-temps >>> -mA7 >>> >>> --->8- >>> static inline int __test_bit(unsigned int nr, const volatile unsigned long >>> *addr) >>> { >>> unsigned long mask; >>> >>> addr += nr >> 5; >>> #if 0 >>> nr &= 0x1f; >>> #endif >>> mask = 1UL << nr; >>> return ((mask & *addr) != 0); >>> } >>> >>> int foo (int a, unsigned long *p) >>> { >>> int i; >>> for (i = 63; i>=0; i--) >>> { >>> if (!(__test_bit(i, p))) >>>continue; >>> a += i; >>> } >>> return a; >>> } >>> --->8- >>> >>> gcc generates following >>> >>> --->8- >>> .global foo >>> .type foo, @function >>> foo: >>> ld_s r2,[r1,4] < dead code >>> mov_s r2,63 >>> .align 4 >>> .L2: >>> sub r2,r2,1<-SUB first >>> cmp r2,-1 >>> jeq.d [blink] >>> lsr r3,r2,5 <- BUG: first @mask is (1 << 62) NOT (1 << 63) >>> .align 2 >>> .L4: >>> ld.as r3,[r1,r3] >>> bbit0.nd r3,r2,@.L2 >>> add_s r0,r0,r2 >>> sub r2,r2,1 >>> cmp r2,-1 >>> bne.d @.L4 >>> lsr r3,r2,5 >>> j_s [blink] >>> .size foo, .-foo >>> .ident "GCC: (ARCv2 ISA Linux uClibc toolchain >>> arc-2015.06-rc1-21-g21b2c4b83dfa) >>> 4.8.4" >>> --->8- >>> >>> For initial 32 loop operations, this test is effectively doing 64 bit >>> operation, >>> e.g. (1 << 63) in 32 bit regime. Is this supposed to be undefined, >>> truncated to >>> zero or port specific. 
>>> >>> If it is truncate to zero then generated code below is not correct as it >>> needs to >>> elide not just the first iteration (corresponding to i = 63) but 63..32 >>> >>> Further ARCompact ISA provides that instructions involving bitpos operands >>> BSET, >>> BCLR, LSL can any number whatsoever, but core will only use the lower 5 >>> bits (so >>> clamping the bitpos to 0..31 w/o need for doing that in code. >>> >>> So is this a gcc bug, or some spec misinterpretation,. >> It is the C language standard that says that shifts like this invoke >> undefined behavior. > > Right, but the compiler is a program nevertheless and it knows what to do > when it > sees 1 << 62 > It's not like there is an uninitialized variable or something which will > provide > unexpected behaviour. > More importantly, the question is can ports define a specific behaviour for > such > cases and whether that would be sufficient to guarantee the semantics. > > The point being ARC ISA provides a neat feature where core only considers > lower 5 > bits of bitpos operands. Thus we can make such behaviour not only > deterministic in > the context of ARC, but also optimal, eliding the need for doing specific > masking/clamping to 5 bits. There is SHIFT_COUNT_TRUNCATED which allows you to combine b & 31 with the shift value if you instead write a << (b & 31). Of course a << 63 is still undefined behavior regardless of target behavior. Richard. > -Vineet
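Applied to the test case from this thread, the masked form might look like the sketch below. It assumes that the intent is the variant with `nr &= 0x1f` enabled, and it uses `uint32_t` in place of the 32-bit `unsigned long` of the ARC target so that it behaves the same on a 64-bit host; the names `test_bit_masked` and `foo_masked` are hypothetical.

```c
#include <stdint.h>

/* Variant of __test_bit with the bit position masked to 0..31, so the
   shift count is always in range for a 32-bit word. */
static inline int test_bit_masked(unsigned int nr,
                                  const volatile uint32_t *addr)
{
    uint32_t mask;

    addr += nr >> 5;            /* select the 32-bit word */
    nr &= 0x1f;                 /* keep only the bit index in the word */
    mask = UINT32_C(1) << nr;
    return (mask & *addr) != 0;
}

/* Same loop as foo() in the test case: add i for every set bit i. */
int foo_masked(int a, const uint32_t *p)
{
    int i;
    for (i = 63; i >= 0; i--) {
        if (!test_bit_masked((unsigned int)i, p))
            continue;
        a += i;
    }
    return a;
}
```

Since every shift count is now in range, the loop has no undefined behavior and the compiler has no licence to drop the i = 63 iteration.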