Reloading an auto-increment addresses
Hello all,

I am porting GCC 4.5.1 to a private target. For one particular test, the reload pass is asked to reload the following instruction:

  (insn 45 175 46 11 pr20601-1.c:90 (set (reg/f:PQI 3 g3 [70]) (mem/f:PQI (pre_inc:PQI (reg/f:PQI 1 g1 [orig:55 prephitmp.16 ] [55])) [2 S1 A32])) 9 {movpqi_op} (expr_list:REG_INC (reg/f:PQI 1 g1 [orig:55 prephitmp.16 ] [55]) (nil)))

The address in this instruction is invalid: the base address must always be held in an address register. The instruction gets reloaded as follows:

  (insn 175 43 202 11 pr20601-1.c:90 (set (reg/f:PQI 1 g1 [orig:55 prephitmp.16 ] [55]) (reg/f:PQI 12 as0 [orig:49 e.4 ] [49])) 9 {movpqi_op} (nil))

  (insn 202 175 203 11 pr20601-1.c:90 (set (reg/f:PQI 1 g1 [orig:55 prephitmp.16 ] [55]) (plus:PQI (reg/f:PQI 1 g1 [orig:55 prephitmp.16 ] [55]) (const_int 1 [0x1]))) 14 {addpqi3} (nil))

  (insn 203 202 45 11 pr20601-1.c:90 (set (reg:PQI 28 a0) (reg/f:PQI 1 g1 [orig:55 prephitmp.16 ] [55])) 9 {movpqi_op} (nil))

  (insn 45 203 46 11 pr20601-1.c:90 (set (reg/f:PQI 3 g3 [70]) (mem/f:PQI (reg:PQI 28 a0) [2 S1 A32])) 9 {movpqi_op} (nil))

The problem with this reload is that there is no move operation between GP registers and address registers, so insn 203 is invalid. I catch such moves with secondary reloads, but auto-increment addressing modes are not handled there. If I try to handle them in TARGET_SECONDARY_RELOAD, I get an assertion failure in reload1.c:emit_input_reload_insns() because of the following code:

  /* Auto-increment addresses must be reloaded in a special way.  */
  if (rl->out && ! rl->out_reg)
    {
      /* We are not going to bother supporting the case where a
         incremented register can't be copied directly from
         OLDEQUIV since this seems highly unlikely.  */
      gcc_assert (rl->secondary_in_reload < 0);

How can I overcome this failure? Can someone suggest a solution?

Thanks for the help.

Regards,
Shafi
Re: Reloading an auto-increment addresses
On 11/02/11 09:46, Mohamed Shafi wrote:
> How can i overcome this failure? Can some one suggest a solution?

Have you defined TARGET_LEGITIMATE_ADDRESS_P and also BASE_REG_CLASS correctly for your target?
Re: Reloading an auto-increment addresses
On 11 February 2011 15:28, Paulo J. Matos wrote:
> On 11/02/11 09:46, Mohamed Shafi wrote:
>> How can i overcome this failure? Can some one suggest a solution?
>
> Have you defined TARGET_LEGITIMATE_ADDRESS_P and also BASE_REG_CLASS
> correctly for your target?

Yes, I have. The register allocator is allocating the wrong registers for the base registers. This is probably because address registers cannot be saved and restored directly; a secondary reload is required. There is also the restriction that there is no move operation between address registers, so that needs a secondary reload as well. (I know it's weird.) I am trying to figure out why the register allocator is not assigning a base register. But even then, reload could still be asked to reload an auto-increment address.

Shafi
Pruning for torture tests?
Hi, On Darwin, we have a number of gcc.c-torture fails reported for both ppc and i386 which are bogus (nothing to do with gcc - but simply warning output from a system tool). For dg-based tests these are pruned - I wonder if it would be worth adding a prune capability to the torture suites? opinions? Iain
Volatile memory is not general operand
Hi,

I just noticed something very surprising. There's a clause in general_operand (recog.c):

  if (! volatile_ok && MEM_VOLATILE_P (op))
    return 0;

Oh... so, a MEM_VOLATILE_P is _not_ a general operand? Why? This is also not mentioned in the documentation of general_operand, so it caught me by surprise.

Before gimplification I have:

  INT_PRIORITIESD.10391[iD.10456] ={v} 0;

This is converted into a couple of insns that combine tries to combine into:

  (set (mem/s/v:QImode ...) (const_int 0))

I have a pattern:

  (set (match_operand:QImode "nonimmediate_operand" "") (match_operand:QImode "general_operand" ""))

However, the insn doesn't match because nonimmediate_operand returns false for (mem/s/v ...). Why is this? And what can I do to match this set? I guess I can only create another pattern. I am just amazed that a volatile memory is not a general_operand.

Cheers,
PMatos
Re: Volatile memory is not general operand
> I just noticed something very surprising. There's a clause in
> general_operand (recog.c):
>
>   if (! volatile_ok && MEM_VOLATILE_P (op))
>     return 0;
>
> Oh... so, a MEM_VOLATILE_P is _not_ a general operand? Why? This is also
> not referred to in the documentation of general operand so it kind of
> caught me by surprise.

It's more of the other way around: MEM_VOLATILE_P is a general operand unless explicitly requested via init_recog_no_volatile. Some passes, like combine, don't track the volatileness of operands precisely, so they disable their manipulation altogether to avoid generating wrong code.

-- 
Eric Botcazou
Scheduling automaton question
Suppose I have two insns, one reserving (A|B|C), and the other reserving A. I'm observing that when the first one is scheduled in an otherwise empty state, it reserves the A unit and blocks the second one from being scheduled in the same cycle. This is a problem when there's an anti-dependence of cost 0 between the two instructions. Vlad - two questions. Is this behaviour what you would expect to happen, and how much work do you think would be involved to fix it (i.e. make the first one transition to a state where we can still reserve any two out of the three units)? Bernd
Re: Volatile memory is not general operand
On 11/02/11 12:03, Eric Botcazou wrote:
>>   if (! volatile_ok && MEM_VOLATILE_P (op))
>>     return 0;
>
> It's more of the other way around: MEM_VOLATILE_P is a general operand
> unless explicitly requested via init_recog_no_volatile. Some passes, like
> combine, don't track the volatileness of operands precisely, so they
> disable their manipulation altogether to avoid generating wrong code.

But the piece of code I quoted above is in general_operand. A (mem/v ...) will never be a general_operand, will it? Without that, the insn

  (set (mem/s/v:QImode ...) (const_int 0))

is never matched by

  (set (match_operand:QImode "nonimmediate_operand" "") (match_operand:QImode "general_operand" ""))
Re: Scheduling automaton question
On Fri, 11 Feb 2011, Bernd Schmidt wrote:
> Suppose I have two insns, one reserving (A|B|C), and the other reserving
> A. I'm observing that when the first one is scheduled in an otherwise
> empty state, it reserves the A unit and blocks the second one from being
> scheduled in the same cycle. This is a problem when there's an
> anti-dependence of cost 0 between the two instructions.
>
> Vlad - two questions. Is this behaviour what you would expect to happen,
> and how much work do you think would be involved to fix it (i.e. make
> the first one transition to a state where we can still reserve any two
> out of the three units)?

Could you please clarify a bit: would the modified behavior match what your target CPU does? The current behavior matches CPUs without lookahead in instruction dispatch: the first insn goes to the first matching execution unit (A), the second has to wait.

Alexander
Re: Scheduling automaton question
On 02/11/2011 02:13 PM, Alexander Monakov wrote:
> Could you please clarify a bit: would the modified behavior match what your
> target CPU does? The current behavior matches CPUs without lookahead in
> instruction dispatch: the first insn goes to the first matching execution
> unit (A), the second has to wait.

The CPU I'm working on needs to specify explicitly which unit an insn is using, but to generate optimal code that assignment must be made _after_ scheduling all the insns in a given cycle.

Bernd
Re: Volatile memory is not general operand
Hi,

On Fri, 11 Feb 2011, Paulo J. Matos wrote:
> On 11/02/11 12:03, Eric Botcazou wrote:
> > >   if (! volatile_ok && MEM_VOLATILE_P (op))
> > >     return 0;
> >
> > It's more of the other way around: MEM_VOLATILE_P is a general operand
> > unless explicitly requested via init_recog_no_volatile. Some passes, like
> > combine, don't track the volatileness of operands precisely, so they
> > disable their manipulation altogether to avoid generating wrong code.
>
> But the piece of code I quoted above is in general_operand. A (mem/v ...)
> will never be a general_operand, will it?

The piece of code you quoted is also conditional on volatile_ok. Connect that with what Eric said.

Ciao,
Michael.
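To make the point about volatile_ok concrete, here is a minimal stand-alone sketch of the condition discussed in this thread. It is a toy model, not GCC code: struct toy_operand, toy_volatile_ok and toy_general_operand are hypothetical names, and the real predicate in recog.c checks a good deal more than this one clause.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an RTL operand; not GCC's real data structure. */
struct toy_operand {
    bool is_mem;       /* operand is a memory reference */
    bool is_volatile;  /* the memory reference is marked volatile */
};

/* Models the global flag: volatile memory is accepted while it is true.
   A pass that cannot track volatility safely would clear it. */
static bool toy_volatile_ok = true;

/* Models only the clause quoted in the thread: a volatile MEM is
   rejected only while toy_volatile_ok is false. */
static bool toy_general_operand(const struct toy_operand *op)
{
    assert(op != NULL);
    if (op->is_mem && !toy_volatile_ok && op->is_volatile)
        return false;
    return true;
}
```

With the flag left at its default, a volatile memory operand is accepted; it is rejected only after the flag has been cleared, which is the situation described for combine.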
Re: Scheduling automaton question
Hi,

As far as I can tell, the scheduler does not currently support your needs. I faced a problem similar to yours and solved it by implementing the TARGET_SCHED_DFA_NEW_CYCLE hook. Inside the function that implements this hook, I choose and set the insn reservation that makes it possible to fit as many other insns as possible in the same cycle. During this process I also update the ready list with insns that become ready as a result of scheduling the current insn (as in your example: insns that are anti-dependent on the current insn and can therefore be scheduled in the current cycle). Thus the best insn reservation makes it possible to schedule anti-dependent insns of cost zero in the same cycle by avoiding resource conflicts.

Alex

few changes in the gcc mainline sources

--- On Fri, 2/11/11, Bernd Schmidt wrote:
> From: Bernd Schmidt
> Subject: Scheduling automaton question
> To: "GCC List"
> Cc: "Vladimir N. Makarov"
> Date: Friday, February 11, 2011, 2:33 PM
>
> Suppose I have two insns, one reserving (A|B|C), and the other reserving
> A. I'm observing that when the first one is scheduled in an otherwise
> empty state, it reserves the A unit and blocks the second one from being
> scheduled in the same cycle. This is a problem when there's an
> anti-dependence of cost 0 between the two instructions.
>
> Vlad - two questions. Is this behaviour what you would expect to happen,
> and how much work do you think would be involved to fix it (i.e. make
> the first one transition to a state where we can still reserve any two
> out of the three units)?
>
> Bernd
Re: Volatile memory is not general operand
On 11/02/11 13:56, Michael Matz wrote:
> The piece of code you quoted also is conditional on volatile_ok. Connect
> that with what Eric said.

Thanks Michael, I guess I should sleep before asking anything else. Now I understand what Eric said.
RE: Vector permutation only deals with # of vector elements same as mask?
Thanks. Another question: is there any plan to vectorize loops like the following one?

  for (i=127; i>=0; i--) { x[i] = y[i] + z[i]; }

I found that GCC trunk still cannot handle a negative step for a store. Even if it could, the result would not be efficient, because it would introduce redundant permutations on the load and the store.

Cheers,
Bingfeng

> -----Original Message-----
> From: Ira Rosen [mailto:i...@il.ibm.com]
> Sent: 10 February 2011 17:22
> To: Bingfeng Mei
> Cc: gcc@gcc.gnu.org
> Subject: Re: Vector permutation only deals with # of vector elements
> same as mask?
>
> Hi,
>
> "Bingfeng Mei" wrote on 10/02/2011 05:35:45 PM:
> >
> > Hi,
> > I noticed that vector permutation gets more use in GCC
> > 4.6, which is great. It is used to handle negative step
> > by reversing vector elements now.
> >
> > However, after reading the related code, I understood
> > that it only works when the # of vector elements is
> > the same as that of mask vector in the following code.
> >
> > perm_mask_for_reverse (tree-vect-stmts.c)
> > ...
> >   mask_type = get_vectype_for_scalar_type (mask_element_type);
> >   nunits = TYPE_VECTOR_SUBPARTS (vectype);
> >   if (!mask_type
> >       || TYPE_VECTOR_SUBPARTS (vectype) != TYPE_VECTOR_SUBPARTS (mask_type))
> >     return NULL;
> > ...
> >
> > For PowerPC altivec, the mask_type is V16QI. It means that
> > compiler can only permute V16QI type. But given the capability of
> > altivec vperm instruction, it can permute any 128-bit type
> > (V8HI, V4SI, etc). We just need convert in/out V16QI from
> > given types and a bit more extra work in producing mask.
> >
> > Do I understand correctly or miss something here?
>
> Yes, you are right. The support of reverse access is somewhat limited.
> Please see vect_transform_slp_perm_load() in tree-vect-slp.c for
> example of all type permutation support.
>
> But, anyway, reverse accesses are not supported for altivec's load
> realignment scheme.
>
> Ira
>
> > Thanks,
> > Bingfeng Mei
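The "bit more extra work in producing mask" mentioned in the quoted message can be sketched in plain C. This is a simplified illustration, not vectorizer code: it assumes a 16-byte vector, models the permute as a single-input byte shuffle (the real vperm takes two source vectors), and ignores endianness. The names build_reverse_byte_mask and permute_bytes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES 16

/* Fill mask[] with byte indices that reverse the order of the
   elt_size-byte elements of a 16-byte vector while keeping the bytes
   inside each element in their original order. */
static void build_reverse_byte_mask(size_t elt_size, uint8_t mask[VEC_BYTES])
{
    assert(elt_size > 0 && VEC_BYTES % elt_size == 0);
    size_t nelts = VEC_BYTES / elt_size;
    for (size_t i = 0; i < nelts; i++)
        for (size_t b = 0; b < elt_size; b++)
            mask[i * elt_size + b] =
                (uint8_t)((nelts - 1 - i) * elt_size + b);
}

/* Byte-wise permute: dst[i] = src[mask[i]]. */
static void permute_bytes(const uint8_t src[VEC_BYTES],
                          const uint8_t mask[VEC_BYTES],
                          uint8_t dst[VEC_BYTES])
{
    for (size_t i = 0; i < VEC_BYTES; i++) {
        assert(mask[i] < VEC_BYTES);
        dst[i] = src[mask[i]];
    }
}
```

For a V4SI-like layout (elt_size of 4) the mask starts 12, 13, 14, 15, 8, 9, ..., so a single byte-level mask type is enough to reverse elements of any width that divides 16.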
Re: Vector permutation only deals with # of vector elements same as mask?
On 2/11/2011 7:30 AM, Bingfeng Mei wrote:
> Thanks. Another question. Is there any plan to vectorize the loops like
> the following ones?
>
>   for (i=127; i>=0; i--) { x[i] = y[i] + z[i]; }

When I last tried, the Sun compilers could vectorize such loops efficiently (for fairly short loops), with appropriate data definitions. The Sun compilers didn't peel for alignment, to improve performance on longer loops, as gcc and others do.

For a case with no data overlaps (float * __restrict__ x, y, z, or Fortran), loop reversal can do the job. gcc has some loop reversal machinery, but I haven't seen it used for vectorization. In a simple case like this, some might argue there's no reason to write a backward loop when it could easily be reversed in source code, and compilers have been seen to make mistakes in reversal.

-- 
Tim Prince
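A small sketch of the source-level reversal described above, assuming the three arrays do not overlap (expressed here with C99 restrict). The function names are made up for illustration; the point is only that, without overlap, both loops compute the same result, so the forward form can be used when a compiler handles only a positive step.

```c
#include <assert.h>
#include <stddef.h>

/* Backward form, as in the loop quoted in the thread. */
static void add_backward(float *restrict x, const float *restrict y,
                         const float *restrict z, int n)
{
    assert(n == 0 || (x != NULL && y != NULL && z != NULL));
    for (int i = n - 1; i >= 0; i--)
        x[i] = y[i] + z[i];
}

/* Reversed (forward) form. Valid only because x, y and z do not
   overlap, so the iteration order cannot change any value read. */
static void add_forward(float *restrict x, const float *restrict y,
                        const float *restrict z, int n)
{
    assert(n == 0 || (x != NULL && y != NULL && z != NULL));
    for (int i = 0; i < n; i++)
        x[i] = y[i] + z[i];
}
```

If x were allowed to alias y or z at a different offset, the two orders could give different results, which is why the no-overlap guarantee matters for this transformation.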
Re: Scheduling automaton question
On Friday, 11 February 2011 at 13:33 +0100, Bernd Schmidt wrote:
> Suppose I have two insns, one reserving (A|B|C), and the other reserving
> A. I'm observing that when the first one is scheduled in an otherwise
> empty state, it reserves the A unit and blocks the second one from being
> scheduled in the same cycle. This is a problem when there's an
> anti-dependence of cost 0 between the two instructions.

If you generate an NDFA (using '(automata_option "ndfa")'), it should allow you to schedule both instructions together, as this should try all functional unit alternatives.

Fred
Re: Scheduling automaton question
On 02/11/2011 07:43 PM, Frédéric RISS wrote:
> On Friday, 11 February 2011 at 13:33 +0100, Bernd Schmidt wrote:
>> Suppose I have two insns, one reserving (A|B|C), and the other reserving
>> A. I'm observing that when the first one is scheduled in an otherwise
>> empty state, it reserves the A unit and blocks the second one from being
>> scheduled in the same cycle. This is a problem when there's an
>> anti-dependence of cost 0 between the two instructions.
>
> If you generate a NDFA ( using '(automata_option "ndfa")' ) it should
> allow you to schedule both instructions together as this should try all
> functional unit alternatives.

Ah, that seems to be exactly what I was looking for. Thanks!

I'd expect this won't work too well with define_query_cpu_unit, so I'll need another method to assign units after scheduling.

Bernd
Re: loop hoisting fails
"Paulo J. Matos" writes: > On 10/02/11 16:04, Ian Lance Taylor wrote: >> Bother. I've encountered that problem before and I think I used a >> sledgehammer (a local patch). It's definitely a bug that gcse doesn't >> consider costs. >> > > I think I might try also patching my local gcc. I guess the trick is > to check for the cost of the alternative before making the replacement > in gcse, right? Is it possible to have an idea of how you did it? My case was somewhat different. I think I just patched gcse_constant_p. Ian
Re: GCC 4.6 performance regressions
On Thu, Feb 10, 2011 at 3:13 AM, Jonathan Wakely wrote:
> On 10 February 2011 05:18, Quentin Neill wrote:
>> On Wed, Feb 9, 2011 at 2:42 AM, Jonathan Wakely wrote:
>>> On 9 February 2011 08:34, Sebastian Pop wrote:
>>>> For example x264 defines CFLAGS="-O4 -ffast-math $CFLAGS", and so
>>>> building this benchmark with CFLAGS="-O2" would have no effect.
>>>
>>> Why not?
>>>
>>> Ignoring the fact -O3 is the highest level for GCC, the manual says:
>>> "If you use multiple -O options, with or without level numbers, the
>>> last such option is the one that is effective."
>>> http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
>>>
>>> And CFLAGS="-fno-fast-math -O2" would cancel the effects of -ffast-math too.
>>>
>> Because the makefile can override CFLAGS in the environment (or in a
>> make variable) at any time, and GCC wouldn't even see it.
>
> Yes, I know how make works, but the example Sebastian gave is a case
> where CFLAGS from the command-line or environment will be appended to
> the make variable, allowing the default compiler flags to be
> overridden. Obviously not all makefiles are written that way
> (although most aren't written like your example either) but I was
> referring to a specific case.

Sorry for assuming. I assumed make variables. I looked at

  CFLAGS="-O4 -ffast-math $CFLAGS"

and in my mind I saw

  CFLAGS="-O4 -ffast-math $(CFLAGS)"

(If it's an env var it should be CFLAGS="-O4 -ffast-math $$CFLAGS")

-- 
Quentin
Re: Scheduling automaton question
On 02/11/2011 07:33 AM, Bernd Schmidt wrote:
> Suppose I have two insns, one reserving (A|B|C), and the other reserving
> A. I'm observing that when the first one is scheduled in an otherwise
> empty state, it reserves the A unit and blocks the second one from being
> scheduled in the same cycle. This is a problem when there's an
> anti-dependence of cost 0 between the two instructions.
>
> Vlad - two questions. Is this behaviour what you would expect to happen,
> and how much work do you think would be involved to fix it (i.e. make
> the first one transition to a state where we can still reserve any two
> out of the three units)?

The scheduler can do what you want, but only if there is no dependence between the instructions. The first-cycle multipass insn scheduler will try different orders of the ready insns to choose the best one. But if there is a dependence between the insns, only one insn will be in the ready list. I think it would be a lot of work to implement the same for dependent insns (and it would probably make the scheduler much slower).

Still, there are other solutions for your problem. The first one, proposed by Frederic RISS, is to use an NDFA (*nondeterministic* automaton). Itanium partially uses such automata for an analogous problem.

Another solution (it might not work in some cases) is to use a different order of alternatives (e.g. C|B|A instead of A|B|C). In this case the generated automaton permits issuing the first and then the second insn, because a deterministic automaton tries the first alternative first. So the state without reservations will go to a state with reservation C for the first insn, the state with reservation C will go to a state with reservation B for the first insn, etc.
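The difference between the two suggestions can be illustrated with a toy model of one issue cycle. This is only a sketch under simplifying assumptions, not how genautomata builds or walks its automata: each insn lists its unit alternatives in order, a "deterministic" issue commits to the first free alternative, and a "nondeterministic" issue is modelled as a search over all alternatives. All names (toy_insn, issue_greedy, issue_all_search) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { UNIT_A = 1u << 0, UNIT_B = 1u << 1, UNIT_C = 1u << 2 };

#define MAX_ALTS 4

struct toy_insn {
    size_t n_alts;
    unsigned alts[MAX_ALTS];  /* unit masks, in reservation order */
};

/* Deterministic model: every insn commits to its first alternative that
   does not conflict with the units already busy. Returns how many insns
   were issued in this cycle before one failed to fit. */
static size_t issue_greedy(const struct toy_insn *insns, size_t n)
{
    unsigned busy = 0;
    size_t issued = 0;
    for (size_t i = 0; i < n; i++) {
        bool placed = false;
        assert(insns[i].n_alts <= MAX_ALTS);
        for (size_t a = 0; a < insns[i].n_alts; a++) {
            if ((busy & insns[i].alts[a]) == 0) {
                busy |= insns[i].alts[a];
                placed = true;
                break;
            }
        }
        if (!placed)
            break;  /* in-order issue: stop at the first insn that fails */
        issued++;
    }
    return issued;
}

/* Nondeterministic model: explore every alternative of every insn and
   report whether some assignment fits insns i..n-1 into the cycle. */
static bool issue_all_search(const struct toy_insn *insns, size_t n,
                             size_t i, unsigned busy)
{
    if (i == n)
        return true;
    assert(insns[i].n_alts <= MAX_ALTS);
    for (size_t a = 0; a < insns[i].n_alts; a++) {
        unsigned alt = insns[i].alts[a];
        if ((busy & alt) == 0 && issue_all_search(insns, n, i + 1, busy | alt))
            return true;
    }
    return false;
}
```

With the first insn written as A|B|C and the second as A, the greedy model issues only one insn; reordering the alternatives to C|B|A lets both issue, and the search model fits both regardless of the order, which mirrors the two workarounds described above.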
Re: Target deprecations for 4.6
On Fri, Jan 28, 2011 at 01:11:10AM +, Joseph S. Myers wrote:
> Here is a concrete list I propose for deprecation in 4.6; please send
> any other suggestions...

score-* doesn't have a maintainer and score-elf couldn't build libgcc last I checked (it was also mentioned in your previous message).

crx-*? crx-elf can't build libgcc, and hasn't been able to for a while.

-Nathan
Re: Target deprecations for 4.6
On Fri, 11 Feb 2011, Nathan Froyd wrote:
> On Fri, Jan 28, 2011 at 01:11:10AM +, Joseph S. Myers wrote:
> > Here is a concrete list I propose for deprecation in 4.6; please send
> > any other suggestions...
>
> score-* doesn't have a maintainer and score-elf couldn't build libgcc
> last I checked (it was also mentioned in your previous message).
>
> crx-*? crx-elf can't build libgcc, and hasn't been able to for a while.

Since the main deprecation patches are now in, feel free to send further patches (to config.gcc and the release notes).

I can't quite figure out what the score people are up to, but it doesn't appear to involve a simultaneously maintained set of upstream components that are usable together in their current upstream forms; they got Linux kernel support upstream in 2009 (and don't seem to have maintained it much since then), some time after they got GCC support upstream (and then stopped maintaining it).

There may also be subtargets or target-specific features worth considering for deprecation (e.g. ARM -mwords-little-endian was mentioned in a previous cleanup discussion).

-- 
Joseph S. Myers
jos...@codesourcery.com