On Wed, Oct 02, 2019 at 03:04:35PM -0400, Michael Meissner wrote: > On Tue, Oct 01, 2019 at 06:56:01PM -0500, Segher Boessenkool wrote: > > On Mon, Sep 30, 2019 at 10:12:54AM -0400, Michael Meissner wrote: > > > I needed > > > to add a second memory constraint ('eM') that prevents using a prefixed > > > address. > > > > Do we need both em and eM? Do we ever want to allow prefixed insns but not > > pcrel? Or, alternatively, this only uses eM for some insns where the "p" > > prefix trick won't work (because there are multiple insns in the template); > > we could solve that some other way (by inserting the p's manually for > > example). > > No right now we need one (no prefix) and the other (no pcrel) is desirable, > but > if we can only have one, it will mean extra instructions being generated.
We can have both if we need to, but it should have less confusing names at a minimum. > In the case of no prefix, we need this for stack_protect_testdi and > stack_protect_setdi. There if we have a large stack frame and enable stack > protection, we don't want the register allocator from re-combining the insns > to > an insn with a large offset, which it will do, even though the predicate does > not allow this case. This was discovered when we tried to build glibc, and > one > module (vfwprintf-internal.c) has a large stack frame and is built with > -fstack-protector-all. Yes, but why does it not allow prefixed insns anyway? It does not currently *handle* prefixed insns properly, but that can be fixed. It won't be pretty, but it won't be horrible either. Anyway, we need to be able to handle non-prefixed anyway (for asm, as I mentioned later), so yes we want a constraint for that, and have "m" in inline asm mean that (just like right now it actually means "m but not update form"). > In the case of no pc-rel, this occurs in optimizing vector extracts where the > vector is in memory. In this code, we only have one temporary base register. Why? > In the case where the address is PC-relative, and the element number being > extracted is variable, we would need two temporary base registers, one to hold > the PC-relative address, and the other to hold the offset from the start of > the > vector. So here, we disallow PC-relative addresses, but numeric addresses > that > result in a prefixed instruction are fine, since the code calculates the > offset, adds in the offset, and then does the memory operation. So it should have more scratch registers here? Or, alternatively, we can just disallow all prefixed addressing here? Will that really degrade anything? > If we only had a no prefixed constraint, the code would not combine the vector > extract from memory, and instead, it would load the whole vector into a > register, and then do the appropriate shifts, VSLO, etc. to extract the > element. I imagine the case shows up when you have large stack frames (or > very > large structures). I don't understand. > > But what should inline asm users do wrt prefixed/pcrel memory? Should > > they not use prefixed memory at all? That means for asm we should always > > disallow prefixed for "m". > > Yes, I've been thinking that for some time. But I'm not going to worry about > that until the patches are in. Please do worry about it. > > > 4) In the previous patch, I missed setting the prefixed size and non > > > prefixed > > > size for mov<mode>_ppc64 in the insn. This pattern is used for moving > > > PTImode > > > in GPR registers (on non-VSX systems, it would move TImode also). By the > > > time > > > it gets to final, it will have been split, but it is still useful to get > > > the > > > sizes correct before the mode is split. > > > > So that is a separate patch? Please send it as one, then? > > No, it needs to be part of this patch. It was just missing from the patch I > sent out. This patch does a whole lot of separate things. It needs to be split up, it took ages to review it like this. > > > + /* The LWA instruction uses the DS-form format where the bottom two > > > bits of > > > + the offset must be 0. The prefixed PLWA does not have this > > > + restriction. */ > > > + if (TARGET_PREFIXED_ADDR > > > + && address_is_prefixed (addr, DImode, NON_PREFIXED_DS)) > > > + return true; > > > > Should TARGET_PREFIXED_ADDR be part of address_is_prefixed, instead of > > part of all its callers? > > Just trying to do the test before the call, so the default case (not 'future') > won't have the call/return. Don't micro-optimise like this please, the compiler can do this better than humans can. And humans *can* forget it in places, which gives ICEs. > > What is INSN_FORM_BAD about, in both functions? > > Because I think it is a bad idea to return true in the case where the memory > is > not correct (for example, if somebody created an address with 35-bit offsets, > the address_to_insn_form would return INSN_FORM_BAD, but you would get a false > positive if non_pcrel_memory returned true for it). Add an assert that this doesn't happen, if you are worried that it could. > > > +;; Return 1 if op is either a register operand or a memory operand that > > > does > > > +;; not use a PC-relative address. > > > +(define_predicate "reg_or_non_pcrel_memory" > > > + (match_code "reg,subreg,mem") > > > +{ > > > + if (REG_P (op) || SUBREG_P (op)) > > > + return register_operand (op, mode); > > > + > > > + return non_pcrel_memory (op, mode); > > > +}) > > > > Why do we need this predicate? Should it use register_operand like this, > > or should it use gpc_reg_operand? > > Again, just micro-optimizing. Please don't? Also, that doesn't answer the second question. > > > @@ -6860,6 +6904,12 @@ rs6000_split_vec_extract_var (rtx dest, > > > systems. */ > > > if (MEM_P (src)) > > > { > > > + /* If this is a PC-relative address, we would need another > > > register to > > > + hold the address of the vector along with the variable offset. The > > > + callers should use reg_or_non_pcrel_memory to make sure we don't > > > + get a PC-relative address here. */ > > > > I don't understand this comment, nor the problem. Please expand? > > If you have: > > static vector double v; > > double > get_v_n (size_t n) > { > return vec_extract (v, n); > } > > It calls rs6000_split_vec_extract_var with: > > dest = (mem:V2DF (SYMBOL_REF:DImode "v")) > element = (reg:DI <xxx>) > tmp_gpr = (reg:DI <yyy>) > tmp_altivec = (scratch:V2DF) > > So to turn this into a (MEM:DF ...) to access the element, it needs to > calculate the offset from the beginning of the vector into tmp_gpr. But since > the hardware doesn't allow indexed PC-relative instructions, we have to load > the address into a register to do an indexed load. > > Because that combination is not allowed, the compiler then loads up the > address > into a temporary register, and calls rs6000_split_vec_extract_var with: > (MEM:V2DF (reg:DI <zzz>)) > > I.e. > > rldicl 3,3,0,63 > pla 9,.LANCHOR0@pcrel > sldi 10,3,3 > lfdx 1,9,10 Where is that rldicl generated? You can solve this problem there (already shift it there, saving an instruction as well!) Or you can do sldi 3,3,3 lfdx 1,9,3 srdi 3,3,3 (and that last insn will be deleted later, if r3 isn't used after this). > > > -{ rs6000_split_multireg_move (operands[0], operands[1]); DONE; } > > > - [(set_attr "length" "8,8,8,12,12,8")]) > > > +{ > > > + rs6000_split_multireg_move (operands[0], operands[1]); > > > + DONE; > > > +} > > > + [(set_attr "non_prefixed_length" "8") > > > + (set_attr "prefixed_length" "20")]) > > > > But this one I don't get... Various alternatives were 12 before, what > > changed? Or was it wrong? > > As far as I can tell it was wrong. But I can add 4 to the values if you think > it might be needed. Please check if it was correct, and send a patch fixing it *before* this one if not. > > > @@ -8984,7 +8989,8 @@ (define_insn "*mov<mode>_ppc64" > > > return rs6000_output_move_128bit (operands); > > > } > > > [(set_attr "type" "store,store,load,load,*,*") > > > - (set_attr "length" "8")]) > > > + (set_attr "non_prefixed_length" "8,8,8,8,8,40") > > > + (set_attr "prefixed_length" "20,20,20,20,8,40")]) > > > > What about the 40 here? > > Dunno. The original code just used 8, which is wrong. Same as above then, please: separate patch. Segher