On 18/04/15 19:17, Maxim Kuvyrkov wrote: >> On Apr 18, 2015, at 8:21 PM, Richard Earnshaw >> <richard.earns...@foss.arm.com> wrote: >> >> On 18/04/15 16:13, Jakub Jelinek wrote: >>> On Sat, Apr 18, 2015 at 03:07:16PM +0100, Richard Earnshaw wrote: >>>> You need to ensure that your scratch register cannot overlap op1, since >>>> the scratch is written before op1 is read. >>> >>> - (clobber (match_scratch:QI 3 "=X,w,X"))] >>> + (clobber (match_scratch:QI 3 "=X,&w,X"))] >>> >>> incremental diff should ensure that, right? >>> >>> Jakub >>> >> >> >> Sorry, where in the patch is that hunk? >> >> I see just: >> >> + (clobber (match_scratch:QI 3 "=X,w,X"))] > > Jakub's suggestion is an incremental patch on top of Kugan's. >
Ah, sorry, I though he was implying it was already in the patch somewhere. >> >> And why would early clobbering the scratch be notably better than the >> original? >> > > It will still be better. With this patch we want to allow RA freedom to > optimally handle both of the following cases: > > 1. operand[1] dies after the instruction. In this case we want operand[0] > and operand[1] to be assigned to the same reg, and operand[3] to be assigned > to a different register to provide a temporary. In this case we don't care > whether operand[3] is early-clobber or not. This case is not optimally > handled with current insn patterns. > > 2. operand[1] lives on after the instruction. In this case we want > operand[0] and operand[3] to be assigned to the same reg, and not clobber > operand[1]. By marking operand[3] early-clobber we ensure that operand[1] is > in a different register from what operand[0] and operand[3] were assigned to. > This case should be handled equally well before and after the patch. > > My understanding is that Kugan's patch with Jakub's fix on top satisfy both > of these cases. > I still don't think it handles all cases efficiently. If we really want the result in a different register from both of the inputs, then now we need two registers for the results, one for the result and another for the temporary. In that case we could have used the result register as the scratch, but now we can't. Maybe we can provide two alternatives, one that early-clobbers the result register but doesn't need a scratch and one that doesn't early-clobber the result, but does need a scratch. So something like (define_insn "aarch64_lshr_sisd_or_int_<mode>3" [(set (match_operand:GPI 0 "register_operand" "=w,&w,w,r") (lshiftrt:GPI (match_operand:GPI 1 "register_operand" "w,w,w,r") (match_operand:QI 2 "aarch64_reg_or_shift_imm_<mode>" "Us<cmode>,w,w,rUs<cmode>"))) (clobber (match_scratch:QI 3 "=X,X,w,X"))] ... but I haven't tested any of that. I would also note the conversation in https://gcc.gnu.org/ml/gcc/2015-04/msg00240.html. That seems to suggest we should be wary of using scratch sequences since the register allocator doesn't account for them properly. R. > -- > Maxim Kuvyrkov > www.linaro.org >