On Mon, Nov 18, 2024 at 09:32:29AM +0100, Andreas Krebbel wrote: > Hi Stefan, > > > On 11/12/24 10:35, Stefan Schulze Frielinghaus wrote: > > > > + rtx cond = gen_rtx_LTU (<MODE>mode, gen_rtx_REG (CCL1mode, > > > > CC_REGNUM), const0_rtx); > > > > + if (operands[4] == const0_rtx) > > > > + emit_insn (gen_add<mode>3_carry1_cc (operands[0], operands[2], > > > > operands[3])); > > > > + else > > > If we would just generate the alc pattern with a carry in of 0, wouldn't > > > the > > > other optimizers be able to figure out that the a normal add would do > > > here? > > > This path does not seem to get exercised by your testcases. > > The zero carry in can occur due to > > https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=a7aec76a74dd38524be325343158d3049b6ab3ac > > which is why we have a special case just in order to emit an add with > > carry out only. > > Ok. I was hoping that this downgrade from add with carry in/out to add with > carry out would happen somewhat "automatically", if the the carry in happens > to be a constant zero. But probably not. > > Your testcases invokes the pattern also with a constant 0 as carry in, but > since you prevent inlining the pattern is never expanded with a const0_rtx. > The testcase in the commit above is x86 specific, so it might make sense to > add an invocation which triggers the code path explicitly. Just to be sure.
Ok, just for the curiosity, my take is that the middle end only emits an u{add,sub}c optab with a constant zero if a pattern is detected as it would occur e.g. for bitints, i.e., where multiple limbs are about to be handled and the first one has a zero carry in. Thus, just invoking the test case with a zero results in an add-overflow only as expected. Anyhow, I have added tests u{add,sub}c-3.c in order to test this path, too. > > > > create a CCU mode comparison result, which is currently consumed as CCL1 > > > by > > > the ALC pattern. This seems to be inconsistent. s390_emit_compare returns > > > the condition, which I think needs to be fed into the alc_carry1_cc > > > pattern > > > as input condition. > > > > > > Also is LTU really the correct code here? CCU + LTU would expect CC1, but > > > you want CC2 or CC3 for the carry in, so GTU should be the better choice. > > > s390_alc_comparison should make sure that only valid combinations are > > > accepted, CCU + GTU would be one of them. > > I was coming up with my own condition since conditions created by > > s390_emit_compare() are of void mode which is why the alc predicates > > s390_alc_comparison() failed since these require GPR mode. I've fixed > > that by using PUT_MODE_RAW on those. I think we could also remove the > > mode from the match_operand which might be more appropriate. I've done > > it the latter way for sub<mode>3_slb_borrow1_cc. Once we have settled > > for one or the other version I will align uaddc/usubc. > > The PLUS/MINUS arithmetic operations require both operands to have a proper > integer mode. So I think dropping the mode from match_operand would be > wrong. On the other hand, an IF_THEN_ELSE always requires the comparison to > have VOIDmode, that's what s390_emit_compare is supposed to be used for. So > taking what s390_emit_compare generates and changing the mode is the right > thing here. Since PUT_MODE is always a bit ugly, we might also want go with > extending s390_emit_compare with a target mode operand defaulting to > VOIDmode instead. Ok, I have added a mode operand to s390_emit_compare() and changed everything accordingly. > > > > > +/* { dg-do run } */ > > > > +/* { dg-options "-O2 -mzarch -save-temps -fdump-tree-optimized" } */ > > > > +/* { dg-final { scan-tree-dump-times "\\.UADDC \\(" 2 "optimized" { > > > > target lp64 } } } */ > > > > +/* { dg-final { scan-tree-dump-times "\\.UADDC \\(" 1 "optimized" { > > > > target { ! lp64 } } } } */ > > > > +/* { dg-final { scan-assembler-times "\\talcr\\t" 1 } } */ > > > > +/* { dg-final { scan-assembler-times "\\talcgr\\t" 1 { target lp64 } } > > > > } */ > > > Your checks seem to rely on the testcase being compiled with at least > > > -march=z13. > > I think for the run test you would have to make sure that the test is not > executed on machines older than z13 then. > > /* { dg-do run { target { s390_useable_hw } } } */ Didn't know about this neat predicate. Will probably use it more often in the future :) I will send a v2 shortly. Cheers, Stefan