On Sun, Jun 4, 2023 at 12:45 AM Roger Sayle <[email protected]> wrote:
>
>
> This patch is the latest revision of my patch to add support for the
> STC (set carry flag), CLC (clear carry flag) and CMC (complement
> carry flag) instructions to the i386 backend, incorporating Uros'
> previous feedback. The significant changes are (i) the inclusion
> of CMC, (ii) the use of UNSPECs for these patterns, and (iii) the
> use of a new X86_TUNE_SLOW_STC tuning flag to select alternate
> implementations on pentium4 (which has a notoriously slow STC)
> when not optimizing for size.
>
> An example of the use of the stc instruction is:
> unsigned int foo (unsigned int a, unsigned int b, unsigned int *c)
> {
>   return __builtin_ia32_addcarryx_u32 (1, a, b, c);
> }
>
> which previously generated:
>         movl    $1, %eax
>         addb    $-1, %al
>         adcl    %esi, %edi
>         setc    %al
>         movl    %edi, (%rdx)
>         movzbl  %al, %eax
>         ret
>
> with this patch now generates:
>         stc
>         adcl    %esi, %edi
>         setc    %al
>         movl    %edi, (%rdx)
>         movzbl  %al, %eax
>         ret
>
> An example of the use of the cmc instruction (where the carry from
> a first adc is inverted/complemented as input to a second adc) is:
> unsigned int o1, o2;
>
> unsigned int bar (unsigned int a, unsigned int b,
>                   unsigned int c, unsigned int d)
> {
>   unsigned int c1 = __builtin_ia32_addcarryx_u32 (1, a, b, &o1);
>   return __builtin_ia32_addcarryx_u32 (c1 ^ 1, c, d, &o2);
> }
>
> which previously generated:
>         movl    $1, %eax
>         addb    $-1, %al
>         adcl    %esi, %edi
>         setnc   %al
>         movl    %edi, o1(%rip)
>         addb    $-1, %al
>         adcl    %ecx, %edx
>         setc    %al
>         movl    %edx, o2(%rip)
>         movzbl  %al, %eax
>         ret
>
> and now generates:
>         stc
>         adcl    %esi, %edi
>         cmc
>         movl    %edi, o1(%rip)
>         adcl    %ecx, %edx
>         setc    %al
>         movl    %edx, o2(%rip)
>         movzbl  %al, %eax
>         ret
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures. Ok for mainline?
>
>
> 2023-06-03 Roger Sayle <[email protected]>
>
> gcc/ChangeLog
> * config/i386/i386-expand.cc (ix86_expand_builtin) <handlecarry>:
> Use new x86_stc or negqi_ccc_1 instructions to set the carry flag.
> * config/i386/i386.h (TARGET_SLOW_STC): New define.
> * config/i386/i386.md (UNSPEC_CLC): New UNSPEC for clc.
> (UNSPEC_STC): New UNSPEC for stc.
> (UNSPEC_CMC): New UNSPEC for cmc.
> (*x86_clc): New define_insn.
> (*x86_clc_xor): New define_insn for pentium4 without -Os.
> (x86_stc): New define_insn.
> (define_split): Convert x86_stc into alternate implementation
> on pentium4.
> (x86_cmc): New define_insn.
> (*x86_cmc_1): New define_insn_and_split to recognize cmc pattern.
> (*setcc_qi_negqi_ccc_1_<mode>): New define_insn_and_split to
> recognize (and eliminate) the carry flag being copied to itself.
> (*setcc_qi_negqi_ccc_2_<mode>): Likewise.
> (neg<mode>_ccc_1): Renamed from *neg<mode>_ccc_1 for gen function.
> * config/i386/x86-tune.def (X86_TUNE_SLOW_STC): New tuning flag.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/cmc-1.c: New test case.
> * gcc.target/i386/stc-1.c: Likewise.

+;; Clear carry flag.
+(define_insn "*x86_clc"
+ [(set (reg:CCC FLAGS_REG) (unspec:CCC [(const_int 0)] UNSPEC_CLC))]
+ "!TARGET_SLOW_STC || optimize_function_for_size_p (cfun)"
+ "clc"
+ [(set_attr "length" "1")
+ (set_attr "length_immediate" "0")
+ (set_attr "modrm" "0")])
+
+(define_insn "*x86_clc_xor"
+ [(set (reg:CCC FLAGS_REG) (unspec:CCC [(const_int 0)] UNSPEC_CLC))
+ (clobber (match_scratch:SI 0 "=r"))]
+ "TARGET_SLOW_STC && !optimize_function_for_size_p (cfun)"
+ "xor{l}\t%0, %0"
+ [(set_attr "type" "alu1")
+ (set_attr "mode" "SI")
+ (set_attr "length_immediate" "0")])

I think the above would be better implemented as a peephole2 pattern
that triggers only when a register is available. We should not waste
a register on register-starved x86_32 just to clear the carry flag.
This is implemented with:

[(match_scratch:SI 2 "r")

at the beginning of the peephole2 pattern that generates x86_clc_xor.
The pattern should be constrained with "TARGET_SLOW_STC &&
!optimize_function_for_size_p (cfun)", and x86_clc_xor should be
available only after reload (like e.g. "*mov<mode>_xor").
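
Something along these lines (an untested sketch; operand number 0
rather than 2 here, since the matched clc pattern has no other
operands, and it assumes the plain *x86_clc insn stays recognizable
through reload):

(define_peephole2
  [(match_scratch:SI 0 "r")
   (set (reg:CCC FLAGS_REG) (unspec:CCC [(const_int 0)] UNSPEC_CLC))]
  "TARGET_SLOW_STC && !optimize_function_for_size_p (cfun)"
  ;; A scratch register is available, so clear the carry flag with
  ;; a register-clobbering xor form instead of the slow clc.
  [(parallel
     [(set (reg:CCC FLAGS_REG) (unspec:CCC [(const_int 0)] UNSPEC_CLC))
      (clobber (match_dup 0))])])
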
+;; On Pentium 4, set the carry flag using mov $1,%al;neg %al.
+(define_split
+ [(set (reg:CCC FLAGS_REG) (unspec:CCC [(const_int 0)] UNSPEC_STC))]
+ "TARGET_SLOW_STC
+ && !optimize_insn_for_size_p ()
+ && can_create_pseudo_p ()"
+ [(set (match_dup 0) (const_int 1))
+ (parallel
+ [(set (reg:CCC FLAGS_REG)
+ (unspec:CCC [(match_dup 0) (const_int 0)] UNSPEC_CC_NE))
+ (set (match_dup 0) (neg:QI (match_dup 0)))])]
+ "operands[0] = gen_reg_rtx (QImode);")

Same here: this should be a peephole2 that triggers only when a
register is available, again constrained with "TARGET_SLOW_STC &&
!optimize_function_for_size_p (cfun)".
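
Roughly (again untested; this just moves the split's replacement body
into a peephole2 that asks for the scratch register itself instead of
calling gen_reg_rtx):

(define_peephole2
  [(match_scratch:QI 0 "r")
   (set (reg:CCC FLAGS_REG) (unspec:CCC [(const_int 0)] UNSPEC_STC))]
  "TARGET_SLOW_STC && !optimize_function_for_size_p (cfun)"
  ;; Set the carry flag via mov $1,%reg; neg %reg when a QImode
  ;; scratch register is available.
  [(set (match_dup 0) (const_int 1))
   (parallel
     [(set (reg:CCC FLAGS_REG)
           (unspec:CCC [(match_dup 0) (const_int 0)] UNSPEC_CC_NE))
      (set (match_dup 0) (neg:QI (match_dup 0)))])])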

Is the new sequence "mov $1,%al; neg %al" any better than the
existing "mov $1,%al; addb $-1,%al", or is there another reason
for the changed sequence?

+(define_insn_and_split "*x86_cmc_1"
+ [(set (reg:CCC FLAGS_REG)
+ (unspec:CCC [(geu:QI (reg:CCC FLAGS_REG) (const_int 0))
+ (const_int 0)] UNSPEC_CC_NE))]
+ "!TARGET_SLOW_STC || optimize_function_for_size_p (cfun)"
+ "#"
+ "&& 1"
+ [(set (reg:CCC FLAGS_REG) (unspec:CCC [(reg:CCC FLAGS_REG)] UNSPEC_CMC))])

The above should be the RTL model of x86_cmc itself; there is no need
to split it to an unspec.
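
Something like this (a sketch only, with the attributes guessed from
the neighbouring flag-setting patterns):

(define_insn "x86_cmc"
  [(set (reg:CCC FLAGS_REG)
        (unspec:CCC [(geu:QI (reg:CCC FLAGS_REG) (const_int 0))
                     (const_int 0)] UNSPEC_CC_NE))]
  ""
  ;; Complement the carry flag directly; the pattern above is then
  ;; the insn itself rather than something split into UNSPEC_CMC.
  "cmc"
  [(set_attr "length" "1")
   (set_attr "length_immediate" "0")
   (set_attr "use_carry" "1")
   (set_attr "modrm" "0")])
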
Uros.