On Wed, May 18, 2022 at 7:30 PM Roger Sayle <ro...@nextmovesoftware.com> wrote: > > > Hi Uros, > Very many thanks for the speedy review and approval of my ix86_rtx_costs > patch to correct the cost of multi-word multiplications. In the same > spirit, this patch tidies up a few additional nits I noticed while there: > Multi-word NOT requires two operations, but multi-word NEG requires > three operations. Using SSE, vector NOT requires a pxor with -1, but > AND of NOT is cheap thanks to the existence of pandn. There's also some > legacy (aka incorrect) logic explicitly testing for DImode [independently > of TARGET_64BIT] in determining the cost of logic operations that's not > required. > > There should be no behavioural changes from this patch (as confirmed by > testing) but hopefully vectorization and other middle-end passes can now > rely on more sensible "cost" estimates. > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32}, with > no new failures. Ok for mainline? > > > 2022-05-18 Roger Sayle <ro...@nextmovesoftware.com> > > gcc/ChangeLog > * config/i386/i386.cc (ix86_rtx_costs): Split AND from XOR/IOR. > Multi-word binary logic operations require two instructions. > For vector integer modes, AND with a NOT operand requires only > a single instruction (pandn). > [NOT]: Vector integer NOT requires more than 1 instruction (pxor). > [NEG]: Multi-word negation requires 3 instructions.
- case AND: case IOR: case XOR: if (GET_MODE_CLASS (mode) == MODE_INT && GET_MODE_SIZE (mode) > UNITS_PER_WORD) { - *total = (cost->add * 2 - + (rtx_cost (XEXP (x, 0), mode, outer_code, opno, speed) - << (GET_MODE (XEXP (x, 0)) != DImode)) - + (rtx_cost (XEXP (x, 1), mode, outer_code, opno, speed) - << (GET_MODE (XEXP (x, 1)) != DImode))); + *total = cost->add * 2 + + rtx_cost (XEXP (x, 0), mode, outer_code, opno, speed) + + rtx_cost (XEXP (x, 1), mode, outer_code, opno, speed); + return true; + } + else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT) + *total = ix86_vec_cost (mode, cost->sse_op); + else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD) + *total = cost->add * 2; + else + *total = cost->add; + return false; Shouldn't this be just: if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT) *total = ix86_vec_cost (mode, cost->sse_op); else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD) *total = cost->add * 2; else *total = cost->add; return false; When returning false, subexpressions will be scanned automatically. + else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT) + { + /* pandn is a single instruction. */ + if (GET_CODE (XEXP (x, 0)) == NOT) pandn is also a single instruction with BMI. IMO, this should be also reflected in RTX costs. Uros.