Christophe Lyon <christophe.l...@linaro.org> writes: > On Fri, 29 Jun 2018 at 13:36, Richard Sandiford > <richard.sandif...@arm.com> wrote: >> >> Richard Sandiford <richard.sandif...@arm.com> writes: >> > This patch is the main part of PR85694. The aim is to recognise at least: >> > >> > signed char *a, *b, *c; >> > ... >> > for (int i = 0; i < 2048; i++) >> > c[i] = (a[i] + b[i]) >> 1; >> > >> > as an over-widening pattern, since the addition and shift can be done >> > on shorts rather than ints. However, it ended up being a lot more >> > general than that. >> > >> > The current over-widening pattern detection is limited to a few simple >> > cases: logical ops with immediate second operands, and shifts by a >> > constant. These cases are enough for common pixel-format conversion >> > and can be detected in a peephole way. >> > >> > The loop above requires two generalisations of the current code: support >> > for addition as well as logical ops, and support for non-constant second >> > operands. These are harder to detect in the same peephole way, so the >> > patch tries to take a more global approach. >> > >> > The idea is to get information about the minimum operation width >> > in two ways: >> > >> > (1) by using the range information attached to the SSA_NAMEs >> > (effectively a forward walk, since the range info is >> > context-independent). >> > >> > (2) by back-propagating the number of output bits required by >> > users of the result. >> > >> > As explained in the comments, there's a balance to be struck between >> > narrowing an individual operation and fitting in with the surrounding >> > code. The approach is pretty conservative: if we could narrow an >> > operation to N bits without changing its semantics, it's OK to do that if: >> > >> > - no operations later in the chain require more than N bits; or >> > >> > - all internally-defined inputs are extended from N bits or fewer, >> > and at least one of them is single-use. >> > >> > See the comments for the rationale. >> > >> > I didn't bother adding STMT_VINFO_* wrappers for the new fields >> > since the code seemed more readable without. >> > >> > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? >> >> Here's a version rebased on top of current trunk. Changes from last time: >> >> - reintroduce dump_generic_expr_loc, with the obvious change to the >> prototype >> >> - fix a typo in a comment >> >> - use vect_element_precision from the new version of 12/n. >> >> Tested as before. OK to install? >> > > Hi Richard, > > This patch introduces regressions on arm-none-linux-gnueabihf: > gcc.dg/vect/vect-over-widen-1-big-array.c -flto -ffat-lto-objects > scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2 > gcc.dg/vect/vect-over-widen-1-big-array.c scan-tree-dump-times > vect "vect_recog_widen_shift_pattern: detected" 2 > gcc.dg/vect/vect-over-widen-1.c -flto -ffat-lto-objects > scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2 > gcc.dg/vect/vect-over-widen-1.c scan-tree-dump-times vect > "vect_recog_widen_shift_pattern: detected" 2 > gcc.dg/vect/vect-over-widen-4-big-array.c -flto -ffat-lto-objects > scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2 > gcc.dg/vect/vect-over-widen-4-big-array.c scan-tree-dump-times > vect "vect_recog_widen_shift_pattern: detected" 2 > gcc.dg/vect/vect-over-widen-4.c -flto -ffat-lto-objects > scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2 > gcc.dg/vect/vect-over-widen-4.c scan-tree-dump-times vect > "vect_recog_widen_shift_pattern: detected" 2 > gcc.dg/vect/vect-widen-shift-s16.c -flto -ffat-lto-objects > scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 8 > gcc.dg/vect/vect-widen-shift-s16.c scan-tree-dump-times vect > "vect_recog_widen_shift_pattern: detected" 8 > gcc.dg/vect/vect-widen-shift-s8.c -flto -ffat-lto-objects > scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 1 > gcc.dg/vect/vect-widen-shift-s8.c scan-tree-dump-times vect > "vect_recog_widen_shift_pattern: detected" 1 > gcc.dg/vect/vect-widen-shift-u16.c -flto -ffat-lto-objects > scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 1 > gcc.dg/vect/vect-widen-shift-u16.c scan-tree-dump-times vect > "vect_recog_widen_shift_pattern: detected" 1 > gcc.dg/vect/vect-widen-shift-u8.c -flto -ffat-lto-objects > scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2 > gcc.dg/vect/vect-widen-shift-u8.c scan-tree-dump-times vect > "vect_recog_widen_shift_pattern: detected" 2
Sorry about that, was caused by a stupid typo. I've applied the below as obvious. (For the record, it was actually 12/n that caused this. 14/n hasn't been applied yet.) Thanks, Richard 2018-07-02 Richard Sandiford <richard.sandif...@arm.com> gcc/ * tree-vect-patterns.c (vect_recog_widen_shift_pattern): Fix typo in dump string. Index: gcc/tree-vect-patterns.c =================================================================== --- gcc/tree-vect-patterns.c 2018-07-02 14:30:57.000000000 +0100 +++ gcc/tree-vect-patterns.c 2018-07-02 14:30:57.383750450 +0100 @@ -1739,7 +1739,7 @@ vect_recog_widen_shift_pattern (vec<gimp { return vect_recog_widen_op_pattern (stmts, type_out, LSHIFT_EXPR, WIDEN_LSHIFT_EXPR, true, - "vect_widen_shift_pattern"); + "vect_recog_widen_shift_pattern"); } /* Detect a rotate pattern wouldn't be otherwise vectorized: