https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121298
--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> --- This was quite ugly already before that change. What the whole operation does is in bswap terminology 2 1 3 4 swap (i.e. lowest byte is the 2nd of val, then 1st, then 3rd, then 4th), which is something bswap pass doesn't handle. So it tried to bswap just the earlier stmts and there it before r16-2601 saw something which constructs 2 1 0 0 (i.e. 2nd byte of val, 1st byte of val, zero, zero) and decides to lower that to (__builtin_bswap32 (val) & 0xffff0000) r<< 16, wonder if that wouldn't be better written as simply (unsigned) (((unsigned short) val) r << 8), i.e. just bswap16 extended, that is also 3 gimple ops (cast, lrotate, cast) without a need for a fancy constant. Anyway, the way bswap pass works, it just analyzes everything and either it is something it decides to optimize and then it does, or not and then it punts. And it walks a bb backwards, trying to optimize each stmt. Now, with random reassociation of large | or ^ or + (for + guess only unsigned ones) operands this sometimes works and sometimes doesn't. Say if I want __builtin_bswap32 (x) | __builtin_bswap32 (y) and write it as ((x & 0xff) << 24) | ((x & 0xff00) << 8) | ((x & 0xff0000) >> 8) | ((x & 0xff000000) >> 24)) | ((y & 0xff) << 24) | ((y & 0xff00) << 8) | ((y & 0xff0000) >> 8) | ((y & 0xff000000) >> 24)) and it isn't reassociated from that form, it is recognized, while if it is ((x & 0xff) << 24) | ((y & 0xff) << 24) | ((x & 0xff00) << 8) | ((y & 0xff00) << 8) | ((x & 0xff0000) >> 8) | ((y & 0xff0000) >> 8) | ((x & 0xff000000) >> 24)) | ((y & 0xff000000) >> 24)) then it won't match. Shall we try to randomly reassociate toplevel |^ or unsigned +? I.e. if we see BIT_IOR_EXPR, BIT_XOR_EXPR or unsigned PLUS_EXPR, in similar way how reassoc collects all operands collect all operands (perhaps with some limit), run find_bswap_or_nop_1 on each leaf operand, sort them based on having a source_stmt non-NULL, precision and vuse and then in each group try to gradually perform_symbolic_merge + verify_symbolic_number_p if some of their combination makes sense and for each such combination note if find_bswap_or_nop_finalize or for bswap pass only also the if (!is_bswap_or_nop_p (tmp_n, cmpxchg, cmpnop, mask, bswap)) stuff after it moved into a helper function makes something recognizable and then pick the largest collection of arguments that can be turned into something recognizable? Plus some flag marking on stmts we've already walked that way, so that it doesn't have terrible compile time complexity.
