On Thu, Oct 12, 2017 at 10:32 AM, Jakub Jelinek <ja...@redhat.com> wrote: > On Thu, Oct 12, 2017 at 08:32:32AM +0200, Uros Bizjak wrote: >> On Wed, Oct 11, 2017 at 10:59 PM, Jakub Jelinek <ja...@redhat.com> wrote: >> > As can be seen on the testcase below, the *<rotate_insn><mode>3_mask >> > insn/splitter is able to optimize only the case when the and is >> > performed in SImode and then the result subreged into QImode, >> > while if the computation is already in QImode, we don't handle it. >> > >> > Fixed by adding another pattern, bootstrapped/regtested on x86_64-linux and >> > i686-linux, ok for trunk? >> >> We probably want to add this variant to *all* *_mask splitters (there >> are a few of them in i386.md, please grep for "Avoid useless > > Well, not all of them, just the 5 ones that have (subreg:QI (and:SI ...)), > *jcc_bt<mode>_mask expects SImode argument, so doesn't have this kind of > problem. > >> masking"). Which finally begs a question - should we implement this >> simplification in a generic, target-independent way? OTOH, we already > > The target independent way is SHIFT_COUNT_TRUNCATED, but that isn't > applicable to targets which aren't consistent in their behavior across > the whole ISA. x86 has some instructions that truncate the mask and others > that saturate and others (b* with memory operand) that use something > still different. > So we need to apply it only to the instructions which really truncate > the shift counts and the best infrastructure for that I'm aware is the > combiner. > > In order to have just one set of the *_mask patterns we'd need to > change simplify_truncation to canonicalize > (subreg:QI (and:SI (something:SI) (const_int N)) low) > into > (and:QI (subreg:QI (something:SI) low) (const_int lowN)) > and change all these patterns; but I'm afraid that is going to break almost > all WORD_REGISTER_OPERATIONS targets and various others, those actually > perform all the operations in word mode and that is why the first form > is actually preferred for them. Except that if something is already > QImode there is no need for the subreg at all, which is why we I'm afraid > need the two sets of mask patterns. > > So, if you aren't against it, I can extend the patch to handle the 4 > other mask patterns; as for other modes, SImode is what is being handled > already, DImode is not a problem, because the FEs truncate the shift counts > to integer_type_node already, and for HImode I haven't seen problem > probably because most tunings avoid HImode math and so it isn't worth > optimizing.
OK, I think that we can live wtih 4 new patterns. Since these are all written in the same way (as in the patch you posted), the ammended patch is pre-approved for mainline. Thanks, Uros.