On Thu, Nov 29, 2018 at 08:13:48PM +0530, Umesh Kalappa wrote:
> We are able to fix the subjected issue  with the  peephole patterns
> (target specific) in the md file (attached the patch pr54589.patch).
> While testing the fix ,we end up with some of the C constructs like

The right place for this sort of transformation is generally the combiner,
and it works for most complex addressing cases.  The reason it doesn't
work in this case is:
Failed to match this instruction:
(set (mem:SI (reg:DI 97) [2 *dst_7(D)+0 S4 A32])
    (vec_select:SI (mem:V4SI (plus:DI (plus:DI (mult:DI (reg:DI 90 [ *src_4(D) ])
                        (const_int 16 [0x10]))
                    (reg/f:DI 16 argp))
                (const_int 16 [0x10])) [1 p.array S16 A128])
        (parallel [
                (const_int 0 [0])
            ])))
Indeed, x86 has no scale-16 addressing mode (the largest scale is 8), so
the above isn't recognized, and the combiner then only tries
Failed to match this instruction:
(set (reg:DI 95)
    (plus:DI (ashift:DI (reg:DI 90 [ *src_4(D) ])
            (const_int 4 [0x4]))
        (reg/f:DI 16 argp)))
after it, which doesn't help.
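
For reference, source of roughly the following shape leads to this kind of
address.  This is only a sketch reconstructed from the names in the dump
(p.array, src, dst); the struct layout, the function name load_lane0 and the
use of the GCC vector extension instead of intrinsics are my assumptions, and
it assumes a 4-byte int.

```c
#include <stddef.h>

/* Hypothetical reconstruction; only p.array, src and dst come from the dump.  */
typedef int v4si __attribute__ ((vector_size (16)));  /* GCC/Clang extension */

struct param
{
  int a, b, c, d;        /* 16 bytes, so array starts at offset 16 */
  v4si array[256];       /* 16-byte elements, hence the scale of 16 */
};

_Static_assert (offsetof (struct param, array) == 16,
                "array is expected at offset 16");

void
load_lane0 (struct param p, unsigned char *src, int *dst)
{
  /* Address is &p + *src * 16 + 16; only element 0 of the vector is used.  */
  *dst = p.array[*src][0];
}
```

The by-value struct argument is what makes the base the argument pointer, and
the element size of 16 is what produces a scale the address cannot encode.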

For (x + N) << M, replacing it with (x << M) + (N << M) isn't generally
a win; e.g. the (N << M) constant could be expensive to construct in a
register.  So optimizing that at the GIMPLE level is not right.  Furthermore,
it isn't even visible at the GIMPLE level, as it is (or could be) just part
of ARRAY_REF addressing.
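
To make the two forms concrete, here is a small sketch in C.  The function
names are made up for illustration, and it uses unsigned 64-bit arithmetic so
that both forms agree modulo 2^64; m must be below 64.

```c
#include <stdint.h>

/* (x + n) << m: one add of a small constant, then one shift.  */
static uint64_t
shift_of_sum (uint64_t x, uint64_t n, unsigned m)
{
  return (x + n) << m;
}

/* (x << m) + (n << m): same value modulo 2^64, but n << m is now a
   separate and possibly much larger constant.  */
static uint64_t
sum_of_shifts (uint64_t x, uint64_t n, unsigned m)
{
  return (x << m) + (n << m);
}
```

With n = 0x12345 and m = 16, for example, n << m is 0x123450000, which no
longer fits in a sign-extended 32-bit immediate, so the distributed form may
need an extra instruction just to load the constant.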

I'm not sure how hard it would be to teach the combiner to retry for the
addressing case with the mult/ashift replaced with a register and, if that
succeeds, emit the multiplication/shift as a separate instruction in a 3->2
or 4->2 combination.  Segher?
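
Very roughly, the idea is the following.  This is a toy model in plain C, not
combiner code; struct addr and all the function names are invented for
illustration, and the only fact it relies on is that x86 encodes scales 1, 2,
4 and 8.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of an address of the form base + index * scale + disp.  */
struct addr
{
  uint64_t base;
  uint64_t index;
  unsigned scale;
  int32_t disp;
};

/* x86 can encode only these scale factors in an address.  */
static bool
scale_encodable (unsigned scale)
{
  return scale == 1 || scale == 2 || scale == 4 || scale == 8;
}

/* If the scale cannot be encoded, do the multiplication up front (this
   stands for the separately emitted mult/shift instruction) and use the
   result as a plain index register with scale 1.  */
static struct addr
split_scale (struct addr a)
{
  if (!scale_encodable (a.scale))
    {
      a.index *= a.scale;
      a.scale = 1;
    }
  return a;
}

/* The address value that the model describes.  */
static uint64_t
effective_address (struct addr a)
{
  return a.base + a.index * a.scale + (uint64_t) (int64_t) a.disp;
}
```

For the case above, base + index * 16 + 16 becomes a shift by 4 followed by a
base + index + 16 address, which keeps the displacement inside the memory
operand instead of materializing it separately.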

As for your patch, I'm not really sure a peephole is the best place: it runs
too late (after RA), and your patch is in some cases too narrow (e.g.
hardcoding DImode for the addresses; Pmode might be better), especially
because you want to apply it to any kind of MEM, without caring about what
mode that MEM has, where in the instruction it appears, or what else the
instruction does.  It is equally useful if the MEM already has some
immediate (provided the two can be combined), or if it doesn't have a base,
etc.

        Jakub
