On 01/17/2018 05:37 AM, Wilco Dijkstra wrote:
> In general I think the best way to achieve this would be to use the
> existing cost models which are there for exactly this purpose. If
> this doesn't work well enough then we should fix those.

I tried using the cost models, but this didn't work, because the costs don't let us distinguish between loads and stores. If you mark reg+reg addressing as expensive, you get a performance loss from losing the reg+reg loads and a performance gain from losing the reg+reg stores, and the two cancel each other out.

> this patch disables a whole class of instructions for a specific
> target rather than simply telling GCC that they are expensive and
> should only be used if there is no cheaper alternative.

This is the only solution I found that worked.

> Also there is potential impact on generic code from:
>
>   (define_insn "*aarch64_simd_mov<VQ:mode>"
>     [(set (match_operand:VQ 0 "nonimmediate_operand"
> -               "=w, Umq,  m,  w, ?r, ?w, ?r, w")
> +               "=w, Umq, Utf,  w, ?r, ?w, ?r, w")
>         (match_operand:VQ 1 "general_operand"
> -               "m,  Dz, w,  w,  w,  r,  r, Dn"))]
> +               "m,  Dz,    w,  w,  w,  r,  r, Dn"))]
>
> It seems an 'm' constraint has special meaning in the register allocator,
> using a different constraint can block certain simplifications (for example
> merging stack offsets into load/store in the post-reload cleanup pass),
> so we'd need to verify this doesn't cause regressions.

No optimizer should check for 'm' directly; optimizers should check for CT_MEMORY, which indicates a constraint that accepts memory. Utf is properly marked as a memory constraint.
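For reference, the way a constraint gets classified as CT_MEMORY is by being defined with define_memory_constraint rather than plain define_constraint in the target's constraints.md, which is what lets the allocator and post-reload passes treat it like 'm'. The sketch below shows the general shape of such a definition; the predicate name here is a hypothetical placeholder, not the actual test used in the patch.

    (define_memory_constraint "Utf"
      "@internal
       A memory address accepted for the store alternatives this patch
       restricts; the match_test below is a placeholder, not the patch's
       actual predicate."
      (and (match_code "mem")
           (match_test "hypothetical_store_address_ok_p (op, mode)")))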

I did some testing at the time to verify that the patch would not affect other aarch64 targets, though I don't recall now exactly what I did.

Jim
