On 09/01/14 04:29, Ilya Tocar wrote:

>>> AVX512 added 16 new xmm registers (xmm16-xmm31).
>>> Those registers require evex encoding.
>>> With avx512f, only the 512-bit versions of instructions have an evex
>>> encoding; with avx512vl, all versions have it.
>>> Most instructions have the same macroized pattern for the 128/256/512-bit
>>> vector lengths. They all use constraint 'v', which corresponds to
>>> class ALL_SSE_REGS (xmm0-xmm31). To disallow, e.g., xmm20 in the
>>> 256-bit case (avx512f) and allow it only in the avx512vl case, we have
>>> HARD_REGNO_MODE_OK check whether regno is evex-only and
>>> reject it if the mode is not 512-bit.
>> Generally this kind of thing has been handled by splitting the register
>> class into two classes.  I strongly suspect there are numerous places where
>> we assume that two regs in the same class are interchangeable.
> I'm not sure that there are many places where we replace hard regs
> without checks. E.g., in regrename we have HARD_REGNO_RENAME_OK.
> As far as I understand, the idea behind HARD_REGNO_RENAME_OK is that we
> should always check when substituting a hard reg. Why is regcprop
> different, and what's the point of HARD_REGNO_MODE_OK if it is ignored
> by some passes?
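In the abstract, the check being asked for looks like the hypothetical sketch below: a copy-propagation step replaces a register use with another register holding the same value only when a per-mode predicate (standing in for HARD_REGNO_MODE_OK) accepts the replacement. The names reg_use, maybe_propagate_copy and avx512f_only_mode_ok are made up for illustration and do not reflect regcprop's actual code; registers 0-31 again stand for xmm0-xmm31.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for HARD_REGNO_MODE_OK: REGNO 0..31 = xmm0..xmm31,
   MODE_BITS = vector width of the value.  */
typedef bool (*regno_mode_ok_fn) (int regno, int mode_bits);

/* Hypothetical model of one register use inside an insn.  */
struct reg_use
{
  int regno;
  int mode_bits;
};

/* Replace USE's register with NEW_REGNO, known to hold the same value,
   but only if the target accepts NEW_REGNO in the mode of the use.
   Returns true if the replacement was made.  */
static bool
maybe_propagate_copy (struct reg_use *use, int new_regno,
                      regno_mode_ok_fn mode_ok)
{
  assert (use && mode_ok);
  if (!mode_ok (new_regno, use->mode_bits))
    return false;               /* Keep the original register.  */
  use->regno = new_regno;
  return true;
}

/* Example predicate for avx512f without avx512vl:
   xmm16-xmm31 are only valid in 512-bit modes.  */
static bool
avx512f_only_mode_ok (int regno, int mode_bits)
{
  return regno < 16 || mode_bits == 512;
}
```

A 256-bit use of xmm1 is then not rewritten to xmm20, but can be rewritten to xmm5.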


>> I realize that's going to require some work in the x86 machine description,
>> but I think that's going to be a much better approach and save you work in
>> the long run.


> This will approximately double sse.md, as we will need to split all
> patterns with 512-bit versions in two (512-bit and 128/256-bit cases) and
> play games with enabling/disabling alternatives depending on flags.
> Are you sure that this is better than honoring HARD_REGNO_MODE_OK?
> As far as I understand, honoring HARD_REGNO_MODE_OK shouldn't produce
> worse code.

I don't see how it doubles the size. You split the class into two classes. Whatever letter your second class has, you use it in conjunction with the 'v' you're already using. Note that you do not need different alternatives; you use both letters in the same alternative.
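One way to picture the split-class idea, as a sketch only: register classes become bit sets, each constraint letter maps to a class that may depend on target flags, and the registers allowed by one alternative are the union of its letters' classes. The letters 'E' and 'L', the class names and the numbering are hypothetical, and 'v' is assumed to name only xmm0-xmm15 after the split; a macroized pattern would pick 'E' or 'L' per mode (for example through a mode attribute, which is likewise an assumption).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit I set means xmmI is in the set (hypothetical numbering).  */
typedef uint32_t reg_set;

#define NO_REGS        ((reg_set) 0)
#define LOW_SSE_REGS   ((reg_set) 0x0000FFFFu)  /* xmm0-xmm15   */
#define EVEX_SSE_REGS  ((reg_set) 0xFFFF0000u)  /* xmm16-xmm31  */

/* Hypothetical constraint letters after the split:
   'v' -> xmm0-xmm15 only,
   'E' -> xmm16-xmm31 when avx512f is enabled (512-bit patterns),
   'L' -> xmm16-xmm31 when avx512vl is enabled (128/256-bit patterns).  */
static reg_set
letter_class (char letter, bool avx512f, bool avx512vl)
{
  switch (letter)
    {
    case 'v': return LOW_SSE_REGS;
    case 'E': return avx512f ? EVEX_SSE_REGS : NO_REGS;
    case 'L': return avx512vl ? EVEX_SSE_REGS : NO_REGS;
    default:  return NO_REGS;
    }
}

/* Registers allowed by one alternative: the union of its letters' classes.  */
static reg_set
alternative_regs (const char *alt, bool avx512f, bool avx512vl)
{
  reg_set set = NO_REGS;
  assert (alt);
  for (; *alt; alt++)
    set |= letter_class (*alt, avx512f, avx512vl);
  return set;
}
```

With avx512f but not avx512vl, the single alternative "vE" allows all 32 registers while "vL" allows only xmm0-xmm15; no extra alternatives are needed.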

It's not a question of performance, but of design. I suspect you're really just at the tip of the iceberg with this stuff if you continue to go down the path of having registers in the same class, some of which are allocatable and some of which are not.
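The interchangeability concern can be illustrated with a hypothetical sketch: a pass that treats every member of a class as equivalent simply takes the first free register, while a mode-aware version has to repeat the per-mode check at that site. The helpers and the avx512f-without-avx512vl predicate below are assumptions for illustration, not code from GCC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit I set means xmmI is in the set (hypothetical numbering).  */
typedef uint32_t reg_set;

/* Stand-in per-mode check for avx512f without avx512vl.  */
static bool
mode_ok (int regno, int mode_bits)
{
  assert (mode_bits == 128 || mode_bits == 256 || mode_bits == 512);
  return regno < 16 || mode_bits == 512;
}

/* A pass that assumes all members of a class are interchangeable:
   returns the first free register in CLASS_REGS, or -1.  */
static int
pick_any_free (reg_set class_regs, reg_set free_regs)
{
  reg_set candidates = class_regs & free_regs;
  for (int r = 0; r < 32; r++)
    if (candidates & ((reg_set) 1 << r))
      return r;
  return -1;
}

/* The same pass with the extra check every such site would need.  */
static int
pick_free_for_mode (reg_set class_regs, reg_set free_regs, int mode_bits)
{
  reg_set candidates = class_regs & free_regs;
  for (int r = 0; r < 32; r++)
    if ((candidates & ((reg_set) 1 << r)) && mode_ok (r, mode_bits))
      return r;
  return -1;
}
```

If only xmm16-xmm31 are free, pick_any_free returns 16 even for a 256-bit value, which the per-mode check would reject.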

The other approach that I believe has been taken is to mark the new registers as fixed when compiling for hardware where they're not available. But I'm not sure offhand whether that would be sufficient to fix this problem.
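For comparison, marking registers as fixed might look like the sketch below, loosely in the spirit of GCC's TARGET_CONDITIONAL_REGISTER_USAGE hook; the array-based model, the numbering and the function name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical numbering: 0..31 stand for xmm0..xmm31.  */
enum { NUM_XMM = 32, FIRST_EVEX_ONLY = 16 };

/* FIXED[I] true means the allocator must never use xmmI.  Called once
   per compilation with the target flags in effect.  */
static void
mark_unavailable_xmm_fixed (bool fixed[NUM_XMM], bool avx512f)
{
  assert (fixed);
  for (int i = 0; i < NUM_XMM; i++)
    fixed[i] = i >= FIRST_EVEX_ONLY && !avx512f;
}
```

In this model the flags are per compilation rather than per mode, so a compilation with avx512f but without avx512vl leaves xmm16-xmm31 unfixed (they are still needed for 512-bit values), and this alone would not keep them out of 128/256-bit uses.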


Jeff
