https://gcc.gnu.org/bugzilla/show_bug.cgi?id=73350

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2016-08-11
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |kyukhin at gcc dot gnu.org,
                   |                            |uros at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
There are at least two different issues: the mask issue and the CSE issue.

For the mask issue: to decrease the number of md builtin functions for the
intrinsics, many of them take a mask argument and rely on combine to merge the
all-ones mask into the various instructions that use the mask; the vec_merge
is then simplified and we get the non-masked insns.
The problem arises when there are multiple such instructions: CSE notices that
the same all-ones constant is used in several instructions and CSEs it into a
single all-ones pseudo for the given mode, which is then used by all of them.
combine then attempts to merge this all-ones setter into the arithmetic insn,
but fails:
Failed to match this instruction:
(parallel [
        (set (reg:V16SF 110)
            (plus:V16SF (reg:V16SF 102)
                (reg:V16SF 106)))
        (unspec [
                (const_int 9 [0x9])
            ] UNSPEC_EMBEDDED_ROUNDING)
        (set (reg:HI 105)
            (const_int -1 [0xffffffffffffffff]))
    ])
because of the extra set (the all-ones pseudo is still used by the other
insns, so combine has to keep its setter).  So, either we need some hacks at
expansion time, or some machine-specific pass before reload (ideally also
before combine) that would change all vec_merges whose mask is a pseudo known
to be all ones.  Or we could perhaps allow -1 as the mask operand; let me play
with a patch.
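A toy Python model of the above, for illustration only: it is not GCC code, and
the insn representation, the `drop_all_ones_masks` pass and pseudo 111 are
hypothetical.  It shows why an all-ones HImode mask makes vec_merge an
identity on the operation, and what a pass "changing all vec_merges where the
mask is a pseudo known to have all ones" might do, assuming a pseudo counts as
known all-ones only when its single setter is (const_int -1).

```python
# Toy model, NOT GCC internals: Insn, its "kind" strings and the pass below
# are hypothetical stand-ins for RTL insns and a machine-specific pass.
from dataclasses import dataclass
from typing import Optional

LANES = 16  # V16SF has 16 lanes, so the HImode mask has 16 bits


def vec_merge(op, src, mask):
    """(vec_merge op src mask): lane i is op[i] if mask bit i is set, else src[i]."""
    return [op[i] if (mask >> i) & 1 else src[i] for i in range(len(op))]


@dataclass
class Insn:
    kind: str                       # "set_const", "masked_op" or "op"
    dest: int                       # pseudo written by this insn
    value: Optional[int] = None     # constant, for "set_const"
    mask_reg: Optional[int] = None  # mask pseudo, for "masked_op"


def drop_all_ones_masks(insns):
    """Turn masked ops into plain ops when the mask pseudo is set once, to -1."""
    nsets, const_val = {}, {}
    for insn in insns:
        nsets[insn.dest] = nsets.get(insn.dest, 0) + 1
        if insn.kind == "set_const":
            const_val[insn.dest] = insn.value
    changed = 0
    for insn in insns:
        m = insn.mask_reg
        if insn.kind == "masked_op" and nsets.get(m) == 1 and const_val.get(m) == -1:
            insn.kind, insn.mask_reg = "op", None  # vec_merge with all ones is a no-op
            changed += 1
    return changed


# HImode -1 is 0xffff: every lane comes from op.
op = [float(i) for i in range(LANES)]
assert vec_merge(op, [0.0] * LANES, -1 & 0xFFFF) == op

# After CSE: one all-ones pseudo (reg:HI 105) feeds two masked insns.
prog = [Insn("set_const", 105, value=-1),
        Insn("masked_op", 110, mask_reg=105),
        Insn("masked_op", 111, mask_reg=105)]
print(drop_all_ones_masks(prog))  # 2
```

Rewriting each user of the shared pseudo independently sidesteps the combine
failure, since no insn has to absorb the setter of reg 105.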

The other issue is the embedded rounding.  The reason for the CSE is that the
rounding is represented by an unspec sitting next to the actual RTL operation,
not around it.  Either we'd need to introduce some generic RTL that represents
the rounding mode, or wrap the operation inside the unspec, or something
similar, to avoid the problems.
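A toy Python model of the CSE problem, for illustration only: it assumes CSE
keys an insn's value on its SET source alone, which is a simplification of
GCC's real hashing, and pseudo 111 is hypothetical.  With the rounding unspec
as a sibling of the SET, a (plus 102 106) with embedded rounding 9
(_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) looks the same as a plain
(plus 102 106); wrapping the operation in the unspec makes the rounding part
of the value.

```python
# Toy model, NOT GCC's cse.c: each insn is (dest, expr, rounding), where
# rounding is None for an ordinary op or the embedded rounding immediate.


def cse(insns, key):
    """Map each dest to an earlier dest whose key matches (i.e. gets CSEd)."""
    seen, reused = {}, {}
    for dest, expr, rounding in insns:
        k = key(expr, rounding)
        if k in seen:
            reused[dest] = seen[k]
        else:
            seen[k] = dest
    return reused


def sibling_key(expr, rounding):
    # unspec next to the SET: only the SET source is seen as the value
    return expr


def wrapped_key(expr, rounding):
    # unspec around the operation: rounding becomes part of the value
    return expr if rounding is None else ("unspec_rounding", expr, rounding)


insns = [(110, ("plus", 102, 106), 9),     # add with embedded rounding 9
         (111, ("plus", 102, 106), None)]  # ordinary add, same operands
print(cse(insns, sibling_key))  # {111: 110}  wrong: rounding ignored
print(cse(insns, wrapped_key))  # {}          kept distinct
```

Identical rounded operations still CSE under the wrapped form, so only the
invalid merges go away.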
