http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59163

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |uros at gcc dot gnu.org

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Looks like it, yes.
In *.jump we still have (IMHO correct):
(insn 2 4 3 2 (set (reg/v/f:DI 89 [ a ])
        (reg:DI 5 di [ a ])) pr59163-2.C:13 85 {*movdi_internal}
     (nil))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (set (reg:TI 90)
        (mem:TI (reg/v/f:DI 89 [ a ]) [3 MEM[(const struct A &)a_4(D)]+0 S16 A32])) pr59163-2.C:15 84 {*movti_internal}
     (nil))
(insn 7 6 8 2 (set (mem/c:TI (plus:DI (reg/f:DI 20 frame)
                (const_int -16 [0xfffffffffffffff0])) [3 c+0 S16 A128])
        (reg:TI 90)) pr59163-2.C:15 84 {*movti_internal}
     (nil))
...
(insn 9 8 10 2 (set (reg:V4SF 91 [ vect__7.10 ])
        (mult:V4SF (reg:V4SF 92)
            (mem/c:V4SF (plus:DI (reg/f:DI 20 frame)
                    (const_int -16 [0xfffffffffffffff0])) [2 MEM[(float *)&c]+0 S16 A128]))) pr59163-2.C:17 1269 {*mulv4sf3}
     (nil))

movti_internal handles unaligned loads properly.
Then *.dse1 transforms this into:
(insn 6 3 18 2 (set (reg:TI 90 [ MEM[(const struct A &)a_4(D)] ])
        (mem:TI (reg/v/f:DI 89 [ a ]) [3 MEM[(const struct A &)a_4(D)]+0 S16 A32])) pr59163-2.C:15 84 {*movti_internal}
     (nil))
(insn 18 6 8 2 (set (reg:V4SF 94)
        (subreg:V4SF (reg:TI 90 [ MEM[(const struct A &)a_4(D)] ]) 0)) pr59163-2.C:15 -1
     (expr_list:REG_DEAD (reg:TI 90 [ MEM[(const struct A &)a_4(D)] ])
        (nil)))
...
(insn 9 8 19 2 (set (reg:V4SF 91 [ vect__7.10 ])
        (mult:V4SF (reg:V4SF 92)
            (reg:V4SF 94))) pr59163-2.C:17 1269 {*mulv4sf3}
     (expr_list:REG_DEAD (reg:V4SF 94)
        (expr_list:REG_DEAD (reg:V4SF 92)
            (nil))))
which also looks ok to me.  But then combine combines it into:
(insn 9 8 19 2 (set (reg:V4SF 91 [ vect__7.10 ])
        (mult:V4SF (reg:V4SF 92)
            (mem:V4SF (reg/v/f:DI 89 [ a ]) [3 MEM[(const struct A &)a_4(D)]+0 S16 A32]))) pr59163-2.C:17 1269 {*mulv4sf3}
     (expr_list:REG_DEAD (reg:V4SF 92)
        (nil)))
which is wrong (for pre-AVX), because mulv4sf3 can't accept an unaligned memory
operand (note the A32 on the merged mem).
Likely the SSEx pre-AVX predicates assume that an unaligned vector load will be
done using an UNSPEC and thus can't be merged here, and don't account for the
fact that the load could be done in an integral mode and just subregged into a
vector mode.  Perhaps we need new predicates that would fail in exactly this
situation (disallow an unaligned scalar load subregged into a vector mode for
pre-AVX) and use them everywhere SSEx doesn't accept unaligned loads?
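
For readers without the testcase at hand, here is a minimal sketch of the kind
of source that could give RTL shaped like the dumps above.  It is a
hypothetical reduction, not the actual pr59163-2.C: the names B, scale_copy and
run_reduction, the factor 6.0f and the explicit alignas are all assumptions
made for illustration.  The point is only that a 16-byte struct with 4-byte
alignment (A32) is copied into a 16-byte aligned local (A128) and then
multiplied element-wise, which is where a V4SF multiply with a memory operand
could appear.

```cpp
#include <cassert>

// Hypothetical reduction; names are illustrative, not from pr59163-2.C.
struct A { float a[4]; };   // 16 bytes in size, but only float-aligned
struct B { int b; A a; };   // the leading int puts the A member off a 16-byte boundary

// Copy the argument into an aligned local, then scale every element.
// noinline (a GCC/Clang attribute) keeps the caller from constant-folding
// the whole computation.
__attribute__((noinline)) A scale_copy(const A &a)
{
    alignas(16) A c = a;            // 16-byte copy from possibly unaligned memory
    for (int i = 0; i < 4; i++)
        c.a[i] *= 6.0f;             // candidate for a single vector multiply
    return c;
}

// Returns true when the scaled copy holds the expected values.
bool run_reduction()
{
    B b = {5, {{6.0f, 7.0f, 8.0f, 9.0f}}};
    A r = scale_copy(b.a);          // b.a is not 16-byte aligned
    return r.a[0] == 36.0f && r.a[1] == 42.0f
        && r.a[2] == 48.0f && r.a[3] == 54.0f;
}
```

With a correct compiler run_reduction() returns true at any optimization
level.  If the load from a were folded into an aligned-only multiply, as in the
combine output above, a pre-AVX target would be expected to fault on the
misaligned access instead.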
