https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93069

--- Comment #9 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <ja...@gcc.gnu.org>:

https://gcc.gnu.org/g:319eafce3e54c8cb10e3fddce6823a6a558fca8b

commit r11-147-g319eafce3e54c8cb10e3fddce6823a6a558fca8b
Author: Jakub Jelinek <ja...@redhat.com>
Date:   Wed May 6 20:05:02 2020 +0200

    x86: Fix vextract* masked patterns [PR93069]

    The AVX512F documentation clearly states that in instructions where the
    destination is a memory only merging-masking is possible, not zero-masking,
    and the assembler enforces that.

    The testcase in this patch fails to assemble because of
    Error: unsupported masking for `vextracti32x8'
    on
            vextracti32x8   $0x0, %zmm1, -64(%rsp){%k1}{z}
    For the vector extraction patterns, we apparently have 7 *_maskm patterns
    that only accept memory destinations and rtx_equal_p merge-masking source
    for it, 7 *<mask_name> corresponding patterns that allow memory destination
    only for the non-masked cases (through <store_mask_constraint>), then 2
    *<mask_name> patterns (lo ssehalf V16FI and lo ssehalf VI8F_256 ones) which
    do allow memory destination even for masked cases and are the cause of the
    testsuite failure, because we must not allow C constraint if the
destination
    is m, and finally one pair of patterns (separate * and *_mask, hi ssehalf
    VI4F_256), which has another issue (for which I don't have a testcase
    though), where if it would match zero-masking with register destination,
    it wouldn't emit the needed {z} into assembly.
    The attached patch fixes those 3 issues only, perhaps more suitable for
    backporting.
    But, even with that fixed, we are missing 3 further *_maskm patterns and
    more importantly, I find the split into 3 separate patterns after subst,
    *_maskm for masking with memory destination, *_mask for masking with
    register destination and * for non-masking unnecessarily complex and harder
    for reload, so the included patch below (non-attached) instead kills all
    *_maskm patterns and splits the *<mask_name> patterns into * and *_mask
    by hand instead of subst, where the *_mask ones make sure that with v
    destination they use 0C, while with m destination they use 0 and as
    condition enforce that either destination is not MEM, or rtx_equal_p
between
    the destination and corresponding merging-masking operand source.
    If we had those 3 missing *_maskm patterns, this patch would actually
result
    in both shorter sse.md and shorter machine description after subst (e.g.
    length of tmp-mddump.md), as we don't have them, the patch is actually 16
    lines longer sse.md, but still shorter tmp-mddump.md.

    2020-05-06  Jakub Jelinek  <ja...@redhat.com>

            PR target/93069
            * config/i386/subst.md (store_mask_constraint,
store_mask_predicate):
            Remove.
            (avx512dq_vextract<shuffletype>64x2_1_maskm,
            avx512f_vextract<shuffletype>32x4_1_maskm,
            vec_extract_lo_<mode>_maskm, vec_extract_hi_<mode>_maskm): Remove.
            (<mask_codefor>avx512dq_vextract<shuffletype>64x2_1<mask_name>):
Split
            into ...
            (*avx512dq_vextract<shuffletype>64x2_1,
            avx512dq_vextract<shuffletype>64x2_1_mask): ... these new
            define_insns.  Even in the masked variant allow memory output but
in
            that case use 0 rather than 0C constraint on the source of
masked-out
            elts.
            (<mask_codefor>avx512f_vextract<shuffletype>32x4_1<mask_name>):
Split
            into ...
            (*avx512f_vextract<shuffletype>32x4_1,
            avx512f_vextract<shuffletype>32x4_1_mask): ... these new
define_insns.
            Even in the masked variant allow memory output but in that case use
            0 rather than 0C constraint on the source of masked-out elts.
            (vec_extract_lo_<mode><mask_name>): Split into ...
            (vec_extract_lo_<mode>, vec_extract_lo_<mode>_mask): ... these new
            define_insns.  Even in the masked variant allow memory output but
in
            that case use 0 rather than 0C constraint on the source of
masked-out
            elts.
            (vec_extract_hi_<mode><mask_name>): Split into ...
            (vec_extract_hi_<mode>, vec_extract_hi_<mode>_mask): ... these new
            define_insns.  Even in the masked variant allow memory output but
in
            that case use 0 rather than 0C constraint on the source of
masked-out
            elts.

Reply via email to