https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54201

--- Comment #10 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <ja...@gcc.gnu.org>:

https://gcc.gnu.org/g:0106300f6c3f7bae5eb1c46dbd45aa07c94e1b15

commit r11-2944-g0106300f6c3f7bae5eb1c46dbd45aa07c94e1b15
Author: Jakub Jelinek <ja...@redhat.com>
Date:   Mon Aug 31 10:27:00 2020 +0200

    varasm: Optimize memory broadcast for constant vector under AVX512 [PR54201]

    I meant something like the following, which on e.g. a dumb:

    typedef float V __attribute__((vector_size (4 * sizeof (float))));

    void
    foo (V *p, float *q)
    {
      p[0] += (V) { 1.0f, 2.0f, 3.0f, 4.0f };
      q[0] += 4.0f;
      q[1] -= 3.0f;
      q[17] -= 2.0f;
      q[31] += 1.0f;
    }

    testcase merges all four scalar constant pool entries into the CONST_VECTOR one.
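The merge is possible because each scalar constant in the testcase has the same bytes as one aligned 4-byte slot of the 16-byte vector constant { 1.0f, 2.0f, 3.0f, 4.0f }. The C sketch below checks that byte containment directly. The function name find_aligned_subconstant is hypothetical, not a GCC function. Restricting the search to offsets that are multiples of the constant's size is an assumption, made to mirror the "(aligned)" portions described later in the message.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the byte offset at which the SIZE-byte
   constant C occurs inside the POOL_SIZE-byte constant POOL, trying only
   offsets that are multiples of SIZE, or -1 if it does not occur.  */
static long
find_aligned_subconstant (const void *pool, size_t pool_size,
                          const void *c, size_t size)
{
  const unsigned char *bytes = pool;
  for (size_t off = 0; size != 0 && off + size <= pool_size; off += size)
    if (memcmp (bytes + off, c, size) == 0)
      return (long) off;
  return -1;
}
```

With the testcase's vector, 4.0f is found at offset 12 and 1.0f at offset 0. A value that does not occur in the vector, such as 5.0f, yields -1.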

    I'm punting on section anchors, and not doing it in the per-function (i.e.
    non-shared) constant pools, simply because I don't know them well enough
    and don't know whether backends use the offsets for something, etc.
    For section anchors, I guess it would need to be done before (re)computing
    the offsets, arranging for the desc->mark < 0 entries not to be considered
    as objects in the object block.  For non-shared pools, perhaps it would be
    enough to call the new function from output_constant_pool before calling
    recompute_pool_offsets and to adjust recompute_pool_offsets to ignore
    desc->mark < 0 entries.

    Here is an adjusted patch that ought to merge even same-sized vectors of
    different modes that have the same byte representation, etc.
    It won't really help with avoiding multiple reads of the constant in the
    same function, but, as you found, your patch doesn't help with that either.
    Your patch isn't really incompatible with what the patch below does, though
    I wonder a) whether it wouldn't be better to always canonicalize to an
    integral mode with as few elts as possible, even e.g. for floats, and
    b) whether asserting that simplify_rtx succeeds is safe, or whether it
    should instead canonicalize only when the canonicalization works and
    otherwise do what it did before.
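Same-sized constants of different modes can merge because the key is the byte image, not the mode. Below is a minimal sketch, assuming IEEE-754 binary32 floats. A V4SF-like float[4] of { 1, 2, 3, 4 } and a uint32_t[4] holding the same bit patterns produce equal keys and equal hashes. The names byte_key, byte_key_hash and byte_key_equal are illustrative. FNV-1a is an arbitrary stand-in hash, and GCC's const_rtx_data_hasher is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A pool constant reduced to its byte image.  The machine mode
   (V4SF, V4SI, ...) is deliberately not part of the key.  */
struct byte_key
{
  const unsigned char *bytes;
  size_t size;
};

/* FNV-1a over the byte image (an illustrative choice of hash).  */
static uint32_t
byte_key_hash (const struct byte_key *k)
{
  uint32_t h = 2166136261u;
  for (size_t i = 0; i < k->size; i++)
    {
      h ^= k->bytes[i];
      h *= 16777619u;
    }
  return h;
}

/* Two keys are equal iff they have the same size and identical bytes.  */
static int
byte_key_equal (const struct byte_key *a, const struct byte_key *b)
{
  return a->size == b->size && memcmp (a->bytes, b->bytes, a->size) == 0;
}
```

Keying on bytes alone sidesteps the question raised above about canonicalizing every constant to one integral mode. Two constants that would canonicalize to the same integer vector already have identical byte images.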

    The following patch puts all pool entries which can be natively encoded
    into a vector, sorts it by decreasing size, determines the minimum size
    of a pool entry, and adds hash elts for each (aligned) portion of the pool
    constant that is min_size or a wider power-of-two-ish size, in addition to
    the whole pool constant's byte representation.
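The scheme in the previous paragraph can be sketched in C as follows. This is a simplified, hypothetical model, not GCC's optimize_constant_pool. Entries are plain byte arrays, and a linear scan stands in for the hash table. "Power of two-ish" is assumed to mean min_size, 2*min_size, 4*min_size and so on. All struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified pool entry holding a constant's byte image.  */
struct pool_entry
{
  const unsigned char *bytes;
  size_t size;          /* assumed nonzero */
  int labelno;          /* the entry would be emitted as .LC<labelno> */
  int alias_label;      /* -1, or the label of the entry that contains it */
  size_t alias_offset;  /* byte offset inside that containing entry */
};

/* An aligned byte range of an entry that keeps its own storage.  */
struct piece
{
  const unsigned char *bytes;
  size_t size;
  int labelno;
  size_t offset;
};

/* Sort by decreasing size; break ties by label for a deterministic order.  */
static int
cmp_entries (const void *a, const void *b)
{
  const struct pool_entry *x = *(const struct pool_entry *const *) a;
  const struct pool_entry *y = *(const struct pool_entry *const *) b;
  if (x->size != y->size)
    return x->size < y->size ? 1 : -1;
  return (x->labelno > y->labelno) - (x->labelno < y->labelno);
}

static void
add_piece (struct piece **pieces, size_t *n, size_t *cap, struct piece p)
{
  if (*n == *cap)
    {
      *cap = *cap ? 2 * *cap : 16;
      *pieces = realloc (*pieces, *cap * sizeof **pieces);
      if (*pieces == NULL)
        abort ();
    }
  (*pieces)[(*n)++] = p;
}

/* Turn entries whose bytes match an aligned piece of a larger (or equal,
   earlier) entry into aliases of that entry.  */
static void
merge_pool_entries (struct pool_entry **v, size_t n)
{
  struct piece *pieces = NULL;
  size_t npieces = 0, cap = 0;
  if (n == 0)
    return;
  qsort (v, n, sizeof *v, cmp_entries);
  size_t min_size = v[n - 1]->size;
  assert (min_size > 0);

  for (size_t i = 0; i < n; i++)
    {
      struct pool_entry *e = v[i];
      e->alias_label = -1;
      /* Is this entry's byte image already available in some piece?  */
      for (size_t j = 0; j < npieces; j++)
        if (pieces[j].size == e->size
            && memcmp (pieces[j].bytes, e->bytes, e->size) == 0)
          {
            e->alias_label = pieces[j].labelno;
            e->alias_offset = pieces[j].offset;
            break;
          }
      if (e->alias_label != -1)
        continue;  /* Merged entries need no storage of their own.  */
      /* Register the whole entry plus its aligned min_size * 2^k pieces.  */
      add_piece (&pieces, &npieces, &cap,
                 (struct piece) { e->bytes, e->size, e->labelno, 0 });
      for (size_t p = min_size; p < e->size; p *= 2)
        for (size_t off = 0; off + p <= e->size; off += p)
          add_piece (&pieces, &npieces, &cap,
                     (struct piece) { e->bytes + off, p, e->labelno, off });
    }
  free (pieces);
}
```

Suppose the pool holds a 16-byte { 1.0f, 2.0f, 3.0f, 4.0f } entry and a separate 4.0f entry. After the call, the 4-byte entry has alias_label set to the vector's label and alias_offset 12. The sort by decreasing size is what makes a single pass enough: every potential container registers its pieces before any smaller entry is looked up.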

    This is the version that passed bootstrap/regtest on both x86_64-linux and
    i686-linux.  In both bootstraps/regtests together, it saved (from the
    statistics I've gathered) 63104 .rodata bytes (before constant merging),
    in 6814 hits of the data->desc->mark = ~(*slot)->desc->labelno; statement.
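The statement counted above stores the containing entry's label number in mark as its bitwise complement. The output code can then treat mark < 0 as "emit this entry as an alias". The sketch below isolates that encoding, and its helper names are hypothetical. One property worth noting: the complement, unlike negation, keeps label 0 distinguishable, since ~0 is -1 while -0 is 0. Like the original statement, it relies on two's-complement int.

```c
/* Hypothetical helpers modelling the "mark < 0 means alias" encoding.  */

/* Encode "this entry aliases the entry labelled TARGET_LABELNO"
   (TARGET_LABELNO >= 0) as a negative mark.  */
static int
encode_alias_mark (int target_labelno)
{
  return ~target_labelno;
}

/* A negative mark identifies an entry that must be emitted as an alias.  */
static int
is_alias_mark (int mark)
{
  return mark < 0;
}

/* Recover the label number of the containing entry from the mark.  */
static int
alias_target_labelno (int mark)
{
  return ~mark;
}
```

Decoding applies the same operator again, so alias_target_labelno (encode_alias_mark (n)) returns n for any non-negative label number n.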

    2020-08-31  Jakub Jelinek  <ja...@redhat.com>

            PR middle-end/54201
            * varasm.c: Include alloc-pool.h.
            (output_constant_pool_contents): Emit desc->mark < 0 entries as
            aliases.
            (struct constant_descriptor_rtx_data): New type.
            (constant_descriptor_rtx_data_cmp): New function.
            (struct const_rtx_data_hasher): New type.
            (const_rtx_data_hasher::hash, const_rtx_data_hasher::equal): New
            methods.
            (optimize_constant_pool): New function.
            (output_shared_constant_pool): Call it if TARGET_SUPPORTS_ALIASES.
