https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393
Hongtao.liu <crazylht at gmail dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com
--- Comment #5 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #3)
> (In reply to H.J. Lu from comment #2)
> > (In reply to Richard Biener from comment #1)
> > > It isn't the vectorizer but memmove inline expansion. I'm not sure it's
> > > really a bug, but there isn't a way to disable %ymm use besides disabling
> > > AVX entirely.
> > > HJ?
> >
> > YMM move is generated by loop distribution which doesn't check
> > TARGET_PREFER_AVX128.
>
> I think it's generated by gimple_fold_builtin_memory_op which, since Richard's
> changes, now accepts bigger sizes, up to MOVE_MAX * MOVE_RATIO, and ends up
> picking an integer mode via
>
> scalar_int_mode mode;
> if (int_mode_for_size (ilen * 8, 0).exists (&mode)
>     && GET_MODE_SIZE (mode) * BITS_PER_UNIT == ilen * 8
>     && have_insn_for (SET, mode)
>     /* If the destination pointer is not aligned we must be able
>        to emit an unaligned store.  */
>     && (dest_align >= GET_MODE_ALIGNMENT (mode)
>         || !targetm.slow_unaligned_access (mode, dest_align)
>         || (optab_handler (movmisalign_optab, mode)
>             != CODE_FOR_nothing)))
>
> not sure if there's another way to validate things.
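
For context, here is a minimal user-level sketch of the kind of copy this
folding handles (my own illustration, not the testcase from this PR): with
AVX enabled, a fixed-size 32-byte memcpy presumably satisfies the check
above via a 256-bit integer mode and is then emitted as a %ymm move, even
when 128-bit vectors are preferred.

/* Hypothetical reproducer sketch, not taken from this PR.  */
#include <string.h>

void
copy32 (char *dst, const char *src)
{
  memcpy (dst, src, 32);
}
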
For a single set operation, shouldn't the total size be limited to MOVE_MAX
instead of MOVE_MAX * MOVE_RATIO?
  /* If we can perform the copy efficiently with first doing all loads and
     then all stores inline it that way.  Currently efficiently means that
     we can load all the memory with a single set operation and that the
     total size is less than MOVE_MAX * MOVE_RATIO.  */
  src_align = get_pointer_alignment (src);
  dest_align = get_pointer_alignment (dest);
  if (tree_fits_uhwi_p (len)
      && (compare_tree_int
          (len, (MOVE_MAX
                 * MOVE_RATIO (optimize_function_for_size_p (cfun))))
          <= 0)
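
A sketch of the tightened check I have in mind (my illustration only, not an
actual patch for this PR), replacing only the bound in the fragment above
with what a single set operation can move:

  /* Bound the inlined copy by one set operation, i.e. MOVE_MAX,
     instead of MOVE_MAX * MOVE_RATIO.  */
  if (tree_fits_uhwi_p (len)
      && compare_tree_int (len, MOVE_MAX) <= 0
      /* ... remaining conditions unchanged ... */)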