On Tue, Jul 27, 2021 at 6:39 PM Jakub Jelinek <ja...@redhat.com> wrote:
>
> On Tue, Jul 27, 2021 at 06:33:24PM +0800, Hongtao Liu wrote:
> > > AVX2 introduced vector >> vector shifts, but unfortunately for V{2,4}DImode
> > > it only supports logical and not arithmetic shifts, only AVX512F for
> > > V8DImode or AVX512VL for V{2,4}DImode fixed that omission.
> > > Earlier in GCC12 cycle I've committed vector >> scalar arithmetic shift
> > > emulation using various sequences, this patch handles the vector >> vector
> > > case.  No need to adjust costs, the previous cost adjustment actually
> > > covers even the vector by vector shifts.
> > > The patch emits the right arithmetic V{2,4}DImode shifts using 2 logical right
> > > V{2,4}DImode shifts (once of the original operands, once of sign mask
> > > constant by the vector shift count), xor and subtraction, on each element
> > > (long long) x >> y is done as
> > > (((unsigned long long) x >> y) ^ (0x8000000000000000ULL >> y))
> > > - (0x8000000000000000ULL >> y)
> > I'm wondering when y > 64, would the transformation still be proper.
> > Guess since it's UB, compiler can do anything.
>
> The patch is changing optabs, not something from target builtins where the
> intrinsics might make it well defined.
> In the optabs out of bound shifts (including y == 64) are UB - i386.h
> doesn't define SHIFT_COUNTS_TRUNCATED.

Thanks for the explanation, patch LGTM.

>
>         Jakub
>
--
BR,
Hongtao
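
For reference, a scalar sketch of the per-element identity quoted above.
This is only an illustration, not code from the patch: the function names
(sra64_emulated, sra64_reference, sra64_self_check) are made up here, and it
assumes shift counts in the range 0..63, since larger counts are out of bounds
as discussed. The reference version avoids right-shifting a negative value by
complementing first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Arithmetic right shift built from two logical shifts, xor and subtract,
   following the quoted formula.  Hypothetical helper; valid for y in 0..63. */
static int64_t sra64_emulated(int64_t x, unsigned y)
{
    /* Sign mask shifted by the same count as the operand. */
    uint64_t sign = UINT64_C(0x8000000000000000) >> y;
    uint64_t r = (((uint64_t)x >> y) ^ sign) - sign;
    /* Converting back to signed is implementation-defined before C23;
       GCC wraps modulo 2^64, which is what this sketch assumes. */
    return (int64_t)r;
}

/* Reference result: for negative x, ~x is non-negative, so shifting it
   is well defined and complementing again restores the sign bits. */
static int64_t sra64_reference(int64_t x, unsigned y)
{
    return x < 0 ? ~(~x >> y) : (x >> y);
}

/* Compare both versions on a few sample values for every count 0..63. */
static void sra64_self_check(void)
{
    static const int64_t samples[] = {
        0, 1, -1, 5, -5, -8, INT64_MAX, INT64_MIN,
        INT64_C(0x123456789abcdef0)
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        for (unsigned y = 0; y < 64; y++)
            assert(sra64_emulated(samples[i], y) ==
                   sra64_reference(samples[i], y));
}
```

Calling sra64_self_check() from a small main() is enough to exercise it. The
xor clears a set sign bit in the shifted value and the subtraction then borrows
through the upper bits, which fills them with ones; for non-negative inputs the
xor and the subtraction cancel out.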