https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97019
Bug ID: 97019
Summary: rs6000:redundant rldicr fed to lvx/stvx
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: linkw at gcc dot gnu.org
Target Milestone: ---

When we do the early expansion of the AltiVec built-in functions vec_ld/vec_st, we may leave behind some redundant rldicr x,y,0,59 instructions, whose purpose is to AND the vector access address with -16. Since lvx/stvx already ignore the low four bits of the effective address (that is, they AND it with -16 themselves), these instructions are useless.

===== test case ====

extern int a, b, c;
extern vector unsigned long long ev5, ev6, ev7, ev8;

int test(unsigned char *pe) {
  vector unsigned long long v1, v2, v3, v4, v9;
  vector unsigned long long v5 = ev5;
  vector unsigned long long v6 = ev6;
  vector unsigned long long v7 = ev7;
  vector unsigned long long v8 = ev8;

  unsigned char *e = pe;

  do {
    if (a) {
      asm("memory");
      v1 = __builtin_vec_ld(16, (unsigned long long *)e);
      v2 = __builtin_vec_ld(32, (unsigned long long *)e);
      v3 = __builtin_vec_ld(48, (unsigned long long *)e);
      e = e + 8;
      for (int i = 0; i < a; i++) {
        v4 = v5;
        v5 = __builtin_crypto_vpmsumd(v1, v6);
        v6 = __builtin_crypto_vpmsumd(v2, v7);
        v7 = __builtin_crypto_vpmsumd(v3, v8);
        e = e + 8;
      }
    }
    v5 = __builtin_vec_ld(16, (unsigned long long *)e);
    v6 = __builtin_vec_ld(32, (unsigned long long *)e);
    v7 = __builtin_vec_ld(48, (unsigned long long *)e);
    if (c)
      b = 1;
  } while (b);

  v9 = v4;
  int p = __builtin_unpack_vector_int128((vector __int128_t)v9, 0);
  return p;
}

==== command ====

-m64 -O2 -mcpu=power8

Currently the function find_alignment_op in the RTL swaps pass only handles the case where the address has one single definition that is an AND operation. We can extend it to check that all definitions are AND operations with -16 (16-byte alignment).
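To illustrate why the extra mask is redundant and what the extended check would need to establish, here is a minimal sketch in plain C. It is a model only, not GCC code: rldicr_0_59, lvx_effective_address, struct addr_def and all_defs_align_to_16 are hypothetical names introduced for this example, and the real find_alignment_op works on RTL definitions rather than on a struct like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model of rldicr x,y,0,59: rotate by 0 and keep bits 0..59 (IBM bit
   numbering), which clears the low four bits, i.e. y & -16.  */
static uint64_t rldicr_0_59(uint64_t addr) { return addr & ~UINT64_C(15); }

/* Model of the address lvx/stvx actually access: the low four bits of
   the effective address are ignored by the instruction itself.  */
static uint64_t lvx_effective_address(uint64_t ea) { return ea & ~UINT64_C(15); }

/* Masking first never changes what lvx/stvx access.  */
static void check_mask_is_redundant(uint64_t addr) {
  assert(lvx_effective_address(rldicr_0_59(addr)) == lvx_effective_address(addr));
}

/* Hypothetical stand-in for one reaching definition of the address.  */
struct addr_def {
  bool is_and;   /* definition is an AND operation */
  int64_t mask;  /* constant operand of that AND */
};

/* Assumed shape of the extended check: succeed only if every reaching
   definition is an AND with -16, not just a single one.  */
static bool all_defs_align_to_16(const struct addr_def *defs, size_t n) {
  if (n == 0)
    return false;
  for (size_t i = 0; i < n; i++)
    if (!defs[i].is_and || defs[i].mask != -16)
      return false;
  return true;
}
```

In the test case above, the address used by the loads after the if (a) block can come from more than one path, which is presumably why a single-definition check does not cover it; the sketch requires every path to have applied the -16 mask.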