https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114966
--- Comment #5 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
I saw pass_esra optimize BIT_FIELD_REFs of the big memory into loads from the small memory:

Created a replacement for D.161366 offset: 0, size: 64: SR.20D.170101
Created a replacement for D.161366 offset: 64, size: 64: SR.21D.170102
Created a replacement for D.161366 offset: 128, size: 64: SR.22D.170103
Created a replacement for D.161547 offset: 0, size: 256: SR.23D.170104

It changes

  _8 = BIT_FIELD_REF <MEM[(const struct _SimdWrapper *)&D.159286].D.158970._M_data, 64, 0>;
  _9 = BIT_FIELD_REF <MEM[(const struct _SimdWrapper *)&D.159286].D.158970._M_data, 64, 64>;
  _10 = BIT_FIELD_REF <MEM[(const struct _SimdWrapper *)&D.159286].D.158970._M_data, 64, 128>;
  _11 = {0, _8, _9, _10};

to

  SR.20_3 = MEM <const long unsigned int> [(struct simd *)&data];
  SR.21_13 = MEM <const long unsigned int> [(struct simd *)&data + 8B];
  SR.22_14 = MEM <const long unsigned int> [(struct simd *)&data + 16B];
  _7 = SR.20_3;
  _8 = SR.21_13;
  _9 = SR.22_14;
  _10 = {0, _7, _8, _9};

So I guess for the latter, GCC somehow can't be sure the whole 256-bit memory is valid and fails to optimize it with vec_perm_expr?
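
For illustration only, here is a rough C++ sketch of the two forms, assuming the GCC/Clang vector_size extension. All names in it (v4du, shifted_from_vector, shifted_from_scalars, lanes_equal, check_forms_agree) are hypothetical and not taken from the testcase. It only shows that both forms compute the same value while reading different amounts of memory; it does not reproduce the missed optimization itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the 256-bit _M_data member
// (GCC/Clang vector extension, four 64-bit lanes).
typedef std::uint64_t v4du __attribute__((vector_size(32)));

// Form 1: lanes are extracted from a whole 256-bit vector object,
// analogous to the three BIT_FIELD_REF statements.
void shifted_from_vector(const v4du &wrapped, v4du &out) {
    std::uint64_t a = wrapped[0];  // bits [0, 64)
    std::uint64_t b = wrapped[1];  // bits [64, 128)
    std::uint64_t c = wrapped[2];  // bits [128, 192)
    v4du r = {0, a, b, c};
    out = r;
}

// Form 2: three independent 64-bit loads from the underlying storage,
// analogous to SR.20/SR.21/SR.22. Only 24 bytes are read here.
void shifted_from_scalars(const std::uint64_t *data, v4du &out) {
    std::uint64_t a = data[0];  // &data
    std::uint64_t b = data[1];  // &data + 8B
    std::uint64_t c = data[2];  // &data + 16B
    v4du r = {0, a, b, c};
    out = r;
}

// Lane-by-lane comparison of two vectors.
bool lanes_equal(const v4du &x, const v4du &y) {
    for (int i = 0; i < 4; ++i)
        if (x[i] != y[i]) return false;
    return true;
}

// Checks that both forms agree for a 3-element source object.
void check_forms_agree(const std::uint64_t (&data)[3]) {
    v4du wrapped = {data[0], data[1], data[2], 0};  // 4th lane is padding
    v4du r1, r2;
    shifted_from_vector(wrapped, r1);
    shifted_from_scalars(data, r2);
    assert(lanes_equal(r1, r2));
}
```

In the second form the source object is only known to be 24 bytes wide, which is consistent with the guess above that a single 256-bit load plus permute cannot be proven safe.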