https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679
--- Comment #7 from Tejas Belagod <belagod at gcc dot gnu.org> ---
I tried this, but it still doesn't seem to fold for aarch64.

So, here is the DOM trace for aarch64:

Optimizing statement a = *.LC0;
LKUP STMT a = *.LC0 with .MEM_3(D)
LKUP STMT *.LC0 = a with .MEM_3(D)

Optimizing statement vectp_a.5_1 = &a;
LKUP STMT vectp_a.5_1 = &a
==== ASGN vectp_a.5_1 = &a

Optimizing statement vect__6.6_13 = MEM[(int *)vectp_a.5_1];
Replaced 'vectp_a.5_1' with constant '&aD.2604'
LKUP STMT vect__6.6_13 = MEM[(int *)&a] with .MEM_4
2>>> STMT vect__6.6_13 = MEM[(int *)&a] with .MEM_4

Optimizing statement vect_sum_7.7_6 = vect__6.6_13;
LKUP STMT vect_sum_7.7_6 = vect__6.6_13
==== ASGN vect_sum_7.7_6 = vect__6.6_13

Optimizing statement vectp_a.4_7 = vectp_a.5_1 + 16;
Replaced 'vectp_a.5_1' with constant '&aD.2604'
LKUP STMT vectp_a.4_7 = &a pointer_plus_expr 16
2>>> STMT vectp_a.4_7 = &a pointer_plus_expr 16
==== ASGN vectp_a.4_7 = &MEM[(void *)&a + 16B]

Optimizing statement ivtmp_8 = 1;
LKUP STMT ivtmp_8 = 1
==== ASGN ivtmp_8 = 1

Optimizing statement vect__6.6_10 = MEM[(int *)vectp_a.4_7];
Replaced 'vectp_a.4_7' with constant '&MEM[(voidD.39 *)&aD.2604 + 16B]'
Folded to: vect__6.6_10 = MEM[(int *)&a + 16B];
LKUP STMT vect__6.6_10 = MEM[(int *)&a + 16B] with .MEM_4
2>>> STMT vect__6.6_10 = MEM[(int *)&a + 16B] with .MEM_4

Optimizing statement vect_sum_7.7_17 = vect_sum_7.7_6 + vect__6.6_10;
Replaced 'vect_sum_7.7_6' with variable 'vect__6.6_13'
gimple_simplified to vect_sum_7.7_17 = vect__6.6_10 + vect__6.6_13;
Folded to: vect_sum_7.7_17 = vect__6.6_10 + vect__6.6_13;
LKUP STMT vect_sum_7.7_17 = vect__6.6_10 plus_expr vect__6.6_13
2>>> STMT vect_sum_7.7_17 = vect__6.6_10 plus_expr vect__6.6_13
...

In x86's case, by this time, the constant vectors have been propagated and folded into a constant vector:

Optimizing statement vect_cst_.12_23 = { 0, 1, 2, 3 };
LKUP STMT vect_cst_.12_23 = { 0, 1, 2, 3 }
==== ASGN vect_cst_.12_23 = { 0, 1, 2, 3 }

Optimizing statement vect_cst_.11_32 = { 4, 5, 6, 7 };
LKUP STMT vect_cst_.11_32 = { 4, 5, 6, 7 }
==== ASGN vect_cst_.11_32 = { 4, 5, 6, 7 }

Optimizing statement vectp.14_2 = &a[0];
LKUP STMT vectp.14_2 = &a[0]
==== ASGN vectp.14_2 = &a[0]

Optimizing statement MEM[(int *)vectp.14_2] = vect_cst_.12_23;
Replaced 'vectp.14_2' with constant '&aD.1831[0]'
Replaced 'vect_cst_.12_23' with constant '{ 0, 1, 2, 3 }'
Folded to: MEM[(int *)&a] = { 0, 1, 2, 3 };
LKUP STMT MEM[(int *)&a] = { 0, 1, 2, 3 } with .MEM_3(D)
LKUP STMT { 0, 1, 2, 3 } = MEM[(int *)&a] with .MEM_3(D)
LKUP STMT { 0, 1, 2, 3 } = MEM[(int *)&a] with .MEM_25
2>>> STMT { 0, 1, 2, 3 } = MEM[(int *)&a] with .MEM_25

Optimizing statement vectp.14_21 = vectp.14_2 + 16;
Replaced 'vectp.14_2' with constant '&aD.1831[0]'
LKUP STMT vectp.14_21 = &a[0] pointer_plus_expr 16
2>>> STMT vectp.14_21 = &a[0] pointer_plus_expr 16
==== ASGN vectp.14_21 = &MEM[(void *)&a + 16B]

Optimizing statement MEM[(int *)vectp.14_21] = vect_cst_.11_32;
Replaced 'vectp.14_21' with constant '&MEM[(voidD.41 *)&aD.1831 + 16B]'
Replaced 'vect_cst_.11_32' with constant '{ 4, 5, 6, 7 }'
Folded to: MEM[(int *)&a + 16B] = { 4, 5, 6, 7 };
LKUP STMT MEM[(int *)&a + 16B] = { 4, 5, 6, 7 } with .MEM_25
LKUP STMT { 4, 5, 6, 7 } = MEM[(int *)&a + 16B] with .MEM_25
LKUP STMT { 4, 5, 6, 7 } = MEM[(int *)&a + 16B] with .MEM_19
2>>> STMT { 4, 5, 6, 7 } = MEM[(int *)&a + 16B] with .MEM_19

Optimizing statement vectp_a.5_22 = &a;
LKUP STMT vectp_a.5_22 = &a
==== ASGN vectp_a.5_22 = &a

Optimizing statement vect__13.6_20 = MEM[(int *)vectp_a.5_22];
Replaced 'vectp_a.5_22' with constant '&aD.1831'
LKUP STMT vect__13.6_20 = MEM[(int *)&a] with .MEM_19
FIND: { 0, 1, 2, 3 }
Replaced redundant expr '# VUSE <.MEM_19>
MEM[(intD.6 *)&aD.1831]' with '{ 0, 1, 2, 3 }'
==== ASGN vect__13.6_20 = { 0, 1, 2, 3 }

Optimizing statement vect_sum_14.7_13 = vect__13.6_20;
Replaced 'vect__13.6_20' with constant '{ 0, 1, 2, 3 }'
LKUP STMT vect_sum_14.7_13 = { 0, 1, 2, 3 }
==== ASGN vect_sum_14.7_13 = { 0, 1, 2, 3 }
....

On aarch64, while MEM[vect_ptr + CST] is correctly rewritten in terms of 'a', DOM doesn't seem to figure out that the literal pool load 'a = *.LC0' is nothing but
vect_cst_.12_23 = { 0, 1, 2, 3 };
and
vect_cst_.11_32 = { 4, 5, 6, 7 };
That is the only major difference between how the const vector is initialized on x86 and on aarch64.

Is DOM not able to understand 'a = *.LC0'?
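
For readers following the traces, here is a minimal sketch of the kind of source that could produce them. It is an assumption reconstructed from the dumps (an 8-element const int array 'a' holding 0..7 that is summed in a vectorizable loop), not the testcase quoted in this comment, and the function name sum_const_array is hypothetical.

```c
/* Hypothetical reproducer sketch; the names and shape are assumptions
   inferred from the DOM traces, not copied from the bug's testcase.  */
int
sum_const_array (void)
{
  /* Per the traces, aarch64 initializes 'a' with a single aggregate copy
     from the literal pool (a = *.LC0), while x86 initializes it with two
     vector constant stores ({ 0, 1, 2, 3 } and { 4, 5, 6, 7 }).  */
  const int a[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };
  int sum = 0;

  /* Loads from 'a' in this loop are what DOM would need to replace with
     the known constants for the whole function to fold.  */
  for (unsigned i = 0; i < sizeof (a) / sizeof (*a); i++)
    sum += a[i];

  return sum;
}
```

If the loads from 'a' are resolved to constants, the whole function can fold to a constant return of 28. Traces of the kind shown above come from a details dump of the DOM pass (for example -fdump-tree-dom-details, though the exact pass name and numbering depend on the GCC version).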