https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70053
--- Comment #6 from luoxhu at gcc dot gnu.org ---

"-O2 -ftree-slp-vectorize" can also generate the expected simple fmrs. The
reason is that pass_cselim transforms conditional stores into unconditional
ones fed by PHI nodes when vectorization and if-conversion are enabled
(gcc/tree-ssa-phiopt.c:2482).

pr70053.c.108t.cdce:

D256_add_finite (_Decimal128 a, _Decimal128 b, _Decimal128 c)
{
  struct TDx2_t D.2914;

  <bb 2> [local count: 1073741824]:
  if (b_4(D) == c_5(D))
    goto <bb 3>; [34.00%]
  else
    goto <bb 4>; [66.00%]

  <bb 3> [local count: 365072224]:
  D.2914.td0 = c_5(D);
  D.2914.td1 = c_5(D);
  goto <bb 5>; [100.00%]

  <bb 4> [local count: 708669601]:
  D.2914.td0 = a_3(D);
  D.2914.td1 = b_4(D);

  <bb 5> [local count: 1073741824]:
  return D.2914;

}

=>

pr70053.c.109t.cselim:

D256_add_finite (_Decimal128 a, _Decimal128 b, _Decimal128 c)
{
  struct TDx2_t D.2914;
  _Decimal128 cstore_10;
  _Decimal128 cstore_11;

  <bb 2> [local count: 1073741824]:
  if (b_4(D) == c_5(D))
    goto <bb 4>; [34.00%]
  else
    goto <bb 3>; [66.00%]

  <bb 3> [local count: 708669601]:

  <bb 4> [local count: 1073741824]:
  # cstore_10 = PHI <c_5(D)(2), a_3(D)(3)>
  # cstore_11 = PHI <c_5(D)(2), b_4(D)(3)>
  D.2914.td1 = cstore_11;
  D.2914.td0 = cstore_10;
  return D.2914;

}

Then, at the expand pass, a PHI such as
"cstore_10 = PHI <c_5(D)(2), a_3(D)(3)>" is expanded to a move for
"-O2 -ftree-slp-vectorize". If no such PHI is generated, bb3 and bb4 in
pr70053.c.108t.cdce are expanded to STORE/LOAD with TD->DI conversion, which
ends up producing a lot of store/load conversions.
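For readers less used to GIMPLE, the two dumps above correspond roughly to
the two C shapes below. This is only an illustrative sketch, not the PR's
test case: pair_t, cond_store, sunk_store and check_equivalent are
hypothetical names, and double stands in for _Decimal128 (an assumption made
so the sketch builds on targets without decimal floating point). The local
names cstore_10 and cstore_11 are borrowed from the cselim dump.

```c
#include <assert.h>

/* Hypothetical stand-in for struct TDx2_t from the PR; double replaces
   _Decimal128 purely for portability of this sketch. */
typedef struct { double td0; double td1; } pair_t;

/* Shape seen in the .cdce dump: each arm of the branch stores to the
   aggregate, so the stores are conditional. */
pair_t cond_store(double a, double b, double c)
{
    pair_t r;
    if (b == c) {
        r.td0 = c;
        r.td1 = c;
    } else {
        r.td0 = a;
        r.td1 = b;
    }
    return r;
}

/* Shape seen in the .cselim dump: the values are selected first (the
   role played by the PHI nodes), then stored once, unconditionally. */
pair_t sunk_store(double a, double b, double c)
{
    double cstore_10 = (b == c) ? c : a;  /* PHI <c_5(D)(2), a_3(D)(3)> */
    double cstore_11 = (b == c) ? c : b;  /* PHI <c_5(D)(2), b_4(D)(3)> */
    pair_t r;
    r.td1 = cstore_11;
    r.td0 = cstore_10;
    return r;
}

/* Asserts that both shapes produce the same result for given inputs. */
void check_equivalent(double a, double b, double c)
{
    pair_t x = cond_store(a, b, c);
    pair_t y = sunk_store(a, b, c);
    assert(x.td0 == y.td0);
    assert(x.td1 == y.td1);
}
```

Both functions compute the same result; the difference lies only in where
the stores sit relative to the branch. Whether GCC actually performs this
sinking on the double variant depends on the compiler version, target and
flags, so the sketch shows the idea rather than a guaranteed outcome.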