https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67530
            Bug ID: 67530
           Summary: Failure to eliminate dead code produced by vector lowering
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wschmidt at gcc dot gnu.org
                CC: bergner at gcc dot gnu.org, rguenth at gcc dot gnu.org
  Target Milestone: ---
              Host: powerpc64le-unknown-linux-gnu
            Target: powerpc64le-unknown-linux-gnu
             Build: powerpc64le-unknown-linux-gnu

Test case gcc.dg/fold-compare-7.c is as follows:

/* { dg-do compile } */
/* { dg-options "-O2" } */

typedef float vecf __attribute__((vector_size(8*sizeof(float))));

long f(vecf *f1, vecf *f2){
  return ((*f1 == *f2) < 0)[2];
}

The initial GIMPLE code is simple (compiled with -O2):

f (vecf * f1, vecf * f2)
{
  long int D.2301;
  vector(8) int D.2299;
  vector(8) float D.2302;
  vector(8) float D.2303;
  vector(8) int D.2304;
  int D.2305;

  D.2302 = *f1;
  D.2303 = *f2;
  D.2304 = D.2302 == D.2303;
  D.2299 = D.2304;
  D.2305 = BIT_FIELD_REF <D.2299, 32, 64>;
  D.2301 = (long int) D.2305;
  return D.2301;
}

However, the vector lowering code expands this so we have a lot of comparisons
and BIT_FIELD_REF expressions, most of which turn out to not be needed.  The
optimized tree dump shows:

  <bb 2>:
  _3 = *f1_2(D);
  _5 = *f2_4(D);
  _9 = BIT_FIELD_REF <_3, 32, 0>;
  _10 = BIT_FIELD_REF <_5, 32, 0>;
  _11 = _9 == _10 ? -1 : 0;
  _12 = BIT_FIELD_REF <_3, 32, 32>;
  _13 = BIT_FIELD_REF <_5, 32, 32>;
  _14 = _12 == _13 ? -1 : 0;
  _15 = BIT_FIELD_REF <_3, 32, 64>;
  _16 = BIT_FIELD_REF <_5, 32, 64>;
  _17 = _15 == _16 ? -1 : 0;
  _18 = BIT_FIELD_REF <_3, 32, 96>;
  _19 = BIT_FIELD_REF <_5, 32, 96>;
  _20 = _18 == _19 ? -1 : 0;
  _21 = BIT_FIELD_REF <_3, 32, 128>;
  _22 = BIT_FIELD_REF <_5, 32, 128>;
  _23 = _21 == _22 ? -1 : 0;
  _24 = BIT_FIELD_REF <_3, 32, 160>;
  _25 = BIT_FIELD_REF <_5, 32, 160>;
  _26 = _24 == _25 ? -1 : 0;
  _27 = BIT_FIELD_REF <_3, 32, 192>;
  _28 = BIT_FIELD_REF <_5, 32, 192>;
  _29 = _27 == _28 ? -1 : 0;
  _30 = BIT_FIELD_REF <_3, 32, 224>;
  _31 = BIT_FIELD_REF <_5, 32, 224>;
  _32 = _30 == _31 ? -1 : 0;
  _6 = {_11, _14, _17, _20, _23, _26, _29, _32};
  _7 = _17;
  _8 = (long int) _17;
  return _8;

Note that the only instructions in here that matter are:

  _3 = *f1_2(D);
  _5 = *f2_4(D);
  _15 = BIT_FIELD_REF <_3, 32, 64>;
  _16 = BIT_FIELD_REF <_5, 32, 64>;
  _17 = _15 == _16 ? -1 : 0;
  _8 = (long int) _17;
  return _8;

We end up generating really horrible code for this.  The middle end should be
able to detect that _6 is dead and clean up the rest of this.  We don't even
get rid of the dead copy _7 = _17.

I haven't looked at this carefully, but presumably this is due to the late
running of pass_lower_vector.  Perhaps running DCE again would be appropriate
if pass_lower_vector makes any changes?
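
For reference, the reduced instruction list above corresponds roughly to the scalar source below. This is only a sketch for comparison: f_lane2 and check_same are hypothetical names that are not part of the test case, and it assumes the GCC vector extension semantics where a true comparison lane is -1 and a false lane is 0.

```c
#include <assert.h>

typedef float vecf __attribute__((vector_size(8 * sizeof(float))));

/* The function from gcc.dg/fold-compare-7.c, unchanged. */
long f(vecf *f1, vecf *f2)
{
  return ((*f1 == *f2) < 0)[2];
}

/* Hypothetical scalar equivalent: only element 2 (bit offset 64) of each
   operand contributes to the result, matching _15, _16 and _17 above.  */
long f_lane2(const vecf *f1, const vecf *f2)
{
  return (*f1)[2] == (*f2)[2] ? -1 : 0;
}

/* Hypothetical helper: both versions should agree for any inputs.  */
void check_same(vecf *a, vecf *b)
{
  assert(f(a, b) == f_lane2(a, b));
}
```

Ideally the code generated for f would be no worse than what is generated for f_lane2: two loads (or two element loads), one compare, and a select.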