https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67438
--- Comment #11 from Yuri Rumyantsev <ysrumyan at gmail dot com> ---

In fact, the problem is quite different, although it is caused by the non-profitable pattern matching ~X CMP ~Y -> Y CMP X. In general this pattern may be helpful if we can delete the not operations, e.g.

    x1 = ~x;
    y1 = ~y;
    if (x1 <cmp> y1) ...

and there are no other uses of x1 and y1, i.e. x1 and y1 each have a single use. But if this is not true, we will increase register pressure, since we cannot use the same register for x, x1 and for y, y1.

Richard proposed using the same simplification for min/max operations, but in the original test case the nested min/max operation (min(x, min(y, z))) or multi-operand min/max (min(x, y, z)) is not recognized by gcc (note that icc does perform such a transformation), so this won't help, since we have the same register pressure issue:

    c = ~r;
    m = ~g;
    y = ~b;
    k = min(c, m, y);
    *out++ = c - k;
    *out++ = m - k;
    *out++ = y - k;
    *out++ = k;

We can see that the value of 'c' is used both in the min computation and in the resulting store, so if we use the r <cmp> g comparison we will extend the live ranges of the r, g, b variables, and additional registers will be required for them (until the comparison).

Note also that there is another issue with path-splitting (aka tail duplication), which duplicates the loop back edge and in fact moves the tail block into the hammock. This transformation does not look useful (at least at the given stage of design), but this is another topic for discussion.

I'd like to propose introducing a new predicate for pattern matching which tells us how many uses the left-hand side of ~x has.
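For reference, here is a minimal C sketch of the two things discussed above: the identity behind the ~X CMP ~Y -> Y CMP X pattern, and the shape of the code where the inverted values have more than one use. It assumes 8-bit unsigned channels. The helper names (cmp_of_nots, cmp_swapped, min3, rgb_to_cmyk_pixel) are made up for illustration and are not taken from the original test case or from gcc.

```c
#include <stdint.h>

/* Compare the inverted values: the form written in the source. */
static int cmp_of_nots(uint8_t x, uint8_t y)
{
    uint8_t x1 = (uint8_t)~x;   /* 255 - x */
    uint8_t y1 = (uint8_t)~y;   /* 255 - y */
    return x1 < y1;
}

/* Swapped comparison on the original values: the form the pattern produces. */
static int cmp_swapped(uint8_t x, uint8_t y)
{
    return y < x;
}

/* Three-operand minimum written as nested two-operand minimums. */
static uint8_t min3(uint8_t a, uint8_t b, uint8_t c)
{
    uint8_t ab = a < b ? a : b;
    return ab < c ? ab : c;
}

/* One pixel of the conversion quoted above. c, m and y are used both in
   the min computation and in the stores, so the not operations cannot be
   deleted; comparing r, g, b directly would keep them live as well. */
static uint8_t *rgb_to_cmyk_pixel(uint8_t r, uint8_t g, uint8_t b,
                                  uint8_t *out)
{
    uint8_t c = (uint8_t)~r;
    uint8_t m = (uint8_t)~g;
    uint8_t y = (uint8_t)~b;
    uint8_t k = min3(c, m, y);

    *out++ = (uint8_t)(c - k);
    *out++ = (uint8_t)(m - k);
    *out++ = (uint8_t)(y - k);
    *out++ = k;
    return out;
}
```

cmp_of_nots and cmp_swapped agree for every pair of inputs, so the rewrite is always valid; whether it is profitable depends only on whether x1 and y1 have other uses, as in rgb_to_cmyk_pixel.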