https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88510
--- Comment #2 from Devin Hussey <husseydevin at gmail dot com> ---
Update: I did the calculations, and twomul has the same cycle count as goodmul_sse. vmul.i32 with 128-bit operands takes 4 cycles (I had assumed two), so twomul takes 11 cycles, the same as goodmul_sse.
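
For readers without the earlier comments at hand: the definitions of twomul and goodmul_sse are not repeated here, so the following is only a scalar sketch of the arithmetic that a 64-bit multiply built from 32-bit multiplies has to perform, assuming the usual split into high and low halves. The function name mul64_via_32 is hypothetical and is not the code from this report.

```c
#include <stdint.h>

/* Hypothetical scalar model of one 64-bit lane. This is an illustration
 * of the arithmetic only, not twomul or goodmul_sse from the report. */
static uint64_t mul64_via_32(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    /* The cross products are shifted left by 32 afterwards, so only
     * their low 32 bits matter; the 32-bit sum may wrap harmlessly. */
    uint32_t cross = a_lo * b_hi + a_hi * b_lo;

    /* Widening 32x32 -> 64 product of the low halves. */
    uint64_t low = (uint64_t)a_lo * b_lo;

    /* a_hi * b_hi would land entirely above bit 63, so it is dropped. */
    return low + ((uint64_t)cross << 32);
}
```

Because the cross products only need 32-bit results, a non-widening 32-bit vector multiply is enough for that step, while the low halves need a widening multiply.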