https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102974
--- Comment #7 from cqwrteur <unlvsur at live dot com> ---
(In reply to cqwrteur from comment #6)
> (In reply to Andrew Pinski from comment #5)
> > (In reply to cqwrteur from comment #4)
> > > (In reply to cqwrteur from comment #3)
> > > > (In reply to Andrew Pinski from comment #2)
> > > > > There might be another bug about _addcarryx_u64 already.
> > > >
> > > > This is 32 bit addcarry.
> > >
> > > but yeah. GCC does not perform optimizations very well to add carries and
> > > mul + recognize >>64u <<64u patterns
> >
> > I mean all of _addcarryx_* intrinsics.
>
> https://godbolt.org/z/qq3nb49Eq
> https://godbolt.org/z/cqoYG35jx
> Also this is weird. just extract part of code into function generates
> different assembly for __builtin_bit_cast. It must be a inliner bug.

My fault for misreading.

(In reply to Andrew Pinski from comment #5)
> (In reply to cqwrteur from comment #4)
> > (In reply to cqwrteur from comment #3)
> > > (In reply to Andrew Pinski from comment #2)
> > > > There might be another bug about _addcarryx_u64 already.
> > >
> > > This is 32 bit addcarry.
> >
> > but yeah. GCC does not perform optimizations very well to add carries and
> > mul + recognize >>64u <<64u patterns
>
> I mean all of _addcarryx_* intrinsics.

This example is also interesting: -O2, -O3, and -Ofast generate much worse
assembly than -O1. There is no point in using SIMD for code like this.
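For readers unfamiliar with the two patterns discussed above (a chain of 32-bit add-with-carry intrinsics, and a 64x64->128 multiply whose halves are recovered with `>> 64`), here is a minimal sketch in C. It assumes GCC or Clang; on x86 it uses `_addcarry_u32` from `<immintrin.h>`, elsewhere it falls back to `__builtin_add_overflow`. The helpers `add_u32_limbs` and `mul_u64_wide` are hypothetical and are not the code from the linked Compiler Explorer examples.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h> /* _addcarry_u32 */
#endif

/* Hypothetical helper (not from the bug report): r = a + b over n
 * little-endian 32-bit limbs. Returns the carry out of the top limb. */
unsigned char add_u32_limbs(uint32_t *r, const uint32_t *a,
                            const uint32_t *b, size_t n)
{
    unsigned char carry = 0;
    for (size_t i = 0; i < n; ++i) {
#if defined(__x86_64__) || defined(__i386__)
        /* Ideally each call becomes one ADC with the carry kept in CF. */
        unsigned int out;
        carry = _addcarry_u32(carry, a[i], b[i], &out);
        r[i] = out;
#else
        /* Portable fallback: at most one of the two additions overflows. */
        uint32_t s;
        unsigned char c1 = __builtin_add_overflow(a[i], b[i], &s);
        unsigned char c2 = __builtin_add_overflow(s, (uint32_t)carry, &r[i]);
        carry = (unsigned char)(c1 | c2);
#endif
    }
    return carry;
}

#ifdef __SIZEOF_INT128__
/* Hypothetical helper: full 64x64->128 product split into two halves.
 * This is the ">> 64" pattern a compiler can map to a single wide MUL. */
uint64_t mul_u64_wide(uint64_t a, uint64_t b, uint64_t *hi)
{
    unsigned __int128 p = (unsigned __int128)a * b;
    *hi = (uint64_t)(p >> 64); /* high 64 bits */
    return (uint64_t)p;        /* low 64 bits */
}
#endif
```

Whether each `_addcarry_u32` call really becomes a single ADC, and whether the 128-bit product becomes a single MUL, is exactly the kind of codegen question this thread raises, so it is worth checking the assembly at several -O levels rather than assuming it.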