https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110235
Bug ID: 110235
Summary: Wrong use of us_truncate in SSE and AVX RTL representation
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: wrong-code
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: ktkachov at gcc dot gnu.org
CC: uros at gcc dot gnu.org
Target Milestone: ---
Target: x86

After g:921b841350c4fc298d09f6c5674663e0f4208610 added constant-folding for SS_TRUNCATE and US_TRUNCATE some tests in i386.exp started failing:

FAIL: gcc.target/i386/avx-vpackuswb-1.c execution test
FAIL: gcc.target/i386/avx2-vpackssdw-2.c execution test
FAIL: gcc.target/i386/avx2-vpackusdw-2.c execution test
FAIL: gcc.target/i386/avx2-vpackuswb-2.c execution test
FAIL: gcc.target/i386/sse2-packuswb-1.c execution test

From what I can gather from the documentation for intrinsics like _mm_packus_epi16, the operation they perform is not what we model as us_truncate in RTL. That is, they don't perform a truncation while treating their input as an unsigned value. Rather, they treat the input as a signed value and saturate it to the unsigned min and max of the narrow mode before truncation. In that regard they seem similar to the SQMOVUN instructions in aarch64.

I think it'd be best to change the representation of those instructions to a truncating clamp operation, similar to g:b747f54a2a930da55330c2861cd1e344f67a88d9 in aarch64.
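
To make the mismatch concrete, here is a minimal scalar sketch in C of the two behaviours for a single 16-bit to 8-bit lane. The function names us_truncate_u16 and packus_lane are hypothetical and used only for illustration; they are not GCC or Intel identifiers. The sketch assumes us_truncate treats its operand as unsigned, which is how the report describes the RTL model.

```c
#include <stdint.h>

/* Hypothetical helper: us_truncate as the report describes the RTL model.
   The 16-bit input is interpreted as UNSIGNED and saturated to the
   unsigned range of the narrow mode, [0, 255].  */
static uint8_t us_truncate_u16(uint16_t x)
{
    return x > UINT8_MAX ? UINT8_MAX : (uint8_t)x;
}

/* Hypothetical helper: one lane of a packuswb-style operation as the
   report describes it.  The 16-bit input is interpreted as SIGNED,
   clamped to [0, 255], and then truncated.  */
static uint8_t packus_lane(int16_t x)
{
    if (x < 0)
        return 0;            /* negative inputs saturate to the unsigned minimum */
    if (x > UINT8_MAX)
        return UINT8_MAX;    /* large inputs saturate to the unsigned maximum */
    return (uint8_t)x;
}
```

The two agree whenever the input lies in [0, 32767]. They differ for bit patterns with the sign bit set: the pattern 0xFFFF is 65535 when read as unsigned, so us_truncate_u16 gives 255, while the same pattern is -1 when read as signed, so packus_lane gives 0. Constant-folding the first model would therefore produce wrong results for such inputs, which would be consistent with the execution failures listed above.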