https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110235
Bug ID: 110235
Summary: Wrong use of us_truncate in SSE and AVX RTL representation
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: wrong-code
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: ktkachov at gcc dot gnu.org
CC: uros at gcc dot gnu.org
Target Milestone: ---
Target: x86
After g:921b841350c4fc298d09f6c5674663e0f4208610 added constant folding for
SS_TRUNCATE and US_TRUNCATE, some tests in i386.exp started failing:
FAIL: gcc.target/i386/avx-vpackuswb-1.c execution test
FAIL: gcc.target/i386/avx2-vpackssdw-2.c execution test
FAIL: gcc.target/i386/avx2-vpackusdw-2.c execution test
FAIL: gcc.target/i386/avx2-vpackuswb-2.c execution test
FAIL: gcc.target/i386/sse2-packuswb-1.c execution test
From what I can gather from the documentation for intrinsics like
_mm_packus_epi16, the operation they perform is not what we model as us_truncate
in RTL. That is, they don't perform a truncation while treating their input as
an unsigned value. Rather, they treat the input as a signed value and saturate
it to the unsigned min and max of the narrow mode before truncation. In that
regard they seem similar to the SQMOVUN instructions in aarch64.
I think it'd be best to change the representation of those instructions to a
truncating clamp operation, similar to
g:b747f54a2a930da55330c2861cd1e344f67a88d9 in aarch64.