https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110235

            Bug ID: 110235
           Summary: Wrong use of us_truncate in SSE and AVX RTL
                    representation
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: wrong-code
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ktkachov at gcc dot gnu.org
                CC: uros at gcc dot gnu.org
  Target Milestone: ---
            Target: x86

After g:921b841350c4fc298d09f6c5674663e0f4208610 added constant-folding for
SS_TRUNCATE and US_TRUNCATE some tests in i386.exp started failing:
FAIL: gcc.target/i386/avx-vpackuswb-1.c execution test
FAIL: gcc.target/i386/avx2-vpackssdw-2.c execution test
FAIL: gcc.target/i386/avx2-vpackusdw-2.c execution test
FAIL: gcc.target/i386/avx2-vpackuswb-2.c execution test
FAIL: gcc.target/i386/sse2-packuswb-1.c execution test

>From what I can gather from the documentation for intrinsics like
_mm_packus_epi16 the operation they perform is not what we model as us_truncate
in RTL. That is, they don't perform a truncation while treating their input as
an unsigned value. Rather, they treat the input as a signed value and saturate
it to the unsigned min and max of the narrow mode before truncation. In that
regard they seem similar to the SQMOVUN instructions in aarch64.

I think it'd be best to change the representation of those instructions to a
truncating clamp operation, similar to
g:b747f54a2a930da55330c2861cd1e344f67a88d9 in aarch64.

Reply via email to