https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103750
--- Comment #22 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuho...@gcc.gnu.org>:

https://gcc.gnu.org/g:cdfa5fe03512f7ac5a293480f634df68fc973060

commit r16-1298-gcdfa5fe03512f7ac5a293480f634df68fc973060
Author: liuhongt <hongtao....@intel.com>
Date:   Tue Jun 3 14:12:23 2025 +0800

    Also handle avx512 kmask & immediate 15 or 3 when VF is 4/2.

    Like r16-105-g599bca27dc37b3, the patch handles redundant clean up of
    upper-bits for maskload, i.e.

    Successfully matched this instruction:
    (set (reg:V4DF 175)
        (vec_merge:V4DF (unspec:V4DF [
                    (mem:V4DF (plus:DI (reg/v/f:DI 155 [ b ])
                            (reg:DI 143 [ ivtmp.56 ])) [1 S32 A64])
                ] UNSPEC_MASKLOAD)
            (const_vector:V4DF [
                    (const_double:DF 0.0 [0x0.0p+0]) repeated x4
                ])
            (and:QI (reg:QI 125 [ mask__29.16 ])
                (const_int 15 [0xf]))))

    For maskstore, it looks like it's already optimal (at least I can't make
    a testcase), so the patch only handles maskload.

    gcc/ChangeLog:

            PR target/103750
            * config/i386/i386.cc (ix86_rtx_costs): Adjust rtx_cost for
            maskload.
            * config/i386/sse.md (*<avx512>_load<mode>mask_and15): New
            define_insn_and_split.
            (*<avx512>_load<mode>mask_and3): Ditto.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/avx512f-pr103750-3.c: New test.
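
To illustrate why the `and` with 15 in the matched pattern can be dropped, here is a minimal scalar sketch in C. It is not the code from the patch or from avx512f-pr103750-3.c. It assumes a simplified model of a zero-masking V4DF load in which only the low four bits of the 8-bit kmask select lanes, and the names `model_maskload_v4df` and `and15_is_redundant` are hypothetical, introduced only for this example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a zero-masking load with 4 lanes (V4DF).
   Assumption: only bits 0..3 of the mask select lanes, higher bits are
   ignored, and unselected lanes are set to 0.0.  */
static void
model_maskload_v4df (double dst[4], const double *src, uint8_t k)
{
  for (int lane = 0; lane < 4; lane++)
    dst[lane] = ((k >> lane) & 1) ? src[lane] : 0.0;
}

/* Return 1 if and-ing the mask with 15 never changes the loaded value
   for any of the 256 possible QImode mask values, 0 otherwise.  */
static int
and15_is_redundant (const double *src)
{
  for (unsigned k = 0; k < 256; k++)
    {
      double a[4], b[4];
      model_maskload_v4df (a, src, (uint8_t) k);
      model_maskload_v4df (b, src, (uint8_t) (k & 15));
      if (memcmp (a, b, sizeof a) != 0)
        return 0;
    }
  return 1;
}
```

Under this model the masked and unmasked forms agree for every mask value, which is the property the new `mask_and15` pattern relies on. The same reasoning would apply with a two-lane model and a mask of 3 for the `mask_and3` case.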