https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91201
--- Comment #21 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Jakub Jelinek from comment #20)
> (In reply to Joel Yliluoma from comment #19)
> > If the function return type is changed to "unsigned short", the AVX code
> > with "vpextrb" will do a spurious "movzx eax, al" at the end, but if the
> > return type is "unsigned int", it will not. The code with "(v)movd" should
> > of course do it, if the vector element size is shorter than the return type.
>
> With movd there is a non-redundant movzxl %al, %eax after the movd in both
> unsigned short and unsigned int cases. For {,v}pextrb there is a pattern
> that makes the zero extension explicit in the IL:
> (insn 28 27 29 2 (set (subreg:SI (reg:QI 87 [ stmp_r_10.10 ]) 0)
>         (zero_extend:SI (vec_select:QI (subreg:V16QI (reg:V2DI 121) 0)
>                 (parallel [
>                         (const_int 0 [0])
>                     ])))) 4165 {*vec_extractv16qi_zext}
>      (expr_list:REG_DEAD (reg:V2DI 121)
>         (nil)))
> and for unsigned int return type the combiner is able to combine that with
> the following
> (insn 29 28 34 2 (set (reg:SI 122 [ stmp_r_10.10 ])
>         (zero_extend:SI (reg:QI 87 [ stmp_r_10.10 ]))) "pr91201-4.c":9:11
> 119 {*zero_extendqisi2}
>      (expr_list:REG_DEAD (reg:QI 87 [ stmp_r_10.10 ])
>         (nil)))
> but it isn't able to merge that for a different extension in the unsigned
> short return type.

I think an insn similar to

(define_insn "*vec_extract<PEXTR_MODE12:mode>_zext"

is missing. Like:

(define_insn "*vec_extractv16qi_zext_hi"
  [(set (match_operand:HI 0 "register_operand" "=r,r")
        (zero_extend:HI
          (vec_select:QI
            (match_operand:V16QI 1 "register_operand" "x,v")
            (parallel
              [(match_operand:SI 2 "const_0_to_15")]))))]
  "TARGET_SSE4_1"
  "@
   %vpextrb\t{%2, %1, %k0|%k0, %1, %2}
   vpextrb\t{%2, %1, %k0|%k0, %1, %2}"
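
For anyone who wants to look at the two cases side by side, here is a minimal sketch of the kind of function being discussed. It is an assumption on my part, not the actual pr91201-4.c testcase: the array name, its size and the function names are made up for illustration. The accumulator is a byte (matching the QImode stmp_r in the dumps above) and only the return type differs, so the HImode versus SImode zero extension is the only thing that changes between the two functions.

```c
#include <stddef.h>

#define N 16

/* Hypothetical input buffer; the real testcase may use a different
   name and size. */
unsigned char bytes[N];

/* Byte accumulator, widened to unsigned short on return.  This is the
   case where the extra zero extension was reported after vpextrb. */
unsigned short
sum_u16 (void)
{
  unsigned char r = 0;
  for (size_t i = 0; i < N; i++)
    r += bytes[i];      /* wraps modulo 256 */
  return r;
}

/* Same reduction, widened to unsigned int on return.  Here the combiner
   is reported to merge the extension into the extract pattern. */
unsigned int
sum_u32 (void)
{
  unsigned char r = 0;
  for (size_t i = 0; i < N; i++)
    r += bytes[i];      /* wraps modulo 256 */
  return r;
}
```

Compiling both with something like -O3 -mavx -S and comparing the end of each function should show whether the trailing movzx is still there; the exact output will depend on the GCC version.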