[Bug target/66369] gcc 4.8.3/5.1.0 miss optimisation with vpmovmskb

2015-06-04 Thread marcus.kool at urlfilterdb dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66369
--- Comment #8 from Marcus Kool ---
(In reply to Uroš Bizjak from comment #5)
> Created attachment 35693 [details]
> Patch to add zero-extended MOVMSK patterns
>
> This patch adds zero-extended MOVMSK patterns.
>
> However, one more cast from (

[Bug target/66369] gcc 4.8.3/5.1.0 miss optimisation with vpmovmskb

2015-06-02 Thread marcus.kool at urlfilterdb dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66369
--- Comment #4 from Marcus Kool ---
> The intrinsic returns "int", and from the above tree dump, the compiler
> won't even consider to combine the sign-extension with vpmovmskb.
>
> So, why not:
>
> unsigned int v;
>
> v = (unsigned int)

[Bug target/66369] gcc 4.8.3/5.1.0 miss optimisation with vpmovmskb

2015-06-02 Thread marcus.kool at urlfilterdb dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66369
--- Comment #3 from Marcus Kool ---
> The intrinsic returns "int", and from the above tree dump, the compiler
> won't even consider to combine the sign-extension with vpmovmskb.

That is the core of the issue: the part of gcc that deals with int
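The quoted remark about the intrinsic returning "int" can be illustrated without any vector hardware. The sketch below is not taken from the bug report: `mask_as_int`, `widen_signed` and `widen_unsigned` are hypothetical names, and `mask_as_int` is only a stand-in for a movemask-style intrinsic that hands back its 32 mask bits as a signed `int`. It shows why widening that result to 64 bits needs a sign-extension unless the value is first cast to `unsigned int`.

```c
#include <stdint.h>

/* Hypothetical stand-in for a movemask-style intrinsic: returns the 32
   mask bits as a signed int. Converting a value above INT_MAX to int is
   implementation-defined in C; GCC wraps modulo 2^32, which is assumed. */
static int mask_as_int(uint32_t bits)
{
    return (int)bits;
}

/* Widening the int directly: the compiler must sign-extend, so bit 31
   of the mask is copied into bits 32..63 of the result. */
static uint64_t widen_signed(uint32_t bits)
{
    return (uint64_t)(int64_t)mask_as_int(bits);
}

/* Casting to unsigned int first: the widening becomes a zero-extension,
   so the upper 32 bits of the result are always zero. */
static uint64_t widen_unsigned(uint32_t bits)
{
    return (uint64_t)(unsigned int)mask_as_int(bits);
}
```

For a mask with bit 31 set, such as `0x80000000`, `widen_signed` yields `0xFFFFFFFF80000000` while `widen_unsigned` yields `0x0000000080000000`; for masks with bit 31 clear, the two agree.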

[Bug c/66369] gcc 4.8.3/5.1.0 miss optimisation with vpmovmskb

2015-06-01 Thread marcus.kool at urlfilterdb dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66369

Marcus Kool changed:

What         |Removed |Added
Keywords     |        |missed-optimization
Known to fail|

[Bug c/66369] New: gcc 4.8.3/5.1.0 miss optimisation with vpmovmskb

2015-06-01 Thread marcus.kool at urlfilterdb dot com
: c
Assignee: unassigned at gcc dot gnu.org
Reporter: marcus.kool at urlfilterdb dot com
Target Milestone: ---

Created attachment 35672
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35672&action=edit
example C code to demonstrate the missed optimisation in gcc 4.

[Bug target/63791] use 32-byte version of vpbroadcastb (and register to poulate) on AVX/AVX2 platforms

2015-05-01 Thread marcus.kool at urlfilterdb dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63791

Marcus Kool changed:

What      |Removed  |Added
Status    |RESOLVED |REOPENED
Resolution|INVALID

[Bug target/63791] use 32-byte version of vpbroadcastb (and register to poulate) on AVX/AVX2 platforms

2015-05-01 Thread marcus.kool at urlfilterdb dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63791
--- Comment #4 from Marcus Kool ---
> movl    %edi, -12(%rsp)
> vpxor   %xmm1, %xmm1, %xmm1
> vmovd   -12(%rsp), %xmm0
> xorl    %eax, %eax
> vpshufb %xmm1, %xmm0, %xmm0

The xorl instruction is part of an
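For readers unfamiliar with the quoted sequence, the following scalar model may help show why a zeroed control register combined with vpshufb acts as a byte broadcast. It is an illustrative sketch only, not code from the bug report: the function names `shuffle_bytes_model` and `broadcast_low_byte_model` are hypothetical, and the model covers just the 128-bit form seen in the quoted assembly.

```c
#include <stdint.h>

/* Scalar model of a 128-bit byte shuffle (pshufb/vpshufb semantics):
   each output byte is zero if the control byte has its top bit set,
   otherwise it is the source byte selected by the low 4 control bits. */
static void shuffle_bytes_model(const uint8_t src[16],
                                const uint8_t ctrl[16],
                                uint8_t out[16])
{
    for (int i = 0; i < 16; i++) {
        out[i] = (ctrl[i] & 0x80) ? 0 : src[ctrl[i] & 0x0F];
    }
}

/* Models the quoted sequence: vmovd places the 32-bit value in the low
   lanes (little-endian, upper bytes zeroed), vpxor zeroes the control
   register, and the shuffle then selects source byte 0 for every lane. */
static void broadcast_low_byte_model(uint32_t value, uint8_t out[16])
{
    uint8_t src[16] = {0};
    uint8_t ctrl[16] = {0};   /* all-zero control: every lane picks byte 0 */

    src[0] = (uint8_t)(value & 0xFF);
    src[1] = (uint8_t)((value >> 8) & 0xFF);
    src[2] = (uint8_t)((value >> 16) & 0xFF);
    src[3] = (uint8_t)((value >> 24) & 0xFF);

    shuffle_bytes_model(src, ctrl, out);
}
```

Calling `broadcast_low_byte_model(0x12345678u, out)` fills all 16 bytes of `out` with `0x78`, the low byte of the input.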

[Bug target/63791] use 32-byte version of vpbroadcastb (and register to poulate) on AVX/AVX2 platforms

2015-05-01 Thread marcus.kool at urlfilterdb dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63791
--- Comment #3 from Marcus Kool ---
Created attachment 35436
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35436&action=edit
example code to show code generation on AVX platform (avx.c)

[Bug target/63791] use 32-byte version of vpbroadcastb (and register to poulate) on AVX/AVX2 platforms

2015-05-01 Thread marcus.kool at urlfilterdb dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63791

Marcus Kool changed:

What    |Removed |Added
Keywords|        |missed-optimization
Summary |use

[Bug c/63791] New: use 32-byte version of vpbroadcastb on AVX2 platform

2014-11-09 Thread marcus.kool at urlfilterdb dot com
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: marcus.kool at urlfilterdb dot com

Created attachment 33926
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33926&action=edit
code with _mm256_set1_epi8, _mm256_loadu_si256, _mm256_cmpeq_epi8, _mm256_movema