https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119223

            Bug ID: 119223
           Summary: GCC does not optimize with AVX in bitshift with if condition
           Product: gcc
           Version: 14.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kaelfandrew at gmail dot com
  Target Milestone: ---

I created two C programs that match newlines in a file (the file is
src/Sema.zig from Zig 0.14); see https://godbolt.org/z/v9hqzPv4b. Both programs
behave the same. The only difference is at line 56, where the first C program
has no if condition. GCC adds SIMD when no if condition is used, as seen in the
second C program. Clang optimizes both with SIMD. The difference seems to show
up in the -fdump-tree-optimized output.

Gentoo GCC 14.2 was used, and both C programs were compiled with
-std=gnu23 -O3 -march=icelake-client -D_FILE_OFFSET_BITS=64 -flto.

uname -a reports:
Linux tux 6.6.67-gentoo-gentoo-dist #4 SMP PREEMPT_DYNAMIC Sun Jan 26 03:15:41
EST 2025 x86_64 Intel(R) Core(TM) i5-1035G1 CPU @ 1.00GHz GenuineIntel GNU/Linux

The results were measured with poop and show the following speedups:

./poop './main2' './main1' -d 60000
Benchmark 1 (10000 runs): ./main2
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          4.58ms ±  972us    2.11ms … 6.88ms          0 ( 0%)        0%
  peak_rss           3.10MB ± 64.4KB    2.78MB … 3.20MB          1 ( 0%)        0%
  cpu_cycles         4.97M  ±  110K     4.47M  … 6.18M        1090 (11%)        0%
  instructions       12.0M  ± 1.19      12.0M  … 12.0M         799 ( 8%)        0%
  cache_references   31.4K  ±  528      30.1K  … 32.9K           0 ( 0%)        0%
  cache_misses       4.26K  ±  808      2.73K  … 10.8K         170 ( 2%)        0%
  branch_misses      28.1K  ±  285      10.4K  … 28.2K         153 ( 2%)        0%
Benchmark 2 (10000 runs): ./main1
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          3.28ms ±  310us    1.54ms … 4.61ms       1807 (18%)        - 28.4% ±  0.4%
  peak_rss           3.10MB ± 64.0KB    2.78MB … 3.20MB          2 ( 0%)        -  0.0% ±  0.1%
  cpu_cycles         2.06M  ± 28.2K     2.02M  … 2.72M         602 ( 6%)        - 58.6% ±  0.0%
  instructions       2.37M  ± 1.14      2.37M  … 2.37M           5 ( 0%)        - 80.2% ±  0.0%
  cache_references   31.4K  ±  378      30.5K  … 32.8K           5 ( 0%)        +  0.3% ±  0.0%
  cache_misses       4.25K  ±  809      2.71K  … 15.6K         246 ( 2%)        -  0.3% ±  0.5%
  branch_misses      2.16K  ± 35.0      1.44K  … 2.32K         110 ( 1%)        - 92.3% ±  0.0%
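
For readers without access to the godbolt link, the kind of difference the report describes (a bit shift guarded by an if condition versus an unconditional one) might look roughly like the sketch below. This is a hypothetical reconstruction, not the reporter's code: the function names, the 64-byte chunk size, and the mask layout are all assumptions. Whether GCC 14.2 vectorizes either of these exact loops has not been checked here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, NOT the code from the report: build a bitmask in
   which bit i is set when chunk[i] is a newline. At most 64 bytes are
   examined so that every position fits in a uint64_t. */

/* Variant with an if condition guarding the shift. */
uint64_t newline_mask_branchy(const unsigned char *chunk, size_t n)
{
    uint64_t mask = 0;
    for (size_t i = 0; i < n && i < 64; i++) {
        if (chunk[i] == '\n')
            mask |= UINT64_C(1) << i;
    }
    return mask;
}

/* Variant without an if condition: the comparison yields 0 or 1, and
   that value is shifted into place unconditionally. */
uint64_t newline_mask_branchless(const unsigned char *chunk, size_t n)
{
    uint64_t mask = 0;
    for (size_t i = 0; i < n && i < 64; i++)
        mask |= (uint64_t)(chunk[i] == '\n') << i;
    return mask;
}
```

Both functions return the same mask for the same input. Compiling them with the flags from the report plus -fopt-info-vec-missed is one way to see whether GCC treats the two loops differently.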