https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119223

            Bug ID: 119223
           Summary: GCC does not optimize with AVX in bitshift with if
                    condition
           Product: gcc
           Version: 14.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kaelfandrew at gmail dot com
  Target Milestone: ---

I created two C programs that match newlines in a file (the file is
src/Sema.zig from Zig 0.14); both are at https://godbolt.org/z/v9hqzPv4b. Both
programs behave the same.

The only difference is at line 56, where the first C program has no if
condition. GCC generates SIMD code only when the if condition is absent, as can
be seen by comparing with the second C program. Clang vectorizes both with
SIMD. The difference seems to appear in the -fdump-tree-optimized output.
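
For readers without access to the godbolt link, here is a minimal sketch of the
kind of difference described above. It is an assumption, not the code from the
report: the function names, the 64-byte chunk size, and the loop shape are all
hypothetical, and it has not been checked that this sketch reproduces the
missed vectorization.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant without an if condition: the comparison result
 * (0 or 1) is shifted into place unconditionally. */
uint64_t newline_mask_no_if(const unsigned char *chunk)
{
    uint64_t mask = 0;
    for (size_t i = 0; i < 64; i++)
        mask |= (uint64_t)(chunk[i] == '\n') << i;
    return mask;
}

/* Hypothetical variant with an if condition: the shift and OR only
 * happen when the byte is a newline. */
uint64_t newline_mask_with_if(const unsigned char *chunk)
{
    uint64_t mask = 0;
    for (size_t i = 0; i < 64; i++) {
        if (chunk[i] == '\n')
            mask |= (uint64_t)1 << i;
    }
    return mask;
}
```

Both functions return the same mask for any 64-byte input, so a compiler is
free to generate the same code for them; the programs in the report differ in a
similar way at line 56.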

Gentoo GCC 14.2 was used, and both C programs were compiled with -std=gnu23 -O3
-march=icelake-client -D_FILE_OFFSET_BITS=64 -flto.
uname -a is Linux tux 6.6.67-gentoo-gentoo-dist #4 SMP PREEMPT_DYNAMIC Sun Jan
26 03:15:41 EST 2025 x86_64 Intel(R) Core(TM) i5-1035G1 CPU @ 1.00GHz
GenuineIntel GNU/Linux
The results were measured with poop and show the following speedups:
./poop './main2' './main1' -d 60000
Benchmark 1 (10000 runs): ./main2
measurement          mean ± σ            min … max           outliers         delta
wall_time          4.58ms ±  972us    2.11ms … 6.88ms          0 ( 0%)        0%
peak_rss           3.10MB ± 64.4KB    2.78MB … 3.20MB          1 ( 0%)        0%
cpu_cycles         4.97M  ±  110K     4.47M  … 6.18M        1090 (11%)        0%
instructions       12.0M  ± 1.19      12.0M  … 12.0M         799 ( 8%)        0%
cache_references   31.4K  ±  528      30.1K  … 32.9K           0 ( 0%)        0%
cache_misses       4.26K  ±  808      2.73K  … 10.8K         170 ( 2%)        0%
branch_misses      28.1K  ±  285      10.4K  … 28.2K         153 ( 2%)        0%
Benchmark 2 (10000 runs): ./main1
measurement          mean ± σ            min … max           outliers         delta
wall_time          3.28ms ±  310us    1.54ms … 4.61ms       1807 (18%)        - 28.4% ±  0.4%
peak_rss           3.10MB ± 64.0KB    2.78MB … 3.20MB          2 ( 0%)        -  0.0% ±  0.1%
cpu_cycles         2.06M  ± 28.2K     2.02M  … 2.72M         602 ( 6%)        - 58.6% ±  0.0%
instructions       2.37M  ± 1.14      2.37M  … 2.37M           5 ( 0%)        - 80.2% ±  0.0%
cache_references   31.4K  ±  378      30.5K  … 32.8K           5 ( 0%)        +  0.3% ±  0.0%
cache_misses       4.25K  ±  809      2.71K  … 15.6K         246 ( 2%)        -  0.3% ±  0.5%
branch_misses      2.16K  ± 35.0      1.44K  … 2.32K         110 ( 1%)        - 92.3% ±  0.0%
