https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114987
Bug ID: 114987
Summary: floating point vector regression, x86, between gcc 14 and gcc-13 using -O3 and target clones on skylake platforms
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: colin.king at intel dot com
Target Milestone: ---

Created attachment 58126
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58126&action=edit
reproducer.c source code

I'm seeing a ~10% performance regression in gcc-14 compared to gcc-13, using gcc on Ubuntu 24.04.

Versions:
  gcc version 13.2.0 (Ubuntu 13.2.0-23ubuntu4)
  gcc version 14.0.1 20240412 (experimental) [master r14-9935-g67e1433a94f] (Ubuntu 14-20240412-0ubuntu1)

king@skylake:~$ CFLAGS="" gcc-13 reproducer.c; ./a.out
4.92 secs duration, 2130.379 Mfp-ops/sec
cking@skylake:~$ CFLAGS="" gcc-14 reproducer.c; ./a.out
5.46 secs duration, 1921.799 Mfp-ops/sec

The original issue appeared when regression testing the stress-ng vecfp stressor [1] using the floating point vector 16 add stressor method. I've managed to extract the attached reproducer (reproducer.c) from the original code.

Salient points to focus on:

1. The issue is dependent on the OPTIMIZE3 macro in the reproducer being defined as __attribute__((optimize("-O3")))
2. The issue is also dependent on the TARGET_CLONES macro being defined as __attribute__((target_clones("mmx,avx,default"))) - the avx target clone seems to be the one involved in reproducing this problem.

Attached are the reproducer.c C source and the disassembled object code. The code generated for stress_vecfp_float_add_16.avx by gcc-13 is significantly different from the code generated by gcc-14.

References:
[1] https://github.com/ColinIanKing/stress-ng/blob/master/stress-vecfp.c
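
For readers without the attachment to hand, the two attributes named in the salient points can be sketched roughly as below. This is not the attached reproducer.c. The function names sketch_vecfp_float_add_16 and sketch_mfp_ops_per_sec, the loop shape, and the use of clock() for timing are all assumptions made for illustration. Only the two macro definitions are taken from the report, and the sketch has not been confirmed to reproduce the regression.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Macro definitions as quoted in the report. The fallback to empty macros
 * on other compilers/targets is an assumption added for portability. */
#if defined(__GNUC__) && !defined(__clang__) && defined(__linux__) && \
    (defined(__x86_64__) || defined(__i386__))
#define OPTIMIZE3     __attribute__((optimize("-O3")))
#define TARGET_CLONES __attribute__((target_clones("mmx,avx,default")))
#else
#define OPTIMIZE3
#define TARGET_CLONES
#endif

/* 16 x float vector using the GCC vector extension (64 bytes). */
typedef float vec16f __attribute__((vector_size(64)));

/* Hypothetical stand-in for the "vector 16 add" method: repeatedly adds
 * *add to *r. With target_clones, GCC emits one clone per listed target
 * (for example a ".avx" suffixed symbol) plus a runtime resolver. */
TARGET_CLONES OPTIMIZE3
void sketch_vecfp_float_add_16(vec16f *r, const vec16f *add, size_t loops)
{
    vec16f acc = *r;
    const vec16f inc = *add;

    for (size_t i = 0; i < loops; i++)
        acc += inc;
    *r = acc;
}

/* Hypothetical timing helper: returns millions of float adds per second
 * of CPU time, or 0.0 if the run was too short for clock() to measure. */
double sketch_mfp_ops_per_sec(size_t loops)
{
    vec16f r, add;

    assert(sizeof(vec16f) == 16 * sizeof(float));
    for (int i = 0; i < 16; i++) {
        r[i] = 0.0f;
        add[i] = 0.5f;
    }

    clock_t start = clock();
    sketch_vecfp_float_add_16(&r, &add, loops);
    clock_t end = clock();

    /* Keep the result live so the work cannot be discarded. */
    volatile float sink = r[0];
    (void)sink;

    double secs = (double)(end - start) / CLOCKS_PER_SEC;
    return secs > 0.0 ? (16.0 * (double)loops) / secs / 1e6 : 0.0;
}
```

Calling sketch_mfp_ops_per_sec() from a small main() built once with gcc-13 and once with gcc-14 would give a rough comparison in the same Mfp-ops/sec unit the report uses. Running objdump -d on each binary should show whether the generated .avx clone differs between the two compilers.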