https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93243
Bug ID: 93243
Summary: misoptimization: minor changes to the code change performance by up to +/- 30% on x86_64, -Os faster than -Ofast/O2/O3
Product: gcc
Version: 9.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: leo at yuriev dot ru
Target Milestone: ---

Briefly:

./heapsort-bench, cc 9.2.1 20191008
pass 1, small:
        1.138047 seconds, baseline
        1.090476 seconds, case-1, 95.8% of baseline
        0.957207 seconds, case-2, 84.1% of baseline
        1.163323 seconds, case-1+2, 102.2% of baseline
pass 1, large:
        2.766881 seconds, baseline
        2.677642 seconds, case-1, 96.8% of baseline
        3.230149 seconds, case-2, 116.7% of baseline
        2.758408 seconds, case-1+2, 99.7% of baseline

./heapsort-bench, cc Clang 9.0.0 (tags/RELEASE_900/final)
pass 1, small:
        1.048489 seconds, baseline
        1.050220 seconds, case-1, 100.2% of baseline
        1.056953 seconds, case-2, 100.8% of baseline
        1.050501 seconds, case-1+2, 100.2% of baseline
pass 1, large:
        2.588565 seconds, baseline
        2.585488 seconds, case-1, 99.9% of baseline
        2.610508 seconds, case-2, 100.8% of baseline
        2.587282 seconds, case-1+2, 100.0% of baseline

./heapsort-bench, gcc 7.4.0 (ubuntu)
pass 1, small:
        0.893917 seconds, baseline
        1.135796 seconds, case-1, 127.1% of baseline
        0.920338 seconds, case-2, 103.0% of baseline
        1.140505 seconds, case-1+2, 127.6% of baseline
pass 1, large:
        3.804271 seconds, baseline
        2.955773 seconds, case-1, 77.7% of baseline
        3.908621 seconds, case-2, 102.7% of baseline
        2.925845 seconds, case-1+2, 76.9% of baseline

The differences in the source code are:

#if CASE & 1
#define CMP(a, b) ((a) < (b))
#else
#define CMP(a, b) (((a) - (b)) < 0)
#endif

#if CASE & 2
  for (size_t root = from; (root + root) <= to;) {
    size_t child = root << 1;
#else
  for (size_t child, root = from; (child = root + root) <= to;) {
#endif

gcc 9.x and clang 9.x show (nearly) the same results on Fedora 31 and Ubuntu 19.10.
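
For illustration only, here is a minimal sketch of how the two switches could sit inside a sift-down loop. It is not the code from heapsort.c: the function names (sift_down, heapsort_1based), the int element type and the 1-based layout with v[0] unused are all assumptions made for this sketch. Only the CMP macro and the two loop headers are taken from the report.

```c
#include <assert.h>
#include <stddef.h>

#ifndef CASE
#define CASE 0 /* 0 = baseline, 1 = case-1, 2 = case-2, 3 = case-1+2 */
#endif

#if CASE & 1
#define CMP(a, b) ((a) < (b))
#else
/* Matches (a) < (b) only while (a) - (b) does not overflow. */
#define CMP(a, b) (((a) - (b)) < 0)
#endif

/* Hypothetical helper: restore the max-heap property below index `from`
 * in a 1-based heap stored in v[1..to] (v[0] is unused). */
static void sift_down(int *v, size_t from, size_t to)
{
    assert(from >= 1); /* root == 0 would never advance */
#if CASE & 2
    for (size_t root = from; (root + root) <= to;) {
        size_t child = root << 1;
#else
    for (size_t child, root = from; (child = root + root) <= to;) {
#endif
        /* Pick the larger of the two children. */
        if (child < to && CMP(v[child], v[child + 1]))
            child++;
        if (!CMP(v[root], v[child]))
            break;
        int tmp = v[root];
        v[root] = v[child];
        v[child] = tmp;
        root = child;
    }
}

/* Hypothetical driver: sort v[1..n] in ascending order. */
static void heapsort_1based(int *v, size_t n)
{
    /* Build the heap; the unsigned counter wraps safely past 1. */
    for (size_t i = n / 2; i >= 1; i--)
        sift_down(v, i, n);
    /* Move the maximum to the end and shrink the heap. */
    for (size_t end = n; end > 1; end--) {
        int tmp = v[1];
        v[1] = v[end];
        v[end] = tmp;
        sift_down(v, 1, end - 1);
    }
}
```

Building the sketch with -DCASE=0 through -DCASE=3 selects the same four variants as the benchmark labels (baseline, case-1, case-2, case-1+2). All four should sort identically for inputs where the subtraction in CMP cannot overflow.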
gcc 7.4 was tested only on Ubuntu; clang 6.0 also showed stable results, like clang 9.

The source code of the testcase is at https://github.com/leo-yuriev/gcc-issues

$ wc heapsort.c
 165  528 4309 heapsort.c

Using PGO (included in the testcase) does not significantly change the result.

This should be enough to start with; I will add more details tomorrow (likely after the afternoon, UTC+03).

Regards,
Leonid.