https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93243

            Bug ID: 93243
           Summary: misoptimization: minor code changes lead to
                    performance changes of up to +/- 30% on x86_64; -Os
                    is faster than -Ofast/-O2/-O3
           Product: gcc
           Version: 9.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: leo at yuriev dot ru
  Target Milestone: ---

Briefly:

./heapsort-bench, cc 9.2.1 20191008
pass 1, small:
  1.138047 seconds, baseline
  1.090476 seconds, case-1, 95.8% of baseline
  0.957207 seconds, case-2, 84.1% of baseline
  1.163323 seconds, case-1+2, 102.2% of baseline
pass 1, large:
  2.766881 seconds, baseline
  2.677642 seconds, case-1, 96.8% of baseline
  3.230149 seconds, case-2, 116.7% of baseline
  2.758408 seconds, case-1+2, 99.7% of baseline

./heapsort-bench, cc Clang 9.0.0 (tags/RELEASE_900/final)
pass 1, small:
  1.048489 seconds, baseline
  1.050220 seconds, case-1, 100.2% of baseline
  1.056953 seconds, case-2, 100.8% of baseline
  1.050501 seconds, case-1+2, 100.2% of baseline
pass 1, large:
  2.588565 seconds, baseline
  2.585488 seconds, case-1, 99.9% of baseline
  2.610508 seconds, case-2, 100.8% of baseline
  2.587282 seconds, case-1+2, 100.0% of baseline

./heapsort-bench, gcc 7.4.0 (ubuntu)
pass 1, small:
  0.893917 seconds, baseline
  1.135796 seconds, case-1, 127.1% of baseline
  0.920338 seconds, case-2, 103.0% of baseline
  1.140505 seconds, case-1+2, 127.6% of baseline
pass 1, large:
  3.804271 seconds, baseline
  2.955773 seconds, case-1, 77.7% of baseline
  3.908621 seconds, case-2, 102.7% of baseline
  2.925845 seconds, case-1+2, 76.9% of baseline

The diffs in the source code are:
#if CASE & 1
#define CMP(a, b) ((a) < (b))
#else
#define CMP(a, b) (((a) - (b)) < 0)
#endif

#if CASE & 2
  for (size_t root = from; (root + root) <= to;) {
    size_t child = root << 1;
#else
  for (size_t child, root = from; (child = root + root) <= to;) {
#endif

gcc 9.x and clang 9.x show (nearly) the same results on Fedora 31 and Ubuntu
19.10.
gcc 7.4 was tested only on Ubuntu; there, clang 6.0 also showed stable results,
like clang 9.

Source code of the testcase is at https://github.com/leo-yuriev/gcc-issues
$ wc heapsort.c
 165  528 4309 heapsort.c

Using PGO (included in the testcase) does not significantly change the result.

This should basically be enough, but I will add more details tomorrow
(likely in the afternoon, UTC+03).

Regards, 
Leonid.
