https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90168

            Bug ID: 90168
           Summary: Unstable register allocation result for same source
                    code
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: fxue at os dot amperecomputing.com
  Target Milestone: ---

Supposed a function as the following, in which 'cond', 'S1' and 'S2' are
completely irrelevant, means they do not access same variables(in term of RA,
they own separate live range set).

  f1()
  { 
      if (cond) {
          S1
      } else {
          S2
      }
  }

Ideally, we can expect that register allocation on 'S1'is totally independent
of 'S2', w or w/o which makes no difference. Its result should be same as below
function consisting of only 'S1':

  f2()
  {
      S1
  }

But we found gcc does not has this property. Strictly speaking, this is not a
bug, but exposes some kind of instability in code generation, has
undeterminable impact on some optimization, such as inlining. 

A real example code is listed here (-m64 -O3, for x86-64). Besides different
register number assignments, RA generates different amount of register spills
for 'CODES', foo() has 3 spills and foo1() has only 2.


  int value[10];
  int count;
  int user1;

  int fncall(void);

  #define CODES \
          int i; \
          int sum = 0; \
          int lv_in_1 = value[1]; \
          int lv_in_2 = value[2]; \
          int lv_in_3 = value[3]; \
          int lv_in_4 = value[4]; \
          int lv_in_5 = value[5]; \
          int lv_in_6 = value[6]; \
          int lv_in_7 = value[7]; \
                                \
          for (i = 0; i < count; i++) { \
              int j; \
              sum += lv_in_1; \
              sum += lv_in_2; \
              sum += lv_in_3; \
              sum += lv_in_4; \
              sum += lv_in_5; \
              sum += lv_in_6; \
              sum += lv_in_7; \
                              \
              for (j = 0; j < 16; j++) { \
                  fncall(); \
                  sum += lv_in_1 ^ j; \
                  sum += lv_in_2 ^ j; \
                  sum += lv_in_3 ^ j; \
                  sum += lv_in_4 ^ j; \
                  sum += lv_in_5 ^ j; \
                  sum += lv_in_6 ^ j; \
                  sum += lv_in_7 ^ j; \
              }             \
                            \
              lv_in_1 ^= i; \
              lv_in_2 ^= i; \
              lv_in_3 ^= i; \
              lv_in_4 ^= i; \
              lv_in_5 ^= i; \
              lv_in_6 ^= i; \
              lv_in_7 ^= i; \
          } \
          user1 = sum

  void foo()
  {
      CODES;
  }

  int cond;
  int user2;

  void foo1()
  {
      if (cond == 5) {
          CODES;
      } else {
          unsigned long long i, j;

          for (i = 0; i < 100000000; i++) {
              for (j = 0; j < 100000000; j++) {
                  user2 += i * j;
              }
          }
      }
  }

Investigation shows this is related to integer-based frequency normalization
(REG_FREQ_FROM_BB) used by RA, which always rounds up a small frequency (less
than 1) to 1. In foo1(), introduction of new code makes profile counts of CODES
be decreased, so that impact of frequency normalization error becomes more
significant and actually distorts original proportion of profile counts among
basic blocks in CODES. For example, in foo(), two blocks have counts of 3 and
100 receptively, and in foo1(), they become 0.3 and 10, after rounding up, they
are 1 and 10, thus proportion is changed from (3 vs 100) to (1 vs 10).

Possible solution might be to adjust two scale factors used by REG_FREQ_FROM_BB
: REG_FREQ_MAX and BB_FREQ_MAX, or to use float type to hold frequency?

Reply via email to