https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90168
Bug ID: 90168 Summary: Unstable register allocation result for same source code Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: fxue at os dot amperecomputing.com Target Milestone: --- Supposed a function as the following, in which 'cond', 'S1' and 'S2' are completely irrelevant, means they do not access same variables(in term of RA, they own separate live range set). f1() { if (cond) { S1 } else { S2 } } Ideally, we can expect that register allocation on 'S1'is totally independent of 'S2', w or w/o which makes no difference. Its result should be same as below function consisting of only 'S1': f2() { S1 } But we found gcc does not has this property. Strictly speaking, this is not a bug, but exposes some kind of instability in code generation, has undeterminable impact on some optimization, such as inlining. A real example code is listed here (-m64 -O3, for x86-64). Besides different register number assignments, RA generates different amount of register spills for 'CODES', foo() has 3 spills and foo1() has only 2. int value[10]; int count; int user1; int fncall(void); #define CODES \ int i; \ int sum = 0; \ int lv_in_1 = value[1]; \ int lv_in_2 = value[2]; \ int lv_in_3 = value[3]; \ int lv_in_4 = value[4]; \ int lv_in_5 = value[5]; \ int lv_in_6 = value[6]; \ int lv_in_7 = value[7]; \ \ for (i = 0; i < count; i++) { \ int j; \ sum += lv_in_1; \ sum += lv_in_2; \ sum += lv_in_3; \ sum += lv_in_4; \ sum += lv_in_5; \ sum += lv_in_6; \ sum += lv_in_7; \ \ for (j = 0; j < 16; j++) { \ fncall(); \ sum += lv_in_1 ^ j; \ sum += lv_in_2 ^ j; \ sum += lv_in_3 ^ j; \ sum += lv_in_4 ^ j; \ sum += lv_in_5 ^ j; \ sum += lv_in_6 ^ j; \ sum += lv_in_7 ^ j; \ } \ \ lv_in_1 ^= i; \ lv_in_2 ^= i; \ lv_in_3 ^= i; \ lv_in_4 ^= i; \ lv_in_5 ^= i; \ lv_in_6 ^= i; \ lv_in_7 ^= i; \ } \ user1 = sum void foo() { CODES; } int cond; int user2; void foo1() { if (cond == 5) { CODES; } else { unsigned long long i, j; for (i = 0; i < 100000000; i++) { for (j = 0; j < 100000000; j++) { user2 += i * j; } } } } Investigation shows this is related to integer-based frequency normalization (REG_FREQ_FROM_BB) used by RA, which always rounds up a small frequency (less than 1) to 1. In foo1(), introduction of new code makes profile counts of CODES be decreased, so that impact of frequency normalization error becomes more significant and actually distorts original proportion of profile counts among basic blocks in CODES. For example, in foo(), two blocks have counts of 3 and 100 receptively, and in foo1(), they become 0.3 and 10, after rounding up, they are 1 and 10, thus proportion is changed from (3 vs 100) to (1 vs 10). Possible solution might be to adjust two scale factors used by REG_FREQ_FROM_BB : REG_FREQ_MAX and BB_FREQ_MAX, or to use float type to hold frequency?