Hello, I'd like to apply the following LRA patch to make qsort comparator reload_pseudo_compare_func proper (right now it lacks transitivity due to incorrect use of non_reload_pseudos bitmap, PR 68988).
This function was originally a proper comparator, and the problematic code was added as a fix for PR 57878. However, some time later the fix for PR 60650 really fixed this LRA spill failure, making the original fix unneeded. So now GCC can revert to the original, simpler comparator. The only question is what comparison order would be better for performance. The check in question only matters for multi-reg pseudos, so it matters mostly for 64-bit modes on 32-bit architectures. To investigate that, I've bootstrapped GCC on 32-bit x86 in 4 configurations: 1. Current trunk. [2-4 are with original PR 57878 fix reverted] 2. Original code, with ira_reg_class_max_nregs below regno_assign_info.freq check. 3. Hybrid code, with i_r_c_max_nregs preferred over r_a_i.freq during the second assignment pass, but not first. 4. With i_r_c_max_nregs above r_a_i.freq check, i.e. always do fragmentation avoidance check before the frequency check. This is the original PR 57878 fix proposed by Wei Mi. I have found that cc1 size is largest with variant 1, variants 2 and 3 result in ~500 bytes size reduction, and variant 4 is further ~500 bytes smaller than that (considering only the .text section, debug info variance is larger). I have also tested variants 2 and 4 on SPEC CPU 2000: there's no significant difference in performance (in fact generated code is the same on almost all tests). Therefore I suggest we go with variant 4, implemented by the following patch. Bootstrapped and regtested on 32-bit x86, OK to apply? Thanks. Alexander PR rtl-optimization/57878 PR rtl-optimization/68988 * lra-assigns.c (reload_pseudo_compare_func): Remove fragmentation avoidance test involving non_reload_pseudos. Move frequency test below the general fragmentation avoidance test. diff --git a/gcc/lra-assigns.c b/gcc/lra-assigns.c index 9208fccfd59..5a65c7c8c5f 100644 --- a/gcc/lra-assigns.c +++ b/gcc/lra-assigns.c @@ -211,24 +211,15 @@ reload_pseudo_compare_func (const void *v1p, const void *v2p) if ((diff = (ira_class_hard_regs_num[cl1] - ira_class_hard_regs_num[cl2])) != 0) return diff; - if ((diff - = (ira_reg_class_max_nregs[cl2][lra_reg_info[r2].biggest_mode] - - ira_reg_class_max_nregs[cl1][lra_reg_info[r1].biggest_mode])) != 0 - /* The code below executes rarely as nregs == 1 in most cases. - So we should not worry about using faster data structures to - check reload pseudos. */ - && ! bitmap_bit_p (&non_reload_pseudos, r1) - && ! bitmap_bit_p (&non_reload_pseudos, r2)) - return diff; - if ((diff = (regno_assign_info[regno_assign_info[r2].first].freq - - regno_assign_info[regno_assign_info[r1].first].freq)) != 0) - return diff; /* Allocate bigger pseudos first to avoid register file fragmentation. */ if ((diff = (ira_reg_class_max_nregs[cl2][lra_reg_info[r2].biggest_mode] - ira_reg_class_max_nregs[cl1][lra_reg_info[r1].biggest_mode])) != 0) return diff; + if ((diff = (regno_assign_info[regno_assign_info[r2].first].freq + - regno_assign_info[regno_assign_info[r1].first].freq)) != 0) + return diff; /* Put pseudos from the thread nearby. */ if ((diff = regno_assign_info[r1].first - regno_assign_info[r2].first) != 0) return diff;