This is yet another attempt to fix PR46278 (fake X addressing). After the previous clean-ups it is just a small change.
caller-saves.c tries to eliminate call-clobbered hard-regs allocated to pseudos around function calls and that leads to situations that reload is no more capable to perform all requested spills because of the very few AVR's address registers. Thus, the patch adds a new target option -mstrict-X so that the user can turn that option if he like to do so, and then -fcaller-save is disabled. The patch passes the testsuite without regressions. Moreover, the testsuite passes without regressions if all test cases are run with -mstrict-X and all libraries (libgcc, avr-libc) are built with the new option turned on. The sizes from the test cases attached to the PR are: > avr-gcc vektor-zeichen-i.c -c -std=gnu99 -Os -mmcu=avr4 -mno-strict-X && avr-size vektor-zeichen-i.o text data bss dec hex filename 1084 0 190 1274 4fa vektor-zeichen-i.o > avr-gcc vektor-zeichen-i.c -c -std=gnu99 -Os -mmcu=avr4 -mstrict-X && avr-size vektor-zeichen-i.o text data bss dec hex filename 732 0 190 922 39a vektor-zeichen-i.o > avr-gcc snake.c -c -std=gnu99 -Os -mmcu=avr4 -mno-strict-X && avr-size > snake.o text data bss dec hex filename 1537 0 0 1537 601 snake.o > avr-gcc snake.c -c -std=gnu99 -Os -mmcu=avr4 -mstrict-X && avr-size snake.o text data bss dec hex filename 1417 0 0 1417 589 snake.o So these programs gets smaller, similar for -O2 where the first test case reduces by 30%. Even the test case testsuite/gcc.c-torture/compile/950612-1.c that caused problems in earlier patches with spill fails reduces in size: > avr-gcc 950612-1.c -c -std=gnu99 -Os -mmcu=avr4 -mno-strict-X -save-temps -dp && avr-size 950612-1.o text data bss dec hex filename 7101 0 0 7101 1bbd 950612-1.o > avr-gcc 950612-1.c -c -std=gnu99 -Os -mmcu=avr4 -mstrict-X -save-temps -dp && avr-size 950612-1.o text data bss dec hex filename 6931 0 0 6931 1b13 950612-1.o And again similarly for -O2. For the snake test case, there is room for improvement. The prologue with -Os -mstrict-X reads onRedraw_snake: push r13 push r14 push r15 push r16 push r17 push r28 push r29 rcall . rcall . in r28,__SP_L__ in r29,__SP_H__ /* prologue: function */ /* frame size = 4 */ /* stack size = 11 */ and there is a frame set up without need. The variables put in the frame could just as well live in remaining hard registers saving a frame pointer and accessing the values there altogether. I guess this is fallout from IRA that assigns to stack slots and the program is too complex for reload to fix that. But I see similar bloat (setting up FP without need, sometimes even without using it) for programs without this patch, too. So it's not caused by this batch and general IRA/reload flaw. The results are quite promising IMHO and I'd like to know what you think about it and maybe it's already fine to apply? Johann PR target/46278 * config/avr/avr.c (avr_reg_ok_for_addr_p): Add parameter outer_code and pass it down to avr_regno_mode_code_ok_for_base_p. (avr_legitimate_address_p): Pass outer_code to avr_reg_ok_for_addr_p and use that function in case PLUS. (avr_mode_code_base_reg_class): Depend on avr_strict_X. (avr_regno_mode_code_ok_for_base_p): Ditto, and depend on outer_code. (avr_option_override): Disable -fcaller-saves if -mstrict-X is on. * config/avr/avr.opt (-mstrict-X): New option. (avr_strict_X): New variable reflecting -mstrict-X. * doc/invoke.texi (AVR Options): Document -mstrict-X.
Index: config/avr/avr.opt =================================================================== --- config/avr/avr.opt (revision 179842) +++ config/avr/avr.opt (working copy) @@ -61,3 +61,7 @@ Relax branches mpmem-wrap-around Target Report Make the linker relaxation machine assume that a program counter wrap-around occurs. + +mstrict-X +Target Report Var(avr_strict_X) Init(0) +When accessing RAM, use X as imposed by the hardware, i.e. just use pre-decrement, post-increment and indirect addressing with the X register. Without this option, the compiler may assume that there is an addressing mode X+const similar to Y+const and Z+const and emit instructions to emulate such an addressing mode for X. Index: config/avr/avr.c =================================================================== --- config/avr/avr.c (revision 179843) +++ config/avr/avr.c (working copy) @@ -351,6 +351,17 @@ avr_option_override (void) { flag_delete_null_pointer_checks = 0; + /* caller-save.c looks for call-clobbered hard registers that are assigned + to pseudos that cross calls and tries so save-restore them around calls + in order to reduce the number of stack slots needed. + + This might leads to situations where reload is no more able to cope + with the challenge of AVR's very few address registers and fails to + perform the requested spills. */ + + if (avr_strict_X) + flag_caller_saves = 0; + /* Unwind tables currently require a frame pointer for correctness, see toplev.c:process_options(). */ @@ -1205,11 +1216,12 @@ avr_cannot_modify_jumps_p (void) /* Helper function for `avr_legitimate_address_p'. */ static inline bool -avr_reg_ok_for_addr_p (rtx reg, addr_space_t as ATTRIBUTE_UNUSED, int strict) +avr_reg_ok_for_addr_p (rtx reg, addr_space_t as ATTRIBUTE_UNUSED, + RTX_CODE outer_code, bool strict) { return (REG_P (reg) && (avr_regno_mode_code_ok_for_base_p (REGNO (reg), - QImode, MEM, UNKNOWN) + QImode, outer_code, UNKNOWN) || (!strict && REGNO (reg) >= FIRST_PSEUDO_REGISTER))); } @@ -1221,58 +1233,69 @@ avr_reg_ok_for_addr_p (rtx reg, addr_spa static bool avr_legitimate_address_p (enum machine_mode mode, rtx x, bool strict) { - reg_class_t r = NO_REGS; + bool ok = CONSTANT_ADDRESS_P (x); - if (REG_P (x) - && avr_reg_ok_for_addr_p (x, ADDR_SPACE_GENERIC, strict)) - { - r = POINTER_REGS; - } - else if (CONSTANT_ADDRESS_P (x)) - { - r = ALL_REGS; - } - else if (GET_CODE (x) == PLUS - && REG_P (XEXP (x, 0)) - && CONST_INT_P (XEXP (x, 1)) - && INTVAL (XEXP (x, 1)) >= 0) + switch (GET_CODE (x)) { - rtx reg = XEXP (x, 0); - bool fit = INTVAL (XEXP (x, 1)) <= MAX_LD_OFFSET (mode); - - if (fit) - { - if (! strict - || REGNO (reg) == REG_X - || REGNO (reg) == REG_Y - || REGNO (reg) == REG_Z) - { - r = BASE_POINTER_REGS; - } - - if (reg == frame_pointer_rtx - || reg == arg_pointer_rtx) - { - r = BASE_POINTER_REGS; - } - } - else if (frame_pointer_needed && reg == frame_pointer_rtx) + case REG: + ok = avr_reg_ok_for_addr_p (x, ADDR_SPACE_GENERIC, + MEM, strict); + + if (strict + && DImode == mode + && REG_X == REGNO (x)) { - r = POINTER_Y_REGS; + ok = false; } - } - else if ((GET_CODE (x) == PRE_DEC || GET_CODE (x) == POST_INC) - && REG_P (XEXP (x, 0)) - && avr_reg_ok_for_addr_p (XEXP (x, 0), ADDR_SPACE_GENERIC, strict)) - { - r = POINTER_REGS; - } + break; + + case POST_INC: + case PRE_DEC: + ok = avr_reg_ok_for_addr_p (XEXP (x, 0), ADDR_SPACE_GENERIC, + GET_CODE (x), strict); + break; + case PLUS: + { + rtx reg = XEXP (x, 0); + rtx op1 = XEXP (x, 1); + + if (REG_P (reg) + && CONST_INT_P (op1) + && INTVAL (op1) >= 0) + { + bool fit = IN_RANGE (INTVAL (op1), 0, MAX_LD_OFFSET (mode)); + + if (fit) + { + ok = (! strict + || avr_reg_ok_for_addr_p (reg, ADDR_SPACE_GENERIC, + PLUS, strict)); + + if (reg == frame_pointer_rtx + || reg == arg_pointer_rtx) + { + ok = true; + } + } + else if (frame_pointer_needed + && reg == frame_pointer_rtx) + { + ok = true; + } + } + } + break; + + default: + break; + } + if (avr_log.legitimate_address_p) { - avr_edump ("\n%?: ret=%d=%R, mode=%m strict=%d " + avr_edump ("\n%?: ret=%d, mode=%m strict=%d " "reload_completed=%d reload_in_progress=%d %s:", - !!r, r, mode, strict, reload_completed, reload_in_progress, + ok, mode, strict, reload_completed, reload_in_progress, reg_renumber ? "(reg_renumber)" : ""); if (GET_CODE (x) == PLUS @@ -1288,7 +1311,7 @@ avr_legitimate_address_p (enum machine_m avr_edump ("\n%r\n", x); } - return r == NO_REGS ? 0 : (int)r; + return ok; } /* Attempts to replace X with a valid @@ -7304,10 +7327,13 @@ avr_hard_regno_mode_ok (int regno, enum reg_class_t avr_mode_code_base_reg_class (enum machine_mode mode ATTRIBUTE_UNUSED, - RTX_CODE outer_code ATTRIBUTE_UNUSED, + RTX_CODE outer_code, RTX_CODE index_code ATTRIBUTE_UNUSED) { - return reload_completed ? BASE_POINTER_REGS : POINTER_REGS; + if (!avr_strict_X) + return reload_completed ? BASE_POINTER_REGS : POINTER_REGS; + + return PLUS == outer_code ? BASE_POINTER_REGS : POINTER_REGS; } @@ -7316,19 +7342,20 @@ avr_mode_code_base_reg_class (enum machi bool avr_regno_mode_code_ok_for_base_p (int regno, enum machine_mode mode ATTRIBUTE_UNUSED, - RTX_CODE outer_code ATTRIBUTE_UNUSED, + RTX_CODE outer_code, RTX_CODE index_code ATTRIBUTE_UNUSED) { + bool ok = false; + if (regno < FIRST_PSEUDO_REGISTER && (regno == REG_X || regno == REG_Y || regno == REG_Z || regno == ARG_POINTER_REGNUM)) { - return true; + ok = true; } - - if (reg_renumber) + else if (reg_renumber) { regno = reg_renumber[regno]; @@ -7337,11 +7364,18 @@ avr_regno_mode_code_ok_for_base_p (int r || regno == REG_Z || regno == ARG_POINTER_REGNUM) { - return true; + ok = true; } } - - return false; + + if (avr_strict_X + && PLUS == outer_code + && regno == REG_X) + { + ok = false; + } + + return ok; } Index: doc/invoke.texi =================================================================== --- doc/invoke.texi (revision 179765) +++ doc/invoke.texi (working copy) @@ -486,7 +486,7 @@ Objective-C and Objective-C++ Dialects}. @emph{AVR Options} @gccoptlist{-mmcu=@var{mcu} -mno-interrupts @gol --mcall-prologues -mtiny-stack -mint8} +-mcall-prologues -mtiny-stack -mint8 -mstrict-X} @emph{Blackfin Options} @gccoptlist{-mcpu=@var{cpu}@r{[}-@var{sirevision}@r{]} @gol @@ -10668,6 +10668,23 @@ char will be 1 byte, an int will be 1 by and long long will be 4 bytes. Please note that this option does not comply to the C standards, but it will provide you with smaller code size. + +@item -mstrict-X +@opindex mstrict-X +Use register @code{X} in the way imposed by the hardware. This means +that @code{X} can only be used in indirect, post-increment or +pre-decrement addressing. + +Without this option, the @code{X} register may be used in the same way +as @code{Y} or @code{Z} which then is emulated by additional +instructions. +For example, loading a value with @code{X+const} addressing with a +small @code{const @leq{} 63} to a register @var{Rn} will be printed as +@example +adiw r26, const +ld @var{Rn}, X +sbiw r26, const +@end example @end table @node Blackfin Options