On Mon, Dec 20, 2021 at 11:44:08AM -0800, H.J. Lu wrote:
> The problem is in
> 
> (define_memory_constraint "TARGET_MEM_CONSTRAINT"
>   "Matches any valid memory."
>   (and (match_code "mem")
>        (match_test "memory_address_addr_space_p (GET_MODE (op), XEXP (op, 0),
>                                                  MEM_ADDR_SPACE (op))")))
> 
> define_register_constraint allows LRA to convert the operand to the form
> '(mem (reg X))', where X is a base register.  I am testing the v2 patch with

If you mean replacing an immediate with a MEM containing that immediate,
isn't that often the right thing though?
I mean, if the register pressure is high and options are either spill some
register, set it to immediate, use it in one instruction and then fill the
spilled register (i.e. 2 memory loads), compared to a MEM use on the
arithmetic instruction one MEM seems cheaper to me.  With -fPIC and the
cst needing runtime relocation slightly less so of course.

The code due to ivopts is trying to have something like
  size_t a = (size_t) &tunable_list;
  size_t b = 0xffffffffffffffa8 - a;
  size_t c = x + b;
and for that cst - &symbol one needs actually 2 registers, one to hold the
constant and one to hold the (%rip) based address.
(insn 790 789 791 111 (set (reg:DI 292)
        (const_int -88 [0xffffffffffffffa8])) "dl-tunables.c":304:15 76 
{*movdi_internal}
     (nil))
(insn 791 790 792 111 (set (reg:DI 293)
        (symbol_ref:DI ("tunable_list") [flags 0x2]  <var_decl 0x7f3460aa9cf0 
tunable_list>)) "dl-tunables.c":304:15 76 {*movdi_internal}
     (nil))
(insn 792 791 793 111 (parallel [
            (set (reg:DI 291)
                (minus:DI (reg:DI 292)
                    (reg:DI 293)))
            (clobber (reg:CC 17 flags))
        ]) "dl-tunables.c":304:15 299 {*subdi_1}
     (nil))
(insn 793 792 794 111 (parallel [
            (set (reg:DI 294)
                (plus:DI (reg:DI 291)
                    (reg:DI 198 [ ivtmp.176 ])))
            (clobber (reg:CC 17 flags))
        ]) "dl-tunables.c":304:15 226 {*adddi_1}
     (nil))
It would be smarter to rewrite the above into a lea 88+tunable_list(%rip), 
%temp1
and use a subtraction instead of addition in the last insn above, or of
course in the particular case even consider the following 2 instructions
that do:
(insn 794 793 795 111 (set (reg:DI 296)
        (symbol_ref:DI ("tunable_list") [flags 0x2]  <var_decl 0x7f3460aa9cf0 
tunable_list>)) "dl-tunables.c":304:15 76 {*movdi_internal}
     (nil))
(insn 795 794 796 111 (parallel [
            (set (reg:DI 295 [ cur ])
                (plus:DI (reg:DI 294)
                    (reg:DI 296)))
            (clobber (reg:CC 17 flags))
        ]) "dl-tunables.c":304:15 226 {*adddi_1}
     (nil))
and find out that &tuneble_list - &tuneble_list is 0 and we don't need it at
all.  Guess we don't figure that out due to the cast of one of those
addresses to size_t and the other one used in POINTER_PLUS_EXPR as normal
pointer.

        Jakub

Reply via email to