Re: LRA for avr: help with FP and elimination
On Fri, 2023-07-14 at 09:29 -0400, Vladimir Makarov wrote:
>
> On 7/13/23 05:27, SenthilKumar.Selvaraj--- via Gcc wrote:
> > Hi,
> >
> > I've been spending some (spare) time checking what it would take to
> > make LRA work for the avr target.
> >
> > Right after I removed the TARGET_LRA_P hook disabling LRA, building
> > libgcc failed with a weird ICE.
> >
> > On the avr, the stack pointer (SP)
> > is not used to access stack slots
>
> It is a very uncommon target then.
>
> > - TARGET_CAN_ELIMINATE returns false
> > if frame_pointer_needed, and TARGET_FRAME_POINTER_REQUIRED returns true
> > if get_frame_size() > 0.
> >
> > With LRA, however, reload generates
> >
> > (insn 159 239 240 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__)
> >                 (const_int 1 [0x1])) [2 %sfp+1 S1 A8])
> >         (reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86
> > {movqi_insn_split}
> >      (nil))
> >
> > and the backend code errors out when it finds SP is being used as a
> > pointer register.
> >
> > Digging through the RTL dumps, I found the following. For the
> > following insn sequence in *.ira
> >
> > (insn 189 128 159 7 (set (reg:HI 58 [ b ])
> >         (const_int 0 [0])) "case.c":7:7 101 {*movhi_split}
> >      (nil))
> > (insn 159 189 160 7 (set (subreg:QI (reg:HI 58 [ b ]) 0)
> >         (reg:QI 86 [ a ])) "case.c":7:7 86 {movqi_insn_split}
> >      (nil))
> > (insn 160 159 32 7 (set (subreg:QI (reg:HI 58 [ b ]) 1)
> >         (reg:QI 87 [ a+1 ])) "case.c":7:7 86 {movqi_insn_split}
> >      (nil))
> >
> > 1. For r58, IRA picks R28:R29, which is the frame pointer for avr.
> >
> >       Popping a13(r58,l0) -- assign reg 28
> >
> > 2. LRA sees the subreg in insn 159 and generates a reload reg
> > (r125). simplify_subreg_regno (lra-constraints.cc:1810) however
> > bails (returns -1) if the reg involved is FRAME_POINTER_REGNUM and
> > reload isn't completed yet. LRA therefore decides rclass for the
> > pseudo reg is NO_REGS.
> >
> >       Creating newreg=125 from oldreg=58, assigning class NO_REGS to subreg reg
> > r125
> >   159: r125:HI#0=r86:QI
> >
> > 4. As rclass is NO_REGS, LRA picks an insn alternative that involves
> > memory.
> > That is my understanding, please correct me if I'm wrong.
> >
> >             0 Small class reload: reject+=3
> >             0 Non input pseudo reload: reject++
> >             Cycle danger: overall += LRA_MAX_REJECT
> >           alt=0,overall=610,losers=1,rld_nregs=1
> >             0 Small class reload: reject+=3
> >             0 Non input pseudo reload: reject++
> >             alt=1: Bad operand -- refuse
> >             0 Non pseudo reload: reject++
> >           alt=2,overall=1,losers=0,rld_nregs=0
> >          Choosing alt 2 in insn 159:  (0) Qm  (1) rY00 {movqi_insn_split}
> >
> > 5. LRA creates stack slots, and then uses the FP register to access
> > the slots. This is despite r58 already being assigned R28:R29.
> >
> > 6. TARGET_FRAME_POINTER_REQUIRED is never called, and therefore
> > frame_pointer_needed is not set, despite the creation of stack
> > slots. TARGET_CAN_ELIMINATE therefore okays elimination of FP to SP,
> > and this eventually causes the ICE when the avr backend sees SP being
> > used as a pointer register.
> >
> > This is the relevant sequence after reload
> >
> > (insn 189 128 239 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
> >         (const_int 0 [0])) "case.c":7:7 101 {*movhi_split}
> >      (nil))
> > (insn 239 189 159 7 (set (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
> >                 (const_int 1 [0x1])) [2 %sfp+1 S2 A8])
> >         (reg:HI 28 r28 [orig:58 b ] [58])) "case.c":7:7 101 {*movhi_split}
> >      (nil))
> > (insn 159 239 240 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__)
> >                 (const_int 1 [0x1])) [2 %sfp+1 S1 A8])
> >         (reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86
> > {movqi_insn_split}
> >      (nil))
> > (insn 240 159 241 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
> >         (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
> >                 (const_int 1 [0x1])) [2 %sfp+1 S2 A8])) "case.c":7:7 101
> > {*movhi_split}
> >      (nil))
> > (insn 241 240 160 7 (set (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
> >                 (const_int 1 [0x1])) [2 %sfp+1 S2 A8])
> >         (reg:HI 28 r28 [orig:58 b ] [58])) "case.c":7:7 101 {*movhi_split}
> >      (nil))
> > (insn 160 241 242 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__)
> >                 (const_int 2 [0x2])) [2 %sfp+2 S1 A8])
> >         (reg:QI 18 r18 [orig:87 a+1 ] [87])) "case.c":7:7 86
> > {movqi_insn_split}
> >      (nil))
> > (insn 242 160 33 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
> >         (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
> >                 (const_int 1 [0x1])) [2 %sfp+1 S2 A8]))
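
The interaction described in steps 5 and 6 can be pictured with a small toy model. This is only a sketch under stated assumptions: it is not GCC code, the struct and all function names are hypothetical, and the two hook bodies simply restate what the message says about avr (TARGET_FRAME_POINTER_REQUIRED is true when get_frame_size() > 0, TARGET_CAN_ELIMINATE is false when frame_pointer_needed), reduced to the FP to SP case.

```c
#include <stdbool.h>

/* Toy stand-in for per-function frame state; not GCC's real data.  */
struct frame_model
{
  long frame_size;            /* plays the role of get_frame_size ()  */
  bool frame_pointer_needed;  /* plays the role of frame_pointer_needed  */
};

/* TARGET_FRAME_POINTER_REQUIRED as described in the message:
   true once the frame holds any stack slots.  */
bool
model_frame_pointer_required (const struct frame_model *f)
{
  return f->frame_size > 0;
}

/* TARGET_CAN_ELIMINATE as described in the message, reduced to the
   FP -> SP case: refused once a frame pointer is needed.  */
bool
model_can_eliminate_fp_to_sp (const struct frame_model *f)
{
  return !f->frame_pointer_needed;
}

/* Hypothetical helper: add spill slots and, if RECHECK is set, re-run
   the "frame pointer required" test so the flag is not left stale.  */
void
model_add_stack_slots (struct frame_model *f, long bytes, bool recheck)
{
  f->frame_size += bytes;
  if (recheck && model_frame_pointer_required (f))
    f->frame_pointer_needed = true;
}
```

With recheck false, the model reproduces the situation in step 6: slots exist, the flag is stale, and elimination to SP is still allowed. With recheck true, the same slots cause elimination to be refused. Whether and where LRA should perform such a re-check is exactly the open question of the thread; the model takes no position on that.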
Re: [RFC] Exposing complex numbers to target backends
Hi,

> [...]
> You can find my code in this repo "https://github.com/ElectrikSpace/gcc.git"
> The implementation was originally done against the main dev branch of the KVX
> port, and the working proof of concept is located in the "complex/kvx" branch.
> I've also tried to apply the patch of GCC master branch today, with a minimal
> adaptation of the x86 backend, in the "complex/x86" branch.
> Both branches have the same generic patch (that I will split later) plus a
> backend specific patch. I'm a little bit lost in the x86 backend, so it's not
> very usable for now, but I will work on it.

A little earlier I sent a series of patches which describes my
implementation of the support for native complex operations. There are 8
generic patches and 1 experimental x86 patch which exploits a portion of the
added features.

The initial message, called "[PATCH 0/9] Native complex operations" [1],
explains the implementation details and illustrates them with some examples,
mainly on KVX because the backend implements all the features and the ISA has
native complex instructions.

Some implementation details need to be discussed, especially concerning
vectors of complex elements and the merge with the work from Arm.

[1] : "https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624644.html"

Best regards,

Sylvain
LRA for avr: Handling hard regs set directly at expand
Hi,

The avr target has a bunch of patterns that directly set hard regs at expand
time, like so

(define_expand "cpymemhi"
  [(parallel [(set (match_operand:BLK 0 "memory_operand" "")
                   (match_operand:BLK 1 "memory_operand" ""))
              (use (match_operand:HI 2 "const_int_operand" ""))
              (use (match_operand:HI 3 "const_int_operand" ""))])]
  ""
  {
    if (avr_emit_cpymemhi (operands))
      DONE;

    FAIL;
  })

where avr_emit_cpymemhi generates

(insn 14 13 15 4 (set (reg:HI 30 r30)
        (reg:HI 48 [ ivtmp.10 ])) "pr53505.c":21:22 -1
     (nil))
(insn 15 14 16 4 (set (reg:HI 26 r26)
        (reg/f:HI 38 virtual-stack-vars)) "pr53505.c":21:22 -1
     (nil))
(insn 16 15 17 4 (parallel [
            (set (mem:BLK (reg:HI 26 r26) [0 A8])
                (mem:BLK (reg:HI 30 r30) [0 A8]))
            (unspec [
                    (const_int 0 [0])
                ] UNSPEC_CPYMEM)
            (use (reg:QI 52))
            (clobber (reg:HI 26 r26))
            (clobber (reg:HI 30 r30))
            (clobber (reg:QI 0 r0))
            (clobber (reg:QI 52))
        ]) "pr53505.c":21:22 -1
     (nil))

Classic reload knows about these - find_reg masks out bad_spill_regs, and
bad_spill_regs when ORed with chain->live_throughout in order_regs_for_reload
picks up r30.

LRA, however, appears to not consider that, and proceeds to use such regs as
reload regs. For the same source, it generates

         Choosing alt 0 in insn 15:  (0) =r  (1) r {*movhi_split}
      Creating newreg=70, assigning class GENERAL_REGS to r70
   15: r26:HI=r70:HI
      REG_EQUAL r28:HI+0x1
    Inserting insn reload before:
   58: r70:HI=r28:HI+0x1

         Choosing alt 3 in insn 58:  (0) d  (1) 0  (2) nYnn {*addhi3_split}
      Creating newreg=71 from oldreg=70, assigning class LD_REGS to r71
   58: r71:HI=r71:HI+0x1
    Inserting insn reload before:
   59: r71:HI=r28:HI
    Inserting insn reload after:
   60: r70:HI=r71:HI

** Assignment #1: **

         Assigning to 71 (cl=LD_REGS, orig=70, freq=3000, tfirst=71, tfreq=3000)...
           Assign 30 to reload r71 (freq=3000)
        Hard reg 26 is preferable by r70 with profit 1000
        Hard reg 30 is preferable by r70 with profit 1000
         Assigning to 70 (cl=GENERAL_REGS, orig=70, freq=2000, tfirst=70, tfreq=2000)...
           Assign 30 to reload r70 (freq=2000)

(insn 14 13 59 3 (set (reg:HI 30 r30)
        (reg:HI 18 r18 [orig:48 ivtmp.10 ] [48])) "pr53505.c":21:22 101 {*movhi_split}
     (nil))
(insn 59 14 58 3 (set (reg:HI 30 r30 [70])
        (reg/f:HI 28 r28)) "pr53505.c":21:22 101 {*movhi_split}
     (nil))
(insn 58 59 15 3 (set (reg:HI 30 r30 [70])
        (plus:HI (reg:HI 30 r30 [70])
            (const_int 1 [0x1]))) "pr53505.c":21:22 165 {*addhi3_split}
     (nil))
(insn 15 58 16 3 (set (reg:HI 26 r26)
        (reg:HI 30 r30 [70])) "pr53505.c":21:22 101 {*movhi_split}
     (expr_list:REG_EQUAL (plus:HI (reg/f:HI 28 r28)
            (const_int 1 [0x1]))
        (nil)))
(insn 16 15 17 3 (parallel [
            (set (mem:BLK (reg:HI 26 r26) [0 A8])
                (mem:BLK (reg:HI 30 r30) [0 A8]))
            (unspec [
                    (const_int 0 [0])
                ] UNSPEC_CPYMEM)
            (use (reg:QI 22 r22 [52]))
            (clobber (reg:HI 26 r26))
            (clobber (reg:HI 30 r30))
            (clobber (reg:QI 0 r0))
            (clobber (reg:QI 22 r22 [52]))
        ]) "pr53505.c":21:22 132 {cpymem_qi}
     (nil))

LRA generates insn 59 that clobbers r30 set in insn 14, causing an execution
failure down the line.

How should the avr backend deal with this?

Regards
Senthil
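
The masking that classic reload is described as doing (removing bad_spill_regs ORed with live_throughout from the candidates) can be sketched with plain bit sets. This is a toy model, not reload or LRA code: reg_set, reg_pair and pick_reload_pair are hypothetical names, the preference argument only imitates the "Hard reg 30 is preferable" lines in the dump, and the value used for LD_REGS in the note below (r16 to r31) is an assumption about the avr register classes.

```c
#include <stdint.h>

typedef uint64_t reg_set;   /* bit N set => hard register N is a member  */

/* Set holding the two consecutive hard registers of an HImode value
   whose first register is REGNO.  */
reg_set
reg_pair (int regno)
{
  return ((reg_set) 3) << regno;
}

/* Return the first register of a pair from CLS that does not overlap
   UNAVAILABLE, trying PREFERRED first; return -1 if no pair is free.
   UNAVAILABLE models (bad_spill_regs | live_throughout).  */
int
pick_reload_pair (reg_set cls, reg_set unavailable, int preferred)
{
  reg_set usable = cls & ~unavailable;

  if (preferred >= 0 && preferred < 63
      && (usable & reg_pair (preferred)) == reg_pair (preferred))
    return preferred;

  /* Fall back to the lowest even-numbered free pair.  */
  for (int r = 0; r < 63; r += 2)
    if ((usable & reg_pair (r)) == reg_pair (r))
      return r;

  return -1;
}
```

Taking cls = 0xFFFF0000 as a stand-in for LD_REGS, calling pick_reload_pair (cls, 0, 30) returns 30, which matches the problematic assignment in the dump. Passing reg_pair (30) as the unavailable set, because r30 is live between insn 14 and insn 16, makes the same call return 16 instead.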
Questions on parallel processing and multi-threading support in GCC's profiling collection
Hi, Jan,

I searched online and in GCC’s documentation, and found the following options
that support parallel processing and multi-threading during profile collection:

https://gcc.gnu.org/onlinedocs/gcc-10.5.0/gcc/Instrumentation-Options.html

-fprofile-dir=path
and
-fprofile-reproducible=[multithreaded|parallel-runs|serial]

===
-fprofile-dir=path
…..
When an executable is run in a massive parallel environment, it is recommended
to save profile to different folders. That can be done with variables in path
that are exported during run-time:

%p
    process ID.

%q{VAR}
    value of environment variable VAR

-fprofile-reproducible=[multithreaded|parallel-runs|serial]
….
With -fprofile-reproducible=parallel-runs collected profile stays reproducible
regardless the order of streaming of the data into gcda files. This setting
makes it possible to run multiple instances of instrumented program in parallel
(such as with make -j). This reduces quality of gathered data, in particular of
indirect call profiling.
==

I have the following questions on GCC’s profile collection in a parallel
processing or multi-threading environment:

1. In addition to the above two options, are there any other changes in GCC to
support parallel processing or multi-threading?

2. Is there any documentation on how to collect and use profiling feedback for
a parallel processing or multi-threading environment safely and accurately?

3. I noted that -fprofile-dir=path was added in GCC 9, and
-fprofile-reproducible was added in GCC 10. Would it be feasible to backport
these two options to GCC 8 to support profile collection with parallel
processing and multi-threading?

Thanks a lot for your help.

Qing
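
To make the %p and %q{VAR} placeholders quoted above concrete, here is a minimal sketch of how such a path template could be expanded. It is not libgcov's implementation: expand_profile_path is a hypothetical function, the process ID is passed in explicitly rather than read from the system, and expanding an unset variable to the empty string is an assumption rather than documented behaviour.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Expand "%p" (process ID) and "%q{VAR}" (environment variable VAR) in
   TMPL into OUT, which holds OUT_SIZE bytes.  Returns 0 on success and
   -1 if the template is malformed or the result does not fit.  */
int
expand_profile_path (const char *tmpl, long pid, char *out, size_t out_size)
{
  size_t used = 0;

  while (*tmpl)
    {
      char piece[32];
      char name[64];
      const char *src;
      size_t len;

      if (tmpl[0] == '%' && tmpl[1] == 'p')
        {
          snprintf (piece, sizeof piece, "%ld", pid);
          src = piece;
          len = strlen (piece);
          tmpl += 2;
        }
      else if (tmpl[0] == '%' && tmpl[1] == 'q' && tmpl[2] == '{')
        {
          const char *end = strchr (tmpl + 3, '}');
          size_t nlen;

          if (!end)
            return -1;          /* unterminated %q{  */
          nlen = (size_t) (end - (tmpl + 3));
          if (nlen == 0 || nlen >= sizeof name)
            return -1;
          memcpy (name, tmpl + 3, nlen);
          name[nlen] = '\0';
          src = getenv (name);
          if (!src)
            src = "";           /* assumption: unset expands to nothing  */
          len = strlen (src);
          tmpl = end + 1;
        }
      else
        {
          src = tmpl;           /* ordinary character, copied as-is  */
          len = 1;
          tmpl++;
        }

      if (used + len >= out_size)
        return -1;              /* keep room for the terminating NUL  */
      memcpy (out + used, src, len);
      used += len;
    }

  if (out_size == 0)
    return -1;
  out[used] = '\0';
  return 0;
}
```

Under these assumptions, the template "profiles/%q{RANK}/%p" with RANK=3 in the environment and a process ID of 1234 would expand to "profiles/3/1234", so each process writes its profile data to its own directory.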
Ju-Zhe Zhong and Robin Dapp as RISC-V reviewers
I am pleased to announce that the GCC Steering Committee has appointed
Ju-Zhe Zhong and Robin Dapp as reviewers for the RISC-V port.

Ju-Zhe and Robin, can you both update your MAINTAINERS entries appropriately?

Thanks,
Jeff
Re: Ju-Zhe Zhong and Robin Dapp as RISC-V reviewers
Thanks Jeff.
I will wait until Robin has updated his MAINTAINERS entry (since I don't know
what information I need to put in).

juzhe.zh...@rivai.ai

From: Jeff Law
Date: 2023-07-18 00:54
To: GCC Development
CC: juzhe.zh...@rivai.ai; Robin Dapp
Subject: Ju-Zhe Zhong and Robin Dapp as RISC-V reviewers

I am pleased to announce that the GCC Steering Committee has appointed
Ju-Zhe Zhong and Robin Dapp as reviewers for the RISC-V port.

Ju-Zhe and Robin, can you both update your MAINTAINERS entries appropriately?

Thanks,
Jeff