Re: LRA for avr: Handling hard regs set directly at expand
On 7/17/23 07:33, senthilkumar.selva...@microchip.com wrote:
> Hi,
>
> The avr target has a bunch of patterns that directly set hard regs at
> expand time, like so
>
> (define_expand "cpymemhi"
>   [(parallel [(set (match_operand:BLK 0 "memory_operand" "")
>                    (match_operand:BLK 1 "memory_operand" ""))
>               (use (match_operand:HI 2 "const_int_operand" ""))
>               (use (match_operand:HI 3 "const_int_operand" ""))])]
>   ""
>   {
>     if (avr_emit_cpymemhi (operands))
>       DONE;
>
>     FAIL;
>   })
>
> where avr_emit_cpymemhi generates
>
> (insn 14 13 15 4 (set (reg:HI 30 r30)
>         (reg:HI 48 [ ivtmp.10 ])) "pr53505.c":21:22 -1
>      (nil))
> (insn 15 14 16 4 (set (reg:HI 26 r26)
>         (reg/f:HI 38 virtual-stack-vars)) "pr53505.c":21:22 -1
>      (nil))
> (insn 16 15 17 4 (parallel [
>             (set (mem:BLK (reg:HI 26 r26) [0 A8])
>                 (mem:BLK (reg:HI 30 r30) [0 A8]))
>             (unspec [
>                     (const_int 0 [0])
>                 ] UNSPEC_CPYMEM)
>             (use (reg:QI 52))
>             (clobber (reg:HI 26 r26))
>             (clobber (reg:HI 30 r30))
>             (clobber (reg:QI 0 r0))
>             (clobber (reg:QI 52))
>         ]) "pr53505.c":21:22 -1
>      (nil))
>
> Classic reload knows about these - find_reg masks out bad_spill_regs, and
> bad_spill_regs when ORed with chain->live_throughout in order_regs_for_reload
> picks up r30.
>
> LRA, however, appears to not consider that, and proceeds to use such regs
> as reload regs. For the same source, it generates
>
> Choosing alt 0 in insn 15: (0) =r (1) r {*movhi_split}
>       Creating newreg=70, assigning class GENERAL_REGS to r70
>    15: r26:HI=r70:HI
>       REG_EQUAL r28:HI+0x1
>     Inserting insn reload before:
>    58: r70:HI=r28:HI+0x1
>
> Choosing alt 3 in insn 58: (0) d (1) 0 (2) nYnn {*addhi3_split}
>       Creating newreg=71 from oldreg=70, assigning class LD_REGS to r71
>    58: r71:HI=r71:HI+0x1
>     Inserting insn reload before:
>    59: r71:HI=r28:HI
>     Inserting insn reload after:
>    60: r70:HI=r71:HI
>
> ** Assignment #1: **
>
>    Assigning to 71 (cl=LD_REGS, orig=70, freq=3000, tfirst=71, tfreq=3000)...
>      Assign 30 to reload r71 (freq=3000)
>    Hard reg 26 is preferable by r70 with profit 1000
>    Hard reg 30 is preferable by r70 with profit 1000
>    Assigning to 70 (cl=GENERAL_REGS, orig=70, freq=2000, tfirst=70, tfreq=2000)...
>      Assign 30 to reload r70 (freq=2000)
>
> (insn 14 13 59 3 (set (reg:HI 30 r30)
>         (reg:HI 18 r18 [orig:48 ivtmp.10 ] [48])) "pr53505.c":21:22 101 {*movhi_split}
>      (nil))
> (insn 59 14 58 3 (set (reg:HI 30 r30 [70])
>         (reg/f:HI 28 r28)) "pr53505.c":21:22 101 {*movhi_split}
>      (nil))
> (insn 58 59 15 3 (set (reg:HI 30 r30 [70])
>         (plus:HI (reg:HI 30 r30 [70])
>             (const_int 1 [0x1]))) "pr53505.c":21:22 165 {*addhi3_split}
>      (nil))
> (insn 15 58 16 3 (set (reg:HI 26 r26)
>         (reg:HI 30 r30 [70])) "pr53505.c":21:22 101 {*movhi_split}
>      (expr_list:REG_EQUAL (plus:HI (reg/f:HI 28 r28)
>             (const_int 1 [0x1]))
>         (nil)))
> (insn 16 15 17 3 (parallel [
>             (set (mem:BLK (reg:HI 26 r26) [0 A8])
>                 (mem:BLK (reg:HI 30 r30) [0 A8]))
>             (unspec [
>                     (const_int 0 [0])
>                 ] UNSPEC_CPYMEM)
>             (use (reg:QI 22 r22 [52]))
>             (clobber (reg:HI 26 r26))
>             (clobber (reg:HI 30 r30))
>             (clobber (reg:QI 0 r0))
>             (clobber (reg:QI 22 r22 [52]))
>         ]) "pr53505.c":21:22 132 {cpymem_qi}
>      (nil))
>
> LRA generates insn 59 that clobbers r30 set in insn 14, causing an execution
> failure down the line.
>
> How should the avr backend deal with this?

Sorry for the big delay with the answer. I was on vacation.

There are probably some ways to fix it by changing patterns as other
people suggested but I'd like to see the current patterns work for LRA
as well.

Could you send me the test case on which I could reproduce the problem
and work on implementing such functionality.
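
For readers unfamiliar with the masking step described in the quoted report, here is a toy sketch of the idea in C++. It is not GCC code: `RegSet`, `kNumHardRegs`, and `pick_reload_reg` are hypothetical names invented for illustration, and the model ignores multi-register modes (on avr an HI value in r30 also occupies r31). It only shows how ORing a "bad spill" set with a "live throughout" set and masking the result out of a register class would keep a register such as r30 from being picked.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical toy model: one bit per hard register.
constexpr std::size_t kNumHardRegs = 32;
using RegSet = std::bitset<kNumHardRegs>;

// Returns the lowest-numbered register in `cls` that is neither in
// `bad_spill` nor in `live_throughout`, or -1 if no register qualifies.
int pick_reload_reg(const RegSet& cls,
                    const RegSet& bad_spill,
                    const RegSet& live_throughout) {
    // Registers that must not be used as reload registers here.
    const RegSet forbidden = bad_spill | live_throughout;
    // Candidates are the class members that survive the mask.
    const RegSet candidates = cls & ~forbidden;
    for (std::size_t r = 0; r < kNumHardRegs; ++r) {
        if (candidates.test(r)) {
            return static_cast<int>(r);
        }
    }
    return -1;
}
```

With a class containing only r26 and r30 and with r30 marked live, this sketch returns 26. The dump above shows LRA assigning hard reg 30 to the reload pseudo instead.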
Re: Where to place warning about non-optimized tail and sibling calls
On 8/1/23 6:08 PM, David Malcolm wrote:
> FWIW I added it to support Scheme from libgccjit;

Do you know of any Scheme using libgccjit?

BTW, I tried to build mainline with --enable-coverage to see which code
is executed with -foptimize-sibling-calls, but bootstrap fails with

/home/lucier/programs/gcc/objdirs/gcc-mainline/./prev-gcc/xg++ -B/home/lucier/programs/gcc/objdirs/gcc-mainline/./prev-gcc/ -B/pkgs/gcc-mainline/x86_64-pc-linux-gnu/bin/ -nostdinc++ -B/home/lucier/programs/gcc/objdirs/gcc-mainline/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -B/home/lucier/programs/gcc/objdirs/gcc-mainline/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -I/home/lucier/programs/gcc/objdirs/gcc-mainline/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu -I/home/lucier/programs/gcc/objdirs/gcc-mainline/prev-x86_64-pc-linux-gnu/libstdc++-v3/include -I/home/lucier/programs/gcc/gcc-mainline/libstdc++-v3/libsupc++ -L/home/lucier/programs/gcc/objdirs/gcc-mainline/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -L/home/lucier/programs/gcc/objdirs/gcc-mainline/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -fno-PIE -c -g -O2 -fno-checking -gtoggle -DIN_GCC -fprofile-arcs -ftest-coverage -frandom-seed=opts.o -O0 -fkeep-static-functions -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Wconditionally-supported -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common -DHAVE_CONFIG_H -fno-PIE -I. -I. -I../../../gcc-mainline/gcc -I../../../gcc-mainline/gcc/. -I../../../gcc-mainline/gcc/../include -I../../../gcc-mainline/gcc/../libcpp/include -I../../../gcc-mainline/gcc/../libcody -I../../../gcc-mainline/gcc/../libdecnumber -I../../../gcc-mainline/gcc/../libdecnumber/bid -I../libdecnumber -I../../../gcc-mainline/gcc/../libbacktrace -o opts.o -MT opts.o -MMD -MP -MF ./.deps/opts.TPo ../../../gcc-mainline/gcc/opts.cc
../../../gcc-mainline/gcc/opts.cc: In function 'void print_filtered_help(unsigned int, unsigned int, unsigned int, unsigned int, gcc_options*, unsigned int)':
../../../gcc-mainline/gcc/opts.cc:1687:26: error: ' ' directive output may be truncated writing 2 bytes into a region of size between 1 and 256 [-Werror=format-truncation=]
 1687 |                         "%s %s", help, _(use_diagnosed_msg));
      |                          ^~
../../../gcc-mainline/gcc/opts.cc:1686:22: note: 'snprintf' output 3 or more bytes (assuming 258) into a destination of size 256
 1686 |               snprintf (new_help, sizeof new_help,
      |               ~^~~
 1687 |                         "%s %s", help, _(use_diagnosed_msg));
      |                         ~
cc1plus: all warnings being treated as errors
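
As background on the diagnostic itself: GCC documents that, at its default level, -Wformat-truncation warns only about bounded calls whose return value is unused. The sketch below shows the general pattern of checking the `snprintf` result. It is not the code in opts.cc, and `join_with_space` is a hypothetical helper invented for this illustration.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: writes "<a> <b>" into dst and reports whether the
// whole string fit. snprintf returns the length the full output would have
// had, so a result >= dst_size means the output was truncated.
bool join_with_space(char* dst, std::size_t dst_size,
                     const char* a, const char* b) {
    const int needed = std::snprintf(dst, dst_size, "%s %s", a, b);
    // A negative result signals an output error.
    return needed >= 0 && static_cast<std::size_t>(needed) < dst_size;
}
```

A caller can then decide what to do on truncation (grow the buffer, shorten the message, or report an error) instead of silently passing on a cut-off string.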
GCC support for extensions from later standards
Hi everyone!

I'm working on libc++ and we are currently discussing using language extensions from later standards (https://discourse.llvm.org/t/rfc-use-language-extensions-from-future-standards-in-libc/71898/4). By that I mean things like using `if constexpr` with `-std=c++11`. GCC has quite a lot of these kinds of conforming extensions, but doesn't document them AFAICT. While discussing using these extensions, the question came up what GCC's support policy for these is. Aaron was kind enough to answer these questions for us on the Clang side. Since I couldn't find anything in the documentation, I thought I'd ask here.

So, here are my questions:

Do you expect that these extensions will ever be removed for some reason? If yes, what could those reasons be?

Would you be interested in documenting them?

Aaron noted that we should ask the Clang folks before using them, so they can evaluate whether the extension makes sense, since they might not be aware of them, and some might be broken. So I'd be interested whether you would also like us to ask whether you want to actually support these extensions.

Thanks,
Nikolas
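
To make the kind of extension concrete, here is a minimal sketch of `if constexpr` used in code that otherwise sticks to C++11 library features. The function name `describe` is made up for this example. It is valid C++17; the assumption, taken from the message above, is that GCC also accepts it under `-std=c++11` as an extension, most likely with a warning, and that has not been checked here against any particular release.

```cpp
#include <string>
#include <type_traits>

// Hypothetical example: converts a value to std::string.
// Only the branch selected at compile time is instantiated, so
// std::to_string is never instantiated for string-like arguments.
template <typename T>
std::string describe(const T& value) {
    if constexpr (std::is_arithmetic<T>::value) {
        return std::to_string(value);
    } else {
        return std::string(value);
    }
}
```

Without `if constexpr`, the same effect in strict C++11 would need tag dispatch or `std::enable_if` overloads, which is the motivation for wanting the extension in library code.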
Re: Where to place warning about non-optimized tail and sibling calls
On Wed, 2023-08-02 at 13:16 -0400, Bradley Lucier wrote:
> On 8/1/23 6:08 PM, David Malcolm wrote:
> > FWIW I added it to support Scheme from libgccjit;
>
> Do you know of any Scheme using libgccjit?

I don't.

It's not Scheme, but in case it's relevant, Emacs is doing ahead-of-time
compilation of its Emacs Lisp using libgccjit; see:
https://akrl.sdf.org/gccemacs.html

> BTW, I tried to build mainline with --enable-coverage to see which code
> is executed with -foptimize-sibling-calls, but bootstrap fails with
>
> [...snip...]

Sorry, I don't have any special knowledge of this build failure.

Dave
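
For context on what -foptimize-sibling-calls is about, here is a small sketch of calls in tail position, the shape a Scheme-style front end relies on. The function names are invented for illustration. Whether GCC actually turns a given call into a jump depends on the optimization level, the target, and the ABI, so this only shows candidates for the optimization and makes no claim about the generated code.

```cpp
// Self-recursive tail call: the result of the recursive call is returned
// unchanged, so no work remains in the caller's frame after the call.
long sum_to(long n, long acc) {
    if (n == 0) {
        return acc;
    }
    return sum_to(n - 1, acc + n);
}

// Mutually recursive sibling calls: each function ends by calling the other
// with the same signature and returns that result directly.
bool is_odd(unsigned n);

bool is_even(unsigned n) {
    return n == 0 ? true : is_odd(n - 1);
}

bool is_odd(unsigned n) {
    return n == 0 ? false : is_even(n - 1);
}
```

If such calls are not optimized, deep recursion grows the stack linearly, which is why a warning about non-optimized tail and sibling calls is of interest to users who depend on them.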
Re: LRA for avr: Handling hard regs set directly at expand
On Wed, 2023-08-02 at 12:54 -0400, Vladimir Makarov wrote:
> On 7/17/23 07:33, senthilkumar.selva...@microchip.com wrote:
> > Hi,
> >
> > The avr target has a bunch of patterns that directly set hard regs at
> > expand time, like so
> >
> > (define_expand "cpymemhi"
> >   [(parallel [(set (match_operand:BLK 0 "memory_operand" "")
> >                    (match_operand:BLK 1 "memory_operand" ""))
> >               (use (match_operand:HI 2 "const_int_operand" ""))
> >               (use (match_operand:HI 3 "const_int_operand" ""))])]
> >   ""
> >   {
> >     if (avr_emit_cpymemhi (operands))
> >       DONE;
> >
> >     FAIL;
> >   })
> >
> > where avr_emit_cpymemhi generates
> >
> > (insn 14 13 15 4 (set (reg:HI 30 r30)
> >         (reg:HI 48 [ ivtmp.10 ])) "pr53505.c":21:22 -1
> >      (nil))
> > (insn 15 14 16 4 (set (reg:HI 26 r26)
> >         (reg/f:HI 38 virtual-stack-vars)) "pr53505.c":21:22 -1
> >      (nil))
> > (insn 16 15 17 4 (parallel [
> >             (set (mem:BLK (reg:HI 26 r26) [0 A8])
> >                 (mem:BLK (reg:HI 30 r30) [0 A8]))
> >             (unspec [
> >                     (const_int 0 [0])
> >                 ] UNSPEC_CPYMEM)
> >             (use (reg:QI 52))
> >             (clobber (reg:HI 26 r26))
> >             (clobber (reg:HI 30 r30))
> >             (clobber (reg:QI 0 r0))
> >             (clobber (reg:QI 52))
> >         ]) "pr53505.c":21:22 -1
> >      (nil))
> >
> > Classic reload knows about these - find_reg masks out bad_spill_regs, and
> > bad_spill_regs when ORed with chain->live_throughout in order_regs_for_reload
> > picks up r30.
> >
> > LRA, however, appears to not consider that, and proceeds to use such regs
> > as reload regs. For the same source, it generates
> >
> > Choosing alt 0 in insn 15: (0) =r (1) r {*movhi_split}
> >       Creating newreg=70, assigning class GENERAL_REGS to r70
> >    15: r26:HI=r70:HI
> >       REG_EQUAL r28:HI+0x1
> >     Inserting insn reload before:
> >    58: r70:HI=r28:HI+0x1
> >
> > Choosing alt 3 in insn 58: (0) d (1) 0 (2) nYnn {*addhi3_split}
> >       Creating newreg=71 from oldreg=70, assigning class LD_REGS to r71
> >    58: r71:HI=r71:HI+0x1
> >     Inserting insn reload before:
> >    59: r71:HI=r28:HI
> >     Inserting insn reload after:
> >    60: r70:HI=r71:HI
> >
> > ** Assignment #1: **
> >
> >    Assigning to 71 (cl=LD_REGS, orig=70, freq=3000, tfirst=71, tfreq=3000)...
> >      Assign 30 to reload r71 (freq=3000)
> >    Hard reg 26 is preferable by r70 with profit 1000
> >    Hard reg 30 is preferable by r70 with profit 1000
> >    Assigning to 70 (cl=GENERAL_REGS, orig=70, freq=2000, tfirst=70, tfreq=2000)...
> >      Assign 30 to reload r70 (freq=2000)
> >
> > (insn 14 13 59 3 (set (reg:HI 30 r30)
> >         (reg:HI 18 r18 [orig:48 ivtmp.10 ] [48])) "pr53505.c":21:22 101 {*movhi_split}
> >      (nil))
> > (insn 59 14 58 3 (set (reg:HI 30 r30 [70])
> >         (reg/f:HI 28 r28)) "pr53505.c":21:22 101 {*movhi_split}
> >      (nil))
> > (insn 58 59 15 3 (set (reg:HI 30 r30 [70])
> >         (plus:HI (reg:HI 30 r30 [70])
> >             (const_int 1 [0x1]))) "pr53505.c":21:22 165 {*addhi3_split}
> >      (nil))
> > (insn 15 58 16 3 (set (reg:HI 26 r26)
> >         (reg:HI 30 r30 [70])) "pr53505.c":21:22 101 {*movhi_split}
> >      (expr_list:REG_EQUAL (plus:HI (reg/f:HI 28 r28)
> >             (const_int 1 [0x1]))
> >         (nil)))
> > (insn 16 15 17 3 (parallel [
> >             (set (mem:BLK (reg:HI 26 r26) [0 A8])
> >                 (mem:BLK (reg:HI 30 r30) [0 A8]))
> >             (unspec [
> >                     (const_int 0 [0])
> >                 ] UNSPEC_CPYMEM)
> >             (use (reg:QI 22 r22 [52]))
> >             (clobber (reg:HI 26 r26))
> >             (clobber (reg:HI 30 r30))
> >             (clobber (reg:QI 0 r0))
> >             (clobber (reg:QI 22 r22 [52]))
> >         ]) "pr53505.c":21:22 132 {cpymem_qi}
> >      (nil))
> >
> > LRA generates insn 59 that clobbers r30 set in insn 14, causing an execution
> > failure down the line.
> >
> > How should the avr backend deal with this?
> >
> Sorry for the big delay with the answer. I was on vacation.
>
> There are probably some ways to fix it by changing patterns as other
> people suggested but I'd like to see the current patterns work for LRA
> as well.
>
> Could you send me the test case on which I could reproduce the problem
> and work on implementing such functionality.
>

Thanks for taking your time to look at this. To reproduce the behavior, apply the below patch on master

diff --git gcc/config/avr/avr.cc gcc/config/avr/avr.cc
index 25f3f4c22e0..a9ab8259339 100644
--- gcc/config/avr/avr.cc
+++ gcc/config/avr/avr.cc
@@ -1574,6 +1574,9 @@ avr_allocate_stack_slots_for_args (void)