[Bug libgcc/99964] android(bionic) cannot find crti.o and crtn.o on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99964 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org --- Comment #1 from Jim Wilson --- (In reply to cqwrteur from comment #0) > bionic simply does not provide crti.o and crtn.o > https://github.com/aosp-riscv/platform_bionic/tree/master/libc/arch-common/ > bionic That is the RISC-V Android port in progress. It should not be used for anything other than RISC-V, as RISC-V patches might have accidentally broken other targets. Use the official Android sources for Aarch64, or maybe something distributed by Linaro.
[Bug c/100178] Should the “short” be promoted to “int” when use inline asm?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100178 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org --- Comment #1 from Jim Wilson --- This looks similar to 85185. Same problem with a short type not working correctly as an operand to an asm, and RISC-V is not the only target that the code fails for. I would suggest avoiding using char/short with an asm until someone figures out how to fix this. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85185
[Bug bootstrap/97409] riscv cross toolchain build fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97409 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org --- Comment #7 from Jim Wilson --- To follow up on what Andreas mentioned, it appears that the user is trying to use a linux gcc with an embedded elf binutils which will not work. They aren't fully compatible. One of the differences is the set of supported emulations. rohan:2069$ riscv64-unknown-elf-ld -mfoo riscv64-unknown-elf-ld: unrecognised emulation mode: foo Supported emulations: elf64lriscv elf32lriscv rohan:2070$ riscv64-unknown-linux-gnu-ld -mfoo riscv64-unknown-linux-gnu-ld: unrecognised emulation mode: foo Supported emulations: elf64lriscv elf64lriscv_lp64f elf64lriscv_lp64 elf32lriscv elf32lriscv_ilp32f elf32lriscv_ilp32 rohan:2071$ So as Andreas mentioned, you need to use the same target for binutils as for gcc.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org Status|UNCONFIRMED |NEW Last reconfirmed||2020-10-15 Ever confirmed|0 |1 --- Comment #1 from Jim Wilson --- Comparing with the ARM port, I see that in the ARM port, the movqi pattern emits (insn 8 7 9 2 (set (reg:SI 117) (zero_extend:SI (mem/v/c:QI (reg/f:SI 115) [1 active+0 S1 A8]))) "tmp.c\ ":7:7 -1 (nil)) (insn 9 8 10 2 (set (reg:QI 116) (subreg:QI (reg:SI 117) 0)) "tmp.c":7:7 -1 (nil)) and then later it combines the subreg operation with the following zero_extend and cancels them out. Whereas in the RISC-V port, the movqi pattern emits (insn 9 7 10 2 (set (reg:QI 76) (mem/v/c:QI (lo_sum:DI (reg:DI 74) (symbol_ref:DI ("active") [flags 0xc4] )) [1 active+0 S1 A8])) "tmp.c":7:7 -1 (nil)) and then combine refuses to combine the following zero-extend with this insn as the memory operation is volatile. So it seems we need to rewrite the RISC-V port to make movqi and movhi zero extend to si/di mode and then subreg. That probably will require cascading changes to avoid code size and performance regressions. Looks like a tractable small to medium size project, but will need to wait for a volunteer to work on it.
[Bug bootstrap/97409] riscv cross toolchain build fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97409 Jim Wilson changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|--- |INVALID --- Comment #8 from Jim Wilson --- Apparent user error, and no response in several days, so closing.
[Bug target/97481] GCC ice when build with RISCV on msys2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97481 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org --- Comment #2 from Jim Wilson --- riscv-unknown-linux-gnu is not a supported target. You must use either riscv32-unknown-linux-gnu or riscv64-unknwon-linux-gnu. Both compilers support both word sizes, the only difference is which is the default word size. This may be part of the reason why it failed. The attachment doesn't show any ICE. It is just a config.log file, and I don't see anything interesting in there. Note that mingw64 builds of a linux toolchain are unlikely to work as glibc requires a case sensitive file system. I'd suggest using WSL2 as something more likely to work, but I haven't tried that myself. It looks like you have a badly broken compiler build. You will need to figure out why it is broken. This doesn't seem to be a gcc problem, but rather a build problem on your side.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #3 from Jim Wilson --- The basic idea here is that the movqi pattern in riscv.md currently emits RTL for a load that looks like this (set (reg:QI target) (mem:QI (address))) As an experiment, we want to try changing that to something like this (set (reg:DI temp) (zero_extend:DI (mem:DI (address (set (reg:QI target) (subreg:QI (reg:DI temp) 0)) The hope is that the optimizer will combine the subreg with following operations resulting in smaller faster code at the end. And this should also solve the volatile load optimization problem. So we need a patch, and then we need experiments to see if the patch actually produces better code on real programs. It should be fairly easy to write the patch even if you don't have any gcc experience. The testing part of this is probably more work than the patch writing. The movqi pattern calls riscv_legitmize_move in riscv.c, so that would have to be modified to look for qimode loads from memory, allocate a temporary register, do a zero_extending load into the temp reg, and then a subreg copy into the target register. You will probably also need to handle cases where both the target and source are memory locations, in which case this already gets split into two instructions, a load followed by a store. You can look at the movqi pattern in arm.md file to see an example of how to do this, where it calls gen_zero_extendqisi2. Though for RISC-V, we would want gen_zero_extendqidi2 for rv64 and gen_zero_extendqisi2 for rv32. If the movqi change works, then we would want similar changes for movhi and maybe also movsi for rv64. It might also be worth checking whether zero-extend or sign-extend is the better choice. We zero extend char by default, so that should be best. For rv64, the hardware sign extends simode to dimode by default, so sign-extend is probably best for that. For himode I'm not sure, I think we prefer sign-extend by default, but that isn't necessarily the best choice for loads. This would have to be tested. You can see rtl dumps by using -fdump-rtl-all. The combiner is the pass that should be optimizing away the unnecessary zero-extend. You can see details of what the combiner pass is doing by using -fdump-rtl-combine-all.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #5 from Jim Wilson --- Yes, the volatile is the problem. We need to disable some optimizations like the combiner to avoid breaking the semantics of volatile. However, if you try looking at other ports, like arm, you can see that they don't have this problem because they generate different RTL at the start and hence do not need the combiner to generate the sign-extended load. So the proposal here is that we modify the RISC-V gcc port to also emit the sign-extended load at RTL generation time, which solves this problem. And then we need to do some testing to make sure that this actually generates good code in every case, as we don't want to accidentally introduce a code size or performance regression while fixing this volatile optimization problem. If you are curious about the combiner issue, see the init_recog_no_volatile call in combine.c. If you comment that out, the andi will be optimized away. But we can't remove that call, because that would break programs using volatile.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #7 from Jim Wilson --- That patch is basically correct. I would suggest using gen_lowpart instead of gen_rtx_SUBREG as a minor cleanup. It will do the same thing, and is shorter and easier to read. There is one problem here that you can't generate new pseudo registers during register allocation, or after register allocation is complete. So you need to disable this optimization in this case. You can do that by adding a check for can_create_pseudo_p (). This is already used explicitly in one place in riscv_legitimize_move and implicitly in some subfunctions, and is used in the arm.md movqi pattern. The next part is testing the patch. We need some correctness testing. You can just run the gcc testsuite for that. And we need some code size/performance testing. I'd suggest compiling some code with and without the patch and check function sizes and look for anything that got bigger with the patch, and check to see if it is a problem. I like to use the toolchain libraries like libc.a and libstdc++.a since they are being built anways, but using a nice benchmark is OK also as long as it is big enough to stress the compiler. If the patch passes testing, then we can look at expanding the scope to handle more modes, and also handle MEM dest in addition to REG dest. Yes, we can discuss this Monday.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #13 from Jim Wilson --- The attachments show the entire riscv.c file being deleted and then readded with your change. This is due to a line ending problem. The original file has the unix style linefeed and the new file has the windows style carriage return and linefeed. Git has a setting called core.autocrlf that can help with this. This will require using git diff instead of just diff though to generate patche, but should give better diffs. Or alternatively, maybe to can force the original file to have msdos line endings before you make the diff, e.g. maybe just loading the original file into your editor and saving it without making changes will fix the line endings. You have in the second patch (mode == QImode || mode == SImode || mode == HImode) which is wrong but harmless for rv32 since we can't extend SImode, and is also wrong for the eventual rv128 support. You can fix this by using something like (GET_MODE_CLASS (mode) == MODE_INT && GET_MODE_SIZE (mode) < UNITS_PER_WORD There is one place in riscv_legitimize_move that already uses this. The code inside the if statement is a lot more verbose then necessary. You can use some helper functions to simplify it. First you can use word_mode which is DImode for rv64 and SImode for rv32 (and will be OImode for rv128). Then you can call a helper to do the mode conversion. So for instance something like temp_reg = force_reg (word_mode, convert_to_mode (word_mode, src, 1)); should work. That should emit an insn to do the zero extend and put it in a reg. Now you no longer need to check src mode or TARGET_64BIT as the code is the same in all cases. So it should just be about 3 or 4 lines of code for the body of the if statement. You have a check for REG_P (dest). This is stricter than you need, since it only works for REG and doesn't accept SUBREG. We should handle both. Also, it isn't hard to also handle non-reg or non-subreg dests. You just need to force the source to a reg, and you already did that when you generated the zero_extend operation. So this code looks like it should work for any dest and you should be able to drop the REG_P (dest) check.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #14 from Jim Wilson --- Actually, for the SImode to DImode conversion, zero extending is unlikely to be correct in that case. The rv64 'w' instructions require the input to be sign-extended from 32-bits to 64-bits, so a sign-extend is probably required. There will be a similar consideration for rv128 when we get to that. So maybe we should only handle QImode and HImode here. Or maybe we make the choice of zero-extend or sign-extend depend on the mode, and use zero-extend for smaller than SImode and sign-extend for SImode and larger. For qimode, char is unsigned by default, so zero extension is likely the right choice. For himode, it isn't clear which is best, but the arm port does a zero extend. Also, the Huawei code size proposal says that zero exnteded byte and short loads are more important than sign extended byte and short load, so that is more evidence that zero extend is more useful in those cases.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #15 from Jim Wilson --- I tried testing your first patch. I just built unpatched/patch rv32-elf/rv64-linux toolchains and used nm/objdump to look at libraries like libc, libstdc++, libm. I managed to find a testcase from glibc that gives worse code with the patch. struct { unsigned int a : 1; unsigned int b : 1; unsigned int c : 1; unsigned int d : 1; unsigned int pad1 : 28; } s; void sub (void) { s.a = 1; s.c = 1; } Without the patch we get sub: lui a5,%hi(s) addia5,a5,%lo(s) lbu a4,1(a5) ori a4,a4,5 sb a4,1(a5) ret and with the patch we get sub: lui a4,%hi(s) lbu a5,%lo(s)(a4) andia5,a5,-6 ori a5,a5,5 sb a5,%lo(s)(a4) ret Note the extra and instruction. This seems to be a combine problem. With the patched compiler, I see in the -fdump-rtl-combine-all dump file Trying 9 -> 11: 9: r79:DI=r78:DI&0xfffa REG_DEAD r78:DI 11: r81:DI=r79:DI|0x5 REG_DEAD r79:DI Failed to match this instruction: (set (reg:DI 81) (ior:DI (and:DI (reg:DI 78 [ MEM [(struct *)&sD.1491] ]) (const_int 250 [0xfa])) (const_int 5 [0x5]))) Combine knows that reg 78 only has 8 nonzero bits, so it simplified the AND -6 to AND 250. If combine had left that constant alone, or if maybe combine propagated that info about nonzero bits through to r81, then it should have been able to optimize out the AND operation. This does work when the load does not zero extend by default. The ARM port shows the exact same problem. I see sub: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. movwr2, #:lower16:.LANCHOR0 movtr2, #:upper16:.LANCHOR0 ldrbr3, [r2]@ zero_extendqisi2 bic r3, r3, #5 orr r3, r3, #5 strbr3, [r2] bx lr and the bic (bit clear) is obviously unnecessary. This probably should be submitted as a separate bug if we don't want to fix it here.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #17 from Jim Wilson --- Yes, LOAD_EXTEND_OP is a good suggestion. We can maybe do something like int extend = (LOAD_EXTEND_OP (mode) == ZERO_EXTEND); temp_reg = force_reg (word_mode, convert_to_mode (word_mode, src, extend));
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #20 from Jim Wilson --- Maybe convert_to_mode is recursively calling convert_to_mode until you run out of stack space. You can use --save-temps to generate a .i preprocessed file form your input testcasxe, then run cc1 under gdb. You need to give the default arch/abi options or you will get an error (gdb) run -march=rv64gc -mabi=lp64d -O2 tmp.i when look at the backtrace when you hit the segfault. There are other ways to get the zero extend we need here. We could just generate the RTL for the insn directly instead of calling the gen_zero_extendXiYi2 functions because we know that they exist and what form they are in. This is inconvenient though. We could try querying the optabs table to get the right insn code. There is a gen_extend_insn function that looks like it would work and is more direct than convert_to_mode. gen_extend_insn does require that the input is in a form that the convert will accept, so it must be forced to a register if it isn't already a register. Another solution would be to use one of the rtx simplier functions, e.g. you can do simplify_gen_unary (ZERO_EXTEND, word_mode, src, mode); but the simplify functions are really for simplifying complex RTL to simpler RTL, so I think not the right approach here. I think gen_extend_insn is the best approach if we can't use convert_to_mode.
[Bug rtl-optimization/97747] New: missed combine opt with logical ops after zero extended load
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97747 Bug ID: 97747 Summary: missed combine opt with logical ops after zero extended load Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: wilson at gcc dot gnu.org Target Milestone: --- Consider this testcase struct { unsigned int a : 1; unsigned int b : 1; unsigned int c : 1; unsigned int d : 1; unsigned int pad1 : 28; } s; void sub (void) { s.a = 1; s.c = 1; } Compiling with -O2 -S for ARM I get sub: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. movwr2, #:lower16:.LANCHOR0 movtr2, #:upper16:.LANCHOR0 ldrbr3, [r2]@ zero_extendqisi2 bic r3, r3, #5 orr r3, r3, #5 strbr3, [r2] bx lr The bic bit-clear instruction is obviously unnecessary. In the combine dump file I see that we have (insn 9 7 11 2 (set (reg:SI 120) (and:SI (reg:SI 119 [ MEM [(struct *)&sD.5619] ]) (const_int -6 [0xfffa]))) "tmp.c":13:7 90 {*arm_andsi3_insn} (expr_list:REG_DEAD (reg:SI 119 [ MEM [(struct *)&sD.5619] ]) (nil))) (insn 11 9 13 2 (set (reg:SI 122) (ior:SI (reg:SI 120) (const_int 5 [0x5]))) "tmp.c":13:7 106 {*iorsi3_insn} (expr_list:REG_DEAD (reg:SI 120) (nil))) And the combiner does: Trying 9 -> 11: 9: r120:SI=r119:SI&0xfffa REG_DEAD r119:SI 11: r122:SI=r120:SI|0x5 REG_DEAD r120:SI Failed to match this instruction: (set (reg:SI 122) (ior:SI (and:SI (reg:SI 119 [ MEM [(struct *)&sD.5619] ]) (const_int 250 [0xfa])) (const_int 5 [0x5]))) The problem here is that the ARM port generated a zero_extend for the load byte, so combine knows that r120 has only 8 nonzero bits, it modified the -6 to 250 and then fails to notice that the and operation can be folded away because in SImode the operation is no longer redundant with the modified constant. On targets that do not generate the zero_extend, the and -6 operation gets optimized away in combine. For instance, with the current RISC-V port I get sub: lui a4,%hi(s) lbu a5,%lo(s)(a4) ori a5,a5,5 sb a5,%lo(s)(a4) ret This likely fails on any target where movqi generates a zero extended load.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #21 from Jim Wilson --- I submitted my testcase as 97747 so it will get more attention.
[Bug middle-end/94083] inefficient soft-float x!=Inf code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94083 The original bug report was apparently lost in the sourceware/gcc migration back in the spring and I didn't notice until now. This testcase int foo(void) { volatile float f, g; intn; f = __builtin_huge_valf(); g = __builtin_huge_valf(); n += 1 - (f != __builtin_huge_valf()); return n; } compiled for soft-float with -O2, and looking at the original tree dump I see f = Inf; g = Inf; SAVE_EXPR ;, n = SAVE_EX PR + n;; So the C front end converted the f != Inf compare to a f u<= compare, but the problem here is that the != operation is a single libcall, but u<= is two libcalls. So code that should have a single soft-float libcall ends up with two. First a call to __unordsf2, then a compare and branch, and then a call to __lesf2. This is a de-optimization. Perhaps we can convert the f u<= back to f != Inf in the optimization to get a single libcall. Or maybe we can add unordered soft-float libcalls like ulesf2.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #30 from Jim Wilson --- Looking at your v2 patch, the first verison fails because you are passing mismatched modes to emit_move_insn. The version with gen_lowpart solves that problem, but fails because of infinite recursion. The v4 patch looks good. There are some coding style issues, but I can always fix them for you. There should be a space before open paren. The first && should line up with the last two. The braces should be indented two more spaces. The comment needs to end with a period and two spaces. In the comment, instead of "Separate ... to ..." I'd say "Expand ... to ..." or maybe "Emit ... as ...". Now we need the copyright assignment paperwork.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #33 from Jim Wilson --- I did say that I'm willing to fix code style issues. All major software projects I have worked with have coding style conventions. It makes it easier to maintain a large software base when everything is using the same coding style. We do have info to help you follow the GNU coding style. See https://gcc.gnu.org/wiki/FormattingCodeForGCC which has emacs and vim settings to get the right code style.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #35 from Jim Wilson --- By combine issue, are you referring to the regression I mentioned in comment 3 and filed as bug 97747? We can handle that as a separate issue. It should be uncommon. I expect to get much more benefit from this patch than the downside due to that combine issue. As for the shorten-memrefs problem, I didn't notice that one in my testing. It does need to be fixed. Taking a look now, it looks pretty simple to fix. The code currently looks for MEM, it needs to handle (SIGN_EXTEND (MEM)) and ((ZERO_EXTEND (MEM)). See the get_si_mem_base_reg function. You need to skip over the sign_extend or zero_extend when looking fot the mem at the first call. Then at the second call you need to be careful to put the sign_extend or zero_extend back when creating the new RTL. Maybe just another arg to get_si_mem_base so it can record the parent rtx code of the mem. Or maybe do this outside get_si_mem_base to skip over a sign/zero extend at the first call, and then do the same at the second call but record what rtx we skipped over so that we can put it back. We can either handle this here or as another patch. But since you have some time while waiting for paperwork maybe you can try writing a fix.
[Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #40 from Jim Wilson --- If you look at riscv.opt you will see that the -mshorten-memrefs option sets the variable riscv_mshorten_memrefs. If you grep for that, you will see that it is used in riscv_address_cost in riscv.c. I believe it is this change to the address cost that is supposed to prevent the recombination back into addresses that don't fit in compressed instructions. So you need to look at why this works in the current code, but not with your zero/sign extend load patch. Maybe there is something about the rtx costs for a regular load versus a zero/sign extend load that causes the problem. In the combine dump where it says "original costs" and "replacement costs" that is where it is using rtx_cost and riscv_address_cost. The replacement cost should be more than the original cost to prevent the recombination. You should see that if you look at the combine dump for the unpatched compiler.
[Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #42 from Jim Wilson --- riscv_address_cost is a hook, so it is targetm.address_cost which is only called from address_cost which is only called in a few places one of which is in postreload.c so that is the one I would look at first. This is in try_replace_in_use which is called from reload_combine_recognize_const_pattern which is trying to put offsets back into mems which is exactly what we don't want here. This suggests that containing_mem isn't getting set when we have a sign/zero extend. It should get set in reload_combine_note_use.
[Bug middle-end/98227] [11 Regression] ICE: tree check: expected tree that contains 'decl common' structure, have 'constructor' in get_section, at varasm.c:297 on riscv64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98227 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org --- Comment #4 from Jim Wilson --- This should already be fixed. I ran into the same problem in my testing and committed a patch today that HJ helpfully pointed at. I don't test ada builds normally. I can try that to verify.
[Bug middle-end/98227] [11 Regression] ICE: tree check: expected tree that contains 'decl common' structure, have 'constructor' in get_section, at varasm.c:297 on riscv64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98227 --- Comment #5 from Jim Wilson --- My bootstrap with ada succeeded. I used the same configure options except for --prefix. make check is still running.
[Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #50 from Jim Wilson --- The combine change is inconvenient. We can't do that in stage3, and it means we need to make sure that this doesn't break other targets. If the combine change is a good idea, then I think you can just modify is_just_move rather than add a new function. Just skip over a ZERO_EXTEND or SIGN_EXTEND before the the general_operand check. We might need a mode check against UNITS_PER_WORD since extending past the word size is not necessarily a simple move. So the problem stems from the code in combine that is willing to do a 2->2 split if neither original instruction is a simple move. When we add a SIGN_EXTEND or ZERO_EXTEND they aren't considered simple moves anymore. There is still the question of why the instruction cost allows the change. First I noticed that riscv_address_cost has a hook to check for shorten_memrefs but that riscv_rtx_costs isn't calling it. It uses riscv_address_insns instead. So it seems like adding a shorten_memref check to the MEM case riscv_rtx_costs might solve the problem. But riscv_compressed_lw_address_p has a !reload_completed check, and combine runs before reload, so that returns the same result for both the new and old insns. I think that reload_completed check isn't quite right. If we have a pseudo-reg, then we should allow that, but if we have an offset that is clearly not compressible, then we should reject it before reload. So I think the reload_completed check should be moved down to where it checks for a compressible register. With those two changes, I can make the testcase work without a combine change. Though since I changed how shorten_memrefs works we need a check to make sure this patch doens't change code size. I haven't tried to do that yet. With my changes, in the combine dump, I see Successfully matched this instruction: (set (reg/f:DI 92) (plus:DI (reg:DI 96) (const_int 768 [0x300]))) Successfully matched this instruction: (set (reg:DI 82 [ MEM[(intD.1 *)array_5(D) + 800B] ]) (sign_extend:DI (mem:SI (plus:DI (reg:DI 96) (const_int 800 [0x320])) [1 MEM[(intD.1 *)array_5(D) + 800B]+0 S4 A32]))) rejecting combination of insns 27 and 6 original costs 4 + 16 = 20 replacement costs 4 + 20 = 24 so now the replacement gets rejected because of the increased address cost.
[Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #51 from Jim Wilson --- Created attachment 49773 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49773&action=edit untested fix to use instead of levy's combine.c patch Needs testing without Levy's patch to make sure it doesn't accidentally increase code size.
[Bug target/96700] undefined reference to `failure_on_line_796'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96700 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org --- Comment #1 from Jim Wilson --- The testcase is not strictly valid C code, as it has references to undefined funtions. The testcase was written this way to check for a specific compiler optimization. If the testcase is used correctly, and the optimization succeeds, then the testcase links. If the testcase is used correctly, and the optimization fails, then the testcase gives link errors. Also, if you use the testcase incorrectly, i.e. the wrong optimziation level, it can give link errors. The testcase as written can't work with -flto, but does correectly verify whether the optimization works, so it is OK, but could probably be improved. This looks like more of an enhancement request than a bug. This is not a riscv specific problem.
[Bug libstdc++/97182] Add support for targets that only define SYS_futex_time64 and not SYS_futex
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97182 --- Comment #5 from Jim Wilson --- I see that a riscv64-linux libgomp.so has mutex calls and no obvious futex calls. Though I find it a little curious that futex support isn't auto-detected. There is already config/futex.m4 to detect futex support, and libstdc++ is using it. So yes, we are missing riscv*-linux futex support in libgomp.
[Bug bootstrap/97183] zstd build failure for gcc 10 on Ubuntu 16.04
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97183 --- Comment #3 from Jim Wilson --- I installed Ubuntu 16.04 on an old laptop so I can directly reproduce the build failure. Checking for the zstd version looks like the easier patch. Checking for specific macros and functions might be better, but what do we do with the info? I'm not familiar with zstd or the lto-compress.c code, and it doesn't look like we can easily change it to not use certain macros/functions when they don't exist. Checking for half a dozen macros/functions and disabling all of the code if one or more is missing seems silly. Though looking at this, as Kito pointed out, it appears that ZSTD_getFrameContentSize() is the most recently added function that we are using, so maybe we only need to check for that one. That was added in version 1.3.0. Or we can check for the 1.3.0 version which seems simpler. I do know that version 1.3.3 of zstd works as this is what Ubuntu 18.04 has. I haven't tried the versions 1.3.[012].
[Bug target/96700] undefined reference to `failure_on_line_796'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96700 Jim Wilson changed: What|Removed |Added Resolution|--- |INVALID Status|UNCONFIRMED |RESOLVED --- Comment #3 from Jim Wilson --- I took another look at this. I can get this to fail with an x86_64 gcc, but that is the system compiler which is gcc-7. A little testing shows that the testcase works for x86_64 gcc-8 and later and fails for gcc-7. However, since this is a testcase for gcc-11, there is no expectation that it should work with older gcc versions. We sometimes need to modify testcases to work when gcc changes. I also tried this with rv32-elf compilers, gcc-11, gcc-10, gcc-9, and I couldn't get the testcase to fail for any of them, with or without -flto. So I can't reproduce the original poster's problem, which gets me back to my original claim that I don't think that this is a gcc bug. Actually, on second thought, I realized that I'm using the original testcase from the gcc source tree gcc.dg/tree-ssa/buitlin-sprintf.c not the attachment. Looking at the attachment, I see that some numbers have been modified. So that is why it fails. It is a bogus testcase. Looking further, I see that you are using an old version of the testcase before it was changed in 2018. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86274 which both changed an optimization and changed the testcase to match the new optimization. The fix was put on mainlnie (gcc-9) and backported to gcc-8, so the testcase should fail for any gcc-8 or later. This is invalid. You can't take a testcase from gcc-X and expect it to work with gcc-Y.
[Bug bootstrap/97183] zstd build failure for gcc 10 on Ubuntu 16.04
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97183 --- Comment #7 from Jim Wilson --- Fixed on mainline and the gcc-10 branch.
[Bug bootstrap/97183] zstd build failure for gcc 10 on Ubuntu 16.04
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97183 Jim Wilson changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #8 from Jim Wilson --- Remember to set it to resolved/fixed this tine,
[Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #52 from Jim Wilson --- I did some simple testing of my patch alone. I modified the riscv-gnu-toolchain makefile to use -Os to build all libraries, built a rv32imac/ilp32 toolchain, and looked at library code size. I see a few cases where my patch gives smaller code, and no obvious cases where it gives larger code, so I think it is OK. I haven't tried a full test with my patch combined with Levy's patch minus the combine change.
[Bug target/98596] registers not reused on RISC-V
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98596 Jim Wilson changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2021-01-14 CC||wilson at gcc dot gnu.org --- Comment #1 from Jim Wilson --- Part of the problem seems to be the use of volatile, as we disable optimizations inside volatile operations, and both constants are used inside volatile operations. But a bigger issue seems to be how we calculate constant costs. In riscv_rtx_costs we have /* If the constant is likely to be stored in a GPR, SETs of single-insn constants are as cheap as register sets; we never want to CSE them. */ if (cost == 1 && outer_code == SET) *total = 0; which tells the compiler that constants are cheaper than registers. If I change that to "*total = 1;" then the two constants get optimized. Changing this cost means we will likely extend register lifetimes and increase register pressure, which may reduce performance in some applications. We would need a lot of tersting to see what happens. We are already computing a cost of 0 for constants in the arithmetic immediate range, so setting costs to 0 here seems unnecessary, but it is hard to predict what might happen with this change. There might be something else I'm missing here.
[Bug target/98981] gcc-10.2 for RISC-V has extraneous register moves
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98981 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org --- Comment #1 from Jim Wilson --- The extra move instruction is a side effect of how the riscv64 toolchain handles 32-bit arithmetic. We lie to the compiler and tell it that we have instructions that produce 32-bit results. In fact, we only have instructions that produce 64-bit sign-extended 32-bit results. The lie means that the RTL has some insns with SImode output and some instructions with DImode outputs, and sometimes we end up with nop moves to convert between the modes. In this case, it is peephole2 after regalloc that notices a SImode add followed by a sign-extend, and converts it to a sign-extending 32-bit add followed by a move, but can't eliminate the move because we already did register allocation. This same problem is also why we get the unnecessary sext after the label, as peephole can't fix that. This problem has been on my todo list for a few years, and I have ideas of how to fix it, but I have no idea when I will have time to try to fix it. I did document it for the RISC-V International Code Speed Optimization task group. https://github.com/riscv/riscv-code-speed-optimization/blob/main/projects/gcc-optimizations.adoc This one is the first one in the list.
[Bug target/98981] gcc-10.2 for RISC-V has extraneous register moves
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98981 --- Comment #3 from Jim Wilson --- I suppose cost model problems could explain why combine didn't do the optimization. I didn't have a chance to look at that. I still think there is a fundmental problem with how we represent SImode operations, but again cost model problems could explain why my experiments to fix that didn't work as expected. I probably didn't look at that when I was experimenting with riscv.md changes. Your patch does look useful, but setting cost to 1 for MULT is wrong, and would be just as wrong for DIV. That is OK for PLUS, MINUS, and NEG though. I think a better option is to set *total = 0 and return false. That gives no extra cost to the sign extend, and recurs to get the proper cost for the operation underneath. That would work for MUL and DIV. I found code in the rs6000 port that does this.
[Bug rtl-optimization/99067] Missed optimization for induction variable elimination
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99067 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org --- Comment #1 from Jim Wilson --- This looks similar to an ivopts problem I looked at regarding coremark. Given this testcase void matrix_add_const_unsigned(unsigned int N, short *A, short val) { unsigned int i,j; for (i=0; i
[Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 Jim Wilson changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #62 from Jim Wilson --- I committed my shorten memrefs patch and Levy's patch minus the combine change. I don't believe that we need the combine change. I did notice one interesting case where we get unnecesssary zero extends. I will submit that as a new bug report. I also noticed with riscv64-linux testing that some -gsplit-dwarf tests fail, but this turns out to be a latent bug in the split-dwarf support. I will submit that as a new bug report. I believe this is fixed, so closing as resolved.
[Bug target/99089] New: unnecessary zero extend before AND
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99089 Bug ID: 99089 Summary: unnecessary zero extend before AND Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: wilson at gcc dot gnu.org Target Milestone: --- Given this testcase extracted from newlib struct s { short s; int i; }; extern void sub2 (void); extern void sub3 (void); int sub (struct s* t) { short i = t->s; if ((i & 0x8) == 0) t->s |= 0x800; if ((t->s & 0x3) == 0) sub3 (); t->i = i; return 0; } compiling for riscv32-elf with -O2 -S and looking at assembly code, I see two places where we have sllia5,a5,16 srlia5,a5,16 andia5,a5,3 The zero extend before the AND is clearly unnecessary. It seems a complex set of circumstances leads to here. The tree level optimizer extends the lifetime of the first zero extend into a phi, which means the operation is split across basic blocks. This also means no REG_DEAD note and no combine. It isn't until 309 bbro that the zero extend and AND end up back in the same basic block again, and that is too late to optimize it as nothing after 309 bbro can fix this. So it appears we need a global rtl pass that can notice and fix redundant zero extends feeding into AND operations across basic blocks.
[Bug debug/99090] New: gsplit-dwarf broken on riscv64-linux
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99090 Bug ID: 99090 Summary: gsplit-dwarf broken on riscv64-linux Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: debug Assignee: unassigned at gcc dot gnu.org Reporter: wilson at gcc dot gnu.org Target Milestone: --- Enabling -gsplit-dwarf by default and trying a build hits an assert in dw2_asm_output_delta_uleb128 because HAVE_AS_LEB128 is not defined. The problem appears to be in output_loc_list in dwarf2out.c which has in the dwarf_split_debug_info code /* FIXME: This will ICE ifndef HAVE_AS_LEB128. For that case we probably need to emit DW_LLE_startx_endx, but we'd need 2 .debug_addr entries rather than just one. */ riscv doesn't allow leb128 because of agressive linker relaxation, so we need the alternative solution here that works without HAVE_AS_LEB128.
[Bug target/99089] unnecessary zero extend before AND
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99089 --- Comment #2 from Jim Wilson --- I don't know if REE can do this optimization, but it is a good place to start looking.
[Bug target/98981] gcc-10.2 for RISC-V has extraneous register moves
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98981 --- Comment #4 from Jim Wilson --- With this testcase extern void sub2 (void); void sub (int *i, int *j) { int k = *i + 1; *j = k; if (k == 0) sub2 (); } Compiling without the riscv_rtx_cost patch, I get lw a5,0(a0) addiw a4,a5,1 sw a4,0(a1) beq a4,zero,.L4 compiling with the riscv_rtx_cost patch, I get lw a5,0(a0) addiw a4,a5,1 sw a4,0(a1) addiw a5,a5,1 beq a5,zero,.L4 The problem here is that we have RTL (insn 9 7 10 2 (set (reg:SI 76) (plus:SI (subreg:SI (reg:DI 78 [ *i_4(D) ]) 0) (const_int 1 [0x1]))) "tmp.c":6:7 3 {addsi3} (expr_list:REG_DEAD (reg:DI 78 [ *i_4(D) ]) (nil))) (insn 10 9 11 2 (set (reg/v:DI 73 [ k ]) (sign_extend:DI (reg:SI 76))) "tmp.c":6:7 90 {extendsidi2} (nil)) with the SImode 76 used for the sw and the DImode 73 used for the beq. With the riscv_rtx_cost change, which makes the sign_extend add cheap, combine folds the add into the sign-extend to get (insn 9 7 10 2 (set (reg:SI 76) (plus:SI (subreg:SI (reg:DI 78 [ *i_4(D) ]) 0) (const_int 1 [0x1]))) "tmp.c":6:7 3 {addsi3} (nil)) (insn 10 9 11 2 (set (reg/v:DI 73 [ k ]) (sign_extend:DI (plus:SI (subreg:SI (reg:DI 78 [ *i_4(D) ]) 0) (const_int 1 [0x1] "tmp.c":6:7 5 {*addsi3_extended} (expr_list:REG_DEAD (reg:DI 78 [ *i_4(D) ]) (nil))) and now we have two adds which is wrong. Without the patch, combine does nothing, then ree manages to merge the sign_extend into the add and subseg the sw, so we end up with only the one addw and no explicit sign extend. My take on this is that we never should have generated the SImode add in the first place, because we don't actually have that instruction. If we would have generated the sign-extended add at rtl generation time, and subreged the result, then we would have gotten the right result. In theory. So I think the riscv_rtx_cost change is useful, but only if combined with the rtl generation change I proposed in comment 1.
[Bug target/98981] gcc-10.2 for RISC-V has extraneous register moves
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98981 --- Comment #5 from Jim Wilson --- Neither of the two patches I mentioned in comment 1 can fix the problem by themselves, as we still have a mix of SImode and DImode operations. I looked at REE. It doesn't work because there is more than one reaching def. But even if it did work, I don't think it would completely solve the problem because it runs after register allocation and hence won't be able to remove move instructions. To get the best result, we need the register allocator to take two registers with different modes with overlapping live ranges, and realize that they can be allocated to the same hard reg because the overlapping uses are non-conflicting. I haven't tried looking at the register allocator, but it doesn't seem like a good way to try to solve the problem. We have an inconvenient mix of SImdoe and DImode because we don't have SImode compare and branch instructions. That requires sign extending 32-bit values to 64-bit to compare them, which then results in the sign extend and register allocation optimization issues. it is unlikely that 32-bit compare and branch instructions will be added to the ISA though. One useful thing I noticed is that the program is doing a max operation, and the B extension adds a max instruction. Having one instruction instead of a series of instructions including a branch to compute max makes the optimization issues easier, and gcc does give the right result in this case. Using a compiler with B support I get lw a4,0(a5) lw a2,0(a3) addia5,a5,4 addia3,a3,4 addwa4,a4,a2 max a0,a4,a0 bne a5,a1,.L2 which is good code with the extra moves and sign-extends removed. So I have a workaround of sorts, but only if you have the B extension. The -mtune=sifive-7-series can support conditional move via macro fusion, I was hopeful that this would work as well as max, but unfortunately the sign-extend that was removed in the max case does't get removed in the conditional move case. Also, the conditional move is 2-address, and the register allocator ends up needing a reload, which gives us the unwanted mv again. So the code in this case is the same as without the option. I didn't check to see if this is fixable.
[Bug tree-optimization/94092] Code size and performance degradations after -ftree-loop-distribute-patterns was enabled at -O[2s]+
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94092 --- Comment #6 from Jim Wilson --- Trying Alex's patch, it doesn't work on this testcase. One problem is that the loop bound is unknown, so the memset size is estimated as max 0x1fffc bytes. The code calls default_use_by_pieces_infrastructure_p which wants the total number of instructions to be less than SET_RATIO which is 16. The total number of instructions is computed as 0x1fffc. So the attempt fails. There is a contributing problem here that the alignment of the dest was computed as 8-bits, even though we have a pointer to int which has 32-bit alignment on a strict alignment system. That problem happens in expand_builtin_memset_args which calls get_pointer_alignment on dest. DEST is a SSN_NAME, and the SSA_NAME pointer info says align is 0, so it defaults to 8-bit byte alignment. I haven't tried to figure out where that went wrong. The patch does work as advertised on libgcc soft-float code. For an rv32i build I see the memset calls in libgcc.a reduced from 31 to 10.
[Bug debug/99090] gsplit-dwarf broken on riscv64-linux
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99090 --- Comment #2 from Jim Wilson --- Yes we could have partial uleb128 support. There is only a problem if at least one label is in the code section. There is another proposed solution to add special relaxable relocations for uleb128 but the initial proposal had flaws, and no one has reviewed the second proposal yet. Or we could change the -gsplit-dwarf support to work even when there is no uleb128 support.
[Bug debug/99090] gsplit-dwarf broken on riscv64-linux
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99090 --- Comment #5 from Jim Wilson --- I tested it with a riscv-gnu-toolchain build and check. The 4 -gsplit-dwarf testcases that fail without the patch work with the patch. I also tried a build and check with -gsplit-dwarf enabled by default and discovered that there are a number of debug tests that fail simply because the output is a little different than what is expected. But nothing else appeared to fail.
[Bug target/100316] Regression: __clear_cache() does not support NULL-pointer arguments
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100316 Jim Wilson changed: What|Removed |Added CC||aoliva at gcc dot gnu.org, ||wilson at gcc dot gnu.org --- Comment #1 from Jim Wilson --- Constant addresses are VOIDmode, and it only accepts Pmode and ptr_mode. It should accept VOIDmode too. For instance cse_lookup_mem does this addr_mode = GET_MODE (XEXP (x, 0)); if (addr_mode == VOIDmode) addr_mode = Pmode; This stems from changes that Alexandre Oliva made. Adding him to the cc list.
[Bug target/100348] RISC-V extra pointer adjustments for memcpy() from glibc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100348 Jim Wilson changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2021-04-30 Ever confirmed|0 |1
[Bug target/100348] RISC-V extra pointer adjustments for memcpy() from glibc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100348 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org --- Comment #1 from Jim Wilson --- Most glibc targets already have an assembly language memcpy.S implementation. RISC-V should also. This is likely the most effective solution to the problem. We should fix the optimizer problem if possible of course. It is just that the glibc solution is likely easier than the gcc solution.
[Bug target/86387] [RISCV][ABI] GCC fails to sign/zero-ext integers as necessary for passing/returning int+fp structs on hard-float ABIs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86387 Jim Wilson changed: What|Removed |Added Resolution|--- |WONTFIX Status|ASSIGNED|RESOLVED --- Comment #3 from Jim Wilson --- This was fixed with a psABI change in Nov 2018 and we forgot to update this bug report. https://github.com/riscv/riscv-elf-psabi-doc/pull/74
[Bug rtl-optimization/100264] REE does not work on PARALLEL expressions with a single register SET child
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100264 Jim Wilson changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED CC||wilson at gcc dot gnu.org Resolution|--- |FIXED --- Comment #3 from Jim Wilson --- Fixed on mainline.
[Bug c/98892] FAIL: gcc.dg/plugin/diagnostic-test-expressions-1.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98892 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org --- Comment #2 from Jim Wilson --- I ran into this with a riscv toolchain make check. Debugging, I discovered that if my terminal window is 80 characters wide, then context->max_caret_width gets set to 78 in diagnostic.c, and then in diagnostic-show-locus.c m_x_offset_display gets set to 9, and then in print_trailing_fixits it calls move_to_column with column set to 9 which causes an extra unnecessary newline to be emitted. This extra newline leads to the "1 blank lines in output" error. Also, I suspect that there is a contributing bug in the dg-{begin,end}-multiline-output support that causes it to fail when an error is followed by two newlines. This causes the "expected multiline pattern lines 550-551" failure. A bit of experimenting with the testcase in isolation shows that if my screen is 88 chars wide there is no extra newline, and if my screen is 87 chars wide I get the extra newline. A bit of experimenting with RUNTESTFLAGS="plugin.exp" show that the testcase fails every time when my screen is 80 characters wide and works everytime when my screen is 81 characters wide. I don't know why this number is different than above. A testcase whose success depends on screen width is very annoying. One way to work around the problem is to make the line shorter. If I delete one tab char from line 548 of gcc.dg/plugin/diagnostic-test-expressions-1.c and 8 space characters from the two matching error lines 550 and 551, then the testcase does work with an 80 character wide terminal. However, it does now fail with a 77 char wide screen due to a different line 415-416. So this isn't an ideal solution. But it works for me as my screens are always 80 characters wide. ideally there should be a way to turn off the max_caret_width stuff when running the testsuite so results don't depend on screen width. Someone who knows the diagnostic code should probably look at this. Or maybe print_trailing_fixits can be fixed? It isn't obvious why it needs to call move_to_column at the end.
[Bug c/98892] FAIL: gcc.dg/plugin/diagnostic-test-expressions-1.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98892 --- Comment #3 from Jim Wilson --- On second thought, I don't think that there is anything wrong with dg-*-multiline-output. The problem is simply that the diagnostic code is left shifting the error message by m_x_offset_display, and this left shit causes the printed message to not match the expected message. The difference is only a few chars of white space which makes it very hard to see the problem when inspecting the test log. You have to count white space characters to see the problem. In my case, there is one less space char in the printed message than in the expected message. And it looks like the solution is -fmessage-length=0 which is already added to ALWAYS_CXXFLAGS and should maybe be added to ALWAYS_TEST_FLAGS instead. Or maybe just added to this one testcase for now to make it work.
[Bug c/98892] FAIL: gcc.dg/plugin/diagnostic-test-expressions-1.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98892 --- Comment #4 from Jim Wilson --- It turns out that -fmessage-length=0 doesn't work which is odd. I suspect a latent bug in the diagnostic code. I tried -fmessage-length=128, which should work as that is longer than the error line, and does work for this failure, but that causes a different line to fail. Turns out that there is a test that emits a 2.3KB error message. To make the entire testcase work, I need a -fmessage-length value longer than the longest error message, so the smallest power of 2 that works is -fmessage-length=4096 which seems too stupid to submit as a patch.
[Bug target/103889] gccgo is unable to find its standard library by default on 64-Bit RISC-V due to musl not using multilib but still uses t-linux
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103889 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org --- Comment #14 from Jim Wilson --- The RISC-V ABI defines 6 different ABIs currently (not including rv32e support), so the old /lib and /lib64 distinction isn't enough. We define 6 different directories where the libraries can be, depending on the ABI in use. For rv64/lp64d, the most common one, they go in /lib64/lp64d. Desktop linux does not use all of these ABIs, but embedded linux does. Alibaba/T-Head incidentally has written linux kernel and qemu support for running 32-bit code on a rv64 kernel, but I don't know if they have hardware that supports this yet, and don't know if they are multilibbing/multiarching yet. I think it is only a matter of time before we have to support both 32-bit and 64-bit code on rv64 systems. Anyways, the first thing I would try is to make some links. You can make /lib64 a link to /lib, and make a /lib/lp64d link that points back at lib. Hence /lib64/lp64d will point at /lib. Likewise for /usr/lib64/lp64d to point at /usr/lib. Then see if gccgo works. Some linux distros already have these links. Adding these links to the OS might be the simplest solution if it works. If you don't link the symlink idea, then we probably need a new t-linux-musl file. And if you are using GNU Binutils, then you need GNU ld changes, since it is hardwired to use the same dirs. And if you are using glibc then you need glibc changes too, but you mentioned musl so this doesn't apply to you, but would apply to others if they want the same kind of changes for another linux distro. On a Fedora RISC-V system [wilson@fedora-riscv /]$ ls -lt lib64 lrwxrwxrwx. 1 root root 9 Aug 12 2020 lib64 -> usr/lib64 [wilson@fedora-riscv /]$ cd /usr/lib64 [wilson@fedora-riscv lib64]$ ls -lt lp64d lrwxrwxrwx. 1 root root 1 Aug 4 07:18 lp64d -> . [wilson@fedora-riscv lib64]$ So both /lib64/lp64d and /usr/lib64/lp64d work, and point to the same place.
[Bug target/103302] wrong code with -fharden-compares
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103302 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org --- Comment #2 from Jim Wilson --- It is the second reversed comparison that is wrong. This is the u32_0 <= (...) on the first line of foo0. In the assembly file, this ends up as mv a0,a6 mv a1,a7 xor a6,a0,a6 xor a7,a1,a7 or a6,a6,a7 seqza6,a6 and note that it is comparing a value against itself when it should be comparing two different values. The harden compare pass is generating RTL insn 156 152 155 6 (set (reg:TI 201) (asm_operands:TI ("") ("=g") 0 [ (reg:TI 77 [ _8 ]) ] [ (asm_input:TI ("0")) ] [])) -1 (nil)) (insn 155 156 153 6 (clobber (reg:TI 77 [ _8 ])) -1 (nil)) (insn 153 155 154 6 (set (subreg:DI (reg:TI 77 [ _8 ]) 0) (subreg:DI (reg:TI 201) 0)) -1 (nil)) (insn 154 153 160 6 (set (subreg:DI (reg:TI 77 [ _8 ]) 8) (subreg:DI (reg:TI 201) 8)) -1 (nil)) Then the asmcons pass is changing this to (insn 851 152 849 5 (clobber (reg:TI 201)) -1 (nil)) (insn 849 851 850 5 (set (subreg:DI (reg:TI 201) 0) (subreg:DI (reg:TI 77 [ _8 ]) 0)) -1 (nil)) (insn 850 849 156 5 (set (subreg:DI (reg:TI 201) 8) (subreg:DI (reg:TI 77 [ _8 ]) 8)) -1 (nil)) (insn 156 850 155 5 (set (reg:TI 201) (asm_operands:TI ("") ("=g") 0 [ (reg:TI 201) ] [ (asm_input:TI ("0")) ] [])) -1 (expr_list:REG_DEAD (reg:TI 77 [ _8 ]) (nil))) (insn 155 156 153 5 (clobber (reg:TI 77 [ _8 ])) -1 (nil)) (insn 153 155 154 5 (set (subreg:DI (reg:TI 77 [ _8 ]) 0) (subreg:DI (reg:TI 201) 0)) 135 {*movdi_64bit} (nil)) (insn 154 153 854 5 (set (subreg:DI (reg:TI 77 [ _8 ]) 8) (subreg:DI (reg:TI 201) 8)) 135 {*movdi_64bit} (expr_list:REG_DEAD (reg:TI 201) (nil))) Then the register allocator puts both 77 and 201 in the same register, which means we are now clobbering values we need. In the reload dump I see (insn 851 152 849 5 (clobber (reg:TI 16 a6 [201])) -1 (nil)) (insn 849 851 850 5 (set (reg:DI 16 a6 [201]) (reg:DI 10 a0 [orig:77 _8 ] [77])) 135 {*movdi_64bit} (nil)) (insn 850 849 907 5 (set (reg:DI 17 a7 [+8 ]) (reg:DI 11 a1 [ _8+8 ])) 135 {*movdi_64bit} (nil)) (insn 907 850 1014 5 (clobber (reg:TI 16 a6 [201])) -1 (nil)) so the insns 849 and 850 get optimized away, but we need them. Also, we have (insn 854 154 852 5 (clobber (reg:TI 16 a6 [202])) -1 (nil)) (insn 852 854 853 5 (set (reg:DI 16 a6 [202]) (reg:DI 6 t1 [orig:86 _39 ] [86])) 135 {*movdi_64bit} (nil)) (insn 853 852 913 5 (set (reg:DI 17 a7 [+8 ]) (reg:DI 7 t2 [ _39+8 ])) 135 {*movdi_64bit} (nil)) (insn 913 853 1010 5 (clobber (reg:TI 16 a6 [202])) -1 (nil)) and the insns 852 and 853 get optimized away, but we need them. The comparison is supposed to be a0/a1 versus t1/t2, but we end up with comparing a6/a7 against itself. asmcons is calling emit_move_insn to copy the asm source to the asm dest so it can simplify the asm. Since this is a multiword mode, and the riscv backend doesn't have a movti pattern, this ends up calling emit_move_multi_word which emits the extra clobber that causes the problem. I suppose we could fix this by adding a movti pattern to the riscv backend to avoid the clobbers but we shouldn't have to. Though it would be interesting to see if this maybe results in better code optimization. It isn't clear exactly where the problem is. Maybe asmcons shouldn't try to fix an asm when the mode is larger than the word mode? This could be left to the register allocator to fix. Or maybe harden compare shouldn't generate RTL like this? This could be a harden compare issue, or maybe an issue with the RTL expander to emit the rtl differently. Looks like the same issue with the RTL expander calling emit_move_multi_word which generates the clobber. Or maybe a movti pattern is actually required now? I did verify that disabling asmcons fixes the problem for this testcase. I had to hack the code in function.c to do that as there is no option to disable it.
[Bug target/103302] wrong code with -fharden-compares
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103302 --- Comment #3 from Jim Wilson --- Maybe the register allocator should remove clobbers of pseudos, instead of turning them into clobbers of hard register pairs. That would eliminate the ambiguity after register allocation. It is also true that we don't needs hard reg clobbers. The clobbers are only there for tracking pseudo reg subregs.
[Bug target/103271] ICE in assign_stack_temp_for_type with -ftrivial-auto-var-init=pattern and VLAs and -mno-strict-align on riscv64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103271 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org --- Comment #5 from Jim Wilson --- SiFive doesn't support -mno-strict-align so I've never tested it. I doubt that it works correctly, i.e. I doubt that it optimizes as intended. I've mentioned this to other RVI members, but there hasn't been anyone other than SiFive actively working on upstream gcc so I don't think anyone ever looked at it. It shouldn't give an ICE though. Looking at this, it appears to be another "if only we had a movti pattern" issue. In expand_DEFERRED_INIT in internal-fn.c, in the reg_lhs == TRUE case, there is a test && have_insn_for (SET, var_mode)) which fails because var_mode is TImode and we don't have a movti pattern. The code calls build_zero_cst which returns a constructor with an array type. We then call expand_assignment which gets confused as it doesn't know the size of the array it is copying. However, if we had a movti pattern, then the code computes the size of the array, and creates a VIEW_CONVERT_EXPR to document the array size before calling expand_assignment. So it looks like it would work if we had a movti pattern. I verified that adding a dummy movti pattern makes the ICE go away.
[Bug target/103302] wrong code with -fharden-compares
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103302 --- Comment #4 from Jim Wilson --- See also bug 103271 which can also be fixed by adding a movti pattern.
[Bug target/103271] ICE in assign_stack_temp_for_type with -ftrivial-auto-var-init=pattern and VLAs and -mno-strict-align on riscv64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103271 --- Comment #6 from Jim Wilson --- See also bug 103302 which can also be fixed by adding a movti pattern.
[Bug target/103271] ICE in assign_stack_temp_for_type with -ftrivial-auto-var-init=pattern and VLAs and -mno-strict-align on riscv64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103271 --- Comment #16 from Jim Wilson --- I have a patch to add movti to the riscv port. I agree that we should be adding this. I just unfortunately had a kitchen accident and had take some time off before I finished it. I noticed that a comment before riscv_split_doubleword_move implies that we did used to have a movti pattern. I wanted to check out the history before re-adding movti to make sure I didn't break anything.
[Bug target/103271] ICE in assign_stack_temp_for_type with -ftrivial-auto-var-init=pattern and VLAs and -mno-strict-align on riscv64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103271 Jim Wilson changed: What|Removed |Added CC||kito.cheng at gmail dot com --- Comment #18 from Jim Wilson --- I found the place where the movti patterns were removed. This is a riscv-gcc commit from Kito. commit 38650bdbe91bf37c3cce706ce612097b657a91d5 Author: Kito Cheng Date: Mon Sep 12 14:52:56 2016 +0800 Remove 128 bit support, just let gcc handle it I don't see a github pull request or issue for it, so I don't know why Kito removed the support. This is a little too long ago to easily find info related to the change. This is a gcc-6.1 source base. Unless perhaps Kito remembers why he made the change, all I can do is readd the support and wait to see if something else breaks. I do think that we should readd the movti support to fix a couple of bugs. I suspect a code size or performance issue, but wrong code and ICEs are worse problems than code size and performance.
[Bug target/103302] wrong code with -fharden-compares
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103302 --- Comment #11 from Jim Wilson --- FYI I have a patch to re-add the movti pattern to riscv.md which should also fix this and another bug. Kito removed the pattern in 2016 and I was hoping to get an answer from him about why he removed it. The RISC-V Summit is this week so not much development work happening this week from RISC-V developers. And in general I've been in transition away from SiFive and not getting much done.
[Bug tree-optimization/102139] New: -O3 miscompile due to slp-vectorize on strict align target
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102139 Bug ID: 102139 Summary: -O3 miscompile due to slp-vectorize on strict align target Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: wilson at gcc dot gnu.org Target Milestone: --- This was originally reported here. https://github.com/riscv/riscv-gcc/issues/289 This testcase is miscompiled at -O3 for a riscv64 target, though this is not a bug in the riscv64 port. I think it will fail for any strict align target. typedef unsigned short uint16_t; void zero_two_uint16(uint16_t* ptr) { ptr[0] = 0; ptr[1] = 0; } void zero(uint16_t* ptr) { for (int i = 0; i < 16; ++i) { zero_two_uint16(ptr); ptr += 2; } } The output is zero: sd zero,0(a0) sd zero,8(a0) sd zero,16(a0) sd zero,24(a0) sd zero,32(a0) sd zero,40(a0) sd zero,48(a0) sd zero,56(a0) ret which fails due to unaligned accesses as a0 only has 2 byte alignment. A git bisect tracked the problem down to this commit. commit f5e18dd Author: Kewen Lin li...@gcc.gnu.org Date: Tue Nov 3 02:51:47 2020 + pass: Run cleanup passes before SLP [PR96789] ... I get correct code if I disable the fre4 pass, which is the fre pass inside pre_slp_scalar_cleanup which was added by this patch. The 169t.vectorize pass adds an address alignment check, and then emits a loop with double-word stores if aligned, and a loop with half-word stores if unaligned. 172t.cunroll fully unrolls both loops. The 173t.fre4 pass deletes a phi node before the half-word stores. The 172t output has [local count: 12627204]: # ptr_3 = PHI # ivtmp_15 = PHI <16(2)> *ptr_3 = 0; and the 173t.fre4 output has [local count: 12627204]: *ptr_4(D) = 0; In the 175t.slp1 pass, the block of half-word stores gets vectorized which is wrong. Then later 207t.dce7 notices duplicate code and deletes the second block of stores. Comparing the full slp1 dump with fre4 disabled versus the unmodified slp1 dump, I see that the first significant difference is when computing pointer alignment. With fre4 disabled, I get tmp.c:4:10: note: recording new base alignment for vectp_ptr.8_125 alignment:8 misalignment: 0 based on: MEM [(uint16_t *)vectp_ptr.8_125] = { 0, 0, 0, 0 }; tmp.c:4:10: note: recording new base alignment for ptr_3 alignment:2 misalignment: 0 based on: *ptr_3 = 0; tmp.c:4:10: note: === vect_slp_analyze_instance_alignment === tmp.c:4:10: note: vect_compute_data_ref_alignment: tmp.c:4:10: note: can't force alignment of ref: *ptr_3 It then refuses to vectorize. With the unmodified compiler I get tmp.c:4:10: note: recording new base alignment for ptr_4(D) alignment:8 misalignment: 0 based on: MEM [(uint16_t *)ptr_4(D)] = { 0, 0, 0, 0 }; tmp.c:4:10: note: === vect_slp_analyze_instance_alignment === tmp.c:4:10: note: vect_compute_data_ref_alignment: tmp.c:4:10: missed: misalign = 0 bytes of ref *ptr_4(D) and then goes ahead and vectorizes which is wrong. Maybe fre4 shouldn't optimize away a phi node when the pointers have different alignment? I noticed that before slp1 runs, the double-word store block has # ALIGN = 8, MISALIGN = 0 but the half-word store block does not. After slp1 runs, both the double-word store and the half-word store block have these notes.
[Bug target/102211] [12 regression] ICE introduced by r12-3277
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102211 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org --- Comment #4 from Jim Wilson --- Yes, moving SI/DI values to FP regs is OK. However, RISC-V requires that FP values in FP registers be stored NaN-boxed. So an SFmode value in a 64-bit FP reg has the upper 32-bits of all ones, and the lower 32-bits is the value. Thus if accessed as a 64-bit value, you get a NaN. The hardware may trap if you access a 32-bit value which is not properly NaN-boxed. Using qemu to check this may not be good enough, as last time I looked at qemu it wasn't handling NaN-boxing correctly, but this was over a year ago, so maybe it has been fixed since. I don't know. Anyways, this code sequence is OK foo: fmv.w.x fa0,a0 ret because we are moving a 32-bit SImode value to an FP reg and then treating it as SFmode, and the 32-bit move will properly NaN-box the SFmode value. This code sequence is not OK foo: fmv.d.x fa5,a0 fmul.s fa0,fa0,fa5 because we are moving a 64-bit DImode value to an FP reg and then treating it as SFmode, which is not OK because the value won't be NaN-boxed and may trap at run time. validate_subreg used to prevent the bad subreg from being created. I would think that TARGET_CAN_CHANGE_MODE_CLASS could help here, but it isn't being called inside general_operand when called from fwprop1 where the bad substitution happens. Because we have a pseudo-register, and it is only called for hard registers. I don't see a way to fix this as a backend change with current validate_subreg, other than by replacing register_operand with riscv_register_operand, and putting the subreg check I need inside riscv_register_operand. And likewise for any other affected predicate, like move_operand. This will be a big change, though a lot of it will be mechanical. As an optimization, we can continue to use register_operand in any pattern that can't use FP registers. As a middle end change, I need a new hook in general_operand to reject subregs that we can't support on RISC-V. Or maybe re-add the check I need to validate_subreg as a hook, so it can be conditionally enabled for RISC-V. We can allow (subreg:SF (reg:DI)) if it gets allocated to an integer register. It is only when it is allocated to an FP register that it can't work. I don't know offhand if that can be described. But disallowing the subreg always for RISC-V is simpler and also works.
[Bug target/102211] [12 regression] ICE introduced by r12-3277
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102211 Jim Wilson changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed||2021-09-08 Status|UNCONFIRMED |NEW
[Bug target/102211] [12 regression] ICE introduced by r12-3277
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102211 --- Comment #5 from Jim Wilson --- I have a WIP fix that lets me build newlib and glibc via riscv-gnu-toolchain. I haven't tried a bootstrap yet. I created a new predicate that uses the small bit of deleted code I need from validate_subregs, and then modified most patterns with an f constraint to use it. +;; This predicate should be used for any pattern that has constraints that +;; accept FP registers. +(define_predicate "f_register_operand" + (match_operand 0 "register_operand") +{ + /* We can't allow a subreg that changes size in an FP reg, as that violates + the NaN-boxing requirement. Copied from old validate_subregs code. */ + if (GET_CODE (op) == SUBREG) +{ + machine_mode imode = GET_MODE (SUBREG_REG (op)); + machine_mode omode = GET_MODE (op); + if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode)) + { + poly_uint64 isize = GET_MODE_SIZE (imode); + poly_uint64 osize = GET_MODE_SIZE (omode); + + if (! (known_eq (isize, osize) +/* LRA can use subreg to store a floating point value in + an integer mode. Although the floating point and the + integer modes need the same number of hard registers, + the size of floating point mode can be less than the + integer mode. LRA also uses subregs for a register + should be used in different mode in on insn. */ +|| lra_in_progress)) + return false; + } +} + return true; +}) I haven't fixed the move patterns yet. I need a few more new predicates for that. The riscv_hard_regno_mode_ok function looks odd as Hongtao Liu mentioned. The mov*i patterns have f constraints, but we don't allow FP values in integer registers here. I think this is a missed optimization, or maybe good register allocation creates some no-op moves to hide the problem. My current patch does not need a riscv_hard_regno_mode_ok change to work. The riscv_can_change_mode_class also looks a little odd. The MIPS one that we copied doesn't allow mode changes in FP regs, but the alpha one does allow mode changes in FP regs if the modes have the same size. I think the alpha one can work for RISC-V and that this is another missed optimization, though again maybe good regalloc hides the problem with no-op moves. But I will worry about the possible missed optimizations after I get bootstrap working again.
[Bug target/102250] [11/12 Regression] python is not documented as a Prerequisite for building for riscv
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102250 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org --- Comment #3 from Jim Wilson --- There is no python requirement for building a riscv linux compiler. It is only needed if the user specifies the optional --with-arch configure option, and if the user specifies a non-canonical form of the architecture string in the --with-arch option string. Otherwise gcc will build fine without python. A later patch fixed this to check for python before using it, after Andreas Schwab complained about this.
[Bug target/102211] [12 regression] ICE introduced by r12-3277
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102211 --- Comment #9 from Jim Wilson --- Created attachment 51456 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51456&action=edit proposed RISC-V backend solution
[Bug target/102211] [12 regression] ICE introduced by r12-3277
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102211 --- Comment #10 from Jim Wilson --- I attached a patch which is my proposed solution to the RISC-V backend. It adds a new f_register_operand predicate and modifies patterns that use the f constraint to use it instead of register_operand. This was tested with an default language gcc build, glibc build, and glibc check on an unmatched running OpenEmbedded. And an all language gcc build, glibc build, and glibc check on an unleashed running Fedora with an old 4.15 kernel. Both succeeded as well as expected. I'll be trying gcc check next. Meanwhile, the validate_subregs patch was reverted, so we don't immediately need this to fix build errors. But it still might be useful if validate_subregs changes again. Or if it happens to give better code, though I think it won't do anything if validate_subregs is rejecting the subregs we are checking for in f_register_operand.
[Bug target/105733] riscv: Poor codegen for large stack frames
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105733 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org --- Comment #2 from Jim Wilson --- This is a known problem that I have commented on before. For instance here https://github.com/riscv-collab/riscv-gcc/issues/193 Copying my reply to here since this is a better place for the bug report: This is a known problem. GCC first emits code using a frame pointer, and then tries to eliminate the frame pointer during register allocation. When the frame size is larger than the immediate field of an add, we need temporaries to compute frame offsets. Compiler optimizations like common subexpression elimination (cse) try to optimize the calculation of the temporaries. Then when we try to eliminate the frame pointer, we run into trouble because the cse opt changes interfere with the frame pointer elimination, and we end up with an ugly inefficient mess. This problem isn't RISC-V specific, but since RISC-V has 12-bit immediates and most everyone else has 16-bit immediates, we hit the problem sooner, and thus makes it much more visible for RISC-V. The example above is putting 12KB on the stack, which is larger than 12-bits of range (4KB), but well within 16-bits of range (64KB). I have a prototype of a patch to fix it by allowing >12-bit offsets for frame pointer references, and then fixing this when eliminating the frame pointer, but this can cause other problems, and needs more work to be usable. I have no idea when someone might try to finish the patch. End of inclusion from github. To improve my earlier answer, we have signed 12-bit immediates, so the trouble actually starts at 2048 bytes, and Jessica's example is larger than that. Reduce BUFSIZ to 2044 and you get a reasonable result. foo: li a5,4096 addia5,a5,-2048 addisp,sp,-2048 add a5,a5,a0 add a0,a5,sp li t0,4096 sb zero,-2048(a0) addit0,t0,-2048 add sp,sp,t0 jr ra There is still a bit of oddness here. We are adding 2048 to a0 and then using an address -2048(a0). I think that is more cse trouble. 2048 requires two instructions to load into a register which is likely confusing something somewhere.
[Bug target/91602] GCC fails to build for riscv in a combined tree due to misconfigured leb128 support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91602 --- Comment #14 from Jim Wilson --- I posted a patch but didn't get a review, and then got distracted by other stuff and failed to follow up. https://gcc.gnu.org/pipermail/gcc-patches/2020-January/539461.html
[Bug middle-end/101996] libatomic: RISC-V 64: Infinite recursion in __atomic_compare_exchange_1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101996 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org --- Comment #4 from Jim Wilson --- The only way I can reproduce this is if I hand modify configure output. In the riscv64-unknown-linux-gnu/libatomic/ build dir there is a auto-config.h file that has #define HAVE_ATOMIC_CAS_1 0 If I modify that to #define HAVE_ATOMIC_CAS_1 1 then I get the same result where atomic_compare_exchange_1 calls itself. libat_compare_exchange_1: addisp,sp,-16 li a4,0 li a3,5 sd ra,8(sp) call__atomic_compare_exchange_1 ld ra,8(sp) addisp,sp,16 jr ra .set__atomic_compare_exchange_1,libat_compare_exchange_1 The question would then be why you get this configure result. In the config.log file in the same dir I see configure:13591: checking for __atomic_compare_exchange for size 1 configure:13611: /scratch/jimw/riscv-gnu-toolchain/X-tmp/build-gcc-linux-stage2/./gcc/xgcc -B/scratch/jimw/riscv-gnu-toolchain/X-tmp/build-gcc-linux-stage2/./gcc/ -B/scratch/jimw/riscv-gnu-toolchain/X-tmp/build-install/riscv64-unknown-linux-gnu/bin/ -B/scratch/jimw/riscv-gnu-toolchain/X-tmp/build-install/riscv64-unknown-linux-gnu/lib/ -isystem /scratch/jimw/riscv-gnu-toolchain/X-tmp/build-install/riscv64-unknown-linux-gnu/include -isystem /scratch/jimw/riscv-gnu-toolchain/X-tmp/build-install/riscv64-unknown-linux-gnu/sys-include -o conftest -O2 -mcmodel=medlow -fno-sync-libcallsconftest.c >&5 /scratch/jimw/riscv-gnu-toolchain/X-tmp/build-install/riscv64-unknown-linux-gnu/bin/ld: /tmp/ccEYquRC.o: in function `main': conftest.c:(.text.startup+0xa): undefined reference to `__atomic_compare_exchange_1' collect2: error: ld returned 1 exit status configure:13614: $? = 1 configure:13642: result: no You should check the auto-config.h and config.log files in your gcc build. I don't think that the compiler version matters. I see you are using musl which is something I haven't been testing. Maybe there is something in musl that is confusing the libatomic build. Or maybe there is something about the way that Alpine Linux is compiling gcc that is causing the wrong result. It looks like either an Alpine Linux problem or a musl problem to me. I don't have a system running Alpine Linux and don't know how to set one up offhand which limits what I can immediately do.
[Bug middle-end/101996] libatomic: RISC-V 64: Infinite recursion in __atomic_compare_exchange_1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101996 --- Comment #5 from Jim Wilson --- A riscv-gnu-toolchain build with musl looks OK, so it looks like an Alpine Linux problem.
[Bug middle-end/101996] libatomic: RISC-V 64: Infinite recursion in __atomic_compare_exchange_1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101996 --- Comment #6 from Jim Wilson --- Looking at Alpine Linux discussions, I see that it has a --enable-autolink-libatomic configure option which links in libatomic by default. This could break the libatomic autoconf tests that check to see if libatomic functions exist, by linking in libatomic when we aren't expecting it. The libatomic autoconf tests are really checking to see if the calls get inline expanded. They don't want the calls to be satisified by a library. They want the calls to be undefined functions if they aren't inline expanded. These autoconf tests might need to be rewritten if more distros start linking in libatomic by default.