[Bug target/94341] New: mve_mov can produce ICE on latest trunk
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94341 Bug ID: 94341 Summary: mve_mov can produce ICE on latest trunk Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: sripar01 at gcc dot gnu.org Reporter: matmal01 at gcc dot gnu.org Target Milestone: --- The following code ICE's on fsf-trunk. #include "arm_mve.h" uint8x16_t test() { uint8x16_t accum = (uint8x16_t)(uint32x4_t){0, 0, 0, 2}; uint8x16_t accum2 = (uint8x16_t)(uint32x4_t){0, 0, 0, 1}; accum += __arm_vddupq_m_n_u8 (accum2, 0, 4, (mve_pred16_t)1); return accum; } It ICE's because the first (define_insn "*mve_mov" ...) pattern has the following clause for it's 4th alternative: case 4: if ((TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (mode)) || (MEM_P (operands[1]) && (GET_CODE (XEXP (operands[1], 0)) == LABEL_REF))) return output_move_neon (operands); else return "vldrb.8 %q0, %E1"; For the test above, we get an RTL pattern of (insn 5 7 6 2 (set (reg:V16QI 28 s12 [116]) (mem:V16QI (const:SI (plus:SI (label_ref 28) (const_int 16 [0x10]))) [0 S16 A64])) "cde-mve-tests.c":6:12 2990 {*mve_movv16qi} (expr_list:REG_EQUIV (const_vector:V16QI [ (const_int 0 [0]) repeated x16 ]) (nil))) This matches the *mve_movv16qi pattern on the fourth alternative. the clause above does not match for going into `output_move_neon` (since the operand is not a mem of a label_ref, it's the mem of an offset to a label_ref). Hence the compiler tries to emit "vldrb.8 %q0, %E1", but since the 'E' syntax is for registers it does not match the RTL pattern for the operand. This results in an ICE. test: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. mov r3, #1 @ movhi movsr2, #0 vldr.64 d6, .L3 vldr.64 d7, .L3+8 during RTL pass: final dump file: cde-mve-tests.c.314r.final /home/matmal01/Documents/gnu-work/cde-intrinsics/cde-mve-tests.c: In function 'test': /home/matmal01/Documents/gnu-work/cde-intrinsics/cde-mve-tests.c:9:1: internal compiler error: in arm_print_operand, at config/arm/arm.c:23953 0x113c138 arm_print_operand /tmp/dgboter/bbs/rhev-vm10--rhe6x86_64/buildbot/rhe6x86_64--arm-none-eabi/build/src/gcc/gcc/config/arm/arm.c:23953 0x9332c5 output_operand(rtx_def*, int) /tmp/dgboter/bbs/rhev-vm10--rhe6x86_64/buildbot/rhe6x86_64--arm-none-eabi/build/src/gcc/gcc/final.c:4051 0x933b63 output_asm_insn(char const*, rtx_def**) /tmp/dgboter/bbs/rhev-vm10--rhe6x86_64/buildbot/rhe6x86_64--arm-none-eabi/build/src/gcc/gcc/final.c:3944 0x9366b6 final_scan_insn_1 /tmp/dgboter/bbs/rhev-vm10--rhe6x86_64/buildbot/rhe6x86_64--arm-none-eabi/build/src/gcc/gcc/final.c:3106 0x9369a7 final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*) /tmp/dgboter/bbs/rhev-vm10--rhe6x86_64/buildbot/rhe6x86_64--arm-none-eabi/build/src/gcc/gcc/final.c:3152 0x9376ac final_1 /tmp/dgboter/bbs/rhev-vm10--rhe6x86_64/buildbot/rhe6x86_64--arm-none-eabi/build/src/gcc/gcc/final.c:2020 0x9378f6 rest_of_handle_final /tmp/dgboter/bbs/rhev-vm10--rhe6x86_64/buildbot/rhe6x86_64--arm-none-eabi/build/src/gcc/gcc/final.c:4658 0x9378f6 execute /tmp/dgboter/bbs/rhev-vm10--rhe6x86_64/buildbot/rhe6x86_64--arm-none-eabi/build/src/gcc/gcc/final.c:4736 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See <https://gcc.gnu.org/bugs/> for instructions. vldrb.8 q0, gcc [11:11:19] $
[Bug target/94341] mve_mov can produce ICE on latest trunk
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94341 Matthew Malcomson changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed||2020-03-26 Status|UNCONFIRMED |ASSIGNED --- Comment #1 from Matthew Malcomson --- Have already discussed bug privately and work is being done on it. Just posting this so I have a public page for reference in public discussions.
[Bug target/94383] [8/9/10 Regression] class with empty base passed incorrectly with -std=c++17 on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94383 --- Comment #9 from Matthew Malcomson --- (In reply to Jakub Jelinek from comment #8) > I'd like to ping this, it would be nice to at least decide if this should be > handled for GCC10 or postponed to GCC11 only. Hi Jakub -- I'm taking a look at this at the moment, so I'm hoping I can get it done for GCC10. So far I've only double checked what the AAPCS64 says and confirmed that we're producing the correct code for gnu++14 and not gnu++17 (the base class is empty and a language type that occupies zero bytes has no mapping on the ABI level, hence that `pair` structure is an HFA and should be passed in the vector registers). I'm just about to start looking at changes & testing.
[Bug target/94711] [8/9/10 Regression] class with empty base passed incorrectly with -std=c++17 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94711 --- Comment #3 from Matthew Malcomson --- This has been fixed by. https://gcc.gnu.org/pipermail/gcc-patches/2020-April/544625.html In the email linked above I mentioned another problem using `-mabi=apcs-gnu`. Since that ABI is obsolete (Kyrylo pointed that out to me) I don't think that problem should hold up GCC10.
[Bug target/94711] [8/9/10 Regression] class with empty base passed incorrectly with -std=c++17 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94711 Matthew Malcomson changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #4 from Matthew Malcomson --- Resolving since the issue with the obsolete ABI is different to this ticket.
[Bug target/95816] New: Aarch64 jumps between Hot/Cold sections use possibly clobbered registers x16/x17
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95816 Bug ID: 95816 Summary: Aarch64 jumps between Hot/Cold sections use possibly clobbered registers x16/x17 Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: matmal01 at gcc dot gnu.org Target Milestone: --- Target: AArch64 When splitting a function into two different sections (hot/cold). The assembler produces a relocation on jumps between the two sections. The linker is permitted to use a veneer to implement such a relocated jump. The registers x16 and x17 are reserved for use in those veneers. Hence the registers x16 and x17 should be treated as clobbered when jumping between the hot/cold sections in a function. This is not done. We can use the testcase below to demonstrate this (modified from predict-22.c in the testsuite). - $ aarch64-none-linux-gnu-gcc \ >predict-22.c \ >-O2 -w -fPIC -freorder-blocks-and-partition \ >-c -o predict-22.o - volatile int v; void bar (void) __attribute__((leaf, cold)); void baz (int *); void alt (long long); void foo (int x, int y, int z) { static int f __attribute__((section ("mysection"))); register long long k asm ("x16"); __asm__ ("mov\t%0, 10" : "=r" (k) : "0" (k)); f = 1; if (__builtin_expect (x, 0)) if (__builtin_expect (y, 0)) if (__builtin_expect (z, 0)) { f = 2; k *= 13; bar (); v += 1; v *= 2; v /= 2; v -= 1; v += 1; v *= 2; v /= 2; v -= 1; v += 1; v *= 2; v /= 2; v -= 1; v += k; v *= 2; v /= 2; v -= 1; v += 1; v *= 2; v /= 2; v -= 1; v += 1; v *= 2; v /= 2; v -= 1; v += 1; v *= 2; v /= 2; v -= 1; v += 1; v *= 2; v /= 2; v -= 1; f = 3; __builtin_abort (); } f = k; baz (&f); } This produces an object file which is dumped below. The dump below demonstrates that there is a R_AARCH64_JUMP26 relocation on the jump between the hot/cold sections, and that the value stored in x16 is used after that jump. $ aarch64-none-linux-gnu-objdump -dr predict-22.o predict-22.o: file format elf64-littleaarch64 Disassembly of section .text: : 0: 713fcmp w1, #0x0 4: 7a401844ccmpw2, #0x0, #0x4, ne // ne = any 8: 7a401804ccmpw0, #0x0, #0x4, ne // ne = any c: d2800150mov x16, #0xa // #10 10: 54a1b.ne24 // b.any 14: 9001adrpx1, 0 14: R_AARCH64_ADR_PREL_PG_HI21 .bss 18: 9120add x0, x1, #0x0 18: R_AARCH64_ADD_ABS_LO12_NC .bss 1c: b930str w16, [x1] 1c: R_AARCH64_LDST32_ABS_LO12_NC.bss 20: 1400b 0 20: R_AARCH64_JUMP26baz 24: a9bd7bfdstp x29, x30, [sp, #-48]! 28: 910003fdmov x29, sp 2c: a90153f3stp x19, x20, [sp, #16] 30: f90013f5str x21, [sp, #32] 34: 1400b 0# Here is the relocation. 34: R_AARCH64_JUMP26.text.unlikely Disassembly of section .text.unlikely: : 0: 9015adrpx21, 0 0: R_AARCH64_ADR_PREL_PG_HI21 .bss 4: 52800053mov w19, #0x2 // #2 8: aa1003f4mov x20, x16# Here we try and use the clobbered x16 register. c: b90002b3str w19, [x21] c: R_AARCH64_LDST32_ABS_LO12_NC .bss 10: 9400bl 0 10: R_AARCH64_CALL26bar 14: 9000adrpx0, 4 14: R_AARCH64_ADR_GOT_PAGE v 18: d28001a3mov x3, #0xd// #13 1c: 52800062mov w2, #0x3// #3 20: f940ldr x0, [x0] 20: R_AARCH64_LD64_GOT_LO12_NC v
[Bug fortran/96381] New: gfc_find_vtab can use a character type typespec as a derived type (causing invalid access)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96381 Bug ID: 96381 Summary: gfc_find_vtab can use a character type typespec as a derived type (causing invalid access) Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: fortran Assignee: unassigned at gcc dot gnu.org Reporter: matmal01 at gcc dot gnu.org Target Milestone: --- When running the testsuite with a compiler sanitized with -fsanitize=hwaddress (HWASAN sanitizer which is not yet committed upstream) I saw the error below. The error comes from the testsuite running `pr93337.f90`. It is complaining that `gfc_find_derived_vtab` is attempting to access an 8 byte chunk of data 88 bytes after a region that is only 48 bytes long. That seems to be coming from the access of `derived->attr.pdt_template` (which is a one-bit field in the byte 92 bytes into a `gfc_symbol` structure). According to the dump the 48 byte long structure is allocated in `gfc_new_charlen`. This function only ever sets the `cl` alternative of the union in a `gfc_typespec`. I've inlined a GDB session demonstrating the mis-use under the HWASAN dump. ==25394==ERROR: HWAddressSanitizer: tag-mismatch on address 0xefdf79d8 at pc 0x006a8560 READ of size 8 at 0xefdf79d8 tags: 58/ff (ptr/mem) in thread T0 #0 0x6a855c in SigTrap<3> ../../../../gcc-source/libsanitizer/hwasan/hwasan_checks.h:27 #1 0x6a855c in CheckAddress<(__hwasan::ErrorAction)0, (__hwasan::AccessType)0, 3> ../../../../gcc-source/libsanitizer/hwasan/hwasan_checks.h:88 #2 0x6a855c in __hwasan_load8 ../../../../gcc-source/libsanitizer/hwasan/hwasan.cpp:454 #3 0x6ff654 in gfc_find_derived_vtab(gfc_symbol*) ../../gcc-source/gcc/fortran/class.c:2269 #4 0x707498 in gfc_find_vtab(gfc_typespec*) ../../gcc-source/gcc/fortran/class.c:2908 #5 0x707498 in gfc_find_vtab(gfc_typespec*) ../../gcc-source/gcc/fortran/class.c:2898 #6 0x7a7578 in gfc_match_assignment() ../../gcc-source/gcc/fortran/match.c:1393 #7 0x80d53c in match_word ../../gcc-source/gcc/fortran/parse.c:65 #8 0x80d53c in decode_statement ../../gcc-source/gcc/fortran/parse.c:361 #9 0x812f28 in next_free ../../gcc-source/gcc/fortran/parse.c:1280 #10 0x812f28 in next_statement ../../gcc-source/gcc/fortran/parse.c:1512 #11 0x816190 in parse_spec ../../gcc-source/gcc/fortran/parse.c:3923 #12 0x819948 in parse_progunit ../../gcc-source/gcc/fortran/parse.c:5853 #13 0x81c02c in gfc_parse_file() ../../gcc-source/gcc/fortran/parse.c:6394 #14 0x898d98 in gfc_be_parse_file ../../gcc-source/gcc/fortran/f95-lang.c:212 #15 0x152ac7c in compile_file ../../gcc-source/gcc/toplev.c:458 #16 0x69d114 in do_compile ../../gcc-source/gcc/toplev.c:2320 #17 0x69d114 in toplev::main(int, char**) ../../gcc-source/gcc/toplev.c:2459 #18 0x6a0218 in main ../../gcc-source/gcc/main.c:39 #19 0xa03c38dc in __libc_start_main (/lib/aarch64-linux-gnu/libc.so.6+0x1f8dc) [0xefdf79c0,0xefdf7a00) is a small allocated heap chunk; size: 64 offset: 24 0xefdf79d8 is located 40 bytes to the right of 48-byte region [0xefdf7980,0xefdf79b0) allocated here: #0 0x6a9d40 in __sanitizer_calloc ../../../../gcc-source/libsanitizer/hwasan/hwasan_interceptors.cpp:138 #1 0x2d1ebbc in xcalloc ../../gcc-source/libiberty/xmalloc.c:162 #2 0x8831a0 in gfc_new_charlen(gfc_namespace*, gfc_charlen*) ../../gcc-source/gcc/fortran/symbol.c:3964 #3 0x7146ec in gfc_match_char_spec(gfc_typespec*) ../../gcc-source/gcc/fortran/decl.c:3478 #4 0x71e324 in gfc_match_decl_type_spec(gfc_typespec*, int) ../../gcc-source/gcc/fortran/decl.c:4169 #5 0x7220d8 in gfc_match_data_decl() ../../gcc-source/gcc/fortran/decl.c:6129 #6 0x80d6b0 in match_word ../../gcc-source/gcc/fortran/parse.c:65 #7 0x80d6b0 in decode_statement ../../gcc-source/gcc/fortran/parse.c:376 #8 0x812f28 in next_free ../../gcc-source/gcc/fortran/parse.c:1280 #9 0x812f28 in next_statement ../../gcc-source/gcc/fortran/parse.c:1512 #10 0x816600 in parse_derived ../../gcc-source/gcc/fortran/parse.c:3343 #11 0x816600 in parse_spec ../../gcc-source/gcc/fortran/parse.c:3884 #12 0x819948 in parse_progunit ../../gcc-source/gcc/fortran/parse.c:5853 #13 0x81c02c in gfc_parse_file() ../../gcc-source/gcc/fortran/parse.c:6394 #14 0x898d98 in gfc_be_parse_file ../../gcc-source/gcc/fortran/f95-lang.c:212 #15 0x152ac7c in compile_file ../../gcc-source/gcc/toplev.c:458 #16 0x69d114 in do_compile ../../gcc-source/gcc/toplev.c:2320 #17 0x69d114 in toplev::main(int, char**) ../../gcc-source/gcc/toplev.c:2459 #18 0x6a0218 in main ../../gcc-source/gcc/main.c:39 #19 0xa03c38dc in __libc_start_main (/lib/aarch64-linux-gnu/libc.so.6+0x1f8dc) #20 0x6a3e5c (/home/ubuntu/working-directory/gcc-hwasan-install/libexec/gcc/aarch64-unkn
[Bug middle-end/92410] New: Invalid access to df->insns[] in regstat_bb_compute_calls_crossed (caught by hwasan)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92410 Bug ID: 92410 Summary: Invalid access to df->insns[] in regstat_bb_compute_calls_crossed (caught by hwasan) Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: matmal01 at gcc dot gnu.org Target Milestone: --- Host: aarch64-none-linux-gnu Target: aarch64-none-linux-gnu Build: aarch64-none-linux-gnu On trying to rebase the hwasan patches to a newer GCC version, I've found that hwasan catches a bug newly introduced between commits r275833 and r277678. The stack trace of the access (as given by hwasan) is: #0 0x6293ac in SigTrap<3> ../../../../gcc/libsanitizer/hwasan/hwasan_checks.h:27 #1 0x6293ac in CheckAddress<(__hwasan::ErrorAction)0, (__hwasan::AccessType)0, 3> ../../../../gcc/libsanitizer/hwasan/hwasan_checks.h:88 #2 0x6293ac in __hwasan_load8 ../../../../gcc/libsanitizer/hwasan/hwasan.cpp:478 #3 0x10cff94 in regstat_bb_compute_calls_crossed ../../gcc/gcc/regstat.c:327 #4 0x10cff94 in regstat_compute_calls_crossed() ../../gcc/gcc/regstat.c:379 #5 0x21c0858 in sched_init() ../../gcc/gcc/haifa-sched.c:7337 #6 0x21d3994 in haifa_sched_init() ../../gcc/gcc/haifa-sched.c:7354 #7 0x11376c8 in schedule_insns() ../../gcc/gcc/sched-rgn.c:3514 #8 0x1138584 in rest_of_handle_sched2 ../../gcc/gcc/sched-rgn.c:3746 #9 0x1138584 in execute ../../gcc/gcc/sched-rgn.c:3882 #10 0x102911c in execute_one_pass(opt_pass*) ../../gcc/gcc/passes.c:2494 #11 0x1029c58 in execute_pass_list_1 ../../gcc/gcc/passes.c:2580 #12 0x1029c74 in execute_pass_list_1 ../../gcc/gcc/passes.c:2581 #13 0x1029c74 in execute_pass_list_1 ../../gcc/gcc/passes.c:2581 #14 0x1029cd0 in execute_pass_list(function*, opt_pass*) ../../gcc/gcc/passes.c:2591 #15 0x955aa4 in cgraph_node::expand() ../../gcc/gcc/cgraphunit.c:2196 #16 0x956f38 in expand_all_functions ../../gcc/gcc/cgraphunit.c:2334 #17 0x956f38 in symbol_table::compile() ../../gcc/gcc/cgraphunit.c:2684 #18 0x95ac58 in symbol_table::compile() ../../gcc/gcc/cgraphunit.c:2597 #19 0x95ac58 in symbol_table::finalize_compilation_unit() ../../gcc/gcc/cgraphunit.c:2864 #20 0x1209bf8 in compile_file ../../gcc/gcc/toplev.c:481 #21 0x61e9d4 in do_compile ../../gcc/gcc/toplev.c:2199 #22 0x61e9d4 in toplev::main(int, char**) ../../gcc/gcc/toplev.c:2334 #23 0x621758 in main ../../gcc/gcc/main.c:39 #24 0x8b46889c in __libc_start_main (/lib/aarch64-linux-gnu/libc.so.6+0x1f89c) While the stack trace of the allocation aronud this address (as given by hwasan) is: [0xefe68400,0xefe68500) is a small allocated heap chunk; size: 256 offset: 240 0xefe684f0 is located 0 bytes to the right of 240-byte region [0xefe68400,0xefe684f0) allocated here: #0 0x62ae0c in __sanitizer_malloc ../../../../gcc/libsanitizer/hwasan/hwasan_interceptors.cpp:169 #1 0x24b3df8 in xrealloc ../../gcc/libiberty/xmalloc.c:177 #2 0x99c434 in df_grow_insn_info() ../../gcc/gcc/df-scan.c:545 #3 0x99e928 in df_scan_alloc(bitmap_head*) ../../gcc/gcc/df-scan.c:262 #4 0xe34ee0 in do_reload ../../gcc/gcc/ira.c:5574 #5 0xe34ee0 in execute ../../gcc/gcc/ira.c:5697 #6 0x102911c in execute_one_pass(opt_pass*) ../../gcc/gcc/passes.c:2494 #7 0x1029c58 in execute_pass_list_1 ../../gcc/gcc/passes.c:2580 #8 0x1029c74 in execute_pass_list_1 ../../gcc/gcc/passes.c:2581 #9 0x1029cd0 in execute_pass_list(function*, opt_pass*) ../../gcc/gcc/passes.c:2591 #10 0x955aa4 in cgraph_node::expand() ../../gcc/gcc/cgraphunit.c:2196 #11 0x956f38 in expand_all_functions ../../gcc/gcc/cgraphunit.c:2334 #12 0x956f38 in symbol_table::compile() ../../gcc/gcc/cgraphunit.c:2684 #13 0x95ac58 in symbol_table::compile() ../../gcc/gcc/cgraphunit.c:2597 #14 0x95ac58 in symbol_table::finalize_compilation_unit() ../../gcc/gcc/cgraphunit.c:2864 #15 0x1209bf8 in compile_file ../../gcc/gcc/toplev.c:481 #16 0x61e9d4 in do_compile ../../gcc/gcc/toplev.c:2199 #17 0x61e9d4 in toplev::main(int, char**) ../../gcc/gcc/toplev.c:2334 #18 0x621758 in main ../../gcc/gcc/main.c:39 #19 0x8b46889c in __libc_start_main (/lib/aarch64-linux-gnu/libc.so.6+0x1f89c) #20 0x624e34 (/home/ubuntu/working-directory/gcc-objdir-hwasan/gcc/cc1+0x624e34) I've verified the bug by compiling on r277678 and viewing the relevent command in gdb. After running a full bootstrap (convert filenames as relevent): /home/ubuntu/working-directory/gcc-objdir/./gcc/xgcc -B/home/ubuntu/working-directory/gcc-objdir/./gcc/ -B/home/ubuntu/working-directory/gcc-install/aarch64-unknown-linux-gnu/bin/ -B/home/ubuntu/working-directory/gcc-install/aarch64-unknown-linux-gnu/lib/ -isystem /home
[Bug middle-end/92410] Invalid access to df->insns[] in regstat_bb_compute_calls_crossed (caught by hwasan)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92410 --- Comment #5 from Matthew Malcomson --- I've had a little look into it, and the below seems promising: Based on a comment in haifa-sched.c, notes are removed before scheduling and added back in. Since the insn that is larger than the df buffer is a note, and I saw in gdb that it's added during `reemit_notes`, I figure the root problem might be that the notes are removed, then the df->insns array is calculated, then notes are added back in. I hence tested the below patch, and the testcase that Martin found no longer crashes. I have not yet looked into whether `df_recompute_luids` is the correct function to call or if there's a better approach. Just sharing an update. diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c index 41cf1f3..564a358 100644 --- a/gcc/haifa-sched.c +++ b/gcc/haifa-sched.c @@ -6231,6 +6231,7 @@ commit_schedule (rtx_insn *prev_head, rtx_insn *tail, basic_block *target_bb) reemit_notes (insn); last_scheduled_insn = insn; } + df_recompute_luids(*target_bb); scheduled_insns.truncate (0); } diff --git a/gcc/regstat.c b/gcc/regstat.c index 4da9b7c..c6cefb11 100644 --- a/gcc/regstat.c +++ b/gcc/regstat.c @@ -324,6 +324,7 @@ regstat_bb_compute_calls_crossed (unsigned int bb_index, bitmap live) FOR_BB_INSNS_REVERSE (bb, insn) { + gcc_assert (INSN_UID (insn) < DF_INSN_SIZE ()); struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn); unsigned int regno; diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c index 8a1d414..5d8 100644 --- a/gcc/sel-sched-ir.c +++ b/gcc/sel-sched-ir.c @@ -4673,6 +4673,7 @@ sel_restore_notes (void) if (NONDEBUG_INSN_P (insn)) reemit_notes (insn); + df_recompute_luids (first); first = first->next_bb; } while (first != last);
[Bug middle-end/92410] Invalid access to df->insns[] in regstat_bb_compute_calls_crossed (caught by hwasan)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92410 --- Comment #6 from Matthew Malcomson --- I believe the problem is that `remove_notes` followed by `reemit_notes` can generate these notes with a different UID. When `reemit_notes` adds the new note, the dataflow information is not updated automatically because `add_insn_before` only updates the information for INSN_P(insn). Hence the later lookup of this dataflow information is problematic. I'm not sure whether there's any pre-existing "should not use dataflow queries on notes" rule. If there is, then the regstat_bb_compute_calls_crossed function should be modified to check for NONDEBUG_INSN_P and continue earlier on its loop. If there isn't such a rule then I guess the best approach would be to ensure we call `df_insn_create_insn_record` whenever calling `emit_note_before` or `emit_note_after` once the dataflow information has been created. (assuming that notes don't need the information to be populated since `df_insn_rescan` seems to ignore notes). I've tried both moving the check for NONDEBUG_INSN_P in `regstat_bb_compute_calls_crossed` and adding a call to `df_insn_create_insn_record` into `reemit_notes` on a cross-compiler and both pass the testcase Martin found.
[Bug middle-end/92410] Invalid access to df->insns[] in regstat_bb_compute_calls_crossed (caught by hwasan)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92410 --- Comment #7 from Matthew Malcomson --- (In reply to Matthew Malcomson from comment #6) > I'm not sure whether there's any pre-existing "should not use dataflow > queries on notes" rule. If there is, then the > regstat_bb_compute_calls_crossed function should be modified to check for > NONDEBUG_INSN_P and continue earlier on its loop. I now see that `df_bb_refs_record` generates insn info for notes (but leaves it mostly zeroed out). I figure doing the same for the notes emitted by `reemit_notes` seems reasonable. Currently bootstrapping and regtesting (both with HWASAN and without) the following patch. diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c index 41cf1f3..2e1a84f 100644 --- a/gcc/haifa-sched.c +++ b/gcc/haifa-sched.c @@ -5433,6 +5433,7 @@ reemit_notes (rtx_insn *insn) last = emit_note_before (note_type, last); remove_note (insn, note); + df_insn_create_insn_record (last); } } } diff --git a/gcc/regstat.c b/gcc/regstat.c index 4da9b7c..c6cefb11 100644 --- a/gcc/regstat.c +++ b/gcc/regstat.c @@ -324,6 +324,7 @@ regstat_bb_compute_calls_crossed (unsigned int bb_index, bitmap live) FOR_BB_INSNS_REVERSE (bb, insn) { + gcc_assert (INSN_UID (insn) < DF_INSN_SIZE ()); struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn); unsigned int regno;
[Bug middle-end/92410] Invalid access to df->insns[] in regstat_bb_compute_calls_crossed (caught by hwasan)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92410 --- Comment #8 from Matthew Malcomson --- Author: matmal01 Date: Mon Dec 9 12:03:53 2019 New Revision: 279124 URL: https://gcc.gnu.org/viewcvs?rev=279124&root=gcc&view=rev Log: [mid-end] Add notes to dataflow insn info when re-emitting (PR92410) In scheduling passes, notes are removed with `remove_notes` before the scheduling is done, and added back in with `reemit_notes` once the scheduling has been decided. This process leaves the notes in the RTL chain with different insn uid's than were there before. Having different UID's (larger than the previous ones) means that DF_INSN_INFO_GET(insn) will access outside of the allocated array. This has been seen in the `regstat_bb_compute_calls_crossed` function. This patch adds an assert to the `regstat_bb_compute_calls_crossed` function so that bad accesses here are caught instead of going unnoticed, and then avoids the problem. We avoid the problem by ensuring that new notes added by `reemit_notes` have an insn record given to them. This is done by adding a call to `df_insn_create_insn_record` on each note added in `reemit_notes`. `df_insn_create_insn_record` leaves this new record zeroed out, which appears to be fine for notes (e.g. `df_bb_refs_record` already does not set anything except the luid for notes, and notes have no dataflow information to record). We add the testcase that Martin found here https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92410#c2 . This testcase fails with the "regstat.c" change, and then succeeds with the "haifa-sched.c" change. There is a similar problem with labels, that the `gcc_assert` catches when running regression tests in gcc.dg/fold-eqandshift-1.c and gcc.c-torture/compile/pr32482.c. This is due to the `cfg_layout_finalize` call in `bb-reorder.c` emitting new labels, and these labels not having a dataflow df_insn_info member. We solve this by manually calling `df_recompute_luids` on each basic block once this pass has finished. Testing done: Ran regression tests on aarch64-none-linux-gnu cross compiler. Bootstrapped and ran tests on aarch64-none-linux-gnu native. gcc/ChangeLog: 2019-12-09 Matthew Malcomson PR middle-end/92410 * bb-reorder.c (pass_reorder_blocks::execute): Recompute dataflow luids once basic blocks have been reordered. * haifa-sched.c (reemit_notes): Create df insn record for each new note. * regstat.c (regstat_bb_compute_calls_crossed): Assert every insn has an insn record before trying to use it. gcc/testsuite/ChangeLog: 2019-12-09 Matthew Malcomson PR middle-end/92410 * gcc.dg/torture/pr92410.c: New test. Added: trunk/gcc/testsuite/gcc.dg/torture/pr92410.c Modified: trunk/gcc/ChangeLog trunk/gcc/bb-reorder.c trunk/gcc/haifa-sched.c trunk/gcc/regstat.c trunk/gcc/testsuite/ChangeLog
[Bug rtl-optimization/92882] [10 Regression] ICE in regstat_bb_compute_calls_crossed, at regstat.c:327 since r279124
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92882 --- Comment #3 from Matthew Malcomson --- (In reply to Jakub Jelinek from comment #2) > The question is if we just have some exception that for new labels etc. we > don't grow the tables, while for insns we always do. If yes, the patch is a > real fix, if not, we can wait for further ICEs on the same assertion. > That's a good point. At the time I proposed r279124 I decided that the comment inside `df_recompute_luids` was enough of an indication that labels should be in the tables, but another interpretation could be that we already know labels can be outside of a table for some period of computation. void df_recompute_luids (basic_block bb) { rtx_insn *insn; int luid = 0; df_grow_insn_info (); /* Scan the block an insn at a time from beginning to end. */ FOR_BB_INSNS (bb, insn) { struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn); /* Inserting labels does not always trigger the incremental rescanning. */ if (!insn_info) { gcc_assert (!INSN_P (insn)); insn_info = df_insn_create_insn_record (insn); } DF_INSN_INFO_LUID (insn_info) = luid; if (INSN_P (insn)) luid++; } }
[Bug c++/92919] New: invalid memory access in wide_str_to_charconst when running ucn2.C testcase (caught by hwasan)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92919 Bug ID: 92919 Summary: invalid memory access in wide_str_to_charconst when running ucn2.C testcase (caught by hwasan) Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: matmal01 at gcc dot gnu.org CC: jakub at gcc dot gnu.org Target Milestone: --- Target: aarch64-none-linux-gnu When running the ucn2.C testcase, hwasan catches an invalid access in the function `wide_str_to_charconst`. The problematic line is: const char16_t p = u'\U00110003'; It seems this is to do with the size of the constant, since the line below does not trigger this invalid access. const char16_t j = u'\U0001F914'; yet changing that constant to the below does. const char16_t j = u'\U0011F914'; HWASAN output is below. ==9608==ERROR: HWAddressSanitizer: tag-mismatch on address 0xefdf80bf at pc 0x00651270 READ of size 1 at 0xefdf80bf tags: 5f/79 (ptr/mem) in thread T0 #0 0x65126c in SigTrap<0> ../../../../gcc-pdtl/libsanitizer/hwasan/hwasan_checks.h:27 #1 0x65126c in CheckAddress<(__hwasan::ErrorAction)0, (__hwasan::AccessType)0, 0> ../../../../gcc-pdtl/libsanitizer/hwasan/hwasan_checks.h:88 #2 0x65126c in __hwasan_load1 ../../../../gcc-pdtl/libsanitizer/hwasan/hwasan.cpp:469 #3 0x2b143dc in wide_str_to_charconst ../../gcc-pdtl/libcpp/charset.c:1980 #4 0x2b143dc in cpp_interpret_charconst(cpp_reader*, cpp_token const*, unsigned int*, int*) ../../gcc-pdtl/libcpp/charset.c:2045 #5 0xb31a48 in lex_charconst ../../gcc-pdtl/gcc/c-family/c-lex.c:1368 #6 0xb35964 in c_lex_with_flags(tree_node**, unsigned int*, unsigned char*, int) ../../gcc-pdtl/gcc/c-family/c-lex.c:617 #7 0x89c6bc in cp_lexer_get_preprocessor_token ../../gcc-pdtl/gcc/cp/parser.c:807 #8 0x943cc0 in cp_lexer_new_main ../../gcc-pdtl/gcc/cp/parser.c:654 #9 0x943cc0 in cp_parser_new ../../gcc-pdtl/gcc/cp/parser.c:3968 #10 0x943cc0 in c_parse_file() ../../gcc-pdtl/gcc/cp/parser.c:42963 #11 0xb50c90 in c_common_parse_file() ../../gcc-pdtl/gcc/c-family/c-opts.c:1185 #12 0x16a49fc in compile_file ../../gcc-pdtl/gcc/toplev.c:458 #13 0x6466bc in do_compile ../../gcc-pdtl/gcc/toplev.c:2280 #14 0x6466bc in toplev::main(int, char**) ../../gcc-pdtl/gcc/toplev.c:2419 #15 0x649468 in main ../../gcc-pdtl/gcc/main.c:39 #16 0x93dd689c in __libc_start_main (/lib/aarch64-linux-gnu/libc.so.6+0x1f89c) [0xefdf80a0,0xefdf80c0) is a small unallocated heap chunk; size: 32 offset: 31 0xefdf80bf is located 1 bytes to the left of 2-byte region [0xefdf80c0,0xefdf80c2) allocated here: #0 0x652bc0 in __sanitizer_realloc ../../../../gcc-pdtl/libsanitizer/hwasan/hwasan_interceptors.cpp:146 #1 0x2b95f40 in xrealloc ../../gcc-pdtl/libiberty/xmalloc.c:179 #2 0x2b122ec in cpp_interpret_string_1 ../../gcc-pdtl/libcpp/charset.c:1753 #3 0x2b14284 in cpp_interpret_string(cpp_reader*, cpp_string const*, unsigned long, cpp_string*, cpp_ttype) ../../gcc-pdtl/libcpp/charset.c:1784 #4 0x2b14284 in cpp_interpret_charconst(cpp_reader*, cpp_token const*, unsigned int*, int*) ../../gcc-pdtl/libcpp/charset.c:2036 #5 0xb31a48 in lex_charconst ../../gcc-pdtl/gcc/c-family/c-lex.c:1368 #6 0xb35964 in c_lex_with_flags(tree_node**, unsigned int*, unsigned char*, int) ../../gcc-pdtl/gcc/c-family/c-lex.c:617 #7 0x89c6bc in cp_lexer_get_preprocessor_token ../../gcc-pdtl/gcc/cp/parser.c:807 #8 0x943cc0 in cp_lexer_new_main ../../gcc-pdtl/gcc/cp/parser.c:654 #9 0x943cc0 in cp_parser_new ../../gcc-pdtl/gcc/cp/parser.c:3968 #10 0x943cc0 in c_parse_file() ../../gcc-pdtl/gcc/cp/parser.c:42963 #11 0xb50c90 in c_common_parse_file() ../../gcc-pdtl/gcc/c-family/c-opts.c:1185 #12 0x16a49fc in compile_file ../../gcc-pdtl/gcc/toplev.c:458 #13 0x6466bc in do_compile ../../gcc-pdtl/gcc/toplev.c:2280 #14 0x6466bc in toplev::main(int, char**) ../../gcc-pdtl/gcc/toplev.c:2419 #15 0x649468 in main ../../gcc-pdtl/gcc/main.c:39 #16 0x93dd689c in __libc_start_main (/lib/aarch64-linux-gnu/libc.so.6+0x1f89c) #17 0x64cb24 (/home/ubuntu/working-directory/gcc-hwasan-install/libexec/gcc/aarch64-unknown-linux-gnu/10.0.0/cc1plus+0x64cb24) Thread: T0 0xeffe2000 stack: [0xe544a000,0xe944a000) sz: 67108864 tls: [0x9402,0x94020850) Memory tags around the buggy address (one tag corresponds to 16 bytes): 0d 00 09 00 09 00 e7 09 09 00 e2 0c 9a 0c 0a 4a e7 0c 0d 00 0d 00 05 00 0d 00 08 00 08 00 08 00 08 00 0b 00 0b 00 0b 00 0b 00 0e 00 0e 00 05 00 0e 00 08 00 08 00 09 00 08 00 0c 00 0c 00 09 00 0c 00 0c 00 0c 00 08 00 0c 00 0b 00 0b 0
[Bug rtl-optimization/88904] New: Basic block incorrectly skipped in jump threading.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88904 Bug ID: 88904 Summary: Basic block incorrectly skipped in jump threading. Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: major Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: matmal01 at gcc dot gnu.org Target Milestone: --- When compiling the attached code, with an arm-none-eabi cross compiler from trunk, arm-none-eabi-gcc -march=armv6-m -S test.c -o test.s -Os incorrect assembly is generated, which leads to the second assert always being triggered. This happens since revision r266734 which introduced a new pass running jump-threading just after reload. For the attached testcase this triggers a latent bug in the `thread_jump` function. The combine pass can modify a jump_insn so that its pattern is of the form (parallel [ (set (pc) ...) (clobber (scratch))]) which after reload can end up in the form (parallel [ (set (pc) (if_then_else (
[Bug rtl-optimization/88904] Basic block incorrectly skipped in jump threading.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88904 --- Comment #1 from Matthew Malcomson --- Created attachment 45458 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45458&action=edit Problematic testcase
[Bug middle-end/88950] New: stack_protect_prologue can be reordered by sched1 around memory accesses
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88950 Bug ID: 88950 Summary: stack_protect_prologue can be reordered by sched1 around memory accesses Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: matmal01 at gcc dot gnu.org Target Milestone: --- I've found a testcase where the stack protector code generated through `-fstack-protector-all` doesn't actually protect anything. With the following testcase int foo (int a, int b, int c) { char buf[64]; buf[a] = 1; buf[b] = c; // Just add something so that the assignments above have some // observable behaviour. int retval = 0; for (size_t i = 0; i < 32; i++) { retval += buf[i]; } return retval; } When compiling for aarch64 with gcc -fstack-protector-all -g -S stack-reorder.c -o test.s -O3 -fdump-rtl-final (with ~gcc (GCC) 9.0.0 20181214 (experimental)~) We get an RTL dump on the final pass that has the snippet (insn 8 21 130 (parallel [ (set (mem/v/f/c:DI (plus:DI (reg/f:DI 31 sp) (const_int 88 [0x58])) [1 D.4227+0 S8 A64]) (unspec:DI [ (mem/v/f/c:DI (reg/f:DI 0 x0 [116]) [1 __stack_chk_guard+0 S8 A64]) ] UNSPEC_SP_SET)) (set (reg:DI 1 x1 [141]) (const_int 0 [0])) ]) "stack-reorder.c":3:31 1046 {stack_protect_set_di} (expr_list:REG_UNUSED (reg:DI 1 x1 [141]) (nil))) (note 130 8 117 (var_location b (entry_value:SI (reg:SI 1 x1 [ b ]))) NOTE_INSN_VAR_LOCATION) (note 117 130 118 stack-reorder.c:4 NOTE_INSN_BEGIN_STMT) (note 118 117 119 stack-reorder.c:5 NOTE_INSN_BEGIN_STMT) (note 119 118 120 stack-reorder.c:6 NOTE_INSN_BEGIN_STMT) (note 120 119 131 stack-reorder.c:10 NOTE_INSN_BEGIN_STMT) (note 131 120 121 (var_location retval (const_int 0 [0])) NOTE_INSN_VAR_LOCATION) (note 121 131 144 stack-reorder.c:11 NOTE_INSN_BEGIN_STMT) (note 144 121 122 0xb76fd960 NOTE_INSN_BLOCK_BEG) (note 122 144 132 stack-reorder.c:11 NOTE_INSN_BEGIN_STMT) (note 132 122 123 (var_location retval (nil)) NOTE_INSN_VAR_LOCATION) (note 123 132 145 stack-reorder.c:13 NOTE_INSN_BEGIN_STMT) (note 145 123 75 0xb76fd960 NOTE_INSN_BLOCK_END) (insn:TI 75 145 133 (parallel [ (set (reg:DI 1 x1 [137]) (unspec:DI [ (mem/v/f/c:DI (plus:DI (reg/f:DI 31 sp) (const_int 88 [0x58])) [1 D.4227+0 S8 A64]) (mem/v/f/c:DI (reg/f:DI 0 x0 [116]) [1 __stack_chk_guard+0 S8 A64]) ] UNSPEC_SP_TEST)) (clobber (reg:DI 2 x2 [142])) ]) "stack-reorder.c":16:1 1048 {stack_protect_test_di} (expr_list:REG_DEAD (reg/f:DI 0 x0 [116]) (expr_list:REG_UNUSED (reg:DI 2 x2 [142]) (nil In this snippet the stack protect set and test patterns are right next to each other, causing the stack protector to essentially do nothing. The RTL insns to set the two elements in `buf[]` are before this snippet. The stack_protect_set and stack_protect_test patterns are put together in the sched1 pass (as seen by the change in the RTL between the previous dump and that one). I would like to know what is supposed to stop RTL from the stack_protect_set pattern from being reordered around the code it protects like this? I don't believe aarch64 is doing anything special here -- the stack protect set and test patterns are very similar to those of other backends. I recognise this is an unlikely pattern of code and that it doesn't present as much of a security risk as things like calling memcpy or setting memory through some sort of loop.
[Bug middle-end/88950] stack_protect_prologue can be reordered by sched1 around memory accesses
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88950 --- Comment #1 from Matthew Malcomson --- Created attachment 45480 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45480&action=edit Testcase
[Bug middle-end/88950] stack_protect_prologue can be reordered by sched1 around memory accesses
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88950 --- Comment #3 from Matthew Malcomson --- aarch64 (both aarch64-none-linux-gnu and aarch64-none-elf)
[Bug rtl-optimization/88904] [9 Regression] Basic block incorrectly skipped in jump threading.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88904 --- Comment #3 from Matthew Malcomson --- I agree Jakub -- I've been testing a patch that does the same thing and everything seems to be working (though my patch was not as neat).
[Bug middle-end/88950] stack_protect_prologue can be reordered by sched1 around memory accesses
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88950 Matthew Malcomson changed: What|Removed |Added Known to fail||5.4.0 --- Comment #5 from Matthew Malcomson --- This problem has been around for a long time -- I have seen the same fundamental problem on gcc 5.4 (when looking for a version to put in the "known to work" field). With "gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609" on the same testcase, the stack_protect_test pattern gets reordered to before the second memory access (the "buf[b] = c" line), and again the stack protection does not guard this memory access. (insn:TI 8 126 16 (parallel [ (set (mem/v/f/c:DI (plus:DI (reg/f:DI 29 x29) (const_int 88 [0x58])) [1 D.2834+0 S8 A64]) (unspec:DI [ (mem/v/f/c:DI (reg/f:DI 3 x3 [100]) [1 __stack_chk_guard+0 S8 A64]) ] UNSPEC_SP_SET)) (set (reg:DI 5 x5 [126]) (const_int 0 [0])) ]) stack-reorder.c:1 864 {stack_protect_set_di} (expr_list:REG_UNUSED (reg:DI 5 x5 [126]) (nil))) (insn:TI 16 8 71 (set (mem/j:QI (plus:DI (reg:DI 0 x0 [105]) (const_int 4016 [0xfb0])) [0 buf S1 A8]) (reg:QI 4 x4 [106])) stack-reorder.c:3 45 {*movqi_aarch64} (expr_list:REG_DEAD (reg:QI 4 x4 [106]) (expr_list:REG_DEAD (reg:DI 0 x0 [105]) (nil (insn 71 16 22 (parallel [ (set (reg:DI 3 x3 [125]) (unspec:DI [ (mem/v/f/c:DI (plus:DI (reg/f:DI 29 x29) (const_int 88 [0x58])) [1 D.2834+0 S8 A64]) (mem/v/f/c:DI (reg/f:DI 3 x3 [100]) [1 __stack_chk_guard+0 S8 A64]) ] UNSPEC_SP_TEST)) (clobber (reg:DI 0 x0 [127])) ]) stack-reorder.c:14 866 {stack_protect_test_di} (expr_list:REG_UNUSED (reg:DI 0 x0 [127]) (nil))) (insn:TI 22 71 140 (set (mem/j:QI (plus:DI (reg:DI 1 x1 [110]) (const_int 4016 [0xfb0])) [0 buf S1 A8]) (reg:QI 2 x2 [ c ])) stack-reorder.c:4 45 {*movqi_aarch64} (expr_list:REG_DEAD (reg:QI 2 x2 [ c ]) (expr_list:REG_DEAD (reg:DI 1 x1 [110]) (nil
[Bug bootstrap/88714] [9 regression] bootstrap comparison failure on armv7l since r265398
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714 Matthew Malcomson changed: What|Removed |Added CC||matmal01 at gcc dot gnu.org --- Comment #29 from Matthew Malcomson --- Hi Jakub, I've been working on a patch that does very similar to the draft patch posted above, and I notice a few things I've tried to avoid in it. I doubt there are any actual bugs, since I don't know if the patterns that trigger actual faults can occur at the moment. Using the `address_operand` predicate and 'p' constraint to ensure the address is a valid address would use the mode SImode of the operand rather than checking it's valid for the DImode of the emitted ldrd. If this happens we generate an ICE in the `adjust_address` call just before `output_move_double`. I don't know if such a pattern can actually be generated, but we could use `arm_legitimate_address_p (DImode, XEXP (operands[1], 0), true)` in the condition to avoid it just in case. There's a similar problem to the `address_operand` one above with using the `arm_count_output_move_double_insns` function. It's called on the original operands, which means it eventually calls `output_move_double` with the first two operands (which are in SImode). This function has some calls to `reg_overlap_mentioned_p`, which depends on the number of hard registers for a given registers mode. I've only found cases where the `arm_count_output_move_double_insns` function returns something other than what it should in cases that only match because of the `address_operand` problem above. This could be replaced by a wrapper that generates DImode registers specifically for checking this. --- I think generation of patterns of the form (plus:SI (plus:SI (reg) (const_int)) (const_int)) which can happen with these peepholes isn't very nice. I can't find any constraint against these patterns in the canonicalization rules (maybe there should be?) so I can't say this is an actual problem. As an example: the following int __RTL (startwith ("peephole2")) foo_x4 (int *a) { (function "foo_x4" (insn-chain (cnote 1 NOTE_INSN_DELETED) (block 2 (edge-from entry (flags "FALLTHRU")) (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK) (cinsn 101 (set (reg:SI r2) (mem/c:SI (plus:SI (reg:SI r0) (const_int 8)) [0 S4 A64])) "/home/matmal01/test.c":18) (cinsn 102 (set (reg:SI r3) (mem/c:SI (plus:SI (reg:SI r0) (const_int 12)) [0 S4 A32])) "/home/matmal01/test.c":18) (cinsn 103 (set (reg:SI r0) (plus:SI (reg:SI r2) (reg:SI r3))) "/home/matmal01/test.c":18) (edge-to exit (flags "FALLTHRU")) ) ;; block 2 ) ;; insn-chain (crtl (return_rtx (reg/i:SI r0) ) ;; return_rtx ) ;; crtl ) ;; function "main" } Produces (insn 104 3 103 2 (parallel [ (set (reg:SI 2 r2) (mem/c:SI (plus:SI (reg:SI 0 r0) (const_int 8 [0x8])) [0 S4 S4 A64])) (set (reg:SI 3 r3) (mem/c:SI (plus:SI (plus:SI (reg:SI 0 r0) (const_int 8 [0x8])) (const_int 4 [0x4])) [0 S4 S4 A32])) ]) -1 (nil)) Maybe we could use the existing operands, and match with `rtx_equal_p (..., plus_constant (...))` so that the plus_constant can take care of adding the constants together. This is what we do in the load_pair patterns for aarch64. There are a few other tidy-up points around the define_insn patterns, but overall I believe they can be merged into one pattern. The difference between the 'q' and 'r' constraints are using either CORE_REGS or GENERAL_REGS, where CORE_REGS allows r13 and GENERAL_REGS doesn't. I guess this is from a line in infocenter that mentions r12 is strongly recommended to not be used as the first register for ldrdb, as this is stopped by requiring both the first and second register to not be r13. http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489c/Cihjffga.html
[Bug bootstrap/88714] [9 regression] bootstrap comparison failure on armv7l since r265398
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714 --- Comment #31 from Matthew Malcomson --- (In reply to Jakub Jelinek from comment #30) > (In reply to Matthew Malcomson from comment #29) > > I've been working on a patch that does very similar to the draft patch > > posted > > above, and I notice a few things I've tried to avoid in it. > > I doubt there are any actual bugs, since I don't know if the patterns that > > trigger actual faults can occur at the moment. > > > > > > > > Using the `address_operand` predicate and 'p' constraint to ensure the > > address > > is a valid address would use the mode SImode of the operand rather than > > checking > > it's valid for the DImode of the emitted ldrd. > > Sure, but does it really matter? > This is a post reload pattern created by the peephole2s, so nothing that can > be matched out of the blue sky like combiner normally matches. > So, if it didn't pass the conditions in the peephole2s, the patterns > wouldn't be created. True -- as I mentioned I don't know if a problematic pattern could actually occur, so I doubt this is actually a problem. > Are there any addresses that pass arm_legitimate_address_p (DImode, x, true) > and fail address_operand (x, SImode)? From brief skimming I couldn't find > anything. > So, would you be happy if the && arm_legitimate_address_p (DImode, XEXP > (operands[n], 0), true) > condition is added to the insn conditions (after the rtx_equal_p check)? That sounds good to me. > > > There's a similar problem to the `address_operand` one above with using the > > `arm_count_output_move_double_insns` function. > > > > It's called on the original operands, which means it eventually calls > > `output_move_double` with the first two operands (which are in SImode). > > > > This function has some calls to `reg_overlap_mentioned_p`, which depends on > > the > > number of hard registers for a given registers mode. > > > > I've only found cases where the `arm_count_output_move_double_insns` > > function > > returns something other than what it should in cases that only match because > > of > > the `address_operand` problem above. > > > > This could be replaced by a wrapper that generates DImode registers > > specifically > > for checking this. > > For non-vfp or iwmmxt, the length is always 8, are there cases in the vfp > insn that the length is not 8? I believe the length *can* be 4 non-vfp, vfp, or iwmmxt (the case below produces a single ldrd when compiled with each of them). int __RTL (startwith ("peephole2")) foo_x4 (int *a) { (function "foo_x4" (insn-chain (cnote 1 NOTE_INSN_DELETED) (block 2 (edge-from entry (flags "FALLTHRU")) (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK) (cinsn 101 (set (reg:SI r2) (mem/c:SI (plus:SI (reg:SI r0) (const_int 8)) [0 S4 A64])) "/home/matmal01/test.c":18) (cinsn 102 (set (reg:SI r3) (mem/c:SI (plus:SI (reg:SI r0) (const_int 12)) [0 S4 A32])) "/home/matmal01/test.c":18) (cinsn 103 (set (reg:SI r0) (plus:SI (reg:SI r2) (reg:SI r3))) "/home/matmal01/test.c":18) (edge-to exit (flags "FALLTHRU")) ) ;; block 2 ) ;; insn-chain (crtl (return_rtx (reg/i:SI r0) ) ;; return_rtx ) ;; crtl ) ;; function "main" } Something else I've just noticed: When compiling for vfp or iwmmxt, the ldm2_ define_insn matches the simpler case below as it comes first in the md order. That means we get a ldm instruction instead of the ldrd. int __RTL (startwith ("peephole2")) foo_x5 (int *a) { (function "foo_x5" (insn-chain (cnote 1 NOTE_INSN_DELETED) (block 2 (edge-from entry (flags "FALLTHRU")) (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK) (cinsn 101 (set (reg:SI r2) (mem/c:SI (reg:SI r0) [0 S4 A64])) "/home/matmal01/test.c":18) (cinsn 102 (set (reg:SI r3) (mem/c:SI (plus:SI (reg:SI r0) (const_int 4)) [0 S4 A32])) "/home/matmal01/test.c":18) (cinsn 103 (set (reg:SI r0) (plus:SI (reg:SI r2) (reg:SI r3))) "/home/matmal01/test.c":18) (edge-to exit (flags "FALLTHRU")) ) ;; block 2 ) ;; insn-chain (crtl (return_rtx (reg/i:SI r0) ) ;; return_rtx ) ;; crtl ) ;; function "main" }
[Bug bootstrap/88714] [9 regression] bootstrap comparison failure on armv7l since r265398
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714 --- Comment #32 from Matthew Malcomson --- Created attachment 45584 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45584&action=edit Single define_insn version of above patch FWIW I've attached the patch I'd made. The only interesting differences are that I'd added only one define_insn as I don't believe the existing patterns' difference in constraints is needed and I made some RTL testcases. (I've just now added the testcase you found).
[Bug bootstrap/88714] [9 regression] bootstrap comparison failure on armv7l since r265398
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714 --- Comment #34 from Matthew Malcomson --- Yes, I needed to redo that check for an offset of 4 -- I compared the expression of the first MEM with the result of `plus_constant` with 4 on the expression of the second MEM in the condition.
[Bug bootstrap/88714] [9 regression] bootstrap comparison failure on armv7l since r265398
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714 --- Comment #37 from Matthew Malcomson --- Good point (and interesting about the HOST_WIDE_INT_MIN exception -- I didn't know that). To avoid duplication of effort would you prefer I make the change or do you want to handle it?
[Bug bootstrap/88714] [9 regression] bootstrap comparison failure on armv7l since r265398
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714 --- Comment #39 from Matthew Malcomson --- (In reply to Jakub Jelinek from comment #38) > I don't mind if you take over, I don't really have good opportunities to > test on arm anyway. Though, do you have copyright assignment on file (or > covered by ARM or Linaro or similar assignments)? OK, will do. I'm covered by the ARM assignment.
[Bug bootstrap/88714] [9 regression] bootstrap comparison failure on armv7l since r265398
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714 --- Comment #42 from Matthew Malcomson --- Author: matmal01 Date: Thu Feb 7 14:54:15 2019 New Revision: 268644 URL: https://gcc.gnu.org/viewcvs?rev=268644&root=gcc&view=rev Log: [Patch] [arm] Fix 88714, Arm LDRD/STRD peepholes. These peepholes match a pair of SImode loads or stores that can be implemented with a single LDRD or STRD instruction. When compiling for TARGET_ARM, these peepholes originally created a set pattern in DI mode to be caught by movdi patterns. This approach failed to take into account the possibility that the two matched insns operated on memory with different aliasing information. The peepholes lost the aliasing information on one of the insns, which could then cause the scheduler to make an invalid transformation. This patch changes the peepholes so they generate a PARALLEL expression of the two relevant loads or stores, which means the aliasing information of both is kept. Such a PARALLEL pattern is what the peepholes currently produce for TARGET_THUMB2. In order to match these new insn patterns, we add two new define_insn's. These define_insn's use the same checks as the peepholes to find valid insns. Note that the patterns now created by the peepholes for LDRD and STRD are very similar to those created by the peepholes for LDM and STM. Many patterns could be matched by the LDM and STM define_insns, which means we rely on the order the define_insn patterns are defined in the machine description, with those for LDRD/STRD defined before those for LDM/STM. The difference between the peepholes for LDRD/STRD and those for LDM/STM are mainly that those for LDRD/STRD have some logic to ensure that the two registers are consecutive and the first one is even. Bootstrapped and regtested on arm-none-linux-gnu. Demonstrated fix of bug 88714 by bootstrapping on armv7l. gcc/ChangeLog: 2019-02-07 Matthew Malcomson Jakub Jelinek PR bootstrap/88714 * config/arm/arm-protos.h (valid_operands_ldrd_strd, arm_count_ldrdstrd_insns): New declarations. * config/arm/arm.c (mem_ok_for_ldrd_strd): Remove broken handling of MINUS. (valid_operands_ldrd_strd): New function. (arm_count_ldrdstrd_insns): New function. * config/arm/ldrdstrd.md: Change peepholes to generate PARALLEL SImode sets instead of single DImode set and define new insns to match this. gcc/testsuite/ChangeLog: 2019-02-07 Matthew Malcomson Jakub Jelinek PR bootstrap/88714 * gcc.c-torture/execute/pr88714.c: New test. * gcc.dg/rtl/arm/ldrd-peepholes.c: New test. Added: trunk/gcc/testsuite/gcc.c-torture/execute/pr88714.c trunk/gcc/testsuite/gcc.dg/rtl/arm/ldrd-peepholes.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/arm/arm-protos.h trunk/gcc/config/arm/arm.c trunk/gcc/config/arm/ldrdstrd.md trunk/gcc/testsuite/ChangeLog
[Bug target/89324] [9 Regression] ICE in extract_constrain_insn, at recog.c:2211 on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89324 --- Comment #3 from Matthew Malcomson --- (In reply to ktkachov from comment #2) > The sub3_compare1_imm pattern was introduced for GCC 9. It's probably > something going wrong with the constraints. Matthew, could you take a look > please? On first blush it looks like the define_peephole2 generating this instruction allows the stack pointer while the 'r' constraint in the pattern doesn't accept it. A quick check of only allowing GENERAL_REGS registers in the peephole indeed stops the generation of this instruction and hence avoids the bug. I haven't yet checked whether the pattern should allow the stack pointer or not.
[Bug target/89324] [9 Regression] ICE in extract_constrain_insn, at recog.c:2211 on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89324 --- Comment #4 from Matthew Malcomson --- There were similar problems in handling the stack pointer with subs/adds instructions elsewhere in the aarch64 backend. Patch proposed & being worked on here: https://gcc.gnu.org/ml/gcc-patches/2019-02/msg01458.html
[Bug target/89324] [9 Regression] ICE in extract_constrain_insn, at recog.c:2211 on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89324 --- Comment #5 from Matthew Malcomson --- Author: matmal01 Date: Fri Feb 22 16:35:22 2019 New Revision: 269122 URL: https://gcc.gnu.org/viewcvs?rev=269122&root=gcc&view=rev Log: Handle stack pointer with SUBS/ADDS instructions. In general the stack pointer was not handled for many SUBS/ADDS patterns in aarch64.md. Both the "extended register" and "immediate" forms allow the stack pointer to be used as the source register, while no form allows the stack pointer for the destination register. The define_insn patterns generating ADDS/SUBS did not allow the stack pointer for any operand, while the define_peephole2 patterns that generated RTX to be matched by these patterns allowed the stack pointer for any operand. The patterns are fixed by adding the 'k' constraint for the first source operand to all define_insns that generate the ADDS/SUBS "extended register" and "immediate" forms (but not the "shifted register" form). In peephole optimizations, constraint strings are ignored (see "(gccint) C Constraint Interface" info node in the documentation), so the decision to act or not is based solely on the predicate and condition. This patch introduces a new predicate "aarch64_general_reg" to be used in define_peephole2 patterns where only GENERAL_REGS registers are acceptable and uses that predicate in the peepholes that generate patterns for ADDS/SUBS. Full bootstrap and regtest done on aarch64-none-linux-gnu. Regression tests done on aarch64-none-linux-gnu and aarch64-none-elf cross compiler. OK for trunk? gcc/ChangeLog: 2019-02-22 Matthew Malcomson PR target/89324 * config/aarch64/aarch64.md: Use aarch64_general_reg predicate on destination register in peepholes generating patterns for ADDS/SUBS. (add3_compare0, *addsi3_compare0_uxtw, add3_compareC, add3_compareV_imm, add3_compareV, *adds__, *subs__, *adds__shift_, *subs__shift_, *adds__multp2, *subs__multp2, *sub3_compare0, *subsi3_compare0_uxtw, sub3_compare1): Allow stack pointer for source register. * config/aarch64/predicates.md (aarch64_general_reg): New predicate. gcc/testsuite/ChangeLog: 2019-02-22 Matthew Malcomson PR target/89324 * gcc.dg/rtl/aarch64/subs_adds_sp.c: New test. * gfortran.fortran-torture/compile/pr89324.f90: New test. Added: trunk/gcc/testsuite/gcc.dg/rtl/aarch64/subs_adds_sp.c trunk/gcc/testsuite/gfortran.fortran-torture/compile/pr89324.f90 Modified: trunk/gcc/ChangeLog trunk/gcc/config/aarch64/aarch64.md trunk/gcc/config/aarch64/predicates.md trunk/gcc/testsuite/ChangeLog
[Bug target/89324] [9 Regression] ICE in extract_constrain_insn, at recog.c:2211 on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89324 Matthew Malcomson changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #6 from Matthew Malcomson --- Fixed on trunk.
[Bug ada/89493] Stack smashing on armv7hl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89493 Matthew Malcomson changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2019-03-08 CC||matmal01 at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #1 from Matthew Malcomson --- I've reproduced manually using the reproducer and a bootstrapped gcc at r268766 gcc r265397 does not reproduce the problem. Hence I'm marking this as a regression. Between these two versions bootstrap with the configure line below fails. ../gcc-source/configure --enable-bootstrap --enable-languages=c,c++,fortran,objc,obj-c++,ada,lto --prefix=${HOME}/gcc-install --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl --enable-gnu-indirect-function --disable-sjlj-exceptions --with-tune=generic-armv7-a --with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16 --with-abi=aapcs-linux --build=armv7hl-redhat-linux-gnueabi (mostly taken from the configure shown in `gcc -v` from the given package version)
[Bug target/90024] New: [7/8/9 Regression] ICE on AArch32 NEON mov with TImode constant.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90024 Bug ID: 90024 Summary: [7/8/9 Regression] ICE on AArch32 NEON mov with TImode constant. Product: gcc Version: 9.0 Status: UNCONFIRMED Keywords: ice-on-valid-code, patch Severity: normal Priority: P3 Component: target Assignee: matmal01 at gcc dot gnu.org Reporter: matmal01 at gcc dot gnu.org Target Milestone: --- Created attachment 46111 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46111&action=edit Proposed fix The below code causes an ICE for AArch32 targets with NEON at all optimisation levels except -O0. union a { char b; long long c; }; union a d; int g(int, union a, union a); void e() { union a f[2] = {-1L}; g(0, d, f[0]); } With the backtrace below. $ arm-none-eabi-gcc -march=armv8-a -c test.c -O1 -mfloat-abi=hard -mfpu=neon-fp-armv8 during RTL pass: final test.c: In function 'e': test.c:10:1: internal compiler error: in output_950, at config/arm/neon.md:89 10 | } | ^ 0x1352bfb output_950 /tmp/dgboter/bbs/rhev-vm4--rhe6x86_64/buildbot/rhe6x86_64--arm-none-eabi/build/src/gcc/gcc/config/arm/neon.md:89 0x8aafbd get_insn_template(int, rtx_insn*) /tmp/dgboter/bbs/rhev-vm4--rhe6x86_64/buildbot/rhe6x86_64--arm-none-eabi/build/src/gcc/gcc/final.c:2071 I have a patch to fix the problem, creating a bugzilla report for tracking purposes (patch added as attachment, the explanation will be added in comments).
[Bug target/90024] [7/8/9 Regression] ICE on AArch32 NEON mov with TImode constant.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90024 Matthew Malcomson changed: What|Removed |Added Target||arm Known to work||4.9.0 --- Comment #1 from Matthew Malcomson --- The "*neon_mov" patterns for 128 bit sized quantities uses the "Dn" constraint to match vmov.f32 and vmov.i patterns. This constraint boils down to using the `neon_immediate_valid` function. Once the constraint has matched, the output C statement asserts the same function passes. The output C statement calls `neon_immediate_valid` with the mode taken from the iterator, while the constraint takes the mode from the operand. In the above testcase the operand is a CONST_INT, which means the constraint passes VOIDmode (treated the same as DImode in `neon_immediate_valid`), while the C statement passes TImode (the mode of the iterator). This causes second call to `neon_immediate_valid` to fail as the value provided is only valid in DImode but not TImode, and that causes the ICE. The attached patch splits the original "Dn" constraint into three new constraints, "DN" for TImode CONST_INT, "Dn" for DImode CONST_INT, and "Dm" for CONST_VECTOR. This requires one extra alternative in the "*neon_mov" patterns, but makes it clear from the constraint what mode is being used. We use the "DN" constraint for the define_insn that matches TImode values, and hence avoid the above problem.
[Bug target/90024] [7/8/9 Regression] ICE on AArch32 NEON mov with TImode constant.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90024 --- Comment #2 from Matthew Malcomson --- Author: matmal01 Date: Tue Apr 9 11:39:59 2019 New Revision: 270226 URL: https://gcc.gnu.org/viewcvs?rev=270226&root=gcc&view=rev Log: Hi there, The "*neon_mov" patterns for 128 bit sized quantities uses the "Dn" constraint to match vmov.f32 and vmov.i patterns. This constraint boils down to using the `neon_immediate_valid` function. Once the constraint has matched, the output C statement asserts that function passes. The output C statement calls `neon_immediate_valid` with the mode taken from the iterator, while the constraint takes the mode from the operand. This can cause a discrepency when the operand is a CONST_INT, as the constraint passes VOIDmode which `neon_immediate_valid` treats as DImode, while the C statement passes the mode of the iterator which can be TImode. When this happens, the `neon_immediate_valid` can fail in the second call (if e.g. the CONST_INT is a valid immediate in DImode but not TImode) which would trigger the assertion. The testcase added with this patch triggers this when compiled with an arm cross compiler using the command line below. gcc -march=armv8-a -c neon-immediate-timode.c -O1 -mfloat-abi=hard -mfpu=neon-fp-armv8 This patch splits the original "Dn" constraint into three new constraints, "DN" for TImode CONST_INT, "Dn" for DImode CONST_INT, and "Dm" for CONST_VECTOR. Splitting things up this way requires using one extra alternative in the "*neon_mov" patterns, but makes it clear from the constraint what mode is being used. We also remove the behaviour of treating VOIDmode as DImode in `neon_valid_immediate` since the original "Dn" constraint was the only place that functionality was used. VOIDmode is now never passed to that function. An assertion has been added to the function to ensure this problem is caught earlier on. Bootstrapped on arm-none-linux-gnueabihf Regtested on cross-compiler arm-none-eabi gcc/ChangeLog: 2019-04-09 Matthew Malcomson PR target/90024 * config/arm/arm.c (neon_valid_immediate): Disallow VOIDmode parameter. * config/arm/constraints.md (Dm, DN, Dn): Split previous Dn constraint into three. * config/arm/neon.md (*neon_mov): Account for TImode and DImode differences directly. (*smax3_neon, vashl3, vashr3_imm): Use Dm constraint. gcc/testsuite/ChangeLog: 2019-04-09 Matthew Malcomson PR target/90024 * gcc.dg/torture/neon-immediate-timode.c: New test. Added: trunk/gcc/testsuite/gcc.dg/torture/neon-immediate-timode.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/arm/arm.c trunk/gcc/config/arm/constraints.md trunk/gcc/config/arm/neon.md trunk/gcc/testsuite/ChangeLog
[Bug target/90024] [7/8 Regression] ICE on AArch32 NEON mov with TImode constant.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90024 --- Comment #3 from Matthew Malcomson --- Author: matmal01 Date: Wed Apr 10 13:34:54 2019 New Revision: 270253 URL: https://gcc.gnu.org/viewcvs?rev=270253&root=gcc&view=rev Log: Backport of r270226 from mainline to gcc-7-branch The "*neon_mov" patterns for 128 bit sized quantities uses the "Dn" constraint to match vmov.f32 and vmov.i patterns. This constraint boils down to using the `neon_immediate_valid` function. Once the constraint has matched, the output C statement asserts that function passes. The output C statement calls `neon_immediate_valid` with the mode taken from the iterator, while the constraint takes the mode from the operand. This can cause a discrepency when the operand is a CONST_INT, as the constraint passes VOIDmode which `neon_immediate_valid` treats as DImode, while the C statement passes the mode of the iterator which can be TImode. When this happens, the `neon_immediate_valid` can fail in the second call (if e.g. the CONST_INT is a valid immediate in DImode but not TImode) which would trigger the assertion. The testcase added with this patch triggers this when compiled with an arm cross compiler using the command line below. gcc -march=armv8-a -c neon-immediate-timode.c -O1 -mfloat-abi=hard -mfpu=neon-fp-armv8 This patch splits the original "Dn" constraint into three new constraints, "DN" for TImode CONST_INT, "Dn" for DImode CONST_INT, and "Dm" for CONST_VECTOR. Splitting things up this way requires using one extra alternative in the "*neon_mov" patterns, but makes it clear from the constraint what mode is being used. We also remove the behaviour of treating VOIDmode as DImode in `neon_valid_immediate` since the original "Dn" constraint was the only place that functionality was used. VOIDmode is now never passed to that function. An assertion has been added to the function to ensure this problem is caught earlier on. bootstrapped and regtested on arm-none-linux-gnueabihf gcc/ChangeLog: 2019-04-10 Matthew Malcomson PR target/90024 * config/arm/arm.c (neon_valid_immediate): Disallow VOIDmode parameter. * config/arm/constraints.md (Dm, DN, Dn): Split previous Dn constraint into three. * config/arm/neon.md (*neon_mov): Account for TImode and DImode differences directly. (*smax3_neon, vashl3, vashr3_imm): Use Dm constraint. gcc/testsuite/ChangeLog: 2019-04-10 Matthew Malcomson PR target/90024 * gcc.dg/torture/neon-immediate-timode.c: New test. Added: branches/gcc-7-branch/gcc/testsuite/gcc.dg/torture/neon-immediate-timode.c Modified: branches/gcc-7-branch/gcc/ChangeLog branches/gcc-7-branch/gcc/config/arm/arm.c branches/gcc-7-branch/gcc/config/arm/constraints.md branches/gcc-7-branch/gcc/config/arm/neon.md branches/gcc-7-branch/gcc/testsuite/ChangeLog
[Bug target/90024] [7/8 Regression] ICE on AArch32 NEON mov with TImode constant.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90024 --- Comment #4 from Matthew Malcomson --- Author: matmal01 Date: Wed Apr 10 13:41:21 2019 New Revision: 270254 URL: https://gcc.gnu.org/viewcvs?rev=270254&root=gcc&view=rev Log: Backport of r270226 from mainline to gcc-8-branch The "*neon_mov" patterns for 128 bit sized quantities uses the "Dn" constraint to match vmov.f32 and vmov.i patterns. This constraint boils down to using the `neon_immediate_valid` function. Once the constraint has matched, the output C statement asserts that function passes. The output C statement calls `neon_immediate_valid` with the mode taken from the iterator, while the constraint takes the mode from the operand. This can cause a discrepency when the operand is a CONST_INT, as the constraint passes VOIDmode which `neon_immediate_valid` treats as DImode, while the C statement passes the mode of the iterator which can be TImode. When this happens, the `neon_immediate_valid` can fail in the second call (if e.g. the CONST_INT is a valid immediate in DImode but not TImode) which would trigger the assertion. The testcase added with this patch triggers this when compiled with an arm cross compiler using the command line below. gcc -march=armv8-a -c neon-immediate-timode.c -O1 -mfloat-abi=hard -mfpu=neon-fp-armv8 This patch splits the original "Dn" constraint into three new constraints, "DN" for TImode CONST_INT, "Dn" for DImode CONST_INT, and "Dm" for CONST_VECTOR. Splitting things up this way requires using one extra alternative in the "*neon_mov" patterns, but makes it clear from the constraint what mode is being used. We also remove the behaviour of treating VOIDmode as DImode in `neon_valid_immediate` since the original "Dn" constraint was the only place that functionality was used. VOIDmode is now never passed to that function. An assertion has been added to the function to ensure this problem is caught earlier on. bootstrapped and regtested on arm-none-linux-gnueabihf gcc/ChangeLog: 2019-04-10 Matthew Malcomson PR target/90024 * config/arm/arm.c (neon_valid_immediate): Disallow VOIDmode parameter. * config/arm/constraints.md (Dm, DN, Dn): Split previous Dn constraint into three. * config/arm/neon.md (*neon_mov): Account for TImode and DImode differences directly. (*smax3_neon, vashl3, vashr3_imm): Use Dm constraint. gcc/testsuite/ChangeLog: 2019-04-10 Matthew Malcomson PR target/90024 * gcc.dg/torture/neon-immediate-timode.c: New test. Added: branches/gcc-8-branch/gcc/testsuite/gcc.dg/torture/neon-immediate-timode.c Modified: branches/gcc-8-branch/gcc/ChangeLog branches/gcc-8-branch/gcc/config/arm/arm.c branches/gcc-8-branch/gcc/config/arm/constraints.md branches/gcc-8-branch/gcc/config/arm/neon.md branches/gcc-8-branch/gcc/testsuite/ChangeLog
[Bug target/90024] [7/8 Regression] ICE on AArch32 NEON mov with TImode constant.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90024 Matthew Malcomson changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #5 from Matthew Malcomson --- Fixed
[Bug target/90024] [7/8 Regression] ICE on AArch32 NEON mov with TImode constant.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90024 Matthew Malcomson changed: What|Removed |Added Target Milestone|7.5 |7.6
[Bug sanitizer/90414] New: [Feature] Implementing HWASAN (and eventually MTE)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90414 Bug ID: 90414 Summary: [Feature] Implementing HWASAN (and eventually MTE) Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: sanitizer Assignee: unassigned at gcc dot gnu.org Reporter: matmal01 at gcc dot gnu.org CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org, jakub at gcc dot gnu.org, kcc at gcc dot gnu.org, marxin at gcc dot gnu.org, ramana at gcc dot gnu.org, rearnsha at gcc dot gnu.org Target Milestone: --- Hello, I'm looking into how we can implement MTE in the compiler. A productive first step could be implementing HWASAN for GCC, which does a software implementation of MTE using the top-byte-ignore feature. This has already been implemented in LLVM and the design can be found at the link below. https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html Hopefully we can make this change in such a way that will enable the use of MTE in the future. I don't know the best approach here, and would appreciate any feedback. >From inspection it looks like most of the work is already handled by ASAN -- especially in finding all those places that need to be instrumented -- so I was looking into what modifications would need to be made from that starting point. I believe that tagging stack allocated memory can be done in a similar way to ASAN by expanding the equivalent of ASAN_MARK in a relevant manner. However, checking memory accesses seems to need a different approach to the current ASAN one with ASAN_CHECK. For both HWASAN and MTE we need to find the tag that a given memory access should be done through. In order to produce the best machine-code we would need to associate each stack variable with a tag internally. In the LLVM implementation this is done by generating a random tag for the current stack, and associating each stack variable with an increment from this tag. Also, for MTE the access itself needs to be made with a tagged pointer, which means the current method of adding instructions before a memory access can't be used and instead we need to modify the memory access itself. I have some very basic questions that I would appreciate any help in answering. 1) Where should such passes be put? I would guess that putting HWASAN and/or MTE passes in the same position as the ASAN passes and updating the SANOPT pass to handle any changes would be ok, but I don't have a good understanding of why they are in their current position. 2) Can we always find the base object that's being referenced from the gimple statement where memory is accessed or a pointer is created? If not, when is it problematic? Finding the base object is pretty fundamental to getting the tag for a pointer. It seems like this should be possible based on a reading of the documentation and looking at the TREE_CODEs that the current ASAN `instrument_derefs` function works on. (ARRAY_REF -> first operand is the array MEM_REF -> first operand is the base COMPONENT_REF -> first operand is the object INDIRECT_REF -> first operand is the pointer which should reference object VAR_DECL -> this is the object BIT_FIELD_REF -> first operand is the object) 3) Would there be any obvious difficulties with a transformation of the form: _4 = big_arrayD.3771[num_3(D)] TO _6 = &big_arrayD.3771[num_3(D)]; _7 = HWASAN_CHECK(6, _6, 4, 4); _4 = *_7; Instead of _4 = big_arrayD.3771[num_3(D)] TO _6 = &big_arrayD.3771[num_3(D)]; ASAN_CHECK(6, _6, 4, 4); _4 = big_arrayD.3771[num_3(D)] which is what ASAN currently does. This new form would enable using MTE by allowing the check to modify the pointer that the access will be made with (so it can have have its tag). 4) Builtin memory calls look like they could be handled with HWASAN in basically the same way as ASAN, while for MTE they should be fine once the pointers the calls are provided are tagged. Is there anything stopping that approach? Thanks, MM
[Bug sanitizer/90414] [Feature] Implementing HWASAN (and eventually MTE)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90414 --- Comment #2 from Matthew Malcomson --- (In reply to Richard Biener from comment #1) > (In reply to Matthew Malcomson from comment #0) > > Hello, > > > > I'm looking into how we can implement MTE in the compiler. > > What is MTE? It's an architecture extension, otherwise known as memory tagging or memtag. There's a high-level explanation in the link below, https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-a-profile-architecture-2018-developments-armv85a the instructions are introduced in Armv8.5-a. https://developer.arm.com/docs/ddi0596/latest/base-instructions-alphabetic-order I can't find a document describing *just* the MTE extension right now, but the gist is that memory regions can have associated tags, and accesses to those memory regions must be done with a pointer that has the corresponding tag in bits 56-59 inclusive. This is why the MTE extension will need to modify how the access is performed instead of just adding in a check before the access is done -- it needs to ensure that the pointer has the correct tag associated with the base object that it's trying to access. It's for the MTE extension that we would need the transformation below (so that the *_CHECK ifn can ensure the pointer has the relevant tag before access). > > ... > > 3) Would there be any obvious difficulties with a transformation of the > > form: > > _4 = big_arrayD.3771[num_3(D)] > > > > TO > > > > _6 = &big_arrayD.3771[num_3(D)]; > > _7 = HWASAN_CHECK(6, _6, 4, 4); > > _4 = *_7; > > > >Instead of > > _4 = big_arrayD.3771[num_3(D)] > > > > TO > > > > _6 = &big_arrayD.3771[num_3(D)]; > > ASAN_CHECK(6, _6, 4, 4); > > _4 = big_arrayD.3771[num_3(D)] > > > >which is what ASAN currently does. > >This new form would enable using MTE by allowing the check to modify the > >pointer that the access will be made with (so it can have have its tag). > > The "obvious" difficulties is that HWASAN_CHECK expansion needs to handle > expanding the actual memory reference. But that's only a slight > complication. > > Other complication is of course that it may pessimize optimization more > than the old approach.
[Bug sanitizer/90414] [Feature] Implementing HWASAN (and eventually MTE)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90414 --- Comment #4 from Matthew Malcomson --- (In reply to Martin Liška from comment #3) > (In reply to Matthew Malcomson from comment #0) > > 2) Can we always find the base object that's being referenced from the > > gimple > >statement where memory is accessed or a pointer is created? > >If not, when is it problematic? > >Finding the base object is pretty fundamental to getting the tag for a > >pointer. > >It seems like this should be possible based on a reading of the > > documentation > >and looking at the TREE_CODEs that the current ASAN `instrument_derefs` > >function works on. > > > >(ARRAY_REF -> first operand is the array > > MEM_REF -> first operand is the base > > COMPONENT_REF -> first operand is the object > > INDIRECT_REF -> first operand is the pointer which should reference > > object > > VAR_DECL -> this is the object > > BIT_FIELD_REF -> first operand is the object) > > There would be cases where a base is known and for these you could probably > instrument checks with a constant known tag. For other situation, you'll > probably > need to extract the tag from the pointer. Right? > Yes, I'll need to extract the tag from the pointer in cases that don't match one of these patterns. That actually leads into something I forgot to mention when I wrote the comment above -- I'll need to instrument ADDR_EXPR statements to make sure any pointers in the program will already have their tag assigned. To do that I think I need to add another instrumentation site for when the address of something is taken to handle for any statements taking the address of something. This may be by adding another if statement in `transform_statements` to make this transformation before the one instrumenting the actual access, or it may be in a separate iteration before the one inserting the current checks since statements like the below would need to be split to instrument the ADDR_EXPR and MEM_REF expressions seperately. MEM[(int *)&stack_object] = direction_8(D); > > > > Thanks, > > MM > > In general, I'm interested in implementation of the feature, but I'll > probably not > find a time to do it. However, I can help you with that. Great! I'll appreciate any help and/or advice you can give.
[Bug target/90588] [AArch64] SVE2 flag patch omits aarch64-protos.h
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90588 Matthew Malcomson changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2019-05-23 CC||matmal01 at gcc dot gnu.org Assignee|unassigned at gcc dot gnu.org |matmal01 at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #1 from Matthew Malcomson --- Thanks -- I guess I missed that because I bootstrapped on aarch64 instead of building a cross compiler on an arch where unsigned long and uint64_t are different. I'll go and fix it & test it better this time ;-)
[Bug target/90588] [AArch64] SVE2 flag patch omits aarch64-protos.h
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90588 --- Comment #2 from Matthew Malcomson --- Author: matmal01 Date: Fri May 24 10:39:38 2019 New Revision: 271599 URL: https://gcc.gnu.org/viewcvs?rev=271599&root=gcc&view=rev Log: [aarch64] Change two function declaration types Commit r271514 missed changing the type of two functions in aarch64-protos.h. The function definitions had been updated to use uint64_t while the function declarations had been missed. They were missed since I only tested the patch on aarch64 where `unsigned long` is the same as `uint64_t`. This patch updates these declarations in aarch64-protos.h. Tested by building an aarch64 cross-compiler on arm-none-linux-gnu (so that `unsigned long` and `uint64_t` are different and would give error messages), and bootstrapping on aarch64-none-linux-gnu. Also manually tested command line options to see that -march=armv8-a+typo prints out the expected flags while using the new feature flags does not complain about missing flags. gcc/ChangeLog: 2019-05-24 Matthew Malcomson PR target/90588 * common/config/aarch64/aarch64-common.c (aarch64_rewrite_selected_cpu): Change local temporary variable type from unsigned long to uint64_t. * config/aarch64/aarch64-protos.h (aarch64_parse_extension, aarch64_get_extension_string_for_isa_flags): Change declaration to match new definition by replacing unsigned long with uint64_t. Modified: trunk/gcc/ChangeLog trunk/gcc/common/config/aarch64/aarch64-common.c trunk/gcc/config/aarch64/aarch64-protos.h
[Bug target/90588] [AArch64] SVE2 flag patch omits aarch64-protos.h
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90588 Matthew Malcomson changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #3 from Matthew Malcomson --- fixed on trunk
[Bug testsuite/88021] New: aarch64 Busy hang running testcase pr60183.c since revision 265914
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88021 Bug ID: 88021 Summary: aarch64 Busy hang running testcase pr60183.c since revision 265914 Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: testsuite Assignee: unassigned at gcc dot gnu.org Reporter: matmal01 at gcc dot gnu.org Target Milestone: --- Since revision 265914, the testcase pr60183.c has been FAILing on aarch64-none-linux-gnu regression tests with a timeout. Some initial debugging has shown this is a busy hang in lambda_matrix_right_hermite. The inner "while (S[i][j] != 0)" loop never gets out of i == 1, j == 0. (v) hw-a20-6:~ [11:10:29] % gcc-install/bin/native-gcc /home/matmal01/gcc-source/gcc/testsuite/gcc.dg/torture/pr60183.c -fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers -fdiagnostics-color=never-O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions-lm -o ./pr60183.exe -wrapper gdb,-q,--args Reading symbols from /home/matmal01/gcc-install/libexec/gcc/aarch64-unknown-linux-gnu/9.0.0/cc1...done. (gdb) run Starting program: /home/matmal01/gcc-install/libexec/gcc/aarch64-unknown-linux-gnu/9.0.0/cc1 -quiet -imultiarch aarch64-linux-gnu /home/matmal01/gcc-source/gcc/testsuite/gcc.dg/torture/pr60183.c -quiet -dumpbase pr60183.c -mlittle-endian -mabi=lp64 -auxbase pr60183 -O3 -fdiagnostics-color=never -fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions -o /tmp/ccy2hzG0.s ^C Program received signal SIGINT, Interrupt. 0x013e54f8 in lambda_matrix_right_hermite (n=, U=, S=, m=, A=) at ../../gcc-source/gcc/tree-data-ref.c:3500 3500 a = S[i-1][j]; (gdb) print i $1 = 1 (gdb) print j $2 = 0 (gdb) cont Continuing. Wait for a while^M^C Program received signal SIGINT, Interrupt. 0x013e54f8 in lambda_matrix_right_hermite (n=, U=, S=, m=, A=) at ../../gcc-source/gcc/tree-data-ref.c:3500 3500 a = S[i-1][j]; (gdb) print i $3 = 1 (gdb) print j $4 = 0 (gdb) next 3502 sigma = (a * b < 0) ? -1: 1;
[Bug testsuite/88021] aarch64 Busy hang running testcase pr60183.c since revision 265914
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88021 --- Comment #2 from Matthew Malcomson --- Hi Richard, Applying that on top of r265914 does fix the problem. Thanks for the quick reply!
[Bug sanitizer/97696] New: ICE since ASAN_MARK does not handle poly_int sized varibales
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97696 Bug ID: 97696 Summary: ICE since ASAN_MARK does not handle poly_int sized varibales Product: gcc Version: 11.0 Status: UNCONFIRMED Keywords: ice-checking Severity: normal Priority: P3 Component: sanitizer Assignee: unassigned at gcc dot gnu.org Reporter: matmal01 at gcc dot gnu.org CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org, jakub at gcc dot gnu.org, kcc at gcc dot gnu.org, marxin at gcc dot gnu.org Target Milestone: --- Target: aarch64 asan_expand_mark_ifn asserts that the length to check is a SHWI. (i.e. it uses `gcc_assert (tree_fits_shwi_p (len))` ). It attempts to ensure this by avoiding VLA's in `gimplify_decl_expr`. poly_int sized decls were added, and they were not treated as VLA's since commit 22b62991 (SVN r275870). Since then, poly_int sized variables can have ASAN_MARK called on them, which means the `len` parameter of ASAN_MARK can be a poly_int causing an ICE in asan_expand_mark_ifn (n.b. in order to emit an ASAN_CHECK on a poly_int sized variable so that the ASAN_MARK is not removed in the sanopt pass we need to pass the poly_int sized variable to a builtin memory function). An example (modified from gcc/testsuite/c-c++-common/asan/pr80308.c): (v3) work-lin:gcc [Tue 12:25:10] % cat ~/asan-ice.c #include __attribute__((noinline, noclone)) int foo (char *a) { int i, j = 0; asm volatile ("" : "+r" (a) : : "memory"); for (i = 0; i < 12; i++) j += a[i]; return j; } int main () { int i, j = 0; for (i = 0; i < 4; i++) { char a[12]; __SVInt8_t freq; __builtin_bcmp (&freq, a, 10); __builtin_memset (a, 0, sizeof (a)); j += foo (a); } return j; } (v3) work-lin:gcc [Tue 12:31:53] % /installdir/aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-gcc -march=armv8.6-a+sve -fsanitize=address -fsanitize-address-use-after-scope ~/asan-ice.c -S -o /dev/null during GIMPLE pass: sanopt /home/matmal01/asan-ice.c: In function ‘main’: /home/matmal01/asan-ice.c:14:1: internal compiler error: in asan_expand_mark_ifn, at asan.c:3235 14 | main () | ^~~~ 0xdde454 asan_expand_mark_ifn(gimple_stmt_iterator*) /builddir/src/gcc/gcc/asan.c:3235 0xdf6b7a execute /builddir/src/gcc/gcc/sanopt.c:1341 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See <https://gcc.gnu.org/bugs/> for instructions.
[Bug sanitizer/97696] ICE since ASAN_MARK does not handle poly_int sized varibales
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97696 --- Comment #1 from Matthew Malcomson --- I guess this may also happen for the emission of ASAN_MARK in `gimple_target_expr`, but haven't yet been able to trigger that.
[Bug sanitizer/97941] [HWASAN] use After free not working as per expectation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97941 --- Comment #1 from Matthew Malcomson --- Hi Akhilesh, No that's certainly not a known issue -- thanks for reporting it! I'm having trouble reproducing your issue, do you mind giving a little more information on your command line and the machine you're running on etc? One point that seems worth looking into is that the line numbers on your backtrace don't seem to match up with the line numbers in my source tree. (e.g. GetAccessInfo is given line number 383 of hwasan_linux.cpp, while in my source tree that function spans lines 328-376). Have you made any modifications to the source? Or maybe you're running a different libsanitizer version? For reference I'm using libsanitizer from LLVM hash 6e7dd1e3e1170080b76b5dcc5716bdd974343233, and the sha256sum of hwasan_linux.cpp in my source tree is 3986e9f4e519409e7c73a7b97722125300afc4dc1f44a3f966fedf679329fd0a. Based on what line number `HwasanOnSIGTRAP` calls `GetAccessInfo` in my source tree, and assuming the offset between our line numbers are the same for the GetAccessInfo line in your stack trace, it seems that the SEGV happens when dereferencing the address that caused the signal. That value should be the address of the `brk` instruction in __hwasan_load1 (having been inlined from `SigTrap` in hwasan_checks.h) which caught the bad access, but the value of 0x30 which caused this SEGV is clearly not that value. If the offset between our line numbers is a bit different, then getting that address might make a bit more sense. There are various struct member accesses via pointers that `GetAccessInfo` recieves. However, those arguments are just taken from the siginfo_t and ucontext_t pointers that the kernel provides on receipt of a deadly signal. I haven't found any access in that function which look like they would have an offset of 0x30 from a NULL pointer, although I guess different kernel versions would have different offsets. What kernel are you running on? Is there any chance the signal handler HwasanOnDeadlySignal is getting a NULL pointer as one of its arguments? For reference I happen to be running on a linux kernel based off of commit 585e5b17b9 (but with some modifications that should not affect anything -- just config changes so I can build the kernel itself with -fsanitize=hwaddress). Just for reference -- what I see when compiling your testcase: ubuntu@ubuntu:~/working-directory/temp/pr97941$ ../../gcc-hwasan-install/bin/gcc -fsanitize=hwaddress ./test.c -o test ./test.c: In function ‘main’: ./test.c:2:20: warning: implicit declaration of function ‘malloc’ [-Wimplicit-function-declaration] 2 | char *x = (char*)malloc(10 * sizeof(char*)); |^~ ./test.c:1:1: note: include ‘’ or provide a declaration of ‘malloc’ +++ |+#include 1 | int main() { ./test.c:2:20: warning: incompatible implicit declaration of built-in function ‘malloc’ [-Wbuiltin-declaration-mismatch] 2 | char *x = (char*)malloc(10 * sizeof(char*)); |^~ ./test.c:2:20: note: include ‘’ or provide a declaration of ‘malloc’ ./test.c:3:3: warning: implicit declaration of function ‘free’ [-Wimplicit-function-declaration] 3 | free(x); | ^~~~ ./test.c:3:3: note: include ‘’ or provide a declaration of ‘free’ ./test.c:3:3: warning: incompatible implicit declaration of built-in function ‘free’ [-Wbuiltin-declaration-mismatch] ./test.c:3:3: note: include ‘’ or provide a declaration of ‘free’ ubuntu@ubuntu:~/working-directory/temp/pr97941$ LD_LIBRARY_PATH=~/working-directory/gcc-hwasan-install/lib64 ./test ==8600==ERROR: HWAddressSanitizer: tag-mismatch on address 0xefe00005 at pc 0xa828be70 READ of size 1 at 0xefe00005 tags: e2/d5 (ptr/mem) in thread T0 #0 0xa828be6c in SigTrap<0> ../../../../gcc-source/libsanitizer/hwasan/hwasan_checks.h:27 #1 0xa828be6c in CheckAddress<(__hwasan::ErrorAction)0, (__hwasan::AccessType)0, 0> ../../../../gcc-source/libsanitizer/hwasan/hwasan_checks.h:88 #2 0xa828be6c in __hwasan_load1 ../../../../gcc-source/libsanitizer/hwasan/hwasan.cpp:375 #3 0x400944 in main (/home/ubuntu/working-directory/temp/pr97941/test+0x400944) #4 0xa81598dc in __libc_start_main (/lib/aarch64-linux-gnu/libc.so.6+0x1f8dc) [0xefe0,0xefe00060) is a small unallocated heap chunk; size: 96 offset: 5 0xefe00005 is located 5 bytes inside of 80-byte region [0xefe0,0xefe00050) freed by thread T0 here: #0 0xa828d64c in __sanitizer_free ../../../../gcc-source/libsanitizer/hwasan/hwasan_interceptors.cpp:108 #1 0x400934 in main (/home/ubuntu/working-directory/temp/pr97941/test+0x400934) #2 0xa81598dc in __libc_start_main (/lib/aarch64-linux-gnu/libc.so.6+0x1f8dc) #3 0x400814 (/home/ubuntu/working-directory/temp/pr97941/test+0x400814) previously allocated here: #0 0xa828db30 in __sanitizer_malloc ../../../../gcc-source/libsanitizer/hwasan/hwasan
[Bug sanitizer/97941] [HWASAN] use After free not working as per expectation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97941 Matthew Malcomson changed: What|Removed |Added Resolution|--- |WORKSFORME Status|NEW |RESOLVED --- Comment #2 from Matthew Malcomson --- Resolving since this works for me and haven't any extra information to believe that's a coincidence.
[Bug sanitizer/100665] [hwsanitizer] nested funtion pointer is tagged but never checked.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100665 --- Comment #1 from Matthew Malcomson --- Hi there. I believe this is how it should work (if I'm understanding & remembering correctly). When creating a nested function, we make a single object on the stack that includes all variables used in the nested function plus a trampoline. This is called the "nonlocal frame struct" as described in gcc/tree-nested.c. That single object gets a single tag like all other objects in tagged memory (trying to separate the closed-over objects from the trampoline and argument pointers would be pretty awkward when the object is just one struct as far as the expand code is concerned). That tag is checked when accessing the closed over variables (i.e. big_array in the example), so we definitely want to tag the object. Given that, the question of whether the function pointer (i.e. the pointer to the trampoline inside that object) should be tagged when passed elsewhere then has a few benefits: 1) In this case there is no check performed, but there may be checks performed if e.g. this function pointer gets cast to an integer pointer and some code elsewhere attempts to read that integer. 2) This is just more self-consistent. Every pointer to a tagged object is tagged with the same value. 3) There are hardware extensions to automatically check memory accesses. If the function pointer is not tagged in this case then (at least for AArch64) the PC-relative ldr's in the trampoline stored in that structure will end up without a tag and I believe that would trigger a fault. Point (1) is the main one. In general when passing a pointer into another function we don't know if it's going to be accessed or not, so we always need to pass tagged pointers.
[Bug sanitizer/100665] [hwsanitizer] nested funtion pointer is tagged but never checked.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100665 Matthew Malcomson changed: What|Removed |Added Resolution|--- |INVALID Status|UNCONFIRMED |RESOLVED --- Comment #3 from Matthew Malcomson --- (In reply to Hongtao.liu from comment #2) > (In reply to Matthew Malcomson from comment #1) > > Given that, the question of whether the function pointer (i.e. the pointer > > to > > the trampoline inside that object) should be tagged when passed elsewhere > > then > > has a few benefits: > > 1) In this case there is no check performed, but there may be checks > > performed > >if e.g. this function pointer gets cast to an integer pointer and some > > code > >elsewhere attempts to read that integer. > I'm not sure there're cases where code pointers are casted to integer > pointers. But consider the above comment, I agree that tag is needed for the > object. Fair ;-). My reasoning was along the lines of "it's an escaped pointer, and I don't know what other code will do with it" than actually expecting that to happen.
[Bug sanitizer/101744] [12 regression] hwasan new failures since r12-2424
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101744 --- Comment #7 from Matthew Malcomson --- Hi there, I didn't check all the new tests that Christophe mentioned, but all those I checked had `dg-require-effective-target hwaddress_exec` in them. The test that determines that effective target should only pass with a modern enough kernel (one that supports passing tagged pointers to its syscalls). It is still failing on my native AArch64 machine. For anyone that is seeing them -- what kernel version are you running? If your kernel has not changed could you manually run the check and see if it passes and why? I've unfortunately lost my testing environment. I'm working on getting it back but will be a while before I can see if I can reproduce the failures on a machine with the required kernel.
[Bug target/114905] New: aarch64 locally_streaming function ICE in dwarf2cfi due to mismatched CFA instructions in prologue/epilogue
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114905 Bug ID: 114905 Summary: aarch64 locally_streaming function ICE in dwarf2cfi due to mismatched CFA instructions in prologue/epilogue Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: matmal01 at gcc dot gnu.org Target Milestone: --- Bug observed (testcase + ICE) is below. I believe this happens because we use `aarch64_add_sp` to adjust the stack pointer when `maybe_ne (sve_callee_saves, 0)` in `aarch64_expand_epilogue`. This marks the adjustment as adjusting the CFA. However in `aarch64_expand_prologue` we might have set the CFA to the frame pointer (instead of the stack pointer) if `frame_pointer_needed && frame_size.is_constant()`. Hence when both these conditions are held we have a CFA adjust note that affects a different register to the current CFA register. vshcmd: > cat streaming-prologues.c [[arm::locally_streaming,arm::streaming_compatible]] void no_gprs_saved_very_streaming (__SVBool_t x) { asm (""); } gnu-work [13:47:36] $ vshcmd: > ${install_dir}/aarch64-none-linux-gnu-gcc \ vshcmd: > streaming-prologues.c \ vshcmd: > -fdiagnostics-plain-output -O -fomit-frame-pointer -fstack-clash-protection\ vshcmd: > -march=armv9-a+sme -mtune=generic -moverride=tune=none \ vshcmd: > -fdump-rtl-all-all \ vshcmd: > -S -o locally_streaming_1_scp.s gnu-work [13:47:38] $ > > > > > during RTL pass: dwarf2 dump file: locally_streaming_1_scp.c.356r.dwarf2 streaming-prologues.c: In function ‘no_gprs_saved_very_streaming’: streaming-prologues.c:5:1: internal compiler error: in dwarf2out_frame_debug_adjust_cfa, at dwarf2cfi.cc:1339 0xa540bd dwarf2out_frame_debug_adjust_cfa /workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/dwarf2cfi.cc:1339 0xa540bd dwarf2out_frame_debug /workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/dwarf2cfi.cc:2277 0xa540bd scan_insn_after /workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/dwarf2cfi.cc:2726 0xa557e0 scan_trace /workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/dwarf2cfi.cc:2893 0xa562cf create_cfi_notes /workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/dwarf2cfi.cc:2938 0xa562cf execute_dwarf2_frame /workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/dwarf2cfi.cc:3309 0xa562cf execute /workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/dwarf2cfi.cc:3797 Please submit a full bug report, with preprocessed source (by using -freport-bug). Please include the complete backtrace with any bug report. See <https://gcc.gnu.org/bugs/> for instructions. gnu-work [13:47:39] $
[Bug target/114906] New: aarch64 locally_streaming ICE in aarch64_expand_prologue
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114906 Bug ID: 114906 Summary: aarch64 locally_streaming ICE in aarch64_expand_prologue Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: matmal01 at gcc dot gnu.org Target Milestone: --- Bug (testcase + ICE) below. I believe this is because: 1) We save `r20` below `VG_REGNUM` in `aarch64_layout_frame` (and above the point that `bytes_below_hard_fp` describes). 2) Despite that save of `r20` causing us to also set `frame.wb_push_candidate1`, because we have a poly-int sized frame (due to the -O0 in this case, but I don't think has to be -O0) we still end up in the "General case" in `aarch64_layout_frame`. 3) Hence we end up with `initial_adjust` non zero, `sve_callee_adjust` non-zero, and the `VG_REGNUM` not pointing to the same place as `bytes_below_hard_fp` because there is that r20 saved in between. My initial guess would be that we should simply change the assertion that failed to check that VG_REGNUM is *greater than or equal to* `bytes_below_sp`. To be honest I'm not entirely sure what this assertion is there for so would not like to actually make that suggestion. The commit message of ad4df8cd080c seems to say the assertion is there to ensure that the allocation of VG_REGNUM is not folded into the initial_allocation, but I don't 100% follow what's going on. vshcmd: > cat ../streaming-prologues.c [[arm::locally_streaming]] void with_callee_saved_regs (__SVBool_t x) { asm ("" : : : "r20"); } testing [14:47:20] $ vshcmd: > ${install_dir}/aarch64-none-linux-gnu-gcc \ vshcmd: > ../streaming-prologues.c \ vshcmd: > -fdiagnostics-plain-output -O0 -fstack-clash-protection \ vshcmd: > -march=armv9-a+sme -mtune=generic -moverride=tune=none \ vshcmd: > -S -o prologues-with-streaming-1.s > > > > during RTL pass: late_pro_and_epilogue > > > > > > > > > > > > > > > > ../streaming-prologues.c: In function ‘with_callee_saved_regs’: ../streaming-prologues.c:5:1: internal compiler error: in aarch64_expand_prologue, at config/aarch64/aarch64.cc:9705 0x142e3af aarch64_expand_prologue() /workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/config/aarch64/aarch64.cc:9701 0x1a7eee7 gen_prologue() /workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/config/aarch64/aarch64.md:1008 0x140219f target_gen_prologue /workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/config/aarch64/aarch64.md:8121 0xb8d242 make_prologue_seq /workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/function.cc:5818 0xb8d3aa thread_prologue_and_epilogue_insns() /workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/function.cc:6053 0xb8dc4e rest_of_handle_thread_prologue_and_epilogue /workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/function.cc:6567 0xb8dcbf execute /workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/function.cc:6692 Please submit a full bug report, with preprocessed source (by using -freport-bug). Please include the complete backtrace with any bug report. See <https://gcc.gnu.org/bugs/> for instructions. testing [14:47:22] $
[Bug target/115043] New: aarch64 locally_streaming function appears to have CFA note on wrong instruction in prologue
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115043 Bug ID: 115043 Summary: aarch64 locally_streaming function appears to have CFA note on wrong instruction in prologue Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: matmal01 at gcc dot gnu.org Target Milestone: --- Apologies if I'm misunderstanding something here -- but I noticed this RTL sequence and I believe the `REG_CFA_DEF_CFA` note is on the wrong insn. I have not observed wrong behaviour coming from this, but figured still worth a bug report in case it is indeed wrong. There seem to be a pair of instructions, one doing some special SME operation and another storing the stack pointer into x11. The instruction doing the special SME thing has a note saying that it sets the CFA to x11. I would have expected the note to be on the insn after that records SP into x11. vshcmd: > cat basic-streaming.c [[arm::locally_streaming]] void no_gprs_saved (__SVBool_t x) { asm (""); } gnu-work [13:19:27] $ vshcmd: > ${install_dir}/aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-gcc \ vshcmd: > basic-streaming.c \ vshcmd: > -fdiagnostics-plain-output -march=armv8.2-a+sme+sve -fno-stack-protector \ vshcmd: > -fdump-rtl-all-all \ vshcmd: > -O -fshrink-wrap -fstack-clash-protection -g -S -o /dev/null > > > > gnu-work [13:19:36] $ > > > > > > > > > > > > > > > > vshcmd: > # I'm surprised that the REG_CFA_DEF_CFA note is on the instruction vshcmd: > # just before we move the stack pointer into x11. vshcmd: > grep -C 4 REG_CFA_DEF_CFA.*x11 basic-streaming.c.*.late_pro_and_epilogue (insn/f 15 14 16 2 (set (reg:DI 13 x13) (const:DI (unspec:DI [ (const_int 288 [0x120]) ] UNSPEC_SME_VQ))) "basic-streaming.c":3:1 -1 (expr_list:REG_CFA_DEF_CFA (reg:DI 11 x11) (nil))) (insn 16 15 17 2 (set (reg:DI 11 x11) (reg/f:DI 31 sp)) "basic-streaming.c":3:1 -1 (nil)) gnu-work [13:21:21] $
[Bug tree-optimization/116776] Complex if conditions not hoisted from loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116776 --- Comment #1 from Matthew Malcomson --- N.b. from experimentation it seems that gcc 11 didn't move any part of the condition outside of the loop, and since gcc 12 part of the condition has been moved outside the loop. I don't think this hoisting has ever happened.
[Bug tree-optimization/116776] New: Complex if conditions not hoisted from loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116776 Bug ID: 116776 Summary: Complex if conditions not hoisted from loop Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: matmal01 at gcc dot gnu.org Target Milestone: --- The condition in the following loop does not get hoisted at `-O3` on GCC trunk. Simplifying the condition (by either removing some of the `shouldthischange` checks, or simplifying the `shouldthischange` function) allows hoisting. N.b. Some of the condition gets hoisted, just not all. N.b. having the condition inside the loop blocks vectorisation when compiled with `-march=armv8.6-a+sve+sve2`. ``` struct teststruct { unsigned long dims[2]; double *data; bool ** allocated; }; bool shouldthischange(struct teststruct *v, int b, int l) { return // true || v->dims[1] > l && v->allocated[b][l] ; } void DoLoop(struct teststruct *x, struct teststruct *y, struct teststruct *z, unsigned long len) { for (unsigned long i = 0; i < len; i++) if (shouldthischange(x, 0, 0) && shouldthischange(y, 0, 0) && shouldthischange(z, 0, 0)) { z->data[i] = x->data[i] + y->data[i]; } } ```
[Bug target/117991] [15 regression] RISC-V: g++/template/builtin-speculation-overloads[14].C assertion error since addition in r15-6042-g9ed094a817e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117991 Matthew Malcomson changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |matmal01 at gcc dot gnu.org --- Comment #3 from Matthew Malcomson --- Created attachment 60477 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60477&action=edit Proposed patch -- update testsuite
[Bug target/117991] [15 regression] RISC-V: g++/template/builtin-speculation-overloads[14].C assertion error since addition in r15-6042-g9ed094a817e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117991 --- Comment #2 from Matthew Malcomson --- (In reply to Jeffrey A. Law from comment #1) > Still occurring on the trunk. In my case I saw them in a native build & > test scenario. Ah -- apologies I missed when this was raised -- will look into this next week.
[Bug middle-end/119108] New: [15 Regression] AArch64 Commit 'vect: Force alignment peeling ...' (https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=68326d5d1a593d) causes regression in Snappy workload for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119108 Bug ID: 119108 Summary: [15 Regression] AArch64 Commit 'vect: Force alignment peeling ...' (https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=68326d5 d1a593d) causes regression in Snappy workload for -mcpu=neoverse-v2. Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: matmal01 at gcc dot gnu.org Target Milestone: --- Created attachment 60650 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60650&action=edit Script to reproduce the observed slowdown. Have observed a slowdown after the referenced commit. Attaching script for reproduction. Results when `master` is commit 78380fd7f inlined below (numbers are percentage change in time from "TOT with problem commit reverted" to "TOT" -- positive numbers demonstrating the peeling has caused a slowdown). I ran the script with: ``` vshcmd: > cd $HOME vshcmd: > rm -rf $HOME/testing-reproduction-script vshcmd: > newdir $HOME/testing-reproduction-script vshcmd: > git clone $HOME/gcc-source gcc_src vshcmd: > parentdir=$HOME/testing-reproduction-script $HOME/Snappy/reproduce.sh ``` ``` BM_UFlat/3/1 5.3 BM_UFlat/3/2 7.14286 BM_UFlat/4/2 2.59319 BM_UFlat/5/1 2.86533 BM_UFlat/5/2 5.10708 BM_UValidate/3/2 2.08333 BM_UValidate/5/2 2.41758 BM_UIOVecSource/1/2 4.21903 BM_UIOVecSource/5/2 5.44218 BM_UIOVecSource/6/2 4.21348 BM_UIOVecSource/7/1 -3.6036 BM_UIOVecSource/7/2 6.84039 BM_UIOVecSource/8/2 3.86905 BM_UIOVecSource/9/2 2.90987 BM_UIOVecSource/11/2 5 BM_UIOVecSink/0 21.3523 BM_UFlatSink/3/1 9.58904 BM_UFlatSink/3/2 10.1449 BM_UFlatSink/5/1 3.17919 BM_ZFlat/1/2 4.54959 BM_ZFlat/5/1 2.73973 BM_ZFlat/5/2 6.31579 BM_ZFlat/6/1 -2.35294 BM_ZFlat/6/2 3.9548 BM_ZFlat/7/1 -3.15315 BM_ZFlat/7/2 5.51948 BM_ZFlat/8/2 3.99202 BM_ZFlat/9/2 3.25145 BM_ZFlat/11/2 5.83942 BM_ZFlatAll/2 3.57955 ```
[Bug target/119108] [15 Regression] AArch64 Commit 'vect: Force alignment peeling ...' (r15-6807-g68326d5d1a593d) causes regression in Snappy workload for -mcpu=neoverse-v2.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119108 --- Comment #9 from Matthew Malcomson --- (In reply to Tamar Christina from comment #8) > Ok, so having looked at this I'm not sure the compiler is at fault here. > > Similar to the SVN case the snappy code is misaligning the loads > intentionally and loading 64-bits at a time from the 8-bit pointer: ... > So I think this is a case where the compiler can't do anything. (I also > think that the C code uses UB similar to SVN, they misalign the byte array > to 4-bytes but load 8-bytes at a time. They get lucky that the vector code > is never entered). ... > > The could would be beneficial if they: > > 1. added restrict to the functions, as eg in `FindMatchLengthPlain` values > manually vectorized anyway so aliasing must not be a problem > 2. they have a simple scalar loop variant that's left up to the vectorizer > to vectorize. This would actually give them faster code and allow e.g. SVE > codegen. Thanks for looking into it Tamar! Few questions (some just because I want to make sure I understand -- some more on topic ;-) Just to understand: - What SVN case are you referencing? - How is this UB? The UNALIGNED_LOAD64 seems to use `memcpy`, and they provide a relevant limit on the reads of 8 bytes at a time. More relevant to the issue: - I tried by adding `__restrict__` to `s1` and `s2` in `FindMatchLengthPlain` and replacing the function with a plain loop. I saw a significant slowdown. Is your point that this would allow the compiler to do something about the code even though it may not be better right now? Or did you mean inline the loop or something. (N.b. didn't double-check the codegen of that function -- just ran the benchmark naively again -- so if there was any obvious adjustment in flags or the like I should make I didn't make it ;-)
[Bug target/119108] [15 Regression] AArch64 Commit 'vect: Force alignment peeling ...' (r15-6807-g68326d5d1a593d) causes regression in Snappy workload for -mcpu=neoverse-v2.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119108 --- Comment #3 from Matthew Malcomson --- I only looked into VecSource/5/2, and unfortunately I looked into it on an internal setup that compiles slightly differently. In that slightly different compilation I noticed that `FindMatchLengthPlain` was affected by the patch, and perf pointed to extra branch mispredictions on the changed code. This was particularly noticeable in that different compilation since `FindMatchLengthPlain` was not inlined. Am currently looking to reproduce that finding with upstream sources so it's more useful than hearsay.
[Bug target/119108] [15 Regression] AArch64 Commit 'vect: Force alignment peeling ...' (r15-6807-g68326d5d1a593d) causes regression in Snappy workload for -mcpu=neoverse-v2.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119108 --- Comment #7 from Matthew Malcomson --- FWIW I have managed to figure out what the difference between my internal build and the upstream one was -- my reproduction script has the line `-DCMAKE_BUILD_TYPE=Release` in it and the local build that I did some performance analysis on does not. >From looking at the build logs it seems the only real difference due to this difference in flags is that `-DNDEBUG` is passed to the compiler. So things still got optimised -- though obviously this is not the best for a benchmark run. However it does seem somewhat useful that without the abvoe `cmake` argument I can see the (now not inlined) `FindMatchLengthPlain` function change and start to take up a much greater proportion in the perf statistics with 68326d5d.
[Bug libgomp/119588] New: Possible improvement in locking strategies for libgomp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119588 Bug ID: 119588 Summary: Possible improvement in locking strategies for libgomp Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: libgomp Assignee: unassigned at gcc dot gnu.org Reporter: matmal01 at gcc dot gnu.org CC: jakub at gcc dot gnu.org Target Milestone: --- Created attachment 60960 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60960&action=edit Demonstrating locking differences Hello, Summary is: I'm proposing that we implement the "hypercube-embedded tree" locking strategy that LLVM libomp uses by default in libgomp. Would appreciate feedback on whether this would be welcome and/or feasible. Below contains the observations I've made to come to that suggestion. Apologies for taking my time between asking on IRC and raising the PR. -- We've seen on some internal workloads (NVPL BLAS running GEMM routine on a small matrix) that the overhead of a `#pragma omp parallel` statement when running with a high number of cores (72 or 144) is much higher with the libgomp implementation than with LLVM's libomp. In a program which has both some work that can be handled with high parallelism (so OMP is running with many threads) and a large number of small pieces of work that need to be performed with low overhead, this has been seen to cause a significant overhead when accumulated. I'm attaching a benchmark for just the creation of a `#pragma omp parallel` region (around an `asm` statement so the region doesn't get optimised away). We can see that with many threads libgomp scales worse than llvm's libomp. When compiled with the below: #+begin_example vshcmd: > ${gcc_install_path}/bin/g++ -O3 -fopenmp OpenMP-reproducer.cpp -o bench.gcc.x vshcmd: > ${clang_install_path}/bin/clang++ -O3 -fopenmp OpenMP-reproducer.cpp -o bench.clang.x lego-c2-qs-56:openmp-parallel-gomp-slow [04:02:41] $ lego-c2-qs-56:openmp-parallel-gomp-slow [04:02:44] $ #+end_example Numbers I've observed are such showing that at 144 threads the cost of just the barrier is much higher with GNU than with LLVM (N.b. this is on an AArch64 machine with 144 cores): #+begin_example vshcmd: > bench_gcc () { vshcmd: > LD_LIBRARY_PATH=${gcc_install_path}/lib64 ./bench.gcc.x vshcmd: > } vshcmd: > bench_clang () { vshcmd: > LD_LIBRARY_PATH=${clang_install_path}/lib ./bench.clang.x vshcmd: > } vshcmd: > three_times () { vshcmd: > for i in 1 2 3; do vshcmd: > $1 vshcmd: > done vshcmd: > } vshcmd: > high_thread_counts () { vshcmd: > for num_threads in 72 144; do vshcmd: > export OMP_NUM_THREADS=$num_threads vshcmd: > echo " NUM = $num_threads" vshcmd: > OMP_PROC_BIND=true OMP_WAIT_POLICY=active three_times $1 vshcmd: > done vshcmd: > } > > lego-c2-qs-56:openmp-parallel-gomp-slow [04:37:02] $ > > lego-c2-qs-56:openmp-parallel-gomp-slow [04:37:02] $ > > > > lego-c2-qs-56:openmp-parallel-gomp-slow [04:37:02] $ > > > > > > lego-c2-qs-56:openmp-parallel-gomp-slow [04:37:02] $ vshcmd: > # Without any specification of locking mechanisms, clang approx thrice performance of GCC. vshcmd: > high_thread_counts bench_gcc NUM = 72 creation maxthr:72 nthr:72 min_time:10.694 us max_time:11.181 us avg_time:10.839 us stddev:23.127 us creation maxthr:72 nthr:72 min_time:10.214 us max_time:10.567 us avg_time:10.335 us stddev:11.986 us creation maxthr:72 nthr:72 min_time:10.147 us max_time:10.615 us avg_time:10.357 us stddev:19.212 us NUM = 144 creation maxthr:144 nthr:144 min_time:31.421 us max_time:32.003 us avg_time:31.735 us stddev:31.332 us creation maxthr:144 nthr:144 min_time:30.592 us max_time:31.953 us avg_time:31.352 us stddev:132.466 us creation maxthr:144 nthr:144 min_time:31.089 us max_time:31.953 us avg_time:31.640 us stddev:60.002 us lego-c2-qs-56:openmp-parallel-gomp-slow [04:37:05] $ vshcmd: > high_thread_counts bench_clang NUM = 72 creation maxthr:72 nthr:72 min_time:8.574 us max_time:9.006 us avg_time:8.877 us stddev:17.170 us creation maxthr:72 nthr:72 min_time:8.601 us max_time:8.749 us avg_time:8.686 us stddev:3.635 us creation maxthr:72 nthr:72 min_time:8.206 us max_time:8.471 us avg_time:8.421 us stddev:6.070 us NUM = 144 creation maxthr:144 nthr:144 min_time:9.958 us max_time:11.293 us avg_time:10.388 us stddev:133.078 us creation maxthr:144 nthr:144 min_time:9.685 us max_time:10.618 us avg_time:10.232 us stddev:83.710 us creation maxthr:144 nthr:144 min_time:9.132 us max_time:9.783 us avg_time:9.434 us stddev:42.769 us lego-c2-qs-56:openmp-parall