from:"matmal01 at gcc dot gnu.org"

[Bug target/94341] New: mve_mov can produce ICE on latest trunk

2020-03-26 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94341

Bug ID: 94341
   Summary: mve_mov can produce ICE on latest trunk
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: sripar01 at gcc dot gnu.org
  Reporter: matmal01 at gcc dot gnu.org
  Target Milestone: ---

The following code ICE's on fsf-trunk.

#include "arm_mve.h"

uint8x16_t test()
{
  uint8x16_t accum = (uint8x16_t)(uint32x4_t){0, 0, 0, 2};
  uint8x16_t accum2 = (uint8x16_t)(uint32x4_t){0, 0, 0, 1};
  accum += __arm_vddupq_m_n_u8 (accum2, 0, 4, (mve_pred16_t)1);
  return accum;
}


It ICE's because the first (define_insn "*mve_mov" ...)  pattern has the
following clause for it's 4th alternative:

case 4:
  if ((TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (mode))
  || (MEM_P (operands[1])
  && (GET_CODE (XEXP (operands[1], 0)) == LABEL_REF)))
return output_move_neon (operands);
  else
return "vldrb.8 %q0, %E1";



For the test above, we get an RTL pattern of

(insn 5 7 6 2 (set (reg:V16QI 28 s12 [116])
(mem:V16QI (const:SI (plus:SI (label_ref 28)
(const_int 16 [0x10]))) [0  S16 A64]))
"cde-mve-tests.c":6:12 2990 {*mve_movv16qi}
 (expr_list:REG_EQUIV (const_vector:V16QI [
(const_int 0 [0]) repeated x16
])
(nil)))


This matches the *mve_movv16qi pattern on the fourth alternative.
the clause above does not match for going into `output_move_neon` (since the
operand is not a mem of a label_ref, it's the mem of an offset to a label_ref).

Hence the compiler tries to emit "vldrb.8 %q0, %E1", but since the 'E' syntax
is for registers it does not match the RTL pattern for the operand.
This results in an ICE.


test:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
mov r3, #1  @ movhi
movsr2, #0
vldr.64 d6, .L3
vldr.64 d7, .L3+8
during RTL pass: final
dump file: cde-mve-tests.c.314r.final
/home/matmal01/Documents/gnu-work/cde-intrinsics/cde-mve-tests.c: In function
'test':
/home/matmal01/Documents/gnu-work/cde-intrinsics/cde-mve-tests.c:9:1: internal
compiler error: in arm_print_operand, at config/arm/arm.c:23953
0x113c138 arm_print_operand
   
/tmp/dgboter/bbs/rhev-vm10--rhe6x86_64/buildbot/rhe6x86_64--arm-none-eabi/build/src/gcc/gcc/config/arm/arm.c:23953
0x9332c5 output_operand(rtx_def*, int)
   
/tmp/dgboter/bbs/rhev-vm10--rhe6x86_64/buildbot/rhe6x86_64--arm-none-eabi/build/src/gcc/gcc/final.c:4051
0x933b63 output_asm_insn(char const*, rtx_def**)
   
/tmp/dgboter/bbs/rhev-vm10--rhe6x86_64/buildbot/rhe6x86_64--arm-none-eabi/build/src/gcc/gcc/final.c:3944
0x9366b6 final_scan_insn_1
   
/tmp/dgboter/bbs/rhev-vm10--rhe6x86_64/buildbot/rhe6x86_64--arm-none-eabi/build/src/gcc/gcc/final.c:3106
0x9369a7 final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
   
/tmp/dgboter/bbs/rhev-vm10--rhe6x86_64/buildbot/rhe6x86_64--arm-none-eabi/build/src/gcc/gcc/final.c:3152
0x9376ac final_1
   
/tmp/dgboter/bbs/rhev-vm10--rhe6x86_64/buildbot/rhe6x86_64--arm-none-eabi/build/src/gcc/gcc/final.c:2020
0x9378f6 rest_of_handle_final
   
/tmp/dgboter/bbs/rhev-vm10--rhe6x86_64/buildbot/rhe6x86_64--arm-none-eabi/build/src/gcc/gcc/final.c:4658
0x9378f6 execute
   
/tmp/dgboter/bbs/rhev-vm10--rhe6x86_64/buildbot/rhe6x86_64--arm-none-eabi/build/src/gcc/gcc/final.c:4736
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
vldrb.8 q0, gcc [11:11:19] $

[Bug target/94341] mve_mov can produce ICE on latest trunk

2020-03-26 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94341

Matthew Malcomson  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2020-03-26
 Status|UNCONFIRMED |ASSIGNED

--- Comment #1 from Matthew Malcomson  ---
Have already discussed bug privately and work is being done on it.

Just posting this so I have a public page for reference in public discussions.

[Bug target/94383] [8/9/10 Regression] class with empty base passed incorrectly with -std=c++17 on aarch64

2020-04-16 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94383

--- Comment #9 from Matthew Malcomson  ---
(In reply to Jakub Jelinek from comment #8)
> I'd like to ping this, it would be nice to at least decide if this should be
> handled for GCC10 or postponed to GCC11 only.

Hi Jakub -- I'm taking a look at this at the moment, so I'm hoping I can get it
done for GCC10.

So far I've only double checked what the AAPCS64 says and confirmed that we're
producing the correct code for gnu++14 and not gnu++17 (the base class is empty
and a language type that occupies zero bytes has no mapping on the ABI level,
hence that `pair` structure is an HFA and should be passed in the vector
registers).

I'm just about to start looking at changes & testing.

[Bug target/94711] [8/9/10 Regression] class with empty base passed incorrectly with -std=c++17 on arm

2020-04-28 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94711

--- Comment #3 from Matthew Malcomson  ---
This has been fixed by.
https://gcc.gnu.org/pipermail/gcc-patches/2020-April/544625.html

In the email linked above I mentioned another problem using `-mabi=apcs-gnu`.
Since that ABI is obsolete (Kyrylo pointed that out to me) I don't think that
problem should hold up GCC10.

[Bug target/94711] [8/9/10 Regression] class with empty base passed incorrectly with -std=c++17 on arm

2020-04-28 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94711

Matthew Malcomson  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #4 from Matthew Malcomson  ---
Resolving since the issue with the obsolete ABI is different to this ticket.

[Bug target/95816] New: Aarch64 jumps between Hot/Cold sections use possibly clobbered registers x16/x17

2020-06-22 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95816

Bug ID: 95816
   Summary: Aarch64 jumps between Hot/Cold sections use possibly
clobbered registers x16/x17
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: matmal01 at gcc dot gnu.org
  Target Milestone: ---
Target: AArch64

When splitting a function into two different sections (hot/cold).

The assembler produces a relocation on jumps between the two sections.

The linker is permitted to use a veneer to implement such a relocated jump.

The registers x16 and x17 are reserved for use in those veneers.

Hence the registers x16 and x17 should be treated as clobbered when jumping
between the hot/cold sections in a function.

This is not done.

We can use the testcase below to demonstrate this (modified from predict-22.c
in the testsuite).

-
$ aarch64-none-linux-gnu-gcc \  
>predict-22.c \
>-O2 -w -fPIC -freorder-blocks-and-partition \
>-c -o predict-22.o

-
volatile int v;
void bar (void) __attribute__((leaf, cold));
void baz (int *);
void alt (long long);

void
foo (int x, int y, int z)
{
  static int f __attribute__((section ("mysection")));
  register long long k asm ("x16");
  __asm__ ("mov\t%0, 10" : "=r" (k) : "0" (k));
  f = 1;
  if (__builtin_expect (x, 0))
  if (__builtin_expect (y, 0))
  if (__builtin_expect (z, 0))
{
  f = 2;
  k *= 13;
  bar ();
  v += 1;
  v *= 2;
  v /= 2;
  v -= 1;
  v += 1;
  v *= 2;
  v /= 2;
  v -= 1;
  v += 1;
  v *= 2;
  v /= 2;
  v -= 1;
  v += k;
  v *= 2;
  v /= 2;
  v -= 1;
  v += 1;
  v *= 2;
  v /= 2;
  v -= 1;
  v += 1;
  v *= 2;
  v /= 2;
  v -= 1;
  v += 1;
  v *= 2;
  v /= 2;
  v -= 1;
  v += 1;
  v *= 2;
  v /= 2;
  v -= 1;
  f = 3;
  __builtin_abort ();
}
  f = k;
  baz (&f);
}



This produces an object file which is dumped below.
The dump below demonstrates that there is a R_AARCH64_JUMP26 relocation on the
jump between the hot/cold sections, and that the value stored in x16 is used
after that jump.



$  aarch64-none-linux-gnu-objdump -dr predict-22.o  
predict-22.o: file format elf64-littleaarch64


Disassembly of section .text:

 :
   0:   713fcmp w1, #0x0
   4:   7a401844ccmpw2, #0x0, #0x4, ne  // ne = any
   8:   7a401804ccmpw0, #0x0, #0x4, ne  // ne = any
   c:   d2800150mov x16, #0xa   // #10
  10:   54a1b.ne24   // b.any
  14:   9001adrpx1, 0 
14: R_AARCH64_ADR_PREL_PG_HI21  .bss
  18:   9120add x0, x1, #0x0
18: R_AARCH64_ADD_ABS_LO12_NC   .bss
  1c:   b930str w16, [x1]
1c: R_AARCH64_LDST32_ABS_LO12_NC.bss
  20:   1400b   0 
20: R_AARCH64_JUMP26baz
  24:   a9bd7bfdstp x29, x30, [sp, #-48]!
  28:   910003fdmov x29, sp
  2c:   a90153f3stp x19, x20, [sp, #16]
  30:   f90013f5str x21, [sp, #32]
  34:   1400b   0# Here is the
relocation.
34: R_AARCH64_JUMP26.text.unlikely

Disassembly of section .text.unlikely:

 :
   0:   9015adrpx21, 0 
0: R_AARCH64_ADR_PREL_PG_HI21   .bss
   4:   52800053mov w19, #0x2   // #2
   8:   aa1003f4mov x20, x16# Here we try
and use the clobbered x16 register.
   c:   b90002b3str w19, [x21]
c: R_AARCH64_LDST32_ABS_LO12_NC .bss
  10:   9400bl  0 
10: R_AARCH64_CALL26bar
  14:   9000adrpx0, 4 
14: R_AARCH64_ADR_GOT_PAGE  v
  18:   d28001a3mov x3, #0xd// #13
  1c:   52800062mov w2, #0x3// #3
  20:   f940ldr x0, [x0]
20: R_AARCH64_LD64_GOT_LO12_NC  v

[Bug fortran/96381] New: gfc_find_vtab can use a character type typespec as a derived type (causing invalid access)

2020-07-29 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96381

Bug ID: 96381
   Summary: gfc_find_vtab can use a character type typespec as a
derived type (causing invalid access)
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: matmal01 at gcc dot gnu.org
  Target Milestone: ---

When running the testsuite with a compiler sanitized with -fsanitize=hwaddress
(HWASAN sanitizer which is not yet committed upstream) I saw the error below.

The error comes from the testsuite running `pr93337.f90`.

It is complaining that `gfc_find_derived_vtab` is attempting to access an 8
byte chunk of data 88 bytes after a region that is only 48 bytes long.
That seems to be coming from the access of `derived->attr.pdt_template` (which
is a one-bit field in the byte 92 bytes into a `gfc_symbol` structure).

According to the dump the 48 byte long structure is allocated in
`gfc_new_charlen`.
This function only ever sets the `cl` alternative of the union in a
`gfc_typespec`.
I've inlined a GDB session demonstrating the mis-use under the HWASAN dump.





==25394==ERROR: HWAddressSanitizer: tag-mismatch on address 0xefdf79d8 at
pc 0x006a8560
READ of size 8 at 0xefdf79d8 tags: 58/ff (ptr/mem) in thread T0
#0 0x6a855c in SigTrap<3>
../../../../gcc-source/libsanitizer/hwasan/hwasan_checks.h:27
#1 0x6a855c in CheckAddress<(__hwasan::ErrorAction)0,
(__hwasan::AccessType)0, 3>
../../../../gcc-source/libsanitizer/hwasan/hwasan_checks.h:88
#2 0x6a855c in __hwasan_load8
../../../../gcc-source/libsanitizer/hwasan/hwasan.cpp:454
#3 0x6ff654 in gfc_find_derived_vtab(gfc_symbol*)
../../gcc-source/gcc/fortran/class.c:2269
#4 0x707498 in gfc_find_vtab(gfc_typespec*)
../../gcc-source/gcc/fortran/class.c:2908
#5 0x707498 in gfc_find_vtab(gfc_typespec*)
../../gcc-source/gcc/fortran/class.c:2898
#6 0x7a7578 in gfc_match_assignment()
../../gcc-source/gcc/fortran/match.c:1393
#7 0x80d53c in match_word ../../gcc-source/gcc/fortran/parse.c:65
#8 0x80d53c in decode_statement ../../gcc-source/gcc/fortran/parse.c:361
#9 0x812f28 in next_free ../../gcc-source/gcc/fortran/parse.c:1280
#10 0x812f28 in next_statement ../../gcc-source/gcc/fortran/parse.c:1512
#11 0x816190 in parse_spec ../../gcc-source/gcc/fortran/parse.c:3923
#12 0x819948 in parse_progunit ../../gcc-source/gcc/fortran/parse.c:5853
#13 0x81c02c in gfc_parse_file() ../../gcc-source/gcc/fortran/parse.c:6394
#14 0x898d98 in gfc_be_parse_file
../../gcc-source/gcc/fortran/f95-lang.c:212
#15 0x152ac7c in compile_file ../../gcc-source/gcc/toplev.c:458
#16 0x69d114 in do_compile ../../gcc-source/gcc/toplev.c:2320
#17 0x69d114 in toplev::main(int, char**)
../../gcc-source/gcc/toplev.c:2459
#18 0x6a0218 in main ../../gcc-source/gcc/main.c:39
#19 0xa03c38dc in __libc_start_main
(/lib/aarch64-linux-gnu/libc.so.6+0x1f8dc)

[0xefdf79c0,0xefdf7a00) is a small allocated heap chunk; size: 64
offset: 24
0xefdf79d8 is located 40 bytes to the right of 48-byte region
[0xefdf7980,0xefdf79b0)
allocated here:
#0 0x6a9d40 in __sanitizer_calloc
../../../../gcc-source/libsanitizer/hwasan/hwasan_interceptors.cpp:138
#1 0x2d1ebbc in xcalloc ../../gcc-source/libiberty/xmalloc.c:162
#2 0x8831a0 in gfc_new_charlen(gfc_namespace*, gfc_charlen*)
../../gcc-source/gcc/fortran/symbol.c:3964
#3 0x7146ec in gfc_match_char_spec(gfc_typespec*)
../../gcc-source/gcc/fortran/decl.c:3478
#4 0x71e324 in gfc_match_decl_type_spec(gfc_typespec*, int)
../../gcc-source/gcc/fortran/decl.c:4169
#5 0x7220d8 in gfc_match_data_decl()
../../gcc-source/gcc/fortran/decl.c:6129
#6 0x80d6b0 in match_word ../../gcc-source/gcc/fortran/parse.c:65
#7 0x80d6b0 in decode_statement ../../gcc-source/gcc/fortran/parse.c:376
#8 0x812f28 in next_free ../../gcc-source/gcc/fortran/parse.c:1280
#9 0x812f28 in next_statement ../../gcc-source/gcc/fortran/parse.c:1512
#10 0x816600 in parse_derived ../../gcc-source/gcc/fortran/parse.c:3343
#11 0x816600 in parse_spec ../../gcc-source/gcc/fortran/parse.c:3884
#12 0x819948 in parse_progunit ../../gcc-source/gcc/fortran/parse.c:5853
#13 0x81c02c in gfc_parse_file() ../../gcc-source/gcc/fortran/parse.c:6394
#14 0x898d98 in gfc_be_parse_file
../../gcc-source/gcc/fortran/f95-lang.c:212
#15 0x152ac7c in compile_file ../../gcc-source/gcc/toplev.c:458
#16 0x69d114 in do_compile ../../gcc-source/gcc/toplev.c:2320
#17 0x69d114 in toplev::main(int, char**)
../../gcc-source/gcc/toplev.c:2459
#18 0x6a0218 in main ../../gcc-source/gcc/main.c:39
#19 0xa03c38dc in __libc_start_main
(/lib/aarch64-linux-gnu/libc.so.6+0x1f8dc)
#20 0x6a3e5c 
(/home/ubuntu/working-directory/gcc-hwasan-install/libexec/gcc/aarch64-unkn

[Bug middle-end/92410] New: Invalid access to df->insns[] in regstat_bb_compute_calls_crossed (caught by hwasan)

2019-11-07 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92410

Bug ID: 92410
   Summary: Invalid access to df->insns[] in
regstat_bb_compute_calls_crossed (caught by hwasan)
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: matmal01 at gcc dot gnu.org
  Target Milestone: ---
  Host: aarch64-none-linux-gnu
Target: aarch64-none-linux-gnu
 Build: aarch64-none-linux-gnu

On trying to rebase the hwasan patches to a newer GCC version, I've found that
hwasan catches a bug newly introduced between commits r275833 and r277678.


The stack trace of the access (as given by hwasan) is:

#0 0x6293ac in SigTrap<3>
../../../../gcc/libsanitizer/hwasan/hwasan_checks.h:27
#1 0x6293ac in CheckAddress<(__hwasan::ErrorAction)0,
(__hwasan::AccessType)0, 3>
../../../../gcc/libsanitizer/hwasan/hwasan_checks.h:88
#2 0x6293ac in __hwasan_load8
../../../../gcc/libsanitizer/hwasan/hwasan.cpp:478
#3 0x10cff94 in regstat_bb_compute_calls_crossed
../../gcc/gcc/regstat.c:327
#4 0x10cff94 in regstat_compute_calls_crossed() ../../gcc/gcc/regstat.c:379
#5 0x21c0858 in sched_init() ../../gcc/gcc/haifa-sched.c:7337
#6 0x21d3994 in haifa_sched_init() ../../gcc/gcc/haifa-sched.c:7354
#7 0x11376c8 in schedule_insns() ../../gcc/gcc/sched-rgn.c:3514
#8 0x1138584 in rest_of_handle_sched2 ../../gcc/gcc/sched-rgn.c:3746
#9 0x1138584 in execute ../../gcc/gcc/sched-rgn.c:3882
#10 0x102911c in execute_one_pass(opt_pass*) ../../gcc/gcc/passes.c:2494
#11 0x1029c58 in execute_pass_list_1 ../../gcc/gcc/passes.c:2580
#12 0x1029c74 in execute_pass_list_1 ../../gcc/gcc/passes.c:2581
#13 0x1029c74 in execute_pass_list_1 ../../gcc/gcc/passes.c:2581
#14 0x1029cd0 in execute_pass_list(function*, opt_pass*)
../../gcc/gcc/passes.c:2591
#15 0x955aa4 in cgraph_node::expand() ../../gcc/gcc/cgraphunit.c:2196
#16 0x956f38 in expand_all_functions ../../gcc/gcc/cgraphunit.c:2334
#17 0x956f38 in symbol_table::compile() ../../gcc/gcc/cgraphunit.c:2684
#18 0x95ac58 in symbol_table::compile() ../../gcc/gcc/cgraphunit.c:2597
#19 0x95ac58 in symbol_table::finalize_compilation_unit()
../../gcc/gcc/cgraphunit.c:2864
#20 0x1209bf8 in compile_file ../../gcc/gcc/toplev.c:481
#21 0x61e9d4 in do_compile ../../gcc/gcc/toplev.c:2199
#22 0x61e9d4 in toplev::main(int, char**) ../../gcc/gcc/toplev.c:2334
#23 0x621758 in main ../../gcc/gcc/main.c:39
#24 0x8b46889c in __libc_start_main
(/lib/aarch64-linux-gnu/libc.so.6+0x1f89c)


While the stack trace of the allocation aronud this address (as given by
hwasan) is:

[0xefe68400,0xefe68500) is a small allocated heap chunk; size: 256
offset: 240
0xefe684f0 is located 0 bytes to the right of 240-byte region
[0xefe68400,0xefe684f0)
allocated here:
#0 0x62ae0c in __sanitizer_malloc
../../../../gcc/libsanitizer/hwasan/hwasan_interceptors.cpp:169
#1 0x24b3df8 in xrealloc ../../gcc/libiberty/xmalloc.c:177
#2 0x99c434 in df_grow_insn_info() ../../gcc/gcc/df-scan.c:545
#3 0x99e928 in df_scan_alloc(bitmap_head*) ../../gcc/gcc/df-scan.c:262
#4 0xe34ee0 in do_reload ../../gcc/gcc/ira.c:5574
#5 0xe34ee0 in execute ../../gcc/gcc/ira.c:5697
#6 0x102911c in execute_one_pass(opt_pass*) ../../gcc/gcc/passes.c:2494
#7 0x1029c58 in execute_pass_list_1 ../../gcc/gcc/passes.c:2580
#8 0x1029c74 in execute_pass_list_1 ../../gcc/gcc/passes.c:2581
#9 0x1029cd0 in execute_pass_list(function*, opt_pass*)
../../gcc/gcc/passes.c:2591
#10 0x955aa4 in cgraph_node::expand() ../../gcc/gcc/cgraphunit.c:2196
#11 0x956f38 in expand_all_functions ../../gcc/gcc/cgraphunit.c:2334
#12 0x956f38 in symbol_table::compile() ../../gcc/gcc/cgraphunit.c:2684
#13 0x95ac58 in symbol_table::compile() ../../gcc/gcc/cgraphunit.c:2597
#14 0x95ac58 in symbol_table::finalize_compilation_unit()
../../gcc/gcc/cgraphunit.c:2864
#15 0x1209bf8 in compile_file ../../gcc/gcc/toplev.c:481
#16 0x61e9d4 in do_compile ../../gcc/gcc/toplev.c:2199
#17 0x61e9d4 in toplev::main(int, char**) ../../gcc/gcc/toplev.c:2334
#18 0x621758 in main ../../gcc/gcc/main.c:39
#19 0x8b46889c in __libc_start_main
(/lib/aarch64-linux-gnu/libc.so.6+0x1f89c)
#20 0x624e34 
(/home/ubuntu/working-directory/gcc-objdir-hwasan/gcc/cc1+0x624e34)



I've verified the bug by compiling on r277678 and viewing the relevent command
in gdb.
After running a full bootstrap (convert filenames as relevent):

/home/ubuntu/working-directory/gcc-objdir/./gcc/xgcc
-B/home/ubuntu/working-directory/gcc-objdir/./gcc/
-B/home/ubuntu/working-directory/gcc-install/aarch64-unknown-linux-gnu/bin/
-B/home/ubuntu/working-directory/gcc-install/aarch64-unknown-linux-gnu/lib/
-isystem
/home

[Bug middle-end/92410] Invalid access to df->insns[] in regstat_bb_compute_calls_crossed (caught by hwasan)

2019-11-08 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92410

--- Comment #5 from Matthew Malcomson  ---
I've had a little look into it, and the below seems promising:

Based on a comment in haifa-sched.c, notes are removed before scheduling and
added back in.
Since the insn that is larger than the df buffer is a note, and I saw in gdb
that it's added during `reemit_notes`, I figure the root problem might be that
the notes are removed, then the df->insns array is calculated, then notes are
added back in.

I hence tested the below patch, and the testcase that Martin found no longer
crashes.

I have not yet looked into whether `df_recompute_luids` is the correct function
to call or if there's a better approach.  Just sharing an update.


diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
index 41cf1f3..564a358 100644
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@@ -6231,6 +6231,7 @@ commit_schedule (rtx_insn *prev_head, rtx_insn *tail,
basic_block *target_bb)
reemit_notes (insn);
   last_scheduled_insn = insn;
 }
+  df_recompute_luids(*target_bb);

   scheduled_insns.truncate (0);
 }
diff --git a/gcc/regstat.c b/gcc/regstat.c
index 4da9b7c..c6cefb11 100644
--- a/gcc/regstat.c
+++ b/gcc/regstat.c
@@ -324,6 +324,7 @@ regstat_bb_compute_calls_crossed (unsigned int bb_index,
bitmap live)

   FOR_BB_INSNS_REVERSE (bb, insn)
 {
+  gcc_assert (INSN_UID (insn) < DF_INSN_SIZE ());
   struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
   unsigned int regno;

diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c
index 8a1d414..5d8 100644
--- a/gcc/sel-sched-ir.c
+++ b/gcc/sel-sched-ir.c
@@ -4673,6 +4673,7 @@ sel_restore_notes (void)
if (NONDEBUG_INSN_P (insn))
  reemit_notes (insn);

+ df_recompute_luids (first);
   first = first->next_bb;
}
   while (first != last);

[Bug middle-end/92410] Invalid access to df->insns[] in regstat_bb_compute_calls_crossed (caught by hwasan)

2019-11-08 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92410

--- Comment #6 from Matthew Malcomson  ---
I believe the problem is that `remove_notes` followed by `reemit_notes` can
generate these notes with a different UID.

When `reemit_notes` adds the new note, the dataflow information is not updated
automatically because `add_insn_before` only updates the information for
INSN_P(insn).

Hence the later lookup of this dataflow information is problematic.

I'm not sure whether there's any pre-existing "should not use dataflow queries
on notes" rule.  If there is, then the regstat_bb_compute_calls_crossed
function should be modified to check for NONDEBUG_INSN_P and continue earlier
on its loop.

If there isn't such a rule then I guess the best approach would be to ensure we
call `df_insn_create_insn_record` whenever calling `emit_note_before` or
`emit_note_after` once the dataflow information has been created.
(assuming that notes don't need the information to be populated since
`df_insn_rescan` seems to ignore notes).

I've tried both moving the check for NONDEBUG_INSN_P in
`regstat_bb_compute_calls_crossed` and adding a call to
`df_insn_create_insn_record` into `reemit_notes` on a cross-compiler and both
pass the testcase Martin found.

[Bug middle-end/92410] Invalid access to df->insns[] in regstat_bb_compute_calls_crossed (caught by hwasan)

2019-11-08 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92410

--- Comment #7 from Matthew Malcomson  ---
(In reply to Matthew Malcomson from comment #6)
> I'm not sure whether there's any pre-existing "should not use dataflow
> queries on notes" rule.  If there is, then the
> regstat_bb_compute_calls_crossed function should be modified to check for
> NONDEBUG_INSN_P and continue earlier on its loop.

I now see that `df_bb_refs_record` generates insn info for notes (but leaves it
mostly zeroed out).  I figure doing the same for the notes emitted by
`reemit_notes` seems reasonable.


Currently bootstrapping and regtesting (both with HWASAN and without) the
following patch.


diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
index 41cf1f3..2e1a84f 100644
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@@ -5433,6 +5433,7 @@ reemit_notes (rtx_insn *insn)

  last = emit_note_before (note_type, last);
  remove_note (insn, note);
+ df_insn_create_insn_record (last);
}
 }
 }
diff --git a/gcc/regstat.c b/gcc/regstat.c
index 4da9b7c..c6cefb11 100644
--- a/gcc/regstat.c
+++ b/gcc/regstat.c
@@ -324,6 +324,7 @@ regstat_bb_compute_calls_crossed (unsigned int bb_index,
bitmap live)

   FOR_BB_INSNS_REVERSE (bb, insn)
 {
+  gcc_assert (INSN_UID (insn) < DF_INSN_SIZE ());
   struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
   unsigned int regno;

[Bug middle-end/92410] Invalid access to df->insns[] in regstat_bb_compute_calls_crossed (caught by hwasan)

2019-12-09 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92410

--- Comment #8 from Matthew Malcomson  ---
Author: matmal01
Date: Mon Dec  9 12:03:53 2019
New Revision: 279124

URL: https://gcc.gnu.org/viewcvs?rev=279124&root=gcc&view=rev
Log:
[mid-end] Add notes to dataflow insn info when re-emitting (PR92410)

In scheduling passes, notes are removed with `remove_notes` before the
scheduling is done, and added back in with `reemit_notes` once the
scheduling has been decided.

This process leaves the notes in the RTL chain with different insn uid's
than were there before.  Having different UID's (larger than the
previous ones) means that DF_INSN_INFO_GET(insn) will access outside of
the allocated array.

This has been seen in the `regstat_bb_compute_calls_crossed` function.
This patch adds an assert to the `regstat_bb_compute_calls_crossed`
function so that bad accesses here are caught instead of going
unnoticed, and then avoids the problem.

We avoid the problem by ensuring that new notes added by `reemit_notes` have an
insn record given to them.  This is done by adding a call to
`df_insn_create_insn_record` on each note added in `reemit_notes`.
`df_insn_create_insn_record` leaves this new record zeroed out, which appears
to be fine for notes (e.g. `df_bb_refs_record` already does not set
anything except the luid for notes, and notes have no dataflow information to
record).

We add the testcase that Martin found here
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92410#c2 .
This testcase fails with the "regstat.c" change, and then succeeds with the
"haifa-sched.c" change.

There is a similar problem with labels, that the `gcc_assert` catches
when running regression tests in gcc.dg/fold-eqandshift-1.c and
gcc.c-torture/compile/pr32482.c.
This is due to the `cfg_layout_finalize` call in `bb-reorder.c` emitting
new labels, and these labels not having a dataflow df_insn_info member.

We solve this by manually calling `df_recompute_luids` on each basic
block once this pass has finished.

Testing done:
  Ran regression tests on aarch64-none-linux-gnu cross compiler.
  Bootstrapped and ran tests on aarch64-none-linux-gnu native.

gcc/ChangeLog:

2019-12-09  Matthew Malcomson  

PR middle-end/92410
* bb-reorder.c (pass_reorder_blocks::execute): Recompute
dataflow luids once basic blocks have been reordered.
* haifa-sched.c (reemit_notes): Create df insn record for each
new note.
* regstat.c (regstat_bb_compute_calls_crossed): Assert every
insn has an insn record before trying to use it.

gcc/testsuite/ChangeLog:

2019-12-09  Matthew Malcomson  

PR middle-end/92410
* gcc.dg/torture/pr92410.c: New test.

Added:
trunk/gcc/testsuite/gcc.dg/torture/pr92410.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/bb-reorder.c
trunk/gcc/haifa-sched.c
trunk/gcc/regstat.c
trunk/gcc/testsuite/ChangeLog

[Bug rtl-optimization/92882] [10 Regression] ICE in regstat_bb_compute_calls_crossed, at regstat.c:327 since r279124

2019-12-10 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92882

--- Comment #3 from Matthew Malcomson  ---
(In reply to Jakub Jelinek from comment #2)
> The question is if we just have some exception that for new labels etc. we
> don't grow the tables, while for insns we always do.  If yes, the patch is a
> real fix, if not, we can wait for further ICEs on the same assertion.
> 

That's a good point.

At the time I proposed r279124 I decided that the comment inside
`df_recompute_luids` was enough of an indication that labels should be in the
tables, but another interpretation could be that we already know labels can be
outside of a table for some period of computation.



void
df_recompute_luids (basic_block bb)
{
  rtx_insn *insn;
  int luid = 0;

  df_grow_insn_info ();

  /* Scan the block an insn at a time from beginning to end.  */
  FOR_BB_INSNS (bb, insn)
{
  struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
  /* Inserting labels does not always trigger the incremental
 rescanning.  */
  if (!insn_info)
{
  gcc_assert (!INSN_P (insn));
  insn_info = df_insn_create_insn_record (insn);
}

  DF_INSN_INFO_LUID (insn_info) = luid;
  if (INSN_P (insn))
luid++;
}
}

[Bug c++/92919] New: invalid memory access in wide_str_to_charconst when running ucn2.C testcase (caught by hwasan)

2019-12-12 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92919

Bug ID: 92919
   Summary: invalid memory access in wide_str_to_charconst when
running ucn2.C testcase (caught by hwasan)
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: matmal01 at gcc dot gnu.org
CC: jakub at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64-none-linux-gnu

When running the ucn2.C testcase, hwasan catches an invalid access in the
function `wide_str_to_charconst`.

The problematic line is:
const char16_t p = u'\U00110003';

It seems this is to do with the size of the constant, since the line below does
not trigger this invalid access.
const char16_t j = u'\U0001F914';
yet changing that constant to the below does.
const char16_t j = u'\U0011F914';

HWASAN output is below.


==9608==ERROR: HWAddressSanitizer: tag-mismatch on address 0xefdf80bf at pc
0x00651270
READ of size 1 at 0xefdf80bf tags: 5f/79 (ptr/mem) in thread T0
#0 0x65126c in SigTrap<0>
../../../../gcc-pdtl/libsanitizer/hwasan/hwasan_checks.h:27
#1 0x65126c in CheckAddress<(__hwasan::ErrorAction)0,
(__hwasan::AccessType)0, 0>
../../../../gcc-pdtl/libsanitizer/hwasan/hwasan_checks.h:88
#2 0x65126c in __hwasan_load1
../../../../gcc-pdtl/libsanitizer/hwasan/hwasan.cpp:469
#3 0x2b143dc in wide_str_to_charconst ../../gcc-pdtl/libcpp/charset.c:1980
#4 0x2b143dc in cpp_interpret_charconst(cpp_reader*, cpp_token const*,
unsigned int*, int*) ../../gcc-pdtl/libcpp/charset.c:2045
#5 0xb31a48 in lex_charconst ../../gcc-pdtl/gcc/c-family/c-lex.c:1368
#6 0xb35964 in c_lex_with_flags(tree_node**, unsigned int*, unsigned char*,
int) ../../gcc-pdtl/gcc/c-family/c-lex.c:617
#7 0x89c6bc in cp_lexer_get_preprocessor_token
../../gcc-pdtl/gcc/cp/parser.c:807
#8 0x943cc0 in cp_lexer_new_main ../../gcc-pdtl/gcc/cp/parser.c:654
#9 0x943cc0 in cp_parser_new ../../gcc-pdtl/gcc/cp/parser.c:3968
#10 0x943cc0 in c_parse_file() ../../gcc-pdtl/gcc/cp/parser.c:42963
#11 0xb50c90 in c_common_parse_file()
../../gcc-pdtl/gcc/c-family/c-opts.c:1185
#12 0x16a49fc in compile_file ../../gcc-pdtl/gcc/toplev.c:458
#13 0x6466bc in do_compile ../../gcc-pdtl/gcc/toplev.c:2280
#14 0x6466bc in toplev::main(int, char**) ../../gcc-pdtl/gcc/toplev.c:2419
#15 0x649468 in main ../../gcc-pdtl/gcc/main.c:39
#16 0x93dd689c in __libc_start_main
(/lib/aarch64-linux-gnu/libc.so.6+0x1f89c)

[0xefdf80a0,0xefdf80c0) is a small unallocated heap chunk; size: 32
offset: 31
0xefdf80bf is located 1 bytes to the left of 2-byte region
[0xefdf80c0,0xefdf80c2)
allocated here:
#0 0x652bc0 in __sanitizer_realloc
../../../../gcc-pdtl/libsanitizer/hwasan/hwasan_interceptors.cpp:146
#1 0x2b95f40 in xrealloc ../../gcc-pdtl/libiberty/xmalloc.c:179
#2 0x2b122ec in cpp_interpret_string_1 ../../gcc-pdtl/libcpp/charset.c:1753
#3 0x2b14284 in cpp_interpret_string(cpp_reader*, cpp_string const*,
unsigned long, cpp_string*, cpp_ttype) ../../gcc-pdtl/libcpp/charset.c:1784
#4 0x2b14284 in cpp_interpret_charconst(cpp_reader*, cpp_token const*,
unsigned int*, int*) ../../gcc-pdtl/libcpp/charset.c:2036
#5 0xb31a48 in lex_charconst ../../gcc-pdtl/gcc/c-family/c-lex.c:1368
#6 0xb35964 in c_lex_with_flags(tree_node**, unsigned int*, unsigned char*,
int) ../../gcc-pdtl/gcc/c-family/c-lex.c:617
#7 0x89c6bc in cp_lexer_get_preprocessor_token
../../gcc-pdtl/gcc/cp/parser.c:807
#8 0x943cc0 in cp_lexer_new_main ../../gcc-pdtl/gcc/cp/parser.c:654
#9 0x943cc0 in cp_parser_new ../../gcc-pdtl/gcc/cp/parser.c:3968
#10 0x943cc0 in c_parse_file() ../../gcc-pdtl/gcc/cp/parser.c:42963
#11 0xb50c90 in c_common_parse_file()
../../gcc-pdtl/gcc/c-family/c-opts.c:1185
#12 0x16a49fc in compile_file ../../gcc-pdtl/gcc/toplev.c:458
#13 0x6466bc in do_compile ../../gcc-pdtl/gcc/toplev.c:2280
#14 0x6466bc in toplev::main(int, char**) ../../gcc-pdtl/gcc/toplev.c:2419
#15 0x649468 in main ../../gcc-pdtl/gcc/main.c:39
#16 0x93dd689c in __libc_start_main
(/lib/aarch64-linux-gnu/libc.so.6+0x1f89c)
#17 0x64cb24 
(/home/ubuntu/working-directory/gcc-hwasan-install/libexec/gcc/aarch64-unknown-linux-gnu/10.0.0/cc1plus+0x64cb24)

Thread: T0 0xeffe2000 stack: [0xe544a000,0xe944a000) sz: 67108864
tls: [0x9402,0x94020850)
Memory tags around the buggy address (one tag corresponds to 16 bytes):
   0d  00  09  00  09  00  e7  09  09  00  e2  0c  9a  0c  0a  4a   
   e7  0c  0d  00  0d  00  05  00  0d  00  08  00  08  00  08  00   
   08  00  0b  00  0b  00  0b  00  0b  00  0e  00  0e  00  05  00   
   0e  00  08  00  08  00  09  00  08  00  0c  00  0c  00  09  00   
   0c  00  0c  00  0c  00  08  00  0c  00  0b  00  0b  0

[Bug rtl-optimization/88904] New: Basic block incorrectly skipped in jump threading.

2019-01-18 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88904

Bug ID: 88904
   Summary: Basic block incorrectly skipped in jump threading.
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: major
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: matmal01 at gcc dot gnu.org
  Target Milestone: ---

When compiling the attached code, with an arm-none-eabi cross compiler from
trunk, 

arm-none-eabi-gcc -march=armv6-m -S test.c -o test.s -Os

incorrect assembly is generated, which leads to the second assert always being
triggered.


This happens since revision r266734 which introduced a new pass running
jump-threading just after reload.

For the attached testcase this triggers a latent bug in the `thread_jump`
function.

The combine pass can modify a jump_insn so that its pattern is of the form
(parallel [
(set (pc) ...)
(clobber (scratch))])

which after reload can end up in the form 
(parallel [
(set (pc) (if_then_else (

[Bug rtl-optimization/88904] Basic block incorrectly skipped in jump threading.

2019-01-18 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88904

--- Comment #1 from Matthew Malcomson  ---
Created attachment 45458
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45458&action=edit
Problematic testcase

[Bug middle-end/88950] New: stack_protect_prologue can be reordered by sched1 around memory accesses

2019-01-21 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88950

Bug ID: 88950
   Summary: stack_protect_prologue can be reordered by sched1
around memory accesses
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: matmal01 at gcc dot gnu.org
  Target Milestone: ---

I've found a testcase where the stack protector code generated through
`-fstack-protector-all` doesn't actually protect anything.

With the following testcase

int foo (int a, int b, int c) {
char buf[64];
buf[a] = 1;
buf[b] = c;

// Just add something so that the assignments above have some
// observable behaviour.
int retval = 0;
for (size_t i = 0; i < 32; i++)
{
retval += buf[i];
}
return retval;
}


When compiling for aarch64 with
gcc -fstack-protector-all -g -S stack-reorder.c  -o test.s -O3 -fdump-rtl-final
(with ~gcc (GCC) 9.0.0 20181214 (experimental)~)

We get an RTL dump on the final pass that has the snippet
(insn 8 21 130 (parallel [
(set (mem/v/f/c:DI (plus:DI (reg/f:DI 31 sp)
(const_int 88 [0x58])) [1 D.4227+0 S8 A64])
(unspec:DI [
(mem/v/f/c:DI (reg/f:DI 0 x0 [116]) [1
__stack_chk_guard+0 S8 A64])
] UNSPEC_SP_SET))
(set (reg:DI 1 x1 [141])
(const_int 0 [0]))
]) "stack-reorder.c":3:31 1046 {stack_protect_set_di}
 (expr_list:REG_UNUSED (reg:DI 1 x1 [141])
(nil)))
(note 130 8 117 (var_location b (entry_value:SI (reg:SI 1 x1 [ b ])))
NOTE_INSN_VAR_LOCATION)
(note 117 130 118 stack-reorder.c:4 NOTE_INSN_BEGIN_STMT)
(note 118 117 119 stack-reorder.c:5 NOTE_INSN_BEGIN_STMT)
(note 119 118 120 stack-reorder.c:6 NOTE_INSN_BEGIN_STMT)
(note 120 119 131 stack-reorder.c:10 NOTE_INSN_BEGIN_STMT)
(note 131 120 121 (var_location retval (const_int 0 [0]))
NOTE_INSN_VAR_LOCATION)
(note 121 131 144 stack-reorder.c:11 NOTE_INSN_BEGIN_STMT)
(note 144 121 122 0xb76fd960 NOTE_INSN_BLOCK_BEG)
(note 122 144 132 stack-reorder.c:11 NOTE_INSN_BEGIN_STMT)
(note 132 122 123 (var_location retval (nil)) NOTE_INSN_VAR_LOCATION)
(note 123 132 145 stack-reorder.c:13 NOTE_INSN_BEGIN_STMT)
(note 145 123 75 0xb76fd960 NOTE_INSN_BLOCK_END)
(insn:TI 75 145 133 (parallel [
(set (reg:DI 1 x1 [137])
(unspec:DI [
(mem/v/f/c:DI (plus:DI (reg/f:DI 31 sp)
(const_int 88 [0x58])) [1 D.4227+0 S8 A64])
(mem/v/f/c:DI (reg/f:DI 0 x0 [116]) [1
__stack_chk_guard+0 S8 A64])
] UNSPEC_SP_TEST))
(clobber (reg:DI 2 x2 [142]))
]) "stack-reorder.c":16:1 1048 {stack_protect_test_di}
 (expr_list:REG_DEAD (reg/f:DI 0 x0 [116])
(expr_list:REG_UNUSED (reg:DI 2 x2 [142])
(nil


In this snippet the stack protect set and test patterns are right next to each
other, causing the stack protector to essentially do nothing.
The RTL insns to set the two elements in `buf[]` are before this snippet.

The stack_protect_set and stack_protect_test patterns are put together in the
sched1 pass (as seen by the change in the RTL between the previous dump and
that
one).

I would like to know what is supposed to stop RTL from the stack_protect_set
pattern from being reordered around the code it protects like this?
I don't believe aarch64 is doing anything special here -- the stack protect set
and test patterns are very similar to those of other backends.

I recognise this is an unlikely pattern of code and that it doesn't present as
much of a security risk as things like calling memcpy or setting memory through
some sort of loop.

[Bug middle-end/88950] stack_protect_prologue can be reordered by sched1 around memory accesses

2019-01-21 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88950

--- Comment #1 from Matthew Malcomson  ---
Created attachment 45480
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45480&action=edit
Testcase

[Bug middle-end/88950] stack_protect_prologue can be reordered by sched1 around memory accesses

2019-01-21 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88950

--- Comment #3 from Matthew Malcomson  ---
aarch64 (both aarch64-none-linux-gnu and aarch64-none-elf)

[Bug rtl-optimization/88904] [9 Regression] Basic block incorrectly skipped in jump threading.

2019-01-21 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88904

--- Comment #3 from Matthew Malcomson  ---
I agree Jakub -- I've been testing a patch that does the same thing and
everything seems to be working (though my patch was not as neat).

[Bug middle-end/88950] stack_protect_prologue can be reordered by sched1 around memory accesses

2019-01-22 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88950

Matthew Malcomson  changed:

   What|Removed |Added

  Known to fail||5.4.0

--- Comment #5 from Matthew Malcomson  ---
This problem has been around for a long time -- I have seen the same
fundamental problem on gcc 5.4 (when looking for a version to put in the "known
to work" field).

With   "gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609" on the same
testcase, the stack_protect_test pattern gets reordered to before the second
memory access (the "buf[b] = c" line), and again the stack protection does not
guard this memory access.


(insn:TI 8 126 16 (parallel [
(set (mem/v/f/c:DI (plus:DI (reg/f:DI 29 x29)
(const_int 88 [0x58])) [1 D.2834+0 S8 A64])
(unspec:DI [
(mem/v/f/c:DI (reg/f:DI 3 x3 [100]) [1
__stack_chk_guard+0 S8 A64])
] UNSPEC_SP_SET))
(set (reg:DI 5 x5 [126])
(const_int 0 [0]))
]) stack-reorder.c:1 864 {stack_protect_set_di}
 (expr_list:REG_UNUSED (reg:DI 5 x5 [126])
(nil)))
(insn:TI 16 8 71 (set (mem/j:QI (plus:DI (reg:DI 0 x0 [105])
(const_int 4016 [0xfb0])) [0 buf S1 A8])
(reg:QI 4 x4 [106])) stack-reorder.c:3 45 {*movqi_aarch64}
 (expr_list:REG_DEAD (reg:QI 4 x4 [106])
(expr_list:REG_DEAD (reg:DI 0 x0 [105])
(nil
(insn 71 16 22 (parallel [
(set (reg:DI 3 x3 [125])
(unspec:DI [
(mem/v/f/c:DI (plus:DI (reg/f:DI 29 x29)
(const_int 88 [0x58])) [1 D.2834+0 S8 A64])
(mem/v/f/c:DI (reg/f:DI 3 x3 [100]) [1
__stack_chk_guard+0 S8 A64])
] UNSPEC_SP_TEST))
(clobber (reg:DI 0 x0 [127]))
]) stack-reorder.c:14 866 {stack_protect_test_di}
 (expr_list:REG_UNUSED (reg:DI 0 x0 [127])
(nil)))
(insn:TI 22 71 140 (set (mem/j:QI (plus:DI (reg:DI 1 x1 [110])
(const_int 4016 [0xfb0])) [0 buf S1 A8])
(reg:QI 2 x2 [ c ])) stack-reorder.c:4 45 {*movqi_aarch64}
 (expr_list:REG_DEAD (reg:QI 2 x2 [ c ])
(expr_list:REG_DEAD (reg:DI 1 x1 [110])
(nil

[Bug bootstrap/88714] [9 regression] bootstrap comparison failure on armv7l since r265398

2019-01-31 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

Matthew Malcomson  changed:

   What|Removed |Added

 CC||matmal01 at gcc dot gnu.org

--- Comment #29 from Matthew Malcomson  ---
Hi Jakub,

I've been working on a patch that does very similar to the draft patch posted
above, and I notice a few things I've tried to avoid in it.
I doubt there are any actual bugs, since I don't know if the patterns that
trigger actual faults can occur at the moment.



Using the `address_operand` predicate and 'p' constraint to ensure the address
is a valid address would use the mode SImode of the operand rather than
checking
it's valid for the DImode of the emitted ldrd.

If this happens we generate an ICE in the `adjust_address` call just before
`output_move_double`.

I don't know if such a pattern can actually be generated, but we could use
`arm_legitimate_address_p (DImode, XEXP (operands[1], 0), true)` in the
condition to avoid it just in case.



There's a similar problem to the `address_operand` one above with using the
`arm_count_output_move_double_insns` function.

It's called on the original operands, which means it eventually calls
`output_move_double` with the first two operands (which are in SImode).

This function has some calls to `reg_overlap_mentioned_p`, which depends on the
number of hard registers for a given registers mode.

I've only found cases where the `arm_count_output_move_double_insns` function
returns something other than what it should in cases that only match because of
the `address_operand` problem above.

This could be replaced by a wrapper that generates DImode registers
specifically
for checking this.

---

I think generation of patterns of the form 
(plus:SI (plus:SI (reg) (const_int)) (const_int)) 
which can happen with these peepholes isn't very nice.
I can't find any constraint against these patterns in the canonicalization
rules (maybe there should be?) so I can't say this is an actual problem.


As an example: the following
int __RTL (startwith ("peephole2")) foo_x4 (int *a)
{
(function "foo_x4"
  (insn-chain
(cnote 1 NOTE_INSN_DELETED)
(block 2
  (edge-from entry (flags "FALLTHRU"))
  (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
  (cinsn 101 (set (reg:SI r2)
  (mem/c:SI (plus:SI (reg:SI r0) (const_int 8)) [0 S4
A64])) "/home/matmal01/test.c":18)
  (cinsn 102 (set (reg:SI r3)
  (mem/c:SI (plus:SI (reg:SI r0) (const_int 12)) [0 S4
A32])) "/home/matmal01/test.c":18)
  (cinsn 103 (set (reg:SI r0)
  (plus:SI (reg:SI r2) (reg:SI r3)))
"/home/matmal01/test.c":18)
  (edge-to exit (flags "FALLTHRU"))
) ;; block 2
  ) ;; insn-chain
  (crtl
(return_rtx 
  (reg/i:SI r0)
) ;; return_rtx
  ) ;; crtl
) ;; function "main"
}

Produces
(insn 104 3 103 2 (parallel [
(set (reg:SI 2 r2)
(mem/c:SI (plus:SI (reg:SI 0 r0)
(const_int 8 [0x8])) [0 S4 S4 A64]))
(set (reg:SI 3 r3)
(mem/c:SI (plus:SI (plus:SI (reg:SI 0 r0)
(const_int 8 [0x8]))
(const_int 4 [0x4])) [0 S4 S4 A32]))
]) -1
 (nil))


Maybe we could use the existing operands, and match with
`rtx_equal_p (..., plus_constant (...))`
so that the plus_constant can take care of adding the constants together.
This is what we do in the load_pair patterns for aarch64.




There are a few other tidy-up points around the define_insn patterns, but
overall I believe they can be merged into one pattern.
The difference between the 'q' and 'r' constraints are using either CORE_REGS
or
GENERAL_REGS, where CORE_REGS allows r13 and GENERAL_REGS doesn't.
I guess this is from a line in infocenter that mentions r12 is strongly
recommended to not be used as the first register for ldrdb, as this is stopped
by requiring both the first and second register to not be r13.
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489c/Cihjffga.html

[Bug bootstrap/88714] [9 regression] bootstrap comparison failure on armv7l since r265398

2019-01-31 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #31 from Matthew Malcomson  ---
(In reply to Jakub Jelinek from comment #30)
> (In reply to Matthew Malcomson from comment #29)
> > I've been working on a patch that does very similar to the draft patch 
> > posted
> > above, and I notice a few things I've tried to avoid in it.
> > I doubt there are any actual bugs, since I don't know if the patterns that
> > trigger actual faults can occur at the moment.
> > 
> > 
> > 
> > Using the `address_operand` predicate and 'p' constraint to ensure the
> > address
> > is a valid address would use the mode SImode of the operand rather than
> > checking
> > it's valid for the DImode of the emitted ldrd.
> 
> Sure, but does it really matter?
> This is a post reload pattern created by the peephole2s, so nothing that can
> be matched out of the blue sky like combiner normally matches.
> So, if it didn't pass the conditions in the peephole2s, the patterns
> wouldn't be created.

True -- as I mentioned I don't know if a problematic pattern could actually
occur, so I doubt this is actually a problem.

> Are there any addresses that pass arm_legitimate_address_p (DImode, x, true)
> and fail address_operand (x, SImode)?  From brief skimming I couldn't find
> anything.
> So, would you be happy if the && arm_legitimate_address_p (DImode, XEXP
> (operands[n], 0), true)
> condition is added to the insn conditions (after the rtx_equal_p check)?

That sounds good to me.

> 
> > There's a similar problem to the `address_operand` one above with using the
> > `arm_count_output_move_double_insns` function.
> > 
> > It's called on the original operands, which means it eventually calls
> > `output_move_double` with the first two operands (which are in SImode).
> > 
> > This function has some calls to `reg_overlap_mentioned_p`, which depends on
> > the
> > number of hard registers for a given registers mode.
> > 
> > I've only found cases where the `arm_count_output_move_double_insns` 
> > function
> > returns something other than what it should in cases that only match because
> > of
> > the `address_operand` problem above.
> > 
> > This could be replaced by a wrapper that generates DImode registers
> > specifically
> > for checking this.
> 
> For non-vfp or iwmmxt, the length is always 8, are there cases in the vfp
> insn that the length is not 8?

I believe the length *can* be 4 non-vfp, vfp, or iwmmxt (the case below
produces a single ldrd when compiled with each of them).

int __RTL (startwith ("peephole2")) foo_x4 (int *a)
{
(function "foo_x4"
  (insn-chain
(cnote 1 NOTE_INSN_DELETED)
(block 2
  (edge-from entry (flags "FALLTHRU"))
  (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
  (cinsn 101 (set (reg:SI r2)
  (mem/c:SI (plus:SI (reg:SI r0) (const_int 8)) [0 S4
A64])) "/home/matmal01/test.c":18)
  (cinsn 102 (set (reg:SI r3)
  (mem/c:SI (plus:SI (reg:SI r0) (const_int 12)) [0 S4
A32])) "/home/matmal01/test.c":18)
  (cinsn 103 (set (reg:SI r0)
  (plus:SI (reg:SI r2) (reg:SI r3)))
"/home/matmal01/test.c":18)
  (edge-to exit (flags "FALLTHRU"))
) ;; block 2
  ) ;; insn-chain
  (crtl
(return_rtx 
  (reg/i:SI r0)
) ;; return_rtx
  ) ;; crtl
) ;; function "main"
}




Something else I've just noticed:
When compiling for vfp or iwmmxt, the ldm2_ define_insn matches the simpler
case below as it comes first in the md order.
That means we get a ldm instruction instead of the ldrd.

int __RTL (startwith ("peephole2")) foo_x5 (int *a)
{
(function "foo_x5"
  (insn-chain
(cnote 1 NOTE_INSN_DELETED)
(block 2
  (edge-from entry (flags "FALLTHRU"))
  (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
  (cinsn 101 (set (reg:SI r2)
  (mem/c:SI (reg:SI r0) [0 S4 A64]))
"/home/matmal01/test.c":18)
  (cinsn 102 (set (reg:SI r3)
  (mem/c:SI (plus:SI (reg:SI r0) (const_int 4)) [0 S4
A32])) "/home/matmal01/test.c":18)
  (cinsn 103 (set (reg:SI r0)
  (plus:SI (reg:SI r2) (reg:SI r3)))
"/home/matmal01/test.c":18)
  (edge-to exit (flags "FALLTHRU"))
) ;; block 2
  ) ;; insn-chain
  (crtl
(return_rtx 
  (reg/i:SI r0)
) ;; return_rtx
  ) ;; crtl
) ;; function "main"
}

[Bug bootstrap/88714] [9 regression] bootstrap comparison failure on armv7l since r265398

2019-02-01 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #32 from Matthew Malcomson  ---
Created attachment 45584
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45584&action=edit
Single define_insn version of above patch

FWIW I've attached the patch I'd made.

The only interesting differences are that I'd added only one define_insn as I
don't believe the existing patterns' difference in constraints is needed and I
made some RTL testcases.


(I've just now added the testcase you found).

[Bug bootstrap/88714] [9 regression] bootstrap comparison failure on armv7l since r265398

2019-02-01 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #34 from Matthew Malcomson  ---
Yes, I needed to redo that check for an offset of 4 -- I compared the
expression of the first MEM with the result of `plus_constant` with 4 on the
expression of the second MEM in the condition.

[Bug bootstrap/88714] [9 regression] bootstrap comparison failure on armv7l since r265398

2019-02-01 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #37 from Matthew Malcomson  ---
Good point (and interesting about the HOST_WIDE_INT_MIN exception -- I didn't
know that).

To avoid duplication of effort would you prefer I make the change or do you
want to handle it?

[Bug bootstrap/88714] [9 regression] bootstrap comparison failure on armv7l since r265398

2019-02-01 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #39 from Matthew Malcomson  ---
(In reply to Jakub Jelinek from comment #38)
> I don't mind if you take over, I don't really have good opportunities to
> test on arm anyway.  Though, do you have copyright assignment on file (or
> covered by ARM or Linaro or similar assignments)?

OK, will do.

I'm covered by the ARM assignment.

[Bug bootstrap/88714] [9 regression] bootstrap comparison failure on armv7l since r265398

2019-02-07 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #42 from Matthew Malcomson  ---
Author: matmal01
Date: Thu Feb  7 14:54:15 2019
New Revision: 268644

URL: https://gcc.gnu.org/viewcvs?rev=268644&root=gcc&view=rev
Log:
[Patch] [arm] Fix 88714, Arm LDRD/STRD peepholes.

These peepholes match a pair of SImode loads or stores that can be
implemented with a single LDRD or STRD instruction.
When compiling for TARGET_ARM, these peepholes originally created a set
pattern in DI mode to be caught by movdi patterns.

This approach failed to take into account the possibility that the two
matched insns operated on memory with different aliasing information.
The peepholes lost the aliasing information on one of the insns, which
could then cause the scheduler to make an invalid transformation.

This patch changes the peepholes so they generate a PARALLEL expression
of the two relevant loads or stores, which means the aliasing
information of both is kept.  Such a PARALLEL pattern is what the
peepholes currently produce for TARGET_THUMB2.

In order to match these new insn patterns, we add two new define_insn's.  These
define_insn's use the same checks as the peepholes to find valid insns.

Note that the patterns now created by the peepholes for LDRD and STRD
are very similar to those created by the peepholes for LDM and STM.
Many patterns could be matched by the LDM and STM define_insns, which
means we rely on the order the define_insn patterns are defined in the
machine description, with those for LDRD/STRD defined before those for
LDM/STM.

The difference between the peepholes for LDRD/STRD and those for LDM/STM
are mainly that those for LDRD/STRD have some logic to ensure that the
two registers are consecutive and the first one is even.

Bootstrapped and regtested on arm-none-linux-gnu.
Demonstrated fix of bug 88714 by bootstrapping on armv7l.


gcc/ChangeLog:

2019-02-07  Matthew Malcomson  
Jakub Jelinek  

PR bootstrap/88714
* config/arm/arm-protos.h (valid_operands_ldrd_strd,
arm_count_ldrdstrd_insns): New declarations.
* config/arm/arm.c (mem_ok_for_ldrd_strd): Remove broken handling of
MINUS.
(valid_operands_ldrd_strd): New function.
(arm_count_ldrdstrd_insns): New function.
* config/arm/ldrdstrd.md: Change peepholes to generate PARALLEL SImode
sets instead of single DImode set and define new insns to match this.

gcc/testsuite/ChangeLog:

2019-02-07  Matthew Malcomson  
Jakub Jelinek  

PR bootstrap/88714
* gcc.c-torture/execute/pr88714.c: New test.
* gcc.dg/rtl/arm/ldrd-peepholes.c: New test.

Added:
trunk/gcc/testsuite/gcc.c-torture/execute/pr88714.c
trunk/gcc/testsuite/gcc.dg/rtl/arm/ldrd-peepholes.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/arm/arm-protos.h
trunk/gcc/config/arm/arm.c
trunk/gcc/config/arm/ldrdstrd.md
trunk/gcc/testsuite/ChangeLog

[Bug target/89324] [9 Regression] ICE in extract_constrain_insn, at recog.c:2211 on aarch64

2019-02-13 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89324

--- Comment #3 from Matthew Malcomson  ---
(In reply to ktkachov from comment #2)
> The sub3_compare1_imm pattern was introduced for GCC 9. It's probably
> something going wrong with the constraints. Matthew, could you take a look
> please?

On first blush it looks like the define_peephole2 generating this instruction
allows the stack pointer while the 'r' constraint in the pattern doesn't accept
it.

A quick check of only allowing GENERAL_REGS registers in the peephole indeed
stops the generation of this instruction and hence avoids the bug.

I haven't yet checked whether the pattern should allow the stack pointer or
not.

[Bug target/89324] [9 Regression] ICE in extract_constrain_insn, at recog.c:2211 on aarch64

2019-02-18 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89324

--- Comment #4 from Matthew Malcomson  ---
There were similar problems in handling the stack pointer with subs/adds
instructions elsewhere in the aarch64 backend.

Patch proposed & being worked on here:
https://gcc.gnu.org/ml/gcc-patches/2019-02/msg01458.html

[Bug target/89324] [9 Regression] ICE in extract_constrain_insn, at recog.c:2211 on aarch64

2019-02-22 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89324

--- Comment #5 from Matthew Malcomson  ---
Author: matmal01
Date: Fri Feb 22 16:35:22 2019
New Revision: 269122

URL: https://gcc.gnu.org/viewcvs?rev=269122&root=gcc&view=rev
Log:
Handle stack pointer with SUBS/ADDS instructions.

In general the stack pointer was not handled for many SUBS/ADDS patterns in
aarch64.md.
Both the "extended register" and "immediate" forms allow the stack pointer to
be
used as the source register, while no form allows the stack pointer for the
destination register.

The define_insn patterns generating ADDS/SUBS did not allow the stack pointer
for any operand, while the define_peephole2 patterns that generated RTX to be
matched by these patterns allowed the stack pointer for any operand.

The patterns are fixed by adding the 'k' constraint for the first source
operand
to all define_insns that generate the ADDS/SUBS "extended register" and
"immediate" forms (but not the "shifted register" form).

In peephole optimizations, constraint strings are ignored (see "(gccint) C
Constraint Interface" info node in the documentation), so the decision to act
or
not is based solely on the predicate and condition.
This patch introduces a new predicate "aarch64_general_reg" to be used in
define_peephole2 patterns where only GENERAL_REGS registers are acceptable and
uses that predicate in the peepholes that generate patterns for ADDS/SUBS.

Full bootstrap and regtest done on aarch64-none-linux-gnu.
Regression tests done on aarch64-none-linux-gnu and aarch64-none-elf cross
compiler.

OK for trunk?


gcc/ChangeLog:

2019-02-22  Matthew Malcomson  

PR target/89324
* config/aarch64/aarch64.md: Use aarch64_general_reg predicate on
destination register in peepholes generating patterns for ADDS/SUBS.
(add3_compare0,
*addsi3_compare0_uxtw, add3_compareC,
add3_compareV_imm, add3_compareV,
*adds__,
*subs__,
*adds__shift_,
*subs__shift_,
*adds__multp2, *subs__multp2,
*sub3_compare0, *subsi3_compare0_uxtw,
sub3_compare1): Allow stack pointer for source register.
* config/aarch64/predicates.md (aarch64_general_reg): New predicate.


gcc/testsuite/ChangeLog:

2019-02-22  Matthew Malcomson  

PR target/89324
* gcc.dg/rtl/aarch64/subs_adds_sp.c: New test.
* gfortran.fortran-torture/compile/pr89324.f90: New test.

Added:
trunk/gcc/testsuite/gcc.dg/rtl/aarch64/subs_adds_sp.c
trunk/gcc/testsuite/gfortran.fortran-torture/compile/pr89324.f90
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/aarch64/aarch64.md
trunk/gcc/config/aarch64/predicates.md
trunk/gcc/testsuite/ChangeLog

[Bug target/89324] [9 Regression] ICE in extract_constrain_insn, at recog.c:2211 on aarch64

2019-02-22 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89324

Matthew Malcomson  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Matthew Malcomson  ---
Fixed on trunk.

[Bug ada/89493] Stack smashing on armv7hl

2019-03-08 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89493

Matthew Malcomson  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-03-08
 CC||matmal01 at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Matthew Malcomson  ---
I've reproduced manually using the reproducer and a bootstrapped gcc at r268766

gcc r265397 does not reproduce the problem.

Hence I'm marking this as a regression.

Between these two versions bootstrap with the configure line below fails.

../gcc-source/configure --enable-bootstrap
--enable-languages=c,c++,fortran,objc,obj-c++,ada,lto
--prefix=${HOME}/gcc-install --mandir=/usr/share/man --infodir=/usr/share/info
--with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared
--enable-threads=posix --enable-checking=release --enable-multilib
--with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions
--enable-gnu-unique-object --enable-linker-build-id
--with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin
--enable-initfini-array --with-isl --enable-gnu-indirect-function
--disable-sjlj-exceptions --with-tune=generic-armv7-a --with-arch=armv7-a
--with-float=hard --with-fpu=vfpv3-d16 --with-abi=aapcs-linux
--build=armv7hl-redhat-linux-gnueabi

(mostly taken from the configure shown in `gcc -v` from the given package
version)

[Bug target/90024] New: [7/8/9 Regression] ICE on AArch32 NEON mov with TImode constant.

2019-04-09 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90024

Bug ID: 90024
   Summary: [7/8/9 Regression] ICE on AArch32 NEON mov with TImode
constant.
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code, patch
  Severity: normal
  Priority: P3
 Component: target
  Assignee: matmal01 at gcc dot gnu.org
  Reporter: matmal01 at gcc dot gnu.org
  Target Milestone: ---

Created attachment 46111
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46111&action=edit
Proposed fix

The below code causes an ICE for AArch32 targets with NEON at all optimisation
levels except -O0.


union a { 
  char b; 
  long long c; 
}; 
union a d; 
int g(int, union a, union a); 
void e() { 
  union a f[2] = {-1L}; 
  g(0, d, f[0]); 
} 


With the backtrace below.

$ arm-none-eabi-gcc -march=armv8-a -c test.c -O1 -mfloat-abi=hard
-mfpu=neon-fp-armv8
during RTL pass: final
test.c: In function 'e':
test.c:10:1: internal compiler error: in output_950, at config/arm/neon.md:89
   10 | }
  | ^
0x1352bfb output_950
   
/tmp/dgboter/bbs/rhev-vm4--rhe6x86_64/buildbot/rhe6x86_64--arm-none-eabi/build/src/gcc/gcc/config/arm/neon.md:89
0x8aafbd get_insn_template(int, rtx_insn*)
   
/tmp/dgboter/bbs/rhev-vm4--rhe6x86_64/buildbot/rhe6x86_64--arm-none-eabi/build/src/gcc/gcc/final.c:2071
  


I have a patch to fix the problem, creating a bugzilla report for tracking
purposes (patch added as attachment, the explanation will be added in
comments).

[Bug target/90024] [7/8/9 Regression] ICE on AArch32 NEON mov with TImode constant.

2019-04-09 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90024

Matthew Malcomson  changed:

   What|Removed |Added

 Target||arm
  Known to work||4.9.0

--- Comment #1 from Matthew Malcomson  ---
The "*neon_mov" patterns for 128 bit sized quantities uses the "Dn"
constraint to match vmov.f32 and vmov.i patterns.

This constraint boils down to using the `neon_immediate_valid` function.
Once the constraint has matched, the output C statement asserts the same
function
passes.

The output C statement calls `neon_immediate_valid` with the mode taken from
the
iterator, while the constraint takes the mode from the operand.


In the above testcase the operand is a CONST_INT, which means the constraint
passes VOIDmode (treated the same as DImode in `neon_immediate_valid`), while
the C statement passes TImode (the mode of the iterator).

This causes second call to `neon_immediate_valid` to fail as the value provided
is only valid in DImode but not TImode, and that causes the ICE.


The attached patch splits the original "Dn" constraint into three new
constraints, "DN" for TImode CONST_INT, "Dn" for DImode CONST_INT, and "Dm" for
CONST_VECTOR.
This requires one extra alternative in the "*neon_mov" patterns, but
makes it clear from the constraint what mode is being used.

We use the "DN" constraint for the define_insn that matches TImode values, and
hence avoid the above problem.

[Bug target/90024] [7/8/9 Regression] ICE on AArch32 NEON mov with TImode constant.

2019-04-09 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90024

--- Comment #2 from Matthew Malcomson  ---
Author: matmal01
Date: Tue Apr  9 11:39:59 2019
New Revision: 270226

URL: https://gcc.gnu.org/viewcvs?rev=270226&root=gcc&view=rev
Log:
Hi there,

The "*neon_mov" patterns for 128 bit sized quantities uses the "Dn"
constraint to match vmov.f32 and vmov.i patterns.
This constraint boils down to using the `neon_immediate_valid` function.
Once the constraint has matched, the output C statement asserts that function
passes.

The output C statement calls `neon_immediate_valid` with the mode taken from
the
iterator, while the constraint takes the mode from the operand.
This can cause a discrepency when the operand is a CONST_INT, as the constraint
passes VOIDmode which `neon_immediate_valid` treats as DImode, while the C
statement passes the mode of the iterator which can be TImode.
When this happens, the `neon_immediate_valid` can fail in the second call (if
e.g. the CONST_INT is a valid immediate in DImode but not TImode) which would
trigger the assertion.

The testcase added with this patch triggers this when compiled with an arm
cross
compiler using the command line below.
gcc -march=armv8-a -c neon-immediate-timode.c -O1 -mfloat-abi=hard
-mfpu=neon-fp-armv8

This patch splits the original "Dn" constraint into three new constraints, "DN"
for TImode CONST_INT, "Dn" for DImode CONST_INT, and "Dm" for CONST_VECTOR.
Splitting things up this way requires using one extra alternative in the
"*neon_mov" patterns, but makes it clear from the constraint what mode is
being used.

We also remove the behaviour of treating VOIDmode as DImode in
`neon_valid_immediate` since the original "Dn" constraint was the only place
that functionality was used.  VOIDmode is now never passed to that function.
An assertion has been added to the function to ensure this problem is caught
earlier on.

Bootstrapped on arm-none-linux-gnueabihf
Regtested on cross-compiler arm-none-eabi

gcc/ChangeLog:

2019-04-09  Matthew Malcomson  

PR target/90024
* config/arm/arm.c (neon_valid_immediate): Disallow VOIDmode parameter.
* config/arm/constraints.md (Dm, DN, Dn): Split previous Dn constraint
into three.
* config/arm/neon.md (*neon_mov): Account for TImode and DImode
differences directly.
(*smax3_neon, vashl3, vashr3_imm): Use Dm constraint.

gcc/testsuite/ChangeLog:

2019-04-09  Matthew Malcomson  

PR target/90024
* gcc.dg/torture/neon-immediate-timode.c: New test.

Added:
trunk/gcc/testsuite/gcc.dg/torture/neon-immediate-timode.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/arm/arm.c
trunk/gcc/config/arm/constraints.md
trunk/gcc/config/arm/neon.md
trunk/gcc/testsuite/ChangeLog

[Bug target/90024] [7/8 Regression] ICE on AArch32 NEON mov with TImode constant.

2019-04-10 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90024

--- Comment #3 from Matthew Malcomson  ---
Author: matmal01
Date: Wed Apr 10 13:34:54 2019
New Revision: 270253

URL: https://gcc.gnu.org/viewcvs?rev=270253&root=gcc&view=rev
Log:
Backport of r270226 from mainline to gcc-7-branch

The "*neon_mov" patterns for 128 bit sized quantities uses the "Dn"
constraint to match vmov.f32 and vmov.i patterns.
This constraint boils down to using the `neon_immediate_valid` function.
Once the constraint has matched, the output C statement asserts that function
passes.

The output C statement calls `neon_immediate_valid` with the mode taken from
the
iterator, while the constraint takes the mode from the operand.
This can cause a discrepency when the operand is a CONST_INT, as the constraint
passes VOIDmode which `neon_immediate_valid` treats as DImode, while the C
statement passes the mode of the iterator which can be TImode.
When this happens, the `neon_immediate_valid` can fail in the second call (if
e.g. the CONST_INT is a valid immediate in DImode but not TImode) which would
trigger the assertion.

The testcase added with this patch triggers this when compiled with an arm
cross
compiler using the command line below.
gcc -march=armv8-a -c neon-immediate-timode.c -O1 -mfloat-abi=hard
-mfpu=neon-fp-armv8

This patch splits the original "Dn" constraint into three new constraints, "DN"
for TImode CONST_INT, "Dn" for DImode CONST_INT, and "Dm" for CONST_VECTOR.
Splitting things up this way requires using one extra alternative in the
"*neon_mov" patterns, but makes it clear from the constraint what mode is
being used.

We also remove the behaviour of treating VOIDmode as DImode in
`neon_valid_immediate` since the original "Dn" constraint was the only place
that functionality was used.  VOIDmode is now never passed to that function.
An assertion has been added to the function to ensure this problem is caught
earlier on.

bootstrapped and regtested on arm-none-linux-gnueabihf

gcc/ChangeLog:

2019-04-10  Matthew Malcomson  

PR target/90024
* config/arm/arm.c (neon_valid_immediate): Disallow VOIDmode parameter.
* config/arm/constraints.md (Dm, DN, Dn): Split previous Dn constraint
into three.
* config/arm/neon.md (*neon_mov): Account for TImode and DImode
differences directly.
(*smax3_neon, vashl3, vashr3_imm): Use Dm constraint.

gcc/testsuite/ChangeLog:

2019-04-10  Matthew Malcomson  

PR target/90024
* gcc.dg/torture/neon-immediate-timode.c: New test.

Added:
branches/gcc-7-branch/gcc/testsuite/gcc.dg/torture/neon-immediate-timode.c
Modified:
branches/gcc-7-branch/gcc/ChangeLog
branches/gcc-7-branch/gcc/config/arm/arm.c
branches/gcc-7-branch/gcc/config/arm/constraints.md
branches/gcc-7-branch/gcc/config/arm/neon.md
branches/gcc-7-branch/gcc/testsuite/ChangeLog

[Bug target/90024] [7/8 Regression] ICE on AArch32 NEON mov with TImode constant.

2019-04-10 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90024

--- Comment #4 from Matthew Malcomson  ---
Author: matmal01
Date: Wed Apr 10 13:41:21 2019
New Revision: 270254

URL: https://gcc.gnu.org/viewcvs?rev=270254&root=gcc&view=rev
Log:
Backport of r270226 from mainline to gcc-8-branch


The "*neon_mov" patterns for 128 bit sized quantities uses the "Dn"
constraint to match vmov.f32 and vmov.i patterns.
This constraint boils down to using the `neon_immediate_valid` function.
Once the constraint has matched, the output C statement asserts that function
passes.

The output C statement calls `neon_immediate_valid` with the mode taken from
the
iterator, while the constraint takes the mode from the operand.
This can cause a discrepency when the operand is a CONST_INT, as the constraint
passes VOIDmode which `neon_immediate_valid` treats as DImode, while the C
statement passes the mode of the iterator which can be TImode.
When this happens, the `neon_immediate_valid` can fail in the second call (if
e.g. the CONST_INT is a valid immediate in DImode but not TImode) which would
trigger the assertion.

The testcase added with this patch triggers this when compiled with an arm
cross
compiler using the command line below.
gcc -march=armv8-a -c neon-immediate-timode.c -O1 -mfloat-abi=hard
-mfpu=neon-fp-armv8

This patch splits the original "Dn" constraint into three new constraints, "DN"
for TImode CONST_INT, "Dn" for DImode CONST_INT, and "Dm" for CONST_VECTOR.
Splitting things up this way requires using one extra alternative in the
"*neon_mov" patterns, but makes it clear from the constraint what mode is
being used.

We also remove the behaviour of treating VOIDmode as DImode in
`neon_valid_immediate` since the original "Dn" constraint was the only place
that functionality was used.  VOIDmode is now never passed to that function.
An assertion has been added to the function to ensure this problem is caught
earlier on.

bootstrapped and regtested on arm-none-linux-gnueabihf

gcc/ChangeLog:

2019-04-10  Matthew Malcomson  

PR target/90024
* config/arm/arm.c (neon_valid_immediate): Disallow VOIDmode parameter.
* config/arm/constraints.md (Dm, DN, Dn): Split previous Dn constraint
into three.
* config/arm/neon.md (*neon_mov): Account for TImode and DImode
differences directly.
(*smax3_neon, vashl3, vashr3_imm): Use Dm constraint.

gcc/testsuite/ChangeLog:

2019-04-10  Matthew Malcomson  

PR target/90024
* gcc.dg/torture/neon-immediate-timode.c: New test.

Added:
branches/gcc-8-branch/gcc/testsuite/gcc.dg/torture/neon-immediate-timode.c
Modified:
branches/gcc-8-branch/gcc/ChangeLog
branches/gcc-8-branch/gcc/config/arm/arm.c
branches/gcc-8-branch/gcc/config/arm/constraints.md
branches/gcc-8-branch/gcc/config/arm/neon.md
branches/gcc-8-branch/gcc/testsuite/ChangeLog

[Bug target/90024] [7/8 Regression] ICE on AArch32 NEON mov with TImode constant.

2019-04-10 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90024

Matthew Malcomson  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Matthew Malcomson  ---
Fixed

[Bug target/90024] [7/8 Regression] ICE on AArch32 NEON mov with TImode constant.

2019-04-10 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90024

Matthew Malcomson  changed:

   What|Removed |Added

   Target Milestone|7.5 |7.6

[Bug sanitizer/90414] New: [Feature] Implementing HWASAN (and eventually MTE)

2019-05-09 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90414

Bug ID: 90414
   Summary: [Feature] Implementing HWASAN (and eventually MTE)
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: sanitizer
  Assignee: unassigned at gcc dot gnu.org
  Reporter: matmal01 at gcc dot gnu.org
CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org,
jakub at gcc dot gnu.org, kcc at gcc dot gnu.org,
marxin at gcc dot gnu.org, ramana at gcc dot gnu.org,
rearnsha at gcc dot gnu.org
  Target Milestone: ---

Hello,

I'm looking into how we can implement MTE in the compiler.
A productive first step could be implementing HWASAN for GCC, which does a
software implementation of MTE using the top-byte-ignore feature.

This has already been implemented in LLVM and the design can be found at the
link below.
https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html


Hopefully we can make this change in such a way that will enable the use of MTE
in the future.


I don't know the best approach here, and would appreciate any feedback.
>From inspection it looks like most of the work is already handled by ASAN --
especially in finding all those places that need to be instrumented -- so I was
looking into what modifications would need to be made from that starting point.


I believe that tagging stack allocated memory can be done in a similar way to
ASAN by expanding the equivalent of ASAN_MARK in a relevant manner.

However, checking memory accesses seems to need a different approach to the
current ASAN one with ASAN_CHECK.

For both HWASAN and MTE we need to find the tag that a given memory access
should be done through.
In order to produce the best machine-code we would need to associate each stack
variable with a tag internally.
In the LLVM implementation this is done by generating a random tag for the
current stack, and associating each stack variable with an increment from this
tag.

Also, for MTE the access itself needs to be made with a tagged pointer, which
means the current method of adding instructions before a memory access can't be
used and instead we need to modify the memory access itself.


I have some very basic questions that I would appreciate any help in answering.

1) Where should such passes be put?
   I would guess that putting HWASAN and/or MTE passes in the same position as
   the ASAN passes and updating the SANOPT pass to handle any changes would be
   ok, but I don't have a good understanding of why they are in their current
   position.

2) Can we always find the base object that's being referenced from the gimple
   statement where memory is accessed or a pointer is created?
   If not, when is it problematic?
   Finding the base object is pretty fundamental to getting the tag for a
   pointer.
   It seems like this should be possible based on a reading of the
documentation
   and looking at the TREE_CODEs that the current ASAN `instrument_derefs`
   function works on.

   (ARRAY_REF -> first operand is the array
MEM_REF   -> first operand is the base
COMPONENT_REF -> first operand is the object
INDIRECT_REF  -> first operand is the pointer which should reference object
VAR_DECL  -> this is the object
BIT_FIELD_REF -> first operand is the object)

3) Would there be any obvious difficulties with a transformation of the form:
  _4 = big_arrayD.3771[num_3(D)]

  TO

  _6 = &big_arrayD.3771[num_3(D)];
  _7 = HWASAN_CHECK(6, _6, 4, 4);
  _4 = *_7;

   Instead of
  _4 = big_arrayD.3771[num_3(D)]

  TO

  _6 = &big_arrayD.3771[num_3(D)];
  ASAN_CHECK(6, _6, 4, 4);
  _4 = big_arrayD.3771[num_3(D)]

   which is what ASAN currently does.
   This new form would enable using MTE by allowing the check to modify the
   pointer that the access will be made with (so it can have have its tag).

4) Builtin memory calls look like they could be handled with HWASAN in
basically
   the same way as ASAN, while for MTE they should be fine once the pointers
the
   calls are provided are tagged.
   Is there anything stopping that approach?



Thanks,
MM

[Bug sanitizer/90414] [Feature] Implementing HWASAN (and eventually MTE)

2019-05-10 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90414

--- Comment #2 from Matthew Malcomson  ---
(In reply to Richard Biener from comment #1)
> (In reply to Matthew Malcomson from comment #0)
> > Hello,
> > 
> > I'm looking into how we can implement MTE in the compiler.
> 
> What is MTE?

It's an architecture extension, otherwise known as memory tagging or memtag.
There's a high-level explanation in the link below, 
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-a-profile-architecture-2018-developments-armv85a

the instructions are introduced in Armv8.5-a.
https://developer.arm.com/docs/ddi0596/latest/base-instructions-alphabetic-order

I can't find a document describing *just* the MTE extension right now, but the
gist is that memory regions can have associated tags, and accesses to those
memory regions must be done with a pointer that has the corresponding tag in
bits 56-59 inclusive.

This is why the MTE extension will need to modify how the access is performed
instead of just adding in a check before the access is done -- it needs to
ensure that the pointer has the correct tag associated with the base object
that it's trying to access.
It's for the MTE extension that we would need the transformation below (so that
the *_CHECK ifn can ensure the pointer has the relevant tag before access).


> 
> ...
> > 3) Would there be any obvious difficulties with a transformation of the 
> > form:
> >   _4 = big_arrayD.3771[num_3(D)]
> >   
> >   TO
> >   
> >   _6 = &big_arrayD.3771[num_3(D)];
> >   _7 = HWASAN_CHECK(6, _6, 4, 4);
> >   _4 = *_7;
> > 
> >Instead of
> >   _4 = big_arrayD.3771[num_3(D)]
> >   
> >   TO
> >   
> >   _6 = &big_arrayD.3771[num_3(D)];
> >   ASAN_CHECK(6, _6, 4, 4);
> >   _4 = big_arrayD.3771[num_3(D)]
> > 
> >which is what ASAN currently does.
> >This new form would enable using MTE by allowing the check to modify the
> >pointer that the access will be made with (so it can have have its tag).
> 
> The "obvious" difficulties is that HWASAN_CHECK expansion needs to handle
> expanding the actual memory reference.  But that's only a slight
> complication.
> 
> Other complication is of course that it may pessimize optimization more
> than the old approach.

[Bug sanitizer/90414] [Feature] Implementing HWASAN (and eventually MTE)

2019-05-13 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90414

--- Comment #4 from Matthew Malcomson  ---
(In reply to Martin Liška from comment #3)
> (In reply to Matthew Malcomson from comment #0)
> > 2) Can we always find the base object that's being referenced from the 
> > gimple
> >statement where memory is accessed or a pointer is created?
> >If not, when is it problematic?
> >Finding the base object is pretty fundamental to getting the tag for a
> >pointer.
> >It seems like this should be possible based on a reading of the
> > documentation
> >and looking at the TREE_CODEs that the current ASAN `instrument_derefs`
> >function works on.
> > 
> >(ARRAY_REF -> first operand is the array
> > MEM_REF   -> first operand is the base
> > COMPONENT_REF -> first operand is the object
> > INDIRECT_REF  -> first operand is the pointer which should reference
> > object
> > VAR_DECL  -> this is the object
> > BIT_FIELD_REF -> first operand is the object)
> 
> There would be cases where a base is known and for these you could probably
> instrument checks with a constant known tag. For other situation, you'll
> probably
> need to extract the tag from the pointer. Right?
> 

Yes, I'll need to extract the tag from the pointer in cases that don't match
one
of these patterns.

That actually leads into something I forgot to mention when I wrote the comment
above -- I'll need to instrument ADDR_EXPR statements to make sure any pointers
in the program will already have their tag assigned.

To do that I think I need to add another instrumentation site for when the
address of something is taken to handle for any statements taking the address
of
something.

This may be by adding another if statement in `transform_statements` to make
this transformation before the one instrumenting the actual access, or it may
be
in a separate iteration before the one inserting the current checks since
statements like the below would need to be split to instrument the ADDR_EXPR
and
MEM_REF expressions seperately.

  MEM[(int *)&stack_object] = direction_8(D);

> > 
> > Thanks,
> > MM
> 
> In general, I'm interested in implementation of the feature, but I'll
> probably not
> find a time to do it. However, I can help you with that.

Great! I'll appreciate any help and/or advice you can give.

[Bug target/90588] [AArch64] SVE2 flag patch omits aarch64-protos.h

2019-05-23 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90588

Matthew Malcomson  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2019-05-23
 CC||matmal01 at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |matmal01 at gcc dot 
gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Matthew Malcomson  ---
Thanks -- I guess I missed that because I bootstrapped on aarch64 instead of
building a cross compiler on an arch where unsigned long and uint64_t are
different.

I'll go and fix it & test it better this time ;-)

[Bug target/90588] [AArch64] SVE2 flag patch omits aarch64-protos.h

2019-05-24 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90588

--- Comment #2 from Matthew Malcomson  ---
Author: matmal01
Date: Fri May 24 10:39:38 2019
New Revision: 271599

URL: https://gcc.gnu.org/viewcvs?rev=271599&root=gcc&view=rev
Log:
[aarch64] Change two function declaration types


Commit r271514 missed changing the type of two functions in
aarch64-protos.h.  The function definitions had been updated to use
uint64_t while the function declarations had been missed.
They were missed since I only tested the patch on aarch64 where
`unsigned long` is the same as `uint64_t`.

This patch updates these declarations in aarch64-protos.h.

Tested by building an aarch64 cross-compiler on arm-none-linux-gnu (so
that `unsigned long` and `uint64_t` are different and would give error
messages), and bootstrapping on aarch64-none-linux-gnu.
Also manually tested command line options to see that
-march=armv8-a+typo prints out the expected flags while using the new
feature flags does not complain about missing flags.

gcc/ChangeLog:

2019-05-24  Matthew Malcomson  

PR target/90588
* common/config/aarch64/aarch64-common.c
(aarch64_rewrite_selected_cpu): Change local temporary variable
type from unsigned long to uint64_t.
* config/aarch64/aarch64-protos.h (aarch64_parse_extension,
aarch64_get_extension_string_for_isa_flags): Change declaration to
match new definition by replacing unsigned long with uint64_t.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/common/config/aarch64/aarch64-common.c
trunk/gcc/config/aarch64/aarch64-protos.h

[Bug target/90588] [AArch64] SVE2 flag patch omits aarch64-protos.h

2019-05-24 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90588

Matthew Malcomson  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Matthew Malcomson  ---
fixed on trunk

[Bug testsuite/88021] New: aarch64 Busy hang running testcase pr60183.c since revision 265914

2018-11-14 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88021

Bug ID: 88021
   Summary: aarch64 Busy hang running testcase pr60183.c since
revision 265914
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: matmal01 at gcc dot gnu.org
  Target Milestone: ---

Since revision 265914, the testcase pr60183.c has been FAILing on
aarch64-none-linux-gnu regression tests with a timeout.

Some initial debugging has shown this is a busy hang in
lambda_matrix_right_hermite.

The inner "while (S[i][j] != 0)" loop never gets out of i == 1, j == 0.




(v) hw-a20-6:~ [11:10:29] % gcc-install/bin/native-gcc
/home/matmal01/gcc-source/gcc/testsuite/gcc.dg/torture/pr60183.c   
-fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
-fdiagnostics-color=never-O3 -fomit-frame-pointer -funroll-loops
-fpeel-loops -ftracer -finline-functions-lm  -o ./pr60183.exe -wrapper
gdb,-q,--args
Reading symbols from
/home/matmal01/gcc-install/libexec/gcc/aarch64-unknown-linux-gnu/9.0.0/cc1...done.
(gdb) run
Starting program:
/home/matmal01/gcc-install/libexec/gcc/aarch64-unknown-linux-gnu/9.0.0/cc1
-quiet -imultiarch aarch64-linux-gnu
/home/matmal01/gcc-source/gcc/testsuite/gcc.dg/torture/pr60183.c -quiet
-dumpbase pr60183.c -mlittle-endian -mabi=lp64 -auxbase pr60183 -O3
-fdiagnostics-color=never -fno-diagnostics-show-caret
-fno-diagnostics-show-line-numbers -fomit-frame-pointer -funroll-loops
-fpeel-loops -ftracer -finline-functions -o /tmp/ccy2hzG0.s
^C
Program received signal SIGINT, Interrupt.
0x013e54f8 in lambda_matrix_right_hermite (n=,
U=, S=, m=, A=) at
../../gcc-source/gcc/tree-data-ref.c:3500
3500  a = S[i-1][j];
(gdb) print i
$1 = 1
(gdb) print j
$2 = 0
(gdb) cont
Continuing.
  Wait for a while^M^C
Program received signal SIGINT, Interrupt.
0x013e54f8 in lambda_matrix_right_hermite (n=,
U=, S=, m=, A=) at
../../gcc-source/gcc/tree-data-ref.c:3500
3500  a = S[i-1][j];
(gdb) print i
$3 = 1
(gdb) print j
$4 = 0
(gdb) next
3502  sigma = (a * b < 0) ? -1: 1;

[Bug testsuite/88021] aarch64 Busy hang running testcase pr60183.c since revision 265914

2018-11-14 Thread matmal01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88021

--- Comment #2 from Matthew Malcomson  ---
Hi Richard,

Applying that on top of r265914 does fix the problem.

Thanks for the quick reply!

[Bug sanitizer/97696] New: ICE since ASAN_MARK does not handle poly_int sized varibales

2020-11-03 Thread matmal01 at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97696

Bug ID: 97696
   Summary: ICE since ASAN_MARK does not handle poly_int sized
varibales
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Keywords: ice-checking
  Severity: normal
  Priority: P3
 Component: sanitizer
  Assignee: unassigned at gcc dot gnu.org
  Reporter: matmal01 at gcc dot gnu.org
CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org,
jakub at gcc dot gnu.org, kcc at gcc dot gnu.org, marxin at 
gcc dot gnu.org
  Target Milestone: ---
Target: aarch64

asan_expand_mark_ifn asserts that the length to check is a SHWI.
(i.e. it uses `gcc_assert (tree_fits_shwi_p (len))` ).

It attempts to ensure this by avoiding VLA's in `gimplify_decl_expr`.
poly_int sized decls were added, and they were not treated as VLA's since
commit 22b62991 (SVN r275870).

Since then, poly_int sized variables can have ASAN_MARK called on them, which
means the `len` parameter of ASAN_MARK can be a poly_int causing an ICE in
asan_expand_mark_ifn  (n.b. in order to emit an ASAN_CHECK on a poly_int sized
variable so that the ASAN_MARK is not removed in the sanopt pass we need to
pass the poly_int sized variable to a builtin memory function).


An example  (modified from gcc/testsuite/c-c++-common/asan/pr80308.c):



(v3) work-lin:gcc [Tue 12:25:10] % cat ~/asan-ice.c
#include 

__attribute__((noinline, noclone)) int
foo (char *a)
{
  int i, j = 0;
  asm volatile ("" : "+r" (a) : : "memory");
  for (i = 0; i < 12; i++)
j += a[i];
  return j;
}

int
main ()
{
  int i, j = 0;
  for (i = 0; i < 4; i++)
{
  char a[12];
  __SVInt8_t freq;
  __builtin_bcmp (&freq, a, 10);
  __builtin_memset (a, 0, sizeof (a));
  j += foo (a);
}
  return j;
}


(v3) work-lin:gcc [Tue 12:31:53] %
/installdir/aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-gcc
-march=armv8.6-a+sve -fsanitize=address -fsanitize-address-use-after-scope
~/asan-ice.c -S  -o /dev/null
during GIMPLE pass: sanopt
/home/matmal01/asan-ice.c: In function ‘main’:
/home/matmal01/asan-ice.c:14:1: internal compiler error: in
asan_expand_mark_ifn, at asan.c:3235
   14 | main ()
  | ^~~~
0xdde454 asan_expand_mark_ifn(gimple_stmt_iterator*)
/builddir/src/gcc/gcc/asan.c:3235
0xdf6b7a execute
/builddir/src/gcc/gcc/sanopt.c:1341
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

[Bug sanitizer/97696] ICE since ASAN_MARK does not handle poly_int sized varibales

2020-11-03 Thread matmal01 at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97696

--- Comment #1 from Matthew Malcomson  ---
I guess this may also happen for the emission of ASAN_MARK in
`gimple_target_expr`, but haven't yet been able to trigger that.

[Bug sanitizer/97941] [HWASAN] use After free not working as per expectation

2020-11-23 Thread matmal01 at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97941

--- Comment #1 from Matthew Malcomson  ---
Hi Akhilesh,

No that's certainly not a known issue -- thanks for reporting it!

I'm having trouble reproducing your issue, do you mind giving a little more
information on your command line and the machine you're running on etc?

One point that seems worth looking into is that the line numbers on your
backtrace don't seem to match up with the line numbers in my source tree.
(e.g. GetAccessInfo is given line number 383 of hwasan_linux.cpp, while in my
source tree that function spans lines 328-376).  Have you made any
modifications to the source?  Or maybe you're running a different libsanitizer
version?
For reference I'm using libsanitizer from LLVM hash
6e7dd1e3e1170080b76b5dcc5716bdd974343233, and the sha256sum of hwasan_linux.cpp
in my source tree is
3986e9f4e519409e7c73a7b97722125300afc4dc1f44a3f966fedf679329fd0a.

Based on what line number `HwasanOnSIGTRAP` calls `GetAccessInfo` in my source
tree, and assuming the offset between our line numbers are the same for the
GetAccessInfo line in your stack trace, it seems that the SEGV happens when
dereferencing the address that caused the signal.

That value should be the address of the `brk` instruction in __hwasan_load1
(having been inlined from `SigTrap` in hwasan_checks.h) which caught the bad
access, but the value of 0x30 which caused this SEGV is clearly not that value.

If the offset between our line numbers is a bit different, then getting that
address might make a bit more sense.  There are various struct member accesses
via pointers that `GetAccessInfo` recieves.
However, those arguments are just taken from the siginfo_t and ucontext_t
pointers that the kernel provides on receipt of a deadly signal.
I haven't found any access in that function which look like they would have an
offset of 0x30 from a NULL pointer, although I guess different kernel versions
would have different offsets.

What kernel are you running on?  Is there any chance the signal handler
HwasanOnDeadlySignal is getting a NULL pointer as one of its arguments?
For reference I happen to be running on a linux kernel based off of commit
585e5b17b9 (but with some modifications that should not affect anything -- just
config changes so I can build the kernel itself with -fsanitize=hwaddress).


Just for reference -- what I see when compiling your testcase:


ubuntu@ubuntu:~/working-directory/temp/pr97941$
../../gcc-hwasan-install/bin/gcc -fsanitize=hwaddress ./test.c -o test
./test.c: In function ‘main’:
./test.c:2:20: warning: implicit declaration of function ‘malloc’
[-Wimplicit-function-declaration]
2 |   char *x = (char*)malloc(10 * sizeof(char*));
  |^~
./test.c:1:1: note: include ‘’ or provide a declaration of ‘malloc’
  +++ |+#include 
1 | int main() {
./test.c:2:20: warning: incompatible implicit declaration of built-in function
‘malloc’ [-Wbuiltin-declaration-mismatch]
2 |   char *x = (char*)malloc(10 * sizeof(char*));
  |^~
./test.c:2:20: note: include ‘’ or provide a declaration of ‘malloc’
./test.c:3:3: warning: implicit declaration of function ‘free’
[-Wimplicit-function-declaration]
3 |   free(x);
  |   ^~~~
./test.c:3:3: note: include ‘’ or provide a declaration of ‘free’
./test.c:3:3: warning: incompatible implicit declaration of built-in function
‘free’ [-Wbuiltin-declaration-mismatch]
./test.c:3:3: note: include ‘’ or provide a declaration of ‘free’
ubuntu@ubuntu:~/working-directory/temp/pr97941$
LD_LIBRARY_PATH=~/working-directory/gcc-hwasan-install/lib64 ./test
==8600==ERROR: HWAddressSanitizer: tag-mismatch on address 0xefe00005 at pc
0xa828be70
READ of size 1 at 0xefe00005 tags: e2/d5 (ptr/mem) in thread T0
#0 0xa828be6c in SigTrap<0>
../../../../gcc-source/libsanitizer/hwasan/hwasan_checks.h:27
#1 0xa828be6c in CheckAddress<(__hwasan::ErrorAction)0,
(__hwasan::AccessType)0, 0>
../../../../gcc-source/libsanitizer/hwasan/hwasan_checks.h:88
#2 0xa828be6c in __hwasan_load1
../../../../gcc-source/libsanitizer/hwasan/hwasan.cpp:375
#3 0x400944 in main
(/home/ubuntu/working-directory/temp/pr97941/test+0x400944)
#4 0xa81598dc in __libc_start_main
(/lib/aarch64-linux-gnu/libc.so.6+0x1f8dc)

[0xefe0,0xefe00060) is a small unallocated heap chunk; size: 96
offset: 5
0xefe00005 is located 5 bytes inside of 80-byte region
[0xefe0,0xefe00050)
freed by thread T0 here:
#0 0xa828d64c in __sanitizer_free
../../../../gcc-source/libsanitizer/hwasan/hwasan_interceptors.cpp:108
#1 0x400934 in main
(/home/ubuntu/working-directory/temp/pr97941/test+0x400934)
#2 0xa81598dc in __libc_start_main
(/lib/aarch64-linux-gnu/libc.so.6+0x1f8dc)
#3 0x400814  (/home/ubuntu/working-directory/temp/pr97941/test+0x400814)

previously allocated here:
#0 0xa828db30 in __sanitizer_malloc
../../../../gcc-source/libsanitizer/hwasan/hwasan

[Bug sanitizer/97941] [HWASAN] use After free not working as per expectation

2020-12-11 Thread matmal01 at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97941

Matthew Malcomson  changed:

   What|Removed |Added

 Resolution|--- |WORKSFORME
 Status|NEW |RESOLVED

--- Comment #2 from Matthew Malcomson  ---
Resolving since this works for me and haven't any extra information to believe
that's a coincidence.

[Bug sanitizer/100665] [hwsanitizer] nested funtion pointer is tagged but never checked.

2021-05-27 Thread matmal01 at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100665

--- Comment #1 from Matthew Malcomson  ---
Hi there.
I believe this is how it should work (if I'm understanding & remembering
correctly).

When creating a nested function, we make a single object on the stack that
includes all variables used in the nested function plus a trampoline.
This is called the "nonlocal frame struct" as described in gcc/tree-nested.c.

That single object gets a single tag like all other objects in tagged memory
(trying to separate the closed-over objects from the trampoline and argument
pointers would be pretty awkward when the object is just one struct as far as
the expand code is concerned).

That tag is checked when accessing the closed over variables (i.e. big_array in
the example), so we definitely want to tag the object.

Given that, the question of whether the function pointer (i.e. the pointer to
the trampoline inside that object) should be tagged when passed elsewhere then
has a few benefits:
1) In this case there is no check performed, but there may be checks performed
   if e.g. this function pointer gets cast to an integer pointer and some code
   elsewhere attempts to read that integer.
2) This is just more self-consistent.  Every pointer to a tagged object is
   tagged with the same value.
3) There are hardware extensions to automatically check memory accesses.  If
the
   function pointer is not tagged in this case then (at least for AArch64) the
   PC-relative ldr's in the trampoline stored in that structure will end up
   without a tag and I believe that would trigger a fault.

Point (1) is the main one.  In general when passing a pointer into another
function we don't know if it's going to be accessed or not, so we always need
to
pass tagged pointers.

[Bug sanitizer/100665] [hwsanitizer] nested funtion pointer is tagged but never checked.

2021-06-01 Thread matmal01 at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100665

Matthew Malcomson  changed:

   What|Removed |Added

 Resolution|--- |INVALID
 Status|UNCONFIRMED |RESOLVED

--- Comment #3 from Matthew Malcomson  ---
(In reply to Hongtao.liu from comment #2)
> (In reply to Matthew Malcomson from comment #1)
> > Given that, the question of whether the function pointer (i.e. the pointer 
> > to
> > the trampoline inside that object) should be tagged when passed elsewhere
> > then
> > has a few benefits:
> > 1) In this case there is no check performed, but there may be checks
> > performed
> >if e.g. this function pointer gets cast to an integer pointer and some
> > code
> >elsewhere attempts to read that integer.
> I'm not sure there're cases where code pointers are casted to integer
> pointers. But consider the above comment, I agree that tag is needed for the
> object.

Fair ;-).
My reasoning was along the lines of "it's an escaped pointer, and I don't know
what other code will do with it" than actually expecting that to happen.

[Bug sanitizer/101744] [12 regression] hwasan new failures since r12-2424

2021-08-05 Thread matmal01 at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101744

--- Comment #7 from Matthew Malcomson  ---
Hi there,

I didn't check all the new tests that Christophe mentioned, but all those I
checked had `dg-require-effective-target hwaddress_exec` in them.

The test that determines that effective target should only pass with a modern
enough kernel (one that supports passing tagged pointers to its syscalls).
It is still failing on my native AArch64 machine.

For anyone that is seeing them -- what kernel version are you running?
If your kernel has not changed could you manually run the check and see if it
passes and why?

I've unfortunately lost my testing environment.  I'm working on getting it back
but will be a while before I can see if I can reproduce the failures on a
machine with the required kernel.

[Bug target/114905] New: aarch64 locally_streaming function ICE in dwarf2cfi due to mismatched CFA instructions in prologue/epilogue

2024-05-01 Thread matmal01 at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114905

Bug ID: 114905
   Summary: aarch64 locally_streaming function ICE in dwarf2cfi
due to mismatched CFA instructions in
prologue/epilogue
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: matmal01 at gcc dot gnu.org
  Target Milestone: ---

Bug observed (testcase + ICE) is below.  I believe this happens because we use
`aarch64_add_sp` to adjust the stack pointer when `maybe_ne (sve_callee_saves,
0)` in `aarch64_expand_epilogue`.  This marks the adjustment as adjusting the
CFA.  However in `aarch64_expand_prologue` we might have set the CFA to the
frame pointer (instead of the stack pointer) if `frame_pointer_needed &&
frame_size.is_constant()`.
Hence when both these conditions are held we have a CFA adjust note that
affects a different register to the current CFA register.



vshcmd: > cat streaming-prologues.c 
[[arm::locally_streaming,arm::streaming_compatible]] void   
no_gprs_saved_very_streaming (__SVBool_t x) 
{   
  asm (""); 
}   

gnu-work [13:47:36] $   
vshcmd: > ${install_dir}/aarch64-none-linux-gnu-gcc \   
vshcmd: > streaming-prologues.c \   
vshcmd: > -fdiagnostics-plain-output -O -fomit-frame-pointer
-fstack-clash-protection\   
vshcmd: > -march=armv9-a+sme -mtune=generic -moverride=tune=none \  
vshcmd: > -fdump-rtl-all-all \  
vshcmd: > -S -o locally_streaming_1_scp.s   
gnu-work [13:47:38] $ > > > > > during RTL pass: dwarf2 
dump file: locally_streaming_1_scp.c.356r.dwarf2
streaming-prologues.c: In function ‘no_gprs_saved_very_streaming’:  
streaming-prologues.c:5:1: internal compiler error: in
dwarf2out_frame_debug_adjust_cfa, at dwarf2cfi.cc:1339  
0xa540bd dwarf2out_frame_debug_adjust_cfa   
/workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/dwarf2cfi.cc:1339
0xa540bd dwarf2out_frame_debug  
/workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/dwarf2cfi.cc:2277
0xa540bd scan_insn_after
/workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/dwarf2cfi.cc:2726
0xa557e0 scan_trace 
/workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/dwarf2cfi.cc:2893
0xa562cf create_cfi_notes   
/workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/dwarf2cfi.cc:2938
0xa562cf execute_dwarf2_frame   
/workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/dwarf2cfi.cc:3309
0xa562cf execute
/workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/dwarf2cfi.cc:3797
Please submit a full bug report, with preprocessed source (by using
-freport-bug).  
Please include the complete backtrace with any bug report.  
See <https://gcc.gnu.org/bugs/> for instructions.   
gnu-work [13:47:39] $

[Bug target/114906] New: aarch64 locally_streaming ICE in aarch64_expand_prologue

2024-05-01 Thread matmal01 at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114906

Bug ID: 114906
   Summary: aarch64 locally_streaming ICE in
aarch64_expand_prologue
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: matmal01 at gcc dot gnu.org
  Target Milestone: ---

Bug (testcase + ICE) below.
I believe this is because:
1) We save `r20` below `VG_REGNUM` in `aarch64_layout_frame` (and above the
point that `bytes_below_hard_fp` describes).
2) Despite that save of `r20` causing us to also set
`frame.wb_push_candidate1`, because we have a poly-int sized frame (due to the
-O0 in this case, but I don't think has to be -O0) we still end up in the
"General case" in `aarch64_layout_frame`.
3) Hence we end up with `initial_adjust` non zero, `sve_callee_adjust`
non-zero, and the `VG_REGNUM` not pointing to the same place as
`bytes_below_hard_fp` because there is that r20 saved in between.

My initial guess would be that we should simply change the assertion that
failed to check that VG_REGNUM is *greater than or equal to* `bytes_below_sp`.
To be honest I'm not entirely sure what this assertion is there for so would
not like to actually make that suggestion.  The commit message of ad4df8cd080c
seems to say the assertion is there to ensure that the allocation of VG_REGNUM
is not folded into the initial_allocation, but I don't 100% follow what's going
on.


vshcmd: > cat ../streaming-prologues.c  
[[arm::locally_streaming]] void 
with_callee_saved_regs (__SVBool_t x)   
{   
  asm ("" : : : "r20"); 
}   
testing [14:47:20] $
vshcmd: > ${install_dir}/aarch64-none-linux-gnu-gcc \   
vshcmd: >   ../streaming-prologues.c \  
vshcmd: >   -fdiagnostics-plain-output -O0 -fstack-clash-protection \   
vshcmd: >   -march=armv9-a+sme -mtune=generic -moverride=tune=none \
vshcmd: >   -S -o prologues-with-streaming-1.s  
> > > > during RTL pass: late_pro_and_epilogue  
> > > > 
> > > > 
> > > > 
> > > >
../streaming-prologues.c: In function ‘with_callee_saved_regs’: 
../streaming-prologues.c:5:1: internal compiler error: in
aarch64_expand_prologue, at config/aarch64/aarch64.cc:9705  
0x142e3af aarch64_expand_prologue() 
   
/workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/config/aarch64/aarch64.cc:9701   
0x1a7eee7 gen_prologue()
   
/workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/config/aarch64/aarch64.md:1008   
0x140219f target_gen_prologue   
   
/workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/config/aarch64/aarch64.md:8121   
0xb8d242 make_prologue_seq  
/workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/function.cc:5818 
0xb8d3aa thread_prologue_and_epilogue_insns()   
/workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/function.cc:6053 
0xb8dc4e rest_of_handle_thread_prologue_and_epilogue
/workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/function.cc:6567 
0xb8dcbf execute
/workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/function.cc:6692 
Please submit a full bug report, with preprocessed source (by using
-freport-bug).  
Please include the complete backtrace with any bug report.  
See <https://gcc.gnu.org/bugs/> for instructions.   
testing [14:47:22] $

[Bug target/115043] New: aarch64 locally_streaming function appears to have CFA note on wrong instruction in prologue

2024-05-11 Thread matmal01 at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115043

Bug ID: 115043
   Summary: aarch64 locally_streaming function appears to have CFA
note on wrong instruction in prologue
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: matmal01 at gcc dot gnu.org
  Target Milestone: ---

Apologies if I'm misunderstanding something here -- but I noticed this RTL
sequence and I believe the `REG_CFA_DEF_CFA` note is on the wrong insn.

I have not observed wrong behaviour coming from this, but figured still worth a
bug report in case it is indeed wrong.

There seem to be a pair of instructions, one doing some special SME operation
and another storing the stack pointer into x11.  The instruction doing the
special SME thing has a note saying that it sets the CFA to x11.  I would have
expected the note to be on the insn after that records SP into x11.



vshcmd: > cat basic-streaming.c 
[[arm::locally_streaming]] void 
no_gprs_saved (__SVBool_t x)
{   
  asm (""); 
}   
gnu-work [13:19:27] $   
vshcmd: > ${install_dir}/aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-gcc
\   
vshcmd: >   basic-streaming.c \ 
vshcmd: >   -fdiagnostics-plain-output -march=armv8.2-a+sme+sve
-fno-stack-protector \  
vshcmd: >   -fdump-rtl-all-all \
vshcmd: >   -O -fshrink-wrap -fstack-clash-protection -g -S -o /dev/null
> > > > gnu-work [13:19:36] $   
> > > > 
> > > > 
> > > > 
> > > >
vshcmd: > # I'm surprised that the REG_CFA_DEF_CFA note is on the instruction   
vshcmd: > # just before we move the stack pointer into x11. 
vshcmd: > grep -C 4 REG_CFA_DEF_CFA.*x11
basic-streaming.c.*.late_pro_and_epilogue   
(insn/f 15 14 16 2 (set (reg:DI 13 x13) 
(const:DI (unspec:DI [  
(const_int 288 [0x120]) 
] UNSPEC_SME_VQ))) "basic-streaming.c":3:1 -1   
 (expr_list:REG_CFA_DEF_CFA (reg:DI 11 x11) 
(nil))) 
(insn 16 15 17 2 (set (reg:DI 11 x11)   
(reg/f:DI 31 sp)) "basic-streaming.c":3:1 -1
 (nil)) 
gnu-work [13:21:21] $

[Bug tree-optimization/116776] Complex if conditions not hoisted from loop

2024-09-19 Thread matmal01 at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116776

--- Comment #1 from Matthew Malcomson  ---
N.b. from experimentation it seems that gcc 11 didn't move any part of the
condition outside of the loop, and since gcc 12 part of the condition has been
moved outside the loop.

I don't think this hoisting has ever happened.

[Bug tree-optimization/116776] New: Complex if conditions not hoisted from loop

2024-09-19 Thread matmal01 at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116776

Bug ID: 116776
   Summary: Complex if conditions not hoisted from loop
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: matmal01 at gcc dot gnu.org
  Target Milestone: ---

The condition in the following loop does not get hoisted at `-O3` on GCC trunk.
Simplifying the condition (by either removing some of the `shouldthischange`
checks, or simplifying the `shouldthischange` function) allows hoisting.

N.b. Some of the condition gets hoisted, just not all.
N.b. having the condition inside the loop blocks vectorisation when compiled
with `-march=armv8.6-a+sve+sve2`.


```
struct teststruct {
unsigned long dims[2];
double *data;
bool ** allocated;
};

bool shouldthischange(struct teststruct *v, int b, int l) {
return 
// true ||
v->dims[1] > l
&& v->allocated[b][l]
;
}

void DoLoop(struct teststruct *x, struct teststruct *y, struct teststruct *z,
unsigned long len)
{
for (unsigned long i = 0; i < len; i++)
if (shouldthischange(x, 0, 0) && shouldthischange(y, 0, 0) &&
shouldthischange(z, 0, 0)) {
z->data[i] = x->data[i] + y->data[i];
}
}
```

[Bug target/117991] [15 regression] RISC-V: g++/template/builtin-speculation-overloads[14].C assertion error since addition in r15-6042-g9ed094a817e

2025-02-12 Thread matmal01 at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117991

Matthew Malcomson  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |matmal01 at gcc dot 
gnu.org

--- Comment #3 from Matthew Malcomson  ---
Created attachment 60477
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60477&action=edit
Proposed patch -- update testsuite

[Bug target/117991] [15 regression] RISC-V: g++/template/builtin-speculation-overloads[14].C assertion error since addition in r15-6042-g9ed094a817e

2025-02-08 Thread matmal01 at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117991

--- Comment #2 from Matthew Malcomson  ---
(In reply to Jeffrey A. Law from comment #1)
> Still occurring on the trunk.  In my case I saw them in a native build &
> test scenario.

Ah -- apologies I missed when this was raised -- will look into this next week.

[Bug middle-end/119108] New: [15 Regression] AArch64 Commit 'vect: Force alignment peeling ...' (https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=68326d5d1a593d) causes regression in Snappy workload for

2025-03-04 Thread matmal01 at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119108

Bug ID: 119108
   Summary: [15 Regression] AArch64 Commit 'vect: Force alignment
peeling ...'
(https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=68326d5
d1a593d) causes regression  in Snappy workload for
-mcpu=neoverse-v2.
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: matmal01 at gcc dot gnu.org
  Target Milestone: ---

Created attachment 60650
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60650&action=edit
Script to reproduce the observed slowdown.

Have observed a slowdown after the referenced commit.

Attaching script for reproduction.  Results when `master` is commit 78380fd7f
inlined below (numbers are percentage change in time from "TOT with problem
commit reverted" to "TOT" -- positive numbers demonstrating the peeling has
caused a slowdown).

I ran the script with:
```
vshcmd: > cd $HOME
vshcmd: > rm -rf $HOME/testing-reproduction-script
vshcmd: > newdir $HOME/testing-reproduction-script
vshcmd: > git clone $HOME/gcc-source gcc_src
vshcmd: > parentdir=$HOME/testing-reproduction-script $HOME/Snappy/reproduce.sh
```


```
BM_UFlat/3/1 5.3
BM_UFlat/3/2 7.14286
BM_UFlat/4/2 2.59319
BM_UFlat/5/1 2.86533
BM_UFlat/5/2 5.10708
BM_UValidate/3/2 2.08333
BM_UValidate/5/2 2.41758
BM_UIOVecSource/1/2 4.21903
BM_UIOVecSource/5/2 5.44218
BM_UIOVecSource/6/2 4.21348
BM_UIOVecSource/7/1 -3.6036
BM_UIOVecSource/7/2 6.84039
BM_UIOVecSource/8/2 3.86905
BM_UIOVecSource/9/2 2.90987
BM_UIOVecSource/11/2 5
BM_UIOVecSink/0 21.3523
BM_UFlatSink/3/1 9.58904
BM_UFlatSink/3/2 10.1449
BM_UFlatSink/5/1 3.17919
BM_ZFlat/1/2 4.54959
BM_ZFlat/5/1 2.73973
BM_ZFlat/5/2 6.31579
BM_ZFlat/6/1 -2.35294
BM_ZFlat/6/2 3.9548
BM_ZFlat/7/1 -3.15315
BM_ZFlat/7/2 5.51948
BM_ZFlat/8/2 3.99202
BM_ZFlat/9/2 3.25145
BM_ZFlat/11/2 5.83942
BM_ZFlatAll/2 3.57955
```

[Bug target/119108] [15 Regression] AArch64 Commit 'vect: Force alignment peeling ...' (r15-6807-g68326d5d1a593d) causes regression in Snappy workload for -mcpu=neoverse-v2.

2025-03-11 Thread matmal01 at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119108

--- Comment #9 from Matthew Malcomson  ---
(In reply to Tamar Christina from comment #8)
> Ok, so having looked at this I'm not sure the compiler is at fault here.
> 
> Similar to the SVN case the snappy code is misaligning the loads
> intentionally and  loading 64-bits at a time from the 8-bit pointer:

... 

> So I think this is a case where the compiler can't do anything. (I also
> think that the C code uses UB similar to SVN, they misalign the byte array
> to 4-bytes but load 8-bytes at a time. They get lucky that the vector code
> is never entered).

...

> 
> The could would be beneficial if they:
> 
> 1. added restrict to the functions, as eg in `FindMatchLengthPlain` values
> manually vectorized anyway so aliasing must not be a problem
> 2. they have a simple scalar loop variant that's left up to the vectorizer
> to vectorize.  This would actually give them faster code and allow e.g. SVE
> codegen.


Thanks for looking into it Tamar!

Few questions (some just because I want to make sure I understand -- some more
on topic ;-)

Just to understand:
- What SVN case are you referencing?
- How is this UB?  The UNALIGNED_LOAD64 seems to use `memcpy`, and they provide
a relevant limit on the reads of 8 bytes at a time.

More relevant to the issue:
- I tried by adding `__restrict__` to `s1` and `s2` in `FindMatchLengthPlain`
and replacing the function with a plain loop.  I saw a significant slowdown. 
Is  your point that this would allow the compiler to do something about the
code even though it may not be better right now?  Or did you mean inline the
loop or something.  (N.b. didn't double-check the codegen of that function --
just ran the benchmark naively again -- so if there was any obvious adjustment
in flags or the like I should make I didn't make it ;-)

[Bug target/119108] [15 Regression] AArch64 Commit 'vect: Force alignment peeling ...' (r15-6807-g68326d5d1a593d) causes regression in Snappy workload for -mcpu=neoverse-v2.

2025-03-05 Thread matmal01 at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119108

--- Comment #3 from Matthew Malcomson  ---
I only looked into VecSource/5/2, and unfortunately I looked into it on an
internal setup that compiles slightly differently.

In that slightly different compilation I noticed that `FindMatchLengthPlain`
was affected by the patch, and perf pointed to extra branch mispredictions on
the changed code.  This was particularly noticeable in that different
compilation since `FindMatchLengthPlain` was not inlined.

Am currently looking to reproduce that finding with upstream sources so it's
more useful than hearsay.

[Bug target/119108] [15 Regression] AArch64 Commit 'vect: Force alignment peeling ...' (r15-6807-g68326d5d1a593d) causes regression in Snappy workload for -mcpu=neoverse-v2.

2025-03-05 Thread matmal01 at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119108

--- Comment #7 from Matthew Malcomson  ---
FWIW I have managed to figure out what the difference between my internal build
and the upstream one was -- my reproduction script has the line
`-DCMAKE_BUILD_TYPE=Release` in it and the local build that I did some
performance analysis on does not.

>From looking at the build logs it seems the only real difference due to this
difference in flags is that `-DNDEBUG` is passed to the compiler.  So things
still got optimised -- though obviously this is not the best for a benchmark
run.


However it does seem somewhat useful that without the abvoe `cmake` argument I
can see the (now not inlined) `FindMatchLengthPlain` function change and start
to take up a much greater proportion in the perf statistics with 68326d5d.

[Bug libgomp/119588] New: Possible improvement in locking strategies for libgomp

2025-04-02 Thread matmal01 at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119588

Bug ID: 119588
   Summary: Possible improvement in locking strategies for libgomp
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libgomp
  Assignee: unassigned at gcc dot gnu.org
  Reporter: matmal01 at gcc dot gnu.org
CC: jakub at gcc dot gnu.org
  Target Milestone: ---

Created attachment 60960
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60960&action=edit
Demonstrating locking differences

Hello,

Summary is:  I'm proposing that we implement the "hypercube-embedded tree"
locking strategy that LLVM libomp uses by default in libgomp.
Would appreciate feedback on whether this would be welcome and/or feasible.

Below contains the observations I've made to come to that suggestion.

Apologies for taking my time between asking on IRC and raising the PR.

--
We've seen on some internal workloads (NVPL BLAS running GEMM routine on a
small
matrix) that the overhead of a `#pragma omp parallel` statement when running
with a high number of cores (72 or 144) is much higher with the libgomp
implementation than with LLVM's libomp.

In a program which has both some work that can be handled with high parallelism
(so OMP is running with many threads) and a large number of small pieces of
work
that need to be performed with low overhead, this has been seen to cause a
significant overhead when accumulated.

I'm attaching a benchmark for just the creation of a `#pragma omp parallel`
region (around an `asm` statement so the region doesn't get optimised away). 
We
can see that with many threads libgomp scales worse than llvm's libomp.

When compiled with the below:
#+begin_example
  vshcmd: > ${gcc_install_path}/bin/g++ -O3 -fopenmp OpenMP-reproducer.cpp -o
bench.gcc.x
  vshcmd: > ${clang_install_path}/bin/clang++ -O3 -fopenmp
OpenMP-reproducer.cpp -o bench.clang.x
  lego-c2-qs-56:openmp-parallel-gomp-slow [04:02:41] $
lego-c2-qs-56:openmp-parallel-gomp-slow [04:02:44] $ 
#+end_example


Numbers I've observed are such showing that at 144 threads the cost of just the
barrier is much higher with GNU than with LLVM (N.b. this is on an AArch64
machine
with 144 cores):

#+begin_example
  vshcmd: > bench_gcc () {
  vshcmd: > LD_LIBRARY_PATH=${gcc_install_path}/lib64 ./bench.gcc.x
  vshcmd: > }
  vshcmd: > bench_clang () {
  vshcmd: > LD_LIBRARY_PATH=${clang_install_path}/lib ./bench.clang.x
  vshcmd: > }
  vshcmd: > three_times () {
  vshcmd: > for i in 1 2 3; do
  vshcmd: > $1
  vshcmd: > done
  vshcmd: > }
  vshcmd: > high_thread_counts () {
  vshcmd: > for num_threads in 72 144; do
  vshcmd: > export OMP_NUM_THREADS=$num_threads
  vshcmd: > echo " NUM = $num_threads"
  vshcmd: > OMP_PROC_BIND=true OMP_WAIT_POLICY=active three_times $1
  vshcmd: > done
  vshcmd: > }
  > > lego-c2-qs-56:openmp-parallel-gomp-slow [04:37:02] $ > >
lego-c2-qs-56:openmp-parallel-gomp-slow [04:37:02] $ > > > >
lego-c2-qs-56:openmp-parallel-gomp-slow [04:37:02] $ > > > > > >
lego-c2-qs-56:openmp-parallel-gomp-slow [04:37:02] $ 
  vshcmd: > # Without any specification of locking mechanisms, clang approx
thrice performance of GCC.
  vshcmd: > high_thread_counts bench_gcc
   NUM = 72
  creation maxthr:72 nthr:72 min_time:10.694 us max_time:11.181 us
avg_time:10.839 us stddev:23.127 us
  creation maxthr:72 nthr:72 min_time:10.214 us max_time:10.567 us
avg_time:10.335 us stddev:11.986 us
  creation maxthr:72 nthr:72 min_time:10.147 us max_time:10.615 us
avg_time:10.357 us stddev:19.212 us
   NUM = 144
  creation maxthr:144 nthr:144 min_time:31.421 us max_time:32.003 us
avg_time:31.735 us stddev:31.332 us
  creation maxthr:144 nthr:144 min_time:30.592 us max_time:31.953 us
avg_time:31.352 us stddev:132.466 us
  creation maxthr:144 nthr:144 min_time:31.089 us max_time:31.953 us
avg_time:31.640 us stddev:60.002 us
  lego-c2-qs-56:openmp-parallel-gomp-slow [04:37:05] $ 
  vshcmd: > high_thread_counts bench_clang
   NUM = 72
  creation maxthr:72 nthr:72 min_time:8.574 us max_time:9.006 us avg_time:8.877
us stddev:17.170 us
  creation maxthr:72 nthr:72 min_time:8.601 us max_time:8.749 us avg_time:8.686
us stddev:3.635 us
  creation maxthr:72 nthr:72 min_time:8.206 us max_time:8.471 us avg_time:8.421
us stddev:6.070 us
   NUM = 144
  creation maxthr:144 nthr:144 min_time:9.958 us max_time:11.293 us
avg_time:10.388 us stddev:133.078 us
  creation maxthr:144 nthr:144 min_time:9.685 us max_time:10.618 us
avg_time:10.232 us stddev:83.710 us
  creation maxthr:144 nthr:144 min_time:9.132 us max_time:9.783 us
avg_time:9.434 us stddev:42.769 us
  lego-c2-qs-56:openmp-parall

67 matches

Mail list logo