[Bug c/99316] ICE: in final_scan_insn_1, at final.c:3073 (error: could not split insn)

2021-03-01 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99316

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||ktkachov at gcc dot gnu.org
 Resolution|--- |DUPLICATE

--- Comment #2 from ktkachov at gcc dot gnu.org ---
Dup. Fixed for GCC 10.3

*** This bug has been marked as a duplicate of bug 96357 ***

[Bug target/96357] [10/11 Regression] could not split insn UNSPEC_COND_FSUB with AArch64 SVE

2021-03-01 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96357

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||andrew.goodbody at linaro dot org

--- Comment #5 from ktkachov at gcc dot gnu.org ---
*** Bug 99316 has been marked as a duplicate of this bug. ***

[Bug middle-end/93235] [AArch64] ICE with __fp16 in a struct

2021-03-02 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93235

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org

--- Comment #6 from ktkachov at gcc dot gnu.org ---
Created attachment 50290
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50290&action=edit
reduced file from PR 99346

Attaching reduced reproducer from PR 99346 that ICEs with GCC 8, 9, 10, 11 at
-O3

[Bug target/99195] Optimise away vec_concat of 64-bit AdvancedSIMD operations with zeroes in aarch64

2021-03-04 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99195

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Using a define_subst like:
(define_subst "add_vec_concat_subst"
  [(set (match_operand:VDMOV 0 "" "")
        (match_operand:VDMOV 1 "" ""))]
  "!BYTES_BIG_ENDIAN"
  [(set (match_operand:<VDBL> 0 "register_operand" "=w")
        (vec_concat:<VDBL>
          (match_dup 1)
          (match_operand:VDMOV 2 "aarch64_simd_or_scalar_imm_zero")))]
)

(define_subst_attr "add_vec_concat" "add_vec_concat_subst" "" "_vec_concat")

and adding it to patterns in aarch64-simd.md through <add_vec_concat> seems to
work. It doesn't handle the big-endian case, but maybe we can handle that
separately (or with a second define_subst?)

Does this approach make sense?

[Bug target/99312] __ARM_ARCH is not implemented correctly when compiled with -march=armv8.1-a

2021-03-05 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99312

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org

--- Comment #2 from ktkachov at gcc dot gnu.org ---
We intend to deprecate that macro going forward, as it's not a useful way of
testing for architecture features on AArch64. It made sense in the pre-Armv7-A
days, but now the recommended way to test for features is to use the
__ARM_FEATURE_* macros.

The scheme is also not very well-suited for things like the recent AArch64
Armv8-R.

Is there a particular use case that you have in mind?
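As a minimal sketch of the recommended approach, the C code below tests individual features through ACLE __ARM_FEATURE_* macros instead of comparing __ARM_ARCH against a version number. __ARM_FEATURE_CRC32 and __ARM_FEATURE_ATOMICS (the Armv8.1-A LSE atomics) are ACLE macros; the helper names are hypothetical. Which macros are defined depends on the -march/-mcpu options in effect, and on a non-Arm host neither is defined, so both fallbacks are taken.

```c
/* Hypothetical helpers: each reports a feature based only on the ACLE
   feature macros the compiler predefines for the selected -march/-mcpu.  */
static const char *crc32_support (void)
{
#if defined (__ARM_FEATURE_CRC32)
  return "crc32: yes";
#else
  return "crc32: no";
#endif
}

static const char *lse_atomics_support (void)
{
#if defined (__ARM_FEATURE_ATOMICS)
  return "lse: yes";
#else
  return "lse: no";
#endif
}
```

Testing the feature itself keeps the check correct for any architecture that provides it, whatever its __ARM_ARCH value.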

[Bug target/99437] [11 Regression] Error: immediate value out of range 1 to 8 at operand 3 -- `shrn v1.8b,v1.8h,15'

2021-03-07 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99437

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

   Priority|P3  |P1
   Assignee|unassigned at gcc dot gnu.org  |ktkachov at gcc dot gnu.org
 CC||ktkachov at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #4 from ktkachov at gcc dot gnu.org ---
Confirmed:
#include <arm_neon.h>

uint8x16_t
foo (uint16x8_t a, uint8x8_t b)
{
  return vcombine_u8 (vmovn_u16 (vshrq_n_u16 (a, 9)), b);
}

Testing a patch.
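A portable per-lane model of the testcase may help when reading the assembler error. vshrq_n_u16 accepts shifts from 1 to 16, but SHRN narrowing .8h lanes to .8b lanes only encodes immediates 1 to 8 (the range quoted in the error), so fusing the shift and the narrow into one SHRN is only valid inside that range. The helper names and the "ushr + xtn" fallback string below are illustrative assumptions, not the actual aarch64-simd.md patterns.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-lane model of vmovn_u16 (vshrq_n_u16 (a, shift)): shift the 16-bit
   element right, then keep only its low 8 bits.  */
static uint8_t shift_then_narrow (uint16_t a, int shift)
{
  return (uint8_t) (a >> shift);
}

/* SHRN from .8h to .8b only encodes shift immediates 1..8.  */
static bool shrn_8h_to_8b_encodable (int shift)
{
  return shift >= 1 && shift <= 8;
}

/* Hypothetical choice a combining pattern has to make: one SHRN when the
   immediate fits, otherwise a separate shift followed by a narrow.  */
static const char *lower_shift_narrow (int shift)
{
  return shrn_8h_to_8b_encodable (shift) ? "shrn" : "ushr + xtn";
}
```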

[Bug middle-end/99520] New: Failure to detect bswap pattern

2021-03-10 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99520

Bug ID: 99520
   Summary: Failure to detect bswap pattern
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---

#include <stdint.h>

uint32_t endian_fix32( uint32_t x ){
    return (x<<24) + ((x<<8)&0xff0000) + ((x>>8)&0xff00) + (x>>24);
}

For aarch64 clang manages to optimise it to:
endian_fix32(unsigned int):  // @endian_fix32(unsigned int)
rev w0, w0
ret

GCC doesn't:
endian_fix32(unsigned int):
lsl w1, w0, 8
lsr w2, w0, 8
and w2, w2, 65280
lsr w3, w0, 24
and w1, w1, 16711680
add w0, w3, w0, lsl 24
orr w1, w1, w2
add w0, w1, w0
ret

Is there something missing in the bswap pass?
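A small C check that the expression really is a 32-bit byte swap (what a single rev computes), assuming the middle mask is 0xff0000 as GCC's own "and w1, w1, 16711680" implies. __builtin_bswap32 is the GCC/Clang builtin; the endian_fix32_or variant is added only for comparison.

```c
#include <stdint.h>

/* The function from the report.  */
static uint32_t endian_fix32 (uint32_t x)
{
  return (x << 24) + ((x << 8) & 0xff0000) + ((x >> 8) & 0xff00) + (x >> 24);
}

/* Same value written with |: each term occupies a different byte of the
   result, so the additions above can never carry and + behaves like |.  */
static uint32_t endian_fix32_or (uint32_t x)
{
  return (x << 24) | ((x << 8) & 0xff0000) | ((x >> 8) & 0xff00) | (x >> 24);
}
```

Because the terms are disjoint, recognizing the + form as a bswap is as valid as recognizing the | form.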

[Bug middle-end/99520] Failure to detect bswap pattern

2021-03-10 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99520

--- Comment #4 from ktkachov at gcc dot gnu.org ---
(In reply to Jakub Jelinek from comment #3)
> Consider e.g.
> unsigned foo (unsigned x)
> {
>   return (x<<24) + ((x<<8)&0xff0000) + ((x>>8)&0xff00) + (x>>24) +
> (((x&0xff00)<<16)>>8);
> }
> as example that should not be optimized into __builtin_bswap32 (but should
> be with | instead of +).

Interesting. That said, clang does a better job than GCC on that too:
foo(unsigned int):// @foo(unsigned int)
lsl w9, w0, #8
rev w8, w0
and w9, w9, #0xff0000
add w0, w9, w8
ret
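A C sketch of what clang's sequence computes, assuming the (x<<8) mask in Jakub's example is 0xff0000. The extra term (((x&0xff00)<<16)>>8) moves byte 1 of x into byte 2, exactly like (x<<8)&0xff0000, so it overlaps a byte already produced by the swap: the result is a byte swap plus that term, not a plain bswap. foo_as_clang is a hypothetical name for the rev/lsl/and/add form.

```c
#include <stdint.h>

/* Jakub's example: the last term overlaps byte 2 of the byte swap.  */
static uint32_t foo (uint32_t x)
{
  return (x << 24) + ((x << 8) & 0xff0000) + ((x >> 8) & 0xff00) + (x >> 24)
         + (((x & 0xff00) << 16) >> 8);
}

/* The value of clang's rev + (lsl #8, and #0xff0000) + add sequence.  */
static uint32_t foo_as_clang (uint32_t x)
{
  return __builtin_bswap32 (x) + ((x << 8) & 0xff0000);
}
```

For x = 0xff00 the overlapping term carries into the next byte (0x00ff0000 + 0x00ff0000), which is why this + form must not be turned into __builtin_bswap32.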

[Bug middle-end/99520] Failure to detect bswap pattern

2021-03-10 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99520

--- Comment #6 from ktkachov at gcc dot gnu.org ---
FWIW, the version in the initial comment is what appears in 525.x264_r in
SPEC CPU2017.

[Bug target/99540] [10/11 Regression] ICE: Segmentation fault in aarch64_add_offset

2021-03-11 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99540

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

   Target Milestone|--- |10.3
  Known to work||9.3.1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2021-03-11
 Ever confirmed|0   |1
 CC||ktkachov at gcc dot gnu.org
  Known to fail||10.2.1, 11.0
Summary|ICE: Segmentation fault in  |[10/11 Regression] ICE:
   |aarch64_add_offset  |Segmentation fault in
   ||aarch64_add_offset

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Confirmed. ICEs with -march=armv8.2-a+sve -O3 -ffloat-store -ftrapv

[Bug rtl-optimization/99560] aarch64: ICE (segfault) in LRA with SVE intrinsics

2021-03-12 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99560

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

  Component|target  |rtl-optimization
   Last reconfirmed||2021-03-12
 CC||ktkachov at gcc dot gnu.org
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #2 from ktkachov at gcc dot gnu.org ---
Confirmed. Not sure if it's a target bug. Looking in gdb, the recog failure
is on this insn:
(insn 207 206 208 4 (set (reg:DI 264)
(nil)) "ice.c":5:17 -1
 (expr_list:REG_EQUAL (const_poly_int:DI [76, 76])
(nil)))

That (nil) looks very bogus
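For reading the REG_EQUAL note, a minimal sketch of how a two-coefficient poly_int such as [76, 76] maps to a concrete value. It assumes GCC's AArch64 convention that the single runtime indeterminate counts the 128-bit blocks beyond the first (VL / 128 - 1); poly_int_value is a hypothetical helper, not a GCC function.

```c
#include <stdint.h>

/* Value of poly_int [c0, c1] for a concrete SVE vector length in bits,
   assuming the indeterminate is VL / 128 - 1.  */
static int64_t poly_int_value (int64_t c0, int64_t c1, unsigned vl_bits)
{
  int64_t extra_blocks = (int64_t) (vl_bits / 128) - 1;
  return c0 + c1 * extra_blocks;
}
```

Under that assumption [76, 76] is 76 * (VL / 128): 76 on a 128-bit implementation and 152 on a 256-bit one, so the note itself is plausible and the problem lies in the (nil) source of the set.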

[Bug target/99593] [11 Regression] arm Neon ICE when compiling firefox (skia) since r11-6708

2021-03-15 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99593

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

Summary|[11 Regression] arm MVE ICE |[11 Regression] arm Neon
   |when compiling firefox  |ICE when compiling firefox
   |(skia) since r11-6708   |(skia) since r11-6708
   Keywords||ice-on-valid-code

--- Comment #2 from ktkachov at gcc dot gnu.org ---
(Note it's a Neon ICE, not MVE.) Yeah, that fix looks OK.
Christophe, could you help here by writing a testcase that uses arm_neon.h
intrinsics (rather than the builtins they decompose to)?

[Bug ipa/96825] [11 Regression] Commit r11-2645 degrades CPU2017 548.exchange2_r by 35%

2021-03-17 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96825

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org

--- Comment #2 from ktkachov at gcc dot gnu.org ---
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782 for more analysis

[Bug target/99593] [11 Regression] arm Neon ICE when compiling firefox (skia) since r11-6708

2021-03-17 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99593

--- Comment #8 from ktkachov at gcc dot gnu.org ---
(In reply to Christophe Lyon from comment #7)
> Created attachment 50412 [details]
> proposed testcase
> 
> Here is a proposal for a testcase derived from the initial description:
> - added relevant dg-* directives
> - replaced builtin calls with intrinsics
> 
> Jakub, Kyrill, is that OK with you?

Thanks, that looks ok except:
+typedef __simd128_int32_t g;
+typedef __simd128_float32_t h;
+typedef __simd128_uint32_t i;

Can we replace them with the right ACLE types as well?

[Bug target/99593] [11 Regression] arm Neon ICE when compiling firefox (skia) since r11-6708

2021-03-17 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99593

--- Comment #10 from ktkachov at gcc dot gnu.org ---
(In reply to Jakub Jelinek from comment #9)
> Comment on attachment 50412 [details]
> proposed testcase
> 
> Any reason not to replace
> __simd128_int32_t with int32x4_t ,
> __simd128_float32_t with float32x4_t and
> __simd128_uint32_t with uint32x4_t ?
> Drop the commented out __builtin_* names etc.?  Drop the (__builtin_neon_hi
> *) cast?
> Otherwise LGTM if it still FAILs without the above patch and PASSes with it,
> but the final call is Kyrill's (or other ARM maintainers').

Indeed. Let's have a consolidated patch on gcc-patches for review.

[Bug target/96582] aarch64:ICE during GIMPLE pass: veclower

2021-03-25 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96582

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org

--- Comment #3 from ktkachov at gcc dot gnu.org ---
I don't see it ICEing with recent trunk anymore. Can this be closed?

[Bug target/99766] New: ICE: unable to generate reloads with SVE code

2021-03-25 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99766

Bug ID: 99766
   Summary: ICE: unable to generate reloads with SVE code
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64

#include 
#include 
typedef float b __attribute__((__mode__(HF)));
typedef struct {
  b c;
  b d;
} e;
int f;
e *g;
void l(int s) {
  std::vector m(s);
  for (; f;) {
    auto a = &g[0];
    for (int i;; ++i) {
      int n = i;
      auto o = &a[n];
      auto p = &m[i];
      float16_t q;
      for (int k; k < s;)
        for (int j; j < i; ++j) {
          auto r = o;
          for (k = 0; k < s; ++k)
            p[k].c = r[k].c * r[k].d;
        }
      for (int k; k < s; ++k) {
        p[k].c *= q;
        p[k].d *= q;
      }
    }
  }
}

With -std=c++17 -O3 -march=armv8.5-a+sve2 it ICEs with:
ice.c: In function 'void l(int)':
ice.c:31:1: error: unable to generate reloads for:
   31 | }
  | ^
(insn 312 308 326 39 (set (subreg:VNx4SI (reg:VNx8HF 105 [ _17 ]) 0)
(unspec:VNx4SI [
(subreg:VNx4BI (reg:VNx16BI 365) 0)
(vec_duplicate:VNx4SI (mem/c:SI (plus:DI (reg/f:DI 64 sfp)
(const_int 272 [0x110])) [12  S4 A128]))
(const_vector:VNx4SI [
(const_int 0 [0])
])
] UNSPEC_SEL)) 4825 {sve_ld1rvnx4si}
 (nil))
during RTL pass: reload

This seems to be very recent. A trunk build from 23 March doesn't ICE.

[Bug target/99766] [11 Regression] ICE: unable to generate reloads with SVE code

2021-03-25 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99766

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

   Target Milestone|--- |11.0
   Priority|P3  |P1
  Known to work||10.2.1
Summary|ICE: unable to generate |[11 Regression] ICE: unable
   |reloads with SVE code   |to generate reloads with
   ||SVE code
  Known to fail||11.0

[Bug target/99807] ICE in vect_slp_analyze_node_operations_1, at tree-vect-slp.c:3727

2021-03-29 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99807

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org,
   ||tnfchris at gcc dot gnu.org
   Priority|P3  |P1
  Known to fail||11.0
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Target Milestone|--- |11.0
   Last reconfirmed||2021-03-29
  Known to work||10.2.1

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Confirmed. Tamar, can you have a look please?

[Bug target/99813] [11 Regression] SVE: Invalid assembly at -O3 (multiplier out of range in incb instruction)

2021-03-29 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99813

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

  Known to work||10.2.1
Summary|SVE: Invalid assembly at|[11 Regression] SVE:
   |-O3 (multiplier out of  |Invalid assembly at -O3
   |range in incb instruction)  |(multiplier out of range in
   ||incb instruction)
   Target Milestone|--- |11.0
 CC||ktkachov at gcc dot gnu.org
   Priority|P3  |P1
   Last reconfirmed||2021-03-29
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Confirmed.

[Bug target/99766] [11 Regression] ICE: unable to generate reloads with SVE code since r11-7807-gbe70bb5e

2021-03-29 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99766

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #10 from ktkachov at gcc dot gnu.org ---
Fixed, thanks.

[Bug target/99808] [8/9/10/11 Regression] ICE in as_a, at machmode.h:365

2021-03-29 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99808

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 CC||ktkachov at gcc dot gnu.org
 Resolution|--- |DUPLICATE

--- Comment #6 from ktkachov at gcc dot gnu.org ---
This is a dup of PR 99037 then. I was planning on fixing it in GCC 12, but since
there is now an ICE reproducer I can push the fix now.

*** This bug has been marked as a duplicate of bug 99037 ***

[Bug target/99037] Invalid representation of vector zero in aarch64-simd.md

2021-03-29 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99037

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||asolokha at gmx dot com

--- Comment #2 from ktkachov at gcc dot gnu.org ---
*** Bug 99808 has been marked as a duplicate of this bug. ***

[Bug target/99822] [11 Regression] Assembler messages: Error: integer register expected in the extended/shifted operand register at operand 3 -- `adds x1,xzr,#2'

2021-03-30 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99822

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

  Known to work||10.2.1
  Known to fail||11.0
Summary|Assembler messages: Error:  |[11 Regression] Assembler
   |integer register expected   |messages: Error: integer
   |in the extended/shifted |register expected in the
   |operand register at operand |extended/shifted operand
   |3 -- `adds x1,xzr,#2'   |register at operand 3 --
   ||`adds x1,xzr,#2'
   Priority|P3  |P1
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
 CC||ktkachov at gcc dot gnu.org
   Last reconfirmed||2021-03-30
   Target Milestone|--- |11.0

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Confirmed.

[Bug target/99822] [11 Regression] Assembler messages: Error: integer register expected in the extended/shifted operand register at operand 3 -- `adds x1,xzr,#2'

2021-03-30 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99822

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |ktkachov at gcc dot gnu.org

--- Comment #2 from ktkachov at gcc dot gnu.org ---
I'll take it.

[Bug target/99822] [11 Regression] Assembler messages: Error: integer register expected in the extended/shifted operand register at operand 3 -- `adds x1,xzr,#2'

2021-03-30 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99822

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from ktkachov at gcc dot gnu.org ---
Fixed.

[Bug target/99820] aarch64: ICE (segfault) in aarch64_analyze_loop_vinfo with -moverride=tune=use_new_vector_costs

2021-03-30 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99820

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org
 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from ktkachov at gcc dot gnu.org ---
Fixed.

[Bug tree-optimization/96974] [10/11 Regression] ICE in vect_get_vector_types_for_stmt compiling for SVE

2021-03-31 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96974

--- Comment #13 from ktkachov at gcc dot gnu.org ---
Fixed now?

[Bug target/97349] Incorrect types for some AArch64 Neon vdupq_n_<...> intrinsics

2020-10-09 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97349

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED
   Assignee|unassigned at gcc dot gnu.org  |ktkachov at gcc dot gnu.org

--- Comment #2 from ktkachov at gcc dot gnu.org ---
Testing a patch.

[Bug target/97349] Incorrect types for some AArch64 Neon vdupq_n_<...> intrinsics

2020-10-10 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97349

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|RESOLVED|ASSIGNED
 Resolution|FIXED   |---

[Bug target/97349] Incorrect types for some AArch64 Neon vdupq_n_<...> intrinsics

2020-10-13 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97349

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from ktkachov at gcc dot gnu.org ---
Fixed on all active branches. Thanks for the report.

[Bug tree-optimization/97405] New: ICE in get_or_alloc_expr_for in code hoisting with SVE intrinsics

2020-10-13 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97405

Bug ID: 97405
   Summary: ICE in get_or_alloc_expr_for in code hoisting with SVE
intrinsics
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64

The following ICEs:
#include "arm_sve.h"

void
a (svuint8x3_t b, unsigned char *p, int c) {
  if (c)
svst1_u8(svptrue_pat_b8(SV_VL16), p, svget3_u8(b, 1));
  else
svst1_u8(svwhilelt_b8(6, 6), p, svget3_u8(b, 1));
}

with -O2 -march=armv8.2-a+sve on aarch64 with both GCC 10 and 11 branches

during GIMPLE pass: pre
sveice.c: In function 'a':
sveice.c:4:1: internal compiler error: in get_or_alloc_expr_for, at
tree-ssa-pre.c:1098
4 | a (svuint8x3_t b, unsigned char *p, int c) {
  | ^
0xf3a0e9 get_or_alloc_expr_for
$SRC/gcc/tree-ssa-pre.c:1098
0xf3a0e9 find_or_generate_expression
$SRC/gcc/tree-ssa-pre.c:2693
0xf3aadd create_component_ref_by_pieces_1
$SRC/gcc/tree-ssa-pre.c:2613
0xf393fc create_component_ref_by_pieces
$SRC/gcc/tree-ssa-pre.c:2681
0xf393fc create_expression_by_pieces
$SRC/gcc/tree-ssa-pre.c:2830
0xf3da24 do_hoist_insertion
$SRC/gcc/tree-ssa-pre.c:3598
0xf3da24 insert
$SRC/gcc/tree-ssa-pre.c:3685
0xf3da24 execute
$SRC/gcc/tree-ssa-pre.c:4235
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

[Bug tree-optimization/97405] ICE in get_or_alloc_expr_for in code hoisting with SVE intrinsics

2020-10-13 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97405

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

   Target Milestone|--- |10.2
  Known to fail||10.1.1, 11.0

[Bug target/97442] New: Wrong representation of AArch64 saba in RTL

2020-10-15 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97442

Bug ID: 97442
   Summary: Wrong representation of AArch64 saba in RTL
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Keywords: wrong-code
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64

This is similar to the issue fixed in
https://gcc.gnu.org/git/?p=gcc.git&a=commit;h=8544ed6eea68a80999504c8a4b21b77d29cd86e2

The representation of SABA (and UABA) in RTL is not correct, and the testcase
below aborts at -O3 -fwrapv:

#define N 16
signed char a[] = {-100, -100, -100, -100,-100, -100, -100, -100, -100, -100,
-100, -100, -100, -100, -100, -100 };
signed char b[] = { 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100 };

signed char out[N] = { 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100 };

__attribute__ ((noinline,noipa))
void
foo (void)
{
  for (int i = 0; i < N; i++)
    {
      signed char diff = b[i] - a[i];
      out[i] += diff > 0 ? diff : -diff;
    }
}

signed char out2[N] = { 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100 };

__attribute__ ((noinline,noipa))
void
foo_scalar (void)
{
  for (int i = 0; i < N; i++)
    {
      asm volatile ("");
      signed char diff = b[i] - a[i];
      out2[i] += diff > 0 ? diff : -diff;
    }
}

int
main (void)
{
  foo ();
  foo_scalar ();
  for (int i = 0; i < N; i++)
    if (out[i] != out2[i])
      __builtin_abort ();

  return 0;
}

due to function foo generating a SABA when it shouldn't.
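A per-element C model of why the results diverge. The scalar loop narrows b[i] - a[i] to signed char before taking its absolute value, while SABA takes the absolute difference of the full-precision signed values and only then accumulates modulo 2^8. The helper names are hypothetical, and the narrowing casts rely on GCC/Clang's documented modulo conversion to signed types.

```c
#include <stdint.h>

/* Scalar semantics of foo_scalar for one element.  */
static int8_t scalar_step (int8_t acc, int8_t a, int8_t b)
{
  int8_t diff = (int8_t) (b - a);          /* 200 wraps to -56 */
  int abs_diff = diff > 0 ? diff : -diff;  /* 56 */
  return (int8_t) (acc + abs_diff);
}

/* Model of SABA for one element: |a - b| without narrowing, then the
   accumulation wraps modulo 2^8.  */
static int8_t saba_step (int8_t acc, int8_t a, int8_t b)
{
  int abs_diff = a > b ? a - b : b - a;    /* 200 */
  return (int8_t) (uint8_t) (acc + abs_diff);
}
```

With the testcase's values (acc = 100, a = -100, b = 100) the scalar form yields -100 but the SABA model yields 44, so vectorizing foo into SABA changes the result and main aborts. When b - a fits in signed char, the two agree.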

[Bug target/97534] [10/11 Regression] ICE in decompose, at rtl.h:2280 (arm-linux-gnueabihf)

2020-10-23 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97534

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

   Last reconfirmed||2020-10-23
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
 CC||ktkachov at gcc dot gnu.org

--- Comment #2 from ktkachov at gcc dot gnu.org ---
Confirmed.

[Bug target/97528] [10/11 Regression] ICE in decompose_automod_address, at rtlanal.c:6298 (arm-linux-gnueabihf)

2020-10-23 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97528

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2020-10-23
 CC||ktkachov at gcc dot gnu.org
 Status|UNCONFIRMED |NEW

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Confirmed.
A better testcase, using arm_neon.h intrinsics is:
#include <arm_neon.h>

typedef __simd64_int16_t a;
typedef __simd64_uint16_t b;
unsigned short c;
int d;
b e;
void f() {
  unsigned short *dst = &c;
  int g = d, bw = 4;
  b dc = e;
  for (int h = 0; h < bw; h++) {
    unsigned short *i = dst;
    b j = dc;
    vst1_s16 ((int16_t *)i, (a) j);
    dst += g;
  }
}

I see this ICEing on 9.3.1 as well (GCC 8 branch is ok)

[Bug target/97546] [11 Regression][SVE] ICE with -fenable-tree-bswap

2020-10-23 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97546

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

   Priority|P3  |P1
 CC||ktkachov at gcc dot gnu.org
   Target Milestone|--- |11.0
 Status|UNCONFIRMED |NEW
 Target||aarch64
  Known to fail||11.0
 Ever confirmed|0   |1
   Last reconfirmed||2020-10-23
  Known to work||10.2.1
   Keywords||ice-on-valid-code
Summary|[SVE] ICE with  |[11 Regression][SVE] ICE
   |-fenable-tree-bswap |with -fenable-tree-bswap

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Confirmed. GCC 10 branch doesn't ICE.

[Bug target/97546] [11 Regression][SVE] ICE with -fenable-tree-bswap

2020-10-23 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97546

--- Comment #2 from ktkachov at gcc dot gnu.org ---
Likely find_bswap_or_nop just needs to bail out if !cst_and_fits_in_hwi
(TYPE_SIZE_UNIT (gimple_expr_type (stmt)))

[Bug tree-optimization/97546] [11 Regression][SVE] ICE with -fenable-tree-bswap

2020-10-23 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97546

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |ktkachov at gcc dot gnu.org

--- Comment #3 from ktkachov at gcc dot gnu.org ---
(In reply to ktkachov from comment #2)
> Likely find_bswap_or_nop just needs to bail out if !cst_and_fits_in_hwi
> (TYPE_SIZE_UNIT (gimple_expr_type (stmt)))

I'll test a patch to that effect.

[Bug tree-optimization/97546] [11 Regression][SVE] ICE with -fenable-tree-bswap

2020-10-26 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97546

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from ktkachov at gcc dot gnu.org ---
Fixed.

[Bug target/70210] -march=native and -mcpu=native do not detect ARM cortex-a53 in 32 bit mode on Linux

2020-11-03 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70210

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org

--- Comment #4 from ktkachov at gcc dot gnu.org ---
(In reply to Viktor Engelmann from comment #3)
> I see a (probably related) problem in gcc 9.3 on docker image alpine:latest
> on raspberry pi 3. There, Makefiles generated by autoconf pass -mcpu=armv7l.
> And you can't just force it to use a different cpu by passing --host= to
> ./configure in your Dockerfile, because that would break the Dockerfiles'
> cross-platform compatibility. This breaks all autoconf projects in docker on
> raspberry pi 3.
> 
> This shouldn't be that hard IMHO, because -mcpu=armv7l only needs to be
> reinterpreted as -mcpu=cortex-a53.

That's not correct usage though. -mcpu isn't documented to accept
architecture names (see
https://gcc.gnu.org/onlinedocs/gcc-9.3.0/gcc/ARM-Options.html#ARM-Options for
the list of arguments accepted by -mcpu/-mtune).

There is the -march option, which accepts architecture names, but armv7l isn't
a valid architecture either. I'm not familiar with what Docker does here, but
from the AArch32 GCC backend's perspective, the architecture for Cortex-A53 is
-march=armv8-a+crc+simd.

BTW, I believe the original bug in this report has been fixed for some time.
All currently supported GCC versions (GCC 8.4 onwards) have the right part
number for Cortex-A53.
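The following C sketch is a simplified, hypothetical model of what native CPU detection keys on: the "CPU part" field of /proc/cpuinfo (0xd03 for Cortex-A53, under implementer 0x41), not the armv7l machine string that lscpu reports as "Architecture:". The table and function names are illustrative and are not GCC's driver code.

```c
#include <stddef.h>
#include <stdio.h>

struct part_entry { unsigned part; const char *name; };

/* A few Arm Ltd (implementer 0x41) part numbers.  */
static const struct part_entry arm_parts[] = {
  { 0xd03, "cortex-a53" },
  { 0xd07, "cortex-a57" },
  { 0xd08, "cortex-a72" },
};

/* Returns the core name for a line such as "CPU part\t: 0xd03", or NULL
   if the line is not a "CPU part" line or the part is unknown.  */
static const char *core_from_cpuinfo_line (const char *line)
{
  unsigned part;
  /* Spaces in the format match any run of whitespace, including tabs;
     %x accepts an optional 0x prefix.  */
  if (sscanf (line, "CPU part : %x", &part) != 1)
    return NULL;
  for (size_t i = 0; i < sizeof arm_parts / sizeof arm_parts[0]; i++)
    if (arm_parts[i].part == part)
      return arm_parts[i].name;
  return NULL;
}
```

An "Architecture: armv7l" line carries no core identity at all, which is why passing it to -mcpu cannot work.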

[Bug target/70210] -march=native and -mcpu=native do not detect ARM cortex-a53 in 32 bit mode on Linux

2020-11-03 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70210

--- Comment #6 from ktkachov at gcc dot gnu.org ---
(In reply to Viktor Engelmann from comment #5)
> Hmmm the problem isn't related to docker - I get the same problem when I run
> gcc on the raspberry directly with -mcpu=armv7l as autoconf does in docker.
> 

yep, armv7l is not a valid argument to -mcpu by design.

> When I run gcc manually with -mcpu=cortex-a53, everything is fine.
> lscpu says
> 
> Architecture:armv7l
> Byte Order:  Little Endian
> CPU(s):  4
> On-line CPU(s) list: 0-3
> Thread(s) per core:  1
> Core(s) per socket:  4
> Socket(s):   1
> Vendor ID:   ARM
> Model:   4
> Model name:  Cortex-A53
> Stepping:r0p4
> CPU max MHz: 1200.
> CPU min MHz: 600.
> BogoMIPS:38.40
> Flags:   half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva
> idivt vfpd32 lpae evtstrm crc32
> 
> and /proc/cpuinfo is basically the same as the one Andrew Roberts had posted.
> Maybe it's a problem in autoconf for passing armv7l (the "Architecture:"
> line) to -mcpu.

[Bug tree-optimization/97904] New: ICE with AArch64 SVE intrinsics

2020-11-19 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97904

Bug ID: 97904
   Summary: ICE with AArch64 SVE intrinsics
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64

The testcase below ICEs with -march=armv8.2-a+sve.

At -O0 the ICE is:
ice.c:19:1: internal compiler error: in tree_to_uhwi, at tree.c:7377
   19 | }
  | ^
0x136e268 tree_to_uhwi(tree_node const*)
$SRC/gcc/tree.c:7377
0x13ed7b2 assemble_noswitch_variable
$SRC/gcc/varasm.c:2104
0x13ed7b2 assemble_variable(tree_node*, int, int, int)
$SRC/gcc/varasm.c:2321
0x13f0af1 varpool_node::assemble_decl()
$SRC/gcc/varpool.c:587
0xb0a787 cgraph_order_sort::process()
$SRC/gcc/cgraphunit.c:2548
0xb0b658 output_in_order
$SRC/gcc/cgraphunit.c:2613
0xb0b658 symbol_table::compile()
$SRC/gcc/cgraphunit.c:2831
0xb0de34 symbol_table::finalize_compilation_unit()
$SRC/gcc/cgraphunit.c:3014
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

whereas at -O3 it is:
during GIMPLE pass: ccp
ice.c: In function ‘int main()’:
ice.c:19:1: internal compiler error: in maybe_canonicalize_mem_ref_addr, at
gimple-fold.c:4906
   19 | }
  | ^
0xcafec9 maybe_canonicalize_mem_ref_addr
$SRC/gcc/gimple-fold.c:4906
0xcbaf32 fold_stmt_1
$SRC/gcc/gimple-fold.c:4988
0xcc0a1d fold_stmt(gimple_stmt_iterator*, tree_node* (*)(tree_node*))
$SRC/gcc/gimple-fold.c:5329
0x121a859 substitute_and_fold_dom_walker::before_dom_children(basic_block_def*)
$SRC/gcc/tree-ssa-propagate.c:1070
0x1a8d7d9 dom_walker::walk(basic_block_def*)
$SRC/gcc/domwalk.c:309
0x121b938 substitute_and_fold_engine::substitute_and_fold(basic_block_def*)
$SRC/gcc/tree-ssa-propagate.c:1199
0x1158927 ccp_finalize
$SRC/gcc/tree-ssa-ccp.c:1022
0x1158ec8 do_ssa_ccp
$SRC/gcc/tree-ssa-ccp.c:2586
0x1158ec8 execute
$SRC/gcc/tree-ssa-ccp.c:2629
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.


#include <arm_sve.h>
#include <array>

const std::array log_tab =
{
{
svdup_n_f32(-2.29561495781f),
svdup_n_f32(-2.47071170807f),
}
 };

int main(void)
{
svbool_t pg;
svfloat32_t x;
auto A = svmla_f32_z(pg, log_tab[0], log_tab[1], x);

return 0;
}

[Bug tree-optimization/97904] ICE with AArch64 SVE intrinsics

2020-11-19 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97904

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

  Known to fail||10.2.1, 11.0
   Target Milestone|--- |10.3

[Bug tree-optimization/97929] [11 Regression] ICE: in exact_div, at poly-int.h:2219 (vect_get_num_vectors)

2020-11-20 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97929

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
  Known to work||10.2.1
   Priority|P3  |P1
   Last reconfirmed||2020-11-20
 CC||ktkachov at gcc dot gnu.org

--- Comment #2 from ktkachov at gcc dot gnu.org ---
Confirmed.

[Bug c/97969] [9/10/11 Regression][ARM/Thumb] Certain combo of codegen options leads to compilation infinite loop with growing memory use

2020-11-24 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97969

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Target||arm
 Status|UNCONFIRMED |NEW
  Known to fail||10.2.1, 11.0, 9.3.1
 CC||ktkachov at gcc dot gnu.org
  Known to work||8.4.1
   Last reconfirmed||2020-11-24
Summary|[ARM/Thumb] Certain combo   |[9/10/11
   |of codegen options leads to |Regression][ARM/Thumb]
   |compilation infinite loop   |Certain combo of codegen
   |with growing memory use |options leads to
   ||compilation infinite loop
   ||with growing memory use
   Keywords||memory-hog
 Ever confirmed|0   |1

--- Comment #3 from ktkachov at gcc dot gnu.org ---
Confirmed on the 9, 10, 11 branches. On GCC 8 it completes successfully.
It doesn't reproduce on aarch64; it looks like it needs all of -mthumb
-fno-omit-frame-pointer -Os.

[Bug tree-optimization/97984] New: [10/11 Regression] Worse code for -O3 than -O2 on aarch64 vector multiply-add

2020-11-25 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97984

Bug ID: 97984
   Summary: [10/11 Regression] Worse code for -O3 than -O2 on
aarch64 vector multiply-add
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64

The code:
void x (long * __restrict a, long * __restrict b)
{
  a[0] *= b[0];
  a[1] *= b[1];
  a[0] += b[0];
  a[1] += b[1];
}

at -O2 generates:
x:
ldp x4, x3, [x0]
ldp x2, x1, [x1]
madd x2, x2, x4, x2
madd x1, x1, x3, x1
stp x2, x1, [x0]
ret

whereas at -O3 it does:
x:
ldp x2, x3, [x0]
ldr x4, [x1]
ldr q1, [x1]
mul x2, x2, x4
ldr x4, [x1, 8]
fmov d0, x2
ins v0.d[1], x3
mul x1, x3, x4
ins v0.d[1], x1
add v0.2d, v0.2d, v1.2d
str q0, [x0]
ret

which is clearly inferior.
GCC 9 used to generate the good code at both -O2 and -O3.

[Bug target/97969] [9/10/11 Regression][ARM/Thumb] Certain combo of codegen options leads to compilation infinite loop with growing memory use

2020-11-25 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97969

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||vmakarov at gcc dot gnu.org

--- Comment #4 from ktkachov at gcc dot gnu.org ---
This seems to go into a bad loop somewhere in LRA.

[Bug target/97969] [9/10/11 Regression][ARM/Thumb] Certain combo of codegen options leads to compilation infinite loop with growing memory use

2020-11-27 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97969

--- Comment #6 from ktkachov at gcc dot gnu.org ---
Bisection shows it started with g:8d2d39587d941a40f25ea0144cceb677df115040

[Bug target/97969] [9/10/11 Regression][ARM/Thumb] Certain combo of codegen options leads to compilation infinite loop with growing memory use

2020-12-07 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97969

--- Comment #8 from ktkachov at gcc dot gnu.org ---
(In reply to Haoxin Tu from comment #7)
> (In reply to Paul Sokolovsky from comment #0)
> 
> Hi, Paul. May I ask how to reduce compile-time-hog/memory-hog test cases
> using CReduce?
> I know CReduce can be easily used to reduce crash/wrong-code test cases, but
> I don't know how to minimize compile-time/memory ones. I will be very
> appreciated if you can give me any tips. Thanks.
> 
> 
> Best,
> Haoxin

When I had to do this in the past, I used the 'ulimit' command in Linux. It
lets you make a process fail once it exceeds a CPU time limit (ulimit -t) or a
memory limit (ulimit -v; ulimit -m, the resident-set limit, is not enforced by
modern Linux kernels). You can use it in the validation script to check for
the pathological behaviour.
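For reference, `ulimit -t` and `ulimit -v` set the POSIX `RLIMIT_CPU` and `RLIMIT_AS` resource limits, which can also be applied directly with `setrlimit`. Below is a minimal C sketch of a wrapper an interestingness script could call to run the compiler under both limits; it assumes a POSIX system, and `run_limited` is a hypothetical helper, not part of C-Reduce or GCC.

```c
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run ARGV[0] (looked up in PATH) with a CPU-time limit
   of CPU_SECONDS and an address-space limit of AS_BYTES, the same limits
   that `ulimit -t` and `ulimit -v` set.  Returns the child's raw wait
   status, or -1 if fork or waitpid fails.  */
static int
run_limited (char *const argv[], rlim_t cpu_seconds, rlim_t as_bytes)
{
  pid_t pid = fork ();
  if (pid < 0)
    return -1;

  if (pid == 0)
    {
      /* Child: apply both limits, then replace the process image.  */
      struct rlimit cpu = { .rlim_cur = cpu_seconds, .rlim_max = cpu_seconds };
      struct rlimit mem = { .rlim_cur = as_bytes, .rlim_max = as_bytes };
      if (setrlimit (RLIMIT_CPU, &cpu) != 0
          || setrlimit (RLIMIT_AS, &mem) != 0)
        _exit (127);
      execvp (argv[0], argv);
      _exit (127); /* Only reached if execvp failed.  */
    }

  /* Parent: wait for the child and hand back its status.  */
  int status;
  if (waitpid (pid, &status, 0) < 0)
    return -1;
  return status;
}
```

A validation script could, for example, treat `WIFSIGNALED (status)` (the CPU limit was hit, which ends the child with SIGXCPU or SIGKILL) as the pathological behaviour still being present. Hitting the memory limit instead usually shows up as the compiler failing an allocation and exiting with an error, so that case needs its own check of the exit status or stderr.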

[Bug target/98177] [11 Regression] SVE: ICE in expand_direct_optab_fn, at internal-fn.c:3368

2020-12-07 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98177

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

   Last reconfirmed||2020-12-07
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
 CC||ktkachov at gcc dot gnu.org

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Confirmed.

[Bug c/98199] [11 Regression] ICE: Aborted (stack smashing detected)

2020-12-08 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98199

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

   Last reconfirmed||2020-12-08
 Status|UNCONFIRMED |NEW
 CC||ktkachov at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Confirmed.

[Bug target/98259] [11 Regression] error: 'void verify_insn_chain()' causes a section type conflict with 'void init_rtl_bb_info(basic_block)'

2020-12-14 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98259

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org

--- Comment #2 from ktkachov at gcc dot gnu.org ---
May be a dup of PR98146?

[Bug tree-optimization/98279] ICE in apply_scale with --param=hot-bb-frequency-fraction >= 2^31

2020-12-14 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98279

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org

--- Comment #1 from ktkachov at gcc dot gnu.org ---
(In reply to Alex Coplan from comment #0)
> The following fails:
> 
> $ cat test.c
> int a() {}
> $ aarch64-elf-gcc test.c -c -O --param=hot-bb-frequency-fraction=2147483648
> during GIMPLE pass: cdce
> test.c: In function 'a':
> test.c:1:5: internal compiler error: in apply_scale, at profile-count.h:1082
> 1 | int a() {}
>   | ^
> 0xcf2c8d profile_count::apply_scale(long, long) const
> /home/alecop01/toolchain/src/gcc/gcc/profile-count.h:1082
> 0xce6ed7 maybe_hot_count_p(function*, profile_count)
> /home/alecop01/toolchain/src/gcc/gcc/predict.c:175
> 0xce7459 maybe_hot_bb_p(function*, basic_block_def const*)
> /home/alecop01/toolchain/src/gcc/gcc/predict.c:193
> 0xce76c1 optimize_bb_for_size_p(basic_block_def const*)
> /home/alecop01/toolchain/src/gcc/gcc/predict.c:301
> 0xe11269 execute
> /home/alecop01/toolchain/src/gcc/gcc/tree-call-cdce.c:1195
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See  for instructions.
> 
> Related to PR98271 and PR98276 (both ICEs that occur when a --param is >=
> 2^31). Perhaps a meta-bug would be useful here?
> 
> Equally, perhaps I'm misunderstanding the contract for --param values: are
> these meant to be user-facing or internal? I.e. is it expected that the
> compiler ICEs if the user provides an unreasonable value for the param?

My understanding is the compiler should never ICE, even for developer-oriented
params. The fact that this is an ICE due to an outrageous input to a param
would lower the priority of the bug, but it doesn't invalidate it.
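To illustrate why such values misbehave: 2147483648 is 2^31, one more than `INT_MAX` for a 32-bit `int`, so it cannot be held in an `int` param without overflowing, and a front end can reject it with an ordinary error instead. A minimal C sketch of that range check using `strtol`; `parse_int_param` is a hypothetical helper, not GCC's actual --param handling:

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal param value and accept it only if it
   lies in [MIN, MAX], so an out-of-range value becomes a diagnostic rather
   than an overflow later on.  Returns false on any parse or range error.  */
static bool
parse_int_param (const char *text, int min, int max, int *out)
{
  char *end;
  errno = 0;
  long v = strtol (text, &end, 10);

  /* Reject empty input, trailing garbage, and values outside long.  */
  if (end == text || *end != '\0' || errno == ERANGE)
    return false;

  /* Reject values that fit in long but not in the param's range.  */
  if (v < min || v > max)
    return false;

  *out = (int) v;
  return true;
}
```

For example, `parse_int_param (arg, 0, INT_MAX, &value)` rejects "2147483648" whether `long` is 32 or 64 bits wide, because either `strtol` reports `ERANGE` or the value fails the `max` check.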

[Bug c++/98322] optimizes to false instead true

2020-12-16 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98322

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Resolution|--- |INVALID
 CC||ktkachov at gcc dot gnu.org
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from ktkachov at gcc dot gnu.org ---
I don't think that's true.
Note that with -Wall you get the warning:
true.c: In function 'always_true':
true.c:3:29: warning: '~' on a boolean expression [-Wbool-operation]
3 | return (a == b) == (~a ^ b);
  | ^
true.c:3:29: note: did you mean to use logical not?
3 | return (a == b) == (~a ^ b);
  | ^
  |
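To spell out why folding to false is correct: `~` applies to the operand after integer promotion, so for a `bool` `a` the value of `~a` is -1 or -2, `~a ^ b` is then -1 or -2 as well, and it can never equal the 0 or 1 produced by `(a == b)`. The C check below assumes both parameters are `bool` (the full reported function is not shown here) and compares the reported expression with the logical-not form the note suggests; the function names are illustrative.

```c
#include <stdbool.h>

/* The reported expression.  a and b are promoted to int, so ~a is -1 or -2
   and (~a ^ b) is -1 or -2: it never equals the 0/1 result of ==.  */
static bool
with_bitwise_not (bool a, bool b)
{
  return (a == b) == (~a ^ b);
}

/* With logical not, (!a ^ b) is 1 exactly when a == b, so this is always
   true.  */
static bool
with_logical_not (bool a, bool b)
{
  return (a == b) == (!a ^ b);
}
```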

[Bug tree-optimization/98350] New: Reassociation breaks FMA chains

2020-12-17 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98350

Bug ID: 98350
   Summary: Reassociation breaks FMA chains
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---

Consider the testcase:

#define N 1024
double a[N];
double b[N];
double c[N];
double d[N];
double e[N];
double f[N];
double g[N];
double h[N];
double j[N];
double k[N];
double l[N];
double m[N];
double o[N];
double p[N];


void
foo (void)
{
  for (int i = 0; i < N; i++)
  {
a[i] += b[i]* c[i] + d[i] * e[i] + f[i] * g[i] + h[i] * j[i] + k[i] * l[i]
+ m[i]* o[i] + p[i];
  }
}

For -Ofast --param=tree-reassoc-width=1 GCC generates the loop:
.L2:
ldr q1, [x1, x0]
ldr q0, [x12, x0]
ldr q3, [x14, x0]
fadd v0.2d, v0.2d, v1.2d
ldr q1, [x13, x0]
ldr q2, [x11, x0]
fmla v0.2d, v3.2d, v1.2d
ldr q1, [x10, x0]
ldr q3, [x9, x0]
fmla v0.2d, v2.2d, v1.2d
ldr q1, [x8, x0]
ldr q2, [x7, x0]
fmla v0.2d, v3.2d, v1.2d
ldr q1, [x6, x0]
ldr q3, [x5, x0]
fmla v0.2d, v2.2d, v1.2d
ldr q1, [x4, x0]
ldr q2, [x3, x0]
fmla v0.2d, v3.2d, v1.2d
ldr q1, [x2, x0]
fmla v0.2d, v2.2d, v1.2d
str q0, [x1, x0]
add x0, x0, 16
cmp x0, 8192
bne .L2

with --param=tree-reassoc-width=4 it generates:
.L2:
ldr q5, [x11, x0]
ldr q4, [x7, x0]
ldr q0, [x3, x0]
ldr q3, [x12, x0]
ldr q1, [x8, x0]
ldr q2, [x4, x0]
fmul v3.2d, v3.2d, v5.2d
fmul v1.2d, v1.2d, v4.2d
fmul v2.2d, v2.2d, v0.2d
ldr q16, [x1, x0]
ldr q18, [x14, x0]
ldr q17, [x13, x0]
ldr q0, [x2, x0]
ldr q7, [x10, x0]
ldr q6, [x9, x0]
ldr q5, [x6, x0]
ldr q4, [x5, x0]
fmla v3.2d, v18.2d, v17.2d
fadd v0.2d, v0.2d, v16.2d
fmla v1.2d, v7.2d, v6.2d
fmla v2.2d, v5.2d, v4.2d
fadd v0.2d, v0.2d, v3.2d
fadd v1.2d, v1.2d, v2.2d
fadd v0.2d, v0.2d, v1.2d
str q0, [x1, x0]
add x0, x0, 16
cmp x0, 8192
bne .L2

The reassociation is evident. The problem here is that the fmla chains are
something we'd want to preserve.
Is there a way we can get the reassoc pass to handle FMAs more intelligently?
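The two loop bodies correspond to two evaluation orders of the same sum. A scalar C sketch of one iteration, with the six products passed as arrays; the names and the pairing of products in the second function are illustrative. It is written with plain `*` and `+`: under `-Ofast` (or GCC's default `-ffp-contract=fast` in GNU C modes) each `acc + x*y` step may be contracted into a fused multiply-add, which is what the FMLAs are.

```c
/* Serial chain: one accumulator that each product is added into in turn.
   Once every acc + x*y is contracted, this is the tree-reassoc-width=1 shape:
   one FADD followed by six dependent FMLAs.  */
static double
chain_order (double a, double p, const double x[6], const double y[6])
{
  double acc = a + p;
  for (int n = 0; n < 6; n++)
    acc = acc + x[n] * y[n];
  return acc;
}

/* Reassociated: three independent multiply + multiply-add pairs joined by a
   tree of additions, the tree-reassoc-width=4 shape (FMUL + FMLA per pair,
   then FADDs).  */
static double
tree_order (double a, double p, const double x[6], const double y[6])
{
  double t0 = x[0] * y[0] + x[1] * y[1];
  double t1 = x[2] * y[2] + x[3] * y[3];
  double t2 = x[4] * y[4] + x[5] * y[5];
  double s = a + p;
  return (s + t0) + (t1 + t2);
}
```

With small integer inputs every intermediate value is exact, so both orders agree; the difference the report cares about is the dependency structure, a single chain of FMLAs versus independent products joined by FADDs.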

[Bug target/96313] [AArch64] vqmovun* return types should be unsigned

2020-09-30 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96313

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |ktkachov at gcc dot 
gnu.org
 Status|NEW |ASSIGNED
 CC||ktkachov at gcc dot gnu.org
  Known to work||11.0
   Target Milestone|--- |8.5
  Known to fail||10.0, 8.4.1, 9.3.1

--- Comment #6 from ktkachov at gcc dot gnu.org ---
Fixed on trunk. Will backport to branches later

[Bug target/97150] [AArch64] 2nd parameter of unsigned Neon scalar shift intrinsics should be signed

2020-09-30 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97150

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1
   Last reconfirmed||2020-09-30
   Target Milestone|--- |8.5
  Known to fail||10.0, 8.4.1, 9.3.1
   Assignee|unassigned at gcc dot gnu.org  |ktkachov at gcc dot 
gnu.org
 CC||ktkachov at gcc dot gnu.org
  Known to work||11.0

--- Comment #4 from ktkachov at gcc dot gnu.org ---
Fixed on trunk. Will backport to the branches later

[Bug target/97323] [10/11 Regression] ICE 'verify_type' failed on arm-linux-gnueabihf

2020-10-08 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97323

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 CC|kyrylo.tkachov at arm dot com  |
   Last reconfirmed||2020-10-08
 Ever confirmed|0   |1

--- Comment #2 from ktkachov at gcc dot gnu.org ---
confirmed on trunk with the extra checking enabled

[Bug target/97150] [AArch64] 2nd parameter of unsigned Neon scalar shift intrinsics should be signed

2020-10-08 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97150

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from ktkachov at gcc dot gnu.org ---
Fixed on GCC 8.5 onwards.

[Bug target/96313] [AArch64] vqmovun* return types should be unsigned

2020-10-08 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96313

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #10 from ktkachov at gcc dot gnu.org ---
Fixed on GCC 8.5 onwards.

[Bug target/97349] Incorrect types for some Arm Neon vdupq_n_<...> intrinsics

2020-10-09 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97349

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

   Last reconfirmed||2020-10-09
 Target|arm |aarch64
 Ever confirmed|0   |1
   Target Milestone|--- |8.5
  Known to fail||10.2.1, 11.0, 8.4.1, 9.3.1
 Status|UNCONFIRMED |NEW
 CC||ktkachov at gcc dot gnu.org

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Confirmed.

[Bug c++/98448] New: [11 Regression] bootstrap-O3 comparison fails due to libcody

2020-12-26 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98448

Bug ID: 98448
   Summary: [11 Regression] bootstrap-O3 comparison fails due to
libcody
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Keywords: build
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
CC: nathan at gcc dot gnu.org
  Target Milestone: ---

Doing a bootstrap-O3 on aarch64-none-linux-gnu fails comparison with:
Comparing stages 2 and 3
Bootstrap comparison failure!
libcody/fatal.o differs

Maybe some flags need to be passed down in a Makefile somewhere?

[Bug target/98453] New: aarch64: Missed opportunity for STP for vec_duplicate

2020-12-27 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98453

Bug ID: 98453
   Summary: aarch64: Missed opportunity for STP for vec_duplicate
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---

typedef long long v2di __attribute__((vector_size (16)));
typedef int v2si __attribute__((vector_size (8)));

void
foo (v2di *x, long long a)
{
  v2di tmp = {a, a};
  *x = tmp;
}

void
foo2 (v2si *x, int a)
{
  v2si tmp = {a, a};
  *x = tmp;
}

at -O2 on aarch64 gives:
foo:
dup v0.2d, x1
str q0, [x0]
ret

foo2:
dup v0.2s, w1
str d0, [x0]
ret

These could just be: stp x1, x1, [x0] and stp w1, w1, [x0]
Combine already tries and fails to match:
(set (mem:V2DI (reg:DI 97) [1 *x_4(D)+0 S16 A128])
(vec_duplicate:V2DI (reg:DI 98)))
and
(set (mem:V2SI (reg:DI 97) [2 *x_4(D)+0 S8 A64])
(vec_duplicate:V2SI (reg:SI 98)))

So this can be fixed with some new patterns in aarch64-simd.md.
We should make sure to handle the other 32-bit and 64-bit modes as well.
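In source terms, each store writes the same value to two adjacent elements, which is the memory effect of `stp x1, x1, [x0]` and `stp w1, w1, [x0]`. A scalar C sketch of that equivalent (the `_scalar` names are illustrative, and this does not claim anything about how GCC currently compiles it):

```c
/* Same memory effect as storing the V2DI vector {a, a}: two adjacent 64-bit
   stores of one value, i.e. stp x1, x1, [x0].  */
static void
foo_scalar (long long *x, long long a)
{
  x[0] = a;
  x[1] = a;
}

/* Same memory effect as storing the V2SI vector {a, a}: two adjacent 32-bit
   stores, i.e. stp w1, w1, [x0].  */
static void
foo2_scalar (int *x, int a)
{
  x[0] = a;
  x[1] = a;
}
```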

[Bug target/98477] New: aarch64: Unnecessary GPR -> FPR moves for conditional select

2020-12-30 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98477

Bug ID: 98477
   Summary: aarch64: Unnecessary GPR -> FPR moves for conditional
select
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64

Code like
void
foo (int a, double *b)
{
  *b = a ? 10000.0 : 200.0;
}

generates:
foo:
cmp w0, 0
mov x2, 149533581377536
movk x2, 0x40c3, lsl 48
mov x0, 4641240890982006784
fmov d0, x2
fmov d1, x0
fcsel d0, d0, d1, ne
str d0, [x1]
ret

We don't need to do the FCSEL on the FPR side if we're just storing it to
memory. We can just do a GPR CSEL and avoid the FMOVs.
I've seen this pattern in the disassembly of some math library routines.
Maybe we should add a =w,w,w alternative to the CSEL patterns in the backend?
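The two constants in the listing are ordinary 64-bit bit patterns: the MOV/MOVK pair builds 0x40c3880000000000 (10000.0) in x2, and 4641240890982006784 is 0x4069000000000000 (200.0). Selecting between bit patterns needs only integer registers. A C sketch of the GPR-only select-and-store, assuming IEEE 754 binary64 `double`; `select_store` is a hypothetical name, and whether GCC turns the ternary into a CSEL is not claimed:

```c
#include <stdint.h>
#include <string.h>

/* Bit patterns of the two constants materialized in the listing above.  */
#define BITS_10000 UINT64_C (0x40c3880000000000) /* 10000.0 */
#define BITS_200 UINT64_C (0x4069000000000000)   /* 200.0 */

/* Choose the bit pattern with an integer conditional (what a GPR CSEL
   computes) and store the 8 bytes directly, with no FMOV to FP registers.  */
static void
select_store (int a, double *b)
{
  uint64_t bits = a ? BITS_10000 : BITS_200;
  memcpy (b, &bits, sizeof bits);
}
```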

[Bug target/98477] aarch64: Unnecessary GPR -> FPR moves for conditional select

2020-12-30 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98477

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Or a =r,r,r alternative to the FCSEL pattern instead...

[Bug c++/98448] [11 Regression] bootstrap-O3 comparison fails due to libcody

2021-01-04 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98448

--- Comment #2 from ktkachov at gcc dot gnu.org ---
Indeed, I see it passing with today's trunk.
Nathan, if you can't reproduce it feel free to close this.

[Bug tree-optimization/96974] [10/11 Regression] ICE in vect_get_vector_types_for_stmt compiling for SVE

2021-01-05 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96974

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Priority|P3  |P2
   Last reconfirmed|2020-10-15 00:00:00 |2021-01-05
 Ever confirmed|0   |1

--- Comment #2 from ktkachov at gcc dot gnu.org ---
This has gone latent on GCC 11 trunk with
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=f5e18dd9c7dacc9671044fc669bd5c1b26b6bdba
and still appears on the GCC 10 branch

[Bug target/98532] New: Use load/store pairs for 2-element vector in memory permutes

2021-01-05 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98532

Bug ID: 98532
   Summary: Use load/store pairs for 2-element vector in memory
permutes
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64

I've seen these patterns while looking at some disassemblies but I believe it
can be reproduced in C with:
typedef long v2di __attribute__((vector_size (16)));

void
foo (v2di *a, v2di *b)
{
  v2di tmp = {(*a)[1], (*a)[0]};
  *b = tmp;
}

This, for aarch64 -O2 generates:
foo:
ldr d0, [x0, 8]
ld1 {v0.d}[1], [x0]
str q0, [x1]
ret

clang does:
foo:    // @foo
ldr q0, [x0]
ext v0.16b, v0.16b, v0.16b, #8
str q0, [x1]
ret

I suspect we can do better in these cases with:
ldp x2, x3, [x0]
stp x3, x2, [x1]
or something similar.
In the combine phase we already try and fail to match:
Failed to match this instruction:
(set (reg:V2DI 97 [ tmp ])
(vec_concat:V2DI (mem/j:DI (plus:DI (reg/v/f:DI 95 [ a ])
(const_int 8 [0x8])) [1 BIT_FIELD_REF <*a_4(D), 64, 64>+0 S8
A64])
(mem/j:DI (reg/v/f:DI 95 [ a ]) [1 BIT_FIELD_REF <*a_4(D), 64, 0>+0 S8
A128])))


so maybe we can solve this purely in the backend?
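The suggested LDP/STP sequence treats the two lanes as plain 64-bit integers and stores them back in swapped order. A scalar C sketch of that memory effect; `swap_pair` is a hypothetical name:

```c
/* Equivalent of *b = {(*a)[1], (*a)[0]} for a two-lane vector of longs:
   two loads (one LDP) followed by two stores in swapped order (one STP).  */
static void
swap_pair (const long *a, long *b)
{
  long lo = a[0];
  long hi = a[1];
  b[0] = hi;
  b[1] = lo;
}
```

Because both loads happen before either store, the sketch also works when `a` and `b` point to the same vector.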

[Bug tree-optimization/98535] New: [11 Regression] ICE in operands_scanner::get_expr_operands(tree_node**, int) building 538.imagick_r

2021-01-05 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98535

Bug ID: 98535
   Summary: [11 Regression] ICE in
operands_scanner::get_expr_operands(tree_node**, int)
building 538.imagick_r
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64

Building 538.imagick_r from SPEC2017 ICEs on aarch64 with -O3 -mcpu=neoverse-v1
(SVE-enabled)

A reduced testcase is:
typedef short a;

typedef struct {
  a b, c, d, e;
} f;

f *g;

long h;

void
i() {
  f j;
  for (; h; h++)
*g++ = j;
}

during GIMPLE pass: vect
foo.c: In function 'i':
foo.c:12:1: internal compiler error: Segmentation fault
   12 | i() {
  | ^
0xd8f177 crash_signal
$SRC/gcc/toplev.c:327
0xf7daa9 operands_scanner::get_expr_operands(tree_node**, int)
$SRC/gcc/tree-ssa-operands.c:780
0xf7f823 operands_scanner::parse_ssa_operands()
$SRC/gcc/tree-ssa-operands.c:998
0xf80eff operands_scanner::build_ssa_operands()
$SRC/gcc/tree-ssa-operands.c:1013
0xf817b4 update_stmt_operands(function*, gimple*)
$SRC/gcc/tree-ssa-operands.c:1155
0xa25ef9 update_stmt_if_modified
$SRC/gcc/gimple-ssa.h:185
0xa25ef9 update_modified_stmt
$SRC/gcc/gimple-iterator.c:44
0xa25ef9 gsi_insert_after(gimple_stmt_iterator*, gimple*, gsi_iterator_update)
$SRC/gcc/gimple-iterator.c:540
0xa1a104 gimple_seq_add_stmt(gimple**, gimple*)
$SRC/gcc/gimple.c:1282
0x10cd184 duplicate_and_interleave(vec_info*, gimple**, tree_node*,
vec, unsigned int, vec&)
$SRC/gcc/tree-vect-slp.c:4958
0x10cdc02 vect_create_constant_vectors
$SRC/gcc/tree-vect-slp.c:5112
0x10cdc02 vect_schedule_slp_node
$SRC/gcc/tree-vect-slp.c:5696
0x10dbff9 vect_schedule_scc
$SRC/gcc/tree-vect-slp.c:5958
0x10dc2d2 vect_schedule_scc
$SRC/gcc/tree-vect-slp.c:5975
0x10dcc0f vect_schedule_slp(vec_info*, vec<_slp_instance*, va_heap, vl_ptr>)
$SRC/gcc/tree-vect-slp.c:6110
0x10b658d vect_transform_loop(_loop_vec_info*, gimple*)
$SRC/gcc/tree-vect-loop.c:9468
0x10e6dd8 try_vectorize_loop_1
$SRC/gcc/tree-vectorizer.c:1104
0x10e7458 try_vectorize_loop_1
$SRC/gcc/tree-vectorizer.c:1141
0x10e7501 try_vectorize_loop
$SRC/gcc/tree-vectorizer.c:1161
0x10e7847 vectorize_loops()
$SRC/gcc/tree-vectorizer.c:1242
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.

[Bug tree-optimization/98535] [11 Regression] ICE in operands_scanner::get_expr_operands(tree_node**, int) building 538.imagick_r

2021-01-05 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98535

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

   Target Milestone|--- |11.0
   Priority|P3  |P1

[Bug tree-optimization/98535] [11 Regression] ICE in operands_scanner::get_expr_operands(tree_node**, int) building 538.imagick_r

2021-01-05 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98535

--- Comment #1 from ktkachov at gcc dot gnu.org ---
This backtrace started with my commit 64432b680eab0bddbe9a4ad4798457cf6a14ad60
but before this it still ICEd with:
foo.c: In function 'i':
foo.c:12:1: error: type mismatch in 'vec_perm_expr'
   12 | i() {
  | ^
vector([4,4]) unsigned short
vector([4,4]) unsigned short
unsigned long
vector([4,4]) ssizetype
_112 = VEC_PERM_EXPR <_111, niters.19_72, { 0, POLY_INT_CST [4, 4], 1,
POLY_INT_CST [5, 4], 2, POLY_INT_CST [6, 4], ... }>;
during GIMPLE pass: vect

[Bug tree-optimization/98535] [11 Regression] ICE in operands_scanner::get_expr_operands(tree_node**, int) building 538.imagick_r

2021-01-05 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98535

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||rsandifo at gcc dot gnu.org

--- Comment #2 from ktkachov at gcc dot gnu.org ---
Bisection points to 6c3ce63b04b38f84c0357e4648383f0e3ab11cd9

[Bug tree-optimization/98581] New: unexpected reassociation for umin/umax ?

2021-01-07 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98581

Bug ID: 98581
   Summary: unexpected reassociation for umin/umax ?
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---

typedef signed int *__restrict__ pSINT;
typedef unsigned int *__restrict__ pUINT;

#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define MAX(a, b) ((a) > (b) ? (a) : (b))

void saba_s (pSINT a, pSINT b, pSINT c)
{
  int i;
  for (i = 0; i < 4; i++)
c[i] += (MAX (a[i], b[i]) - MIN (a[i], b[i]));
}

void saba_u (pUINT a, pUINT b, pUINT c)
{
  int i;
  for (i = 0; i < 4; i++)
c[i] += (MAX (a[i], b[i]) - MIN (a[i], b[i]));
}

On aarch64 at -O3 generates:
saba_s:
ldr q0, [x0]
ldr q1, [x1]
ldr q2, [x2]
sabd v0.4s, v0.4s, v1.4s
add v0.4s, v0.4s, v2.4s
str q0, [x2]
ret

saba_u:
ldr q1, [x0]
ldr q2, [x1]
ldr q3, [x2]
umax v0.4s, v1.4s, v2.4s
umin v1.4s, v1.4s, v2.4s
add v0.4s, v0.4s, v3.4s
sub v0.4s, v0.4s, v1.4s
str q0, [x2]
ret

I would expect the (MAX (a[i], b[i]) - MIN (a[i], b[i])) part to match a uabd
instruction for the unsigned case, but it looks like the add and sub operations
are swapped, which prevents the RTL pattern from matching the operation.
The GIMPLE is already in this form. At expand the signed version is:
  vect__4.6_40 = MEM  [(int *)c_16(D)];
  vect__6.9_37 = MEM  [(int *)b_17(D)];
  vect__8.12_34 = MEM  [(int *)a_18(D)];
  vect__9.13_33 = MAX_EXPR ;
  vect__10.14_32 = MIN_EXPR ;
  vect__11.15_31 = vect__9.13_33 - vect__10.14_32;
  vect__12.16_30 = vect__11.15_31 + vect__4.6_40;
  MEM  [(int *)c_16(D)] = vect__12.16_30;
  return;


the unsigned is:
  vect__4.25_38 = MEM  [(unsigned int *)c_16(D)];
  vect__6.28_35 = MEM  [(unsigned int *)b_17(D)];
  vect__8.31_32 = MEM  [(unsigned int *)a_18(D)];
  vect__9.32_31 = MAX_EXPR ;
  vect__10.33_30 = MIN_EXPR ;
  vect__13.34_29 = vect__9.32_31 + vect__4.25_38;
  vect__12.35_28 = vect__13.34_29 - vect__10.33_30;
  MEM  [(unsigned int *)c_16(D)] = vect__12.35_28;
  return;
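The unsigned rewrite is value-preserving, because unsigned arithmetic wraps modulo 2^32 and so `(max + c) - min` equals `c + (max - min)`; what it loses is the explicit `max - min` subexpression that a UABD pattern would match. GCC's reassociation pass only reassociates integer types whose overflow wraps, which is likely why the signed loop keeps its original order. A C sketch checking the identity for single lanes (function names illustrative):

```c
#include <stdint.h>

/* Absolute difference as written in the source: MAX - MIN.  */
static uint32_t
abd_u32 (uint32_t a, uint32_t b)
{
  return (a > b ? a : b) - (a < b ? a : b);
}

/* Source order: c + (max - min).  The max - min part is the absolute
   difference a UABD pattern can match.  */
static uint32_t
acc_abd_source (uint32_t c, uint32_t a, uint32_t b)
{
  return c + abd_u32 (a, b);
}

/* Order seen in the unsigned GIMPLE: (max + c) - min.  Equal modulo 2^32,
   but the max - min subexpression no longer exists.  */
static uint32_t
acc_abd_reassoc (uint32_t c, uint32_t a, uint32_t b)
{
  uint32_t mx = a > b ? a : b;
  uint32_t mn = a < b ? a : b;
  return (mx + c) - mn;
}
```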

[Bug target/98636] [ARM] ICE on passing incompatible options for fp16 - global_options’ are modified in local context

2021-01-12 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98636

--- Comment #8 from ktkachov at gcc dot gnu.org ---
(In reply to prathamesh3492 from comment #7)
> I think the error is correct.
> CCing Kyrill -- could you please confirm if the error is valid for
> above case ?
> Thanks!

Yes, -mfp16-format=alternative is incompatible with the intrinsics

[Bug c++/98641] New: Feature request: implement pointer alignment builtins

2021-01-12 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98641

Bug ID: 98641
   Summary: Feature request: implement pointer alignment builtins
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---

We received reports that users found the pointer alignment builtins provided by
LLVM useful in avoiding error-prone casting to and from intptr_t:
https://clang.llvm.org/docs/LanguageExtensions.html#alignment-builtins

It would be great if we could support them in GCC as well.

This would involve implementing:
Type __builtin_align_up(Type value, size_t alignment);
Type __builtin_align_down(Type value, size_t alignment);
bool __builtin_is_aligned(Type value, size_t alignment);

Using these builtins, the compiler can also preserve pointer provenance
information more easily.
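For comparison, here is the kind of hand-written `uintptr_t` arithmetic the builtins are meant to replace. A minimal C sketch, assuming `align` is a nonzero power of two; the helper names are hypothetical and, unlike the Clang builtins (which accept both pointer and integer types), they only handle `char *`. The address is inspected as an integer, but the result is formed by pointer arithmetic on the original pointer so that it stays derived from it:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if P is a multiple of ALIGN (a nonzero power of two).  */
static bool
is_aligned_ptr (const void *p, size_t align)
{
  return ((uintptr_t) p & (align - 1)) == 0;
}

/* Round P down to a multiple of ALIGN by stepping back over the
   misalignment.  */
static char *
align_down_ptr (char *p, size_t align)
{
  return p - ((uintptr_t) p & (align - 1));
}

/* Round P up to a multiple of ALIGN; already-aligned pointers are returned
   unchanged.  */
static char *
align_up_ptr (char *p, size_t align)
{
  size_t mis = (uintptr_t) p & (align - 1);
  return mis ? p + (align - mis) : p;
}
```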

[Bug c++/98641] Feature request: implement pointer alignment builtins

2021-01-12 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98641

--- Comment #1 from ktkachov at gcc dot gnu.org ---
The component is marked as C++, but it would be good to have these in C as
well.

[Bug target/98657] [11 Regression] SVE: ICE (unrecognizable insn) with shift at -O3 -msve-vector-bits=256

2021-01-13 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98657

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Priority|P3  |P1
   Last reconfirmed||2021-01-13
 Status|UNCONFIRMED |NEW
  Known to work||10.2.1
Summary|SVE: ICE (unrecognizable|[11 Regression] SVE: ICE
   |insn) wtih shift at -O3 |(unrecognizable insn) with
   |-msve-vector-bits=256   |shift at -O3
   ||-msve-vector-bits=256
 CC||ktkachov at gcc dot gnu.org

--- Comment #2 from ktkachov at gcc dot gnu.org ---
Confirmed.

[Bug target/98681] [8/9/10/11 Regression] aarch64: Invalid ubfiz instruction rejected by assembler

2021-01-14 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98681

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
 CC||ktkachov at gcc dot gnu.org
Summary|aarch64: Invalid ubfiz  |[8/9/10/11 Regression]
   |instruction rejected by |aarch64: Invalid ubfiz
   |assembler   |instruction rejected by
   ||assembler
   Priority|P3  |P2
   Last reconfirmed||2021-01-14

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Confirmed. GCC 5 didn't generate the bogus assembly.
I suppose the predicate and/or constraint of *andim_ashift_bfiz needs
tightening
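For reference, the architectural limits on the UBFIZ operands are what the assembler checks: for `ubfiz <Rd>, <Rn>, #lsb, #width`, `lsb` must be in 0..N-1 and `width` in 1..N-lsb, where N is 64 for X registers and 32 for W registers. A C sketch of that check, which a tightened predicate would presumably have to guarantee; `ubfiz_operands_valid` is a hypothetical name, not an existing aarch64 predicate:

```c
#include <stdbool.h>

/* Is "ubfiz <Rd>, <Rn>, #lsb, #width" encodable for a REG_BITS-bit register
   (32 for W, 64 for X)?  Checking lsb first keeps reg_bits - lsb from
   wrapping around.  */
static bool
ubfiz_operands_valid (unsigned reg_bits, unsigned lsb, unsigned width)
{
  return lsb < reg_bits && width >= 1 && width <= reg_bits - lsb;
}
```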

[Bug tree-optimization/98268] [10 Regression] ICE: verify_gimple failed with LTO and SVE

2021-01-18 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98268

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Priority|P3  |P2
   Target Milestone|--- |10.3
 CC||ktkachov at gcc dot gnu.org
   Last reconfirmed||2021-01-18
  Known to work||9.3.1
  Known to fail||10.2.1
 Ever confirmed|0   |1
Summary|ICE: verify_gimple failed   |[10 Regression] ICE:
   |with LTO and SVE|verify_gimple failed with
   ||LTO and SVE

--- Comment #4 from ktkachov at gcc dot gnu.org ---
I can't reproduce on trunk even with the param, but it does trigger on the GCC 10
branch.
Does it need some extra checking in the configuration?

[Bug tree-optimization/98268] [10/11 Regression] ICE: verify_gimple failed with LTO and SVE

2021-01-18 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98268

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

Summary|[10 Regression] ICE:|[10/11 Regression] ICE:
   |verify_gimple failed with   |verify_gimple failed with
   |LTO and SVE |LTO and SVE
  Known to fail||11.0

--- Comment #6 from ktkachov at gcc dot gnu.org ---
(In reply to Alex Coplan from comment #5)
> Ah, sorry, I hit this with a VLA compile on trunk, so the full command line
> is:
> 
> $ aarch64-elf-gcc a.c b.c -flto -O1 -ftree-vectorize -march=armv8.2-a+sve
> --param=aarch64-autovec-preference=3
> 
> with the above source files. It doesn't ICE with the -msve-vector-bits=128
> flag as above.

Indeed, reproduced now, thanks.

[Bug tree-optimization/98726] [10/11 Regression] SVE: tree check: expected integer_cst, have poly_int_cst in to_wide, at tree.h:5984

2021-01-18 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98726

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

  Known to work||9.3.1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2021-01-18
  Known to fail||10.2.1
 Ever confirmed|0   |1
   Target Milestone|11.0|10.3
 CC||ktkachov at gcc dot gnu.org
   Priority|P3  |P2
Summary|SVE: tree check: expected   |[10/11 Regression] SVE:
   |integer_cst, have   |tree check: expected
   |poly_int_cst in to_wide, at |integer_cst, have
   |tree.h:5984 |poly_int_cst in to_wide, at
   ||tree.h:5984

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Confirmed.

[Bug tree-optimization/98766] [10/11 Regression] SVE: ICE in tree_to_shwi with -O3 --param=avoid-fma-max-bits

2021-01-20 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98766

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

  Known to fail||10.2.1
Summary|SVE: ICE in tree_to_shwi|[10/11 Regression] SVE: ICE
   |with -O3|in tree_to_shwi with -O3
   |--param=avoid-fma-max-bits  |--param=avoid-fma-max-bits
   Last reconfirmed||2021-01-20
   Priority|P3  |P2
  Known to work||9.3.1
   Target Milestone|--- |10.3
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
 CC||ktkachov at gcc dot gnu.org

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Confirmed.

[Bug c/97172] [11 Regression] ICE: tree code ‘ssa_name’ is not supported in LTO streams since r11-3303-g6450f07388f9fe57

2021-01-22 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97172

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

   Last reconfirmed|2020-09-23 00:00:00 |2021-1-22
 CC||ktkachov at gcc dot gnu.org

--- Comment #23 from ktkachov at gcc dot gnu.org ---
I also see this in gcc.dg/atomic/pr65345-4.c on aarch64.

[Bug rtl-optimization/98791] [11 Regression] ICE in paradoxical_subreg_p (in ira) with SVE, LTO

2021-01-22 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98791

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org
 Status|UNCONFIRMED |NEW
   Keywords||ice-on-valid-code
Summary|ICE in paradoxical_subreg_p |[11 Regression] ICE in
   |(in ira) with SVE, LTO  |paradoxical_subreg_p (in
   ||ira) with SVE, LTO
   Target Milestone|--- |11.0
   Last reconfirmed||2021-01-22
  Known to work||10.2.1
 Ever confirmed|0   |1
   Priority|P3  |P1

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Confirmed on trunk.

[Bug target/98792] New: Fail to use SHRN instructions for narrowing shift on aarch64

2021-01-22 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98792

Bug ID: 98792
   Summary: Fail to use SHRN instructions for narrowing shift on
aarch64
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64

#define N 1024
unsigned short res[N];
unsigned int in[N];

void
foo (void)
{
  for (int i = 0; i < N; i++)
res[i] = in[i] >> 3;
}

with -O3 -mcpu=neoverse-n1 on aarch64 generates the loop:
.L2:
ldp q1, q0, [x0]
add x0, x0, 32
ushr v1.4s, v1.4s, 3
ushr v0.4s, v0.4s, 3
xtn v2.4h, v1.4s
xtn2 v2.8h, v0.4s
str q2, [x1], 16
cmp x0, x2
bne .L2

It could use the SHRN narrowing shift instruction instead. LLVM can do it
(some other inefficiencies aside):
.LBB0_1:    // %vector.body
// =>This Inner Loop Header: Depth=1
add x11, x10, x8
ldp q0, q1, [x11]
add x8, x8, #32 // =32
cmp x8, #1, lsl #12 // =4096
shrn v0.4h, v0.4s, #3
shrn v1.4h, v1.4s, #3
stp d0, d1, [x9, #-8]
add x9, x9, #16 // =16
b.ne .LBB0_1

Some backend patterns can probably handle it, but maybe the vectoriser can do
something useful earlier as well?
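The following portable C model illustrates why a single SHRN per four lanes is enough. It deliberately avoids NEON intrinsics so that it runs on any host. `shrn_4h_4s` and `foo_shrn` are hypothetical names used only for illustration. The model assumes SHRN's behaviour for the `.4h <- .4s` arrangement: shift each 32-bit lane right by an immediate in the range 1 to 16, then keep the low 16 bits. That is the same value GCC's USHR followed by XTN/XTN2 produces, so the two-instruction pair is redundant.

```c
#include <assert.h>
#include <stdint.h>

#define N 1024

/* Reference: the loop from the testcase (the narrowing happens on the
   assignment to a 16-bit element).  */
static void
foo_ref (uint16_t *res, const uint32_t *in)
{
  for (int i = 0; i < N; i++)
    res[i] = in[i] >> 3;
}

/* Scalar model of SHRN Vd.4h, Vn.4s, #shift: each 32-bit source lane is
   shifted right and only its low 16 bits are written to the destination.  */
static void
shrn_4h_4s (uint16_t d[4], const uint32_t n[4], unsigned shift)
{
  assert (shift >= 1 && shift <= 16);  /* immediate range for this form */
  for (int lane = 0; lane < 4; lane++)
    d[lane] = (uint16_t) (n[lane] >> shift);
}

/* The loop as LLVM vectorises it: two SHRNs per 8 elements, whose 64-bit
   results are then stored as a pair (stp d0, d1).  */
static void
foo_shrn (uint16_t *res, const uint32_t *in)
{
  for (int i = 0; i < N; i += 8)
    {
      shrn_4h_4s (&res[i], &in[i], 3);
      shrn_4h_4s (&res[i + 4], &in[i + 4], 3);
    }
}
```

GCC's loop builds a full 128-bit result before a single `str q2`. An alternative to LLVM's two 64-bit stores would be SHRN for the low half followed by SHRN2 for the high half.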

[Bug tree-optimization/98766] [10/11 Regression] SVE: ICE in tree_to_shwi with -O3 --param=avoid-fma-max-bits

2021-01-22 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98766

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |ASSIGNED

--- Comment #3 from ktkachov at gcc dot gnu.org ---
Patch posted at
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564082.html

[Bug tree-optimization/98766] [10 Regression] SVE: ICE in tree_to_shwi with -O3 --param=avoid-fma-max-bits

2021-01-22 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98766

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

Summary|[10/11 Regression] SVE: ICE |[10 Regression] SVE: ICE in
   |in tree_to_shwi with -O3|tree_to_shwi with -O3
   |--param=avoid-fma-max-bits  |--param=avoid-fma-max-bits

--- Comment #5 from ktkachov at gcc dot gnu.org ---
Fixed for GCC 11.

[Bug target/98849] [11 Regression] ICE in expand_shift_1, at expmed.c:2658 since g:7432f255b70811dafaf325d9403

2021-01-27 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98849

--- Comment #5 from ktkachov at gcc dot gnu.org ---
Looks like after the refactoring to introduce MVE shifts (which doesn't ICE) we
need to make sure the optab is still disabled for iwmmxt?

[Bug tree-optimization/98766] [10 Regression] SVE: ICE in tree_to_shwi with -O3 --param=avoid-fma-max-bits

2021-01-27 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98766

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED
  Known to fail|11.0|

--- Comment #7 from ktkachov at gcc dot gnu.org ---
Fixed on branch too.

[Bug target/98867] New: Failure to use SRI instruction for shift-right-and-insert vector operations

2021-01-28 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98867

Bug ID: 98867
   Summary: Failure to use SRI instruction for
shift-right-and-insert vector operations
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64

#define N 1024
unsigned char in[N];
unsigned char out[N];

#define SHIFT 6

void
foo (void)
{
  for (int i = 0; i < N; i++)
{
  unsigned char mask = 255u >> SHIFT;
  unsigned char shifted = in[i] >> SHIFT;
  out[i] = (out[i] & ~mask) | shifted;
}
}

at -O3 generates:
foo:
adrp x1, .LANCHOR0
add x1, x1, :lo12:.LANCHOR0
movi v2.16b, 0xfffc
add x2, x1, 1024
mov x0, 0
.L2:
ldr q0, [x1, x0]
ldr q1, [x0, x2]
and v0.16b, v0.16b, v2.16b
ushr v1.16b, v1.16b, 6
orr v0.16b, v0.16b, v1.16b
str q0, [x1, x0]
add x0, x0, 16
cmp x0, 1024
bne .L2
ret

whereas it could use the SRI instruction as clang does (unrolled 2x):
foo:    // @foo
adrp x9, in
adrp x10, out
mov x8, xzr
add x9, x9, :lo12:in
add x10, x10, :lo12:out
.LBB0_1:    // %vector.body
add x11, x9, x8
add x12, x10, x8
ldp q0, q1, [x11]
ldp q2, q3, [x12]
add x8, x8, #32 // =32
cmp x8, #1024   // =1024
sri v2.16b, v0.16b, #6
sri v3.16b, v1.16b, #6
stp q2, q3, [x12]
b.ne .LBB0_1

This may be a bit too complex for combine to match, though.
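The loop above computes the same per-byte value as SRI, and a scalar C model of one byte lane of SRI can confirm this. `sri_lane_u8`, `loop_lane_u8` and `sri_matches_loop` are hypothetical names for illustration. The model assumes SRI's semantics for `.16b` lanes: shift the source right by an immediate in the range 1 to 8, then insert it into the destination while preserving the destination's top `shift` bits. The exhaustive comparison checks that GCC's AND/USHR/ORR sequence and a single SRI agree for every pair of input bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 8-bit lane of SRI (shift right and insert): the top
   SHIFT bits of DST are kept, the rest come from SRC >> SHIFT.  */
static uint8_t
sri_lane_u8 (uint8_t dst, uint8_t src, unsigned shift)
{
  assert (shift >= 1 && shift <= 8);            /* immediate range for .16b */
  uint8_t keep = (uint8_t) ~(0xffu >> shift);   /* top SHIFT bits of dst */
  return (uint8_t) ((dst & keep) | (src >> shift));
}

/* One element of the testcase's loop body, with SHIFT as a parameter.  */
static uint8_t
loop_lane_u8 (uint8_t out, uint8_t in, unsigned shift)
{
  unsigned char mask = 255u >> shift;
  unsigned char shifted = in >> shift;
  return (uint8_t) ((out & ~mask) | shifted);
}

/* Exhaustively compare both forms over all 8-bit inputs for one shift
   amount; returns 1 if they agree everywhere, 0 otherwise.  */
static int
sri_matches_loop (unsigned shift)
{
  for (unsigned d = 0; d < 256; d++)
    for (unsigned s = 0; s < 256; s++)
      if (sri_lane_u8 ((uint8_t) d, (uint8_t) s, shift)
          != loop_lane_u8 ((uint8_t) d, (uint8_t) s, shift))
        return 0;
  return 1;
}
```

Because the equivalence holds per lane, the only obstacle to emitting `sri` is recognising the AND/shift/ORR combination in RTL. It is not a semantic difference.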

[Bug target/92665] [AArch64] low lanes select not optimized out for vmlal intrinsics

2021-01-29 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92665

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org
 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED
  Known to work||11.0

--- Comment #8 from ktkachov at gcc dot gnu.org ---
The issue in this bug report is that the "get low lane" operation should just
be a move rather than a vec_select so that it can be optimised away.
After g:e140f5fd3e235c5a37dc99b79f37a5ad4dc59064, GCC 11 does the right thing
for all testcases in this PR.

So marking this as fixed.

[Bug target/95265] aarch64: suboptimal code generation for common neon intrinsic sequence involving shrn and mull

2021-01-29 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95265
Bug 95265 depends on bug 92665, which changed state.

Bug 92665 Summary: [AArch64] low lanes select not optimized out for vmlal 
intrinsics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92665

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug target/98877] [AArch64] Inefficient code generated for tbl NEON intrinsics

2021-01-29 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98877

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Ever confirmed|0   |1
 CC||ktkachov at gcc dot gnu.org,
   ||tnfchris at gcc dot gnu.org
   Last reconfirmed||2021-01-29
 Status|UNCONFIRMED |NEW

--- Comment #3 from ktkachov at gcc dot gnu.org ---
Confirmed. I think moving values in and out of the structure modes (OImode,
XImode and friends) really hurts codegen at the RTL level.

[Bug target/91753] Bad register allocation of multi-register types

2021-01-29 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91753

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org

--- Comment #8 from ktkachov at gcc dot gnu.org ---
The issue with the many moves is still there; however, for GCC 11 at least
they're hoisted outside the loop.

[Bug target/97528] [9/10 Regression] ICE in decompose_automod_address, at rtlanal.c:6298 (arm-linux-gnueabihf)

2021-02-01 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97528

--- Comment #8 from ktkachov at gcc dot gnu.org ---
If the patch tests cleanly, we should apply it to GCC 9 and 8 too (if
applicable).

[Bug target/98917] SVE: wrong code with -O -ftree-vectorize -msve-vector-bits=128 --param=aarch64-autovec-preference=2

2021-02-01 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98917

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Is this a GCC 11 regression?

[Bug target/98931] [11 Regression] arm: Assembly fails with "branch out of range or not a multiple of 2"

2021-02-02 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98931

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

Summary|arm: Assembly fails with|[11 Regression] arm:
   |"branch out of range or not |Assembly fails with "branch
   |a multiple of 2"|out of range or not a
   ||multiple of 2"
   Target Milestone|--- |11.0
  Known to work||10.2.1
 Status|UNCONFIRMED |NEW
 CC||ktkachov at gcc dot gnu.org
 Ever confirmed|0   |1
   Priority|P3  |P1
   Last reconfirmed||2021-02-02

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Confirmed. Assembles fine on GCC 10.

[Bug tree-optimization/98949] gcc-9.3 aarch64 -ftree-vectorize generates wrong code

2021-02-05 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98949

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org

--- Comment #4 from ktkachov at gcc dot gnu.org ---
I can confirm that the commit
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=1ab88985631dd2c5a5e3b5c0dce47cf8b6ed2f82
from PR97236 fixes the abort here.

[Bug middle-end/98974] [11 Regression] ICE in vectorizable_condition after STMT_VINFO_VEC_STMTS

2021-02-05 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98974

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

   Last reconfirmed||2021-02-05
  Known to fail||11.0
 Status|UNCONFIRMED |NEW
 CC||ktkachov at gcc dot gnu.org
Summary|ICE in  |[11 Regression] ICE in
   |vectorizable_condition  |vectorizable_condition
   |after STMT_VINFO_VEC_STMTS  |after STMT_VINFO_VEC_STMTS
   Priority|P3  |P1
   Target Milestone|--- |11.0
 Ever confirmed|0   |1
 Target||aarch64
  Known to work||10.2.1

--- Comment #2 from ktkachov at gcc dot gnu.org ---
Confirmed. This affects building 521.wrf_r from SPEC2017 with LTO.
