from:"jgreenhalgh at gcc dot gnu.org"

[Bug c/95133] [9/10/11 Regression] ICE in gimple_redirect_edge_and_branch_force, at tree-cfg.c:6075

2020-05-14 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95133

--- Comment #2 from James Greenhalgh  ---
Should reproduce further back if you force it on with -ftree-vectorize .

i.e.

gcc foo.c -ftree-vectorize -O3

Breaks somewhere between:

gcc version 7.0.0 20160615
gcc version 7.0.0 20160907

[Bug target/96313] [AArch64] vqmovun* return types should be unsigned

2020-07-27 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96313

James Greenhalgh  changed:

   What|Removed |Added

 CC||jgreenhalgh at gcc dot gnu.org
 Status|WAITING |NEW

--- Comment #2 from James Greenhalgh  ---
Confirmed by inspection; types in arm_neon.h are:

  int8_t vqmovunh_s16 (int16_t __a)
  int16_t vqmovuns_s32 (int32_t __a)
  int32_t vqmovund_s64 (int64_t __a)

Types in the documentation are:

  uint8_t vqmovunh_s16 (int16_t a)
  uint16_t vqmovuns_s32 (int32_t a)
  uint32_t vqmovund_s64 (int64_t a)
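
To illustrate why the return type matters, here is a minimal scalar sketch of a signed-to-unsigned saturating narrow. The helper names model_vqmovunh_s16 and viewed_as_int8 are made up for illustration, and this is not the arm_neon.h implementation. It only assumes the documented behaviour of clamping a signed 16-bit input to the unsigned 8-bit range.

```c
#include <stdint.h>

/* Hypothetical scalar model (not taken from arm_neon.h): saturate a signed
   16-bit value to the unsigned 8-bit range [0, 255].  */
uint8_t
model_vqmovunh_s16 (int16_t a)
{
  if (a < 0)
    return 0;
  if (a > UINT8_MAX)
    return UINT8_MAX;
  return (uint8_t) a;
}

/* The same result read through a signed 8-bit return type.  Converting an
   out-of-range value to int8_t is implementation-defined; GCC wraps modulo
   256, so saturated results above 127 come back negative.  */
int
viewed_as_int8 (int16_t a)
{
  return (int8_t) model_vqmovunh_s16 (a);
}
```

With a signed return type, a caller comparing the result against 0 or 255 would likely get surprising answers for inputs above 127.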

[Bug libstdc++/96958] New: Long Double in Hash Table policy forces soft-float calculations

2020-09-07 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96958

Bug ID: 96958
   Summary: Long Double in Hash Table policy forces soft-float
calculations
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jgreenhalgh at gcc dot gnu.org
  Target Milestone: ---

It was pointed out that some forks of GCC (
https://github.com/FEX-Emu/gcc/commit/8a2b7389f50a50a4e26ec98101d47fb1fc1c1bcd
) reduce the hashtable policy implementation from a long double to a double.
Doing this reduces it from a soft-float calculation to hardware floating-point.

Reading the discussion on libstdc++ from when this code was introduced, the
intention was to provide massive amounts of forwards compatibility for Very Big
hash tables. We're taking quite an efficiency hit for that future proofing.
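
As a rough sketch of the trade-off (this is not the libstdc++ code, and both function names are hypothetical), the kind of calculation involved is a bucket count derived from an element count and a load factor. On targets where long double is a 128-bit format without hardware support, the first version turns the conversions and the division into library calls, while the second stays in hardware floating-point.

```c
#include <stddef.h>

/* Hypothetical helper: smallest bucket count that keeps n elements at or
   below max_load, computed in long double.  */
size_t
min_buckets_long_double (size_t n, float max_load)
{
  long double q = (long double) n / (long double) max_load;
  size_t b = (size_t) q;
  /* Round up if the truncation discarded a fractional part.  */
  return (long double) b < q ? b + 1 : b;
}

/* The same calculation in double.  */
size_t
min_buckets_double (size_t n, float max_load)
{
  double q = (double) n / (double) max_load;
  size_t b = (size_t) q;
  return (double) b < q ? b + 1 : b;
}
```

A double represents every integer up to 2^53 exactly, so the two versions should only start to disagree for element counts far beyond what fits in memory today.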

[Bug libstdc++/96958] Long Double in Hash Table policy forces soft-float calculations

2020-09-07 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96958

--- Comment #1 from James Greenhalgh  ---
Asleep at the wheel today. I had intended to link to the original discussion
( https://gcc.gnu.org/pipermail/libstdc++/2011-September/036420.html ) rather
than leave it as a tedious exercise for the reader.

[Bug target/57586] New: ICE when expanding volatile asm using unaligned pointer

2013-06-11 Thread jgreenhalgh at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57586

Bug ID: 57586
   Summary: ICE when expanding volatile asm using unaligned
pointer
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: major
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jgreenhalgh at gcc dot gnu.org

Created attachment 30290
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30290&action=edit
Reduced testcase

Using built-in specs.
COLLECT_GCC=../build-arm-none-eabi/install/bin/arm-none-eabi-gcc
COLLECT_LTO_WRAPPER=/work/build-arm-none-eabi/install/libexec/gcc/arm-none-eabi/4.9.0/lto-wrapper
Target: arm-none-eabi
Configured with: /work/src/gcc/configure --target=arm-none-eabi
--prefix=/work/build-arm-none-eabi/install
--with-gmp=/work/build-arm-none-eabi/host-tools
--with-mpfr=/work/build-arm-none-eabi/host-tools
--with-mpc=/work/build-arm-none-eabi/host-tools --with-pkgversion=unknown
--disable-shared --disable-nls --disable-threads --disable-tls
--enable-checking=yes --enable-languages=c,c++ --with-newlib
Thread model: single
gcc version 4.9.0 20130326 (experimental) (unknown)

../build-arm-none-eabi/install/bin/arm-none-eabi-gcc ../testcases/pr-reduced.c
-O1
../testcases/pr-reduced.c: In function 'foo':
../testcases/pr-reduced.c:12:3: error: output number 0 not directly addressable
   __asm__ __volatile__("": "+m" (c->x) : "r" (&c->x) : );
   ^
../testcases/pr-reduced.c:12:3: internal compiler error: in
expand_asm_operands, at stmt.c:910
0x8c1be8 expand_asm_operands
/work/oban-dev/src/gcc/gcc/stmt.c:910
0x8c28a7 expand_asm_stmt(gimple_statement_d*)
/work/oban-dev/src/gcc/gcc/stmt.c:1151
0x5dfe5f expand_gimple_stmt_1
/work/oban-dev/src/gcc/gcc/cfgexpand.c:2154
0x5dfe5f expand_gimple_stmt
/work/oban-dev/src/gcc/gcc/cfgexpand.c:2309
0x5e1b69 expand_gimple_basic_block
/work/oban-dev/src/gcc/gcc/cfgexpand.c:4143
0x5e4a33 gimple_expand_cfg
/work/oban-dev/src/gcc/gcc/cfgexpand.c:4662
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.

No difference with -mno-unaligned-access or -maligned-access.

A manifestation of this bug prevents a Linux Kernel build.

[Bug target/57586] ICE when expanding volatile asm using unaligned pointer

2013-06-11 Thread jgreenhalgh at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57586

--- Comment #1 from jgreenhalgh at gcc dot gnu.org ---
A bisect shows that this bug first occurs after r197095:

2013-03-26  Richard Biener  

* emit-rtl.c (set_mem_attributes_minus_bitpos): Remove
alignment computations and rely on get_object_alignment_1
for the !TYPE_P case.
Commonize DECL/COMPONENT_REF handling in the ARRAY_REF path.

[Bug target/57586] ICE when expanding volatile asm using unaligned pointer

2013-06-11 Thread jgreenhalgh at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57586

jgreenhalgh at gcc dot gnu.org changed:

   What|Removed |Added

 CC||jgreenhalgh at gcc dot gnu.org

--- Comment #2 from jgreenhalgh at gcc dot gnu.org ---
Created attachment 30292
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30292&action=edit
Working testcase

Modifying the '1' in:

  counter *c = &((counter_wrapper *)(1))->y; 

To something more aligned like a '4' as in the attached file and in:

  counter *c = &((counter_wrapper *)(4))->y; 

causes compilation to proceed as expected without an ICE.
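
For illustration only, the difference between the two base addresses can be checked with plain integer arithmetic. The struct layouts below are hypothetical stand-ins (the real definitions are in the attachments and may differ), and member_is_aligned is a made-up helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the types in the attached testcase.  */
typedef struct { int x; } counter;
typedef struct { int pad; counter y; } counter_wrapper;

/* Return 1 if the address of member y, for a counter_wrapper placed at
   BASE, is suitably aligned for a counter.  The arithmetic is done on
   integers so that no invalid pointer is ever formed.  */
int
member_is_aligned (uintptr_t base)
{
  uintptr_t addr = base + offsetof (counter_wrapper, y);
  return addr % _Alignof (counter) == 0;
}
```

Under these assumed layouts a base of 1 gives a misaligned member address, while a base of 4 gives an aligned one, which matches the failing and working testcases.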

[Bug target/57586] ICE when expanding volatile asm using unaligned pointer

2013-06-11 Thread jgreenhalgh at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57586

--- Comment #4 from jgreenhalgh at gcc dot gnu.org ---
Created attachment 30293
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30293&action=edit
Less reduced failing testcase

Yes, the same thing happens for packed versions of those structs. Perhaps the
attached, less-reduced, version of the testcase will make the issue more clear.
This expanded testcase fails with the same error and ICE:

../build-arm-none-eabi/install/bin/arm-none-eabi-gcc
../testcases/pr-less-reduced.c -O1 -Wall
../testcases/pr-less-reduced.c: In function 'inet_rtm_getroute':
../testcases/pr-less-reduced.c:22:3: error: output number 0 not directly
addressable
   __asm__ __volatile__(""
   ^
../testcases/pr-less-reduced.c:22:3: internal compiler error: in
expand_asm_operands, at stmt.c:910
0x8c1be8 expand_asm_operands
/work/oban-dev/src/gcc/gcc/stmt.c:910
0x8c28a7 expand_asm_stmt(gimple_statement_d*)
/work/oban-dev/src/gcc/gcc/stmt.c:1151
0x5dfe5f expand_gimple_stmt_1
/work/oban-dev/src/gcc/gcc/cfgexpand.c:2154
0x5dfe5f expand_gimple_stmt
/work/oban-dev/src/gcc/gcc/cfgexpand.c:2309
0x5e1b69 expand_gimple_basic_block
/work/oban-dev/src/gcc/gcc/cfgexpand.c:4143
0x5e4a33 gimple_expand_cfg
/work/oban-dev/src/gcc/gcc/cfgexpand.c:4662
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.

[Bug middle-end/58106] ICE: in ipa_edge_duplication_hook, at ipa-prop.c:2839

2013-08-09 Thread jgreenhalgh at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58106

jgreenhalgh at gcc dot gnu.org changed:

   What|Removed |Added

   Last reconfirmed||2013-08-09
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #1 from jgreenhalgh at gcc dot gnu.org ---
Confirmed on aarch64-none-elf.

[Bug rtl-optimization/58383] New: ICE when RTL folds vector operations using constants after gen_int_mode changes

2013-09-10 Thread jgreenhalgh at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58383

Bug ID: 58383
   Summary: ICE when RTL folds vector operations using constants
after gen_int_mode changes
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jgreenhalgh at gcc dot gnu.org

The patch set around [1/4] Using gen_int_mode instead of GEN_INT causes a
number of similar regressions when building for AArch64.

To pick one example, when building gcc.target/aarch64/vect-fcm-eq-d.c we can
get into the situation where simplify_unary_expression_1 is trying to simplify
(V2DI: NOT (NEG X)) and will thus try to generate (V2DI: PLUS (X - 1)).

Now we will call plus_constant, and from there gen_int_mode (-1, v2di). From
here we call trunc_int_for_mode (-1, v2di) and trigger the assert:

   /* You want to truncate to a _what_?  */
   gcc_assert (SCALAR_INT_MODE_P (mode));
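
The simplification itself rests on the two's complement identity ~(-x) == x - 1. A scalar sketch of that identity is below (the function names are illustrative only). In a vector mode the same rewrite needs a vector of -1 constants, which is presumably why asking gen_int_mode for a single integer in V2DImode trips the assert.

```c
#include <stdint.h>

/* NOT (NEG x): unsigned arithmetic keeps the negation well defined.  */
uint64_t
not_of_neg (uint64_t x)
{
  return ~(-x);
}

/* PLUS (x, -1), the form the simplifier wants to generate.  */
uint64_t
plus_minus_one (uint64_t x)
{
  return x - 1;
}
```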

The failures eventually look like:

In file included from
../src/gcc/gcc/testsuite/gcc.target/aarch64/vect-fcm-eq-d.c:9:0:
../src/gcc/gcc/testsuite/gcc.target/aarch64/vect-fcm.x: In function 'foo':
../src/gcc/gcc/testsuite/gcc.target/aarch64/vect-fcm.x:25:1: internal compiler
error: in trunc_int_for_mode, at explow.c:55
 }
 ^
0x6abc8e trunc_int_for_mode(long, machine_mode)
/work/gcc-dev/src/gcc/gcc/explow.c:55
0x69bb28 gen_int_mode(long, machine_mode)
/work/gcc-dev/src/gcc/gcc/emit-rtl.c:420
0x6abcf2 plus_constant
/work/gcc-dev/src/gcc/gcc/explow.c:189
0x6abcf2 plus_constant
/work/gcc-dev/src/gcc/gcc/explow.c:79
0x8f107f simplify_gen_unary(rtx_code, machine_mode, rtx_def*, machine_mode)
/work/gcc-dev/src/gcc/gcc/simplify-rtx.c:369
0xc55e09 propagate_rtx_1
/work/gcc-dev/src/gcc/gcc/fwprop.c:490
0xc55e6f propagate_rtx_1
/work/gcc-dev/src/gcc/gcc/fwprop.c:497
0xc55e86 propagate_rtx_1
/work/gcc-dev/src/gcc/gcc/fwprop.c:498
0xc56409 propagate_rtx
/work/gcc-dev/src/gcc/gcc/fwprop.c:675
0xc57dff forward_propagate_and_simplify
/work/gcc-dev/src/gcc/gcc/fwprop.c:1337
0xc57dff forward_propagate_into
/work/gcc-dev/src/gcc/gcc/fwprop.c:1394
0xc58593 forward_propagate_into
/work/gcc-dev/src/gcc/gcc/fwprop.c:1359
0xc58593 fwprop
/work/gcc-dev/src/gcc/gcc/fwprop.c:1479
0xc58593 execute
/work/gcc-dev/src/gcc/gcc/fwprop.c:1515
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.

[Bug rtl-optimization/58383] ICE when RTL folds vector operations using constants after gen_int_mode changes

2013-09-10 Thread jgreenhalgh at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58383

jgreenhalgh at gcc dot gnu.org changed:

   What|Removed |Added

 CC||jgreenhalgh at gcc dot gnu.org

--- Comment #1 from jgreenhalgh at gcc dot gnu.org ---
Created attachment 30788
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30788&action=edit
Proposed fix

A patch along these lines works for me, covering the case where gen_int_mode is
called to generate a vector integer.

[Bug tree-optimization/58553] New: New fail in PASS->FAIL: gcc.c-torture/execute/memcpy-2.c execution on arm and aarch64

2013-09-27 Thread jgreenhalgh at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58553

Bug ID: 58553
   Summary: New fail in PASS->FAIL:
gcc.c-torture/execute/memcpy-2.c execution on arm and
aarch64
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jgreenhalgh at gcc dot gnu.org

Created attachment 30917
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30917&action=edit
Preprocessed source

Jeff's change to the Jump-Threading code here:
http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01910.html

Introduced a regression for arm and aarch64 in 
gcc.c-torture/execute/memcpy-2.c, such that I now see:

 *** EXIT code
 emu: host signal 0

When executing the testcase on a model with command line:

/work/gcc-clean/build-arm-none-eabi/install/bin/arm-none-eabi-gcc
-B/work/gcc-clean/build-arm-none-eabi/obj/gcc2/gcc/
/work/gcc-clean/src/gcc/gcc/testsuite/gcc.c-torture/execute/memcpy-2.c
-fno-diagnostics-show-caret -fdiagnostics-color=never -w -O3 -g
-Wa,-mno-warn-deprecated -lm -marm -march=armv7-a -mfpu=vfpv3-d16
-mfloat-abi=softfp -o
/work/gcc-clean/build-arm-none-eabi/obj/gcc2/gcc/testsuite/gcc/memcpy-2.x
-save-temps

I've attached the preprocessed source and the output from
-fdump-tree-dom1-details

[Bug tree-optimization/58553] New fail in PASS->FAIL: gcc.c-torture/execute/memcpy-2.c execution on arm and aarch64

2013-09-27 Thread jgreenhalgh at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58553

--- Comment #1 from jgreenhalgh at gcc dot gnu.org ---
Created attachment 30918
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30918&action=edit
Output of dom1

[Bug middle-end/59037] ICE when accessing invalid element (nelts + 1) of vector

2013-11-07 Thread jgreenhalgh at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59037

jgreenhalgh at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2013-11-07
 CC||jgreenhalgh at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #2 from jgreenhalgh at gcc dot gnu.org ---
Reproduced on aarch64-none-elf and arm-none-eabi.

[Bug tree-optimization/54742] Switch elimination in FSM loop

2013-11-27 Thread jgreenhalgh at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54742

jgreenhalgh at gcc dot gnu.org changed:

   What|Removed |Added

 CC||jgreenhalgh at gcc dot gnu.org

--- Comment #27 from jgreenhalgh at gcc dot gnu.org ---
Created attachment 31308
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31308&action=edit
Dumps for less reduced testcase in comment 27

As of revision 205398, I'm not seeing this optimisation trigger when compiling
the benchmark in question.

I've attached the dumps from a less aggressively reduced version of the testcase
given in the initial report, which we don't currently thread.

This testcase is more representative of the control structure in the benchmark
code. In particular, we have the problematic scenario of two 'joiner' blocks in
the thread path.

Looking at the dumps for this testcase I think that we would need to spot
threads like:

  (17, 23) incoming edge; (23, 4) joiner; (4, 5) joiner; (5, 8) back-edge; (8,
15) switch-statement;

The testcase I am using is:

---

int sum0, sum1, sum2, sum3;
int foo(char * s, char** ret)
{
  int state=0;
  char c;

  for (; *s && state != 4; s++)
    {
      c = *s;
      if (c == '*')
        {
          s++;
          break;
        }
      switch (state) {
        case 0:
          if (c == '+') state = 1;
          else if (c != '-') sum0+=c;
          break;
        case 1:
          if (c == '+') state = 2;
          else if (c == '-') state = 0;
          else sum1+=c;
          break;
        case 2:
          if (c == '+') state = 3;
          else if (c == '-') state = 1;
          else sum2+=c;
          break;
        case 3:
          if (c == '-') state = 2;
          else if (c == 'x') state = 4;
          break;
        default:
          break;
      }
    }
  *ret = s;
  return state;
}
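
To make the goal of the optimisation concrete, here is a hand-written sketch on a much smaller, hypothetical two-state machine (not the testcase above). count_switch re-dispatches on the state variable every iteration. count_threaded shows roughly the control flow we would hope jump threading could reach, where each state transition jumps straight to the code for the next state.

```c
/* Count characters seen while in state 1.  '+' enters state 1 and '-'
   leaves it.  The switch is re-evaluated on every iteration.  */
int
count_switch (const char *s)
{
  int state = 0, n = 0;

  for (; *s; s++)
    switch (state)
      {
      case 0:
        if (*s == '+') state = 1;
        break;
      case 1:
        if (*s == '-') state = 0;
        else n++;
        break;
      }
  return n;
}

/* The same machine with the switch eliminated by hand: the state is
   encoded in the program counter rather than in a variable.  */
int
count_threaded (const char *s)
{
  int n = 0;

state0:
  for (; *s; s++)
    if (*s == '+')
      {
        s++;
        goto state1;
      }
  return n;

state1:
  for (; *s; s++)
    {
      if (*s == '-')
        {
          s++;
          goto state0;
        }
      n++;
    }
  return n;
}
```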

[Bug tree-optimization/54742] Switch elimination in FSM loop

2013-11-27 Thread jgreenhalgh at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54742

jgreenhalgh at gcc dot gnu.org changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|FIXED   |---

--- Comment #28 from jgreenhalgh at gcc dot gnu.org ---
I've REOPENED this bug for the less-reduced testcase given in #27.

If anyone has objections, or thinks it would be more appropriate, I can open a
new bug.

[Bug tree-optimization/19794] [meta-bug] Jump threading related bugs

2013-11-27 Thread jgreenhalgh at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19794

Bug 19794 depends on bug 54742, which changed state.

Bug 54742 Summary: Switch elimination in FSM loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54742

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|FIXED   |---

[Bug tree-optimization/59471] New: ICE using vector extensions (non-top-level BIT_FIELD_REF, IMAGPART_EXPR or REALPART_EXPR)

2013-12-11 Thread jgreenhalgh at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59471

Bug ID: 59471
   Summary: ICE using vector extensions (non-top-level
BIT_FIELD_REF, IMAGPART_EXPR or REALPART_EXPR)
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jgreenhalgh at gcc dot gnu.org

The following code:

typedef unsigned char uint8x8_t
  __attribute__ ((__vector_size__ (8)));

typedef unsigned short uint16x8_t
  __attribute__ ((__vector_size__ (16)));

typedef unsigned long uint64x2_t
  __attribute__ ((__vector_size__ (16)));

uint8x8_t
foo (uint16x8_t x)
{
  return (uint8x8_t) ((uint64x2_t) x)[0];
}

Will give this ICE for current trunk on AArch64, ARM and X86_64:

/work/build-x86/install/bin/gcc ../testcases/view-convert-expr.c -O3
../testcases/view-convert-expr.c: In function ‘foo’:
../testcases/view-convert-expr.c:11:1: error: non-top-level BIT_FIELD_REF,
IMAGPART_EXPR or REALPART_EXPR
 foo (uint16x8_t x)
 ^
BIT_FIELD_REF (x), 64, 0>
../testcases/view-convert-expr.c:13:3: note: in statement
   return (uint8x8_t) ((uint64x2_t) x)[0];
   ^
D.1792 = VIEW_CONVERT_EXPR(BIT_FIELD_REF
(x), 64, 0>);
../testcases/view-convert-expr.c:11:1: internal compiler error: verify_gimple
failed
 foo (uint16x8_t x)
 ^
0x9b5a5a verify_gimple_in_cfg(function*)
../../src/gcc/gcc/tree-cfg.c:4837
0x8df347 execute_function_todo
../../src/gcc/gcc/passes.c:1847
0x8dfb73 execute_todo
../../src/gcc/gcc/passes.c:1877
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.

Looking at -fdump-tree-all-raw, I see this expression in
view-convert-expr.c.004t.gimple:

foo (uint16x8_t x)
gimple_bind <
  uint8x8_t D.1792;

  gimple_assign (BIT_FIELD_REF (x), 64, 0>), NULL, NULL>
  gimple_return 
>

For reference, my x86 compiler was configured as:

Configured with: ../src/gcc/configure --prefix=/work/build-x86/install
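
For reference, the expression in the report takes element 0 of the 64-bit view of x and reinterprets its bits as eight bytes. A sketch of the same operation written with memcpy is below. The function name low_half_as_bytes is hypothetical, and I have not checked whether this form avoids the ICE on the affected revisions.

```c
#include <string.h>

typedef unsigned char uint8x8_t
  __attribute__ ((__vector_size__ (8)));

typedef unsigned short uint16x8_t
  __attribute__ ((__vector_size__ (16)));

/* Copy the first 8 bytes of X, which hold element 0 of the 64-bit view,
   into an 8-byte vector.  */
uint8x8_t
low_half_as_bytes (uint16x8_t x)
{
  uint8x8_t r;
  memcpy (&r, &x, sizeof r);
  return r;
}
```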

[Bug c/88887] New: Warn on unexpected continuation of 'return' to new line in if statement.

2019-01-16 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88887

Bug ID: 88887
   Summary: Warn on unexpected continuation of 'return' to new
line in if statement.
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jgreenhalgh at gcc dot gnu.org
  Target Milestone: ---

A colleague tripped up on this typo:

  void bar();
  void
  foo (int x)
  {
    if (x) return

    bar ();
  }

Their intention was to return immediately if (x) holds, but they missed the
semicolon after 'return' and because bar() is declared with a void return type
didn't hit any warnings.

In my opinion, it would be reasonable for -Wmisleading-indentation to cover a
case like this. The related case:

  void
  foo2 (int x)
  {
    if (x)
      return

    bar ();
  }

Could also be warned.
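
To spell out what happens, here is a sketch with an observable side effect. The counter bar_calls and the function names are additions for illustration. Note that returning a void expression from a void function is accepted by GNU C without a diagnostic by default, although ISO C does not strictly allow it, which is why nothing is reported.

```c
int bar_calls;

void
bar (void)
{
  bar_calls++;
}

/* As written: with no semicolon after 'return', the call to bar becomes
   the operand of the return statement, so it is parsed as
   "if (x) return bar ();" and bar runs only when x is nonzero.  */
void
foo_as_written (int x)
{
  if (x) return

  bar ();
}

/* As intended: return early when x holds, otherwise call bar.  */
void
foo_intended (int x)
{
  if (x) return;

  bar ();
}
```

The two functions behave in exactly opposite ways, yet the source differs by a single character.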

[Bug c++/85466] Performance is slow when doing 'branchless' conditional style math operations

2018-04-19 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85466

James Greenhalgh  changed:

   What|Removed |Added

 CC||jgreenhalgh at gcc dot gnu.org

--- Comment #3 from James Greenhalgh  ---
Created attachment 43988
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43988&action=edit
Reduced testcase

I believe this testcase shows the issue being reported here. Clang seems to
spot this is essentially a memset across the array, while GCC doesn't.

On AArch64 with Clang:

  .LBB1_9:                // =>This Inner Loop Header: Depth=1
        stp     q0, q0, [x8, #-16]
        subs    x20, x20, #8            // =8
        add     x8, x8, #32             // =32
        b.ne    .LBB1_9

On x86-64 with Clang:

  .LBB1_9:                # =>This Inner Loop Header: Depth=1
        movups  %xmm0, -144(%rax,%rcx,4)
        movups  %xmm0, -128(%rax,%rcx,4)
        movups  %xmm0, -112(%rax,%rcx,4)
        movups  %xmm0, -96(%rax,%rcx,4)
        movups  %xmm0, -80(%rax,%rcx,4)
        movups  %xmm0, -64(%rax,%rcx,4)
        movups  %xmm0, -48(%rax,%rcx,4)
        movups  %xmm0, -32(%rax,%rcx,4)
        movups  %xmm0, -16(%rax,%rcx,4)
        movups  %xmm0, (%rax,%rcx,4)
        addq    $40, %rcx
        cmpq    $100036, %rcx   # imm = 0x186C4
        jne     .LBB1_9

GCC doesn't spot this.

On the other hand G++'s inlining of the various random number initialisation
routines really hammers Clang, which ends up emulating 128-bit arithmetic on
AArch64.

[Bug libstdc++/85466] Performance is slow when doing 'branchless' conditional style math operations

2018-04-19 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85466

--- Comment #11 from James Greenhalgh  ---
With Jonathon's suggested change, copied in to the original poster's framework
(without -fno-trapping-math), Clang hot loop ( score: 165065
http://quick-bench.com/6NaD8ay0f8qMh9n0aMriYEiuKNA ) is:

0.16%  movups 0x61a80(%r15,%rax,4),%xmm6
1.15%  movups 0x61a90(%r15,%rax,4),%xmm7
0.60%  movaps %xmm1,%xmm3
5.44%  cmpltps %xmm6,%xmm3
0.44%  movaps %xmm1,%xmm6
0.40%  cmpltps %xmm7,%xmm6
0.44%  movaps %xmm5,%xmm7
4.97%  andps  %xmm3,%xmm7
0.20%  andnps %xmm4,%xmm3
0.36%  orps   %xmm7,%xmm3
1.04%  movaps %xmm5,%xmm7
4.97%  andps  %xmm6,%xmm7
0.11%  andnps %xmm4,%xmm6
4.95%  orps   %xmm7,%xmm6
5.53%  movups %xmm3,0x61a80(%rbx,%rax,4)
0.47%  movups %xmm6,0x61a90(%rbx,%rax,4)
4.42%  movups 0x61aa0(%r15,%rax,4),%xmm3
20.42% movups 0x61ab0(%r15,%rax,4),%xmm6
1.00%  movaps %xmm1,%xmm7
0.49%  cmpltps %xmm3,%xmm7
9.79%  movaps %xmm1,%xmm3
0.16%  cmpltps %xmm6,%xmm3
2.26%  movaps %xmm5,%xmm6
0.60%  andps  %xmm7,%xmm6
4.20%  andnps %xmm4,%xmm7
1.18%  orps   %xmm6,%xmm7
2.22%  movaps %xmm5,%xmm6
0.47%  andps  %xmm3,%xmm6
4.24%  andnps %xmm4,%xmm3
4.88%  movups %xmm7,0x61aa0(%rbx,%rax,4)
0.27%  orps   %xmm6,%xmm3
5.22%  movups %xmm3,0x61ab0(%rbx,%rax,4)
6.02%  add    $0x10,%rax
       jne    405b30

GCC hot loop ( score: 2385754
http://quick-bench.com/ehLe-aqkpXkkx2sHLd6TWq_p4g4 ) is:

0.56%  movss  0x0(%rbp,%rdx,1),%xmm0
1.47%  xor    %eax,%eax
2.00%  subss  %xmm2,%xmm0
7.02%  ucomiss %xmm1,%xmm0
6.77%  seta   %al
4.96%  xor    %ecx,%ecx
0.25%  ucomiss %xmm0,%xmm1
0.84%  pxor   %xmm0,%xmm0
0.09%  seta   %cl
5.40%  sub    %ecx,%eax
3.22%  cvtsi2ss %eax,%xmm0
9.87%  ucomiss %xmm0,%xmm1
6.53%  ja 4053a8 
10.24% mulss  %xmm4,%xmm0
11.55% addss  %xmm3,%xmm0
5.46%  movss  %xmm0,(%rbx,%rdx,1)
2.00%  add    $0x4,%rdx
       cmp    $0x61a80,%rdx
       jne    405350

Daniel Elliott, does that better match your expectations? If so, I think this
can be resolved as missed optimization of invalid code.

[Bug middle-end/85682] New: Regression: gcc.dg/tree-ssa/prefetch-5.c at r259995

2018-05-07 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85682

Bug ID: 85682
   Summary: Regression: gcc.dg/tree-ssa/prefetch-5.c at r259995
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: luis.machado at linaro dot org
  Reporter: jgreenhalgh at gcc dot gnu.org
CC: hjl.tools at gmail dot com, law at redhat dot com,
luis.machado at linaro dot org
  Target Milestone: ---
Target: x86-64-none-linux-gnu

Hi, our bisect robot spotted failures in

  gcc.dg/tree-ssa/prefetch-9.c
  gcc.dg/tree-ssa/prefetch-8.c
  gcc.dg/tree-ssa/prefetch-7.c
  gcc.dg/tree-ssa/prefetch-6.c
  gcc.dg/tree-ssa/prefetch-3.c
  gcc.target/i386/opt-1.c
  gcc.target/i386/opt-2.c
  gcc.dg/tree-ssa/loop-28.c
  gcc.dg/tree-ssa/prefetch-5.c

after revision r259995 on x86-64-none-linux-gnu. Would you mind taking a look?

[Bug middle-end/85682] Regression: gcc.dg/tree-ssa/prefetch-5.c at r259995

2018-05-16 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85682

--- Comment #3 from James Greenhalgh  ---
The bisect robot doesn't bootstrap; it only builds a stage 1 compiler.

I've checked your most recent patch against these testcases, and they execute
and complete fine.

(In reply to Luis Machado from comment #2)
> I did a fresh x86-64 bootstrap with the changes in and those prefetch tests
> are not executed as part of dg.exp. Running by hand they look sane to me.

I'm sure this is just a typo, but you probably didn't mean "dg.exp" in this
case - the prefetch tests are in tree-ssa.exp and the opt-1 and opt-2 tests are
in i386.exp .

[Bug target/83663] [8 regression] aarch64_be regressions after r255946

2018-01-03 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83663

James Greenhalgh  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2018-01-03
 CC||jgreenhalgh at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |jgreenhalgh at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from James Greenhalgh  ---
I spotted this too. The problem is (as it always is for big-endian vectors in
GCC) the mismatch in lane numbering between our architecture and GCC's
numbering. I'm working on a patch. Sorry for the inconvenience.
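
As a rough illustration of the kind of remapping involved, and assuming the mismatch is a simple reversal of lane order within the vector, the conversion can be sketched as below. The helper name arch_lane and its parameters are hypothetical and are not taken from the AArch64 back end.

```c
/* Hypothetical helper: translate a GCC lane number into an architectural
   lane number for a vector of NUNITS elements.  On little-endian the two
   numberings agree; on big-endian they are assumed to run in opposite
   directions.  */
unsigned
arch_lane (unsigned nunits, unsigned gcc_lane, int big_endian)
{
  return big_endian ? nunits - 1 - gcc_lane : gcc_lane;
}
```

Any code path that forgets to apply such a mapping works on little-endian and silently picks the wrong lane on big-endian.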

[Bug middle-end/84040] [8 regression] compilation time of gcc.c-torture/compile/limits-blockid.c is 50x slower

2018-01-25 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84040

James Greenhalgh  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2018-01-25
 CC||jgreenhalgh at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #2 from James Greenhalgh  ---
Confirmed on aarch64-none-linux-gnu. My bisect pointed to the same revision
r255569 . The 50x slow-down is surprising, and may be much larger than
expected? Otherwise we could work around this with -gno-statement-frontiers for
this test.

[Bug lto/84242] New: [8 Regression] g++.dg/torture/pr67600.C at r257412

2018-02-06 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84242

Bug ID: 84242
   Summary: [8 Regression] g++.dg/torture/pr67600.C at r257412
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jgreenhalgh at gcc dot gnu.org
CC: hubicka at gcc dot gnu.org, marxin at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64-none-linux-gnu, x86-64-none-linux-gnu

Hi

Our testing robot spotted a failure in g++.dg/torture/pr67600.C, after revision
r257412 on aarch64-none-linux-gnu and x86-64-none-linux-gnu. Would you mind
taking a look?

[Bug lto/84242] [8 Regression] g++.dg/torture/pr67600.C at r257412

2018-02-06 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84242

--- Comment #1 from James Greenhalgh  ---
Also gcc.target/i386/mvc9.c on x86-64-none-linux-gnu.

[Bug testsuite/84243] New: [8 Regression] gcc.target/i386/cet-intrin-4.c at r257414

2018-02-06 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84243

Bug ID: 84243
   Summary: [8 Regression] gcc.target/i386/cet-intrin-4.c at
r257414
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jgreenhalgh at gcc dot gnu.org
CC: itsimbal at gcc dot gnu.org
  Target Milestone: ---
Target: x86-64-none-linux-gnu, aarch64-none-linux-gnu

Hi, our bisect robot spotted failures in gcc.target/i386/cet-intrin-3.c and
gcc.target/i386/cet-intrin-4.c after revision r257414 on
x86-64-none-linux-gnu, and in c-c++-common/fcf-protection-6.c and
c-c++-common/fcf-protection-7.c on aarch64-none-linux-gnu. Would you mind
taking a look?

Your new tests will always FAIL on non-x86 targets (for example
aarch64-none-linux-gnu). Is dg-error really the right directive? That is a
guaranteed FAIL; I would expect a skip.
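
As a sketch of what I mean by a skip, a target selector on dg-do would restrict a test to x86 so that other targets report UNSUPPORTED rather than FAIL. The option and the function body below are placeholders, not the contents of the real tests.

```c
/* Sketch only: run this test on x86 targets and nowhere else.  */
/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
/* { dg-options "-fcf-protection" } */

/* Placeholder body; the real tests exercise the CET intrinsics.  */
int
cf_protected_function (void)
{
  return 0;
}
```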

[Bug testsuite/84243] [8 Regression] gcc.target/i386/cet-intrin-4.c at r257414

2018-02-06 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84243

--- Comment #2 from James Greenhalgh  ---
gcc -v:

  Configured with: .../gcc/configure --disable-bootstrap
--enable-languages=c,c++,fortran --disable-multilib --disable-libsanitizer
--prefix=.../build/install/ 

FAIL: gcc.target/i386/cet-intrin-3.c (internal compiler error)
FAIL: gcc.target/i386/cet-intrin-3.c (test for excess errors)
  Excess errors:
  .../build/gcc/include/pmmintrin.h:35:9: internal compiler error: in
ix86_option_override_internal, at config/i386/i386.c:4952
  0xfa1687 ix86_option_override_internal
.../gcc/config/i386/i386.c:4952
  0xfaf246 ix86_valid_target_attribute_tree(tree_node*, gcc_options*,
gcc_options*)
.../gcc/config/i386/i386.c:5656
  0x76b7cb ix86_pragma_target_parse
.../gcc/config/i386/i386-c.c:539
  0x743cd3 handle_pragma_target
.../gcc/c-family/c-pragma.c:907
  0x6c2349 c_parser_pragma
.../gcc/c/c-parser.c:11122
  0x6e600d c_parser_external_declaration
.../gcc/c/c-parser.c:1624
  0x6e6971 c_parser_translation_unit
.../gcc/c/c-parser.c:1524
  0x6e6971 c_parse_file()
.../gcc/c/c-parser.c:18410
  0x7417f5 c_common_parse_file()
.../gcc/c-family/c-opts.c:1132

FAIL: gcc.target/i386/cet-intrin-4.c (test for excess errors)
  Excess errors:
  cc1: error: '-fcf-protection=full' requires Intel CET support. Use -mcet or
both of -mibt and -mshstk options to enable CET

[Bug rtl-optimization/86685] [8/9 Regression] 436.cactusADM regression on aarch64

2018-07-30 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86685

James Greenhalgh  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2018-07-30
 CC||jgreenhalgh at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from James Greenhalgh  ---
On the platforms I'm looking at, this is equal to a 13% regression in dynamic
instruction count, and a code size regression in the key loop. Confirmed.

[Bug target/84521] [8 Regression] aarch64: Frame-pointer corruption with setjmp/longjmp and -fomit-frame-pointer

2018-02-22 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84521

James Greenhalgh  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2018-02-22
 CC||ramana.radhakrishnan at arm dot com
 Ever confirmed|0   |1

--- Comment #2 from James Greenhalgh  ---
It is a bug that we have changed to -fomit-frame-pointer by default for
AArch64. That changes a long standing ABI decision made at the dawn of the
port, and promised as a feature of the architecture. I would like to see this
fixed for GCC 8.

Ramana was testing a patch to fix this and change us back to
-fno-omit-frame-pointer; it (or someone else's patch achieving the same) would
be appreciated as the immediate fix for this issue.

I haven't validated the longer-term problem you mention with
-fomit-frame-pointer.

Ramana, can you pick this up and set us back to the appropriate default?
Otherwise, I can spin a patch. We should fix this urgently, or we miss the good
value that comes from whole-distribution testing.

[Bug tree-optimization/69556] New: [6 Regression] forwprop4/match.pd undoing work from recip

2016-01-29 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69556

Bug ID: 69556
   Summary: [6 Regression] forwprop4/match.pd undoing work from
recip
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jgreenhalgh at gcc dot gnu.org
  Target Milestone: ---

For this code compiled at -Ofast:

double bar (double, double, double, double, double);

double
foo (double a)
{
  return bar (1.0/a, 2.0/a, 4.0/a, 8.0/a, 16.0/a);
}


GCC 5 generates:

foo:
.LFB0:
.cfi_startproc
movsd   .LC0(%rip), %xmm1
movsd   .LC1(%rip), %xmm4
movsd   .LC2(%rip), %xmm3
divsd   %xmm0, %xmm1
movsd   .LC3(%rip), %xmm2
mulsd   %xmm1, %xmm4
movapd  %xmm1, %xmm0
mulsd   %xmm1, %xmm3
mulsd   %xmm1, %xmm2
addsd   %xmm1, %xmm1
jmp bar

(i.e. one divide, 4 multiplies)

GCC trunk at revision r232907 generates:

foo:
.LFB0:
.cfi_startproc
movapd  %xmm0, %xmm5
movsd   .LC0(%rip), %xmm4
movsd   .LC4(%rip), %xmm0
movsd   .LC1(%rip), %xmm3
movsd   .LC2(%rip), %xmm2
movsd   .LC3(%rip), %xmm1
divsd   %xmm5, %xmm0
divsd   %xmm5, %xmm4
divsd   %xmm5, %xmm3
divsd   %xmm5, %xmm2
divsd   %xmm5, %xmm1
jmp bar

(i.e. 5 divides)

This is bad for performance.

forwprop4 shows:

Applying pattern match.pd:453, gimple-match.c:32116
gimple_simplified to _2 = 1.6e+1 / a_1(D);
Applying pattern match.pd:453, gimple-match.c:32116
gimple_simplified to _3 = 8.0e+0 / a_1(D);
Applying pattern match.pd:453, gimple-match.c:32116
gimple_simplified to _4 = 4.0e+0 / a_1(D);
Applying pattern match.pd:453, gimple-match.c:32116
gimple_simplified to _5 = 2.0e+0 / a_1(D);


This starts with r229107 which moves the (C1/X)*C2 into (C1*C2)/X pattern from
fold-const.c to match.pd.
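
As an illustration of the difference (a hand-written sketch, not compiler
output; the function names are invented for this example), the two code shapes
correspond roughly to the C below. The second function does by hand what the
GCC 5 output above does: one division, then multiplications by the shared
reciprocal.

```c
#include <stddef.h>

/* Shape of the trunk output: five independent divisions by a.  */
void scale_by_division(double a, double out[5])
{
    static const double c[5] = { 1.0, 2.0, 4.0, 8.0, 16.0 };
    for (size_t i = 0; i < 5; i++)
        out[i] = c[i] / a;
}

/* Shape of the GCC 5 output: one division, then four multiplications
   by the shared reciprocal.  */
void scale_by_reciprocal(double a, double out[5])
{
    static const double c[5] = { 1.0, 2.0, 4.0, 8.0, 16.0 };
    double r = 1.0 / a;          /* the only division */
    out[0] = r;
    for (size_t i = 1; i < 5; i++)
        out[i] = c[i] * r;
}
```

In general the reciprocal form can differ from the division form in the last
bit, which is why I would only expect the compiler to do this under
-Ofast/-freciprocal-math.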

[Bug tree-optimization/69556] [6 Regression] forwprop4/match.pd undoing work from recip

2016-01-29 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69556

James Greenhalgh  changed:

   What|Removed |Added

 CC||jgreenhalgh at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org

--- Comment #3 from James Greenhalgh  ---
(In reply to Andrew Pinski from comment #2)
> (In reply to Andrew Pinski from comment #1)
> > I suspect we should disable "Fold (C1/X)*C2 into (C1*C2)/X" for gimple then
> > and have it only for generic.
> 
> Or check for single use of the divide.

I had thought that was what the :s in the first line of pattern was trying to
do:

  (simplify
   (mult (rdiv:s REAL_CST@0 @1) REAL_CST@2)
(if (flag_associative_math)
 (with
  { tree tem = const_binop (MULT_EXPR, type, @0, @2); }
  (if (tem)
   (rdiv { tem; } @1)

If I capture the rdiv, and explicitly check it for single_use (as in the
untested patch below), then the rule fails. So there's either a
misunderstanding/disagreement here about what :s implies, or the match.pd
machinery has a bug.

diff --git a/gcc/match.pd b/gcc/match.pd
index 5f28215..9460a9b 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -445,11 +445,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)

 /* Fold (C1/X)*C2 into (C1*C2)/X.  */
 (simplify
- (mult (rdiv:s REAL_CST@0 @1) REAL_CST@2)
+ (mult (rdiv:s@3 REAL_CST@0 @1) REAL_CST@2)
   (if (flag_associative_math)
(with
 { tree tem = const_binop (MULT_EXPR, type, @0, @2); }
-(if (tem)
+(if (tem && single_use (@3))
  (rdiv { tem; } @1)

 /* Convert C1/(X*C2) into (C1/C2)/X  */

[Bug rtl-optimization/69570] [6 Regression] if-conversion bug on i?86

2016-01-31 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69570

--- Comment #2 from James Greenhalgh  ---
(In reply to Jakub Jelinek from comment #1)
> I guess ifcvt only triggers some latent bug, either RA or more likely in
> reg-stack.  That said, all the comments about the r229822 changes say its
> purpose is to handle multiple sets in the conditional block, but clearly the
> patch as implemented considers one set to be also multiple sets.  The
> problem with that is that it handles it worse than the code later on in
> ifcvt - it uses temporaries and hopes later passes get rid of those
> temporaries, but they actually affect the register allocation.
> By restricting the ifcvt multiple sets coversion to actually multiple sets
> like:
> --- gcc/ifcvt.c.jj2016-01-21 17:53:32.0 +0100
> +++ gcc/ifcvt.c   2016-01-31 13:47:34.171323086 +0100
> @@ -3295,7 +3295,7 @@ bb_ok_for_noce_convert_multiple_sets (ba
>if (count > limit)
>  return false;
>  
> -  return count > 0;
> +  return count > 1;
>  }
>  
>  /* Given a simple IF-THEN-JOIN or IF-THEN-ELSE-JOIN block, attempt to
> convert
> the test passes again, after ifcvt there are no additional unneeded
> temporaries and e.g. postreload dump contains 5 fewer instruction, and has
> fewer spills/fills.  Of course we really need to figure out what the bug
> actually is, but unless there is some strong reason (which should be
> documented), IMHO the above patch is right too.

Yes, that patch makes sense to me. If other ifcvt paths are doing a better job
of handling a single register move than the multiple-set code, then we should
use them.

[Bug target/69671] New: [6 Regression] FAIL: gcc.target/i386/avx512vl-vpmovqb-1.c scan-assembler-times vpmovqb[ \\t]+[^{\n]%ymm[0-9]+[^\n]%xmm[0-9]+{%k[1-7]}{z}(?

2016-02-04 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69671

Bug ID: 69671
   Summary: [6 Regression] FAIL:
gcc.target/i386/avx512vl-vpmovqb-1.c
scan-assembler-times vpmovqb[
\\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jgreenhalgh at gcc dot gnu.org
CC: kyrylo.tkachov at arm dot com
  Target Milestone: ---
Target: x86_64-none-linux-gnu

Starts with r233133:

PASS->FAIL: gcc.target/i386/avx512vl-vpmovdb-1.c scan-assembler-times vpmovdb[
\\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(?
PASS->FAIL: gcc.target/i386/avx512vl-vpmovdb-1.c scan-assembler-times vpmovdb[
\\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?
PASS->FAIL: gcc.target/i386/avx512vl-vpmovdb-1.c scan-assembler-times vpmovdb[
\\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(?
PASS->FAIL: gcc.target/i386/avx512vl-vpmovdb-1.c scan-assembler-times vpmovdb[
\\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?
PASS->FAIL: gcc.target/i386/avx512vl-vpmovqb-1.c scan-assembler-times vpmovqb[
\\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(?
PASS->FAIL: gcc.target/i386/avx512vl-vpmovqb-1.c scan-assembler-times vpmovqb[
\\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?
PASS->FAIL: gcc.target/i386/avx512vl-vpmovqb-1.c scan-assembler-times vpmovqb[
\\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(?
PASS->FAIL: gcc.target/i386/avx512vl-vpmovqb-1.c scan-assembler-times vpmovqb[
\\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?
PASS->FAIL: gcc.target/i386/avx512vl-vpmovqw-1.c scan-assembler-times vpmovqw[
\\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(?
PASS->FAIL: gcc.target/i386/avx512vl-vpmovqw-1.c scan-assembler-times vpmovqw[
\\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?
PASS->FAIL: gcc.target/i386/avx512vl-vpmovqw-1.c scan-assembler-times vpmovqw[
\\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(?
PASS->FAIL: gcc.target/i386/avx512vl-vpmovqw-1.c scan-assembler-times vpmovqw[
\\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?
PASS->FAIL: gcc.target/i386/avx512vl-vpmovsdb-1.c scan-assembler-times
vpmovsdb[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(?
PASS->FAIL: gcc.target/i386/avx512vl-vpmovsdb-1.c scan-assembler-times
vpmovsdb[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?
PASS->FAIL: gcc.target/i386/avx512vl-vpmovsdb-1.c scan-assembler-times
vpmovsdb[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(?
PASS->FAIL: gcc.target/i386/avx512vl-vpmovsdb-1.c scan-assembler-times
vpmovsdb[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?
PASS->FAIL: gcc.target/i386/avx512vl-vpmovsqb-1.c scan-assembler-times
vpmovsqb[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(?
PASS->FAIL: gcc.target/i386/avx512vl-vpmovsqb-1.c scan-assembler-times
vpmovsqb[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?
PASS->FAIL: gcc.target/i386/avx512vl-vpmovsqb-1.c scan-assembler-times
vpmovsqb[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(?
PASS->FAIL: gcc.target/i386/avx512vl-vpmovsqb-1.c scan-assembler-times
vpmovsqb[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?
PASS->FAIL: gcc.target/i386/avx512vl-vpmovsqw-1.c scan-assembler-times
vpmovsqw[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(?
PASS->FAIL: gcc.target/i386/avx512vl-vpmovsqw-1.c scan-assembler-times
vpmovsqw[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?
PASS->FAIL: gcc.target/i386/avx512vl-vpmovsqw-1.c scan-assembler-times
vpmovsqw[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(?
PASS->FAIL: gcc.target/i386/avx512vl-vpmovsqw-1.c scan-assembler-times
vpmovsqw[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?
PASS->FAIL: gcc.target/i386/avx512vl-vpmovusdb-1.c scan-assembler-times
vpmovusdb[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(?
PASS->FAIL: gcc.target/i386/avx512vl-vpmovusdb-1.c scan-assembler-times
vpmovusdb[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?
PASS->FAIL: gcc.target/i386/avx512vl-vpmovusdb-1.c scan-assembler-times
vpmovusdb[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(?
PASS->FAIL: gcc.target/i386/avx512vl-vpmovusdb-1.c scan-assembler-times
vpmovusdb[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?
PASS->FAIL: gcc.target/i386/avx512vl-vpmovusqb-1.c scan-assembler-times
vpmovusqb[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(?
PASS->FAIL: gcc.target/i386/avx512vl-vpmovusqb-1.c scan-assembler-times
vpmovusqb[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?
PASS->FAIL: gcc.target/i386/avx512vl-vpmovusqb-1.c scan-assembler-times
vpmovusqb[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(?
PASS->FAIL: gcc.target/i386/avx512vl-vpmovusqb-1.c scan-assembler-times
vpmovusqb[ \\t]+[^{\n]*%ymm[0-9]+

[Bug testsuite/69371] UNRESOLVED: special_functions/18_riemann_zeta/check_value.cc compilation failed to produce executable

2016-02-04 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69371

James Greenhalgh  changed:

   What|Removed |Added

 Target|arm-none-eabi   |arm-none-eabi,
   ||aarch64-none-elf,
   ||aarch64_be-none-elf
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2016-02-04
 Ever confirmed|0   |1

--- Comment #6 from James Greenhalgh  ---
Confirmed, and also seen on aarch64-none-elf and aarch64_be-none-elf.

[Bug target/69841] Wrong template instantiation in C++11 on armv7l

2016-02-22 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69841

James Greenhalgh  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2016-02-22
 CC||jgreenhalgh at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from James Greenhalgh  ---
Confirmed, I'm trying to figure out what is going wrong.

[Bug target/69841] Wrong template instantiation in C++11 on armv7l

2016-02-24 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69841

James Greenhalgh  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |jgreenhalgh at gcc dot gnu.org

--- Comment #2 from James Greenhalgh  ---
At the heart of the problem, the compiler has decided that the second parameter
to this templated function has an overaligned member (64-bit aligned in f2,
8-bit aligned in f1). This gives different parameter passing rules, and you
get the code difference above.

I haven't figured out what causes the alignment to differ between the two TUs,
or why the compiler feels it is safe to propagate the alignment information
without specializing the function name.

I'll take the bug while I look deeper.
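
To show the mechanism in isolation (a minimal sketch with made-up struct names,
not the libstdc++ types from the testcase; alignas(8) is an arbitrary choice
for illustration), a single overaligned member is enough to raise the alignment
of the whole aggregate. On ARM, the procedure call standard treats arguments
that need 8-byte alignment differently from those that don't, so two TUs that
disagree on this value can disagree on where a parameter lives.

```c
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical aggregate with only naturally aligned members.  */
struct natural {
    char payload;
};

/* Same layout, but one member asks for 8-byte alignment.  */
struct overaligned {
    alignas(8) char payload;
};

/* The alignment of an aggregate is at least that of its strictest member.  */
size_t natural_alignment(void)
{
    return alignof(struct natural);
}

size_t overaligned_alignment(void)
{
    return alignof(struct overaligned);
}
```

If two translation units saw these two definitions under the same name, they
would compute different alignments for what is supposed to be one type.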

[Bug target/69841] Wrong template instantiation in C++11 on armv7l

2016-02-25 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69841

James Greenhalgh  changed:

   What|Removed |Added

 CC||alan.lawrence at arm dot com,
   ||jakub at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org

--- Comment #3 from James Greenhalgh  ---
I'm still confused by this. After coming out of the front end I checked the
DECL_ALIGN for each field of each of the parameters being passed to this
function. I see:


f1.ii

std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::iterator
std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare,
_Alloc>::_M_emplace_hint_unique(std::_Rb_tree<_Key, _Val, _KeyOfValue,
_Compare, _Alloc>::const_iterator, _Args&& ...)  
  (struct _Rb_tree * const this
   (decl alignment: 32),
   struct const_iterator __pos
 Fields:
   _M_node (decl alignment: 32)
   _Rb_tree_const_iterator (decl alignment: 8)
   value_type (decl alignment: 8)
   reference (decl alignment: 32)
   pointer (decl alignment: 32)
   iterator (decl alignment: 8)
   iterator_category (decl alignment: 8)
   difference_type (decl alignment: 32)
   _Self (decl alignment: 8)
   _Base_ptr (decl alignment: 32)
   _Link_type (decl alignment: 32)
   (decl alignment: 32, max field alignment: 32),
   const struct piecewise_construct_t & __args#0
   (decl alignment: 32),
   struct tuple & __args#1
   (decl alignment: 32),
   struct tuple & __args#2
   (decl alignment: 32))

f2.ii

std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::iterator
std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare,
_Alloc>::_M_emplace_hint_unique(std::_Rb_tree<_Key, _Val, _KeyOfValue,
_Compare, _Alloc>::const_iterator, _Args&& ...) 
  (struct _Rb_tree * const this
   (decl alignment: 32),
   struct const_iterator __pos
 Fields:
   _M_node (decl alignment: 32)
   _Rb_tree_const_iterator (decl alignment: 8)
   value_type (decl alignment: 64)
   reference (decl alignment: 32)
   pointer (decl alignment: 32)
   iterator (decl alignment: 32)
   iterator_category (decl alignment: 8)
   difference_type (decl alignment: 32)
   _Self (decl alignment: 8)
   _Base_ptr (decl alignment: 32)
   _Link_type (decl alignment: 32)
   (decl alignment: 32, max field alignment: 64),
   const struct piecewise_construct_t & __args#0
   (decl alignment: 32),
   struct tuple & __args#1
   (decl alignment: 32),
   struct tuple & __args#2
   (decl alignment: 32))
---

That is to say, after gimplification we've already decided that the alignment
of the value_type field of the std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare,
_Alloc>::const_iterator parameter to std::_Rb_tree<_Key, _Val, _KeyOfValue,
_Compare, _Alloc>::_M_emplace_hint_unique in f2.ii is 64, whereas in f1.ii we
don't have any extra alignment information.

I know nothing about the C++ front-end and how we could end up in this
situation. I can understand why, given this, we would generate the code we do
for ARM.

[Bug target/69841] Wrong template instantiation in C++11 on armv7l

2016-02-26 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69841

James Greenhalgh  changed:

   What|Removed |Added

 CC||jason at gcc dot gnu.org

--- Comment #4 from James Greenhalgh  ---
Goes away on trunk after r223301

Author: jason 
Date:   Mon May 18 17:14:11 2015 +

DR 1391
* pt.c (type_unification_real): Check convertibility here.
(unify_one_argument): Not here.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@223301 

After which the DECL_ALIGN in both TUs is 64, fixing the bug.

[Bug testsuite/70009] test case libgomp.oacc-c-c++-common/vprop.c fails starting with its introduction in r233607

2016-03-07 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70009

James Greenhalgh  changed:

   What|Removed |Added

 Target|powerpc*-*-*, aarch64-*-*   |powerpc*-*-*, aarch64-*-*,
   ||arm*-*-*
   Last reconfirmed|2016-02-29 00:00:00 |2016-3-7
 CC||jgreenhalgh at gcc dot gnu.org

--- Comment #5 from James Greenhalgh  ---
Also failing on arm/aarch64 (so good further evidence of signed vs. unsigned
char). Forcing the macro to use signed types clears the error for me on
arm-none-linux-gnueabihf (though I don't know if this is correct).
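
For anyone unfamiliar with the signedness point, a minimal sketch (the helper
names are invented, and this is not taken from vprop.c): plain char is
unsigned on the Arm and PowerPC Linux ABIs and signed on x86, so the same bit
pattern widens to different int values unless the type is spelled out.

```c
#include <limits.h>

/* Nonzero when plain char is a signed type on this target.  */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Widening plain char depends on the target's choice of signedness.  */
int widen_plain(char c)
{
    return c;
}

/* Widening signed char gives the same answer everywhere.  */
int widen_signed(signed char c)
{
    return c;
}
```

Passing (char) -1 to widen_plain gives -1 where char is signed and UCHAR_MAX
where it is unsigned; widen_signed gives -1 on both.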

[Bug target/69841] Wrong template instantiation in C++11 on armv7l

2016-03-09 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69841

--- Comment #5 from James Greenhalgh  ---
I don't know enough about the C++ standard to know whether this patch is
reasonable to backport to GCC 5. Jason, do you have an opinion?

[Bug testsuite/68232] gcc.dg/ifcvt-4.c fails on some arm configurations

2016-03-14 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68232

--- Comment #8 from James Greenhalgh  ---
(In reply to Pat Haugen from comment #6)
> (In reply to James Greenhalgh from comment #5)
> > "Fixed" with the testsuite skips. Feel free to add any other target triplets
> > for which this test is unreliable.
> 
> I was going to modify the powerpc64le triplet to just powerpc*-*-* since it
> also fails for powerpc64 (big endian) and powerpc-ibm-aix, but looking at
> gcc/config/rs6000/rs6000.h, it has BRANCH_COST defined to a non-zero value:
> 
> #define BRANCH_COST(speed_p, predictable_p) 3
> 
> 
> So there must be something more than just "doesn't work for targets with
> branch cost == 0". I'm still happy to make the change if there are other
> reasons, but didn't want to do so without hearing first.

Sorry that I took a while to get round to looking at this.

For powerpc64 you'll need to enable conditional move instructions using
"-misel" (or equivalent) for this test to pass.

For hppa64, the "experimental" movdicc pattern has this restriction:

  if (GET_MODE (XEXP (operands[1], 0)) != DImode
 || GET_MODE (XEXP (operands[1], 0)) != GET_MODE (XEXP (operands[1], 1)))

But, we're trying to expand with this comparison in operands[1]:

  (le (subreg/s/u:SI (reg/v:DI 70 [ x+-4 ]) 4)
  (subreg/s/u:SI (reg/v:DI 71 [ y+-4 ]) 4))

so this test fails, and we fail to ifcvt the sequence. The test should be
skipped on hppa64 until more complete support for conditional moves is added.

[Bug target/70133] AArch64 -mtune=native generates improperly formatted -march parameters

2016-03-19 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70133

James Greenhalgh  changed:

   What|Removed |Added

 Status|WAITING |NEW
 CC||jgreenhalgh at gcc dot gnu.org

--- Comment #5 from James Greenhalgh  ---
The crux of this issue is going to be that your Cortex-A53 has no support for
the cryptography extension, but does have support for the CRC extensions.

By inspection of host_detect_local_cpu, I see that we run through all the
extensions that we know about, checking to see whether that extension is a
substring of the Features we read from /proc/cpuinfo. If it is, we add
+extension; if not, we add +noextension.

So, it seems reasonable to me that if we run this algorithm on a core without
crypto, but with CRC, we'll get the string described
(-march=armv8-a+fp+simd+nocrypto+crc+nolse) forwarded to the assembler on the
command line.

And sure enough, the assembler wants to read everything you've got before you
start telling it what you've not got.

I see a few issues.

1) There's not really a good reason for an assembler to have this syntax
restriction. The code does the right thing whatever order you put your features
in.
2) We'll have to support these older assemblers anyway, so at the least we'll
have to hold off writing the "+no" extension strings until we're done with the
"+" extension strings.
3) We should think about whether we need to put out these +no extension strings
at all. I don't like that for my older systems I'll need to keep updating my
binutils to cover any new extension strings (e.g. +nolse) that are added by GCC
if I want to use -march=native. We shouldn't force that if we don't have to.

So, Confirmed.
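
A rough sketch of what point 2 could look like, assuming a simple table of
extension names and a detected flag. The struct and function names are
hypothetical and this is not the code in the driver; it only shows the
two-pass ordering, with every "+ext" written before any "+noext".

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical table entry: an extension name and whether it was found
   in the Features line.  */
struct cpu_ext {
    const char *name;
    int detected;
};

/* Write base, then "+name" for each detected extension, then "+noname"
   for each missing one.  Returns 0 on success, -1 if buf is too small.  */
int build_march(char *buf, size_t cap, const char *base,
                const struct cpu_ext *exts, size_t n)
{
    int w = snprintf(buf, cap, "%s", base);
    if (w < 0 || (size_t)w >= cap)
        return -1;
    size_t len = (size_t)w;

    /* Pass 0 emits the enabled extensions, pass 1 the disabled ones.  */
    for (int pass = 0; pass < 2; pass++) {
        for (size_t i = 0; i < n; i++) {
            int on = exts[i].detected != 0;
            if (on != (pass == 0))
                continue;
            w = snprintf(buf + len, cap - len, "+%s%s",
                         on ? "" : "no", exts[i].name);
            if (w < 0 || (size_t)w >= cap - len)
                return -1;
            len += (size_t)w;
        }
    }
    return 0;
}
```

For the core in this report (fp, simd and crc present, crypto and lse absent)
this would give armv8-a+fp+simd+crc+nocrypto+nolse rather than the string
above.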

[Bug target/70133] AArch64 -mtune=native generates improperly formatted -march parameters

2016-03-19 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70133

James Greenhalgh  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |jgreenhalgh at gcc dot gnu.org

--- Comment #8 from James Greenhalgh  ---
(In reply to Christophe Lyon from comment #6)
> > 3) We should think about whether we need to put out these +no extension
> > strings at all. I don't like that for my older systems I'll need to keep
> > updating my binutils to cover any new extension strings (e.g. +nolse) that
> > are added by GCC if I want to use -march=native . We shouldn't force that if
> > we don't have to.
> > 
> 
> Do you know why these +no where introduced in the first place?
> 
> Why would there be a difference between "+nolse" and "" for instance?

We don't keep track (in driver-aarch64.c) of which flags are implicitly
included (e.g. +fp+simd) and would need an explicit +nofp to disable, and which
flags need to be explicitly enabled (e.g. +crc) and so don't need to be
explicitly disabled.

I'm working on a clean-up.
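
To make the distinction concrete, here is a sketch of the bookkeeping I have in
mind, under assumptions: the struct, the function name and the on_by_default
values for crypto and lse are illustrative only, and this is not the eventual
patch. Each extension records whether the base architecture already implies
it, and a modifier is written only when the detected state differs from that
default.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical table entry.  on_by_default says whether the base
   architecture string already implies the extension.  */
struct ext_info {
    const char *name;
    int on_by_default;
    int detected;
};

/* Write base plus only the modifiers that change the default state:
   additions first, then removals.  Returns 0, or -1 if buf is too small.  */
int build_minimal_march(char *buf, size_t cap, const char *base,
                        const struct ext_info *exts, size_t n)
{
    int w = snprintf(buf, cap, "%s", base);
    if (w < 0 || (size_t)w >= cap)
        return -1;
    size_t len = (size_t)w;

    for (int pass = 0; pass < 2; pass++) {
        for (size_t i = 0; i < n; i++) {
            int on = exts[i].detected != 0;
            int dflt = exts[i].on_by_default != 0;
            if (on == dflt)
                continue;               /* the base already says this */
            if (on != (pass == 0))
                continue;               /* additions in pass 0, removals in pass 1 */
            w = snprintf(buf + len, cap - len, "+%s%s",
                         on ? "" : "no", exts[i].name);
            if (w < 0 || (size_t)w >= cap - len)
                return -1;
            len += (size_t)w;
        }
    }
    return 0;
}
```

With fp and simd marked as implied by the base and crc as opt-in, a core with
fp, simd and crc but no crypto or lse would come out as armv8-a+crc, with no
+no strings for an older assembler to trip over.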

[Bug target/70133] AArch64 -mtune=native generates improperly formatted -march parameters

2016-03-20 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70133

James Greenhalgh  changed:

   What|Removed |Added

 Target||aarch64*-none-linux-gnu
   Host||aarch64*-none-linux-gnu
Version|5.3.1   |6.0
   Target Milestone|--- |6.0

[Bug rtl-optimization/68749] FAIL: gcc.dg/ifcvt-4.c scan-rtl-dump ce1 "2 true changes made"

2016-03-22 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68749

--- Comment #4 from James Greenhalgh  ---
Hi, sorry I missed this. I need to write a better filter for bugs I'm CCed on;
I'll work on that.

I'm hitting the limits of what I can guess from the Sparc machine files. I
don't understand why we get an expansion for the conditional branch that
explicitly generates new temporaries for i and j, necessitating an if..else..
structure. 

Compare how we expand on Sparc:

---
(insn 12 5 13 2 (set (reg:CC 100 %icc)
(compare:CC (subreg/s/u:SI (reg/v:DI 113 [ xD.1388+-4 ]) 4)
(subreg/s/u:SI (reg/v:DI 114 [ yD.1389+-4 ]) 4)))
../../src/gcc/gcc/testsuite/gcc.dg/ifcvt-4.c:12 -1
 (nil))
(jump_insn 13 12 14 2 (set (pc)
(if_then_else (le (reg:CC 100 %icc)
(const_int 0 [0]))
(label_ref:DI 29)
(pc))) ../../src/gcc/gcc/testsuite/gcc.dg/ifcvt-4.c:12 48
{*normal_branch}
 (int_list:REG_BR_PROB 3900 (nil))
 -> 29)
;;  succ:   4 [61.0%]  (FALLTHRU)
;;  5 [39.0%] 

;; basic block 4, loop depth 0, count 0, freq 6100, maybe hot
;;  prev block 2, next block 5, flags: (NEW, REACHABLE, RTL, MODIFIED)
;;  pred:   2 [61.0%]  (FALLTHRU)
(note 14 13 8 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
(insn 8 14 9 4 (set (reg/v:SI 110 [ jD.1394 ])
(subreg:SI (reg/v:DI 115 [ aD.1390+-4 ]) 4))
../../src/gcc/gcc/testsuite/gcc.dg/ifcvt-4.c:15 -1
 (nil))
(insn 9 8 26 4 (set (reg/v:SI 109 [ iD.1393 ])
(subreg:SI (reg/v:DI 115 [ aD.1390+-4 ]) 4))
../../src/gcc/gcc/testsuite/gcc.dg/ifcvt-4.c:14 -1
 (nil))
(jump_insn 26 9 27 4 (set (pc)
(label_ref 15)) -1
 (nil)
 -> 15)
;;  succ:   6 [100.0%] 

(barrier 27 26 29)
;; basic block 5, loop depth 0, count 0, freq 3900, maybe hot
;;  prev block 4, next block 6, flags: (NEW, REACHABLE, RTL, MODIFIED)
;;  pred:   2 [39.0%] 
(code_label 29 27 28 5 3 "" [1 uses])
(note 28 29 6 5 [bb 5] NOTE_INSN_BASIC_BLOCK)
(insn 6 28 7 5 (set (reg/v:SI 110 [ jD.1394 ])
(subreg:SI (reg/v:DI 114 [ yD.1389+-4 ]) 4)) -1
 (nil))
(insn 7 6 15 5 (set (reg/v:SI 109 [ iD.1393 ])
(subreg:SI (reg/v:DI 113 [ xD.1388+-4 ]) 4)) -1
 (nil))
;;  succ:   6 [100.0%]  (FALLTHRU)

;; basic block 6, loop depth 0, count 0, freq 1, maybe hot
;;  prev block 5, next block 1, flags: (NEW, REACHABLE, RTL)
;;  pred:   5 [100.0%]  (FALLTHRU)
;;  4 [100.0%] 
(code_label 15 7 16 6 2 "" [1 uses])
(note 16 15 17 6 [bb 6] NOTE_INSN_BASIC_BLOCK)
(insn 17 16 18 6 (set (reg:DI 117)
(mult:DI (subreg:DI (reg/v:SI 109 [ iD.1393 ]) 0)
(subreg:DI (reg/v:SI 110 [ jD.1394 ]) 0)))
../../src/gcc/gcc/testsuite/gcc.dg/ifcvt-4.c:17 -1
 (nil))

---

Where [bb 5] acts as an else block setting registers 109/110 to the "old"
values. And the AArch64 expansion of the same:

---

(insn 10 5 11 2 (set (reg:CC 66 cc)
(compare:CC (reg/v:SI 74 [ xD.2750 ])
(reg/v:SI 75 [ yD.2751 ])))
../../src/gcc/gcc/testsuite/gcc.dg/ifcvt-4.c:12 -1
 (nil))
(jump_insn 11 10 12 2 (set (pc)
(if_then_else (le (reg:CC 66 cc)
(const_int 0 [0]))
(label_ref 13)
(pc))) ../../src/gcc/gcc/testsuite/gcc.dg/ifcvt-4.c:12 -1
 (int_list:REG_BR_PROB 3900 (nil))
 -> 13)
;;  succ:   4 [61.0%]  (FALLTHRU)
;;  5 [39.0%] 

;; basic block 4, loop depth 0, count 0, freq 6100, maybe hot
;;  prev block 2, next block 5, flags: (NEW, REACHABLE, RTL, MODIFIED)
;;  pred:   2 [61.0%]  (FALLTHRU)
(note 12 11 6 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
(insn 6 12 7 4 (set (reg/v:SI 75 [ yD.2751 ])
(reg/v:SI 76 [ aD.2752 ]))
../../src/gcc/gcc/testsuite/gcc.dg/ifcvt-4.c:15 -1
 (nil))
(insn 7 6 13 4 (set (reg/v:SI 74 [ xD.2750 ])
(reg/v:SI 76 [ aD.2752 ]))
../../src/gcc/gcc/testsuite/gcc.dg/ifcvt-4.c:14 -1
 (nil))
;;  succ:   5 [100.0%]  (FALLTHRU)

;; basic block 5, loop depth 0, count 0, freq 1, maybe hot
;;  prev block 4, next block 1, flags: (NEW, REACHABLE, RTL)
;;  pred:   2 [39.0%] 
;;  4 [100.0%]  (FALLTHRU)
(code_label 13 7 14 5 2 "" [1 uses])
(note 14 13 15 5 [bb 5] NOTE_INSN_BASIC_BLOCK)
(insn 15 14 16 5 (set (reg:SI 77)
(mult:SI (reg/v:SI 74 [ xD.2750 ])
(reg/v:SI 75 [ yD.2751 ])))
../../src/gcc/gcc/testsuite/gcc.dg/ifcvt-4.c:17 -1
 (nil))

---

I guess it is those subregs down from DImode to SImode. Sure enough, if we swap
int for long in this testcase, we get the expected expansion and the expected
number of true changes made.

So, I'm not worried that the optimization is broken for Sparc (it does the
right thing for long), but I'm not sure I know the best way to work around this
for your target. Swapping int for long would also help HPPA. HPPA chose to skip
the test entirely. That might also be right for Sparc.

What do you think?
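
For reference, the source shape I believe the dumps above correspond to is
sketched below. This is reconstructed from the RTL (x, y, a, i and j), not
copied from gcc.dg/ifcvt-4.c, and the function names are invented. The second
variant is the int-to-long swap mentioned above, which should avoid the
DImode-to-SImode subregs on a 64-bit target.

```c
/* Two sets under one condition, followed by a use of both values.  */
int mul_after_cond_int(int x, int y, int a)
{
    int i = x;
    int j = y;
    if (x > y) {
        i = a;
        j = a;
    }
    return i * j;
}

/* The same shape with long, so each value fills a whole 64-bit register.  */
long mul_after_cond_long(long x, long y, long a)
{
    long i = x;
    long j = y;
    if (x > y) {
        i = a;
        j = a;
    }
    return i * j;
}
```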

[Bug c++/70494] Capturing an array of vectors in a lambda

2016-04-01 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70494

James Greenhalgh  changed:

   What|Removed |Added

 Target||*-*-*
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2016-04-01
 CC||jgreenhalgh at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #2 from James Greenhalgh  ---
Fails for me on trunk and 5.3. Trunk backtrace for an aarch64-none-elf compiler
(but the target doesn't matter, same fail on arm-none-eabi and a
not-quite-trunk x86_64-none-linux-gnu):

foo.cpp: In function ‘int main()’:
foo.cpp:7:23: internal compiler error: tree check: expected record_type or
union_type or qual_union_type, have array_type in build_special_member_call, at
cp/call.c:7936
 auto lambda = [v]{};

0xf52300 tree_check_failed(tree_node const*, char const*, int, char const*,
...)
.../tree.c:9643
0x5b24b2 tree_check3(tree_node*, char const*, int, char const*, tree_code,
tree_code, tree_code)
.../tree.h:3046
0x5b24b2 build_special_member_call(tree_node*, tree_node*, vec**, tree_node*, int, int)
.../cp/call.c:7951
0x661169 split_nonconstant_init_1
.../cp/typeck2.c:695
0x66248d split_nonconstant_init(tree_node*, tree_node*)
.../cp/typeck2.c:745
0x666ca1 store_init_value(tree_node*, tree_node*, vec**, int)
.../cp/typeck2.c:850
0x5df656 check_initializer
.../cp/decl.c:6150
0x5e4d52 cp_finish_decl(tree_node*, tree_node*, bool, tree_node*, int)
.../cp/decl.c:6798
0x6e7109 cp_parser_init_declarator
.../cp/parser.c:18658
0x6e73bb cp_parser_simple_declaration
.../cp/parser.c:12379
0x6e7f7c cp_parser_block_declaration
.../cp/parser.c:12248
0x6e80c6 cp_parser_declaration_statement
.../cp/parser.c:11860
0x6c7b07 cp_parser_statement
.../cp/parser.c:10528
0x6c7bea cp_parser_statement_seq_opt
.../cp/parser.c:10806
0x6c7ce6 cp_parser_compound_statement
.../cp/parser.c:10760
0x6e647d cp_parser_function_body
.../cp/parser.c:20653
0x6e647d cp_parser_ctor_initializer_opt_and_function_body
.../cp/parser.c:20689
0x6e677d cp_parser_function_definition_after_declarator
.../cp/parser.c:25351
0x6e6b52 cp_parser_function_definition_from_specifiers_and_declarator
.../cp/parser.c:25263
0x6e6b52 cp_parser_init_declarator
.../cp/parser.c:18429

[Bug target/67896] Inconsistent behaviour between C and C++ for types poly8x8_t and poly16x8_t

2016-04-01 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67896

--- Comment #6 from James Greenhalgh  ---
Author: jgreenhalgh
Date: Fri Apr  1 09:45:44 2016
New Revision: 234665

URL: https://gcc.gnu.org/viewcvs?rev=234665&root=gcc&view=rev
Log:
Backport: [PATCH] Do not set structural equality on polynomial types

gcc/ChangeLog:

PR target/67896
* config/aarch64/aarch64-builtins.c
(aarch64_init_simd_builtin_types): Do not set structural
equality to __Poly{8,16,64,128}_t types.

gcc/testsuite/ChangeLog:

PR target/67896
* gcc.target/aarch64/simd/pr67896.C: New.


Added:
branches/gcc-5-branch/gcc/testsuite/gcc.target/aarch64/simd/pr67896.C
  - copied unchanged from r232818,
trunk/gcc/testsuite/gcc.target/aarch64/simd/pr67896.C
Modified:
branches/gcc-5-branch/   (props changed)
branches/gcc-5-branch/gcc/ChangeLog
branches/gcc-5-branch/gcc/config/aarch64/aarch64-builtins.c
branches/gcc-5-branch/gcc/testsuite/ChangeLog

Propchange: branches/gcc-5-branch/
('svn:mergeinfo' modified)

[Bug target/67896] Inconsistent behaviour between C and C++ for types poly8x8_t and poly16x8_t

2016-04-01 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67896

James Greenhalgh  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from James Greenhalgh  ---
Fixed on trunk and 5.

[Bug c++/70531] Turning optimisation level 2 causes the output program to go into infinite loop

2016-04-04 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70531

James Greenhalgh  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||jgreenhalgh at gcc dot gnu.org
 Resolution|--- |INVALID

--- Comment #1 from James Greenhalgh  ---
Try compiling and running with -fsanitize=undefined. You have a bug in your
logic that results in an out-of-bounds memory access:

   .../ab2.cpp:97:26: runtime error: index -1 out of bounds for type 'long long
int [101]'
   .../ab2.cpp:97:18: runtime error: index -1 out of bounds for type 'long long
int [101][101][101]'
Segmentation fault (core dumped)

(At least) this condition is in the wrong place:

if (xs > xe || ys > ye)
return 0;

When rec is called with arguments (0, -1, 0, -1) (as it will be), this
condition comes after the memory dereference at:

if (dp[xs][xe][ys][ye] != -1)
return dp[xs][xe][ys][ye];

So you will be trying to access dp[0][-1][0][-1] - which is invalid.

I haven't fully audited your code for other logic errors. Please check your
algorithm. For simple inputs I always get a crash, not an infinite loop - but
such is the nature of undefined behaviour. If your bug report relies on
particular input to cause the loop, you'll need to provide that. As it stands,
this looks invalid, but feel free to reopen it after you have audited your code
for other undefined sequences.
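
A sketch of the ordering I mean, written as plain C with a much smaller table.
The bound N and the value being memoized are placeholders, since I haven't
worked out what your rec is meant to compute; the point is only that the range
checks come before the first read of dp.

```c
#include <string.h>

#define N 4                        /* placeholder bound, not your 101 */

static long long dp[N][N][N][N];
static int dp_ready;

long long rec(int xs, int xe, int ys, int ye)
{
    if (!dp_ready) {
        /* All-ones bytes give -1 in every element on two's complement.  */
        memset(dp, 0xff, sizeof dp);
        dp_ready = 1;
    }

    /* Reject empty or out-of-range rectangles before touching dp.  */
    if (xs > xe || ys > ye)
        return 0;
    if (xs < 0 || xe >= N || ys < 0 || ye >= N)
        return 0;

    if (dp[xs][xe][ys][ye] != -1)
        return dp[xs][xe][ys][ye];

    /* Placeholder for the real recurrence: the area of the rectangle.  */
    long long result = (long long)(xe - xs + 1) * (ye - ys + 1);
    dp[xs][xe][ys][ye] = result;
    return result;
}
```

With the checks in this order, rec (0, -1, 0, -1) returns 0 without indexing
dp at all.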

[Bug target/70133] AArch64 -mtune=native generates improperly formatted -march parameters

2016-04-11 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70133

--- Comment #10 from James Greenhalgh  ---
Author: jgreenhalgh
Date: Mon Apr 11 10:14:59 2016
New Revision: 234876

URL: https://gcc.gnu.org/viewcvs?rev=234876&root=gcc&view=rev
Log:
[Patch AArch64 2/3] Rework the code to print extension strings (pr70133)

gcc/

PR target/70133
* config/aarch64/aarch64-common.c (aarch64_option_extension): Keep
track of a canonical flag name.
(all_extensions): Likewise.
(arch_to_arch_name): Also track extension flags enabled by the arch.
(all_architectures): Likewise.
(aarch64_parse_extension): Move to here.
(aarch64_get_extension_string_for_isa_flags): Take a new argument,
rework.
(aarch64_rewrite_selected_cpu): Update for above change.
* config/aarch64/aarch64-option-extensions.def: Rework the way flags
are handled, such that the single explicit value enabled by an
extension is kept seperate from the implicit values it also enables.
* config/aarch64/aarch64-protos.h (aarch64_parse_opt_result): Move
to here.
(aarch64_parse_extension): New.
* config/aarch64/aarch64.c (aarch64_parse_opt_result): Move from
here to config/aarch64/aarch64-protos.h.
(aarch64_parse_extension): Move from here to
common/config/aarch64/aarch64-common.c.
(aarch64_option_print): Update.
(aarch64_declare_function_name): Likewise.
(aarch64_start_file): Likewise.
* config/aarch64/driver-aarch64.c (arch_extension): Keep track of
the canonical flag for extensions.
* config.gcc (aarch64*-*-*): Extend regex for capturing extension
flags.

gcc/testsuite/

PR target/70133
* gcc.target/aarch64/mgeneral-regs_4.c: Fix expected output.
* gcc.target/aarch64/target_attr_15.c: Likewise.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/common/config/aarch64/aarch64-common.c
trunk/gcc/config.gcc
trunk/gcc/config/aarch64/aarch64-option-extensions.def
trunk/gcc/config/aarch64/aarch64-protos.h
trunk/gcc/config/aarch64/aarch64.c
trunk/gcc/config/aarch64/driver-aarch64.c
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/aarch64/mgeneral-regs_4.c
trunk/gcc/testsuite/gcc.target/aarch64/target_attr_15.c

[Bug target/70133] AArch64 -mtune=native generates improperly formatted -march parameters

2016-04-11 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70133

--- Comment #11 from James Greenhalgh  ---
Author: jgreenhalgh
Date: Mon Apr 11 10:16:26 2016
New Revision: 234877

URL: https://gcc.gnu.org/viewcvs?rev=234877&root=gcc&view=rev
Log:
[Patch AArch64 3/3] Fix up for pr70133

gcc/

PR target/70133
* config/aarch64/driver-aarch64.c
(aarch64_get_extension_string_for_isa_flags): New.
(arch_extension): Rename to...
(aarch64_arch_extension): ...This.
(ext_to_feat_string): Rename to...
(aarch64_extensions): ...This.
(aarch64_core_data): Keep track of architecture extension flags.
(cpu_data): Rename to...
(aarch64_cpu_data): ...This.
(aarch64_arch_driver_info): Keep track of architecture extension
flags.
(get_arch_name_from_id): Rename to...
(get_arch_from_id): ...This, change return type.
(host_detect_local_cpu): Update and reformat for renames, handle
extensions through common infrastructure.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/aarch64/driver-aarch64.c

[Bug target/70133] AArch64 -mtune=native generates improperly formatted -march parameters

2016-04-11 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70133

James Greenhalgh  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #12 from James Greenhalgh  ---
Fixed on trunk with r234875 r234876 and r234877 . You'll need to contact Linaro
through their support/bug channels if you think these fixes should be ported to
the Linaro releases.

[Bug target/69841] Wrong template instantiation in C++11 on armv7l

2016-04-11 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69841

James Greenhalgh  changed:

   What|Removed |Added

 CC||jason at redhat dot com

--- Comment #6 from James Greenhalgh  ---
*ping*

[Bug c++/70657] testing

2016-04-14 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70657

James Greenhalgh  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||jgreenhalgh at gcc dot gnu.org
 Resolution|--- |INVALID

--- Comment #1 from James Greenhalgh  ---
Please, stop this.

[Bug c/70707] INT_MAX used before it is defined

2016-04-18 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70707

James Greenhalgh  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||jgreenhalgh at gcc dot gnu.org
 Resolution|--- |INVALID

--- Comment #1 from James Greenhalgh  ---
Hi Lewis,

This bugzilla is for reporting bugs against GCC, rather than asking for usage
help. Feel free to post the same message on gcc-h...@gcc.gnu.org where you're
more likely to get an answer.

Thanks,
James Greenhalgh

[Bug target/70809] New: [AArch64] aarch64_vmls pattern should be rejected if -ffp-contract=off

2016-04-26 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70809

Bug ID: 70809
   Summary: [AArch64] aarch64_vmls pattern should be rejected if
-ffp-contract=off
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jgreenhalgh at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64*-*-*

Take this simple testcase:

  void
  foo (float * __restrict__ __attribute__ ((aligned (16))) a,
       float * __restrict__ __attribute__ ((aligned (16))) x,
       float * __restrict__ __attribute__ ((aligned (16))) y,
       float * __restrict__ __attribute__ ((aligned (16))) z)
  {
    unsigned i = 0;
    for (i = 0; i < 256; i++)
      a[i] = x[i] - (y[i] * z[i]);
  }

GCC for AArch64 (all versions) will generate a vectorized fmls instruction even
when given the -ffp-contract=off option (for trunk and 6 you'll need to play
with -mcpu options to find one which permits the combine through the cost
model):

(for trunk) $ gcc -O3 -ffp-contract=off -mcpu=xgene1 foo.c

   
  .L4:
        ldr     q2, [x9, x4]
        add     w5, w5, 1
        ldr     q1, [x8, x4]
        cmp     w5, w7
        ldr     q0, [x10, x4]
        fmls    v0.4s, v2.4s, v1.4s
        str     q0, [x6, x4]
        add     x4, x4, 16
        bcc     .L4
  

The problem seems pretty clear, the aarch64_vmls pattern needs to be
tightened up not to fuse multiplies and subtracts when we're not in
-ffp-contract=fast.

  (define_insn "aarch64_vmls<mode>"
    [(set (match_operand:VDQF 0 "register_operand" "=w")
          (minus:VDQF (match_operand:VDQF 1 "register_operand" "0")
                      (mult:VDQF (match_operand:VDQF 2 "register_operand" "w")
                                 (match_operand:VDQF 3 "register_operand" "w"))))]
    "TARGET_SIMD"
    "fmls\\t%0.<Vtype>, %2.<Vtype>, %3.<Vtype>"
    [(set_attr "type" "neon_fp_mla_<Vetype>_scalar<q>")]
  )
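
To see why the contraction is observable, here is a small sketch of the scalar arithmetic. It is an illustration under stated assumptions, not part of the testcase: the function names are hypothetical, and the "fused-like" version models the single rounding of a fused multiply-subtract by computing in double (the product of two floats is exact in double), rather than executing FMLS itself.

```c
/* Unfused: the product is rounded to float before the subtraction.
   The volatile temporary stops the compiler contracting the two
   operations, whatever -ffp-contract setting is in effect.  */
static float mul_sub_unfused(float x, float y, float z)
{
    volatile float product = y * z;
    return x - product;
}

/* Fused-like: the product of two floats is exact in double, so this
   models a single rounding at the end.  It approximates the behaviour
   of a fused multiply-subtract; it is not the instruction itself.  */
static float mul_sub_fused_like(float x, float y, float z)
{
    return (float) ((double) x - (double) y * (double) z);
}
```

For y = z = 1 + 2^-12 and x = 1 + 2^-11, the float product rounds to exactly x, so the unfused form gives 0 while the fused-like form gives -2^-24. That difference is what -ffp-contract=off is meant to rule out.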

[Bug target/66200] GCC for ARM / AArch64 doesn't define TARGET_RELAXED_ORDERING

2016-04-27 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66200

James Greenhalgh  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 CC||jgreenhalgh at gcc dot gnu.org
 Resolution|--- |FIXED

--- Comment #11 from James Greenhalgh  ---
Looks like this is fixed on all live branches. Ramana, please reopen if there
is something more to be done that I've missed.

[Bug tree-optimization/71478] New: ICE in tree-ssa-reassoc.c after r236564

2016-06-09 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71478

Bug ID: 71478
   Summary: ICE in tree-ssa-reassoc.c after r236564
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jgreenhalgh at gcc dot gnu.org
CC: kugan.vivekanandarajah at linaro dot org
  Target Milestone: ---

Created attachment 38671
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38671&action=edit
Reduced testcase

I took a look at PR71170, PR71230, PR71252 and PR 71281, but they seemed subtly
different to my issue.

The attached testcase fails for me on trunk with -O1 on
x86_64-unknown-linux-gnu and aarch64-none-linux-gnu.

The ICE looks like:

x.c: In function 'foo':
x.c:7:1: internal compiler error: gimple check: expected
gimple_assign(error_mark), have gimple_call() in gimple_assign_rhs1, at
gimple.h:2493
 foo (void)
 ^~~
0x856d7b gimple_check_failed(gimple const*, char const*, int, char const*,
gimple_code, tree_code)
.../gcc/gimple.c:1177
0xc92547 GIMPLE_CHECK2
.../gcc/gimple.h:73
0xc92547 gimple_assign_rhs1
.../gcc/gimple.h:2493
0xc96d28 rewrite_expr_tree
.../gcc/tree-ssa-reassoc.c:3834
0xc97112 rewrite_expr_tree
.../gcc/tree-ssa-reassoc.c:3931
0xca1970 reassociate_bb
.../gcc/tree-ssa-reassoc.c:5372
0xca1c07 reassociate_bb
.../gcc/tree-ssa-reassoc.c:5414
0xca21c0 do_reassoc
.../gcc/tree-ssa-reassoc.c:5528
0xca21c0 execute_reassoc
.../gcc/tree-ssa-reassoc.c:5615
0xca21c0 execute
.../gcc/tree-ssa-reassoc.c:5654
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.

[Bug target/81456] [7/8 Regression] x86-64 optimizer makes wrong decision when optimizing for size

2017-07-17 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81456

--- Comment #2 from James Greenhalgh  ---
(In reply to Martin Liška from comment #1)
> Confirmed, started with r238594.

The cost model relies on the target giving a reasonable approximation for an
instruction size through ix86_rtx_costs.

The basic branch structure looks like:


t = mod
if (a / b % 2)
  t = b - mod


In RTL, this looks like:

  (insn 14 13 15 2 (set (reg:CCZ 17 flags)
(compare:CCZ (reg:SI 99)
(const_int 0 [0]))) "foo.c":5 3 {*cmpsi_ccno_1}
 (expr_list:REG_DEAD (reg:SI 99)
(nil)))
  (jump_insn 15 14 16 2 (set (pc)
(if_then_else (eq (reg:CCZ 17 flags)
(const_int 0 [0]))
(label_ref:DI 22)
(pc))) "foo.c":5 617 {*jcc_1}
 (expr_list:REG_DEAD (reg:CCZ 17 flags)
(int_list:REG_BR_PROB 2 (nil)))
   -> 22)

  (note 16 15 17 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
  (insn 17 16 22 3 (parallel [
(set (reg/v:SI 93 [  ])
(minus:SI (reg/v:SI 95 [ b ])
(reg/v:SI 93 [  ])))
(clobber (reg:CC 17 flags))
]) "foo.c":5 273 {*subsi_1}
 (expr_list:REG_DEAD (reg/v:SI 95 [ b ])
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil))))
  (code_label 22 17 25 4 1 (nil) [1 uses])

That is to say, we're starting with a comparison, a branch and a subtract. We
want to know if that sequence is cheaper than a subtract and a conditional
select.
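
In C terms the two shapes being compared look roughly like this. It is only a sketch: the function names are made up for illustration, and the second function is a source-level picture of the if-converted sequence, not what ifcvt literally emits.

```c
#include <assert.h>

/* Branchy form, as in the report: compare, branch, subtract. */
static int with_branch(int a, int b, int mod)
{
    assert(b != 0);
    int t = mod;
    if (a / b % 2)
        t = b - mod;
    return t;
}

/* If-converted form: subtract unconditionally, then select. */
static int with_select(int a, int b, int mod)
{
    assert(b != 0);
    int keep = mod;            /* value used when the condition is false */
    int sub = b - mod;         /* the same subtract, now unconditional */
    return (a / b % 2) ? sub : keep;
}
```

Both functions return the same value for every input; the question for the cost model is only which form is smaller or faster.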

In the cost model, we take an approximation for the branch and comparison of
COSTS_N_INSNS (2) and the backend tells us the cost of a subtract is
COSTS_N_INSNS (1). Thus, the cost before transformation is COSTS_N_INSNS (3) ==
12.

After the transformation, we create this RTL:

  (insn 31 0 32 (set (reg:SI 102)
(reg/v:SI 93 [  ])) 82 {*movsi_internal}
   (nil))

  (insn 32 31 33 (parallel [
(set (reg:SI 101)
(minus:SI (reg/v:SI 95 [ b ])
(reg/v:SI 93 [  ])))
(clobber (reg:CC 17 flags))
]) 273 {*subsi_1}
   (nil))

  (insn 33 32 34 (set (reg:CCZ 17 flags)
(compare:CCZ (reg:SI 99)
(const_int 0 [0]))) 3 {*cmpsi_ccno_1}
   (nil))

  (insn 34 33 0 (set (reg/v:SI 93 [  ])
(if_then_else:SI (ne (reg:CCZ 17 flags)
(const_int 0 [0]))
(reg:SI 101)
(reg:SI 102))) 966 {*movsicc_noc}
   (nil))

That is a set to protect the "false" value, the same subtract, a comparison to
set the flags, and a conditional move. When we ask the backend to give us costs
for this it gives us COSTS_N_INSNS (1) for the set, COSTS_N_INSNS (1) for the
subtract, COSTS_N_INSNS (1) for the comparison, and COSTS_N_INSNS (2) for the
conditional move. That's a total cost of COSTS_N_INSNS (5) == 20 for the whole
sequence. 20 > 12, so from the perspective of the ifcvt cost model this is a
bad transformation.
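
The arithmetic can be written out as a small sketch. The macro below mirrors the instruction-count cost macro, which in GCC's rtl.h is spelled COSTS_N_INSNS and scales by 4; the three function names and the simple "after must not exceed before" test are hypothetical simplifications of the real ifcvt check.

```c
/* Mirrors the scaling GCC uses: one "instruction" costs 4 units. */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Original sequence: compare + branch (approximated), then subtract. */
static int cost_before(void)
{
    int branch_and_compare = COSTS_N_INSNS(2);
    int subtract = COSTS_N_INSNS(1);
    return branch_and_compare + subtract;
}

/* If-converted sequence, using the per-insn costs quoted above. */
static int cost_after(void)
{
    return COSTS_N_INSNS(1)      /* set */
         + COSTS_N_INSNS(1)      /* subtract */
         + COSTS_N_INSNS(1)      /* compare */
         + COSTS_N_INSNS(2);     /* conditional move */
}

/* Hypothetical acceptance test: convert only if it is not more costly. */
static int conversion_is_profitable(void)
{
    return cost_after() <= cost_before();
}
```

Here cost_before () is 12 and cost_after () is 20, so the conversion is rejected.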

Note that ifcvt is not aware that an extra set will be introduced after the
original subtract, nor does it care about the final movl %edx, %eax as that is
unconditional. I think it is being asked to trade test, branch, subtract for
set, subtract, test, conditional move - when you spell it out like that it
should be clear why it makes the decision it does.

I can't reproduce your comment about -m32 - I still see branches at -Os.

[Bug middle-end/81832] [8 Regression] ICE in expand_LOOP_DIST_ALIAS, at internal-fn.c:2273

2017-08-14 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81832

James Greenhalgh  changed:

   What|Removed |Added

 CC||amker at gcc dot gnu.org

--- Comment #2 from James Greenhalgh  ---
(In reply to Martin Liška from comment #1)
> Confirmed, started with r250619.

Interesting. That commit seems unlikely to have broken anything (if it did,
the bug would be latent and would have been possible to trigger using the
revision prior). My bisect points to r250959 , which seems much more likely,
given the backtrace.

What I imagine you've done with your bisect is continued back through the
revisions with -ftree-loop-distribute set, that does get you to r250619, but as
this is also really just a change to default "options", you should continue
going back with -ftree-vectorize to find the real culprit. For example, r250617
will also ICE with -O3 -ftree-loop-distribute -ftree-vectorize .

I think this is a general and latent problem with the interaction between the
copy-header pass, and the loop distribution pass. Tracing back further I see
this start with r249994 .

[Bug rtl-optimization/82237] New: [AArch64] Destructive operations result in poor register allocation after scheduling

2017-09-18 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82237

Bug ID: 82237
   Summary: [AArch64] Destructive operations result in poor
register allocation after scheduling
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jgreenhalgh at gcc dot gnu.org
  Target Milestone: ---

A destructive operation is one in which an input operand is both read and
written. For example, in the vector FMLA instruction in AArch64:

  FMLA v0.4s, v1.4s, v2.4s

The first operand is used for the accumulator value (the operation is v0 = v0 +
v1 * v2) and is both read and written by the instruction.

In RTL terms, this is:

  (define_insn "fma<mode>4"
    [(set (match_operand:VHSDF 0 "register_operand" "=w")
          (fma:VHSDF (match_operand:VHSDF 1 "register_operand" "w")
                     (match_operand:VHSDF 2 "register_operand" "w")
                     (match_operand:VHSDF 3 "register_operand" "0")))]
    "TARGET_SIMD"
    "fmla\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
    [(set_attr "type" "neon_fp_mla_<stype><q>")]
  )

from config/aarch64/aarch64-simd.md .

We can get suboptimal code where a read/write operand is used both by a
destructive operation, and a non-destructive operation, and the destructive
operation is scheduled before the non-destructive operation. For example, with
this auto-vectorizable code (with trunk, -O3 -mcpu=cortex-a57):

  void
  foo (float* __restrict__ in1, float* __restrict__ in2,
       float* __restrict__ out1, float* __restrict__ out2)
  {
    for (int i = 0; i < 1024; i++)
      {
        float t = out1[i];
        out1[i] = t + in1[i] * in2[i];
        out2[i] = t + in1[i];
      }
  }

        ldr     q1, [x2, x4]
        ldr     q0, [x0, x4]
        ldr     q2, [x1, x4]
        mov     v3.16b, v1.16b          // <<<<<< 1)
        fmla    v3.4s, v2.4s, v0.4s     // <<<<<< 2)
        fadd    v0.4s, v0.4s, v1.4s     // <<<<<< 3)
        str     q3, [x2, x4]
        str     q0, [x3, x4]


The scheduling of 2) before 3) forces a reload from v1 into v3 at 1). With an
improved schedule, this could be:

        ldr     q1, [x2, x4]
        ldr     q0, [x0, x4]
        ldr     q2, [x1, x4]
        fadd    v4.4s, v0.4s, v1.4s     // <<<<<< 3)
        fmla    v1.4s, v2.4s, v0.4s     // <<<<<< 2)
        str     q1, [x2, x4]
        str     q4, [x3, x4]

In larger loops, we can end up in this situation more frequently than we would
like - the cost of the extra move instructions can be high.
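
The effect of the ordering can be modelled in plain C. This is a sketch only: fmla here is a hypothetical helper that imitates a destructive multiply-accumulate by overwriting its accumulator, and the two functions stand for the two schedules of one loop iteration.

```c
/* Models a destructive multiply-accumulate: the accumulator is both
   read and overwritten, like the first operand of FMLA.  */
static void fmla(float *acc, float a, float b)
{
    *acc = *acc + a * b;
}

/* Destructive operation first: the old accumulator value is still
   needed afterwards, so it must be copied beforehand.  */
static void fmla_first(float t, float in1, float in2,
                       float *out1, float *out2)
{
    float copy = t;              /* corresponds to the extra mov at 1) */
    fmla(&copy, in2, in1);
    *out1 = copy;
    *out2 = t + in1;
}

/* Non-destructive operation first: the last use of t is the
   destructive one, so t can be overwritten in place.  */
static void fadd_first(float t, float in1, float in2,
                       float *out1, float *out2)
{
    *out2 = t + in1;
    fmla(&t, in2, in1);
    *out1 = t;
}
```

Both orderings compute the same results; only the first needs the copy.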

[Bug rtl-optimization/82237] [AArch64] Destructive operations result in poor register allocation after scheduling

2017-09-18 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82237

James Greenhalgh  changed:

   What|Removed |Added

 Target||aarch64*-*-*
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2017-09-18
 Ever confirmed|0   |1

[Bug testsuite/77634] some vectorized testcases fail with -mcpu=thunderx

2017-09-19 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77634

James Greenhalgh  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||jgreenhalgh at gcc dot gnu.org
 Resolution|--- |FIXED

--- Comment #2 from James Greenhalgh  ---
Comment 1 claims this is fixed, Andrew, please reopen if it is still an issue.

[Bug target/63250] Complex fp16 arithmetic uses nonexistent libgcc functions

2017-09-19 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63250

James Greenhalgh  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from James Greenhalgh  ---
This should have been fixed by my work last year, I think.

[Bug tree-optimization/79534] [7 Regression] tree-ifcombine aarch64 performance regression with trunk@245151

2017-04-19 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79534

--- Comment #8 from James Greenhalgh  ---
In the case before Honza's patch, corrupt profile information leads to a branch
being marked as 100% taken. After Honza's patch, the branch is instead seen
with 95.6% taken:

(jump_insn 1916 1915 1922 309 (set (pc)
(if_then_else (ne (reg:CC 66 cc)
(const_int 0 [0]))
(label_ref 1905)
(pc))) "foo.cpp":59 9 {condjump}
 (expr_list:REG_DEAD (reg:CC 66 cc)
(int_list:REG_BR_PROB 1 (nil)))
 -> 1905)
;;  succ:   227 [95.6%] 
;;  226 [4.4%]  (FALLTHRU)

That's enough for GCC to consider the branch unpredictable, which in turn
causes GCC to use the "unpredictable" number for BRANCH_COST when setting the
maximum , which when tuning for Cortex-A57 is 1 for predictable branches (not
high enough to trigger the transform) and 3 for unpredictable branches (high
enough to trigger the transform). That explains why we don't see the
performance difference for -mcpu=generic, where BRANCH_COST always returns 2 -
which is always high enough to trigger this if-conversion.
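
As a rough model of that decision, assuming a single threshold (the comment only establishes that a cost of 1 is too low and that 2 and 3 are high enough, so the value 2 below is an assumption, as are all the names):

```c
#include <stdbool.h>

struct branch_costs {
    int predictable;
    int unpredictable;
};

/* Values quoted in the comment above. */
static const struct branch_costs cortex_a57 = { 1, 3 };
static const struct branch_costs generic    = { 2, 2 };

/* Hypothetical threshold: 1 is too low, 2 and 3 are high enough. */
#define MIN_COST_FOR_TRANSFORM 2

static bool transform_triggers(const struct branch_costs *c, bool predictable)
{
    int cost = predictable ? c->predictable : c->unpredictable;
    return cost >= MIN_COST_FOR_TRANSFORM;
}
```

Under this model the Cortex-A57 tuning flips behaviour when the branch stops being considered predictable, while the generic tuning triggers the transform either way.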

The cost model looks reasonable, this is clearly a borderline case for the
heuristic. The only thing I found surprising in my analysis of this regression
is that GCC considers a 95.6% taken branch as unpredictable.

I'm not sure what the correct course for fixing this is - nothing in the
compiler seems to be broken, we're just on an unlucky side of the static
prediction engine and the ifcvt heuristics.

[Bug tree-optimization/79534] [7 Regression] tree-ifcombine aarch64 performance regression with trunk@245151

2017-04-20 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79534

--- Comment #10 from James Greenhalgh  ---
The most striking improvement was in libquantum, for which we saw a 15%
performance improvement on Cortex-A72 (3% on Cortex-A57) directly attributable
to basic block ordering after this patch.

Otherwise, I don't have a direct before/after comparison for just Honza's patch
across a wider set of benchmarks, but our nightly runs show general
improvements in benchmarks from SPEC which are sensitive to block reordering
after the day of the patch. I don't see any large regressions in this time.

[Bug tree-optimization/79534] [7/8 Regression] tree-ifcombine aarch64 performance regression with trunk@245151

2017-04-21 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79534

--- Comment #12 from James Greenhalgh  ---
So while there's nothing buggy about the if-conversion which causes the
performance issue, it does show an interesting missed optimization that ifcvt
can't handle.

We make the transform through find_if_case_2, which looks for things of the
form:

/* TEST BLOCK  */
if (test)
   goto E; // x not live
/* FALLTHRU  */
/* ELSE BLOCK  */
x = big();
goto L;
E:
/* THEN BLOCK  */
x = b;
goto M;

And transforms them to:

/* Unconditional copy of THEN BLOCK */
x = b;
/* TEST BLOCK  */
if (test)
  goto M;
/* ELSE BLOCK  */
x = big();
goto L;
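
A runnable rendering of the two shapes, as a sketch: big is a hypothetical stand-in for the expensive computation, and the two join points L and M are collapsed onto a single return so that the functions can be compared directly.

```c
/* Hypothetical stand-in for the expensive ELSE-block computation. */
static int big(int v)
{
    return v * 7 + 3;
}

/* Shape matched by find_if_case_2. */
static int before(int test, int b, int v)
{
    int x;
    if (test)
        goto E;
    x = big(v);                  /* ELSE block */
    goto L;
E:
    x = b;                       /* THEN block */
    goto M;
L:
M:
    return x;
}

/* Shape after the transformation: the THEN block runs unconditionally. */
static int after(int test, int b, int v)
{
    int x = b;                   /* unconditional copy of THEN block */
    if (test)
        goto M;
    x = big(v);                  /* ELSE block */
M:
    return x;
}
```

The two functions agree for all inputs; the transformed one trades a jump for an assignment that is wasted whenever test is false.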

In the testcase, using the naming conventions above, and snipping irrelevant
details, this looks like:

TEST BLOCK (309)

;; basic block 309, loop depth 4, count 0, freq 3153, maybe hot
;;  prev block 308, next block 311, flags: (REACHABLE, RTL, MODIFIED)
;;  pred:   219 [98.0%] 


(insn 1915 1914 1916 309 (set (reg:CC 66 cc)
(compare:CC (reg:SI 1117)
(const_int 0 [0]))) "foo.cpp":59 391 {cmpsi}
 (expr_list:REG_DEAD (reg:SI 1117)
(nil)))
(jump_insn 1916 1915 1922 309 (set (pc)
(if_then_else (ne (reg:CC 66 cc)
(const_int 0 [0]))
(label_ref 1905)
(pc))) "foo.cpp":59 9 {condjump}
 (expr_list:REG_DEAD (reg:CC 66 cc)
(int_list:REG_BR_PROB 9558 (nil)))
 -> 1905)

;;  succ:   227 [95.6%] 
;;  226 [4.4%]  (FALLTHRU)

ELSE BLOCK (226)

;; basic block 226, loop depth 4, count 0, freq 201, maybe hot
;;  prev block 224, next block 227, flags: (REACHABLE, RTL, MODIFIED)
;;  pred:   309 [4.4%]  (FALLTHRU)
;;  311 [3.8%]  (FALLTHRU)

(code_label 1917 1413 1417 226 141 (nil) [0 uses])
(note 1417 1917 1418 226 [bb 226] NOTE_INSN_BASIC_BLOCK)
(insn 1418 1417 1905 226 (set (reg:SI 690 [ _1517 ])
(plus:SI (reg:SI 703 [ ivtmp.56D.5375 ])
(const_int -3 [0xfffd]))) 95 {*addsi3_aarch64}
 (nil))

;;  succ:   237 [100.0%]  (FALLTHRU)

THEN BLOCK (227)

;; basic block 227, loop depth 4, count 0, freq 3013, maybe hot
;;  prev block 226, next block 228, flags: (REACHABLE, RTL, MODIFIED)
;;  pred:   309 [95.6%] 

(code_label 1905 1418 1421 227 140 (nil) [1 uses])
(note 1421 1905 1422 227 [bb 227] NOTE_INSN_BASIC_BLOCK)
(insn 1422 1421 1802 227 (set (reg:SI 690 [ _1517 ])
(plus:SI (reg:SI 703 [ ivtmp.56D.5375 ])
(const_int -3 [0xfffd]))) 95 {*addsi3_aarch64}
 (nil))

;;  succ:   237 [100.0%]  (FALLTHRU)

So the interesting thing is that the THEN block and the ELSE block are as good
as identical! Both compute (plus (reg 703) (const_int -3)) and both fall
through to block 237.

The normal if-convert machinery won't catch this because basic block 226 (the
ELSE block) has multiple predecessors. But the transformation we make through
find_if_case_2 ends up looking silly! (again, snipping some unrelated
details/insns):

TEST BLOCK (279)

;; basic block 279, loop depth 4, count 0, freq 3153, maybe hot
;;  prev block 278, next block 280, flags: (REACHABLE, RTL, MODIFIED)
;;  pred:   203 [98.0%] 


/* Unconditional copy of THEN BLOCK.  */

(insn 1422 1914 1915 279 (set (reg:SI 690 [ _1517 ])
(plus:SI (reg:SI 703 [ ivtmp.56D.5375 ])
(const_int -3 [0xfffd]))) 95 {*addsi3_aarch64}
 (nil))
(insn 1915 1422 1916 279 (set (reg:CC 66 cc)
(compare:CC (reg:SI 1117)
  (const_int 0 [0]))) "foo.cpp":59 391 {cmpsi}
 (expr_list:REG_DEAD (reg:SI 1117)
(nil)))
(jump_insn 1916 1915 1922 279 (set (pc)
(if_then_else (ne (reg:CC 66 cc)
(const_int 0 [0]))
(label_ref:DI 1470)
(pc))) "foo.cpp":59 9 {condjump}
 (expr_list:REG_DEAD (reg:CC 66 cc)
(int_list:REG_BR_PROB 9558 (nil)))
 -> 1470)
;;  succ:   218 [95.6%] 
;;  209 [4.4%]  (FALLTHRU)

ELSE BLOCK (209):

;; basic block 209, loop depth 4, count 0, freq 201, maybe hot
;;  prev block 208, next block 210, flags: (REACHABLE, RTL, MODIFIED)
;;  pred:   279 [4.4%]  (FALLTHRU)
;;  280 [3.8%]  (FALLTHRU)

(code_label 1917 1413 1417 209 141 (nil) [0 uses])
(note 1417 1917 1418 209 [bb 209] NOTE_INSN_BASIC_BLOCK)
(insn 1418 1417 1802 209 (set (reg:SI 690 [ _1517 ])
(plus:SI (reg:SI 703 [ ivtmp.56D.5375 ])
(const_int -3 [0xfffd]))) 95 {*addsi3_aarch64}
 (nil))
;;  succ:   218 [100.0%]  (FALLTHRU)

Note that if we are on the "else" path, we now we compute (pl

[Bug target/80530] New: [7 Regression][AArch64] ICE when expanding reciprocal square root with -mcpu=exynos-m1 or -mcpu=xgene-1

2017-04-26 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80530

Bug ID: 80530
   Summary: [7 Regression][AArch64] ICE when expanding reciprocal
square root with -mcpu=exynos-m1 or -mcpu=xgene-1
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jgreenhalgh at gcc dot gnu.org
  Target Milestone: ---

This testcase:

double
bar (double a)
{
  return 1.0/__builtin_sqrt(a);
}

Fails with an ICE on AArch64 with the options:

 gcc -funsafe-math-optimizations -O1 foo.c -mcpu=xgene1

on Linux.

g.c: In function ‘bar’:
g.c:11:14: internal compiler error: in expand_insn, at optabs.c:7130
   return 1.0/__builtin_sqrt(a);
  ^
0xa70a15 expand_insn(insn_code, unsigned int, expand_operand*)
.../gcc/optabs.c:7130
0x94589e expand_direct_optab_fn
.../gcc/internal-fn.c:2600
0x71d4b7 expand_call_stmt
.../gcc/cfgexpand.c:2569
0x71d4b7 expand_gimple_stmt_1
.../gcc/cfgexpand.c:3571
0x71d4b7 expand_gimple_stmt
.../gcc/cfgexpand.c:3737
0x71ee69 expand_gimple_basic_block
.../gcc/cfgexpand.c:5744
0x7247d6 execute
.../gcc/cfgexpand.c:6357
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

The problem will be somewhere in the approximate square root expander, as the
same ICE does not occur for -mcpu values which do not use the approximate
square root expansion path.
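
For context, an approximate reciprocal square root expansion generally follows the shape sketched below: start from a coarse estimate and refine it with Newton-Raphson steps. This is a generic illustration, not the AArch64 expander's code; the function names, the caller-supplied estimate and the step count are all assumptions.

```c
#include <assert.h>

/* One Newton-Raphson refinement step for y ~= 1/sqrt(a):
   y' = y * (3 - a * y * y) / 2.  */
static double rsqrt_step(double a, double y)
{
    return y * (3.0 - a * y * y) / 2.0;
}

/* Refine a caller-supplied initial estimate a fixed number of times. */
static double approx_rsqrt(double a, double estimate, int steps)
{
    assert(a > 0.0);
    double y = estimate;
    for (int i = 0; i < steps; i++)
        y = rsqrt_step(a, y);
    return y;
}
```

Starting from an estimate of 0.4 for a = 4.0, a handful of steps converges on 0.5.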

[Bug tree-optimization/80457] vectorizable_condition does not update the vectorizer cost model

2017-05-03 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80457

--- Comment #3 from James Greenhalgh  ---
(In reply to Bill Schmidt from comment #2)
> Per https://gcc.gnu.org/ml/gcc-patches/2017-04/msg00967.html, James
> Greenhalgh has a more comprehensive patch for this, so removing myself from
> the Assignee field and will await his patch.  Thanks, James!

I'm out of office until June, would you mind applying the patch on my behalf
(and reverting it if anything goes wrong!) in my absence? Thanks!

[Bug tree-optimization/80457] vectorizable_condition does not update the vectorizer cost model

2017-05-31 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80457

--- Comment #7 from James Greenhalgh  ---
Thanks for your help!

[Bug target/71778] [6/7/8 Regression][ARM] ICE using non-constant argument to Neon intrinsic that requires constant arguments

2017-06-16 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71778

--- Comment #7 from James Greenhalgh  ---
Author: jgreenhalgh
Date: Fri Jun 16 17:29:56 2017
New Revision: 249272

URL: https://gcc.gnu.org/viewcvs?rev=249272&root=gcc&view=rev
Log:
[Patch ARM] Fix PR71778

gcc/

PR target/71778
* config/arm/arm-builtins.c (arm_expand_builtin_args): Return TARGET
if given a non-constant argument for an intrinsic which requires a
constant.

gcc/testsuite/

PR target/71778
* gcc.target/arm/pr71778.c: New.


Added:
trunk/gcc/testsuite/gcc.target/arm/pr71778.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/arm/arm-builtins.c
trunk/gcc/testsuite/ChangeLog

[Bug target/71778] [6/7/8 Regression][ARM] ICE using non-constant argument to Neon intrinsic that requires constant arguments

2017-06-19 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71778

--- Comment #8 from James Greenhalgh  ---
Author: jgreenhalgh
Date: Mon Jun 19 16:58:03 2017
New Revision: 249379

URL: https://gcc.gnu.org/viewcvs?rev=249379&root=gcc&view=rev
Log:
Backport: [Patch ARM] Fix PR71778

gcc/

PR target/71778
* config/arm/arm-builtins.c (arm_expand_builtin_args): Return TARGET
if given a non-constant argument for an intrinsic which requires a
constant.

gcc/testsuite/

PR target/71778
* gcc.target/arm/pr71778.c: New.


Added:
branches/gcc-7-branch/gcc/testsuite/gcc.target/arm/pr71778.c
  - copied unchanged from r249272,
trunk/gcc/testsuite/gcc.target/arm/pr71778.c
Modified:
branches/gcc-7-branch/   (props changed)
branches/gcc-7-branch/gcc/ChangeLog
branches/gcc-7-branch/gcc/config/arm/arm-builtins.c
branches/gcc-7-branch/gcc/testsuite/ChangeLog

Propchange: branches/gcc-7-branch/
('svn:mergeinfo' added)

[Bug target/71778] [6/7/8 Regression][ARM] ICE using non-constant argument to Neon intrinsic that requires constant arguments

2017-06-19 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71778

--- Comment #9 from James Greenhalgh  ---
Author: jgreenhalgh
Date: Mon Jun 19 17:12:12 2017
New Revision: 249380

URL: https://gcc.gnu.org/viewcvs?rev=249380&root=gcc&view=rev
Log:
Backport: [Patch ARM] Fix PR71778

gcc/

PR target/71778
* config/arm/arm-builtins.c (arm_expand_builtin_args): Return TARGET
if given a non-constant argument for an intrinsic which requires a
constant.

gcc/testsuite/

PR target/71778
* gcc.target/arm/pr71778.c: New.


Added:
branches/gcc-6-branch/gcc/testsuite/gcc.target/arm/pr71778.c
  - copied unchanged from r249272,
trunk/gcc/testsuite/gcc.target/arm/pr71778.c
Modified:
branches/gcc-6-branch/   (props changed)
branches/gcc-6-branch/gcc/   (props changed)
branches/gcc-6-branch/gcc/ChangeLog
branches/gcc-6-branch/gcc/config/arm/arm-builtins.c
branches/gcc-6-branch/gcc/testsuite/ChangeLog

Propchange: branches/gcc-6-branch/
('svn:mergeinfo' modified)

Propchange: branches/gcc-6-branch/gcc/
('svn:mergeinfo' modified)

[Bug target/71778] [6/7/8 Regression][ARM] ICE using non-constant argument to Neon intrinsic that requires constant arguments

2017-06-19 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71778

James Greenhalgh  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #10 from James Greenhalgh  ---
Fixed on all active branches.

[Bug target/63250] Complex fp16 arithmetic uses nonexistent libgcc functions

2016-11-23 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63250

--- Comment #5 from James Greenhalgh  ---
Author: jgreenhalgh
Date: Wed Nov 23 17:36:21 2016
New Revision: 242784

URL: https://gcc.gnu.org/viewcvs?rev=242784&root=gcc&view=rev
Log:
[Patch ARM 17/17] Enable _Float16 for ARM and fix PR target/63250

gcc/

PR target/63250
* config/arm/arm-builtins.c (arm_simd_floatHF_type_node): Rename to...
(arm_fp16_type_node): ...This, make visible.
(arm_simd_builtin_std_type): Rename arm_simd_floatHF_type_node to
arm_fp16_type_node.
(arm_init_simd_builtin_types): Likewise.
(arm_init_fp16_builtins): Likewise.
* config/arm/arm.c (arm_excess_precision): New.
(arm_floatn_mode): Likewise.
(TARGET_C_EXCESS_PRECISION): Likewise.
(TARGET_FLOATN_MODE): Likewise.
(arm_promoted_type): Only promote arm_fp16_type_node.
* config/arm/arm.h (arm_fp16_type_node): Declare.

gcc/testsuite/

PR target/63250
* lib/target-supports.exp (add_options_for_float16): Add
-mfp16-format=ieee when testing arm*-*-*.



Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/arm/arm-builtins.c
trunk/gcc/config/arm/arm.c
trunk/gcc/config/arm/arm.h
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/lib/target-supports.exp

[Bug middle-end/78509] [7 regression] ICE in in excess_precision_type, at tree.c:8875

2016-11-24 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78509

--- Comment #2 from James Greenhalgh  ---
(In reply to Rainer Orth from comment #1)
> James, this is caused by your patch series
> 
> [Patch 1/17] Add a new target hook for describing excess precision intentions
> 
> I believe.
> 
>   Rainer

Thanks, and sorry for the break.

Can you help me out with a configure line that would get me to a stage 1
solaris/x32 compiler so I can debug this?

[Bug middle-end/78509] [7 regression] ICE in in excess_precision_type, at tree.c:8875

2016-11-24 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78509

James Greenhalgh  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2016-11-24
   Assignee|unassigned at gcc dot gnu.org  |jgreenhalgh at gcc dot 
gnu.org
 Ever confirmed|0   |1

--- Comment #4 from James Greenhalgh  ---
Well, certainly this comment and assert in tree.c:

  /* The target should not ask for unpredictable float evaluation (though
 it might advertise that implicitly the evaluation is unpredictable,
 but we don't care about that here, it will have been reported
 elsewhere).  If it does ask for unpredictable evaluation, we have
 nothing to do here.  */
  gcc_assert (target_flt_eval_method != FLT_EVAL_METHOD_UNPREDICTABLE);

suggest that the implementation I've put in for TARGET_C_EXCESS_PRECISION on
i386 is wrong (or the assert needs to be weakened).

  static enum flt_eval_method
  ix86_excess_precision (enum excess_precision_type type)
  {
    switch (type)
      {
      case EXCESS_PRECISION_TYPE_FAST:
        /* The fastest type to promote to will always be the native type,
           whether that occurs with implicit excess precision or
           otherwise.  */
        return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
      case EXCESS_PRECISION_TYPE_STANDARD:
      case EXCESS_PRECISION_TYPE_IMPLICIT:
        /* Otherwise, the excess precision we want when we are
           in a standards compliant mode, and the implicit precision we
           provide can be identical.  */
        if (!TARGET_80387)
          return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
        else if (TARGET_MIX_SSE_I387)
          return FLT_EVAL_METHOD_UNPREDICTABLE;
        else if (!TARGET_SSE_MATH)
          return FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE;
        else if (TARGET_SSE2)
          return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
        else
          return FLT_EVAL_METHOD_UNPREDICTABLE;
      default:
        gcc_unreachable ();
      }

    return FLT_EVAL_METHOD_UNPREDICTABLE;
  }

I think the right fix is probably to return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT for
EXCESS_PRECISION_TYPE_STANDARD, but I'll need to think about that.

Sorry again for the break; by inspection it is obvious how you hit that assert.

[Bug middle-end/78509] [7 regression] ICE in in excess_precision_type, at tree.c:8875

2016-11-24 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78509

--- Comment #6 from James Greenhalgh  ---
None of the logic was there in the original code, so there is not much to
compare.

The question for the backend when TYPE is EXCESS_PRECISION_TYPE_FAST or
EXCESS_PRECISION_TYPE_STANDARD is: does it want tree.c to insert operations to
guarantee explicit excess precision for the types, or does it want tree.c to
keep them as their native types?

The assert exists because it makes no sense to ask the front-end to explicitly
make the operations unpredictable.

The fix which most closely maps to the semantics I think i386 wants is...

For EXCESS_PRECISION_TYPE_FAST:
  Always return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT

For EXCESS_PRECISION_TYPE_STANDARD:
  If we're in a mode which should never promote, or we're in a mode which will
be implicitly unpredictable, return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT
  If we're in the mode which should explicitly promote to LONG_DOUBLE, do that.

For EXCESS_PRECISION_TYPE_IMPLICIT:
  Keep the current logic.
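
As a rough illustration of the mapping described above (not the patch itself), here is a minimal standalone sketch in C. The enums are simplified stand-ins for the GCC ones, and the target_flags struct and the name sketch_excess_precision are hypothetical; in GCC the TARGET_* conditions are macros, not struct fields.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the GCC enums named in this comment.  */
enum flt_eval_method
{
  FLT_EVAL_METHOD_UNPREDICTABLE = -1,
  FLT_EVAL_METHOD_PROMOTE_TO_FLOAT = 0,
  FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE = 1,
  FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE = 2
};

enum excess_precision_type
{
  EXCESS_PRECISION_TYPE_IMPLICIT,
  EXCESS_PRECISION_TYPE_STANDARD,
  EXCESS_PRECISION_TYPE_FAST
};

/* Hypothetical: the TARGET_* macros modelled as plain fields.  */
struct target_flags
{
  bool x87;           /* models TARGET_80387 */
  bool mix_sse_i387;  /* models TARGET_MIX_SSE_I387 */
  bool sse_math;      /* models TARGET_SSE_MATH */
  bool sse2;          /* models TARGET_SSE2 */
};

enum flt_eval_method
sketch_excess_precision (enum excess_precision_type type,
                         const struct target_flags *t)
{
  switch (type)
    {
    case EXCESS_PRECISION_TYPE_FAST:
      /* Always the native type.  */
      return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;

    case EXCESS_PRECISION_TYPE_STANDARD:
      /* Promote explicitly only in the pure x87 mode.  Modes that never
         promote, and modes that are implicitly unpredictable, ask for no
         explicit promotion.  */
      if (t->x87 && !t->mix_sse_i387 && !t->sse_math)
        return FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE;
      return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;

    case EXCESS_PRECISION_TYPE_IMPLICIT:
      /* Same decision chain as the hook quoted earlier in this PR.  */
      if (!t->x87)
        return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
      else if (t->mix_sse_i387)
        return FLT_EVAL_METHOD_UNPREDICTABLE;
      else if (!t->sse_math)
        return FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE;
      else if (t->sse2)
        return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
      else
        return FLT_EVAL_METHOD_UNPREDICTABLE;
    }

  assert (0 && "unknown excess_precision_type");
  return FLT_EVAL_METHOD_UNPREDICTABLE;
}
```

With this shape, only EXCESS_PRECISION_TYPE_IMPLICIT can ever yield FLT_EVAL_METHOD_UNPREDICTABLE, so the assert in tree.c would not be reachable for the other two types.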

I'll write a patch along those lines, and test it as well as I can, but I don't
really know how to get good -m32 testing out of my x86_64 box, which doesn't
have a good multilib environment set up. If you can point me at a machine in
the compile farm I can use I'd be happy to test more extensively.

[Bug target/78509] [7 regression] ICE in in excess_precision_type, at tree.c:8875

2016-11-24 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78509

--- Comment #8 from James Greenhalgh  ---
(In reply to Jakub Jelinek from comment #7)
> (In reply to James Greenhalgh from comment #6)
> > None of the logic was there in the original code, so there is not much to
> > compare.
> 
> ??  Since -fexcess-precision=standard has been introduced, gcc has the
> excess precision notion.  So there is something to compare.
> E.g. try
> float foo (float x, float y, float z)
> {
>   return x + y + z;
> }
> before your changes with
> -fdump-tree-gimple -m32 -msse2 -mno-80387 -fexcess-precision=standard
> -fdump-tree-gimple -m32 -msse2 -mfpmath=387+sse -fexcess-precision=standard
> -fdump-tree-gimple -m32 -msse2 -mfpmath=387 -fexcess-precision=standard
> -fdump-tree-gimple -m32 -msse2 -mfpmath=sse -fexcess-precision=standard
> -fdump-tree-gimple -m32 -msse -mno-sse2 -mfpmath=sse
> -fexcess-precision=standard
> to match the different cases in your hook, and compare that to what you get
> with the current trunk.

Right, I think we might have been talking about comparing different things.
That works for a test of observable behaviour.

I've done what I suggested above, tested it as you suggested, and posted a fix
to the mailing list https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02568.html

[Bug target/78509] [7 regression] ICE in in excess_precision_type, at tree.c:8875

2016-11-25 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78509

--- Comment #9 from James Greenhalgh  ---
Author: jgreenhalgh
Date: Fri Nov 25 09:25:31 2016
New Revision: 242866

URL: https://gcc.gnu.org/viewcvs?rev=242866&root=gcc&view=rev
Log:
[Patch i386] PR78509 - TARGET_C_EXCESS_PRECISION should not return
 "unpredictable" for EXCESS_PRECISION_TYPE_STANDARD

gcc/

PR target/78509
* config/i386/i386.c (i386_excess_precision): Do not return
FLT_EVAL_METHOD_UNPREDICTABLE when "type" is
EXCESS_PRECISION_TYPE_STANDARD.
* target.def (excess_precision): Document that targets should
not return FLT_EVAL_METHOD_UNPREDICTABLE when "type" is
EXCESS_PRECISION_TYPE_STANDARD or EXCESS_PRECISION_TYPE_FAST.
Fix typo in first sentence.
* doc/tm.texi: Regenerate.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386.c
trunk/gcc/doc/tm.texi
trunk/gcc/target.def

[Bug target/78509] [7 regression] ICE in in excess_precision_type, at tree.c:8875

2016-11-25 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78509

--- Comment #10 from James Greenhalgh  ---
Should now be fixed, but I'll leave open for Rainer to confirm.

[Bug target/78509] [7 regression] ICE in in excess_precision_type, at tree.c:8875

2016-11-25 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78509

--- Comment #12 from James Greenhalgh  ---
I tried looking at the generated assembly for that test with the compilers I
built before my patch series, and after the patch series + the fix above. I
couldn't see any difference in code generated for the testcase you mention for
each of the sets of options Jakub gave above (with -m3dnow, -O2, -m32 for the
testcase).

If this turns out to be my fault, I'll gladly look into it - but I'll need
help getting the x86 flags right again!

[Bug target/70120] [6 Regression][aarch64] -g causes Assembler messages: Error: unaligned opcodes detected in executable segment

2016-11-25 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70120

James Greenhalgh  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 CC||jgreenhalgh at gcc dot gnu.org
 Resolution|FIXED   |---

--- Comment #12 from James Greenhalgh  ---
I can still trigger this with a testcase using 16-bit floating-point types, and
the tiny memory model:

  int
  main (__fp16 x)
  {
    __fp16 a = 6.5504e4;
    return (x <= a);
  }

gcc foo.c -O3 -mcmodel=tiny -g

/tmp/ccwJITmo.s: Assembler messages:
/tmp/ccwJITmo.s: Error: unaligned opcodes detected in executable segment

In this test case, a call to force_const_mem in ira adds a new 32-bit constant
in the constant pool, but ultimately doesn't use it. That means that when we
sweep patterns looking for which constant pool entries to emit, we don't mark
the unused pattern created by ira, and it doesn't get emitted. But, that leaves
us with inconsistent information between the offset we think we've got, and
what we've actually emitted.

Presumably IRA isn't the only pass at fault here. Anything which eliminates a
reference to a constant pool entry can cause the constant pool offset
information to become stale.

Maybe force_const_mem shouldn't be updating the offset information at all, and
we should only update that as we make the sweep looking for live pool entries?
I guess the trouble there is that we don't record the mode of the mem in the
constant_descriptor_rtx - but if we were to do that it looks like we might be
able to defer calculating offset until when we actually emit the pool. rs6000
might need some changes, but a better interface for their uses of get_pool_size
looks like it would be "pool_empty_p" anyway.

I'm not sure of this code though, so I don't know if that would make for a
clean design.

If you think this needs to be a separate bug, feel free to reclose this and
open a new one.

[Bug rtl-optimization/78561] New: Constant pool size (offset) can become stale where constant pool entires become unused

2016-11-28 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78561

Bug ID: 78561
   Summary: Constant pool size (offset) can become stale where
constant pool entires become unused
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jgreenhalgh at gcc dot gnu.org
  Target Milestone: ---

This bug report is mostly from inspection, but the effects of this issue can be
seen with this testcase on AArch64 (See also PR70120 for why we need the size
of the constant pool to be correct).

  int
  main (__fp16 x)
  {
    __fp16 a = 6.5504e4;
    return (x <= a);
  }

gcc foo.c -O3 -mcmodel=tiny -g

/tmp/ccwJITmo.s: Assembler messages:
/tmp/ccwJITmo.s: Error: unaligned opcodes detected in executable segment

In this test case, a call to force_const_mem in ira adds a new 32-bit constant
in the constant pool, but ultimately doesn't use it. That means that when we
sweep patterns looking for which constant pool entries to emit, we don't mark
the unused pattern created by ira, and it doesn't get emitted. But, that leaves
us with inconsistent information between the offset we think we've got, and
what we've actually emitted.

Presumably IRA isn't the only pass at fault here. Anything which eliminates a
reference to a constant pool entry can cause the constant pool offset
information to become stale, as it is only updated when inserting entries to
the constant pool, not when we decide those entries are actually used.

Maybe force_const_mem shouldn't be updating the offset information at all, and
we should only update that as we make the sweep in mark_constant_pool looking
for live pool entries? I guess the trouble there is that we don't record the
mode of the mem in the constant_descriptor_rtx - but if we were to do that it
looks like we might be able to defer calculating offset until when we actually
emit the pool. rs6000 might need some changes, but a better interface for their
uses of get_pool_size looks like it would be "pool_empty_p" anyway.

I'm not sure of this code though, so I don't know if that would make for a
clean design.

[Bug rtl-optimization/78561] Constant pool size (offset) can become stale where constant pool entires become unused

2016-11-28 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78561

James Greenhalgh  changed:

   What|Removed |Added

 Target||aarch64*-*-*
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2016-11-28
 Ever confirmed|0   |1

[Bug target/70120] [6 Regression][aarch64] -g causes Assembler messages: Error: unaligned opcodes detected in executable segment

2016-11-28 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70120

James Greenhalgh  changed:

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution|--- |FIXED

--- Comment #14 from James Greenhalgh  ---
> I do think a new bug should be opened.

OK. PR78561 .

[Bug rtl-optimization/78547] [7 Regression] ICE: in loc_cmp, at var-tracking.c:3417 with -Os -g -mstringop-strategy=libcall -freorder-blocks-algorithm=simple

2016-11-28 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78547

James Greenhalgh  changed:

   What|Removed |Added

 CC||hjl at gcc dot gnu.org,
   ||ienkovich at gcc dot gnu.org

--- Comment #2 from James Greenhalgh  ---
I'd be surprised if r238594 was the root cause, but it may have exposed
something latent. This revision changed the cost model for if conversion,
effectively disabling it in this testcase. You can emulate turning off
if-conversion for the testcase with -fno-if-conversion on the command line.
Adding that, you can continue the bisect further back until r237647 which looks
more probable given the testcase body.

I'm now compiling with:

  -Os -g -mstringop-strategy=libcall -freorder-blocks-algorithm=simple
-fdump-rtl-all-all -fno-if-conversion


Author: hjl 
Date:   Tue Jun 21 14:24:31 2016 +

Convert V1TImode register to TImode in debug insn

TImode register referenced in debug insn can be converted to V1TImode
by scalar to vector optimization.  After converting a TImode register
to V1TImode, we need to check all debug insns on its use chain to
convert the V1TImode register to SUBREG TImode.

gcc/

2016-06-21  H.J. Lu  
Ilya Enkovich  

PR target/71549
* config/i386/i386.c (timode_scalar_chain::fix_debug_reg_uses):
New member function to convert V1TImode register to SUBREG
TImode in debug insn.
(timode_scalar_chain::convert_insn): Call fix_debug_reg_uses
after changing register mode to V1TImode.

gcc/testsuite/

2016-06-21  H.J. Lu  

PR target/71549
* gcc.target/i386/pr71549.c: New test.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@237647

[Bug rtl-optimization/78561] Constant pool size (offset) can become stale where constant pool entires become unused

2016-11-30 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78561

James Greenhalgh  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |jgreenhalgh at gcc dot gnu.org

--- Comment #1 from James Greenhalgh  ---
Well, confirmed - and an easy fix is to recompute the offset data while
sweeping for valid constants at the end of compilation.
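
To make the idea concrete, here is a minimal standalone sketch in C of a pool whose running offset is updated eagerly on insertion and then recomputed over only the live entries. The struct layout and the names pool_add and pool_recompute_offsets are hypothetical simplifications for illustration; they are not the varasm.c data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_MAX 16

/* Hypothetical, much-simplified pool entry.  */
struct pool_entry
{
  size_t size;    /* size in bytes */
  size_t align;   /* alignment in bytes, a power of two */
  size_t offset;  /* offset assigned within the pool */
  bool used;      /* set by the sweep that marks live entries */
};

struct pool
{
  struct pool_entry entries[POOL_MAX];
  size_t count;
  size_t offset;  /* running size of the pool */
};

static size_t
align_up (size_t value, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (value + align - 1) & ~(align - 1);
}

/* Insert an entry and eagerly bump the running offset.  If the entry is
   later found to be unused, the running offset is left stale.  */
size_t
pool_add (struct pool *p, size_t size, size_t align)
{
  assert (p->count < POOL_MAX);
  struct pool_entry *e = &p->entries[p->count];
  e->size = size;
  e->align = align;
  e->used = false;
  e->offset = align_up (p->offset, align);
  p->offset = e->offset + size;
  return p->count++;
}

/* Recompute every offset, counting only entries marked as used, so that
   the recorded size matches what would actually be emitted.  */
void
pool_recompute_offsets (struct pool *p)
{
  p->offset = 0;
  for (size_t i = 0; i < p->count; i++)
    {
      struct pool_entry *e = &p->entries[i];
      if (!e->used)
        continue;
      e->offset = align_up (p->offset, e->align);
      p->offset = e->offset + e->size;
    }
}
```

In this model, adding a 2-byte entry and then a 4-byte entry that ends up unused leaves the eager offset at 8, while recomputing over the live entries gives 2.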

[Bug tree-optimization/77445] [7 Regression] Performance drop after r239219 on coremark test

2016-11-30 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77445

James Greenhalgh  changed:

   What|Removed |Added

   Last reconfirmed|2016-09-03 00:00:00 |2016-11-30
 CC||law at gcc dot gnu.org

--- Comment #5 from James Greenhalgh  ---
I posted this on list a few weeks back:

  https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01454.html

The early threader is running with speed_p set to false (second parameter to
find_jump_threads_backwards)

  unsigned int
  pass_early_thread_jumps::execute (function *fun)
  {
    /* Try to thread each block with more than one successor.  */
    basic_block bb;
    FOR_EACH_BB_FN (bb, fun)
      {
        if (EDGE_COUNT (bb->succs) > 1)
          find_jump_threads_backwards (bb, false);
      }
    thread_through_all_blocks (true);
    return 0;
  }

So even though profile information is ignored, we think we are compiling for
size and won't thread. The relevant check in profitable_jump_thread_path is:

  if (speed_p && optimize_edge_for_speed_p (taken_edge))
    {

    }
  else if (n_insns > 1)
    {
      if (dump_file && (dump_flags & TDF_DETAILS))
        fprintf (dump_file, "FSM jump-thread path not considered: "
                 "duplication of %i insns is needed and optimizing for size.\n",
                 n_insns);
      path->pop ();
      return NULL;
    }

Changing false to true (or even to optimize_bb_for_size_p) in the above hunk
looks like it would enable some of the threading we're relying on here.
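
The effect of the hard-coded false can be modelled with a small standalone sketch in C. The function name sketch_path_allowed and the edge_hot parameter (standing in for optimize_edge_for_speed_p (taken_edge)) are hypothetical, and because the body of the speed branch is elided in the hunk above, the sketch simply assumes those checks pass.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduction of the size/speed check quoted above.  Returns
   true if the path would still be considered for threading.  */
bool
sketch_path_allowed (bool speed_p, bool edge_hot, int n_insns)
{
  assert (n_insns >= 0);

  if (speed_p && edge_hot)
    {
      /* Speed-side checks are elided in the quoted hunk; this sketch
         assumes they pass.  */
      return true;
    }
  else if (n_insns > 1)
    {
      /* Treated as optimizing for size: any path that needs more than
         one insn duplicated is rejected.  */
      return false;
    }

  return true;
}
```

In this model, calling it with speed_p false rejects every path that duplicates more than one insn, however hot the taken edge is, which matches the behaviour described for the early threader.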

[Bug tree-optimization/77445] [7 Regression] Performance drop after r239219 on coremark test

2016-11-30 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77445

--- Comment #7 from James Greenhalgh  ---
Right, I've trimmed too much context from my message.

This performance regression starts with r239219 which adds a cost model to the
threader which relies on frequency information (arguably this is a bad cost
model for threading, as at a switch statement you might expect multiple cold
edges, and still want to thread the switch, but that's a separate discussion).
The threader does a bad job of updating frequency information when it creates
new paths, with the effect that the edges we'd want to thread in this test case
appear to be cold. The new cost model from r239219 sees the cold edges, and
rejects the threading opportunity.

The message I was replying to above had said:

  > Hmm, this is interesting. The patch should have "fixed" the previous
  > degradation by making the profile correct (backward threader still does not
  > update it, but because most threading now happens early and profile is built
  > afterwards this should be less of issue).  I am now looking into the profile
  > update issues and will try to check why coremarks degrade again.

The answer is that the early threader is hard-coded to behave as if it is
compiling for size, which causes most backward threading to be rejected, so
the patch wouldn't fix this issue.

However, if we were to use optimize_bb_for_size_p in
pass_early_thread_jumps::execute rather than just passing false then the early
threader would have resolved this issue (as the profile information is not used
to decide if the edge should be optimised for speed).

[Bug rtl-optimization/78561] Constant pool size (offset) can become stale where constant pool entires become unused

2016-12-02 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78561

--- Comment #2 from James Greenhalgh  ---
Author: jgreenhalgh
Date: Fri Dec  2 14:29:35 2016
New Revision: 243182

URL: https://gcc.gnu.org/viewcvs?rev=243182&root=gcc&view=rev
Log:
[Patch 1/2 PR78561] Rename get_pool_size to get_pool_size_upper_bound

gcc/

PR rtl-optimization/78561
* config/rs6000/rs6000.c (rs6000_reg_live_or_pic_offset_p) Rename
get_pool_size to get_pool_size_upper_bound.
(rs6000_stack_info): Likewise.
(rs6000_emit_prologue): Likewise.
(rs6000_elf_declare_function_name): Likewise.
(rs6000_set_up_by_prologue): Likewise.
(rs6000_can_eliminate): Likewise, reformat spaces to tabs.
* output.h (get_pool_size): Rename to...
(get_pool_size_upper_bound): ...This.
* varasm.c (get_pool_size): Rename to...
(get_pool_size_upper_bound): ...This.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/rs6000/rs6000.c
trunk/gcc/output.h
trunk/gcc/varasm.c

[Bug rtl-optimization/78561] Constant pool size (offset) can become stale where constant pool entires become unused

2016-12-02 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78561

--- Comment #3 from James Greenhalgh  ---
Author: jgreenhalgh
Date: Fri Dec  2 14:31:10 2016
New Revision: 243183

URL: https://gcc.gnu.org/viewcvs?rev=243183&root=gcc&view=rev
Log:
[Patch 2/2 PR78561] Recalculate constant pool size before emitting it

gcc/

PR rtl-optimization/78561
* varasm.c (recompute_pool_offsets): New.
(output_constant_pool): Call it.

gcc/testsuite/

PR rtl-optimization/78561
* gcc.target/aarch64/pr78561.c: New.



Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/varasm.c

[Bug rtl-optimization/78561] Constant pool size (offset) can become stale where constant pool entires become unused

2016-12-05 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78561

--- Comment #6 from James Greenhalgh  ---
Author: jgreenhalgh
Date: Mon Dec  5 09:35:28 2016
New Revision: 243239

URL: https://gcc.gnu.org/viewcvs?rev=243239&root=gcc&view=rev
Log:
[Patch 2/2 PR78561] Recalculate constant pool size before emitting it

gcc/testsuite/

PR rtl-optimization/78561
* gcc.target/aarch64/pr78561.c: Add missing testcase from r243183.



Added:
trunk/gcc/testsuite/gcc.target/aarch64/pr78561.c
Modified:
trunk/gcc/testsuite/ChangeLog

[Bug rtl-optimization/78561] Constant pool size (offset) can become stale where constant pool entires become unused

2016-12-05 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78561

--- Comment #7 from James Greenhalgh  ---
(In reply to Segher Boessenkool from comment #5)
> Oh btw, you forgot to commit the testcase in 2/2.

Thanks, that's the easy one to fix. Would you be able to help me with a
configure line I can use for a PowerPC bootstrap on one of the compile farm
machines so I can debug the issue I've introduced?

[Bug rtl-optimization/78561] Constant pool size (offset) can become stale where constant pool entires become unused

2016-12-05 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78561

--- Comment #9 from James Greenhalgh  ---
(In reply to Segher Boessenkool from comment #8)
> I usually use --disable-libgomp, but otherwise everything default (well,
> --enable-languages=all,ada,go,obj-c++).

I need a bit more hand holding on this one - is there a compile farm machine
set up that if I log in and run your configure line I'll be able to get a
32-bit PowerPC ADA bootstrap going? (I tried gcc112, but that doesn't have GNAT
installed).

[Bug rtl-optimization/78561] Constant pool size (offset) can become stale where constant pool entires become unused

2016-12-05 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78561

--- Comment #11 from James Greenhalgh  ---
My bootstrap at r243245 on gcc110 seemed to work fine.

[jgreenhalgh@gcc1-power7 gcc]$ ../build-gcc/gcc/xgcc -v
Using built-in specs.
COLLECT_GCC=../build-gcc/gcc/xgcc
Target: powerpc64-unknown-linux-gnu
Configured with: ../gcc/configure --enable-languages=all,ada,go,obj-c++
Thread model: posix
gcc version 7.0.0 20161205 (experimental) (GCC)

The file you mentioned (gcc/ada/rts_32/a-chahan.adb) seemed to have been
compiled with no issues.

Am I missing something to get the 32-bit multilib build, or maybe I need to
target it explicitly?

[Bug rtl-optimization/78561] Constant pool size (offset) can become stale where constant pool entires become unused

2016-12-05 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78561

--- Comment #13 from James Greenhalgh  ---
(In reply to Segher Boessenkool from comment #12)
> It still happens here, also on gcc110.  Note you need --disable-werror,
> to avoid another bootstrap error.
> 
> Did you perchance use --disable-bootstrap?

I didn't need disable-werror either, which makes me think I'm building a
completely different toolchain to you.

Maybe I'm missing something very obvious?

All I'm doing is cloning from the git mirror, checking out the revisions we've
discussed here and on IRC, creating a new folder out of tree, running
configure, then running make -j41.

  ssh gcc110.fsffrance.org
  git clone git://gcc.gnu.org/git/gcc.git
  cd gcc
  git checkout 
  mkdir ../build-gcc
  cd ../build-gcc
  ../gcc/configure --enable-languages=all,ada,go,obj-c++
  make -j41 >& build.log

And that works for me.

If I'm missing a step or an environment variable, I'm happy to try again.

[Bug rtl-optimization/78561] Constant pool size (offset) can become stale where constant pool entires become unused

2016-12-05 Thread jgreenhalgh at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78561

--- Comment #15 from James Greenhalgh  ---
(In reply to Segher Boessenkool from comment #14)
> I used trunk.  --disable-bootstrap fails the same, just much faster ;-)
> 
> Maybe the binutils etc. version matters?

Do you have a "modern" GCC on path? I'll just be bootstrapping with the system
compiler for stage 1, so might be missing newer warnings?
