[Bug tree-optimization/68707] [6 Regression] testcase gcc.dg/vect/O3-pr36098.c vectorized using VEC_PERM_EXPR rather than VEC_LOAD_LANES

2015-12-08 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68707 --- Comment #6 from alalaw01 at gcc dot gnu.org --- Well, I can confirm that the patch generates load-lanes/store-lanes instead of SLP, all over the (vect) testsuite. All execution tests are passing :) so it *may* just be a case of updating a lot

[Bug tree-optimization/68707] [6 Regression] testcase gcc.dg/vect/O3-pr36098.c vectorized using VEC_PERM_EXPR rather than VEC_LOAD_LANES

2015-12-08 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68707 --- Comment #8 from alalaw01 at gcc dot gnu.org --- Adding a check against BB SLP avoids some regressions caused by bailing out of BB SLP when we can't then do a load/store-lanes.

[Bug tree-optimization/68707] [6 Regression] testcase gcc.dg/vect/O3-pr36098.c vectorized using VEC_PERM_EXPR rather than VEC_LOAD_LANES

2015-12-11 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68707 --- Comment #10 from alalaw01 at gcc dot gnu.org --- This causes to FAIL the scan-tree-dump-times 'vectorizing stmts using SLP' in slp-perm-{1,2,3,5,6,7,8,11}.c. Looking at the assembler before and after... slp-perm-1.c: this looks

[Bug tree-optimization/68707] [6 Regression] testcase gcc.dg/vect/O3-pr36098.c vectorized using VEC_PERM_EXPR rather than VEC_LOAD_LANES

2015-12-14 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68707 --- Comment #13 from alalaw01 at gcc dot gnu.org --- Hmmm, I realize a "definite" codegen improvement was maybe a bad choice of wording. A "substantial" (albeit uncertain!) improvement, may have been more accurate... Howeve

[Bug tree-optimization/68707] [6 Regression] testcase gcc.dg/vect/O3-pr36098.c vectorized using VEC_PERM_EXPR rather than VEC_LOAD_LANES

2015-12-16 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68707 --- Comment #18 from alalaw01 at gcc dot gnu.org --- Well, we've seen this patch fix some of the vectorizer performance regressions we've had on some benchmarks. On SPEC...the "SLP cancelled" case triggers all over the p

[Bug tree-optimization/68707] [6 Regression] testcase gcc.dg/vect/O3-pr36098.c vectorized using VEC_PERM_EXPR rather than VEC_LOAD_LANES

2015-12-17 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68707 --- Comment #20 from alalaw01 at gcc dot gnu.org ---
> Would be nice to have a reduced testcase for this one.
Working on it. Sadly it's fortran :( The SLP tree that gets cancelled, is quite big (and quite untreelike, if we could see

[Bug tree-optimization/68707] [6 Regression] testcase gcc.dg/vect/O3-pr36098.c vectorized using VEC_PERM_EXPR rather than VEC_LOAD_LANES

2015-12-17 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68707 --- Comment #21 from alalaw01 at gcc dot gnu.org --- Here's the smallest testcase I could come up with (where SLP gets cancelled, but we end up with fewer st2's than before)...the key seems to be things being used in multiple places.

[Bug tree-optimization/68707] [6 Regression] testcase gcc.dg/vect/O3-pr36098.c vectorized using VEC_PERM_EXPR rather than VEC_LOAD_LANES

2015-12-22 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68707 --- Comment #23 from alalaw01 at gcc dot gnu.org --- Yes, difficult. I'm conscious that this is stage 3, and worried about adding too much complexity, especially if we're writing code that we'd eventually drop in favour of

[Bug target/69053] [6 Regression] ICE in build_vector_from_val

2016-01-05 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69053 alalaw01 at gcc dot gnu.org changed:
What   |Removed     |Added
Status |UNCONFIRMED |NEW
Last reconfirmed

[Bug target/69053] [6 Regression] ICE in build_vector_from_val

2016-01-05 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69053 --- Comment #2 from alalaw01 at gcc dot gnu.org --- build_vector_from_val then gets called to build a vector (4) unsigned long, from an int* (which is the right signedness and size, but being a pointer it is not types_compatible_p).

[Bug target/69053] [6 Regression] ICE in build_vector_from_val

2016-01-06 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69053 --- Comment #3 from alalaw01 at gcc dot gnu.org --- Well, this fixes it, but I'm not sure it fixes it in the right place...
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index ee32166..bd66aa5 100644
--- a/gcc/tree-vect-loop.c

[Bug tree-optimization/69166] [6 Regression] ICE in get_initial_def_for_reduction, at tree-vect-loop.c:4188

2016-01-08 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69166 alalaw01 at gcc dot gnu.org changed:
What   |Removed  |Added
Status |RESOLVED |REOPENED
Last

[Bug tree-optimization/67682] Missed vectorization: (another) straight-line memcpy/memset not vectorized when equivalent loop is

2016-01-11 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67682 alalaw01 at gcc dot gnu.org changed:
What   |Removed |Added
Status |WAITING |RESOLVED

[Bug target/69053] [6 Regression] ICE in build_vector_from_val

2016-01-12 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69053 --- Comment #9 from alalaw01 at gcc dot gnu.org --- I can confirm that both Richi's patch in comment 6 and my patchlet in comment 3, pass bootstrap + check-gcc on ARM and AArch64, and fix the ICE observed on ARM. (ICE never observed on AArch64.)

[Bug middle-end/68112] [6 Regression] FAIL: gcc.target/i386/avx512ifma-vpmaddhuq-2.c (test for excess errors)

2016-01-13 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68112 alalaw01 at gcc dot gnu.org changed:
What   |Removed |Added
Status |NEW     |RESOLVED

[Bug target/63679] [5/6 Regression][AArch64] Failure to constant fold.

2016-01-18 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679 --- Comment #39 from alalaw01 at gcc dot gnu.org --- Author: alalaw01 Date: Mon Jan 18 12:29:02 2016 New Revision: 232506 URL: https://gcc.gnu.org/viewcvs?rev=232506&root=gcc&view=rev Log: Make SRA scalarize constant-pool loads PR targ

[Bug target/63679] [5/6 Regression][AArch64] Failure to constant fold.

2016-01-18 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679 --- Comment #40 from alalaw01 at gcc dot gnu.org --- Author: alalaw01 Date: Mon Jan 18 12:40:43 2016 New Revision: 232508 URL: https://gcc.gnu.org/viewcvs?rev=232508&root=gcc&view=rev Log: Equate MEM_REFs and ARRAY_REFs in

[Bug tree-optimization/69336] Constant value not detected

2016-01-18 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69336 alalaw01 at gcc dot gnu.org changed:
What |Removed |Added
CC   |        |alalaw01 at gcc dot gnu.org

[Bug tree-optimization/69352] [6 Regression] profiledbootstrap failure with --with-build-config=bootstrap-lto

2016-01-19 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69352 --- Comment #9 from alalaw01 at gcc dot gnu.org ---
(In reply to Jakub Jelinek from comment #7)
> There are various bugs in the r232508 change.
> The
> gcc_assert (sz0 == sz1);
> gcc_assert (max0 == max1);
> gcc_asser

[Bug testsuite/69380] [6 Regression] FAIL: g++.dg/tree-ssa/pr69336.C scan-tree-dump-not optimized "cmap"

2016-01-21 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69380 alalaw01 at gcc dot gnu.org changed:
What   |Removed                    |Added
Target |arm-none-eabi powerpc*-*-* |arm-none-eabi powerpc

[Bug middle-end/66877] [6 Regression] FAIL: gcc.dg/vect/vect-over-widen-3-big-array.c -flto -ffat-lto-objects scan-tree-dump-times vect "vect_recog_over_widening_pattern: detected" 2

2016-01-22 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66877 alalaw01 at gcc dot gnu.org changed:
What   |Removed |Added
Status |WAITING |ASSIGNED

[Bug tree-optimization/56764] vect_prune_runtime_alias_test_list not smart enough

2015-06-11 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56764 alalaw01 at gcc dot gnu.org changed:
What   |Removed |Added
Status |NEW     |RESOLVED

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2015-06-11 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 Bug 53947 depends on bug 56764, which changed state.
Bug 56764 Summary: vect_prune_runtime_alias_test_list not smart enough
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56764
What |Removed |Added

[Bug tree-optimization/56541] vectorizaton fails in conditional assignment of a constant

2015-06-11 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56541 alalaw01 at gcc dot gnu.org changed:
What |Removed |Added
CC   |        |alalaw01 at gcc dot gnu.org

[Bug tree-optimization/40073] Vector short/char shifts generate sub-optimal code

2015-06-12 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40073 alalaw01 at gcc dot gnu.org changed:
What |Removed |Added
CC   |        |alalaw01 at gcc dot gnu.org

[Bug tree-optimization/40073] Vector short/char shifts generate sub-optimal code

2015-06-12 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40073 --- Comment #8 from alalaw01 at gcc dot gnu.org --- Is there a case where the result is different with vs without all the extending/truncating? It seems we should need the extending/truncating on vectors exactly iff we need it on scalars?
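
One way to explore the question in comment #8 is to compare, exhaustively, a 16-bit left shift performed through C's integer promotion with one confined to 16 bits. The sketch below is only an illustration, not code from the bug report: `shl_promoted`, `shl_narrow` and `count_mismatches` are hypothetical names, and the narrow version merely models a 16-bit vector lane in scalar C. Only shift counts 0 to 15 are compared; larger counts are deliberately left out of the model.

```c
#include <stdint.h>

/* C semantics for a 16-bit shift: widen, shift, truncate back to 16 bits. */
static uint16_t shl_promoted(uint16_t x, unsigned n)
{
    return (uint16_t)((uint32_t)x << n);
}

/* Hypothetical model of a shift confined to a 16-bit lane:
   the top bit is discarded at every single-bit step, so no
   intermediate value ever needs more than 16 bits. */
static uint16_t shl_narrow(uint16_t x, unsigned n)
{
    while (n--)
        x = (uint16_t)((x & 0x7FFFu) * 2u);
    return x;
}

/* Count the (value, shift count) pairs on which the two forms disagree,
   over all 16-bit values and shift counts 0..15. */
static unsigned long count_mismatches(void)
{
    unsigned long mismatches = 0;
    for (uint32_t x = 0; x <= 0xFFFFu; x++)
        for (unsigned n = 0; n < 16; n++)
            if (shl_promoted((uint16_t)x, n) != shl_narrow((uint16_t)x, n))
                mismatches++;
    return mismatches;
}
```

Calling `count_mismatches()` from a small driver reports how many inputs distinguish the two forms in this restricted range; right shifts of signed values would need a separate model.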

[Bug tree-optimization/54803] Rotates are not vectorized

2015-06-12 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54803 alalaw01 at gcc dot gnu.org changed:
What   |Removed |Added
Status |NEW     |UNCONFIRMED

[Bug tree-optimization/56688] Fortran save statement prevents loop vectorization.

2015-06-12 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56688 alalaw01 at gcc dot gnu.org changed:
What |Removed |Added
CC   |        |alalaw01 at gcc dot gnu.org

[Bug tree-optimization/54013] Loop with control flow not vectorized

2015-06-15 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54013 alalaw01 at gcc dot gnu.org changed:
What   |Removed     |Added
Status |UNCONFIRMED |NEW
Last reconfirmed

[Bug tree-optimization/66558] New: Missed vectorization of loop with control flow

2015-06-16 Thread alalaw01 at gcc dot gnu.org
: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: alalaw01 at gcc dot gnu.org Target Milestone: --- Target: x86_64
ICC manages to vectorize the following loop, variants of which appear in several benchmarks:
#define N 256
int a[N];
int find_last

[Bug tree-optimization/66558] Missed vectorization of loop with control flow

2015-06-16 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66558 --- Comment #1 from alalaw01 at gcc dot gnu.org --- Strategy could be similar to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54013 except finding the last bit rather than the first (and no jump out of the loop). That is, in the loop body

[Bug tree-optimization/66558] Missed vectorization of loop with control flow

2015-06-16 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66558 --- Comment #2 from alalaw01 at gcc dot gnu.org --- This generalizes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65947, but vectorizing the predicate as a reduction is not sufficient here.
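
The strategy outlined in comments #1 and #2 can be sketched in scalar C as follows. This is only an illustration under stated assumptions: the loop in the report is truncated above, so `find_last_scalar`, `find_last_blocked`, the `<` predicate, the block width `VF` and the per-block bitmask are all hypothetical stand-ins, not the reported testcase or code GCC generates. Each block records a mask of the lanes where the predicate holds, the loop never exits early, and the highest set bit of the most recent non-empty mask gives the answer.

```c
#include <stddef.h>

#define VF 4  /* hypothetical vectorization factor */

/* Scalar reference: index of the last element below threshold, or -1. */
static int find_last_scalar(const int *a, size_t n, int threshold)
{
    int last = -1;
    for (size_t i = 0; i < n; i++)
        if (a[i] < threshold)
            last = (int)i;
    return last;
}

/* Blocked version emulating a vector loop; n must be a multiple of VF. */
static int find_last_blocked(const int *a, size_t n, int threshold)
{
    size_t last_base = 0;    /* start of the last block with a match */
    unsigned last_mask = 0;  /* predicate mask of that block */

    for (size_t base = 0; base < n; base += VF) {
        unsigned mask = 0;
        /* Stands in for one vector compare producing a lane mask. */
        for (unsigned lane = 0; lane < VF; lane++)
            mask |= (unsigned)(a[base + lane] < threshold) << lane;
        /* A vectorizer could use a select here instead of a branch. */
        if (mask) {
            last_base = base;
            last_mask = mask;
        }
    }

    if (!last_mask)
        return -1;

    /* Find the highest set bit of the saved mask. */
    unsigned lane = VF - 1;
    while (!(last_mask & (1u << lane)))
        lane--;
    return (int)(last_base + lane);
}
```

Keeping both the block base and the mask, rather than a single reduced value, reflects the remark in comment #2 that reducing the predicate alone is not enough to recover the index.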

[Bug tree-optimization/51848] GCC is not able to vectorize when a constant value is also added to the sum of array expression inside a loop.

2015-06-16 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51848 alalaw01 at gcc dot gnu.org changed:
What |Removed |Added
CC   |        |alalaw01 at gcc dot gnu.org

[Bug tree-optimization/61171] vectorization fails for a reduction in presence of subtraction

2015-06-17 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61171 alalaw01 at gcc dot gnu.org changed:
What |Removed |Added
CC   |        |alalaw01 at gcc dot gnu.org

[Bug target/65952] [AArch64] Will not vectorize storing induction of pointer addresses for LP64

2015-06-17 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65952 --- Comment #5 from alalaw01 at gcc dot gnu.org --- So the above example tends to get fully unrolled, but even on an example with 32 ptrs rather than 4, yes the vectorizer fails because of the multiplication - but the multiplication is gone by

[Bug target/65952] [AArch64] Will not vectorize storing induction of pointer addresses for LP64

2015-06-17 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65952 --- Comment #7 from alalaw01 at gcc dot gnu.org ---
(In reply to Richard Biener from comment #6)
> So aarch64 has no DImode vectors? Or just no DImode multiply (but it has a
> DImode vector shift?).
Yes, the latter.

[Bug target/65952] [AArch64] Will not vectorize storing induction of pointer addresses for LP64

2015-06-17 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65952 --- Comment #8 from alalaw01 at gcc dot gnu.org ---
(In reply to alalaw01 from comment #7)
> (In reply to Richard Biener from comment #6)
> > So aarch64 has no DImode vectors? Or just no DImode multiply (but it has a
> > DImo

[Bug tree-optimization/57600] Turn 2 comparisons into 1 with the min

2015-06-19 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57600 alalaw01 at gcc dot gnu.org changed:
What |Removed |Added
CC   |        |alalaw01 at gcc dot gnu.org

[Bug target/64134] (vector float){0, 0, b, a} Uses stores when it does not need to

2015-06-26 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64134 alalaw01 at gcc dot gnu.org changed:
What   |Removed |Added
Status |NEW     |RESOLVED

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2015-07-02 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 Bug 53947 depends on bug 65946, which changed state.
Bug 65946 Summary: Simple loop with if-statement not vectorized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65946
What |Removed |Added
---

[Bug middle-end/65946] Simple loop with if-statement not vectorized

2015-07-02 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65946 alalaw01 at gcc dot gnu.org changed:
What   |Removed  |Added
Status |ASSIGNED |RESOLVED

[Bug target/65956] [5/6 Regression] Another ARM overaligned arg passing issue

2015-07-06 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65956 --- Comment #3 from alalaw01 at gcc dot gnu.org --- Author: alalaw01 Date: Mon Jul 6 16:58:16 2015 New Revision: 225465 URL: https://gcc.gnu.org/viewcvs?rev=225465&root=gcc&view=rev Log: [ARM] PR/65956 AAPCS update for alignment attrib

[Bug target/65956] [5/6 Regression] Another ARM overaligned arg passing issue

2015-07-06 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65956 --- Comment #4 from alalaw01 at gcc dot gnu.org --- Author: alalaw01 Date: Mon Jul 6 17:06:00 2015 New Revision: 225466 URL: https://gcc.gnu.org/viewcvs?rev=225466&root=gcc&view=rev Log: Fix eipa_src AAPCS issue (PR target/65956) 20

[Bug target/65956] [5/6 Regression] Another ARM overaligned arg passing issue

2015-07-06 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65956 --- Comment #5 from alalaw01 at gcc dot gnu.org --- Author: alalaw01 Date: Mon Jul 6 17:32:07 2015 New Revision: 225469 URL: https://gcc.gnu.org/viewcvs?rev=225469&root=gcc&view=rev Log: 2015-07-06 Alan Lawrence Backp

[Bug target/65956] [5/6 Regression] Another ARM overaligned arg passing issue

2015-07-06 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65956 --- Comment #6 from alalaw01 at gcc dot gnu.org --- Author: alalaw01 Date: Mon Jul 6 17:37:50 2015 New Revision: 225470 URL: https://gcc.gnu.org/viewcvs?rev=225470&root=gcc&view=rev Log: Backport r225466: tests from 'Fix eipa_sr

[Bug target/66791] New: Replace builtins with gcc vector extensions code

2015-07-07 Thread alalaw01 at gcc dot gnu.org
: target Assignee: unassigned at gcc dot gnu.org Reporter: alalaw01 at gcc dot gnu.org Blocks: 47562 Target Milestone: --- Target: arm
Lots of ARM neon intrinsics are implemented using builtins backing onto patterns in neon.md. These are opaque to the

[Bug target/66964] Assembler error during ARM cross compile

2015-07-22 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66964 --- Comment #6 from alalaw01 at gcc dot gnu.org --- Bootstrap+test in progress FYI. However, that patch *does not* fix this failure; there must be some other route.

[Bug target/66964] Assembler error during ARM cross compile

2015-07-23 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66964 --- Comment #7 from alalaw01 at gcc dot gnu.org --- No new regressions bootstrapping that patch on gcc-5-branch (--with-arch=armv7-a --with-fpu=neon-fp16 --with-float=hard). However, compiling the testcase with -dp reveals the bad strd'

[Bug target/63679] [5/6 Regression][AArch64] Failure to constant fold.

2015-07-28 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679 alalaw01 at gcc dot gnu.org changed:
What |Removed |Added
CC   |        |alalaw01 at gcc dot gnu.org

[Bug target/63679] [5/6 Regression][AArch64] Failure to constant fold.

2015-07-29 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679 --- Comment #35 from alalaw01 at gcc dot gnu.org --- So it should be happening in dom2. On x86, input to dom2 is
vect_cst_.9_31 = { 0, 1, 2, 3 };
[...]
MEM[(int *)&a] = vect_cst_.9_31;
[...]
vect__13.3_20 = MEM[(int *)&a];
resu

[Bug target/63679] [5/6 Regression][AArch64] Failure to constant fold.

2015-08-03 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679 --- Comment #37 from alalaw01 at gcc dot gnu.org --- Hmmm, no it's not the hashing - that pretty much ignores all types. It's the comparison in hashable_expr_equal_p, which just uses operand_equal_p, specifically this part (in fo

[Bug tree-optimization/67283] GCC regression over inlining of returned structures

2015-08-27 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67283 --- Comment #7 from alalaw01 at gcc dot gnu.org --- Author: alalaw01 Date: Thu Aug 27 15:40:10 2015 New Revision: 227265 URL: https://gcc.gnu.org/viewcvs?rev=227265&root=gcc&view=rev Log: completely_scalarize arrays as well as reco

[Bug tree-optimization/67283] GCC regression over inlining of returned structures

2015-08-27 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67283 alalaw01 at gcc dot gnu.org changed:
What |Removed |Added
CC   |        |alalaw01 at gcc dot gnu.org

[Bug tree-optimization/67283] GCC regression over inlining of returned structures

2015-08-28 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67283 --- Comment #12 from alalaw01 at gcc dot gnu.org --- Author: alalaw01 Date: Fri Aug 28 15:04:17 2015 New Revision: 227303 URL: https://gcc.gnu.org/viewcvs?rev=227303&root=gcc&view=rev Log: Revert: completely_scalarize arrays as well as
