[Bug tree-optimization/69368] [6 Regression] spec2006 test case 416.gamess fails with the g++ 6.0 compiler starting with r232508

2016-02-03 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69368

--- Comment #10 from alalaw01 at gcc dot gnu.org ---
The stores are getting optimized out because equal_mem_array_ref_p considers
equal pairs of MEM_REFS like

fmcom.x[_168] and fmcom.x[_208]

That is, an ARRAY_REF whose first operand is a COMPONENT_REF fmcom.x (of a
VAR_DECL and a FIELD_DECL), and whose second operand is an SSA_NAME, _168 or
_208. I don't see anything obvious to suggest that they should be equal.

get_ref_base_and_extent then returns base=fmcom, size=64, max_size=64 (so not a
variable-sized access), and offset 0 :-(.
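
To make the failure mode concrete, here is a toy C model; it is not GCC code.
The names struct extent, one_element_array_ref and looks_equal are
hypothetical, and the comparison is only a simplification of the kind of check
described above. Because x is declared with a single 64-bit element, every
variable index collapses to the same extent, so two different indices look
identical.

```c
#include <stdbool.h>

/* Toy stand-in for a base-and-extent result; all sizes are in bits.  */
struct extent
{
  const void *base;
  long offset;
  long size;
  long max_size;   /* -1 would mean "unknown" */
};

/* Model of x[idx] where x is declared with one 64-bit element: the only
   in-bounds index gives offset 0, so idx plays no part in the result.  */
static struct extent
one_element_array_ref (const void *base, long idx)
{
  struct extent e = { base, 0, 64, 64 };
  (void) idx;
  return e;
}

/* Simplified equality test: same base, exactly known size, same offset.  */
static bool
looks_equal (struct extent a, struct extent b)
{
  return a.base == b.base
	 && a.size == a.max_size && b.size == b.max_size
	 && a.offset == b.offset && a.size == b.size;
}
```

In this model, calling one_element_array_ref with indices 168 and 208 on the
same base yields extents that looks_equal accepts, which mirrors the stores
being treated as redundant.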

[Bug fortran/69368] [6 Regression] spec2006 test case 416.gamess fails with the g++ 6.0 compiler starting with r232508

2016-02-04 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69368

--- Comment #20 from alalaw01 at gcc dot gnu.org ---
Hmmm, hang on. In unport.fppized.f, shouldn't we be using the 'F2C/GCC COMPILER
ON PC RUNNING UNIX (LINUX,BSD386,ETC)' version? In which case X has size (1)
everywhere?

[Bug fortran/69368] [6 Regression] spec2006 test case 416.gamess fails with the g++ 6.0 compiler starting with r232508

2016-02-04 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69368

alalaw01 at gcc dot gnu.org changed:

   What|Removed |Added

 Resolution|DUPLICATE   |FIXED

--- Comment #23 from alalaw01 at gcc dot gnu.org ---
Well, this one is not fixed by -fno-aggressive-loop-optimizations.

[Bug fortran/69368] [6 Regression] spec2006 test case 416.gamess fails with the g++ 6.0 compiler starting with r232508

2016-02-05 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69368

alalaw01 at gcc dot gnu.org changed:

   What|Removed |Added

 Resolution|INVALID |FIXED

--- Comment #27 from alalaw01 at gcc dot gnu.org ---
(In reply to Richard Biener from comment #25)
> (In reply to alalaw01 from comment #23)
> > Well, this one is not fixed by -fno-aggressive-loop-optimizations.
> 
> No, that just disabled one symptom of the issue at that point in time. 
> Fixing the issue also fixes this occurance (well, I hope so ;))

So by "fixing the issue", do we mean making --std=legacy prevent this? (Although
it is against the language spec, colleagues with more FORTRAN knowledge than I
suggest this is common.) SPEC seem to be saying they will not change the
source: https://www.spec.org/cpu2006/Docs/faq.html#Run.05


As Jakub suggested in comment #13:

> So, perhaps we want some flag on the Fortran COMMON decls that would be set
> on COMMON that ends with an array and would tell get_ref_base_and_extent
> (and other spots?) that accesses can be beyond end of the decl?

but only if --std=legacy ? ? ?

Should I raise a new bug for this, as both this and 53068 are CLOSED?

[Bug fortran/69368] [6 Regression] spec2006 test case 416.gamess fails with the g++ 6.0 compiler starting with r232508

2016-02-05 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69368

--- Comment #32 from alalaw01 at gcc dot gnu.org ---
(In reply to rguent...@suse.de from comment #31)
>
> Thus a "fix" for the case where treating a[i] as a[0] is the issue
> would be
> 
> Index: gcc/tree-dfa.c
> ===
> --- gcc/tree-dfa.c  (revision 233172)
> +++ gcc/tree-dfa.c  (working copy)
> @@ -617,7 +617,11 @@ get_ref_base_and_extent (tree exp, HOST_
>if (maxsize == -1
>   && DECL_SIZE (exp)
>   && TREE_CODE (DECL_SIZE (exp)) == INTEGER_CST)
> -   maxsize = wi::to_offset (DECL_SIZE (exp)) - bit_offset;
> +   {
> + maxsize = wi::to_offset (DECL_SIZE (exp)) - bit_offset;
> + if (maxsize == size)
> +   maxsize = -1;
> +   }
>  }
>else if (CONSTANT_CLASS_P (exp))
>  {

Maybe if we only did that for DECL_COMMONs if -std=legacy was in force?

Tho as you say:

> but that wouldn't fix the aggressive-loop optimization issue as that is
> _not_ looking at DECL_SIZE but at the array types domain.

I wonder if we can't get both places looking at the same thing (DECL_SIZE or
array type domain), but I haven't looked into that at all.
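
A stand-alone model of the quoted hunk, not the GCC source: decl_max_size is a
hypothetical helper, plain long replaces offset_int, and sizes are in bits. It
assumes the caller has already seen a variable index, so the extent starts out
unknown and is being clamped to the decl. The extra test means that when the
access exactly fills the rest of the decl (as with a one-element trailing
array), the extent stays unknown rather than being pinned to that one element.

```c
/* Sketch of the maxsize computation, with and without the proposed hunk.
   Returns -1 for "unknown extent".  */
static long
decl_max_size (long decl_size, long bit_offset, long size, int with_patch)
{
  long maxsize = decl_size - bit_offset;

  /* Proposed addition: a clamp that leaves room for exactly one element
     is treated as no information at all.  */
  if (with_patch && maxsize == size)
    maxsize = -1;
  return maxsize;
}
```

For a 64-bit decl accessed at offset 0 with size 64 this gives 64 without the
hunk and -1 with it; a larger decl is unaffected.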

[Bug fortran/69368] [6 Regression] spec2006 test case 416.gamess fails with the g++ 6.0 compiler starting with r232508

2016-02-08 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69368

alalaw01 at gcc dot gnu.org changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|FIXED   |---

--- Comment #33 from alalaw01 at gcc dot gnu.org ---
(In reply to rguent...@suse.de from comment #31)
> 
> Thus a "fix" for the case where treating a[i] as a[0] is the issue
> would be
> 
> Index: gcc/tree-dfa.c
> ===
> --- gcc/tree-dfa.c  (revision 233172)
> +++ gcc/tree-dfa.c  (working copy)
> @@ -617,7 +617,11 @@ get_ref_base_and_extent (tree exp, HOST_
>if (maxsize == -1
>   && DECL_SIZE (exp)
>   && TREE_CODE (DECL_SIZE (exp)) == INTEGER_CST)
> -   maxsize = wi::to_offset (DECL_SIZE (exp)) - bit_offset;
> +   {
> + maxsize = wi::to_offset (DECL_SIZE (exp)) - bit_offset;
> + if (maxsize == size)
> +   maxsize = -1;
> +   }
>  }
>else if (CONSTANT_CLASS_P (exp))
>  {

So is there a case where we want this for C ?

If I declare a struct with a VLA, and access it through a pointer - GCC
recognizes the VLA idiom and keeps the accesses. If I access it from a decl,
yes we optimize away the out-of-bounds accesses (in FRE, long before we reach
the tree-ssa-scopedtables changes). So OK, if I access it from an extern or
__attribute__((weak)) decl, which I then get the linker to replace with a bigger
decl, then I get "wrong" code (it ignores the extra elements in the bigger
decl) - but I'd say that was invalid code.

So if this is Fortran-only, we probably have to hook off --std=legacy, right?
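
For reference, a sketch of the pointer-based idiom mentioned above, written
with a C99 flexible array member (an assumption about what "a struct with a
VLA" refers to; struct buf, buf_new and buf_sum are hypothetical names). Here
the trailing array has no declared bound and the storage is sized at
allocation time, so indices past 0 are valid and must be kept.

```c
#include <stdlib.h>

struct buf
{
  size_t n;
  double x[];   /* flexible array member: length chosen at allocation time */
};

/* Allocate a buf with room for n trailing elements, filled with 0..n-1.  */
static struct buf *
buf_new (size_t n)
{
  struct buf *b = malloc (sizeof *b + n * sizeof b->x[0]);
  if (!b)
    return NULL;
  b->n = n;
  for (size_t i = 0; i < n; i++)
    b->x[i] = (double) i;
  return b;
}

/* Sum all elements through the pointer.  */
static double
buf_sum (const struct buf *b)
{
  double s = 0.0;
  for (size_t i = 0; i < b->n; i++)
    s += b->x[i];
  return s;
}
```

This is the well-defined counterpart to the extern/weak trick: no decl with a
fixed size is involved, so there is no DECL_SIZE to bound the access.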

[Bug fortran/69368] [6 Regression] spec2006 test case 416.gamess fails with the g++ 6.0 compiler starting with r232508

2016-02-09 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69368

--- Comment #37 from alalaw01 at gcc dot gnu.org ---
(In reply to Jakub Jelinek from comment #36)
> As Richard said, you can do similar (invalid too) stuff in C too, say:
> struct S { int a[1]; } s;
> in one TU and
> struct S { int a[1]; } s;
> 
> int
> foo (int x)
> {
>   return s.a[x];
> }
> 
> int
> bar (int x)
> {
>   return s.a[1 + x] + s.a[0] + s.a[x];
> }
> 
> GCC 5 would compile it to what the author might have meant, while GCC 6 will
> optimize bar into s.a[0] * 3;

Yes, this was what I meant in comment #33. The question is, do we care? (Or, do
we only care in the FORTRAN case?)

If so, then we presumably want a -fbroken-common-blocks (or something!) that is
not FE-specific.

[Bug fortran/69368] [6 Regression] spec2006 test case 416.gamess fails with the g++ 6.0 compiler starting with r232508

2016-02-17 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69368

--- Comment #39 from alalaw01 at gcc dot gnu.org ---
Created attachment 37726
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37726&action=edit
Proposed patch (without flag).

Here's a prototype patch, that sets TYPE_SIZE to NULL_TREE but leaves DECL_SIZE
intact.  For the moment I'm applying this universally, rather than gating under
a flag, to ease testing check-fortran.  Only
gfortran.dg/gomp/appendix-a/a.24.1.f90 fails; in practice I think it's OK just
to not use the new code in conjunction with -fopenmp.

On AArch64, it fixes the 416.gamess issue, and allows compiling 416.gamess
without the -fno-aggressive-loop-optimizations previously required.

Also bootstraps and passes check-gcc check-fortran check-g++, on aarch64 and
x86_64, except as noted above. I expect to add a Fortran-only flag to gate the
trans-common.c changes before taking this to gcc-patches@ .

The worry is that, while many places in the mid-end were happy with a null
TYPE_SIZE, I still had to patch up a couple, so I might not have got them all.
(Indeed, omp-low.c had too many!) I'm not sure this is any worse than adding a
new flag to the decl (indicating that the DECL_SIZE is not to be trusted) and
then trying to find all the cases where the DECL_SIZE is wrongly relied upon -
with the latter approach, the compiler would generate invalid code, rather than
"failing fast".

Thoughts welcome!

[Bug fortran/69368] [6 Regression] spec2006 test case 416.gamess fails with the g++ 6.0 compiler starting with r232508

2016-02-18 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69368

--- Comment #43 from alalaw01 at gcc dot gnu.org ---
Yeah, I plan to add a fortran-specific option for this, it's easy enough, but I
can't run the gfortran testsuite with that, because there are lots of C files
in there too, for which the compiler doesn't accept the option...

I'm having trouble writing a testcase though. My subroutine with

IMPLICIT DOUBLE PRECISION (X)
COMMON /MYCOMMON / X(1)

produces "mycommon.x" a COMPONENT_REF, but with "mycommon" being a MEM_REF,
which requires only the hunk to tree-dfa.c to handle correctly; whereas in
SPEC2006, what looks to me to be equivalent FORTRAN, ends up with "mycommon"
being a VAR_DECL, which requires the much-bigger patch to the fortran FE...

I've very little fortran experience here, any tips?

Thanks, Alan

[Bug fortran/69368] [6 Regression] spec2006 test case 416.gamess fails with the g++ 6.0 compiler starting with r232508

2016-02-18 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69368

--- Comment #53 from alalaw01 at gcc dot gnu.org ---
(In reply to Thomas Koenig from comment #44)
> I don't have access to SPEC, so I can only guess... Is there maybe an 
> equivalence involved, something like

Turns out the COMMON is accessed via a MEM_REF in a loop, or as a VAR_DECL
inside. Go figure! :)

(In reply to Dominique d'Humieres from comment #49)
> I don't see the point to add yet another option just because "SPEC does not
> want to change the invalid Fortran". I think SPEC should be run with the
> option(s) causing the problem disabled.

Anecdotally, I hear from Fortran-using colleagues that this may occur in other
places too. Moreover, the list of phases using get_ref_base_and_extent is long;
we could end up compiling with an ever-growing -fno-this -fno-that as more and
more phases make use of the "bad" analysis results (which are correct by the
language spec, after all). In this case, there are a few other equivalences
found due to the tree-ssa-scopedtables.c changes that we'd lose with
-fno-tree-dominator-opts, too.

(In reply to H.J. Lu from comment #52)
>
> So, there is nothing to fix in GCC? Why isn't this bug closed as invalid?

Not everyone wants to patch SPEC sources.

[Bug fortran/69368] [6 Regression] spec2006 test case 416.gamess fails with the g++ 6.0 compiler starting with r232508

2016-02-23 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69368

--- Comment #77 from alalaw01 at gcc dot gnu.org ---
(In reply to rguent...@suse.de from comment #72)
> 
> Patch as posted passed bootstrap & regtest.  Adjusted according to 
> comments but not tested otherwise - please somebody throw at
> unpatched 416.gamess.

Still miscompares on aarch64, I'm afraid. (Both with and without
-fno-aggressive-loop-optimizations.)

Also where Jakub wrote:
> If you want to go this way, I'd at least key it off DECL_COMMON on the decl.
> And instead of multiplying max_size by 2 perhaps just add BITS_PER_UNIT?

I wonder why you prefer setting such an arbitrary guess at max_size rather than
going with -1 which is defined as "unknown" ?

[Bug fortran/69368] [6 Regression] spec2006 test case 416.gamess fails with the g++ 6.0 compiler starting with r232508

2016-02-23 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69368

--- Comment #79 from alalaw01 at gcc dot gnu.org ---
(In reply to rguent...@suse.de from comment #78)
>
> That would pessimize it too much IMHO.

I'm not sure how to evaluate the pessimization, given it's thought to be a
widespread pseudo-FORTRAN construct; so I probably have to defer to your
judgement here. However...

Given maxsize of an array as two elements, say, would the compiler not be
entitled to optimize an index selection down to, say, computing only the LSBit
of the actual index?  Whereas 'unknown' means, well, exactly what is the case.
So I fear this is storing problems up for the future.
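
A small illustration of that worry, with hypothetical helper names; it is not
compiler code. If the index is assumed to lie in [0, 2), masking it to its
lowest bit changes nothing, so such a transformation would be legitimate under
that assumption. For an index that really runs past the declared extent, the
two disagree.

```c
/* Index as written in the source.  */
static unsigned int
index_as_written (unsigned int i)
{
  return i;
}

/* Index after a hypothetical optimization that assumes i is 0 or 1.  */
static unsigned int
index_assuming_two_elements (unsigned int i)
{
  return i & 1u;
}
```

The two functions agree for i = 0 and i = 1 and differ for any larger i, which
is exactly the range the pseudo-FORTRAN construct relies on.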

Is the concern that we can't hide this behind an option, as that would "drive
people away from gfortran" ? If that's the case, can we hide it behind an
option that defaults to pessimization (?? at least for fortran)??

[Bug middle-end/66877] [6 Regression] FAIL: gcc.dg/vect/vect-over-widen-3-big-array.c -flto -ffat-lto-objects scan-tree-dump-times vect "vect_recog_over_widening_pattern: detected" 2

2016-02-23 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66877

alalaw01 at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from alalaw01 at gcc dot gnu.org ---
Fix committed r232720.

[Bug tree-optimization/65963] Missed vectorization of loads strided with << when equivalent * succeeds

2016-02-23 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65963

alalaw01 at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from alalaw01 at gcc dot gnu.org ---
Can I class this as fixed?

[Bug fortran/69368] [6 Regression] spec2006 test case 416.gamess fails with the g++ 6.0 compiler starting with r232508

2016-03-02 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69368

--- Comment #82 from alalaw01 at gcc dot gnu.org ---
For those who haven't seen it, I've put forward this patch on the mailing list:
https://gcc.gnu.org/ml/gcc-patches/2016-02/msg01746.html based on a suggestion
from Jakub. (Unlike Richi's comment #72 patch, this fixes 416.gamess on
AArch64.)

[Bug bootstrap/60632] ICE in regcprop.c (copyprop_hardreg_forward_1)

2016-03-03 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60632

alalaw01 at gcc dot gnu.org changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 CC||alalaw01 at gcc dot gnu.org
 Resolution|--- |WORKSFORME

--- Comment #2 from alalaw01 at gcc dot gnu.org ---
Sorry, no idea...

[Bug fortran/69368] [6 Regression] spec2006 test case 416.gamess fails with the g++ 6.0 compiler starting with r232508

2016-03-03 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69368

--- Comment #84 from alalaw01 at gcc dot gnu.org ---
Bah. Do you normally use -fno-aggressive-loop-optimizations? With
-funknown-commons, did you try with/out aggressive loop opts?
Powerpc{,64}{be,le} ?

The unknown-commons testcase I included in that patch looks to pass on
powerpc64le-unknown-linux-gnu.

Does HJ Lu's spec source-patching work on powerpc following r232559?

I am not a lawyer...but I don't think the SPEC2006 license allows me to upload
onto the GCC Compile Farm and runspec. So if you could narrow down to an object
file that's broken with a recent compiler and -funknown-commons, with the rest
compiled with a gcc prior to r232508, that'd be very helpful - then I could see
what assembly I'm changing (and what expressions equal_mem_array_ref_p is
falsely declaring equivalent)...?

[Bug fortran/69368] [6 Regression] spec2006 test case 416.gamess fails with the g++ 6.0 compiler starting with r232508

2016-03-03 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69368

--- Comment #87 from alalaw01 at gcc dot gnu.org ---
Great, many thanks for the tests, I was worried if we had hit another distinct
issue. (Of course this would be better on gcc-patches!)

[Bug tree-optimization/70013] [6 Regression] packed structure tree-sra loses initialization

2016-03-07 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70013

alalaw01 at gcc dot gnu.org changed:

   What|Removed |Added

 CC||alalaw01 at gcc dot gnu.org

--- Comment #4 from alalaw01 at gcc dot gnu.org ---
Hmmm. First thing I notice is that the type of d (struct S0[2]) is not
scalarizable_type_p, but passes type_internals_preclude_sra_p. Changing the
latter to bail out on DECL_BIT_FIELD (as the former does) fixes the ICE, but
I'm not yet sure we want to do that.

[Bug tree-optimization/70013] [6 Regression] packed structure tree-sra loses initialization

2016-03-07 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70013

--- Comment #5 from alalaw01 at gcc dot gnu.org ---
Prior to SRA, we have
  d = *.LC0;
  d$0$f0_7 = MEM[(struct S0[2] *)&*.LC0].f0;
  e$f0_9 = MEM[(struct S0[2] *)&d + 3B].f0;
  _3 = (int) d$0$f0_7;
  c = _3;
  _5 = (int) e$f0_9;
  __builtin_printf ("%x\n", _5);

sra_modify_assign for d=*.LC0 ends up in load_assign_lhs_subreplacements, where
d has two children; the second is grp_to_be_replaced, but because we did not 
completely_scalarize LC0, there is an access to only the first half of *.LC0,
and no corresponding RHS for the second half of d ('racc =
find_access_in_subtree (sad->top_racc, offset, lacc->size)' returns null). So we
generate the bad

d$3$f0_14 = MEM[(struct S0[2] *)&d + 3B].f0;

that is, initializing the scalar replacement for the second half of d, with a
value read from the first half of d.

[Bug tree-optimization/70013] [6 Regression] packed structure tree-sra loses initialization

2016-03-07 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70013

--- Comment #6 from alalaw01 at gcc dot gnu.org ---
Ugh, initializing the scalar replacement for the first half of d, with a value
read from the first half of d (should be from the first half of *.LC0).

[Bug tree-optimization/70013] [6 Regression] packed structure tree-sra loses initialization

2016-03-07 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70013

--- Comment #7 from alalaw01 at gcc dot gnu.org ---
*second* half, sorry. grp_to_be_replaced is here true, but
grp_unscalarized_data is false, so handle_unscalarized_data_in_subtree sets
sad->refreshed=UDH_LEFT and we build the access to the LHS. (Then,
load_assign_lhs_subreplacements exits, and the caller sees UDH_LEFT and removes
the original block move statement.)

In contrast, on a similar testcase using a parameter rather than *.LC0,
grp_unscalarized_data is true, handle_unscalarized_data_in_subtree sets
sad->refreshed=UDH_RIGHT and we build an access to the RHS, which is OK; and
leave the block move statement in place, hence correctness.

[Bug tree-optimization/70013] [6 Regression] packed structure tree-sra loses initialization

2016-03-07 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70013

--- Comment #9 from alalaw01 at gcc dot gnu.org ---
In analyze_access_subtree (since r147980, "New implementation of SRA", 2009):

  else if (root->grp_write || TREE_CODE (root->base) == PARM_DECL)
    root->grp_unscalarized_data = 1; /* not covered and written to */

adding a case for constant_decl_p alongside the PARM_DECL case, fixes the ICE;
AArch64 bootstrap in progress.

[Bug tree-optimization/67681] Missed vectorization: induction variable used after loop

2016-03-09 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67681

--- Comment #3 from alalaw01 at gcc dot gnu.org ---
So in the not-vectorized case (-DFOO=1), we get for the inner loop:

:
  # i_27 = PHI 
  _8 = (long unsigned int) i_27;
  _9 = _8 * 4;
  _11 = data_10(D) + _9;
  _13 = *_11;
  _14 = _13 + j_23;
  *_11 = _14;
  i_16 = i_27 + 1;
  if (i_16 <= max_24)
goto ;
  else
goto ;

  :
  goto ;

  :
  # i_32 = PHI 

the loop exit phi, i_32=PHI, makes i_16=i_27+1 relevant
(vec_stmt_relevant_p: used out of loop.), so we go through that on the worklist
and then i_27=PHI, marking the phi as STMT_VINFO_LIVE_P, and
hence "not vectorized: value used after loop". Kind of as expected, FORNOW.

In the -DFOO=0 case, a bunch of loop peeling, header-copying, and other
transforms, end up with this input to vectorization:

  : //header of inner loop
  # i_2 = PHI 
  _8 = (long unsigned int) i_2;
  _9 = _8 * 4;
  _11 = data_10(D) + _9;
  _12 = *_11;
  _13 = _12 + j_26;
  *_11 = _13;
  i_15 = i_2 + 1;
  if (max_7 >= i_15)
goto ;
  else
goto ;

  :
  goto ;

  : //bb 5 is only predecessor
  _19 = (unsigned int) i_25;
  _18 = (unsigned int) max_7;
  _17 = (unsigned int) i_25;
  _5 = _18 - _17;
  _4 = _5 + _19;
  _3 = _4 + 1;
  i_21 = (int) _3;

  :
  # i_23 = PHI 
  //tests outer loop

Note that bb7 uses i_25, not i_2, so neither i_15 nor i_2 escapes the loop, and
we don't have the problem from above. (Yes, bb7 is taking i_25 away from max_7
and then adding it back on again, before adding 1, to give the value of i after
the inner loop.)

This arrangement of multiple i's live at the same time is not present in
107t.ch2. 130t.loopinit introduces i_21, computed by an exit phi on leaving the
inner loop. 135t.sccp then changes this to the max_7-i_25+i_25 sequence, which
removes the dependency on i_15 and allows vectorization.

[Bug tree-optimization/67681] Missed vectorization: induction variable used after loop

2016-03-09 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67681

--- Comment #4 from alalaw01 at gcc dot gnu.org ---
loopinit introduces the exit phi in much the same way for both -DFOO=0 and
-DFOO=1, so the difference is in sccp.

In the -DFOO=0 case, sccp does this (removing TODO_cleanup_cfg from
pass_data_scev_cprop to make the diff easier, still vectorizes):

 ;; Function addlog2 (addlog2, funcdef_no=0, decl_uid=2749, cgraph_uid=0, symbol_order=0)

+
+final value replacement:
+  i_21 = PHI 
+  with
+  i_21 = (int) _3;
+
...[snip]...
   :
-  # i_21 = PHI 
+  _19 = (unsigned int) i_25;
+  _18 = (unsigned int) max_7;
+  _17 = (unsigned int) i_25;
+  _5 = _18 - _17;
+  _4 = _5 + _19;
+  _3 = _4 + 1;
+  i_21 = (int) _3;

In the -DFOO=1 case, sccp doesn't do anything; and adding -fno-tree-scev-cprop
prevents vectorization of the -DFOO=0 case.

[Bug tree-optimization/70013] [6 Regression] packed structure tree-sra loses initialization

2016-03-09 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70013

--- Comment #10 from alalaw01 at gcc dot gnu.org ---
Hmmm, so this fixes the ICE, generating:

  SR.5_12 = MEM[(struct S0[2] *)&*.LC0].f0;
  MEM[(struct S0[2] *)&*.LC0].f0 = SR.5_12;
  d = *.LC0;
  d$3$f0_14 = MEM[(struct S0[2] *)&*.LC0 + 3B].f0;
  d$0$f0_7 = SR.5_12;
  e$f0_9 = d$3$f0_14;
  _3 = (int) d$0$f0_7;
  c = _3;
  _5 = (int) e$f0_9;
  __builtin_printf ("%x\n", _5);
  d ={v} {CLOBBER};
  return 0;

which in -fdump-tree-optimized (at -O1) looks like:

  SR.5_12 = MEM[(struct S0[2] *)&*.LC0].f0;
  d$3$f0_14 = MEM[(struct S0[2] *)&*.LC0 + 3B].f0;
  _3 = (int) SR.5_12;
  c = _3;
  _5 = (int) d$3$f0_14;
  __builtin_printf ("%x\n", _5);
  return 0;

which is much saner. But I don't really understand why the PARM_DECL case that
I'm adding to here is that way (since r147980 "New implementation of SRA" in
2009, https://gcc.gnu.org/ml/gcc-patches/2009-04/msg02218.html)...

Bootstrapped+regtest on AArch64 (c,c++) and ARM (c,c++,ada), no regressions.
(Constants don't get pushed into the pool on x86.)

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 72157edd02e3235e57b786bbf460c94b0c52b2c5..24eac6ae7c4dcd41358b1a020047076afe1a8106 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -2427,7 +2427,8 @@ analyze_access_subtree (struct access *root, struct access *parent,

   if (!hole || root->grp_total_scalarization)
 root->grp_covered = 1;
-  else if (root->grp_write || TREE_CODE (root->base) == PARM_DECL)
+  else if (root->grp_write || TREE_CODE (root->base) == PARM_DECL
+  || constant_decl_p (root->base))
 root->grp_unscalarized_data = 1; /* not covered and written to */
   return sth_created;
 }

[Bug tree-optimization/67681] Missed vectorization: induction variable used after loop

2016-03-09 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67681

--- Comment #5 from alalaw01 at gcc dot gnu.org ---
In the -DFOO=0 case, we have peeled an extra copy of the inner loop condition,
i <= max_7, above the loop. scalar evolution (final_value_replacement_loop)
works, because it sees the inner loop goes round niter = (unsigned int) max_7 -
(unsigned int) i_25 iterations, and compute_overall_effect_of_inner_loop gives
us

(int) (((unsigned int) i_25 + ((unsigned int) max_7 - (unsigned int) i_25)) +
1)

which is not expression_expensive_p, so we do it. Hence the add/subtract above.

When -DFOO=1, we have not done that peeling, so niter = i_22 <= max_24 ?
(unsigned int) max_24 - (unsigned int) i_22 : 0, and
compute_overall_effect_of_inner_loop gives us

(i_22 + 1) + (i_22 <= max_24 ? (int) ((unsigned int) max_24 - (unsigned int)
i_22) : 0)

which is expression_expensive_p, so we don't do the final value replacement.
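
The two closed forms can be checked against a plain loop. The following is a
sketch with hypothetical function names, assuming the inner loop has the
do-while shape shown in the dumps (body, increment, then the test against
max). The first closed form is only valid when i <= max holds on entry, which
is what the peeled test guarantees in the -DFOO=0 case; the second needs the
conditional because that guarantee is missing.

```c
/* Reference: run the do-while shaped inner loop and return the final i.  */
static int
final_i_by_loop (int i, int max)
{
  do
    i++;
  while (i <= max);
  return i;
}

/* -DFOO=0 shape: valid only if i <= max on entry.  */
static int
final_i_peeled (int i, int max)
{
  return (int) (((unsigned int) i
		 + ((unsigned int) max - (unsigned int) i)) + 1);
}

/* -DFOO=1 shape: no entry guarantee, so the iteration count is conditional.  */
static int
final_i_guarded (int i, int max)
{
  return (i + 1)
	 + (i <= max ? (int) ((unsigned int) max - (unsigned int) i) : 0);
}
```

With i <= max on entry all three agree; with i > max only the guarded form
matches the loop, which is why the unconditional expression cannot be used in
the -DFOO=1 case.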

[Bug tree-optimization/67681] Missed vectorization: induction variable used after loop

2016-03-10 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67681

--- Comment #7 from alalaw01 at gcc dot gnu.org ---
Looking at where the peeling happens. In both -DFOO=0 and -DFOO=1 cases,
107.ch2 peels the inner loop header, so there is an i<=max test in the outer
loop before the inner loop. However, in the -DFOO=1 case, this is dominated by
the extra i>max test (that breaks out of the outer loop), so 110.dom2 removes
the peeled i<=max.

Thus, just before sccp, in the -DFOO=0 case, we have:

  :
  # i_25 = PHI 
  # j_26 = PHI 
  max_7 = 1 << j_26;
  if (max_7 >= i_25)
goto ;
  else
goto ; //skip inner loop

  : //inner loop header
  # i_2 = PHI 
  _8 = (long unsigned int) i_2;
  _9 = _8 * 4;
  _11 = data_10(D) + _9;
  _12 = *_11;
  _13 = _12 + j_26;
  *_11 = _13;
  i_15 = i_2 + 1;
  if (max_7 >= i_15)
goto ; //cleaned, actually via latch
  else
goto ;

note the inner loop exits if !(max_7 >= i_15), and when we hit the inner loop,
we know that (max_7 >= i_25). Whereas in the -DFOO=1 case:
  :
  goto ;

  : //in outer loop
  max_7 = 1 << j_17;
  if (max_7 < i_32)
goto ;
  else
goto ;

  : //outer loop header
  # max_24 = PHI 
  # i_22 = PHI 
  # j_23 = PHI 

  : //inner loop header
  # i_27 = PHI 
  _8 = (long unsigned int) i_27;
  _9 = _8 * 4;
  _11 = data_10(D) + _9;
  _13 = *_11;
  _14 = _13 + j_23;
  *_11 = _14;
  i_16 = i_27 + 1;
  if (i_16 <= max_24)
goto ; //cleaned, actually via latch
  else
goto ;

the inner loop exits if !(max_24 >= i_16), but max_24 is defined as PHI, and we only have that max_7max) break" out of the loop, such that the outer loop now executes
"if (i>max) break" after the inner loop (rather than testing "if (i>max) break"
before the inner loop, as it still did following 107.ch2). So as an
alternative, possibly tweaking the jump-threading/loop-peeling heuristics might
help (?).

[Bug tree-optimization/67681] Missed vectorization: induction variable used after loop

2016-03-10 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67681

--- Comment #8 from alalaw01 at gcc dot gnu.org ---
Indeed, the -DFOO=1 case vectorizes with -fno-tree-dominator-opts.

[Bug tree-optimization/70013] [6 Regression] packed structure tree-sra loses initialization

2016-03-11 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70013

--- Comment #12 from alalaw01 at gcc dot gnu.org ---
Thanks, Martin - yes, I see.

Patch posted at https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00680.html after
full regtest.

[Bug tree-optimization/70013] [6 Regression] packed structure tree-sra loses initialization

2016-03-11 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70013

--- Comment #13 from alalaw01 at gcc dot gnu.org ---
Author: alalaw01
Date: Fri Mar 11 12:08:01 2016
New Revision: 234138

URL: https://gcc.gnu.org/viewcvs?rev=234138&root=gcc&view=rev
Log:
Fix PR/70013

gcc:

PR tree-optimization/70013
* tree-sra.c (analyze_access_subtree): Also set grp_unscalarized_data
for constant-pool entries.

gcc/testsuite:

* gcc.dg/tree-ssa/sra-20.c: New.

Added:
trunk/gcc/testsuite/gcc.dg/tree-ssa/sra-20.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-sra.c

[Bug middle-end/70189] New: Combine constant-pool logic from gimplify + SRA

2016-03-11 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70189

Bug ID: 70189
   Summary: Combine constant-pool logic from gimplify + SRA
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: alalaw01 at gcc dot gnu.org
  Target Milestone: ---

Following PR/63679 (r232506), gimplify.c (gimplify_init_constructor) uses lots
of heuristics to choose between pushing initializers out to the constant pool
(by calling tree_output_constant_def) or outputting many elementwise
statements. Then, in tree-sra.c (analyze_all_variable_accesses), we use more
heuristics to decide which constant-pool loads to completely_scalarize, 
turning those back into elementwise statements. (These get pulled back in from
the constant pool and the constant-pool entry deleted.) Both of these sets of
heuristics are platform dependent (gimplify.c uses can_move_by_pieces,
CLEAR_RATIO; tree-sra.c uses get_move_ratio).

Instead we should put all this logic in one place; this would make it clearer,
and we'd probably get better overall decisions. The suggestion is for
gimplify.c to always push out to the constant pool, as this makes initial tree
the same on all platforms, and for all the logic/heuristics to go into SRA (as,
being later, we then have more information available to maybe make better
decisions in the future).

[Bug target/63679] [5/6 Regression][AArch64] Failure to constant fold.

2016-03-11 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

alalaw01 at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #43 from alalaw01 at gcc dot gnu.org ---
I think this can be closed now? I've raised PR/70189 for the followup
enhancement.

[Bug fortran/69368] [6 Regression] spec2006 test case 416.gamess fails with the g++ 6.0 compiler starting with r232508

2016-03-11 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69368

alalaw01 at gcc dot gnu.org changed:

   What|Removed |Added

   Assignee|alalaw01 at gcc dot gnu.org|unassigned at gcc dot gnu.org

--- Comment #88 from alalaw01 at gcc dot gnu.org ---
Can this now be closed, or should I leave open for possible Fortran FE
warnings?

[Bug target/60825] [AArch64] int64x1_t, uint64x1_t and float64x1_t are not treated as vector types

2014-06-23 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60825

--- Comment #3 from alalaw01 at gcc dot gnu.org ---
Author: alalaw01
Date: Mon Jun 23 12:46:52 2014
New Revision: 211892

URL: https://gcc.gnu.org/viewcvs?rev=211892&root=gcc&view=rev
Log:
PR/60825 Make float64x1_t in arm_neon.h a proper vector type

gcc/ChangeLog:
PR target/60825
* config/aarch64/aarch64.c (aarch64_simd_mangle_map): Add entry for
V1DFmode.
* config/aarch64/aarch64-builtins.c (aarch64_simd_builtin_type_mode):
add V1DFmode
(BUILTIN_VD1): New.
(BUILTIN_VD_RE): Remove.
(aarch64_init_simd_builtins): Add V1DF to modes/modenames.
(aarch64_fold_builtin): Update reinterpret patterns, df becomes v1df.
* config/aarch64/aarch64-simd-builtins.def (create): Make a v1df
variant but not df.
(vreinterpretv1df*, vreinterpret*v1df): New.
(vreinterpretdf*, vreinterpret*df): Remove.
* config/aarch64/aarch64-simd.md (aarch64_create, aarch64_reinterpret*):
Generate V1DFmode pattern not DFmode.
* config/aarch64/iterators.md (VD_RE): Include V1DF, remove DF.
(VD1): New.
* config/aarch64/arm_neon.h (float64x1_t): typedef with gcc extensions.
(vcreate_f64): Remove cast, use v1df builtin.
(vcombine_f64): Remove cast, get elements with gcc vector extensions.
(vget_low_f64, vabs_f64, vceq_f64, vceqz_f64, vcge_f64, vcgez_f64,
vcgt_f64, vcgtz_f64, vcle_f64, vclez_f64, vclt_f64, vcltz_f64,
vdup_n_f64, vdupq_lane_f64, vld1_f64, vld2_f64, vld3_f64, vld4_f64,
vmov_n_f64, vst1_f64): Use gcc vector extensions.
(vget_lane_f64, vdupd_lane_f64, vmulq_lane_f64): Use gcc extensions,
add range check using __builtin_aarch64_im_lane_boundsi.
(vfma_lane_f64, vfmad_lane_f64, vfma_laneq_f64, vfmaq_lane_f64,
vfms_lane_f64, vfmsd_lane_f64, vfms_laneq_f64, vfmsq_lane_f64): Fix
type signature, use gcc vector extensions.
(vreinterpret_p8_f64, vreinterpret_p16_f64, vreinterpret_f32_f64,
vreinterpret_f64_f32, vreinterpret_f64_p8, vreinterpret_f64_p16,
vreinterpret_f64_s8, vreinterpret_f64_s16, vreinterpret_f64_s32,
vreinterpret_f64_s64, vreinterpret_f64_u8, vreinterpret_f64_u16,
vreinterpret_f64_u32, vreinterpret_f64_u64, vreinterpret_s8_f64,
vreinterpret_s16_f64, vreinterpret_s32_f64, vreinterpret_s64_f64,
vreinterpret_u8_f64, vreinterpret_u16_f64, vreinterpret_u32_f64,
vreinterpret_u64_f64): Use v1df builtin not df.

gcc/testsuite/ChangeLog:
* g++.dg/abi/mangle-neon-aarch64.C: Also test mangling of float64x1_t.
* gcc.target/aarch64/aapcs/test_64x1_1.c: New test.
* gcc.target/aarch64/aapcs/func-ret-64x1_1.c: New test.
* gcc.target/aarch64/simd/ext_f64_1.c (main): Compare vector elements.
* gcc.target/aarch64/vadd_f64.c: Rewrite with macro to use vector types.
* gcc.target/aarch64/vsub_f64.c: Likewise.
* gcc.target/aarch64/vdiv_f.c (INDEX*, RUN_TEST): Remove indexing scheme
as now the same for all variants.
* gcc.target/aarch64/vrnd_f64_1.c (compare_f64): Return float64_t not
float64x1_t.

Added:
trunk/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-64x1_1.c
trunk/gcc/testsuite/gcc.target/aarch64/aapcs64/test_64x1_1.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/aarch64/aarch64-builtins.c
trunk/gcc/config/aarch64/aarch64-simd-builtins.def
trunk/gcc/config/aarch64/aarch64-simd.md
trunk/gcc/config/aarch64/aarch64.c
trunk/gcc/config/aarch64/arm_neon.h
trunk/gcc/config/aarch64/iterators.md
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/g++.dg/abi/mangle-neon-aarch64.C
trunk/gcc/testsuite/gcc.target/aarch64/simd/ext_f64_1.c
trunk/gcc/testsuite/gcc.target/aarch64/vadd_f64.c
trunk/gcc/testsuite/gcc.target/aarch64/vdiv_f.c
trunk/gcc/testsuite/gcc.target/aarch64/vrnd_f64_1.c
trunk/gcc/testsuite/gcc.target/aarch64/vsub_f64.c


[Bug target/60825] [AArch64] int64x1_t, uint64x1_t and float64x1_t are not treated as vector types

2014-06-23 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60825

--- Comment #4 from alalaw01 at gcc dot gnu.org ---
Author: alalaw01
Date: Mon Jun 23 14:07:42 2014
New Revision: 211894

URL: https://gcc.gnu.org/viewcvs?rev=211894&root=gcc&view=rev
Log:
PR/60825 Make {int,uint}64x1_t in arm_neon.h a proper vector type

gcc/ChangeLog:
 PR target/60825
* config/aarch64/aarch64-builtins.c (aarch64_types_unop_qualifiers):
Ignore third operand if present by marking qualifier_internal.

* config/aarch64/aarch64-simd-builtins.def (abs): Comment.

* config/aarch64/arm_neon.h (int64x1_t, uint64x1_t): Typedef to GCC
vector extension.
(aarch64_vget_lane_s64, aarch64_vdup_lane_s64,
aarch64_vdupq_lane_s64, aarch64_vdupq_lane_u64): Remove macro.
(vqadd_s64, vqadd_u64, vqsub_s64, vqsub_u64, vqneg_s64, vqabs_s64,
vcreate_s64, vcreate_u64, vreinterpret_s64_f64, vreinterpret_u64_f64,
vcombine_u64, vbsl_s64, vbsl_u64, vceq_s64, vceq_u64, vceqz_s64,
vceqz_u64, vcge_s64, vcge_u64, vcgez_s64, vcgt_s64, vcgt_u64,
vcgtz_s64, vcle_s64, vcle_u64, vclez_s64, vclt_s64, vclt_u64,
vcltz_s64, vdup_n_s64, vdup_n_u64, vld1_s64, vld1_u64, vmov_n_s64,
vmov_n_u64, vqdmlals_lane_s32, vqdmlsls_lane_s32,
vqdmulls_lane_s32, vqrshl_s64, vqrshl_u64, vqrshl_u64, vqshl_s64,
vqshl_u64, vqshl_n_s64, vqshl_n_u64, vqshl_n_s64, vqshl_n_u64,
vqshlu_n_s64, vrshl_s64, vrshl_u64, vrshr_n_s64, vrshr_n_u64,
vrsra_n_s64, vrsra_n_u64, vshl_n_s64, vshl_n_u64, vshl_s64,
vshl_u64, vshr_n_s64, vshr_n_u64, vsli_n_s64, vsli_n_u64,
vsqadd_u64, vsra_n_s64, vsra_n_u64, vsri_n_s64, vsri_n_u64,
vst1_s64, vst1_u64, vtst_s64, vtst_u64, vuqadd_s64): Wrap existing
logic in GCC vector extensions

(vpaddd_s64, vaddd_s64, vaddd_u64, vceqd_s64, vceqd_u64, vceqzd_s64,
vceqzd_u64, vcged_s64, vcged_u64, vcgezd_s64, vcgtd_s64, vcgtd_u64,
vcgtzd_s64, vcled_s64, vcled_u64, vclezd_s64, vcltd_s64, vcltd_u64,
vcltzd_s64, vqdmlals_s32, vqdmlsls_s32, vqmovnd_s64, vqmovnd_u64,
vqmovund_s64, vqrshld_s64, vqrshld_u64, vqrshrnd_n_s64,
vqrshrnd_n_u64, vqrshrund_n_s64, vqshld_s64, vqshld_u64,
vqshld_n_u64, vqshrnd_n_s64, vqshrnd_n_u64, vqshrund_n_s64,
vrshld_u64, vrshrd_n_u64, vrsrad_n_u64, vshld_n_u64, vshld_s64,
vshld_u64, vslid_n_u64, vsqaddd_u64, vsrad_n_u64, vsrid_n_u64,
vsubd_s64, vsubd_u64, vtstd_s64, vtstd_u64): Fix type signature.

(vabs_s64): Use GCC vector extensions; call __builtin_aarch64_absdi.

(vget_high_s64, vget_high_u64): Reimplement with GCC vector
extensions.

(__GET_LOW, vget_low_u64): Wrap result using vcreate_u64.
(vget_low_s64): Use __GET_LOW macro.
(vget_lane_s64, vget_lane_u64, vdupq_lane_s64, vdupq_lane_u64): Use
gcc vector extensions, add call to __builtin_aarch64_lane_boundsi.
(vdup_lane_s64, vdup_lane_u64): Add __builtin_aarch64_lane_boundsi.
(vdupd_lane_s64, vdupd_lane_u64): Fix type signature, add
__builtin_aarch64_lane_boundsi, use GCC vector extensions.

(vcombine_s64): Use GCC vector extensions; remove cast.
(vqaddd_s64, vqaddd_u64, vqdmulls_s32, vqshld_n_s64, vqshlud_n_s64,
vqsubd_s64, vqsubd_u64, vrshld_s64, vrshrd_n_s64, vrsrad_n_s64,
vshld_n_s64, vshrd_n_s64, vslid_n_s64, vsrad_n_s64, vsrid_n_s64):
Fix type signature; remove cast.

gcc/testsuite/ChangeLog:
* g++.dg/abi/mangle-neon-aarch64.C (f22, f23): New tests of 
[u]int64x1_t.

* gcc.target/aarch64/aapcs64/func-ret-64x1_1.c: Add {u,}int64x1 cases.
* gcc.target/aarch64/aapcs64/test_64x1_1.c: Likewise.

* gcc.target/aarch64/scalar_intrinsics.c (test_vaddd_u64,
test_vaddd_s64, test_vceqd_s64, test_vceqzd_s64, test_vcged_s64,
test_vcled_s64, test_vcgezd_s64, test_vcged_u64, test_vcgtd_s64,
test_vcltd_s64, test_vcgtzd_s64, test_vcgtd_u64, test_vclezd_s64,
test_vcltzd_s64, test_vqaddd_u64, test_vqaddd_s64, test_vqdmlals_s32,
test_vqdmlsls_s32, test_vqdmulls_s32, test_vuqaddd_s64,
test_vsqaddd_u64, test_vqmovund_s64, test_vqmovnd_s64,
test_vqmovnd_u64, test_vsubd_u64, test_vsubd_s64, test_vqsubd_u64,
test_vqsubd_s64, test_vshld_s64, test_vshld_u64, test_vrshld_s64,
test_vrshld_u64, test_vshrd_n_s64, test_vshrd_n_u64, test_vsrad_n_s64,
test_vsrad_n_u64, test_vrshrd_n_s64, test_vrshrd_n_u64,
test_vrsrad_n_s64, test_vrsrad_n_u64, test_vqrshld_s64,
test_vqrshld_u64, test_vqshlud_n_s64, test_vqshld_s64, test_vqshld_u64,
test_vqshld_n_u64, test_vqshrund_n_s64, test_vqrshrund_n_s64,
test_vqshrnd_n_s64, test_vqshrnd_n_u64, test_vqrshrnd_n_s64,
test_vqrshrnd_n_u64, test_vshld_n_s64, test_vshld_n_u64,
test_vslid_n_s64, test_vslid_n_u64, test_vsrid_n_s64,
test_vsrid_n_u64): Fix signature to match intrinsic.

(test_vabs_s64): Remove.
(test_vaddd_s64_2, test_vsubd_s64_2): Use force_simd.

(test_vdupd_lane_s64): Rename to...
(test_vdupd_laneq_s64): ...and remove a call to force_simd.

(test_vdupd_lane_u64): R

[Bug testsuite/65506] [5 Regression] FAIL: gcc.dg/pr29215.c scan-tree-dump-not gimple "memcpy"

2015-03-25 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65506

alalaw01 at gcc dot gnu.org changed:

   What|Removed |Added

 CC||alalaw01 at gcc dot gnu.org

--- Comment #8 from alalaw01 at gcc dot gnu.org ---
This test was also failing for target arm-none-eabi, also fixed by Jakub's
r221607.


[Bug libstdc++/33394] Add test case for Thread race segfault in std::string::append with -O and -s

2015-03-25 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33394

--- Comment #4 from alalaw01 at gcc dot gnu.org ---
Author: alalaw01
Date: Wed Mar 25 15:46:58 2015
New Revision: 221666

URL: https://gcc.gnu.org/viewcvs?rev=221666&root=gcc&view=rev
Log:
PR libstdc++/33394
* testsuite/21_strings/basic_string/pthread33394.cc: Use
dg-additional-options.

Modified:
trunk/libstdc++-v3/ChangeLog
trunk/libstdc++-v3/testsuite/21_strings/basic_string/pthread33394.cc


[Bug target/65689] New: [AArch64] S constraint fails for inline asm at -O0

2015-04-07 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65689

Bug ID: 65689
   Summary: [AArch64] S constraint fails for inline asm at -O0
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: minor
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: alalaw01 at gcc dot gnu.org

Starting with r221532
(https://gcc.gnu.org/ml/gcc-patches/2015-03/msg01064.html),

void
test (void)
{
  __asm__ ("@ %c0" : : "S" (&test + 4));
}

fails to compile at -O0 on all aarch64 targets with:

c-output-template-3.c: In function 'test':
c-output-template-3.c:7:5: error: impossible constraint in 'asm'
 __asm__ ("@ %c0" : : "S" (&test + 4));

(This is gcc.target/aarch64/c-output-template-3.c, without the -O added in
r221905, as that leads to successful compilation - however, the testcase should
compile without -O too.)


[Bug target/65689] [AArch64] S constraint fails for inline asm at -O0

2015-04-07 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65689

--- Comment #1 from alalaw01 at gcc dot gnu.org ---
Problem stems from parse_input_constraint (in stmt.c):

if (reg_class_for_constraint (cn) != NO_REGS
    || insn_extra_address_constraint (cn))
  *allows_reg = true;
else if (insn_extra_memory_constraint (cn))
  *allows_mem = true;
else
  {
    /* Otherwise we can't assume anything about the nature of
       the constraint except that it isn't purely registers.
       Treat it like "g" and hope for the best.  */
    *allows_reg = true;
    *allows_mem = true;
  }

which causes expand_asm_operands to use (reg/f:DI ...), which fails the
definition of the S constraint. If instead parse_input_constraint set both
allows_reg and allows_mem to false (as it does for e.g. an "i" constraint, via
a special-case), expand_asm_operands would follow the register to its
definition:

(const:DI (plus:DI (symbol_ref:DI ("test") [flags 0x3] )
(const_int 4 [0x4])))

(as also happens with -O), which satisfies the S constraint.

One solution could be to generalize the special case in parse_input_constraint.


[Bug target/65689] [5 Regression][AArch64] S constraint fails for inline asm at -O0

2015-04-08 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65689

alalaw01 at gcc dot gnu.org changed:

   What|Removed |Added

   Priority|P1  |P2


[Bug target/65689] [5 Regression][AArch64] S constraint fails for inline asm at -O0

2015-04-08 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65689

--- Comment #6 from alalaw01 at gcc dot gnu.org ---
Whilst I think this probably would fix the problem - surely this will change
the meaning of loads of constraints, on loads of platforms? I will of course
defer to the release manager(s) (!), but IMHO this feels rather risky to do at
this late stage, i.e. potentially "the cure is worse than the disease"...?

Secondly, do I understand correctly, that the constraint-parsing mechanism will
only come into play for plain ol' define_constraints, whereas
define_register_constraint / define_memory_constraint would provide/override
with their own values? Does this still leave us with consistent meaning for all
three kinds of define...constraint?


[Bug target/65689] [5 Regression][AArch64] S constraint fails for inline asm at -O0

2015-04-09 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65689

--- Comment #8 from alalaw01 at gcc dot gnu.org ---
Well, meaning/behaviour. But thanks for the patch - I've bootstrapped and
check-gcc'd on AArch64 and arm hf (Cortex-A15 + Neon) with no regressions.


[Bug target/65770] New: [AArch64] vst2_lane broken on bigendian

2015-04-15 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65770

Bug ID: 65770
   Summary: [AArch64] vst2_lane broken on bigendian
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Keywords: wrong-code
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: alalaw01 at gcc dot gnu.org
Target: aarch64_be

Testcase:

#include <arm_neon.h>
#include <stdlib.h>

void
test_vst2_lane_s32 (int32x2x2_t vals)
{
  int32_t buf[2];
  vst2_lane_s32 (buf, vals, 0);
  for (int i = 0; i < 2; i++)
    if (buf[i] != vget_lane_s32 (vals.val[i], 0))
      abort();
}

int
main (int argc, char **argv)
{
  int32_t load[4] = { 11, 12, 21, 22 };
  test_vst2_lane_s32 (vld2_s32 (load));
}

Passes on aarch64-none-elf, but fails on aarch64_be-none-elf: the generated
assembly contains

st2 {v0.s - v1.s}[3], [x1]

which (1) has flipped endianness, and (2) has flipped it relative to a
Q register (int32x4_t), not a D register (int32x2_t).

A similar testcase for int32x4x2_t also flips endianness, although at least
relative to the right vector length ;).
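
To make point (2) concrete, here is a minimal sketch of the lane arithmetic,
assuming a flip maps lane n of an N-lane vector to N - 1 - n. The helper name
flip_lane is hypothetical and is not GCC's own code. Flipping lane 0 against
four lanes gives the 3 seen in the st2 above, whereas flipping against the two
lanes of an int32x2_t would give 1.

```c
/* Hypothetical helper (not from GCC): the lane number obtained by
   flipping LANE within a vector of NUNITS lanes.  */
unsigned
flip_lane (unsigned nunits, unsigned lane)
{
  return nunits - 1 - lane;
}
```

So flip_lane (4, 0) matches the index in the generated st2, and
flip_lane (2, 0) is what a flip relative to a D register would produce.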


[Bug target/64134] (vector float){0, 0, b, a} Uses stores when it does not need to

2015-04-20 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64134

--- Comment #3 from alalaw01 at gcc dot gnu.org ---
Author: alalaw01
Date: Mon Apr 20 10:29:26 2015
New Revision: 29

URL: https://gcc.gnu.org/viewcvs?rev=29&root=gcc&view=rev
Log:
[AArch64] PR/64134: Make aarch64_expand_vector_init use 'ins' more often

gcc/:

PR target/64134
* config/aarch64/aarch64.c (aarch64_expand_vector_init): Load constant
and overwrite variable parts if <= 1/2 the elements are variable.

gcc/testsuite/:

PR target/64134
* gcc.target/aarch64/vec_init_1.c: New test.

Added:
trunk/gcc/testsuite/gcc.target/aarch64/vec_init_1.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/aarch64/aarch64.c
trunk/gcc/testsuite/ChangeLog


[Bug tree-optimization/35226] Induction with multiplication are not vectorized

2015-04-30 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35226

alalaw01 at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2015-04-30
 CC||alalaw01 at gcc dot gnu.org
Version|4.3.0   |6.0
Summary|Reduction and induction |Induction with
   |with multiplication are not |multiplication are not
   |vectorized  |vectorized
 Ever confirmed|0   |1

--- Comment #1 from alalaw01 at gcc dot gnu.org ---
Multiplication reductions are supported, certainly in gcc 4.9, I think longer.
However, the following induction does not vectorize on gcc 6 development branch
(x86_64, -O3, with or without -mavx or -msse2):

int a[24];

int
main (int argc, char **argv)
{
  int p = 1;
  for (int i = 0; i < 24; i++, p*=2)
    a[i] *= p;
}

-fdump-tree-vect-details suggests the multiplication is recognized as a
reduction but not as an induction:

test_induc.c:7:3: note: Analyze phi: p_13 = PHI 

test_induc.c:7:3: note: reduction used in loop.
test_induc.c:7:3: note: Unknown def-use cycle pattern.
test_induc.c:7:3: note: === vect_pattern_recog ===
test_induc.c:7:3: note: vect_is_simple_use: operand _5
test_induc.c:7:3: note: def_stmt: _5 = a[i_14];
test_induc.c:7:3: note: type of def: 3.
test_induc.c:7:3: note: vect_is_simple_use: operand p_13
test_induc.c:7:3: note: def_stmt: p_13 = PHI 
test_induc.c:7:3: note: Unsupported pattern.
...
test_induc.c:7:3: note: def_stmt: p_13 = PHI 
test_induc.c:7:3: note: Unsupported pattern.
test_induc.c:7:3: note: not vectorized: unsupported use in stmt.
test_induc.c:7:3: note: unexpected pattern.
test_induc.c:4:1: note: vectorized 0 loops in function.
...

  :
  # p_13 = PHI 
...
  p_9 = p_13 * 2;


[Bug middle-end/65946] New: Simple loop with if-statement not vectorized

2015-04-30 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65946

Bug ID: 65946
   Summary: Simple loop with if-statement not vectorized
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: alalaw01 at gcc dot gnu.org
  Target Milestone: ---
Target: x86_64

This testcase:

#define N 32

int a[N], b[N];

int
foo ()
{
  for (int i = 0; i < N ; i++)
  {
    int m = (a[i] & i) ? 5 : 4;
    b[i] = a[i] * m;
  }
}

does not vectorize at -O3 on x86_64 or other platforms. Following dom1, jump
threading partially peels the loop to give:


  :
  goto ;

  :
  # i_11 = PHI 
  _5 = a[i_11];
  _6 = i_11 & _5;
  if (_6 != 0)
goto ;
  else
goto ;

  :

  :
  # m_14 = PHI <5(4), 4(3)>

  :
  # m_2 = PHI 
  # _15 = PHI <_5(5), _10(8)>
  # i_16 = PHI 
  _7 = m_2 * _15;
  b[i_16] = _7;
  i_9 = i_16 + 1;
  if (i_9 != 32)
goto ;
  else
goto ;

  :
  return;

  :
  # i_1 = PHI <0(2)>
  _10 = a[i_1];
  _3 = i_1 & _10;
  goto ;

which form cannot be if-converted (tree-if-conv.c):

  /* If one of the loop header's edge is an exit edge then do not
 apply if-conversion.  */
  FOR_EACH_EDGE (e, ei, loop->header->succs)
if (loop_exit_edge_p (loop, e))
  return false;

and even if it were, the PHI nodes at loop entry cannot be handled by the
vectorizer.
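
For reference, a hand-written sketch of the branch-free loop body that
if-conversion would ideally leave for the vectorizer. This is an illustration
under my own assumptions, not compiler output, and foo_ifcvt is a hypothetical
name.

```c
#define N 32

int a[N], b[N];

/* Branch-free equivalent of the testcase loop: the conditional is
   replaced by arithmetic on the comparison result, so the loop body
   is a single basic block.  */
void
foo_ifcvt (void)
{
  for (int i = 0; i < N; i++)
    {
      int m = 4 + ((a[i] & i) != 0);
      b[i] = a[i] * m;
    }
}
```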


[Bug middle-end/65946] Simple loop with if-statement not vectorized

2015-04-30 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65946

alalaw01 at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2015-04-30
   Assignee|unassigned at gcc dot gnu.org  |alalaw01 at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from alalaw01 at gcc dot gnu.org ---
Discussion here: https://gcc.gnu.org/ml/gcc/2015-04/msg00351.html

Suggestion is to use loop-header-copying to rotate the loop to a form that both
if-conversion and the vectorizer can handle.


[Bug middle-end/65947] New: Vectorizer misses conditional assignment of constant

2015-04-30 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65947

Bug ID: 65947
   Summary: Vectorizer misses conditional assignment of constant
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: alalaw01 at gcc dot gnu.org
  Target Milestone: ---

This testcase:

int a[32];

int main(int argc, char **argv)
{
  int res = 3;
  for (int i = 0; i < 32; i++)
    if (a[i]) res = 7;
  return res;
}

does not vectorize at -O3 on x86_64 or aarch64. tree-if-conversion succeeds,
giving a loop of form:

:
  # res_10 = PHI 
  # i_11 = PHI 
  # ivtmp_9 = PHI 
  _5 = a[i_11];
  res_1 = _5 != 0 ? 7 : res_10;
  i_6 = i_11 + 1;
  ivtmp_2 = ivtmp_9 - 1;
  if (ivtmp_2 != 0)
goto ;
  else
goto ;

  :
  goto ;

but -fdump-tree-vect-details shows:

test.c:9:3: note: Analyze phi: res_10 = PHI 
test.c:9:3: note: reduction: not commutative/associative: res_1 = _5 != 0 ? 7 : res_10;
test.c:9:3: note: Unknown def-use cycle pattern.
...
test.c:9:3: note: vect_is_simple_use: operand res_10
test.c:9:3: note: def_stmt: res_10 = PHI 
test.c:9:3: note: Unsupported pattern.
test.c:9:3: note: not vectorized: unsupported use in stmt.
test.c:9:3: note: unexpected pattern.


[Bug middle-end/65947] Vectorizer misses conditional assignment of constant

2015-04-30 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65947

alalaw01 at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2015-04-30
   Assignee|unassigned at gcc dot gnu.org  |alalaw01 at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from alalaw01 at gcc dot gnu.org ---
Of course, the conditional assignment _is_ commutative and associative (wrt
reordering iterations).


[Bug target/65951] New: [AArch64] Will not vectorize multiplication by long constant

2015-04-30 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65951

Bug ID: 65951
   Summary: [AArch64] Will not vectorize multiplication by long
constant
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: alalaw01 at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64

This loop:
void
foo (long *arr)
{
  for (int i = 0; i < 256; i++)
    arr[i] *= 19594L;
}

will not vectorize on AArch64, but does on x86. On AArch64,
-fdump-tree-vect-details reveals:
test.c:4:3: note: ==> examining statement: _9 = _8 * 19594;
test.c:4:3: note: vect_is_simple_use: operand _8
test.c:4:3: note: def_stmt: _8 = *_7;
test.c:4:3: note: type of def: 3.
test.c:4:3: note: vect_is_simple_use: operand 19594
test.c:4:3: note: op not supported by target.
test.c:4:3: note: not vectorized: relevant stmt not supported: _9 = _8 * 19594;

on x86, vectorization fails with vectorization_factor = 4 (V4DI), but succeeds
at V2DI.

We could vectorize this on AArch64 even if we have to perform a
multiple-instruction load of that constant (invariant!) before the
loop...right?


[Bug target/65952] New: [AArch64] Will not vectorize copying pointers

2015-04-30 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65952

Bug ID: 65952
   Summary: [AArch64] Will not vectorize copying pointers
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: alalaw01 at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64

typedef struct {
  int a, b, c, d;
} my_struct;

my_struct *array;
my_struct *ptrs[4];

void
loop ()
{
  for (int i = 0; i < 4; i++)
    ptrs[i] = &array[i];
}

Vectorizes on x86, but not on AArch64. From -fdump-tree-vect-details:

test.c:13:3: note: vectorization factor = 4
...
test.c:13:3: note: not vectorized: relevant stmt not supported: _6 = _5 * 16;
test.c:13:3: note: bad operation or unsupported loop bound.
test.c:13:3: note: * Re-trying analysis with vector size 8
...
test.c:13:3: note: not vectorized: no vectype for stmt: ptrs[i_12] = _7;
 scalar_type: struct my_struct *
test.c:13:3: note: bad data references.
test.c:11:1: note: vectorized 0 loops in function.


[Bug target/65951] [AArch64] Will not vectorize 64bit integer multiplication

2015-05-01 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65951

--- Comment #3 from alalaw01 at gcc dot gnu.org ---
Yes you are right, we have no V2DI multiply. We do have V2DI shifts + add,
however, which would work well for some constants, e.g. the multiply by 16 in
PR/65952; perhaps the vectorizer does not consider such possibilities (whereas
we do for scalar code).
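
A minimal sketch of the shifts + add idea, in scalar C and deliberately naive:
one shifted copy of x per set bit of the constant. This is not the algorithm
GCC uses, and mul_shift_add is a hypothetical name. For 16 it needs a single
shift; for 19594 = 2^14 + 2^11 + 2^10 + 2^7 + 2^3 + 2^1 it needs six shifted
terms, so whether it pays off would depend on the constant.

```c
/* Multiply X by the constant C using only shifts and adds: one
   shifted copy of X is added for each set bit in C.  Unsigned
   arithmetic, so the result wraps like an ordinary multiply.  */
unsigned long
mul_shift_add (unsigned long x, unsigned long c)
{
  unsigned long result = 0;
  for (unsigned shift = 0; c != 0; shift++, c >>= 1)
    if (c & 1)
      result += x << shift;
  return result;
}
```

In a vectorized loop each shift and add here would correspond to one vector
shift or add on the V2DI operand.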


[Bug target/65952] [AArch64] Will not vectorize storing induction of pointer addresses for LP64

2015-05-01 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65952

--- Comment #4 from alalaw01 at gcc dot gnu.org ---
Hmmm. Yes. Well, x * 16 = x << 4, of course. Or, in theory something like VRP
could let us see that

  # i_12 = PHI 
  # ivtmp_18 = PHI 
  _5 = (long unsigned int) i_12;
  _6 = _5 * 16;
  _7 = pretmp_11 + _6;
  ptrs[i_12] = _7;
  i_9 = i_12 + 1;

could be rewritten to something like

  # i_12 = PHI 
  # ivtmp_18 = PHI 
  _5 = i_12 * 16;
  _6 = (long unsigned int) _5;
  _7 = pretmp_11 + _6;
  ptrs[i_12] = _7;
  i_9 = i_12 + 1;

which would then be vectorizable.
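
To spell out why range information is needed for that rewrite, a small sketch
in C (function names are hypothetical): the two orders of widening and
multiplying agree only while the narrow multiply cannot overflow, which holds
here because i stays within 0..3.

```c
/* Offset as in the first form: widen, then multiply.  */
unsigned long
offset_widen_first (int i)
{
  return (unsigned long) i * 16;
}

/* Offset as in the rewritten form: multiply in the narrow type, then
   widen.  Only equivalent when i * 16 cannot overflow int.  */
unsigned long
offset_multiply_first (int i)
{
  return (unsigned long) (i * 16);
}
```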


[Bug middle-end/65962] New: Missed vectorization of strided stores

2015-05-01 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65962

Bug ID: 65962
   Summary: Missed vectorization of strided stores
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: alalaw01 at gcc dot gnu.org
  Target Milestone: ---

This does not vectorize at -O3 on x86_64/-mavx or aarch64:
int
loop (int *data)
{
  int tot = 0;
  for (int i = 0; i < 256; i++)
    data[i * 2] += 7;
  return tot;
}

-fdump-tree-vect-details reveals:

loadstore.c:6:3: note: === vect_analyze_data_ref_accesses ===
loadstore.c:6:3: note: Detected single element interleaving *_8 step 8
loadstore.c:6:3: note: Data access with gaps requires scalar epilogue loop
loadstore.c:6:3: note: not consecutive access *_8 = _10;

loadstore.c:6:3: note: not vectorized: complicated access pattern.
loadstore.c:6:3: note: bad data access.

However, a similar testcase that only reads from those locations vectorizes
ok:
int
loop_12 (int *data)
{
  int tot = 0;
  for (int i = 0; i < 256; i++)
    tot += data[i * 2];
  return tot;
}
blocksort.c:6:3: note: === vect_analyze_data_ref_accesses ===
blocksort.c:6:3: note: Detected single element interleaving *_7 step 8
blocksort.c:6:3: note: Data access with gaps requires scalar epilogue loop


[Bug middle-end/65962] Missed vectorization of strided stores

2015-05-01 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65962

--- Comment #1 from alalaw01 at gcc dot gnu.org ---
I believe this is a known issue, but have not identified an existing PR.


[Bug middle-end/65963] New: Missed vectorization of loads strided with << when equivalent * succeeds

2015-05-01 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65963

Bug ID: 65963
   Summary: Missed vectorization of loads strided with << when
equivalent * succeeds
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: alalaw01 at gcc dot gnu.org
  Target Milestone: ---

This testcase does not vectorize at -O3 on x86_64/-mavx or AArch64:
void
loop (int *in, int *out)
{
  for (int i = 0; i < 256; i++) {
 out[i] = in[i << 1] + 7;
  }
}

-fdump-tree-vect-details reveals:
Creating dr for *_12
analyze_innermost: failed: evolution of base is not affine.
base_address: 
offset from base address: 
constant offset from base address: 
step: 
aligned to: 
base_object: *_12

However, this testcase succeeds:
void
loop (int *in, int *out)
{
  for (int i = 0; i < 256; i++) {
 out[i] = in[i * 2] + 7;
  }
}

The relevant extract of -fdump-tree-vect-details shows:
Creating dr for *_12
analyze_innermost: success.
base_address: in_11(D)
offset from base address: 0
constant offset from base address: 0
step: 8
aligned to: 256
base_object: *in_11(D)
Access function 0: {0B, +, 8}_1

The only difference is the multiplication:
$ diff splice{,2}.c.131t.ifcvt
27c27
<   _8 = i_19 * 2;
---
>   _8 = i_19 << 1;
$


[Bug middle-end/65965] New: Straight-line memcpy/memset not vectorized when equivalent loop is

2015-05-01 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65965

Bug ID: 65965
   Summary: Straight-line memcpy/memset not vectorized when
equivalent loop is
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: alalaw01 at gcc dot gnu.org
  Target Milestone: ---

Testcase:
void
test(int *__restrict__ a, int *__restrict__ b)
{
  a[0] = b[0];
  a[1] = b[1];
  a[2] = b[2];
  a[3] = b[3];
  a[5] = 0;
  a[6] = 0;
  a[7] = 0;
  a[8] = 0;
}
produces (at -O3) on AArch64:
test:
ldp w4, w3, [x1]
ldp w2, w1, [x1, 8]
stp w4, w3, [x0]
stp w2, w1, [x0, 8]
stp wzr, wzr, [x0, 20]
stp wzr, wzr, [x0, 28]
ret
or on x86_64/-mavx:
test:
.LFB0:
movl (%rsi), %eax
movl $0, 20(%rdi)
movl $0, 24(%rdi)
movl $0, 28(%rdi)
movl $0, 32(%rdi)
movl %eax, (%rdi)
movl 4(%rsi), %eax
movl %eax, 4(%rdi)
movl 8(%rsi), %eax
movl %eax, 8(%rdi)
movl 12(%rsi), %eax
movl %eax, 12(%rdi)
ret
(there is no -fdump-tree-vect)

In contrast, for the testcase
void
test(int *__restrict__ a, int *__restrict__ b)
{
  for (int i = 0; i < 4; i++) a[i] = b[i];
  for (int i = 0; i < 4; i++) a[i+4] = 0;
}
the memcpy is recognized by ldist, and the 'memset' by slp1 (neither of which
triggers on the first case), producing (superior) AArch64:
test:
movi v0.4s, 0
ldp x2, x3, [x1]
stp x2, x3, [x0]
str q0, [x0, 16]
ret
or x86_64:
test:
.LFB0:
movq (%rsi), %rax
movq 8(%rsi), %rdx
vpxor %xmm0, %xmm0, %xmm0
movq %rax, (%rdi)
movq %rdx, 8(%rdi)
vmovups %xmm0, 16(%rdi)
ret
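
For comparison, a sketch of the first (straight-line) testcase written as the
two block operations it amounts to; test_blocks is a hypothetical name. Note
that the first testcase zeroes a[5..8] and leaves a[4] alone, while the loop
version zeroes a[4..7]; the sketch follows the first testcase.

```c
#include <string.h>

/* The eight straight-line stores of the first testcase, expressed
   as one block copy and one block clear.  */
void
test_blocks (int *__restrict__ a, int *__restrict__ b)
{
  memcpy (a, b, 4 * sizeof (int));      /* a[0..3] = b[0..3] */
  memset (a + 5, 0, 4 * sizeof (int));  /* a[5..8] = 0 */
}
```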


[Bug target/65951] [AArch64] Will not vectorize 64bit integer multiplication

2015-05-05 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65951

--- Comment #5 from alalaw01 at gcc dot gnu.org ---
I believe the definitive algorithm for converting multiply-by-constant into
adds+shifts(+etc.) lives in expmed.c. I don't at present have a plan for how to
reuse that, but if we could do so _in_some_form_ then that would be the ideal??


[Bug middle-end/65947] Vectorizer misses conditional assignment of constant

2015-05-05 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65947

--- Comment #3 from alalaw01 at gcc dot gnu.org ---
Yeah, you're right, it's not commutative, but then, it doesn't need to be.

If f(x,y) is "(a[x] ? 7 : y)", then f(0, f(1, ...)) = f(1, f(0, ...))
(associative but not commutative), which is all we need to reorder the
iterations of the loop?

So if at the end of the loop we have a vector

v_tmp_result = { f(8, f(4, f(0, ))), f(9, f(5, f(1, ))), f(10, f(6,
f(2, ))), f(11, f(7, f(3, ))) }

obtained by standard technique for reductions, we then need to reduce the
vector to a scalar, which could be

(a) if any of the vector elements are equal to the constant 7, then return the
constant 7, else the initial value:

cond_expr (vec_reduc_or (vec_equals (v_tmp_result, 7)), 7, )

indeed you might just vectorize to get the predicates

v_tmp2 = { a[8] | a[4] | a[0], a[9] | a[5] | a[1], a[10] | a[6] | a[2], a[11] |
a[7] | a[3] }

and then reduce to scalar with cond_expr (vec_reduc_or (v_tmp2), 7, 3)

(b) alternatively one could exploit the initial value (3) also being a constant
and choose an appropriate operator from {max, min, or, and}, e.g. for 3 and 7
either reduc_max_expr(3,7) or reduc_or_expr(3,7) would work.
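
A hand-written sketch of the predicate variant of (a), in plain C with arrays
standing in for vector lanes. It assumes a vectorization factor of 4; the
names cond_scalar, cond_by_lanes and VF are hypothetical, and this only
illustrates the reordering argument, not what the vectorizer would generate.

```c
#define N 32
#define VF 4   /* assumed vectorization factor */

int a[N];

/* Scalar reference, as in the testcase.  */
int
cond_scalar (void)
{
  int res = 3;
  for (int i = 0; i < N; i++)
    if (a[i])
      res = 7;
  return res;
}

/* Lane-wise form: accumulate an OR of the predicates in each lane,
   then reduce the lanes to one flag and select between the two
   constants.  */
int
cond_by_lanes (void)
{
  int any[VF] = { 0 };
  for (int i = 0; i < N; i += VF)
    for (int l = 0; l < VF; l++)
      any[l] |= (a[i + l] != 0);

  int flag = 0;
  for (int l = 0; l < VF; l++)
    flag |= any[l];
  return flag ? 7 : 3;
}
```

The final select corresponds to the cond_expr (vec_reduc_or (v_tmp2), 7, 3)
step described above.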


[Bug tree-optimization/46029] -ftree-loop-if-convert-stores causes FAIL: libstdc++-v3/testsuite/ext/pb_ds/example/tree_intervals.cc

2015-05-05 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46029

alalaw01 at gcc dot gnu.org changed:

   What|Removed |Added

 CC||alalaw01 at gcc dot gnu.org

--- Comment #4 from alalaw01 at gcc dot gnu.org ---
I'm still seeing this problem with -ftree-loop-if-convert-stores, introducing
faults by converting conditional to unconditional loads. It doesn't look as if
Sebastian Pop's patches went in (after being approved,
https://gcc.gnu.org/ml/gcc-patches/2010-11/msg01670.html). Can anyone shed any
light on this?


[Bug tree-optimization/57558] Issue with number of iterations calculation

2015-05-06 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57558

alalaw01 at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2015-05-06
 CC||alalaw01 at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #3 from alalaw01 at gcc dot gnu.org ---
Seeing this too.

Is another approach to fall back to an alternative (scalar?) path (perhaps just
the epilogue?) if we can tell at the beginning of the loop that the iteration
count will be infinite?


[Bug target/67439] ICE: unrecognizable insn compiling arm-fp16 testcases with -march=armv7-a and -mrestrict-it

2015-09-03 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67439

alalaw01 at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2015-09-03
 CC||alalaw01 at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #2 from alalaw01 at gcc dot gnu.org ---
I can reproduce the ICE with -mthumb, both "-mfloat-abi=hard -mfpu=neon" and
"-mfloat-abi=soft", but only with -mrestrict-it in both cases.
"-mfloat-abi=hard -mfpu=neon-fp16" is OK with and without -mrestrict-it.

I note the movhf patterns in vfp.md are only usable with neon-fp16; in other
cases, we appear to be using arm32_movhf in arm.md.


[Bug target/63870] [Aarch64] [ARM] Errors in use of NEON intrinsics are reported incorrectly

2015-09-08 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63870

--- Comment #10 from alalaw01 at gcc dot gnu.org ---
Author: alalaw01
Date: Tue Sep  8 19:43:39 2015
New Revision: 227557

URL: https://gcc.gnu.org/viewcvs?rev=227557&root=gcc&view=rev
Log:
[ARM/AArch64 Testsuite] Add float16 lane_f16_indices tests

PR target/63870
* gcc.target/aarch64/advsimd-intrinsics/vld2_lane_f16_indices_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vld2q_lane_f16_indices_1.c:
New.
* gcc.target/aarch64/advsimd-intrinsics/vld3_lane_f16_indices_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vld3q_lane_f16_indices_1.c:
New.
* gcc.target/aarch64/advsimd-intrinsics/vld4_lane_f16_indices_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vld4q_lane_f16_indices_1.c:
New.
* gcc.target/aarch64/advsimd-intrinsics/vst2_lane_f16_indices_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vst2q_lane_f16_indices_1.c:
New.
* gcc.target/aarch64/advsimd-intrinsics/vst3_lane_f16_indices_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vst3q_lane_f16_indices_1.c:
New.
* gcc.target/aarch64/advsimd-intrinsics/vst4_lane_f16_indices_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vst4q_lane_f16_indices_1.c:
New.

Added:
trunk/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld2_lane_f16_indices_1.c
trunk/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld2q_lane_f16_indices_1.c
trunk/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld3_lane_f16_indices_1.c
trunk/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld3q_lane_f16_indices_1.c
trunk/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld4_lane_f16_indices_1.c
trunk/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld4q_lane_f16_indices_1.c
trunk/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst2_lane_f16_indices_1.c
trunk/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst2q_lane_f16_indices_1.c
trunk/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst3_lane_f16_indices_1.c
trunk/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst3q_lane_f16_indices_1.c
trunk/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst4_lane_f16_indices_1.c
trunk/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst4q_lane_f16_indices_1.c
Modified:
trunk/gcc/testsuite/ChangeLog


[Bug tree-optimization/67283] GCC regression over inlining of returned structures

2015-09-18 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67283

--- Comment #13 from alalaw01 at gcc dot gnu.org ---
Author: alalaw01
Date: Fri Sep 18 10:55:11 2015
New Revision: 227901

URL: https://gcc.gnu.org/viewcvs?rev=227901&root=gcc&view=rev
Log:
completely_scalarize arrays as well as records.

gcc/:

PR tree-optimization/67283
* tree-sra.c (type_consists_of_records_p): Rename to...
(scalarizable_type_p): ...this, add case for ARRAY_TYPE.
(completely_scalarize_record): Rename to...
(completely_scalarize): ...this, add ARRAY_TYPE case, move some code
to:
(scalarize_elem): New.
(analyze_all_variable_accesses): Follow renamings.

gcc/testsuite/:

* gcc.dg/tree-ssa/sra-15.c: New.
* gcc.dg/tree-ssa/sra-16.c: New.


Added:
trunk/gcc/testsuite/gcc.dg/tree-ssa/sra-15.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/sra-16.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-sra.c


[Bug middle-end/65965] Straight-line memcpy/memset not vectorized when equivalent loop is

2015-09-22 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65965

--- Comment #4 from alalaw01 at gcc dot gnu.org ---
(In reply to Richard Biener from comment #3)
> Fixed for GCC 6.

Indeed. I note that the same testcase does _not_ SLP/vectorize if I use
consecutive indices:

void
test (int*__restrict a, int*__restrict b)
{
a[0] = b[0];
a[1] = b[1];
a[2] = b[2];
a[3] = b[3];
a[4] = 0;
a[5] = 0;
a[6] = 0;
a[7] = 0;
}

loop26a.c:6:13: note: Build SLP failed: different operation in stmt MEM[(int *)a_4(D) + 28B] = 0;
loop26a.c:6:13: note: original stmt *a_4(D) = _3;
loop26a.c:6:13: note: === vect_slp_analyze_data_ref_dependences ===
loop26a.c:6:13: note: === vect_slp_analyze_operations ===
loop26a.c:6:13: note: not vectorized: bad operation in basic block.

Worth another bug?


[Bug tree-optimization/67681] New: Missed vectorization: induction variable used after loop

2015-09-22 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67681

Bug ID: 67681
   Summary: Missed vectorization: induction variable used after
loop
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: alalaw01 at gcc dot gnu.org
  Target Milestone: ---

The inner loop here:
void addlog2 (int *data)
{
  int i = 1;
  for (int j=0; j<=30; j++) {
    int max = 1 << j;
    if (FOO && i>max) break;

    for (; i <= max; i++)
      data[i] += j;
  }
}

does not vectorize if the if(FOO...) is present:
$ /work/alalaw01/build-aarch64-none-elf/install/bin/aarch64-none-elf-gcc -S -O2
-ftree-vectorize -fdump-tree-vect-details=stdout loop9b.c -DFOO=1 | grep
vectorized
loop9b.c:1:6: note: not vectorized: inner-loop count not invariant.
loop9b.c:8:5: note: === vect_mark_stmts_to_be_vectorized ===
loop9b.c:8:5: note: not vectorized: value used after loop.
loop9b.c:8:5: note: === vect_mark_stmts_to_be_vectorized ===
loop9b.c:8:5: note: not vectorized: value used after loop.
loop9b.c:1:6: note: vectorized 0 loops in function.


$ aarch64-none-elf-gcc -S -O2 -ftree-vectorize -fdump-tree-vect-details=stdout
loop9b.c -DFOO=0 | grep vectorized
loop9b.c:4:3: note: not vectorized: inner-loop count not invariant.
loop9b.c:8:5: note: === vect_mark_stmts_to_be_vectorized ===
loop9b.c:8:5: note: loop vectorized
loop9b.c:1:6: note: vectorized 1 loops in function.

Same with -O3. Of course clever analysis could figure out that i>max is never
true, but even without that, we should be able to get 'i' back afterwards.
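
A minimal sketch of what "getting 'i' back" could look like, assuming the inner
loop is rewritten around a private counter and the live-out value is recomputed
in closed form. The function names are hypothetical and this is not a
description of what the vectorizer does.

```c
/* Reference: the inner loop from the report, with i live after the loop.  */
int inner_loop_ref (int *data, int i, int max, int j)
{
  for (; i <= max; i++)
    data[i] += j;
  return i;
}

/* Same stores, but the counter k is private to the loop, so nothing
   computed inside the loop is used after it.  The final value of i
   follows from the exit condition alone.  */
int inner_loop_closed_form (int *data, int i, int max, int j)
{
  for (int k = i; k <= max; k++)
    data[k] += j;
  return i <= max ? max + 1 : i;
}
```

With the live-out value expressed this way, the loop body no longer defines
anything that is used after the loop.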


[Bug tree-optimization/67682] New: Missed vectorization: (another) straight-line memcpy/memset not vectorized when equivalent loop is

2015-09-22 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67682

Bug ID: 67682
   Summary: Missed vectorization: (another) straight-line
memcpy/memset not vectorized when equivalent loop is
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: alalaw01 at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64

This code:

void
test (int*__restrict a, int*__restrict b)
{
a[0] = b[0];
a[1] = b[1];
a[2] = b[2];
a[3] = b[3];
a[4] = 0;
a[5] = 0;
a[6] = 0;
a[7] = 0;
}

is not vectorized; -fdump-tree-slp-details reveals

test.c:4:13: note: Build SLP failed: different operation in stmt MEM[(int *)a_4(D) + 28B] = 0;
test.c:4:13: note: original stmt *a_4(D) = _3;
test.c:4:13: note: === vect_slp_analyze_data_ref_dependences ===
test.c:4:13: note: === vect_slp_analyze_operations ===
test.c:4:13: note: not vectorized: bad operation in basic block.
test.c:4:13: note: * Re-trying analysis with vector size 8
...
test.c:4:13: note: Build SLP failed: different operation in stmt MEM[(int *)a_4(D) + 28B] = 0;
test.c:4:13: note: original stmt *a_4(D) = _3;
test.c:4:13: note: === vect_slp_analyze_data_ref_dependences ===
test.c:4:13: note: === vect_slp_analyze_operations ===
test.c:4:13: note: not vectorized: bad operation in basic block.

(the failure with vector size 8 is expected, but vector size 4 should succeed)

Output is:
test:
ldp w4, w3, [x1]
ldp w2, w1, [x1, 8]
stp w4, w3, [x0]
stp w2, w1, [x0, 8]
stp wzr, wzr, [x0, 16]
stp wzr, wzr, [x0, 24]
ret

Curiously, similar code that writes elements a[0..3] and a[5..8] (skipping
a[4]) is SLP'd, producing the superior:

test:
ldr q0, [x1]
movi v1.4s, 0
str q1, [x0, 20]
str q0, [x0]
ret

And similarly for (equivalent to the first):

void
test (int*__restrict a, int*__restrict b)
{
  for (int i = 0; i < 4; i++)
    a[i] = b[i];
  for (int i = 4; i < 8; i++)
    a[i] = 0;
}

producing:

test:
movi v0.4s, 0
ldp x2, x3, [x1]
stp x2, x3, [x0]
str q0, [x0, 16]
ret
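
For reference, the memcpy/memset pairing named in the summary can be written
out explicitly. This is only a sketch to make the equivalence concrete; the
function names are hypothetical and no claim is made about the code GCC
generates for the library-call form.

```c
#include <string.h>

/* The straight-line form from the report.  */
void test_straight (int *restrict a, const int *restrict b)
{
  a[0] = b[0];
  a[1] = b[1];
  a[2] = b[2];
  a[3] = b[3];
  a[4] = 0;
  a[5] = 0;
  a[6] = 0;
  a[7] = 0;
}

/* The same stores expressed as one 16-byte copy and one 16-byte clear,
   assuming 4-byte int as on AArch64.  */
void test_libcalls (int *restrict a, const int *restrict b)
{
  memcpy (a, b, 4 * sizeof (int));
  memset (a + 4, 0, 4 * sizeof (int));
}
```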


[Bug tree-optimization/67683] New: Missed vectorization: shifts of an induction variable

2015-09-22 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67683

Bug ID: 67683
   Summary: Missed vectorization: shifts of an induction variable
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: alalaw01 at gcc dot gnu.org
Blocks: 53947
  Target Milestone: ---

This testcase:

void test (unsigned char *data, int max)
{
  unsigned short val = 0xcdef;
  for(int i = 0; i < max; i++) { 
  data[i] = (unsigned char)(val & 0xff);
  val >>= 1; 
  }
}

does not vectorize on AArch64 or x86_64 at -O3. (I haven't yet looked at
whether it's a mid-end deficiency or both back-ends are missing patterns.)
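
To illustrate why this loop is vectorizable in principle: the value of val at
iteration i has a closed form, so the iterations need not depend on each other.
The sketch below is a hand-written equivalent with hypothetical function names,
not output from any compiler pass.

```c
/* The loop from the testcase: val carries a dependence between iterations.  */
void shift_iv_ref (unsigned char *data, int max)
{
  unsigned short val = 0xcdef;
  for (int i = 0; i < max; i++)
    {
      data[i] = (unsigned char) (val & 0xff);
      val >>= 1;
    }
}

/* Closed form: at iteration i, val == 0xcdef >> i.  The initial value has
   16 significant bits, so val is 0 from iteration 16 onwards; the guard
   also keeps the shift count in range.  */
void shift_iv_closed_form (unsigned char *data, int max)
{
  const unsigned int init = 0xcdef;
  for (int i = 0; i < max; i++)
    data[i] = (unsigned char) (i < 16 ? (init >> i) & 0xff : 0);
}
```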


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations


[Bug tree-optimization/67681] Missed vectorization: induction variable used after loop

2015-09-23 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67681

--- Comment #2 from alalaw01 at gcc dot gnu.org ---
Being stupid here, but why does the outer loop having multiple exits matter -
it's the inner loop that should be vectorized?

FOO was a macro used to selectively make the test i>max disappear (enabling
vectorization) - the two commandlines had -DFOO=0 (vectorizes) and -DFOO=1
(doesn't).


[Bug tree-optimization/57558] Loop not vectorized if iteration count could be infinite

2015-09-25 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57558

--- Comment #4 from alalaw01 at gcc dot gnu.org ---
Here's another example, extracted from another benchmark - it vectorizes if
INDEX is defined to 'long' but not if INDEX is 'short':

#include <stdlib.h>

unsigned char *t_run_test(unsigned char *in, int N)
{
  unsigned char *out = malloc (N);

  for (unsigned INDEX i = 1; i < (N - 1); i++)
out[i] = ((3 * in[i]) - in[i - 1] - in[i + 1]);

  return out;
}

However, the -Wunsafe-loop-optimizations doesn't give us anything useful here:

(successful case, warning printed)
$ aarch64-none-elf-gcc -O3 bmark2.c -DINDEX=long -S -Wunsafe-loop-optimizations
-fdump-tree-vect-details=stdout | grep vectorized
bmark2.c:7:3: note: === vect_mark_stmts_to_be_vectorized ===
bmark2.c:7:3: note: loop vectorized
bmark2.c:3:16: note: vectorized 1 loops in function.
bmark2.c: In function 't_run_test':
bmark2.c:3:16: warning: cannot optimize loop, the loop counter may overflow
[-Wunsafe-loop-optimizations]
 unsigned char *t_run_test(unsigned char *in, int N)

(unsuccessful case, no warning)
$ aarch64-none-elf-gcc -O3 bmark2.c -DINDEX=short -S
-Wunsafe-loop-optimizations -fdump-tree-vect-details=stdout | grep vectorized
bmark2.c:7:3: note: not vectorized: number of iterations cannot be computed.
bmark2.c:3:16: note: vectorized 0 loops in function.
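
With an unsigned short index, the loop never exits once N - 1 exceeds
USHRT_MAX, because i wraps to 0 before reaching the bound. The fallback idea
from comment #3 can be sketched as loop versioning on that condition. The
function names and the caller-provided output buffer are assumptions made to
keep the example small.

```c
#include <limits.h>

/* The loop from the testcase with INDEX == short.  */
void filter_short_index (const unsigned char *in, unsigned char *out, int N)
{
  for (unsigned short i = 1; i < (N - 1); i++)
    out[i] = (unsigned char) ((3 * in[i]) - in[i - 1] - in[i + 1]);
}

/* Versioned form: check once, up front, whether the short index can reach
   the bound without wrapping.  If so, a wide index gives the same result
   and a computable iteration count.  */
void filter_versioned (const unsigned char *in, unsigned char *out, int N)
{
  if ((long long) N - 1 <= (long long) USHRT_MAX)
    {
      for (long long i = 1; i < (long long) N - 1; i++)
        out[i] = (unsigned char) ((3 * in[i]) - in[i - 1] - in[i + 1]);
    }
  else
    /* The index would wrap here, so the original loop does not terminate
       (and reads in[-1] after wrapping); keep the original scalar code.  */
    filter_short_index (in, out, N);
}
```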


[Bug tree-optimization/67683] Missed vectorization: shifts of an induction variable

2015-10-06 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67683

alalaw01 at gcc dot gnu.org changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35226

--- Comment #2 from alalaw01 at gcc dot gnu.org ---
Is there a way to do this kind of thing other than extending polynomial_chrecs
to understand operations other than addition? Whilst beneficial, that looks to
be quite a large task.


[Bug middle-end/68112] [6 Regression] FAIL: gcc.target/i386/avx512ifma-vpmaddhuq-2.c (test for excess errors)

2015-10-28 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68112

--- Comment #2 from alalaw01 at gcc dot gnu.org ---
So (a << CONSTANT) is not equivalent to a * (1<

[Bug middle-end/68112] [6 Regression] FAIL: gcc.target/i386/avx512ifma-vpmaddhuq-2.c (test for excess errors)

2015-10-29 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68112

--- Comment #4 from alalaw01 at gcc dot gnu.org ---
Sure, but gcc exploits undefinedness of multiply, so rewriting shift to
multiply is not equivalent in the general case :(.

One way forward might be to make definedness of overflow a bit finer-grained
(either on types, i.e. TYPE_OVERFLOW_DEFINED, or maybe as a property of
chrecs?)
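
The distinction can be illustrated in C using unsigned arithmetic, where
multiplication wraps and so matches a left shift for every in-range shift
count. This is a sketch of the arithmetic only, with hypothetical function
names; it says nothing about how chrecs represent it.

```c
/* Left shift; c must be smaller than the bit width of unsigned int.  */
unsigned int shl_direct (unsigned int a, unsigned int c)
{
  return a << c;
}

/* The same value as a multiply.  Because the operands are unsigned, the
   product wraps modulo 2^width instead of overflowing, so no undefinedness
   is introduced by the rewrite.  */
unsigned int shl_as_unsigned_mul (unsigned int a, unsigned int c)
{
  return a * (1u << c);
}
```

Doing the same rewrite with a signed multiply would introduce an operation
whose overflow the compiler is allowed to assume never happens.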


[Bug tree-optimization/68165] New: Not constant-folding setting vector element

2015-10-30 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68165

Bug ID: 68165
   Summary: Not constant-folding setting vector element
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: alalaw01 at gcc dot gnu.org
  Target Milestone: ---

I believe these two C functions are equivalent:
  typedef float __attribute__((__vector_size__ (2 * sizeof(float)))) float32x2_t;

  float32x2_t
  test_cprop ()
  {
float32x2_t vec = {0.0, 0.0};
vec[0] = 3.14f;
vec[1] = 2.71f;
return vec * ((float32x2_t) { 1.5f, 4.5f });
  }

  float32x2_t
  test_cprop2 ()
  {
  float32x2_t vec = {3.14f, 2.71f};
  return vec * ((float32x2_t) { 1.5f, 4.5f });
  }

at -O3 -fdump-tree-optimized, on AArch64:
=
;; Function test_cprop (test_cprop, funcdef_no=0, decl_uid=2603, cgraph_uid=0,
symbol_order=0)

test_cprop ()
{
  float32x2_t vec;
  vector(2) float vec.0_5;
  float32x2_t _6;

  :
  vec = { 0.0, 0.0 };
  BIT_FIELD_REF  = 3.141049041748046875e+0;
  BIT_FIELD_REF  = 2.7103814697265625e+0;
  vec.0_5 = vec;
  _6 = vec.0_5 * { 1.5e+0, 4.5e+0 };
  vec ={v} {CLOBBER};
  return _6;

}



;; Function test_cprop2 (test_cprop2, funcdef_no=1, decl_uid=2607,
cgraph_uid=1, symbol_order=1)

test_cprop2 ()
{
  :
  return { 4.7103814697265625e+0, 1.219499969482421875e+1 };

}
=
x86 is identical for test_cprop2, worse in test_cprop:
=
test_cprop ()
{
  float32x2_t vec;
  vector(2) float vec.0_5;
  float32x2_t _6;
  float _8;
  float _9;
  float _10;
  float _11;

  :
  vec = { 0.0, 0.0 };
  BIT_FIELD_REF  = 3.141049041748046875e+0;
  BIT_FIELD_REF  = 2.7103814697265625e+0;
  vec.0_5 = vec;
  _8 = BIT_FIELD_REF ;
  _9 = _8 * 1.5e+0;
  _10 = BIT_FIELD_REF ;
  _11 = _10 * 4.5e+0;
  _6 = {_9, _11};
  vec ={v} {CLOBBER};
  return _6;

}
=
i.e. we are not understanding the result of assigning to the BIT_FIELD_REF on
the whole vector, although we can resolve individual elements:
  float32x2_t
  test_cprop3 ()
  {
float32x2_t vec = {0.0, 0.0};
vec[0] = 3.14f;
vec[1] = 2.71f;
return (float32x2_t) {vec[0], vec[1]} * ((float32x2_t) { 1.5f, 4.5f });
  }

produces
=
test_cprop3 ()
{
  :
  return { 4.7103814697265625e+0, 1.219499969482421875e+1 };

}


[Bug tree-optimization/68165] Not constant-folding setting vector element

2015-11-02 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68165

alalaw01 at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #3 from alalaw01 at gcc dot gnu.org ---
Seems like a duplicate of 56118 to me.

*** This bug has been marked as a duplicate of bug 56118 ***

[Bug tree-optimization/56118] Piecewise vector / complex initialization from constants not combined

2015-11-02 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56118

alalaw01 at gcc dot gnu.org changed:

   What|Removed |Added

 CC||alalaw01 at gcc dot gnu.org

--- Comment #5 from alalaw01 at gcc dot gnu.org ---
*** Bug 68165 has been marked as a duplicate of this bug. ***

[Bug rtl-optimization/68182] New: ICE in reorder_basic_blocks_simple building libitm/beginend.cc

2015-11-02 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68182

Bug ID: 68182
   Summary: ICE in reorder_basic_blocks_simple building
libitm/beginend.cc
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: alalaw01 at gcc dot gnu.org
  Target Milestone: ---
  Host: x86_64
Target: x86_64

Preprocessed source attached; command-line

$ /work/alalaw01/build/./gcc/xg++ -B/work/alalaw01/build/./gcc/ -mrtm -O1 -g
-m32 -c temp.ii
/work/alalaw01/src/gcc/libitm/beginend.cc: In static member function ‘static
uint32_t GTM::gtm_thread::begin_transaction(uint32_t, const gtm_jmpbuf*)’:
/work/alalaw01/src/gcc/libitm/beginend.cc:400:1: internal compiler error: in
operator[], at vec.h:714
 }
 ^
0x1310783 vec::operator[](unsigned int)
/work/alalaw01/src/gcc/gcc/vec.h:714
0x1310783 reorder_basic_blocks_simple
/work/alalaw01/src/gcc/gcc/bb-reorder.c:2322
0x1310783 reorder_basic_blocks
/work/alalaw01/src/gcc/gcc/bb-reorder.c:2450
0x1310783 execute
/work/alalaw01/src/gcc/gcc/bb-reorder.c:2551

[Bug rtl-optimization/68182] ICE in reorder_basic_blocks_simple building libitm/beginend.cc

2015-11-02 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68182

--- Comment #1 from alalaw01 at gcc dot gnu.org ---
Created attachment 36636
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36636&action=edit
Preprocessed source (compressed)

[Bug tree-optimization/65963] Missed vectorization of loads strided with << when equivalent * succeeds

2015-11-05 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65963

--- Comment #2 from alalaw01 at gcc dot gnu.org ---
Author: alalaw01
Date: Thu Nov  5 18:39:38 2015
New Revision: 229825

URL: https://gcc.gnu.org/viewcvs?rev=229825&root=gcc&view=rev
Log:
[PATCH] tree-scalar-evolution.c: Handle LSHIFT by constant

gcc/:

PR tree-optimization/65963
* tree-scalar-evolution.c (interpret_rhs_expr): Try to handle
LSHIFT_EXPRs as equivalent unsigned MULT_EXPRs.

gcc/testsuite/:

* gcc.dg/pr68112.c: New.
* gcc.dg/vect/vect-strided-shift-1.c: New.

Added:
trunk/gcc/testsuite/gcc.dg/pr68112.c
trunk/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-scalar-evolution.c

[Bug tree-optimization/65963] Missed vectorization of loads strided with << when equivalent * succeeds

2015-11-06 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65963

--- Comment #4 from alalaw01 at gcc dot gnu.org ---
I confirm the testcase fails execution on armeb-none-eabi (also at -O0), but it
does so both with and without the patch to tree-scalar-evolution.c, which did
not change codegen (at -O2 -ftree-vectorize; the loop was not vectorized). So
this looks to be exposing a different, pre-existing, bug.

[Bug c/68385] New: ICE building libstdc++ on arm-none-eabi

2015-11-17 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68385

Bug ID: 68385
   Summary: ICE building libstdc++ on arm-none-eabi
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: alalaw01 at gcc dot gnu.org
  Target Milestone: ---
Target: arm-none-eabi

Created attachment 36738
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36738&action=edit
Reduced testcase

Starting with r230365, building gcc for arm-none-eabi falls over in libstdc++
with:

/work/alalaw01/build-arm-none-eabi/obj/gcc2/./gcc/xgcc -shared-libgcc
-B/work/alalaw01/build-arm-none-eabi/obj/gcc2/./gcc -nostdinc++
-L/work/alalaw01/build-arm-none-eabi/obj/gcc2/arm-none-eabi/libstdc++-v3/src
-L/work/alalaw01/build-arm-none-eabi/obj/gcc2/arm-none-eabi/libstdc++-v3/src/.libs
-L/work/alalaw01/build-arm-none-eabi/obj/gcc2/arm-none-eabi/libstdc++-v3/libsupc++/.libs
-B/work/alalaw01/build-arm-none-eabi/install/arm-none-eabi/bin/
-B/work/alalaw01/build-arm-none-eabi/install/arm-none-eabi/lib/ -isystem
/work/alalaw01/build-arm-none-eabi/install/arm-none-eabi/include -isystem
/work/alalaw01/build-arm-none-eabi/install/arm-none-eabi/sys-include
-I/work/alalaw01/src/gcc/libstdc++-v3/../libgcc
-I/work/alalaw01/build-arm-none-eabi/obj/gcc2/arm-none-eabi/libstdc++-v3/include/arm-none-eabi
-I/work/alalaw01/build-arm-none-eabi/obj/gcc2/arm-none-eabi/libstdc++-v3/include
-I/work/alalaw01/src/gcc/libstdc++-v3/libsupc++ -fno-implicit-templates -Wall
-Wextra -Wwrite-strings -Wcast-qual -Wabi -fdiagnostics-show-location=once
-ffunction-sections -fdata-sections -frandom-seed=eh_personality.lo -O2 -g -c
/work/alalaw01/src/gcc/libstdc++-v3/libsupc++/eh_personality.cc -o
eh_personality.o
/work/alalaw01/src/gcc/libstdc++-v3/libsupc++/eh_personality.cc: In function
'_Unwind_Reason_Code __cxxabiv1::__gxx_personality_v0(_Unwind_State,
_Unwind_Control_Block*, _Unwind_Context*)':
/work/alalaw01/src/gcc/libstdc++-v3/libsupc++/eh_personality.cc:394:26:
internal compiler error: tree check: expected integer_cst, have nop_expr in
decompose, at tree.h:5123
  UNWIND_STACK_REG))
  ^

0xf8d589 tree_check_failed(tree_node const*, char const*, int, char const*,
...)
/work/alalaw01/src/gcc/gcc/tree.c:9587
0x10df3fd tree_check
/work/alalaw01/src/gcc/gcc/tree.h:3212
0x10df3fd wi::int_traits::decompose(long*, unsigned int,
tree_node const*)
/work/alalaw01/src/gcc/gcc/tree.h:5123
0x10df3fd wide_int_ref_storage
/work/alalaw01/src/gcc/gcc/wide-int.h:936
0x10df3fd generic_wide_int
/work/alalaw01/src/gcc/gcc/wide-int.h:714
0x10df3fd generic_simplify_172
/work/alalaw01/build-arm-none-eabi/obj/gcc2/gcc/generic-match.c:6142
0x1113507 generic_simplify_EQ_EXPR
/work/alalaw01/build-arm-none-eabi/obj/gcc2/gcc/generic-match.c:22841
0x111d719 generic_simplify(unsigned int, tree_code, tree_node*, tree_node*,
tree_node*)
/work/alalaw01/build-arm-none-eabi/obj/gcc2/gcc/generic-match.c:25312
0xa182c8 fold_binary_loc(unsigned int, tree_code, tree_node*, tree_node*,
tree_node*)
/work/alalaw01/src/gcc/gcc/fold-const.c:9138
0xa227b2 fold_build2_stat_loc(unsigned int, tree_code, tree_node*, tree_node*,
tree_node*)
/work/alalaw01/src/gcc/gcc/fold-const.c:12333
0x10e00cd generic_simplify_46
/work/alalaw01/build-arm-none-eabi/obj/gcc2/gcc/generic-match.c:2014
0x1112b27 generic_simplify_EQ_EXPR
/work/alalaw01/build-arm-none-eabi/obj/gcc2/gcc/generic-match.c:22441
0x111d719 generic_simplify(unsigned int, tree_code, tree_node*, tree_node*,
tree_node*)
/work/alalaw01/build-arm-none-eabi/obj/gcc2/gcc/generic-match.c:25312
0xa182c8 fold_binary_loc(unsigned int, tree_code, tree_node*, tree_node*,
tree_node*)
/work/alalaw01/src/gcc/gcc/fold-const.c:9138
0xa3ec75 fold(tree_node*)
/work/alalaw01/src/gcc/gcc/fold-const.c:11973
0x5bdff3 build_new_op_1
/work/alalaw01/src/gcc/gcc/cp/call.c:5730
0x5be299 build_new_op(unsigned int, tree_code, int, tree_node*, tree_node*,
tree_node*, tree_node**, int)
/work/alalaw01/src/gcc/gcc/cp/call.c:5803
0x70f42f build_x_binary_op(unsigned int, tree_code, tree_node*, tree_code,
tree_node*, tree_code, tree_node**, int)
/work/alalaw01/src/gcc/gcc/cp/typeck.c:3828
0x6e3b39 cp_parser_binary_expression
/work/alalaw01/src/gcc/gcc/cp/parser.c:8621
0x6e3cdc cp_parser_assignment_expression
/work/alalaw01/src/gcc/gcc/cp/parser.c:8742
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.

Reduced testcase attached:

$ arm-none-eabi-gcc -c reduced.cc
reduced.cc: In function 'bool __gxx_personality_v0(_Unwind_State,
_Unwind_Control_Block*, _Unwind_Context*)':
re

[Bug tree-optimization/68549] [6 Regression] ICE: in verify_loop_structure, at cfgloop.c:1669

2015-11-26 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68549

alalaw01 at gcc dot gnu.org changed:

   What|Removed |Added

 CC||alalaw01 at gcc dot gnu.org

--- Comment #8 from alalaw01 at gcc dot gnu.org ---
Here's another testcase, reduced from value.c in gdb - ICEs at -O2 on (at
least) x86_64 and AArch64:

typedef long unsigned int size_t;
extern void *xmalloc (size_t) __attribute__ ((__malloc__)) __attribute__
((__returns_nonnull__));
struct __jmp_buf_tag
  {
  };
extern int __sigsetjmp (struct __jmp_buf_tag __env[1], int __savemask)
__attribute__ ((__nothrow__));
typedef struct __jmp_buf_tag sigjmp_buf[1];
extern sigjmp_buf *exceptions_state_mc_init (void);
extern int exceptions_state_mc_action_iter (void);
extern void printf_unfiltered (const char *, ...)
;
extern struct gdbarch *get_current_arch (void);
struct internalvar
{
  struct internalvar *next;
};
static struct internalvar *internalvars;
struct internalvar *
create_internalvar (const char *name)
{
  struct internalvar *var = ((struct internalvar *) xmalloc (sizeof (struct
internalvar)));
  internalvars = var;
}
void
show_convenience ()
{
  struct gdbarch *gdbarch = get_current_arch ();
  int varseen = 0;
  for (struct internalvar *var = internalvars; var; var = var->next)
{
  if (!varseen)
varseen = 1;
  sigjmp_buf *buf = exceptions_state_mc_init ();
  __sigsetjmp ( (*buf), 1);
 while (exceptions_state_mc_action_iter ())
   while (exceptions_state_mc_action_iter ())
;
}
  if (!varseen)
  printf_unfiltered ( "" );
}

[Bug target/63870] [Aarch64] [ARM] Errors in use of NEON intrinsics are reported incorrectly

2015-01-15 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63870

--- Comment #7 from alalaw01 at gcc dot gnu.org ---
I'm doing some of the ARM work atm, but not sure how far I'll get before stage
4 starts.


[Bug target/64893] [5 Regression] ICE while doing a bootstrap with the latest compiler

2015-02-02 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64893

alalaw01 at gcc dot gnu.org changed:

   What|Removed |Added

 CC||alalaw01 at gcc dot gnu.org

--- Comment #7 from alalaw01 at gcc dot gnu.org ---
This feels like we are working around a deficiency in the C++ frontend, which
is a shame, but if we have to, then seems to me like an OK way to do so.


[Bug target/64997] New: [AArch64] Illegal EON on SIMD registers

2015-02-10 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64997

Bug ID: 64997
   Summary: [AArch64] Illegal EON on SIMD registers
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: alalaw01 at gcc dot gnu.org

Testcase:

#include <arm_neon.h>
#define force_simd(V1) asm volatile ("mov %d0, %1.d[0]" \
: "=w"(V1) \
: "w"(V1)  \
: /* No clobbers */)

int foo(int64x1_t val4, int64x1_t val6, int64x1_t val7)
{
  int64x1_t val5 = vbic_s64 (val4,
 veor_s64 (val6,
   vsri_n_s64 (val6, val7, 13)));
  force_simd (val5);
  return vget_lane_s64 (val5, 0) == 0 ? 1 : 0;
}

generates an illegal assembly instruction (eon v1, v3, v1 -- EON works only on
General-Purpose Registers) at -O1 and higher.
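
One way to see where an EON can come from in this testcase is a scalar model of
the bitwise operations involved. The helper names below are hypothetical
stand-ins for the instruction semantics, t stands for the vsri_n_s64 result,
and the sketch does not claim to show which pass forms the EON.

```c
#include <stdint.h>

/* Scalar models of the bitwise operations.  */
static uint64_t bic (uint64_t a, uint64_t b) { return a & ~b; } /* a AND NOT b */
static uint64_t eor (uint64_t a, uint64_t b) { return a ^ b; }  /* a XOR b */
static uint64_t eon (uint64_t a, uint64_t b) { return a ^ ~b; } /* a XOR NOT b */

/* The expression as written in the testcase: vbic (val4, veor (val6, t)).  */
uint64_t expr_as_written (uint64_t val4, uint64_t val6, uint64_t t)
{
  return bic (val4, eor (val6, t));
}

/* The same value with the NOT folded into the XOR, which is an EON
   followed by a plain AND.  */
uint64_t expr_with_eon (uint64_t val4, uint64_t val6, uint64_t t)
{
  return val4 & eon (val6, t);
}
```

The two forms are bit-for-bit identical, so the rewrite is only a problem when
the operands live in SIMD registers, where no EON instruction exists.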


[Bug target/64997] [AArch64] Illegal EON on SIMD registers

2015-02-10 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64997

alalaw01 at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2015-02-10
   Assignee|unassigned at gcc dot gnu.org  |alalaw01 at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from alalaw01 at gcc dot gnu.org ---
This results from the split condition of the xor_one_cmpl pattern using the
'which_alternative' variable, which is not defined during the split phase.


[Bug target/64997] [AArch64] Illegal EON on SIMD registers

2015-02-25 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64997

--- Comment #2 from alalaw01 at gcc dot gnu.org ---
Author: alalaw01
Date: Wed Feb 25 14:20:13 2015
New Revision: 220969

URL: https://gcc.gnu.org/viewcvs?rev=220969&root=gcc&view=rev
Log:
[AArch64] Fix illegal assembly 'eon v1, v2, v3'

PR target/64997
* config/aarch64/aarch64.md (*xor_one_cmpl3): Use FP_REGNUM_P
as split condition; force split via '#' in output pattern.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/aarch64/aarch64.md


[Bug target/64997] [AArch64] Illegal EON on SIMD registers

2015-02-26 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64997

alalaw01 at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from alalaw01 at gcc dot gnu.org ---
Fixed in r220969.


[Bug tree-optimization/61114] Scalar evolution hides a big-endian const-folding bug.

2014-10-27 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114

--- Comment #9 from alalaw01 at gcc dot gnu.org ---
Author: alalaw01
Date: Mon Oct 27 14:04:43 2014
New Revision: 216736

URL: https://gcc.gnu.org/viewcvs?rev=216736&root=gcc&view=rev
Log:
[Vectorizer] Make REDUC_xxx_EXPR tree codes produce a scalar result

PR tree-optimization/61114
* expr.c (expand_expr_real_2): For REDUC_{MIN,MAX,PLUS}_EXPR, add
extract_bit_field around optab result.

* fold-const.c (fold_unary_loc): For REDUC_{MIN,MAX,PLUS}_EXPR, produce
scalar not vector.

* tree-cfg.c (verify_gimple_assign_unary): Check result vs operand type
for REDUC_{MIN,MAX,PLUS}_EXPR.

* tree-vect-loop.c (vect_analyze_loop): Update comment.
(vect_create_epilog_for_reduction): For direct vector reduction, use
result of tree code directly without extract_bit_field.

* tree.def (REDUC_MAX_EXPR, REDUC_MIN_EXPR, REDUC_PLUS_EXPR): Update
comment.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/expr.c
trunk/gcc/fold-const.c
trunk/gcc/tree-cfg.c
trunk/gcc/tree-vect-loop.c
trunk/gcc/tree.def


[Bug tree-optimization/61114] Scalar evolution hides a big-endian const-folding bug.

2014-10-27 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114

--- Comment #10 from alalaw01 at gcc dot gnu.org ---
Author: alalaw01
Date: Mon Oct 27 14:20:52 2014
New Revision: 216737

URL: https://gcc.gnu.org/viewcvs?rev=216737&root=gcc&view=rev
Log:
Add new optabs for reducing vectors to scalars

PR tree-optimization/61114
* doc/md.texi (Standard Names): Add reduc_(plus,[us](min|max))|scal
optabs, and note in reduc_[us](plus|min|max) to prefer the former.

* expr.c (expand_expr_real_2): Use reduc_..._scal if available, fall
back to old reduc_... + BIT_FIELD_REF only if not.

* optabs.c (optab_for_tree_code): for REDUC_(MAX,MIN,PLUS)_EXPR,
return the reduce-to-scalar (reduc_..._scal) optab.
(scalar_reduc_to_vector): New.

* optabs.def (reduc_smax_scal_optab, reduc_smin_scal_optab,
reduc_plus_scal_optab, reduc_umax_scal_optab, reduc_umin_scal_optab):
New.

* optabs.h (scalar_reduc_to_vector): Declare.

* tree-vect-loop.c (vectorizable_reduction): Look for optabs reducing
to either scalar or vector.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/doc/md.texi
trunk/gcc/expr.c
trunk/gcc/optabs.c
trunk/gcc/optabs.def
trunk/gcc/optabs.h
trunk/gcc/tree-vect-loop.c


[Bug target/59843] ICE with return of generic vector on aarch64

2014-11-19 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59843

alalaw01 at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #13 from alalaw01 at gcc dot gnu.org ---
Fixed on trunk in r211502 and backported to 4.9.


[Bug target/63950] New: [AArch64] ICE at -O0 on vld1_lane intrinsics

2014-11-19 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63950

Bug ID: 63950
   Summary: [AArch64] ICE at -O0 on vld1_lane intrinsics
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: alalaw01 at gcc dot gnu.org

Testcase

#include <arm_neon.h>

int8x8_t
f_vld1_lane (int8_t * p, int8x8_t v)
{
  int8x8_t res;
  res = vld1_lane_s8 (p, v, 1);
  return res;
}

$ aarch64-none-elf-gcc -S test.c

In file included from neon_const_range_tests/vld1.c:2:0:
/work/alalaw01/sbuild/install/lib/gcc/aarch64-none-elf/5.0.0/include/arm_neon.h:
In function 'f_vld1_lane':
/work/alalaw01/sbuild/install/lib/gcc/aarch64-none-elf/5.0.0/include/arm_neon.h:658:10:
internal compiler error: in aarch64_simd_lane_bounds, at
config/aarch64/aarch64.c:8394
   return __aarch64_vset_lane_any (__vec, __index, __elem, 8);
  ^
0xf9419d aarch64_simd_lane_bounds(rtx_def*, long, long)
/work/alalaw01/svn/gcc/gcc/config/aarch64/aarch64.c:8394
0xfffcf4 gen_aarch64_im_lane_boundsi(rtx_def*, rtx_def*)
/work/alalaw01/svn/gcc/gcc/config/aarch64/aarch64-simd.md:4524
0x7cc10e insn_gen_fn::operator()(rtx_def*, rtx_def*) const
/work/alalaw01/svn/gcc/gcc/recog.h:303
0xf9a366 aarch64_simd_expand_args
/work/alalaw01/svn/gcc/gcc/config/aarch64/aarch64-builtins.c:970
0xf9a703 aarch64_simd_expand_builtin(int, tree_node*, rtx_def*)
/work/alalaw01/svn/gcc/gcc/config/aarch64/aarch64-builtins.c:1051
0xf9ac21 aarch64_expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode,
int)
/work/alalaw01/svn/gcc/gcc/config/aarch64/aarch64-builtins.c:1133

.

Seems to be caused by lack of constant propagation at -O0; the testcase
compiles fine at -O1 and higher.
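
A rough illustration of why constant propagation matters here, in portable C rather than the real arm_neon.h code. The macro and function names are hypothetical; __builtin_constant_p is the GCC/Clang builtin. The point is only that a literal pasted in by a macro is visibly constant at any optimization level, whereas the same literal passed through a function parameter generally is not at -O0.

```c
#include <assert.h>

/* Hypothetical stand-in for a lane check done in a macro: the caller's
   literal is substituted textually, so the check sees a constant even
   at -O0.  */
#define LANE_IS_CONSTANT(idx) __builtin_constant_p (idx)

/* The same check behind a function call.  Without inlining and constant
   propagation, idx is just a parameter, so the result depends on the
   optimization level.  */
int
lane_is_constant_fn (int idx)
{
  return __builtin_constant_p (idx);
}
```

LANE_IS_CONSTANT (1) evaluates to 1 regardless of -O level; what lane_is_constant_fn (1) returns depends on how much the compiler has optimized.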


[Bug target/63870] [Aarch64] [ARM] Errors in use of NEON intrinsics are reported incorrectly

2014-12-09 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63870

--- Comment #3 from alalaw01 at gcc dot gnu.org ---
Author: alalaw01
Date: Tue Dec  9 20:23:36 2014
New Revision: 218536

URL: https://gcc.gnu.org/viewcvs?rev=218536&root=gcc&view=rev
Log:
[AArch64]Remove be_checked_get_lane, check bounds with
__builtin_aarch64_im_lane_boundsi.

gcc/:

PR target/63870
* config/aarch64/aarch64-simd-builtins.def (be_checked_get_lane):
Delete.
* config/aarch64/aarch64-simd.md (aarch64_be_checked_get_lane):
Delete.
* config/aarch64/arm_neon.h (aarch64_vget_lane_any): Use GCC
vector extensions, __aarch64_lane, __builtin_aarch64_im_lane_boundsi.
(__aarch64_vget_lane_f32, __aarch64_vget_lane_f64,
__aarch64_vget_lane_p8, __aarch64_vget_lane_p16,
__aarch64_vget_lane_s8, __aarch64_vget_lane_s16,
__aarch64_vget_lane_s32, __aarch64_vget_lane_s64,
__aarch64_vget_lane_u8, __aarch64_vget_lane_u16,
__aarch64_vget_lane_u32, __aarch64_vget_lane_u64,
__aarch64_vgetq_lane_f32, __aarch64_vgetq_lane_f64,
__aarch64_vgetq_lane_p8, __aarch64_vgetq_lane_p16,
__aarch64_vgetq_lane_s8, __aarch64_vgetq_lane_s16,
__aarch64_vgetq_lane_s32, __aarch64_vgetq_lane_s64,
__aarch64_vgetq_lane_u8, __aarch64_vgetq_lane_u16,
__aarch64_vgetq_lane_u32, __aarch64_vgetq_lane_u64): Delete.
(__aarch64_vdup_lane_any): Use __aarch64_vget_lane_any, remove
'q2' argument.
(__aarch64_vdup_lane_f32, __aarch64_vdup_lane_f64,
__aarch64_vdup_lane_p8, __aarch64_vdup_lane_p16,
__aarch64_vdup_lane_s8, __aarch64_vdup_lane_s16,
__aarch64_vdup_lane_s32, __aarch64_vdup_lane_s64,
__aarch64_vdup_lane_u8, __aarch64_vdup_lane_u16,
__aarch64_vdup_lane_u32, __aarch64_vdup_lane_u64,
__aarch64_vdup_laneq_f32, __aarch64_vdup_laneq_f64,
__aarch64_vdup_laneq_p8, __aarch64_vdup_laneq_p16,
__aarch64_vdup_laneq_s8, __aarch64_vdup_laneq_s16,
__aarch64_vdup_laneq_s32, __aarch64_vdup_laneq_s64,
__aarch64_vdup_laneq_u8, __aarch64_vdup_laneq_u16,
__aarch64_vdup_laneq_u32, __aarch64_vdup_laneq_u64): Remove argument
to __aarch64_vdup_lane_any.
(vget_lane_f32, vget_lane_f64, vget_lane_p8, vget_lane_p16,
vget_lane_s8, vget_lane_s16, vget_lane_s32, vget_lane_s64,
vget_lane_u8, vget_lane_u16, vget_lane_u32, vget_lane_u64,
vgetq_lane_f32, vgetq_lane_f64, vgetq_lane_p8, vgetq_lane_p16,
vgetq_lane_s8, vgetq_lane_s16, vgetq_lane_s32, vgetq_lane_s64,
vgetq_lane_u8, vgetq_lane_u16, vgetq_lane_u32, vgetq_lane_u64,
vdupb_lane_p8, vdupb_lane_s8, vdupb_lane_u8, vduph_lane_p16,
vduph_lane_s16, vduph_lane_u16, vdups_lane_f32, vdups_lane_s32,
vdups_lane_u32, vdupb_laneq_p8, vdupb_laneq_s8, vdupb_laneq_u8,
vduph_laneq_p16, vduph_laneq_s16, vduph_laneq_u16, vdups_laneq_f32,
vdups_laneq_s32, vdups_laneq_u32, vdupd_laneq_f64, vdupd_laneq_s64,
vdupd_laneq_u64, vfmas_lane_f32, vfma_laneq_f64, vfmad_laneq_f64,
vfmas_laneq_f32, vfmss_lane_f32, vfms_laneq_f64, vfmsd_laneq_f64,
vfmss_laneq_f32, vmla_lane_f32, vmla_lane_s16, vmla_lane_s32,
vmla_lane_u16, vmla_lane_u32, vmla_laneq_f32, vmla_laneq_s16,
vmla_laneq_s32, vmla_laneq_u16, vmla_laneq_u32, vmlaq_lane_f32,
vmlaq_lane_s16, vmlaq_lane_s32, vmlaq_lane_u16, vmlaq_lane_u32,
vmlaq_laneq_f32, vmlaq_laneq_s16, vmlaq_laneq_s32, vmlaq_laneq_u16,
vmlaq_laneq_u32, vmls_lane_f32, vmls_lane_s16, vmls_lane_s32,
vmls_lane_u16, vmls_lane_u32, vmls_laneq_f32, vmls_laneq_s16,
vmls_laneq_s32, vmls_laneq_u16, vmls_laneq_u32, vmlsq_lane_f32,
vmlsq_lane_s16, vmlsq_lane_s32, vmlsq_lane_u16, vmlsq_lane_u32,
vmlsq_laneq_f32, vmlsq_laneq_s16, vmlsq_laneq_s32, vmlsq_laneq_u16,
vmlsq_laneq_u32, vmul_lane_f32, vmul_lane_s16, vmul_lane_s32,
vmul_lane_u16, vmul_lane_u32, vmuld_lane_f64, vmuld_laneq_f64,
vmuls_lane_f32, vmuls_laneq_f32, vmul_laneq_f32, vmul_laneq_f64,
vmul_laneq_s16, vmul_laneq_s32, vmul_laneq_u16, vmul_laneq_u32,
vmulq_lane_f32, vmulq_lane_s16, vmulq_lane_s32, vmulq_lane_u16,
vmulq_lane_u32, vmulq_laneq_f32, vmulq_laneq_f64, vmulq_laneq_s16,
vmulq_laneq_s32, vmulq_laneq_u16, vmulq_laneq_u32) : Use
__aarch64_vget_lane_any.

gcc/testsuite/:

* gcc.target/aarch64/simd/vget_lane_f32_indices_1.c: New test.
* gcc.target/aarch64/simd/vget_lane_f64_indices_1.c: Likewise.
* gcc.target/aarch64/simd/vget_lane_p16_indices_1.c: Likewise.
* gcc.target/aarch64/simd/vget_lane_p8_indices_1.c: Likewise.
* gcc.target/aarch64/simd/vget_lane_s16_indices_1.c: Likewise.
* gcc.target/aarch64/simd/vget_lane_s32_indices_1.c: Likewise.
* gcc.target/aarch64/simd/vget_lane_s64_indices_1.c: Likewise.
* gcc.target/aarch64/simd/vget_lane_s8_indices_1.c: Likewise.
* gcc.target/aarch64/simd/vget_lane_u16_indices_1.c: Likewise.
* gcc.target/aarch64/simd/vget_lane_u32_indices_1.c: Likewise.
* gcc.target/aarch64/simd/vget_lane_u64_indices_1.c: Likewise.
* gcc.target/aarch64/simd/vget_lane_u8_ind

[Bug target/63950] [AArch64] ICE at -O0 on vld1_lane intrinsics

2014-12-10 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63950

alalaw01 at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #1 from alalaw01 at gcc dot gnu.org ---
Author: alalaw01
Date: Tue Dec  9 19:37:18 2014
New Revision: 218531

URL: https://gcc.gnu.org/viewcvs?rev=218531&root=gcc&view=rev
Log:
[AArch64]Fix ICE at -O0 on vld1_lane intrinsics

gcc/:

* config/aarch64/arm_neon.h (__AARCH64_NUM_LANES, __aarch64_lane *2):
New.
(aarch64_vset_lane_any): Redefine using previous, same for BE + LE.
(vset_lane_f32, vset_lane_f64, vset_lane_p8, vset_lane_p16,
vset_lane_s8, vset_lane_s16, vset_lane_s32, vset_lane_s64,
vset_lane_u8, vset_lane_u16, vset_lane_u32, vset_lane_u64): Remove
number of lanes.
(vld1_lane_f32, vld1_lane_f64, vld1_lane_p8, vld1_lane_p16,
vld1_lane_s8, vld1_lane_s16, vld1_lane_s32, vld1_lane_s64,
vld1_lane_u8, vld1_lane_u16, vld1_lane_u32, vld1_lane_u64): Call
__aarch64_vset_lane_any rather than vset_lane_xxx.

gcc/testsuite/:

* gcc.target/aarch64/vld1_lane-o0.c: New test.


[Bug target/63870] [Aarch64] [ARM] Errors in use of NEON intrinsics are reported incorrectly

2014-12-10 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63870

alalaw01 at gcc dot gnu.org changed:

   What|Removed |Added

 CC||alalaw01 at gcc dot gnu.org

--- Comment #4 from alalaw01 at gcc dot gnu.org ---
(Apologies for out-of-orderness, I missed PRs from logs so adding by hand)

Author: alalaw01
Date: Tue Dec  9 19:52:22 2014
Revision: 218532

https://gcc.gnu.org/viewcvs?rev=218532&root=gcc&view=rev
Log:
[AArch64] Fix ICE on non-constant indices to __builtin_aarch64_im_lane_boundsi

gcc/:

* config/aarch64/aarch64-builtins.c (aarch64_types_binopv_qualifiers,
TYPES_BINOPV): Delete.
(enum aarch64_builtins): Add AARCH64_BUILTIN_SIMD_LANE_CHECK and
AARCH64_SIMD_PATTERN_START.
(aarch64_init_simd_builtins): Register
__builtin_aarch64_im_lane_boundsi; use AARCH64_SIMD_PATTERN_START.
(aarch64_simd_expand_builtin): Handle AARCH64_BUILTIN_LANE_CHECK; use
AARCH64_SIMD_PATTERN_START.

* config/aarch64/aarch64-simd.md (aarch64_im_lane_boundsi): Delete.
* config/aarch64/aarch64-simd-builtins.def (im_lane_bound): Delete.

* config/aarch64/arm_neon.h (__AARCH64_LANE_CHECK): New.
(__aarch64_vget_lane_f64, __aarch64_vget_lane_s64,
__aarch64_vget_lane_u64, __aarch64_vset_lane_any, vdupd_lane_f64,
vdupd_lane_s64, vdupd_lane_u64, vext_f32, vext_f64, vext_p8, vext_p16,
vext_s8, vext_s16, vext_s32, vext_s64, vext_u8, vext_u16, vext_u32,
vext_u64, vextq_f32, vextq_f64, vextq_p8, vextq_p16, vextq_s8,
vextq_s16, vextq_s32, vextq_s64, vextq_u8, vextq_u16, vextq_u32,
vextq_u64, vmulq_lane_f64): Use __AARCH64_LANE_CHECK.

gcc/testsuite/:

* gcc.target/aarch64/simd/vset_lane_s16_const_1.c: New test.


[Bug target/59843] ICE with return of generic vector on aarch64

2014-07-08 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59843

--- Comment #6 from alalaw01 at gcc dot gnu.org ---
Author: alalaw01
Date: Tue Jul  8 10:32:57 2014
New Revision: 212355

URL: https://gcc.gnu.org/viewcvs?rev=212355&root=gcc&view=rev
Log:
Backport r211502: PR target/59843 Fix arm_neon.h ZIP/UZP/TRN for bigendian

2014-06-10  Alan Lawrence  

gcc/:
* config/aarch64/aarch64-modes.def: Add V1DFmode.
* config/aarch64/aarch64.c (aarch64_vector_mode_supported_p):
Support V1DFmode.

gcc/testsuite/:
   * gcc.dg/vect/vect-singleton_1.c: New file.



Added:
branches/gcc-4_9-branch/gcc/testsuite/gcc.dg/vect/vect-singleton_1.c
Modified:
branches/gcc-4_9-branch/gcc/ChangeLog
branches/gcc-4_9-branch/gcc/config/aarch64/aarch64-modes.def
branches/gcc-4_9-branch/gcc/config/aarch64/aarch64.c
branches/gcc-4_9-branch/gcc/testsuite/ChangeLog


[Bug tree-optimization/68681] New: testcase gcc.dg/vect/pr45752.c fails on AArch64

2015-12-03 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68681

Bug ID: 68681
   Summary: testcase gcc.dg/vect/pr45752.c fails on AArch64
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: alalaw01 at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64

Created attachment 36900
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36900&action=edit
tree-vect-details dump

Since r231015 (https://gcc.gnu.org/ml/gcc-patches/2015-11/msg03371.html), on
AArch64 we have

FAIL: gcc.dg/vect/pr45752.c scan-tree-dump-times vect "gaps requires scalar
epilogue loop" 0
FAIL: gcc.dg/vect/pr45752.c -flto -ffat-lto-objects  scan-tree-dump-times vect
"gaps requires scalar epilogue loop" 0

I attach -fdump-tree-vect-details from the non-lto case (line 5379:
gcc/testsuite/gcc.dg/vect/pr45752.c:45:3: note: Data access with gaps requires
scalar epilogue loop)

[Bug tree-optimization/68707] New: testcase gcc.dg/vect/O3-pr36098.c vectorized using VEC_PERM_EXPR rather than VEC_LOAD_LANES

2015-12-04 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68707

Bug ID: 68707
   Summary: testcase gcc.dg/vect/O3-pr36098.c vectorized using
VEC_PERM_EXPR rather than VEC_LOAD_LANES
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: alalaw01 at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64, arm

Created attachment 36928
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36928&action=edit
tree-vect-details dump (before patch, with LOAD_LANES)

Prior to r230993, O3-pr36098.c (at -O3) was vectorized using LOAD_LANES /
STORE_LANES, resulting in:

.L5:
ld4 {v4.4s - v7.4s}, [x7], 64
add w4, w4, 1
cmp w3, w4
orr v1.16b, v4.16b, v4.16b
orr v2.16b, v5.16b, v5.16b
orr v3.16b, v6.16b, v6.16b
st3 {v1.4s - v3.4s}, [x6], 48
bhi .L5

Each iteration of the outer loop processes a struct of 4 ints, of which the
first 3 are copied to a destination. The ld4 nicely gets us four structs with
all the elements we want in three registers row-wise (and the elements we don't
want in a fourth):
struct1 struct2 struct3 struct4
v4.s[0] v4.s[1] v4.s[2] v4.s[3]
v5.s[0] v5.s[1] v5.s[2] v5.s[3]
v6.s[0] v6.s[1] v6.s[2] v6.s[3]
v7.s[0] v7.s[1] v7.s[2] v7.s[3]
and st3 stores the desired rows (only) to the right locations.
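
As a scalar model of the ld4/st3 scheme described above (a sketch only, not the O3-pr36098.c testcase; the function name and the flat int layout are assumptions made for illustration):

```c
#include <assert.h>

/* Model of one vector iteration: src holds four structs of 4 ints each,
   dst receives the first 3 ints of each struct.  */
void
copy_via_lanes (int *dst, const int *src)
{
  /* rows[r][s] is element r of struct s, i.e. lane s of register v(4+r)
     in the table above.  */
  int rows[4][4];

  /* ld4: de-interleave, so each row collects one field from all four
     structs.  */
  for (int s = 0; s < 4; s++)
    for (int r = 0; r < 4; r++)
      rows[r][s] = src[4 * s + r];

  /* st3: re-interleave rows 0..2 only; row 3 (the unwanted field) is
     never stored.  */
  for (int s = 0; s < 4; s++)
    for (int r = 0; r < 3; r++)
      dst[3 * s + r] = rows[r][s];
}
```

Sixteen ints are read and twelve are written per call, matching the 64-byte and 48-byte post-increments on the ld4 and st3.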

Following r230993, instead the loop gets unrolled four times, four vectors are
loaded sequentially, and then permuted by SLP:

.L5:
ldr q0, [x5, 16]
add x4, x4, 48
ldr q1, [x5, 32]
add w6, w6, 1
ldr q4, [x5, 48]
cmp w3, w6
ldr q2, [x5], 64
orr v3.16b, v0.16b, v0.16b
orr v5.16b, v4.16b, v4.16b
orr v4.16b, v1.16b, v1.16b
tbl v0.16b, {v0.16b - v1.16b}, v6.16b
tbl v2.16b, {v2.16b - v3.16b}, v7.16b
tbl v4.16b, {v4.16b - v5.16b}, v16.16b
str q0, [x4, -32]
str q2, [x4, -48]
str q4, [x4, -16]
bhi .L5

that is, we load

struct1 struct2 struct3 struct4
v2.s[0] v0.s[0] v1.s[0] v4.s[0]
v2.s[1] v0.s[1] v1.s[1] v4.s[1]
v2.s[2] v0.s[2] v1.s[2] v4.s[2]
v2.s[3] v0.s[3] v1.s[3] v4.s[3]

and then permute

struct1 struct2 struct3 struct4
v2.s[0] v2.s[3] v0.s[2] v4.s[1]
v2.s[1] v0.s[0] v0.s[3] v4.s[2]
v2.s[2] v0.s[1] v4.s[0] v4.s[3]

so we then have the data 'columnwise' and store each sequentially.
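
The same copy can be modelled the permute way. This is a sketch under assumptions: tbl2 and copy_via_permute are hypothetical names, and the selector values are counted in int lanes and derived from the data layout, not read out of the dump (the real tbl selectors are byte indices held in v6, v7 and v16).

```c
#include <assert.h>

/* Model of a two-register tbl: pick four lanes out of the eight lanes
   formed by lo followed by hi.  */
static void
tbl2 (int out[4], const int lo[4], const int hi[4], const int sel[4])
{
  for (int l = 0; l < 4; l++)
    out[l] = sel[l] < 4 ? lo[sel[l]] : hi[sel[l] - 4];
}

/* src holds four structs of 4 ints, loaded as four consecutive vectors;
   dst receives the first 3 ints of each struct as three vectors.  */
void
copy_via_permute (int dst[12], const int src[16])
{
  /* Output vector j draws on loaded vectors j and j+1.  */
  static const int sel[3][4] = {
    { 0, 1, 2, 4 },
    { 1, 2, 4, 5 },
    { 2, 4, 5, 6 },
  };

  for (int j = 0; j < 3; j++)
    tbl2 (dst + 4 * j, src + 4 * j, src + 4 * (j + 1), sel[j]);
}
```

Each output vector needs lanes from two adjacent input vectors, which is consistent with the three two-register tbl instructions in the loop above.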

[Bug tree-optimization/68707] testcase gcc.dg/vect/O3-pr36098.c vectorized using VEC_PERM_EXPR rather than VEC_LOAD_LANES

2015-12-04 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68707

--- Comment #1 from alalaw01 at gcc dot gnu.org ---
Created attachment 36929
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36929&action=edit
tree-vect-details dump (after patch, with SLP)
