Re: patch to fix PR65647

2015-04-05 Thread Yvan Roux
Hi,

The issue is also present in 4.9 branch as explained in:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65647

As 4.9 doesn't contains lra rematerialization passe, we only need to
stop updating lra_constraint_new_regno_start when inheritance is
switched off.

Bootstrapped and tested on x64_64 and cross built and tested on
AArch64, arm, armeb and i686. Ok for 4.9 ?

Cheers,
Yvan

2015-04-05  Yvan Roux  

Backport from trunk r221867
2015-04-04  Vladimir Makarov  

PR target/65647
* lra.c (lra): Stop updating lra_constraint_new_regno_start after
  switching off inheritance.

2015-04-05  Yvan Roux  

Backport from trunk r221867
2015-04-04  Vladimir Makarov  

PR target/65647
* gcc.target/arm/pr65647.c: New.
* gcc.target/arm/pr65647-2.c: New.

On 4 April 2015 at 16:39, Vladimir Makarov  wrote:
>   The following patch fixes
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65647
>
>   This very complicated problem occurred in rare cases when chain of reloads
> involving at least two pseudos and insns were generated and one pseudo was
> spilled on later sub-passes after spilling sub-pass and this pattern
> happened to be repeated.
>
>   The patch was bootstrapped and tested on x86/x86-64, ppc64, and aarch64.
>
>   Committed as rev.221867.
>
> 2015-04-04  Vladimir Makarov  
>
> PR target/65647
> * lra-int.h (LRA_MAX_REMATERIALIZATION_PASSES): New.  Add its
> value checking.
> (lra_rematerialization_iter): New.
> * lra.c (lra): Initialize lra_rematerialization_iter.
> Stop updating lra_constraint_new_regno_start after switching of
> inheritance and rematerialization.
> * lra-remat.c (lra_rematerialization_iter): New.
> (lra_remat): Add printing pass iteration.  Do rematerialization
> only first LRA_MAX_REMATERIALIZATION_PASSES iterations.
>
> 2015-04-04  Vladimir Makarov  
>
> PR target/65647
> * gcc.target/arm/pr65647.c: New.
>
>
diff --git a/gcc/lra.c b/gcc/lra.c
index 69b08dc..98f0444 100644
--- a/gcc/lra.c
+++ b/gcc/lra.c
@@ -2417,7 +2417,11 @@ lra (FILE *f)
   /* Assignment of stack slots changes elimination offsets for
 some eliminations.  So update the offsets here.  */
   lra_eliminate (false, false);
-  lra_constraint_new_regno_start = max_reg_num ();
+  /* After switching off inheritance passe, don't forget reload pseudos
+ after spilling sub-pass to avoid LRA cycling in some complicated
+cases.  */
+  if (lra_inheritance_iter <= LRA_MAX_INHERITANCE_PASSES)
+lra_constraint_new_regno_start = max_reg_num ();
   lra_constraint_new_insn_uid_start = get_max_uid ();
   lra_assignment_iter_after_spill = 0;
 }
diff --git a/gcc/testsuite/gcc.target/arm/pr65647-2.c 
b/gcc/testsuite/gcc.target/arm/pr65647-2.c
new file mode 100644
index 000..f2985f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr65647-2.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -marm -march=armv6 -std=c99" } */
+
+typedef struct {
+  int i;
+} x264_union32_t;
+typedef struct {
+  int level_idx;
+} trellis_node_t;
+int a, c, d, f, h, i = (int)&c;
+trellis_node_t b[1][1];
+short *e = 0;
+short g;
+void fn1() {
+  int k[64 * 8 * 2];
+  trellis_node_t *l = b[0];
+  for (; i >= d; i--) {
+if (e[i]) {
+  for (int j = 1; j < 8; j++) {
+((x264_union32_t *)&k[a])->i = l[j].level_idx;
+l[j].level_idx = a;
+a++;
+  }
+  continue;
+}
+for (int j;; j++)
+  ;
+  }
+  int m[6] __attribute__((aligned(16)));
+  for (; h; h++, f++)
+g = m[h];
+}
diff --git a/gcc/testsuite/gcc.target/arm/pr65647.c 
b/gcc/testsuite/gcc.target/arm/pr65647.c
new file mode 100644
index 000..686eb58
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr65647.c
@@ -0,0 +1,58 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv6-m -mthumb -O3 -w" } */
+
+a, b, c, e, g = &e, h, i = 7, l = 1, m, n, o, q = &m, r, s = &r, u, w = 9, x,
+  y = 6, z, t6 = 7, t8, t9 = 1, t11 = 5, t12 = &t8, t13 = 3, t15,
+  t16 = &t15;
+struct {
+  long long f3;
+char f4
+} p = {3}
+
+,
+  t = {4};
+
+struct S1 {
+  long long f0;
+  short f1;
+long long f2
+} d;
+long long f = 4073709551613, t7 = 8, t14 = 4073709551610;
+j[];
+k = j;
+v = &d;
+*t10 = j;
+struct S1 fn1();
+struct S1 fn2() {
+  signed char t1;
+  struct S1 t2;
+  long t3 = x;
+  short t4 = h;
+  short *t5 = &l;
+  fn1(t2, w, 1, o);
+  if (u) {
+l = q;
+t1 = a < b ?: b;
+z = c >= 2 || t1 << c;
+  }
+  *t5 = t4 &= t3;
+  fn3(y);
+}
+
+fn4() {
+  t6 = t.f3;
+  fn5(k, t7);
+}
+
+struct S1 fn1() {
+  f = 0;
+  for (; i;)
+;
+  t11 = 0;
+  t13 = *t10 = t14 || n;
+  t9 = t12;
+  for (; p.f4;)
+s = t16 <= fn6();
+  if (g)
+v = 0;
+}


Re: [Patch, Fortran, pr60322] was: [Patch 1/2, Fortran, pr60322] [OOP] Incorrect bounds on polymorphic dummy array

2015-04-05 Thread Paul Richard Thomas
Dear Andre,

Well, time passed and it didn't get done. Too much going on at the moment!

As you say, the patch bootstraps and regtests on x86_64, FC21 in my case.

I am now very reluctant to mess around with the gcc-5 release. Thus, I
think that this patch must be committed to 5.2 and 6.0, when the are
open for business.

A few trivial comments:

+  /* The dummy is returned for pointer, allocatable or assumed rank arrays.
+ The check for pointerness needs to be repeated here (it is done in
+ IS_CLASS_ARRAY (), too), because for class arrays that are pointers, as
+ is the one of the sym, which is incorrect here.  */

What does this mean, please?

+  /* Returning the descriptor for dummy class arrays is hazardous, because
+ some caller is expecting an expression to apply the component refs to.
+ Therefore the descriptor is only created and stored in
+ sym->backend_decl's GFC_DECL_SAVED_DESCRIPTOR.  The caller is then
+ responsible to extract it from there, when the descriptor is
+ desired.  */
+  if (IS_CLASS_ARRAY (sym)
+  && (!DECL_LANG_SPECIFIC (sym->backend_decl)
+  || !GFC_DECL_SAVED_DESCRIPTOR (sym->backend_decl)))
+{
+  decl = gfc_build_dummy_array_decl (sym, sym->backend_decl);
+  /* Prevent the dummy from being detected as unused if it is copied.  */
+  if (sym->backend_decl != NULL && decl != sym->backend_decl)
+DECL_ARTIFICIAL (sym->backend_decl) = 1;
+  sym->backend_decl = decl;
+}

The comments, such as the above are often going well beyond column 72,
into the 80's. I know that much of the existing code violates this
style requirement but there is no need to do so if clarity is not
reduced thereby.

In trans-stmt.c s/standart/standard/

Don't forget to put the PR numbers in the ChangeLogs.

For this submission, I would have appreciated some a description of
what each chunk in the patch is doing, just because there is so much
of it. I suppose that it was good for my imortal soul to sort it out
for myself but it took a little while :-)

Cheers and many thanks for the patch.

Paul

On 27 March 2015 at 13:48, Paul Richard Thomas
 wrote:
> Dear Andre,
>
> I am in the UK as of last night. Before leaving, I bootstrapped and
> regtested your patch and all was well. I must drive to Cambridge this
> afternoon to see my mother and will try to get to it either this
> evening or tomorrow morning. There is so much of it and it touches
> many places; so I must give it a very careful looking over before
> giving the green light. Bear with me please.
>
> Great work though!
>
> Paul
>
> On 24 March 2015 at 18:06, Andre Vehreschild  wrote:
>> Hi all,
>>
>> I have worked on the comments Mikael gave me. I am now checking for
>> class_pointer in the way he pointed out.
>>
>> Furthermore did I *join the two parts* of the patch into this one, because
>> keeping both in sync was no benefit but only tedious and did not prove to be
>> reviewed faster.
>>
>> Paul, Dominique: I have addressed the LOC issue that came up lately. Or 
>> rather
>> the patch addressed it already. I feel like this is not tested very well, not
>> the loc() call nor the sizeof() call as given in the 57305 second's download.
>> Unfortunately, is that download not runable. I would love to see a test 
>> similar
>> to that download, but couldn't come up with one, that satisfied me. Given 
>> that
>> the patch's review will last some days, I still have enough time to come up
>> with something beautiful which I will add then.
>>
>> Bootstraps and regtests ok on x86_64-linux-gnu/F20.
>>
>> Regards,
>> Andre
>>
>>
>> On Tue, 24 Mar 2015 11:13:27 +0100
>> Paul Richard Thomas  wrote:
>>
>>> Dear Andre,
>>>
>>> Dominique pointed out to me that the 'loc' patch causes a ICE in the
>>> testsuite. It seems that 'loc' should provide the address of the class
>>> container in some places and the address of the data in others. I will
>>> put my thinking cap on tonight :-)
>>>
>>> Cheers
>>>
>>> Paul
>>>
>>> On 23 March 2015 at 13:43, Andre Vehreschild  wrote:
>>> > Hi Mikael,
>>> >
>>> > thanks for looking at the patch. Please note, that Paul has sent an
>>> > addendum to the patches for 60322, which I deliberately have attached.
>>> >
>>> >>  26/02/2015 18:17, Andre Vehreschild a écrit :
>>> >> > This first patch is only preparatory and does not change any of the
>>> >> > semantics of gfortran at all.
>>> >> Sure?
>>> >
>>> > With the counterexample you found below, this of course is a wrong
>>> > statement.
>>> >
>>> >> > diff --git a/gcc/fortran/expr.c b/gcc/fortran/expr.c
>>> >> > index ab6f7a5..d28cf77 100644
>>> >> > --- a/gcc/fortran/expr.c
>>> >> > +++ b/gcc/fortran/expr.c
>>> >> > @@ -4059,10 +4060,10 @@ gfc_lval_expr_from_sym (gfc_symbol *sym)
>>> >> >lval->symtree = gfc_find_symtree (sym->ns->sym_root, sym->name);
>>> >> >
>>> >> >/* It will always be a full array.  */
>>> >> > -  lval->rank = sym->as ? sym->as->rank : 0;
>>> >> > +  as = sym->as;
>>> >> > 

[patch, fortran, RFC] First steps towards inlining matmul

2015-04-05 Thread Thomas Koenig
Hello world,

this is a first draft of a patch to inline matmul (PR 37171). This is
preliminary, but functional as far as it goes.  Definitely for the
next stage one :-)

Basically, it takes

c = matmul(a,b)

and converts this into

   BLOCK
 integer i,j,k
 c = 0
 do j=0, size(b,2)-1
   do k=0, size(a, 2)-1
 do i=0, size(a, 1)-1
c(i * stride(c,1) + lbound(c,1), j * stride(c,2) +
lbound(c,2)) =
c(i * stride(c,1) + lbound(c,1), j * stride(c,2) + lbound(c,2)) +
a(i * stride(a,1) + lbound(a,1), k * stride(a,2) +
lbound(a,2)) *
b(k * stride(b,1) + lbound(b,1), j * stride(b,2) + lbound(b,2))
 end do
   end do
 end do
   END BLOCK

For this, I wrote something like a scalarizer that only operates on the
AST only.  I find it rather straigtforward to use (in contrast to the
one we have), but that may also be due to the fact that I wrote it
myself :-)

I chose zero-based loop variables because I didn't want to special-case
too many strange array bounds, and I trusted the induction variable
optimization and CSE in the middle end and in the RTL optimization.
If this approach suboptimizes something, let me know.

References are frozen to make sure that no additional evaluations are
done on.

The patch to simplify.c is not strictly necessary, but it should open
up some optimization possiblities and, frankly, makes the
-fdump-fortran-optimized output at least halfway readable.

This patch needs to be extended in different directions:

- Currently, it only handles the case of two-dimensional arguments

- Special cases like transpose(a) and spread(a) in the arguments
  arguments should be added, possibly with a reordering of loops
  and addition of some more variables.

- Temporary handling for arguments and results, which does not depend
  on allocatable RHS.  I am thinking of creating nested blocks,
  where the inner block has arrays with the bounds of the
  references.

- Cases like matmul(a+1,b) can also be handled in the loop,
  without a temporary

- This needs command-line arguments, -finline-matmul and
  -finline-matmul-limit=, similar to the handling of BLAS.

- Bounds checking could/should also be handled.  Currently, this
  causes the only regression because of a different error message.
  We need to call gfortran_runtime_error directly from the front end,
  and to be able to tell the bounds-checking code "do not worry about
  bounds checking this, it has already been taken care of".

- I already have some test cases, but for this, I would really like
  to cover all code paths.  This also needs some more work.

So, what do you think about this?

Thomas

2015-04-05  Thomas Koenig  

* frontend-passes.c (create_var):  Add optional argument
vname as part of the name.  Split off block creation into
(insert_block): New function.
(cfe_expr_0):  Use "fcn" as part of tempoary name.
(optimize_namesapce):  Call optimize_matmul_assign.
(combine_array_constructor):  Use "constr" as part of
temporary name.
(get_array_inq_function):  New function.
(is_function_or_op):  New function.
(has_function_or_op):  New function.
(freeze_expr):  New function.
(freeze_references):  New function.
(convert_to_index_kind):  New function.
(create_do_loop):  New function.
(get_operand):  New function.
(get_size_1):  New function.
(scalarize_expr):  New function.
(optimize_matmul_assign): New function.
* simplify.c (simplify_bound):  Simplify the case of the
lower bound of an assumed-shape argument.
Index: frontend-passes.c
===
--- frontend-passes.c	(Revision 221757)
+++ frontend-passes.c	(Arbeitskopie)
@@ -43,7 +43,11 @@ static void doloop_warn (gfc_namespace *);
 static void optimize_reduction (gfc_namespace *);
 static int callback_reduction (gfc_expr **, int *, void *);
 static void realloc_strings (gfc_namespace *);
-static gfc_expr *create_var (gfc_expr *);
+static gfc_expr *create_var (gfc_expr *, const char *vname=NULL);
+static int optimize_matmul_assign (gfc_code **, int *, void *);
+static gfc_code * create_do_loop (gfc_expr *, gfc_expr *, gfc_expr *,
+  locus *, gfc_namespace *, 
+  char *vname=NULL);
 
 /* How deep we are inside an argument list.  */
 
@@ -95,6 +99,10 @@ static bool in_assoc_list;
 
 /* Entry point - run all passes for a namespace.  */
 
+/* Counter for temporary variables.  */
+
+static int var_num = 1;
+
 void
 gfc_run_passes (gfc_namespace *ns)
 {
@@ -157,7 +165,7 @@ realloc_string_callback (gfc_code **c, int *walk_s
 return 0;
   
   current_code = c;
-  n = create_var (expr2);
+  n = create_var (expr2, "trim");
   co->expr2 = n;
   return 0;
 }
@@ -524,29 +532,11 @@ constant_string_length (gfc_expr *e)
 
 }
 
-/* Returns a new expression (a variable) to be used in place of the old one

Re: [patch, fortran, RFC] First steps towards inlining matmul

2015-04-05 Thread Jerry DeLisle

On 04/05/2015 05:32 AM, Thomas Koenig wrote:
--- snip ---


So, what do you think about this?

Thomas


I am curious about what performance gain results from this? I can see saving a 
library call to our runtime libraries.  Do you have some timing results?


Jerry


Re: [patch, fortran, RFC] First steps towards inlining matmul

2015-04-05 Thread Thomas Koenig
Hi Jerry,

> I am curious about what performance gain results from this? I can see
> saving a library call to our runtime libraries.  Do you have some timing
> results?

The speedup can be quite drastic for small matrices which can be
completely unrolled by -O3:

b1.f90:

program main
  use b2
  implicit none
  real, dimension(3,3) :: a, b, c
  integer :: i

  call random_number(a)
  call random_number(b)
  do i=1,10**8
 c = matmul(a,b)
 call bar(b,c)
  end do
end program main

b2.f90:

module b2
contains
  subroutine bar(b,c)
real, dimension(3,3) :: b,c
  end subroutine bar
end module b2

ig25@linux-fd1f:~/Krempel/Matmul> gfortran -O3 -fno-frontend-optimize
b2.f90 b1.f90 && time ./a.out

real0m15.411s
user0m15.404s
sys 0m0.001s

ig25@linux-fd1f:~/Krempel/Matmul> gfortran -O3 b2.f90 b1.f90 && time ./a.out

real0m1.736s
user0m1.735s
sys 0m0.001s



Re: [patch, fortran, RFC] First steps towards inlining matmul

2015-04-05 Thread Dominique Dhumieres
> > So, what do you think about this?
>
> I am curious about what performance gain results from this?
> I can see saving a library call to our runtime libraries.
> Do you have some timing results?
>
> Jerry

IMO the inlining of MATMUL should be restricted to small matrices (less than 
4x4, 9x9
or 16x16 depending of your field!-). I have a variant of the polyhedron test 
induct.f90
in which I have replaced three dot products by the equivalent matmul and which 
executes
ten times slower than the original test.

Dominique


Re: [patch, fortran, RFC] First steps towards inlining matmul

2015-04-05 Thread Thomas Koenig
Hi Dominique,

> IMO the inlining of MATMUL should be restricted to small matrices (less than 
> 4x4, 9x9
> or 16x16 depending of your field!-)

The problem with the library function we have is that it is quite
general; it can deal with all the complexity of assumed-shape array
arguments.  Inlining allows the compiler to take advantage of
contiguous memory, compile-time knowledge of array sizes, etc.
to allow for vectorization or even complete unrolling.

Of course, if you use c=matmul(a,b) where all three are assumed-shape
arrays, your advantage over the library function will be minimal at
best.

It will be interesting to see at what size calling BLAS directly
will be better.

Thomas


Re: Work around ICE in PR 65648

2015-04-05 Thread Gerald Pfeifer
On Fri, 3 Apr 2015, Jan Hubicka wrote:
> -#ifdef ENABLE_CHECKING
> +  /* This is used only for assert bellow.  */
> +#if 0

"below" (instead of "bellow")

Gerald


Re: patch to fix PR65647

2015-04-05 Thread Vladimir Makarov



On 05/04/15 04:45 AM, Yvan Roux wrote:

Hi,

The issue is also present in 4.9 branch as explained in:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65647

As 4.9 doesn't contains lra rematerialization passe, we only need to
stop updating lra_constraint_new_regno_start when inheritance is
switched off.

Bootstrapped and tested on x64_64 and cross built and tested on
AArch64, arm, armeb and i686. Ok for 4.9 ?



Yes.  The patch is ok for the branch.  Thanks, Yvan.



Re: [patch, fortran, RFC] First steps towards inlining matmul

2015-04-05 Thread Dominique d'Humières
I have done some timings

(1) with the test given below, before the patch I get (last column in Gflops)

[Book15] f90/bug% gfc -Ofast timing/matmul_tst_sys.f90 -framework Accelerate
[Book15] f90/bug% time a.out
 Time, MATMUL:373.708008   373.69497100014.2815668504139435 

 Time, MATMUL:172.086609   23.117919169.210381782201068 

545.374u 0.537s 6:36.93 137.5%  0+0k 0+0io 0pf+0w
[Book15] f90/bug% gfc -Ofast -fexternal-blas timing/matmul_tst_sys.f90  
-framework Accelerate
[Book15] f90/bug% time a.out
 Time, MATMUL:176.327881   23.855111867.071577781735002 

 Time, MATMUL:182.086746   24.453551165.430170039516952 

358.059u 0.471s 0:48.43 740.2%  0+0k 0+0io 0pf+0w

after the patch

[Book15] f90/bug% time a.out
 Time, MATMUL:392.415436   392.4172874.0772923337669056 

 Time, MATMUL:171.690399   22.905118269.853383859450091 

563.671u 0.551s 6:55.44 135.8%  0+0k 0+0io 0pf+0w
[Book15] f90/bug% gfc -Ofast -fexternal-blas timing/matmul_tst_sys.f90 
-framework Accelerate
[Book15] f90/bug% time a.out
 Time, MATMUL:392.850342   392.8419074.0728852177349673 

 Time, MATMUL:174.534821   23.784797167.269861500184334 

566.824u 0.674s 6:56.74 136.1%  0+0k 0+0io 0pf+0w

which means that -fexternal-blas should disable the inlining.

program t2 
implicit none 
REAL time_begin, time_end 
integer, parameter :: n=2000; 
integer(8) :: ts, te, rate8, cmax8
real(8) :: elapsed
REAL(8) :: a(n,n), b(n,n), c(n,n) 
integer, parameter :: m = 100 
integer :: i 
call RANDOM_NUMBER(a) 
call RANDOM_NUMBER(b) 
call cpu_time(time_begin) 
call SYSTEM_CLOCK (ts, rate8, cmax8)
do i = 1,m 
a(1,1) = a(1,1) + 0.1 
c = MATMUL(a,b) 
enddo 
call SYSTEM_CLOCK (te, rate8, cmax8)
call cpu_time(time_end) 
elapsed = real(te-ts, kind=8)/real(rate8, kind=8)
PRINT *, 'Time, MATMUL: ',time_end-time_begin, elapsed , 2*m*real(n, 
kind=8)**3/(10**9*elapsed)
call cpu_time(time_begin) 
call SYSTEM_CLOCK (ts, rate8, cmax8)
do i = 1,m 
a(1,1) = a(1,1) + 0.1 
call dgemm('n','n',n, n, n, dble(1.0), a, n, b, n, dble(0.0), c, n) 
enddo 
call SYSTEM_CLOCK (te, rate8, cmax8)
call cpu_time(time_end) 
elapsed = real(te-ts, kind=8)/real(rate8, kind=8)
PRINT *, 'Time, MATMUL: ',time_end-time_begin, elapsed , 2*m*real(n, 
kind=8)**3/(10**9*elapsed)
end program 
! 
http://groups.google.com/group/comp.lang.fortran/browse_thread/thread/1cba8e6ce5080197

(2) The polyhedron test compiled with -fprotect-parens -Ofast -funroll-loops 
-ftree-loop-linear -fomit-frame-pointer -fwhole-program -flto runs in ~6.5s. 
After applying the patch below, it runs in ~66s with/without the patch.

Dominique

--- induct.f90  2005-10-11 22:53:32.0 +0200
+++ induct_vmc.f90  2015-04-05 19:06:30.0 +0200
@@ -1644,18 +1644,17 @@ contains
   coil_tmp_vector(1) = -sin(theta)
   coil_tmp_vector(2) = cos(theta)
   coil_tmp_vector(3) = 0.0_longreal
-  coil_current_vec(1) = 
dot_product(rotate_coil(1,:),coil_tmp_vector(:))
-  coil_current_vec(2) = 
dot_product(rotate_coil(2,:),coil_tmp_vector(:))
-  coil_current_vec(3) = 
dot_product(rotate_coil(3,:),coil_tmp_vector(:))
+  coil_current_vec = matmul(rotate_coil,coil_tmp_vector)
 !
   do j = 1, 9
   c_vector(3) = 0.5 * h_coil * z1gauss(j)
 !
 !   rotate coil vector into the global coordinate system and translate it
 !
-  rot_c_vector(1) = dot_product(rotate_coil(1,:),c_vector(:)) + dx
-  rot_c_vector(2) = dot_product(rotate_coil(2,:),c_vector(:)) + dy
-  rot_c_vector(3) = dot_product(rotate_coil(3,:),c_vector(:)) + dz
+  rot_c_vector = matmul(rotate_coil,c_vector)
+  rot_c_vector(1) = rot_c_vector(1) + dx
+  rot_c_vector(2) = rot_c_vector(2) + dy
+  rot_c_vector(3) = rot_c_vector(3) + dz
 !
   do k = 1, 9
   q_vector(1) = 0.5_longreal * a * (x2gauss(k) + 1.0_longreal)
@@ -1664,9 +1663,7 @@ contains
 !
 !   rotate quad vector into the global coordinate system
 !
-  rot_q_vector(1) = dot_product(rotate_quad(1,:),q_vector(:))
-  rot_q_vector(2) = dot_product(rotate_quad(2,:),q_vector(:))
-  rot_q_vector(3) = dot_product(rotate_quad(3,:),q_vector(:))
+  rot_q_vector = matmul(rotate_quad,q_vector)
 !
 !   compute and add in quadrature term
 !
@@ -1756,18 +1753,17 @@ contains
   coil_tmp_vector(1) = -sin(theta)
   coil_tmp_vector(2) = cos(theta)
   coil_tmp_vector(3) = 0.0_longreal
-  coil_current_vec(1) = 
dot_product(rotate_coil(1,:),coil_tmp_vector(:))
-  coil_current_vec(2) = 
dot_product(rotate_coil(2,:),coil_tmp_vector(:))
-  coil_current_vec(3) = 
dot_product(rotate_coil(3,:),coil_tmp_vector(:))
+

pr59016

2015-04-05 Thread Evangelos Drikos

Hi,

The attached patch, type 0, has been discussed a little at:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59016

Yet, the final version submitted is slightly different from the ones discussed 
and tested in the above link.

Having read the GNU Coding Conventions, I created a patch using svn diff 
(“-up”) and I added a small test case. Yet, the function names aren’t printed 
into the context (surrounding changes).

Further, I've run the “check_GNU_style.sh” script to ensure that the source 
code meets the GNU style requirements. 

Also, I don’t have DejaGNU installed and thus I think that I cannot run all the 
tests. In other words, evaluation is up to the GNU team.

Finally, it might be obvious that the patch is for trunk (not gcc-4.9.2).

Regards,
Ev. Drikos 





gcc-5.0-pr59016.patch
Description: Binary data



Re: [patch, fortran, RFC] First steps towards inlining matmul

2015-04-05 Thread Thomas Koenig
Hi Dominique,

> which means that -fexternal-blas should disable the inlining.

It is not surprising that a higly tuned BLAS library is better than
a simple inlining for large matrices.

I did some tests by adjusting n; it seems the inline version is
faster for n<=22, which is not too bad.

Regarding your other test case:  This tests matrix*vector
multiplication, which is not implemented yet :-)

Regards,

Thomas



pr59016

2015-04-05 Thread Evangelos Drikos

Hi,

The attached patch, type 0, has been discussed a little at:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59016

Yet, the final version submitted is slightly different from the ones discussed 
and tested in the above link.

Having read the GNU Coding Conventions, I created a patch using svn diff 
(“-up”) and I added a small test case.

Further, I've run the “check_GNU_style.sh” script to ensure that the source 
code meets the GNU style requirements. 

Also, I don’t have DejaGNU installed and thus I think that I cannot run all the 
tests. In other words, evaluation is up to the GNU team.

Finally, it might be obvious that the patch is for trunk (not gcc-4.9.2).

Regards,
Ev. Drikos 





gcc-5.0-pr59016.c_log
Description: Binary data


gcc-5.0-pr59016.patch
Description: Binary data


Re: [patch, fortran, RFC] First steps towards inlining matmul

2015-04-05 Thread Dominique d'Humières
The patch causes the following regressions:

FAIL: gfortran.dg/coarray/dummy_1.f90 -fcoarray=single  -O2  -latomic (internal 
compiler error)
FAIL: gfortran.dg/coarray/dummy_1.f90 -fcoarray=single  -O2  -latomic (test for 
excess errors)
FAIL: gfortran.dg/coarray/dummy_1.f90 -fcoarray=lib  -O2  -lcaf_single -latomic 
(internal compiler error)
FAIL: gfortran.dg/coarray/dummy_1.f90 -fcoarray=lib  -O2  -lcaf_single -latomic 
(test for excess errors)
FAIL: gfortran.dg/array_function_3.f90   -O  (internal compiler error)
FAIL: gfortran.dg/array_function_3.f90   -O  (test for excess errors)
FAIL: gfortran.dg/bind_c_vars.f90   -g -flto  (test for excess errors)
FAIL: gfortran.dg/bound_2.f90   -O0  (internal compiler error)
FAIL: gfortran.dg/bound_2.f90   -O0  (test for excess errors)
FAIL: gfortran.dg/bound_2.f90   -O1  (internal compiler error)
FAIL: gfortran.dg/bound_2.f90   -O1  (test for excess errors)
FAIL: gfortran.dg/bound_2.f90   -O2  (internal compiler error)
FAIL: gfortran.dg/bound_2.f90   -O2  (test for excess errors)
FAIL: gfortran.dg/bound_2.f90   -O3 -fomit-frame-pointer  (internal compiler 
error)
FAIL: gfortran.dg/bound_2.f90   -O3 -fomit-frame-pointer  (test for excess 
errors)
FAIL: gfortran.dg/bound_2.f90   -O3 -fomit-frame-pointer -funroll-loops  
(internal compiler error)
FAIL: gfortran.dg/bound_2.f90   -O3 -fomit-frame-pointer -funroll-loops  (test 
for excess errors)
FAIL: gfortran.dg/bound_2.f90   -O3 -fomit-frame-pointer -funroll-all-loops 
-finline-functions  (internal compiler error)
FAIL: gfortran.dg/bound_2.f90   -O3 -fomit-frame-pointer -funroll-all-loops 
-finline-functions  (test for excess errors)
FAIL: gfortran.dg/bound_2.f90   -O3 -g  (internal compiler error)
FAIL: gfortran.dg/bound_2.f90   -O3 -g  (test for excess errors)
FAIL: gfortran.dg/bound_2.f90   -Os  (internal compiler error)
FAIL: gfortran.dg/bound_2.f90   -Os  (test for excess errors)
FAIL: gfortran.dg/bound_2.f90   -g -flto  (internal compiler error)
FAIL: gfortran.dg/bound_7.f90   -O0  (test for excess errors)
FAIL: gfortran.dg/bound_7.f90   -O1  (internal compiler error)
FAIL: gfortran.dg/bound_7.f90   -O1  (test for excess errors)
FAIL: gfortran.dg/bound_7.f90   -O2  (internal compiler error)
FAIL: gfortran.dg/bound_7.f90   -O2  (test for excess errors)
FAIL: gfortran.dg/bound_7.f90   -O3 -fomit-frame-pointer  (internal compiler 
error)
FAIL: gfortran.dg/bound_7.f90   -O3 -fomit-frame-pointer  (test for excess 
errors)
FAIL: gfortran.dg/bound_7.f90   -O3 -fomit-frame-pointer -funroll-loops  
(internal compiler error)
FAIL: gfortran.dg/bound_7.f90   -O3 -fomit-frame-pointer -funroll-loops  (test 
for excess errors)
FAIL: gfortran.dg/bound_7.f90   -O3 -fomit-frame-pointer -funroll-all-loops 
-finline-functions  (internal compiler error)
FAIL: gfortran.dg/bound_7.f90   -O3 -fomit-frame-pointer -funroll-all-loops 
-finline-functions  (test for excess errors)
FAIL: gfortran.dg/bound_7.f90   -O3 -g  (internal compiler error)
FAIL: gfortran.dg/bound_7.f90   -O3 -g  (test for excess errors)
FAIL: gfortran.dg/bound_7.f90   -Os  (internal compiler error)
FAIL: gfortran.dg/bound_7.f90   -Os  (test for excess errors)
FAIL: gfortran.dg/bound_7.f90   -g -flto  (internal compiler error)
FAIL: gfortran.dg/bound_7.f90   -g -flto  (test for excess errors)
FAIL: gfortran.dg/bound_8.f90   -O0  (internal compiler error)
FAIL: gfortran.dg/bound_8.f90   -O0  (test for excess errors)
FAIL: gfortran.dg/bound_8.f90   -O1  (internal compiler error)
FAIL: gfortran.dg/bound_8.f90   -O1  (test for excess errors)
FAIL: gfortran.dg/bound_8.f90   -O2  (internal compiler error)
FAIL: gfortran.dg/bound_8.f90   -O2  (test for excess errors)
FAIL: gfortran.dg/bound_8.f90   -O3 -fomit-frame-pointer  (internal compiler 
error)
FAIL: gfortran.dg/bound_8.f90   -O3 -fomit-frame-pointer  (test for excess 
errors)
FAIL: gfortran.dg/bound_8.f90   -O3 -fomit-frame-pointer -funroll-loops  
(internal compiler error)
FAIL: gfortran.dg/bound_8.f90   -O3 -fomit-frame-pointer -funroll-loops  (test 
for excess errors)
FAIL: gfortran.dg/bound_8.f90   -O3 -fomit-frame-pointer -funroll-all-loops 
-finline-functions  (internal compiler error)
FAIL: gfortran.dg/bound_8.f90   -O3 -fomit-frame-pointer -funroll-all-loops 
-finline-functions  (test for excess errors)
FAIL: gfortran.dg/bound_8.f90   -O3 -g  (internal compiler error)
FAIL: gfortran.dg/bound_8.f90   -O3 -g  (test for excess errors)
FAIL: gfortran.dg/bound_8.f90   -Os  (internal compiler error)
FAIL: gfortran.dg/bound_8.f90   -Os  (test for excess errors)
FAIL: gfortran.dg/bound_8.f90   -g -flto  (internal compiler error)
FAIL: gfortran.dg/bound_8.f90   -g -flto  (test for excess errors)

I think the line

  if (!upper && as->type == AS_ASSUMED_SHAPE && dim)

should be something such as (untested)

  if (!upper && dim && as && as->type == AS_ASSUMED_SHAPE)

Dominique

> Le 5 avr. 2015 à 22:37, Thomas Koenig  a écrit :
> 
> Hi Dominique,
> 
>> which means that -fexternal-blas should disable the inlining.
> 
> It

Re: [libstdc++/65033] Give alignment info to libatomic

2015-04-05 Thread Hans-Peter Nilsson
On Fri, 3 Apr 2015, Jonathan Wakely wrote:

> On 03/04/15 05:24 -0400, Hans-Peter Nilsson wrote:
> > On Thu, 2 Apr 2015, Hans-Peter Nilsson wrote:
> > > Why then use __alignof(_M_i) (the object-alignment)
> > > instead of _S_alignment (the deduced alas insufficiently
> > > increased type-alignment)?
>
> Isn't the object aligned to _S_alignment?

We did specify that with the alignas.  Is the alignof always
exactly the same as an alignas, if one is specified?  (And will
that not change in a future amendment, standard and/or
implementation?)  Either way, is there a test-case to guard all
this?

Those questions wouldn't even be asked if we used _S_alignment
for the fake-pointer too, just as a matter of defensive
programming.

> Instead of changing every case in the condition to include sizeof why
> not just do it afterwards using sizeof(_Tp), in the _S_alignment
> calculation?

Doh.

> We know sizeof(_Tp) == sizeof(corresponding integer type) because
> that's the whole point of the conditionals! See attachment.
>
> > @@ -216,7 +216,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >   is_lock_free() const noexcept
> >   {
> > // Produce a fake, minimally aligned pointer.
> > -   void *__a = reinterpret_cast(-__alignof(_M_i));
> > +   void *__a = reinterpret_cast(-_S_alignment);
> > return __atomic_is_lock_free(sizeof(_M_i), __a);
> >   }
>
> If _M_i is aligned to _S_alignment then what difference does the
> change above make?
>
> It doesn't matter if the value is per-object if we've forced all such
> objects to have the same alignment, does it?
>
> Or is it different if a std::atomic is included in some other
> struct and the user forces a different alignment on it? I don't think
> we really need to support that, users shouldn't be doing that.

Why do we even need to ask those questions, when the patch takes
care of the per-type business without doubt?

> The attached patch against trunk should have the same result with much
> less effort.
>
> It doesn't include the changes to the reinterpret_cast
> expressions to produce a minimally aligned pointer, but I think this
> is progress, thanks :-)

Progress is good. :)

brgds, H-P


Re: [PATCH, libmpx, i386, PR driver/65444] Pass '-z bndplt' when building dynamic objects with MPX

2015-04-05 Thread Sandra Loosemore

On 04/03/2015 01:34 PM, Joseph Myers wrote:

On Tue, 31 Mar 2015, Ilya Enkovich wrote:


+library.  It also passes '-z bndplt' to a linker in case it supports this
+option (which is checked on libmpx configuration).  Note that old versions
+of linker may ignore option.  Gold linker doesn't support '-z bndplt'
+option.  With no '-z bndplt' support in linker all calls to dynamic libraries
+lose passed bounds reducing overall protection level.  It's highly
+recommended to use linker with '-z bndplt' support.  In case such linker
+is not available it is adviced to always use @option{-static-libmpxwrappers}
+for better protection level or use @option{-static} to completely avoid
+external calls to dynamic libraries.  MPX-based instrumentation


Use @samp{-z bndplt} rather than '' quoting (but Sandra may have further
advice on the substance of this documentation).


To tell the truth, I can't figure out what this means from a user 
perspective.  How does a user know whether the linker option is being 
ignored, or if they have a new enough linker?  If the linker available 
at configuration time doesn't support the option, does that mean the 
option will never be passed and users will never know that there are 
gaping holes in the pointer bounds checking?


My suggestion would be to pass the option unconditionally and make the 
documentation say something like


It also passes @option{-z bndplt} to the linker.  LD version xxx or 
later is required to use this feature.  If no linker support for 
@option{-z bndplt} is available, you should link with 
@option{-static-libmpxwrappers} or @option{-static} instead; otherwise 
calls to dynamic libraries lose bounds checking protection.


... where you need to fill in "version xxx" appropriately.

-Sandra



Re: [PATCH, libmpx, i386, PR driver/65444] Pass '-z bndplt' when building dynamic objects with MPX

2015-04-05 Thread H.J. Lu
On Sun, Apr 5, 2015 at 6:44 PM, Sandra Loosemore
 wrote:
> On 04/03/2015 01:34 PM, Joseph Myers wrote:
>>
>> On Tue, 31 Mar 2015, Ilya Enkovich wrote:
>>
>>> +library.  It also passes '-z bndplt' to a linker in case it supports
>>> this
>>> +option (which is checked on libmpx configuration).  Note that old
>>> versions
>>> +of linker may ignore option.  Gold linker doesn't support '-z bndplt'
>>> +option.  With no '-z bndplt' support in linker all calls to dynamic
>>> libraries
>>> +lose passed bounds reducing overall protection level.  It's highly
>>> +recommended to use linker with '-z bndplt' support.  In case such linker
>>> +is not available it is adviced to always use
>>> @option{-static-libmpxwrappers}
>>> +for better protection level or use @option{-static} to completely avoid
>>> +external calls to dynamic libraries.  MPX-based instrumentation
>>
>>
>> Use @samp{-z bndplt} rather than '' quoting (but Sandra may have further
>> advice on the substance of this documentation).
>
>
> To tell the truth, I can't figure out what this means from a user
> perspective.  How does a user know whether the linker option is being
> ignored, or if they have a new enough linker?  If the linker available at
> configuration time doesn't support the option, does that mean the option
> will never be passed and users will never know that there are gaping holes
> in the pointer bounds checking?
>
> My suggestion would be to pass the option unconditionally and make the
> documentation say something like

I totally agree with it.

> It also passes @option{-z bndplt} to the linker.  LD version xxx or later is
> required to use this feature.  If no linker support for @option{-z bndplt}
> is available, you should link with @option{-static-libmpxwrappers} or
> @option{-static} instead; otherwise calls to dynamic libraries lose bounds
> checking protection.
>

This implies that -static-libmpxwrappers will cover all dynamic libraries,
which isn't true.  -static-libmpxwrappers only covers calls to functions
defined in libmpxwrappers.so and leaves calls to functions defined in
other dynamic libraries open to buffer overflow.

-- 
H.J.


Re: [PATCH, libmpx, i386, PR driver/65444] Pass '-z bndplt' when building dynamic objects with MPX

2015-04-05 Thread Sandra Loosemore

On 04/05/2015 08:35 PM, H.J. Lu wrote:

On Sun, Apr 5, 2015 at 6:44 PM, Sandra Loosemore
 wrote:

 [snip snip]

To tell the truth, I can't figure out what this means from a user
perspective.  How does a user know whether the linker option is being
ignored, or if they have a new enough linker?  If the linker available at
configuration time doesn't support the option, does that mean the option
will never be passed and users will never know that there are gaping holes
in the pointer bounds checking?

My suggestion would be to pass the option unconditionally and make the
documentation say something like


I totally agree with it.


It also passes @option{-z bndplt} to the linker.  LD version xxx or later is
required to use this feature.  If no linker support for @option{-z bndplt}
is available, you should link with @option{-static-libmpxwrappers} or
@option{-static} instead; otherwise calls to dynamic libraries lose bounds
checking protection.



This implies that -static-libmpxwrappers will cover all dynamic libraries,
which isn't true.  -static-libmpxwrappers only covers calls to functions
defined in libmpxwrappers.so and leaves calls to functions defined in
other dynamic libraries open to buffer overflow.


See, I said I didn't really understand the details.  ;-)  Just strike 
the reference to -static-libmpxwrappers entirely, then.


-Sandra