gcc parallel make check

2014-09-03 Thread VandeVondele Joost
I've noticed that

make -j -k check-fortran

results in a serialized checking, while

make -j32 -k check-fortran

goes parallel. Somehow the explicit 'N' in -jN seems to be needed for the check 
target, while the other targets seem to do just fine. Is that a feature, or 
should I file a PR for that... ?

Somewhat related: is there a rule of thumb for how the granularity of the 
parallel check is decided? E.g. check-fortran seems to be limited to about 5 
parallel targets, which is few for a typical server (but of course already a 
welcome speedup).

Thanks,

Joost


RE: gcc parallel make check

2014-09-03 Thread VandeVondele Joost
> It is intentional.  With -j it is essentially a fork bomb, just don't use it.

well, silently ignoring it for just this target did cost me a lot of time, 
while an eventual fork bomb would have been dealt with much more quickly.

>> Somewhat related is there a rule of thumb on how is the granularity of
>> parallel check decided ?  E.g.  check-fortran seems to be limited to about
>> ~5 parallel targets, which is few for a typical server (but of course a
>> welcome speedup already).
>
>The splitting has some cost (e.g. lots of various checks are cached, with
>split jobs they need to be done in each separate goal), and the goal of the
>split is toplevel make check parallelization, not individual directory or
>language testing.  For the latter perhaps more fine grained split could be
>useful, but how would one find out if it is a toplevel make check, or say
>make -C gcc check where you test many languages, or check-gfortran?

The cost must be small compared to the possible gain: on a 32-core server, 
testing of Fortran FE changes could be roughly 4x faster. I notice that even on a full 
check, the Fortran tests are still running when the number of processes is 
already way below 32. However, the longest running (by a few minutes) are those:

expect -- /usr/share/dejagnu/runtest.exp --tool gcc lto.exp weak.exp tls.exp 
ipa.exp tree-ssa.exp debug.exp dwarf2.exp fixed-point.exp vxworks.exp 
cilk-plus.exp vmx.exp pch.exp simulate-thread.exp x86_64-costmodel-vect.exp 
i386-costmodel-vect.exp spu-costmodel-vect.exp ppc-costmodel-vect.exp 
charset.exp noncompile.exp tsan.exp graphite.exp compat.exp
expect -- /usr/share/dejagnu/runtest.exp --tool g++ lto.exp tls.exp gcov.exp 
debug.exp dwarf2.exp cilk-plus.exp pch.exp bprob.exp simulate-thread.exp 
vect.exp charset.exp tsan.exp graphite.exp compat.exp struct-layout-1.exp 
ubsan.exp tm.exp gomp.exp dfp.exp tree-prof.exp stackalign.exp plugin.exp 
guality.exp asan.exp ecos.exp

so can those be run more independently ?

RE: gcc parallel make check

2014-09-03 Thread VandeVondele Joost

> What did you expect for -j alone? an error?

No, as is standard in gnu make, a new process for any target that can be 
processed (i.e. unlimited).

>> ... check-fortran seems to be limited to about ~5 parallel targets ...
>
>Running the make with -j8 gives 7 directories gfortran[1-6]? in gcc/testsuite/.
>Note that the load balancing could be improved: few minutes with a single 
>thread
>over ~20 minutes.

I'd like to have roughly 32 directories (or as many as the -jN allows for).



RE: gcc parallel make check

2014-09-03 Thread VandeVondele Joost
> I have to admit that I don't know why that's the case. 

Actually Marc answered that one (I had the wrong mail address for gcc@ so 
repeat here):
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53155

> See: gcc/fortran/Make-lang.in, which has:

I'll have a look and do some testing to see what the gains/costs of a further 
split are.

Joost

RE: gcc parallel make check

2014-09-03 Thread VandeVondele Joost
>> expect -- /usr/share/dejagnu/runtest.exp --tool gcc lto.exp weak.exp tls.exp 
>> ipa.exp tree-ssa.exp debug.exp >dwarf2.exp fixed-point.exp vxworks.exp 
>> cilk-plus.exp vmx.exp pch.exp simulate-thread.exp x86_64-costmodel-vect.exp 
>> i386-costmodel-vect.exp spu-costmodel-vect.exp ppc-costmodel-vect.exp 
>> charset.exp noncompile.exp tsan.exp graphite.exp compat.exp
>> expect -- /usr/share/dejagnu/runtest.exp --tool g++ lto.exp tls.exp gcov.exp 
>> debug.exp dwarf2.exp cilk-plus.exp pch.exp bprob.exp simulate-thread.exp 
>> vect.exp charset.exp tsan.exp graphite.exp compat.exp struct-layout-1.exp 
>> ubsan.exp tm.exp gomp.exp dfp.exp tree-prof.exp stackalign.exp plugin.exp 
>> guality.exp asan.exp ecos.exp
>>
>> so can those be run more independently ?

>It is a moving target, new tests are added every day.  I'm trying to adjust
>it during stage3/stage4 occassionally, but it also very much depends on
>which target it is (e.g. i?86/x86_64 has many more tests in i386.exp then
>other targets in their gcc.target), how fast the compiler is on the target
>(e.g. on some targets -g is much slower than on others, etc.).

could you point me to the right file (or example commit) for trying to adjust 
this ? I can try to do some testing and come back with some numbers.

[PATCH] RE: gcc parallel make check

2014-09-05 Thread VandeVondele Joost
> The splits are in the Makefiles, see check_gcc_parallelize

Attached is a patch to improve the parallel performance of 'make -jXX -k 
check-fortran'. For XX=16 this yields a ~50% speedup, and even with XX=4 we 
still gain 15%; the measured slowdown at XX=1 (<2%) is within the noise of 
testing. The patch is a simple update of the 'check_gfortran_parallelize' 
variable, changing it from its 2008 values to a set that I found to be roughly 
optimal based on several tests. Detailed timings are:

# timings/trunk-check-fortran
#cores   average  std. dev.  #tests
     1   2955.32      75.06       3
     2   1735.30     122.26       3
     4    929.51      54.19       3
     8    470.29       7.85       3
    16    468.09       4.29       3
    32    466.06       1.24       3

# timings/patched-check-fortran
#cores   average  std. dev.  #tests
     1   3008.89      16.38       3
     2   1534.17     118.33       3
     4    800.18      31.71       3
     8    418.71       0.20       2
    16    298.29       5.86       3
    32    299.84       1.34       3
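
As a quick cross-check of the percentages quoted above, the per-core speedup can be recomputed from the averages in the two tables. This is only a small sketch: the dictionary names `trunk` and `patched` and the helper `speedup` are made up for the example, and the numbers are the table averages read with two decimals.

```python
# Average times from the two tables above, keyed by the number of cores.
trunk = {1: 2955.32, 2: 1735.30, 4: 929.51, 8: 470.29, 16: 468.09, 32: 466.06}
patched = {1: 3008.89, 2: 1534.17, 4: 800.18, 8: 418.71, 16: 298.29, 32: 299.84}


def speedup(cores):
    """Ratio of trunk time to patched time; above 1 means the patch is faster."""
    return trunk[cores] / patched[cores]


if __name__ == "__main__":
    for cores in sorted(trunk):
        print("-j%-2d  speedup %.2fx" % (cores, speedup(cores)))
```

A ratio below 1 at one core corresponds to the small slowdown mentioned for XX=1.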

There is no effect on a full 'make -j32 -k check' as other goals run for much 
longer (to be looked at in a followup).

A second part of the patch is a new file 'contrib/generate_tcl_patterns.sh' 
which generates the needed regexp to do the split based on an input of the 
files in the target directory. It basically groups the initial characters such 
that each regexp tries not to exceed a maximum number of files. So, the number 
of files is used as a proxy for the runtime. While I don't feel too strongly about 
adding this (shell/gawk) script, it certainly is convenient, and makes sure 
that no characters are missing from the regexp. The maximum number of files per 
regexp is an input, testing (-j16) with 200, 300, 400 I found that 300 was 
optimal for testsuite/gfortran.dg, but this will depend on many things. 
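
The grouping idea described above can be sketched in a few lines of Python. This is an illustrative re-implementation, not the attached shell/gawk script: the function name `group_by_first_char` and the alphabetical tie-breaking are assumptions made for the example. Unlike the script, which appears to fold unused characters into the last group as well, the sketch only considers characters that actually occur.

```python
from collections import Counter


def group_by_first_char(names, maxcount):
    """Greedily group first characters so each group stays near maxcount files.

    Hypothetical helper for illustration only.
    Returns a list of (label, file_count) pairs.
    """
    counts = Counter(name[0] for name in names if name)
    # Most frequent first characters come first; ties are broken
    # alphabetically (an assumption of this sketch).
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

    groups = []
    label, total = "", 0
    for idx, (char, count) in enumerate(ordered):
        label += char
        total += count
        is_last = idx == len(ordered) - 1
        # Close the group when adding the next character would exceed the limit.
        if is_last or total + ordered[idx + 1][1] > maxcount:
            groups.append((label, total))
            label, total = "", 0
    return groups


if __name__ == "__main__":
    sample = ["pr%d.f90" % i for i in range(5)]
    sample += ["a1.f90", "a2.f90", "b1.f90", "c1.f90"]
    for label, total in group_by_first_char(sample, 4):
        pattern = label if len(label) == 1 else "[%s]" % label
        print("dg.exp=gfortran.dg/%s*  (%d files)" % (pattern, total))
```

A single character with more files than the limit (such as p in the sample) still forms its own group, which is why single-letter patterns remain the largest jobs.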

A sample run would look like

gcc/gcc/testsuite/gfortran.dg> ls -1 | 
../../../contrib/generate_tcl_patterns.sh 300 "dg.exp=gfortran.dg/"
Adding label:  p matching files:499
Adding label:  c matching files:497
Adding label:  a matching files:448
Adding label:  i matching files:350
Adding label:  d matching files:245
Adding label:  s matching files:211
Adding label:  b matching files:206
Adding label:  t matching files:180
Adding label:  f matching files:173
Adding label:  e matching files:166
Adding label:  r matching files:165
Adding label:  n matching files:162
Adding label:  mu matching files:278
Adding label:  wlgo matching files:284
Adding label:  vhzPkqWx_-9876543210ZYXVUTSRQONMLKJIHGFEDCBAyj matching files:94
patterns:
dg.exp=gfortran.dg/p* \
dg.exp=gfortran.dg/c* \
dg.exp=gfortran.dg/a* \
dg.exp=gfortran.dg/i* \
dg.exp=gfortran.dg/\[wlgo\]* \
dg.exp=gfortran.dg/\[mu\]* \
dg.exp=gfortran.dg/d* \
dg.exp=gfortran.dg/s* \
dg.exp=gfortran.dg/b* \
dg.exp=gfortran.dg/t* \
dg.exp=gfortran.dg/f* \
dg.exp=gfortran.dg/e* \
dg.exp=gfortran.dg/r* \
dg.exp=gfortran.dg/n* \
dg.exp=gfortran.dg/\[vhzPkqWx_-9876543210ZYXVUTSRQONMLKJIHGFEDCBAyj\]* \

Is the current attached patch OK for trunk ?

contrib/ChangeLog

2014-09-05  Joost VandeVondele  

   * generate_tcl_patterns.sh: New file.

gcc/fortran/ChangeLog

 2014-09-05  Joost VandeVondele  

   * Make-lang.in (check_gfortran_parallelize): Improved parallelism.


Index: contrib/generate_tcl_patterns.sh
===
--- contrib/generate_tcl_patterns.sh	(revision 0)
+++ contrib/generate_tcl_patterns.sh	(revision 0)
@@ -0,0 +1,86 @@
+#! /bin/sh
+
+#
+# based on a list of filenames as input,
+# generate regexps that match subsets trying to not exceed a
+# 'maxcount' parameter. Most useful to generate the
+# check_LANG_parallelize assignments needed to split
+# testsuite directories, defining prefix appropriately.
+#
+# Example usage:
+#   cd gcc/gcc/testsuite/gfortran.dg
+#   ls -1 | ../../../contrib/generate_tcl_patterns.sh 300 "dg.exp=gfortran.dg/"
+#
+# the first parameter is the maximum number of files.
+# the second parameter the prefix used for printing.
+#
+
+# Copyright (C) 2014 Free Software Foundation
+# Contributed by Joost VandeVondele 
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING.  If not, write to
+# the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+# Boston, MA 02110-1301, USA.
+
+gawk -v maxcount=$1 -v prefix=$2 '
+BEGIN{
+  

RE: [PATCH] RE: gcc parallel make check

2014-09-05 Thread VandeVondele Joost
>> > Please sort the letters (LC_ALL=C sort) and where consecutive, use ranges.
>> > Thus \[0-9A-Zhjqvx-z\]*

OK, works fine with the attached patch, and looks cleaner in Make-lang.in.
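
For reference, the requested "sort, then use ranges where consecutive" step could look roughly like the following Python sketch. It is not the code from the attached script; the name `compress_ranges` and the `min_run` threshold (the shortest run that is written as a range) are assumptions, and only ASCII letters and digits are handled.

```python
def compress_ranges(chars, min_run=3):
    """Sort characters in code-point order and collapse consecutive runs.

    Hypothetical helper; only ASCII letters and digits are accepted, so a
    literal '-' never has to appear inside the bracket expression.
    """
    ordered = sorted(set(chars))  # code-point order, like LC_ALL=C sort
    if not all(c.isascii() and c.isalnum() for c in ordered):
        raise ValueError("only ASCII letters and digits are supported")

    out = []
    i = 0
    while i < len(ordered):
        j = i
        # Extend the run while the code points are consecutive.
        while j + 1 < len(ordered) and ord(ordered[j + 1]) == ord(ordered[j]) + 1:
            j += 1
        run = ordered[i:j + 1]
        if len(run) >= min_run:
            out.append("%s-%s" % (run[0], run[-1]))
        else:
            out.extend(run)
        i = j + 1
    return "".join(out)


if __name__ == "__main__":
    label = "zyxvqjh" + "ABCDEFGHIJKLMNOPQRSTUVWXYZ" + "0123456789"
    print("dg.exp=gfortran.dg/\\[%s\\]*" % compress_ranges(label))
```

Because digits, upper-case and lower-case letters are not adjacent in ASCII, a run of consecutive code points never crosses from one class into another.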

Now, with the proper email address for gcc-patches... I wonder how many times 
I'll be punished for typos.

unmodified CL.

Joost



Index: contrib/generate_tcl_patterns.sh
===
--- contrib/generate_tcl_patterns.sh	(revision 0)
+++ contrib/generate_tcl_patterns.sh	(revision 0)
@@ -0,0 +1,108 @@
+#! /bin/sh
+
+#
+# based on a list of filenames as input,
+# generate regexps that match subsets trying to not exceed a
+# 'maxcount' parameter. Most useful to generate the
+# check_LANG_parallelize assignments needed to split
+# testsuite directories, defining prefix appropriately.
+#
+# Example usage:
+#   cd gcc/gcc/testsuite/gfortran.dg
+#   ls -1 | ../../../contrib/generate_tcl_patterns.sh 300 "dg.exp=gfortran.dg/"
+#
+# the first parameter is the maximum number of files.
+# the second parameter the prefix used for printing.
+#
+
+# Copyright (C) 2014 Free Software Foundation
+# Contributed by Joost VandeVondele 
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING.  If not, write to
+# the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+# Boston, MA 02110-1301, USA.
+
+gawk -v maxcount=$1 -v prefix=$2 '
+BEGIN{
+  # list of allowed starting chars for a file name in a dir to split
+  achars="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
+  ranget="112233"
+}
+{
+  nfiles++ ; files[nfiles]=$1
+}
+END{
+  for(i=1; i<=length(achars); i++) count[substr(achars,i,1)]=0
+  for(i=1; i<=nfiles; i++) {
+    if (length(files[i])>0) { count[substr(files[i],1,1)]++ }
+  };
+  asort(count,ordered)
+  countsingle=0
+  groups=0
+  label=""
+  for(i=length(achars);i>=1;i--) {
+    countsingle=countsingle+ordered[i]
+    for(j=1;j<=length(achars);j++) {
+      if(count[substr(achars,j,1)]==ordered[i]) found=substr(achars,j,1)
+    }
+    count[found]=-1
+    label=label found
+    if(i==1) { val=maxcount+1 } else { val=ordered[i-1] }
+    if(countsingle+val>maxcount) {
+      subset[label]=countsingle
+      print "Adding label: ", label, "matching files:" countsingle
+      groups++
+      countsingle=0
+      label=""
+    }
+  }
+  print "patterns:"
+  asort(subset,ordered)
+  for(i=groups;i>=1;i--) {
+    for(j in subset){
+      if(subset[j]==ordered[i]) found=j
+    }
+    subset[found]=-1
+    if (length(found)==1) {
+      printf("%s%s* \\\n",prefix,found)
+    } else {
+      sortandcompress()
+      printf("%s\\[%s\\]* \\\n",prefix,found)
+    }
+  }
+}
+function sortandcompress(i,n,tmp,bestj)
+{
+  n=length(found)
+  for(i=1; i<=n; i++) tmp[i]=substr(found,i,1)
+  asort(tmp)
+  for(i=1;i<=n;i++){
+    ipos=index(achars,tmp[i])
+    for(j=i;j<=n;j++){
+      jpos=index(achars,tmp[j])
+      if (jpos-ipos==j-i && substr(ranget,ipos,1)==substr(ranget,jpos,1)) bestj=j
+    }
+    if (bestj-i>3) {
+      tmp[i+1]="-"
+  for(j=i+2;j

RE: [PATCH] RE: gcc parallel make check

2014-09-08 Thread VandeVondele Joost
Attached is an extended version of the patch, it brings a 100% improvement in 
make -j32 -k check-gcc (down from 20min to <10min) by modification of 
check_gcc_parallelize.

It includes one non-trivial part, namely a split of the target exps. They are 
now all split using a common choice (based on i386), which I believe is 
reasonable as it is the target with most tests, and the patterns will be 
somewhat similar for other targets (e.g. split of p(rxxx)). The implementation 
of this in the makefile uses an odd looking technique to substitute spaces with 
commas in a variable, if this can be done more elegantly, I'm happy to make the 
change.

Bootstrap and testing revealed one issue, i386.exp hard-codes a loop for the 
testcase 'vect-args.c' in order to test 10 different combinations of options. 
With the current split (i.e. target x4) this test will thus be executed 4 
times. There are two easy options

1) keep the current setup, overhead is small
2) keep the .exp file simple and just replicate this test 10x 

I've selected 1), but I can update a patch with 2). Ideally dg-options in the 
testcase file itself could be repeated, but I haven't found an example of this. 

The script now includes sorting and compression of the ranges, and an 
additional sanity check on the input, i.e. that file names start with 
[0-9A-Za-z]. Some (few) files seem to start with _ or # (in ./gcc.dg/cpp/).

I'll follow up with a separate patch to improve check_g++_parallelize.

Full 'make -j32 -k check' is now dominated by libstdc++ testing, which contains 
single goals that run ~1100s (e.g. regex related tests). These use a slightly 
different syntax (see gcc/libstdc++-v3/testsuite/Makefile.am) and I'm not yet 
sure how to deal with the .am files.

current patch OK for trunk ?

Joost



patch-speedup-checkfortran-v05.CL
Index: contrib/generate_tcl_patterns.sh
===
--- contrib/generate_tcl_patterns.sh	(revision 0)
+++ contrib/generate_tcl_patterns.sh	(revision 0)
@@ -0,0 +1,114 @@
+#! /bin/sh
+
+#
+# based on a list of filenames as input, starting with [0-9A-Za-z],
+# generate regexps that match subsets trying to not exceed a
+# 'maxcount' parameter. Most useful to generate the
+# check_LANG_parallelize assignments needed to split
+# testsuite directories, defining prefix appropriately.
+#
+# Example usage:
+#   cd gcc/gcc/testsuite/gfortran.dg
+#   ls -1 | ../../../contrib/generate_tcl_patterns.sh 300 "dg.exp=gfortran.dg/"
+#
+# the first parameter is the maximum number of files.
+# the second parameter the prefix used for printing.
+#
+
+# Copyright (C) 2014 Free Software Foundation
+# Contributed by Joost VandeVondele 
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING.  If not, write to
+# the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+# Boston, MA 02110-1301, USA.
+
+gawk -v maxcount=$1 -v prefix=$2 '
+BEGIN{
+  # list of allowed starting chars for a file name in a dir to split
+  achars="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
+  ranget="112233"
+}
+{
+  if (index(achars,substr($1,1,1))==0){
+    print "file : " $1 " does not start with an allowed character."
+    _assert_exit = 1
+    exit 1
+  }
+  nfiles++ ; files[nfiles]=$1
+}
+END{
+  if (_assert_exit) exit 1
+  for(i=1; i<=length(achars); i++) count[substr(achars,i,1)]=0
+  for(i=1; i<=nfiles; i++) {
+    if (length(files[i])>0) { count[substr(files[i],1,1)]++ }
+  };
+  asort(count,ordered)
+  countsingle=0
+  groups=0
+  label=""
+  for(i=length(achars);i>=1;i--) {
+    countsingle=countsingle+ordered[i]
+    for(j=1;j<=length(achars);j++) {
+      if(count[substr(achars,j,1)]==ordered[i]) found=substr(achars,j,1)
+    }
+    count[found]=-1
+    label=label found
+    if(i==1) { val=maxcount+1 } else { val=ordered[i-1] }
+    if(countsingle+val>maxcount) {
+      subset[label]=countsingle
+      print "Adding label: ", label, "matching files:" countsingle
+      groups++
+      countsingle=0
+      label=""
+    }
+  }
+  print "patterns:"
+  asort(subset,ordered)
+  for(i=groups;i>=1;i--) {
+    for(j in subset){
+      if(subset[j]==ordered[i]) found=j
+    }
+    subset[found]=-1
+    if (length(found)==1) {
+      printf("%s%s* \\\n",prefix,found)
+    } else {
+      sortandcompress()
+   pri

RE: [PATCH] RE: gcc parallel make check

2014-09-09 Thread VandeVondele Joost
> +#   ls -1 | ../../../contrib/generate_tcl_patterns.sh 300
> "dg.exp=gfortran.dg/"
> 
> How does this work with subdirectories? Can we replace ls with find?

The input to the script is general, you can use this to your advantage. For 
example, I've been using:

 ls -1 g++.*/* | cut -c5- | ../../../contrib/generate_tcl_patterns.sh 700 
old-deja.exp=g++.old-deja/g++.

to split at a deeper level or

find . -name "[0-9A-Za-z]*" -type f -printf "%f\n" | 
../../../../contrib/generate_tcl_patterns.sh 300 dg-torture.exp=torture/

to collect statistics also from subdirs.

> +  if (_assert_exit) exit 1
>
> Haven't you already exited above?

yes, but the END{} block in awk is nevertheless executed, unless protected as 
above.

RE: [PATCH] RE: gcc parallel make check

2014-09-09 Thread VandeVondele Joost
> No.  As I wrote earlier, splitting on filenames and test counts only is only
> very rough split, all the splits really need to be backed out by real timing
> data from popular targets.  

I'm actually doing quite some testing trying to get a reasonable balance, 
checking 'completed in' in all *.log.sep files. However, it is important that 
the procedure is semi-automatic, otherwise few people will be interested in 
doing so. Furthermore, for parallel performance, it is not so important that 
times are distributed evenly (it is anyway unlikely the number of goals is 
exactly divided by N of -jN), but rather that the goals are ordered (executed) 
from slow to fast (similar to omp schedule guided). Most of the real 
bottlenecks are single letter patterns (e.g. p* since pr is such a common 
filename), and this is ultimately limiting.

In the project (CP2K) I'm working on, we also parallelize testing over 
directories, but we keep a list of approximate runtimes per directory, and keep 
that (global) list sorted. Testing follows that list. As a result, we have 
near-perfect parallel speedup, despite (or because of) per-directory timings 
ranging from a few hundred seconds down to 1 s.
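
To illustrate why the ordering matters more than an even split, here is a small Python simulation of handing goals to the first free worker, once with the short goals first and once with the long goal first. The runtimes are made up and `makespan` is a hypothetical helper; this is not how make schedules its jobs internally, only a model of the effect.

```python
import heapq


def makespan(durations, workers):
    """Finish time when each job, in list order, goes to the first free worker."""
    free_at = [0.0] * workers  # time at which each worker becomes idle
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)  # earliest idle worker takes the job
        heapq.heappush(free_at, start + d)
    return max(free_at)


if __name__ == "__main__":
    jobs = [1] * 8 + [8]  # made-up runtimes: eight short goals, one long one
    print("short first:", makespan(sorted(jobs), 2))
    print("long first: ", makespan(sorted(jobs, reverse=True), 2))
```

With two workers, the long goal started last leaves one worker idle at the end, while starting it first lets the short goals fill the other worker in the meantime.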

> Also, I'm afraid of some tests being left out
> unintentionally (e.g. the wildcards created at some point, then a new test
> is added with a weird starting character that hasn't been used before and
> suddenly it will not be tested with make -j?).

I agree this is an issue, partially addressed by not having to write patterns 
by hand anymore (i.e. a script does this), and by having the script check its 
input. There are something like 10 testnames that do not fall in [0-9A-Za-z], 
as mentioned in a previous email.


RE: [PATCH] RE: gcc parallel make check

2014-09-09 Thread VandeVondele Joost
> If you get whitespace right, one can provide multiple different wildcards to
> a single *.exp file, e.g.
> make check-gcc RUNTESTFLAGS="dg.exp='p[0-9A-Za-qs-z]* pr[9A-Za-z]*'" should
> cover all tests starting with p other than pr[0-8]*.c (where you could split
> say pr[0-2]* into another job, pr[3-5]* into another and pr[6-8]* into
> another.

I think this confirms that it becomes very delicate to try and write these more 
complex patterns. The above would miss p_test.c, p-1.c, etc ? 
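
A coverage check of this kind is easy to script. The sketch below uses Python's `fnmatch` as a stand-in for the matching done by the test harness, on the assumption that the two behave alike for simple bracket patterns such as these; the file names are invented and `uncovered` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def uncovered(names, patterns):
    """Return the names that match none of the glob patterns."""
    return [n for n in names if not any(fnmatchcase(n, p) for p in patterns)]


if __name__ == "__main__":
    # The two wildcards from the quoted suggestion, plus pr[0-8]* for the rest.
    patterns = ["p[0-9A-Za-qs-z]*", "pr[9A-Za-z]*", "pr[0-8]*"]
    names = ["pointer_1.c", "pr12345.c", "pr9.c", "p_test.c", "p-1.c"]
    print(uncovered(names, patterns))
```

Running such a check over the real directory listing whenever the patterns change would show immediately which tests are no longer matched by any job.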

For other classes of files the difference is even further down the filename 
(e.g. using dates as in 20020508-3.c going from 2000 to 2014, or avx*), making 
the automatic generation of the patterns more complicated.

I certainly don't want to claim that the patch I have now is perfect, it is 
rather an incremental improvement on the current setup.






RE: [PATCH] RE: gcc parallel make check

2014-09-09 Thread VandeVondele Joost
Now with gzipped figure.. why do these bounce ?

> But if there are jobs that just take 1s to complete, then clearly it doesn't
> make sense to split them off as separate job.  I think we don't need 100%
> even split, but at least roughly is highly desirable.

Let me add some data, attached is a graph (logscale y) showing the runtime of 
tests before and after my changes (including a new patch for c++). There is 
virtually no change for tests running shorter than 50s, only slowly running 
tests have been split.

Now, there are only very few slow tests remaining:

gcc_trunk/obj.new> find . -name "*.log" | xargs grep " completed in " | sort -n 
-k 5 | tail -n 10
./gcc/testsuite/gcc/gcc.log:testcase 
/data/vjoost/gnu/gcc_trunk/gcc/gcc/testsuite/gcc.dg/torture/dg-torture.exp 
completed in 521 seconds
./x86_64-unknown-linux-gnu/libstdc++-v3/testsuite/libstdc++.log:testcase 
/data/vjoost/gnu/gcc_trunk/gcc/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp
 completed in 530 seconds
./x86_64-unknown-linux-gnu/libstdc++-v3/testsuite/libstdc++.log:testcase 
/data/vjoost/gnu/gcc_trunk/gcc/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp
 completed in 553 seconds
./x86_64-unknown-linux-gnu/libgomp/testsuite/libgomp.log:testcase 
/data/vjoost/gnu/gcc_trunk/gcc/libgomp/testsuite/libgomp.fortran/fortran.exp 
completed in 561 seconds
./gcc/testsuite/gcc/gcc.log:testcase 
/data/vjoost/gnu/gcc_trunk/gcc/gcc/testsuite/gcc.c-torture/compile/compile.exp 
completed in 625 seconds
./x86_64-unknown-linux-gnu/libstdc++-v3/testsuite/libstdc++.log:testcase 
/data/vjoost/gnu/gcc_trunk/gcc/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp
 completed in 683 seconds
./gcc/testsuite/g++/g++.log:testcase 
/data/vjoost/gnu/gcc_trunk/gcc/gcc/testsuite/g++.dg/dg.exp completed in 702 
seconds
./x86_64-unknown-linux-gnu/libstdc++-v3/testsuite/libstdc++.log:testcase 
/data/vjoost/gnu/gcc_trunk/gcc/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp
 completed in 726 seconds
./gcc/testsuite/gcc/gcc.log:testcase 
/data/vjoost/gnu/gcc_trunk/gcc/gcc/testsuite/gcc.c-torture/execute/execute.exp 
completed in 752 seconds
./x86_64-unknown-linux-gnu/libstdc++-v3/testsuite/libstdc++.log:testcase 
/data/vjoost/gnu/gcc_trunk/gcc/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp
 completed in 904 seconds

They, of course, limit the ultimate speedup.

timings.png.gz


RE: [PATCH] RE: gcc parallel make check

2014-09-09 Thread VandeVondele Joost
Attached is a further revision of the patch, now dealing with check-c++. 
Roughly 50% speedup here at '-j32' (18m vs 12m). For my setup 
(--enable-languages=c,c++,fortran) I have now improved all targets called in 
'make -j32 -k check'. The latter is now 30% faster (15m vs 20m). Note that 
there are +- 1m fluctuations in these numbers, easily.

I currently have no plans to work on other check targets before this patch is 
committed.

OK for trunk ?

Joost







contrib/ChangeLog

2014-09-09  Joost VandeVondele  

* generate_tcl_patterns.sh: New file.

gcc/fortran/ChangeLog

2014-09-09  Joost VandeVondele  

* Make-lang.in (check_gfortran_parallelize): Improved parallelism.

gcc/ChangeLog

2014-09-09  Joost VandeVondele  

* Makefile.in (check_gcc_parallelize): Improved parallelism.
(check_p_numbers): Increase maximum value.
(dg_target_exps): Mention targets as separate words only.
(null,space,comma,dg_target_exps_p1,dg_target_exps_p2,
dg_target_exps_p3,dg_target_exps_p4): New variables.

gcc/cp/ChangeLog

2014-09-09  Joost VandeVondele  

* Make-lang.in (check_g++_parallelize): Improved parallelism.

libstdc++-v3/ChangeLog

2014-09-09  Joost VandeVondele  

* testsuite/Makefile.am (check_DEJAGNU_normal_targets): Add
check-DEJAGNUnormal[11-15].
(check-DEJAGNU): Split into 15 jobs for parallel testing.
* testsuite/Makefile.in: Regenerated.
Index: libstdc++-v3/testsuite/Makefile.am
===
--- libstdc++-v3/testsuite/Makefile.am	(revision 215017)
+++ libstdc++-v3/testsuite/Makefile.am	(working copy)
@@ -101,7 +101,7 @@ new-abi-baseline:
 	@test ! -f $*/site.exp || mv $*/site.exp $*/site.bak
 	@mv $*/site.exp.tmp $*/site.exp
 
-check_DEJAGNU_normal_targets = $(patsubst %,check-DEJAGNUnormal%,0 1 2 3 4 5 6 7 8 9 10)
+check_DEJAGNU_normal_targets = $(patsubst %,check-DEJAGNUnormal%,0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15)
 $(check_DEJAGNU_normal_targets): check-DEJAGNUnormal%: normal%/site.exp
 
 # Run the testsuite in normal mode.
@@ -111,7 +111,7 @@ check-DEJAGNU $(check_DEJAGNU_normal_tar
 	if [ -z "$*$(filter-out --target_board=%, $(RUNTESTFLAGS))" ] \
 	&& [ "$(filter -j, $(MFLAGS))" = "-j" ]; then \
 	  $(MAKE) $(AM_MAKEFLAGS) $(check_DEJAGNU_normal_targets); \
-	  for idx in 0 1 2 3 4 5 6 7 8 9 10; do \
+	  for idx in 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15; do \
 	mv -f normal$$idx/libstdc++.sum normal$$idx/libstdc++.sum.sep; \
 	mv -f normal$$idx/libstdc++.log normal$$idx/libstdc++.log.sep; \
 	  done; \
@@ -138,25 +138,35 @@ check-DEJAGNU $(check_DEJAGNU_normal_tar
 	fi; \
 	dirs="`cd $$srcdir; echo [013-9][0-9]_*/*`";; \
 	  normal1) \
-	dirs="`cd $$srcdir; echo [ab]* de* [ep]*/*`";; \
+	dirs="`cd $$srcdir; echo e*/*`";; \
 	  normal2) \
-	dirs="`cd $$srcdir; echo 2[01]_*/*`";; \
+	dirs="`cd $$srcdir; echo 28_*/a*`";; \
 	  normal3) \
-	dirs="`cd $$srcdir; echo 22_*/*`";; \
+	dirs="`cd $$srcdir; echo 23_*/[lu]*`";; \
 	  normal4) \
-	dirs="`cd $$srcdir; echo 23_*/[a-km-tw-z]*`";; \
+	dirs="`cd $$srcdir; echo 2[459]_*/*`";; \
 	  normal5) \
-	dirs="`cd $$srcdir; echo 23_*/[luv]*`";; \
+	dirs="`cd $$srcdir; echo 2[01]_*/*`";; \
 	  normal6) \
-	dirs="`cd $$srcdir; echo 2[459]_*/*`";; \
+	dirs="`cd $$srcdir; echo 23_*/[m-tw-z]*`";; \
 	  normal7) \
-	dirs="`cd $$srcdir; echo 26_*/* 28_*/[c-z]*`";; \
+	dirs="`cd $$srcdir; echo 26_*/*`";; \
 	  normal8) \
 	dirs="`cd $$srcdir; echo 27_*/*`";; \
 	  normal9) \
-	dirs="`cd $$srcdir; echo 28_*/[ab]*`";; \
+	dirs="`cd $$srcdir; echo 22_*/*`";; \
 	  normal10) \
 	dirs="`cd $$srcdir; echo t*/*`";; \
+	  normal11) \
+	dirs="`cd $$srcdir; echo 28_*/b*`";; \
+	  normal12) \
+	dirs="`cd $$srcdir; echo 28_*/[c-z]*`";; \
+	  normal13) \
+	dirs="`cd $$srcdir; echo de* p*/*`";; \
+	  normal14) \
+	dirs="`cd $$srcdir; echo [ab]* 23_*/v*`";; \
+	  normal15) \
+	dirs="`cd $$srcdir; echo 23_*/[a-k]*`";; \
 	esac; \
 	if [ -n "$*" ]; then cd "$*"; fi; \
 	if $(SHELL) -c "$$runtest --version" > /dev/null 2>&1; then \
Index: libstdc++-v3/testsuite/Makefile.in
===
--- libstdc++-v3/testsuite/Makefile.in	(revision 215017)
+++ libstdc++-v3/testsuite/Makefile.in	(working copy)
@@ -301,7 +301,7 @@ lists_of_files = \
 
 extract_symvers = $(glibcxx_builddir)/scripts/extract_symvers
 baseline_subdir := $(shell $(CXX) $(baseline_subdir_switch))
-check_DEJAGNU_normal_targets = $(patsubst %,check-DEJAGNUnormal%,0 1 2 3 4 5 6 7 8 9 10)
+check_DEJAGNU_normal_targets = $(patsubst %,check-DEJAGNUnormal%,0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15)
 
 # Runs the testsuite, but in compile only mode.
 # Can be used to test sources with non-GNU FE's at various warning
@@ -562,7 +562,7 @@ check-DEJAGNU $(check_DEJAGNU_normal_tar
 	if [ -z "$*$(filter-out --target_board=%, $(RUNTES

RE: [PATCH] RE: gcc parallel make check

2014-09-10 Thread VandeVondele Joost

Thanks for testing.

I explained the vect-args.c duplication earlier; it is indeed due to i386.exp 
hard-coding those runs.

The libstdc++ double counts didn't appear in my testing, but I'll have a look. 
Note that these patterns are handwritten, and therefore error-prone.

The long tests in libstdc++ come from (in timing order, from my machine):
  normal1) \
dirs="`cd $$srcdir; echo e*/*`";; \
  normal2) \
dirs="`cd $$srcdir; echo 28_*/a*`";; \
  normal3) \
dirs="`cd $$srcdir; echo 23_*/[lu]*`";; \
  normal4) \
dirs="`cd $$srcdir; echo 2[459]_*/*`";; \




RE: [PATCH] RE: gcc parallel make check

2014-09-10 Thread VandeVondele Joost
> You mean enhancing the script to split across arbitrarily long prefixes?
> That would be great.

I've now a script that does something like that:

~/test$ find /data/vjoost/gnu/gcc_trunk/gcc/gcc/testsuite/gfortran.dg/ 
-maxdepth 1 -type f -printf "%f\n" | ./generate_patterns.py 500 foo
All  3947  files matched the pattern ^[0-9A-Za-z_#+-]*([.][0-9A-Za-z_#+-]+)+ 
without exception
Final  12  patterns and match count:
(^[j-z_#+-][p-z_#+-][0-9A-Za-i][0-9A-Za-z_#+-]*([.][0-9A-Za-z_#+-]+)+|^[j-z_#+-][0-9A-Za-o][0-9A-Za-m]([.][0-9A-Za-z_#+-]+)+)
  matching  469  files
(^[0-9A-Za-i][0-9A-Za-n][0-9A-Za-n][0-9A-Za-o][0-9A-Za-z_#+-]*([.][0-9A-Za-z_#+-]+)+|^([.][0-9A-Za-z_#+-]+)+)
  matching  433  files
(^[j-z_#+-][0-9A-Za-o][n-z_#+-][0-9A-Za-z_#+-]*([.][0-9A-Za-z_#+-]+)+|^[0-9A-Za-i][0-9A-Za-n][o-z_#+-]([.][0-9A-Za-z_#+-]+)+)
  matching  400  files
(^[j-z_#+-][p-z_#+-][j-z_#+-][0-9A-Za-z_#+-]*([.][0-9A-Za-z_#+-]+)+|^[0-9A-Za-i]([.][0-9A-Za-z_#+-]+)+)
  matching  371  files
(^[0-9A-Za-i][o-z_#+-][s-z_#+-][0-9A-Za-z_#+-]*([.][0-9A-Za-z_#+-]+)+|^[0-9A-Za-i][0-9A-Za-n][0-9A-Za-n]([.][0-9A-Za-z_#+-]+)+)
  matching  323  files
(^[0-9A-Za-i][o-z_#+-][0-9A-Za-r][o-z_#+-][0-9A-Za-z_#+-]*([.][0-9A-Za-z_#+-]+)+|^[j-z_#+-][p-z_#+-]([.][0-9A-Za-z_#+-]+)+)
  matching  314  files
(^[0-9A-Za-i][o-z_#+-][0-9A-Za-r][0-9A-Za-n][0-9A-Za-z_#+-]*([.][0-9A-Za-z_#+-]+)+|^[j-z_#+-][0-9A-Za-o]([.][0-9A-Za-z_#+-]+)+)
  matching  314  files
(^[j-z_#+-][0-9A-Za-o][0-9A-Za-m][0-9A-Za-i][0-9A-Za-z_#+-]*([.][0-9A-Za-z_#+-]+)+|^[j-z_#+-]([.][0-9A-Za-z_#+-]+)+)
  matching  272  files
(^[0-9A-Za-i][0-9A-Za-n][0-9A-Za-n][p-z_#+-][0-9A-Za-z_#+-]*([.][0-9A-Za-z_#+-]+)+|^[0-9A-Za-i][o-z_#+-]([.][0-9A-Za-z_#+-]+)+)
  matching  270  files
(^[0-9A-Za-i][0-9A-Za-n][o-z_#+-][0-9A-Za-l][0-9A-Za-z_#+-]*([.][0-9A-Za-z_#+-]+)+|^[0-9A-Za-i][0-9A-Za-n]([.][0-9A-Za-z_#+-]+)+)
  matching  265  files
(^[0-9A-Za-i][0-9A-Za-n][o-z_#+-][m-z_#+-][0-9A-Za-z_#+-]*([.][0-9A-Za-z_#+-]+)+|^[0-9A-Za-i][o-z_#+-][0-9A-Za-r]([.][0-9A-Za-z_#+-]+)+)
  matching  260  files
^[j-z_#+-][0-9A-Za-o][0-9A-Za-m][j-z_#+-][0-9A-Za-z_#+-]*([.][0-9A-Za-z_#+-]+)+ 
 matching  256  files

It is a set of patterns that together match any file of the form 
'^[0-9A-Za-z_#+-]*([.][0-9A-Za-z_#+-]+)+', chosen such that they split a list of 
input files into roughly equal chunks (e.g. between 500/2 and 500 files in this 
example), even if files have long overlapping prefixes. However, I'm unsure 
if/how this can be integrated, i.e. what precisely is allowed for testsuite 
filenames, and whether this regexp format can be employed in the gcc makefiles / 
tcl / expect harness. Suggestions/help appreciated.
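
As a rough way to check such a pattern list, here is a minimal Python sketch 
(not part of the patch). It counts how many names each pattern claims and 
verifies that every name is claimed by exactly one pattern. The two patterns 
and the file names are illustrative assumptions; only the character classes 
are taken from the list above.

```python
import re

# Hypothetical two-way split on the first character, written in the style of
# the patterns above. The real list is generated and is much longer.
TAIL = r"[0-9A-Za-z_#+-]*([.][0-9A-Za-z_#+-]+)+"
PATTERNS = [
    r"^[0-9A-Za-i]" + TAIL,
    r"^[j-z_#+-]" + TAIL,
]


def bucket_counts(patterns, names):
    """Return per-pattern match counts.

    Raises ValueError if some name is matched by zero or by several
    patterns, i.e. if the patterns do not partition the names.
    """
    compiled = [re.compile(p) for p in patterns]
    counts = [0] * len(compiled)
    for name in names:
        hits = [i for i, rx in enumerate(compiled) if rx.match(name)]
        if len(hits) != 1:
            raise ValueError(f"{name!r} matched {len(hits)} patterns")
        counts[hits[0]] += 1
    return counts


if __name__ == "__main__":
    # Made-up file names, for illustration only.
    files = ["alloc_comp_1.f90", "class_45b.f03", "pointer_1.f90", "zero_size_1.f90"]
    print(bucket_counts(PATTERNS, files))
```

Running the same check over the real file list would show directly whether 
a generated pattern set leaves files untested or tests them twice.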





RE: [PATCH] RE: gcc parallel make check

2014-09-10 Thread VandeVondele Joost
Jakub,

> First of all, the -j2 testing shows more tests tested in gcc and libstdc++:
>
>-# of expected passes   10133
>+# of expected passes   10152
>
>+PASS: 23_containers/set/modifiers/erase/abi_tag.cc (test for excess errors)
>[...]
>
>Not sure where the bug is, could be e.g. in i386.exp for gcc, but for
>libstdc++ less likely to be there rather than in the split.

I looked into this, and believe this problem is already in current trunk, and 
not due to my patch. I.e. unmodified trunk also has these tests executed 
several times:

libstdc++-v3/testsuite/normal4/libstdc++.log.sep:PASS: 
23_containers/map/modifiers/erase/abi_tag.cc
libstdc++-v3/testsuite/normal1/libstdc++.log.sep:PASS: 
23_containers/map/modifiers/erase/abi_tag.cc

 I believe the current trunk pattern could indeed match those twice 
(Makefile.in in trunk):
  normal1) \
dirs="`cd $$srcdir; echo [ab]* de* [ep]*/*`";; \
  normal4) \
dirs="`cd $$srcdir; echo 23_*/[a-km-tw-z]*`";; \

could it be that the pattern in normal1 should have been '[ab]*/ de*/ [ep]*/*' ?
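
A small Python sketch of why the missing slash could matter. It assumes (this 
is not confirmed anywhere in this thread) that the harness treats each word in 
dirs as a substring-like filter on the test path. The word 'abi' stands for a 
hypothetical expansion of '[ab]*', and the first test path is made up.

```python
def selected(filters, tests):
    """Return the tests whose path contains at least one filter word.

    Assumption: plain substring matching stands in for whatever the real
    harness does with the words in 'dirs'.
    """
    return [t for t in tests if any(f in t for f in filters)]


TESTS = [
    "abi/some_test.cc",  # hypothetical test below a top-level abi/ directory
    "23_containers/map/modifiers/erase/abi_tag.cc",
]

if __name__ == "__main__":
    # Bare word, as '[ab]*' would expand: also picks up abi_tag.cc elsewhere.
    print(selected(["abi"], TESTS))
    # With a trailing slash, as '[ab]*/' would expand: only the abi/ directory.
    print(selected(["abi/"], TESTS))
```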


Joost




RE: [PATCH] RE: gcc parallel make check

2014-09-11 Thread VandeVondele Joost
> could it be that the pattern in normal1 should have been '[ab]*/ de*/ 
> [ep]*/*' ?

I've checked that this fixes the bug in the current trunk split. I.e. files are 
still tested, but now only once. Consider this change added to the previously 
submitted patch.



RE: [PATCH] RE: gcc parallel make check

2014-09-11 Thread VandeVondele Joost

>> could it be that the pattern in normal1 should have been '[ab]*/ de*/ 
>> [ep]*/*' ?
>
>Yes, we are running these tests multiple times:
>
>PASS: 23_containers/map/modifiers/erase/abi_tag.cc (test for excess errors)
>PASS: 23_containers/multimap/modifiers/erase/abi_tag.cc (test for excess 
>errors)
>PASS: 23_containers/multiset/modifiers/erase/abi_tag.cc (test for excess 
>errors)
>PASS: 23_containers/set/modifiers/erase/abi_tag.cc (test for excess errors)
>PASS: 26_numerics/complex/abi_tag.cc (test for excess errors)
>
>I'll fix that.

Actually, the proper pattern should presumably be '[ab]*/* de*/* [ep]*/*' even 
though it seems to make no difference in testing. I'll have this included in 
yet another version of the parallel make check patch (plus some further 
reshuffling as requested by Jakub), so I think there is no need for you to fix 
this now.


RE: [PATCH] gcc parallel make check

2014-09-11 Thread VandeVondele Joost
> Here is a patch I'm testing now:

Hi Jakub,

I also tested your patch to compare timings vs a newer patch (v8) I'll send soon

== patch v8 == make -j32 -k ==
check-fortran   4m58.178s
check-c++       ~10m
check-c         ~10m
check           15m29.873s

== patch Jakub ==
check-c++       ~20m
check-fortran   3m31.237s
check-c         8m8

on the positive side, your patch provides a further speedup e.g. fortran and c 
testing (where it splits things nicely). The libstdc++ bottleneck is not 
solved, but I guess that is expected.

As you have presumably found as well, your patch introduces a number of failures, 
because some tests seem to have additional dependencies, either explicit or 
implicit:

e.g. in gfortran.dg/binding_label_tests_10_main.f03
! { dg-do compile }
! This file must be compiled AFTER binding_label_tests_10.f03, which it 
! should be because dejagnu will sort the files.
module binding_label_tests_10_main

in gfortran.dg/class_45b.f03 
! { dg-do link }
! { dg-additional-sources class_45a.f03 }

This could clearly trigger as well in the current scheme of splitting, only we 
have been lucky that dependencies seem to be 'well behaved' in having the same 
initial letter in the filename.
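
To make the explicit dependencies visible to a splitting scheme, one could 
scan the test files for the directive. The following Python sketch is only an 
illustration under that assumption; the function name is made up, and it 
cannot detect implicit ordering dependencies such as the one in 
binding_label_tests_10_main.f03.

```python
import re

# Matches "{ dg-additional-sources file ... }" with or without quotes
# around the file list.
DIRECTIVE = re.compile(r'\{\s*dg-additional-sources\s+"?([^"}]+?)"?\s*\}')


def additional_sources(text):
    """Return the file names listed in dg-additional-sources directives."""
    names = []
    for match in DIRECTIVE.finditer(text):
        names.extend(match.group(1).split())
    return names


if __name__ == "__main__":
    header = "! { dg-do link }\n! { dg-additional-sources class_45a.f03 }\n"
    print(additional_sources(header))
```

A test and the files it names this way would then have to end up in the same 
chunk, whatever the split.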

Joost

RE: [PATCH] gcc parallel make check

2014-09-11 Thread VandeVondele Joost
> And these Fortran inter-test dependencies, which Tobias told me is
> PR56408.
> For PR56408 we need some fix.

BTW, is there anything special about Fortran ? There are at least 180 test 
files that contain 'dg-additional-sources' some in a very non-local way:

./objc.dg/foreach-2.m: /* { dg-additional-sources 
"../objc-obj-c++-shared/nsconstantstring-class-impl.m" } */

Joost

RE: [PATCH] gcc parallel make check

2014-09-11 Thread VandeVondele Joost

>>> For PR56408 we need some fix.
>> BTW, is there anything special about Fortran ? There are at least 180 test 
>> files that contain 'dg-additional-sources' some in a very non-local way:
> The current scheme comes at its limits in that case. See the files listed in 
> the PR for issues.

So, what about a pragmatic solution: move the tests that rely on being 
serialized to a subdirectory serialized/ where, as now, we rely on the 
implicit ordering ? At least it makes this assumption somewhat 
explicit.

Joost



RE: [PATCH] gcc parallel make check

2014-09-12 Thread VandeVondele Joost
> a newer patch (v8) I'll send soon

attached with updated changelog. Compared to the previously posted v6, only the 
libstdc++-v3/testsuite/Makefile.am has been refined to split the e*/* pattern a 
little more, and two quickly running goals have been merged, in addition to 
fixing the pre-existing error in some of the patterns in that file.

Checked by comparing testsuite results before and after.

Obviously, if Jakub's patch can be made to work around the testsuite special 
cases, I believe it should be superior. If not, the attached patch is working 
as far as I can tell, and provides a significant improvement over current trunk.

Joost

contrib/ChangeLog

2014-09-12  Joost VandeVondele  

* generate_tcl_patterns.sh: New file.

gcc/fortran/ChangeLog

2014-09-12  Joost VandeVondele  

* Make-lang.in (check_gfortran_parallelize): Improved parallelism.

gcc/ChangeLog

2014-09-12  Joost VandeVondele  

* Makefile.in (check_gcc_parallelize): Improved parallelism.
(check_p_numbers): Increase maximum value.
(dg_target_exps): Mention targets as separate words only.
(null,space,comma,dg_target_exps_p1,dg_target_exps_p2,
dg_target_exps_p3,dg_target_exps_p4): New variables.

gcc/cp/ChangeLog

2014-09-12  Joost VandeVondele  

* Make-lang.in (check_g++_parallelize): Improved parallelism.

libstdc++-v3/ChangeLog

2014-09-12  Joost VandeVondele  

* testsuite/Makefile.am (check_DEJAGNU_normal_targets): Add
check-DEJAGNUnormal[11-15].
(check-DEJAGNU): Split into 15 jobs for parallel testing, correct 
pattern.
* testsuite/Makefile.in: Regenerated.
Index: libstdc++-v3/testsuite/Makefile.in
===
--- libstdc++-v3/testsuite/Makefile.in	(revision 215147)
+++ libstdc++-v3/testsuite/Makefile.in	(working copy)
@@ -301,7 +301,7 @@ lists_of_files = \
 
 extract_symvers = $(glibcxx_builddir)/scripts/extract_symvers
 baseline_subdir := $(shell $(CXX) $(baseline_subdir_switch))
-check_DEJAGNU_normal_targets = $(patsubst %,check-DEJAGNUnormal%,0 1 2 3 4 5 6 7 8 9 10)
+check_DEJAGNU_normal_targets = $(patsubst %,check-DEJAGNUnormal%,0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15)
 
 # Runs the testsuite, but in compile only mode.
 # Can be used to test sources with non-GNU FE's at various warning
@@ -562,7 +562,7 @@ check-DEJAGNU $(check_DEJAGNU_normal_tar
 	if [ -z "$*$(filter-out --target_board=%, $(RUNTESTFLAGS))" ] \
 	&& [ "$(filter -j, $(MFLAGS))" = "-j" ]; then \
 	  $(MAKE) $(AM_MAKEFLAGS) $(check_DEJAGNU_normal_targets); \
-	  for idx in 0 1 2 3 4 5 6 7 8 9 10; do \
+	  for idx in 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15; do \
 	mv -f normal$$idx/libstdc++.sum normal$$idx/libstdc++.sum.sep; \
 	mv -f normal$$idx/libstdc++.log normal$$idx/libstdc++.log.sep; \
 	  done; \
@@ -589,25 +589,35 @@ check-DEJAGNU $(check_DEJAGNU_normal_tar
 	fi; \
 	dirs="`cd $$srcdir; echo [013-9][0-9]_*/*`";; \
 	  normal1) \
-	dirs="`cd $$srcdir; echo [ab]* de* [ep]*/*`";; \
+	dirs="`cd $$srcdir; echo experimental/* ext/[a-m]*`";; \
 	  normal2) \
-	dirs="`cd $$srcdir; echo 2[01]_*/*`";; \
+	dirs="`cd $$srcdir; echo 28_*/a*`";; \
 	  normal3) \
-	dirs="`cd $$srcdir; echo 22_*/*`";; \
+	dirs="`cd $$srcdir; echo 23_*/[lu]*`";; \
 	  normal4) \
-	dirs="`cd $$srcdir; echo 23_*/[a-km-tw-z]*`";; \
+	dirs="`cd $$srcdir; echo 2[459]_*/*`";; \
 	  normal5) \
-	dirs="`cd $$srcdir; echo 23_*/[luv]*`";; \
+	dirs="`cd $$srcdir; echo 2[01]_*/*`";; \
 	  normal6) \
-	dirs="`cd $$srcdir; echo 2[459]_*/*`";; \
+	dirs="`cd $$srcdir; echo 23_*/[m-tw-z]*`";; \
 	  normal7) \
-	dirs="`cd $$srcdir; echo 26_*/* 28_*/[c-z]*`";; \
+	dirs="`cd $$srcdir; echo 26_*/*`";; \
 	  normal8) \
 	dirs="`cd $$srcdir; echo 27_*/*`";; \
 	  normal9) \
-	dirs="`cd $$srcdir; echo 28_*/[ab]*`";; \
+	dirs="`cd $$srcdir; echo 22_*/*`";; \
 	  normal10) \
 	dirs="`cd $$srcdir; echo t*/*`";; \
+	  normal11) \
+	dirs="`cd $$srcdir; echo 28_*/b*`";; \
+	  normal12) \
+	dirs="`cd $$srcdir; echo 28_*/[c-z]*`";; \
+	  normal13) \
+	dirs="`cd $$srcdir; echo ext/[n-z]*`";; \
+	  normal14) \
+	dirs="`cd $$srcdir; echo de*/* p*/* [ab]*/* 23_*/v*`";; \
+	  normal15) \
+	dirs="`cd $$srcdir; echo 23_*/[a-k]*`";; \
 	esac; \
 	if [ -n "$*" ]; then cd "$*"; fi; \
 	if $(SHELL) -c "$$runtest --version" > /dev/null 2>&1; then \
Index: libstdc++-v3/testsuite/Makefile.am
===
--- libstdc++-v3/testsuite/Makefile.am	(revision 215147)
+++ libstdc++-v3/testsuite/Makefile.am	(working copy)
@@ -101,7 +101,7 @@ new-abi-baseline:
 	@test ! -f $*/site.exp || mv $*/site.exp $*/site.bak
 	@mv $*/site.exp.tmp $*/site.exp
 
-check_DEJAGNU_normal_targets = $(patsubst %,check-DEJAGNUnormal%,0 1 2 3 4 5 6 7 8 9 10)
+check_DEJAGNU_normal_targets = $(patsubst %,check-DEJAGNUnormal%,0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 1

RE: [PATCH] gcc parallel make check

2014-09-12 Thread VandeVondele Joost


>> Regtested on x86_64-linux, ok for trunk?
>
>Oh, forgot to say, PR56408 isn't fixed by this patch, but given the
>higher granularity (10 tests instead of 1) we don't happen to trigger it
>right now.

which means that any commit to that dir could trigger it, right ?

RE: [PATCH] gcc parallel make check

2014-09-12 Thread VandeVondele Joost
> So, I’d love to see the numbers for 5 and 20 to double check that 10 is the 
> right number to pick.  This sort of refinement is trivial post checkin.

So, some timings with the patch, I think this is great. 

Doing the testing you suggest, changing the variable doesn't influence things 
much (at least for Fortran, and  on this system).

make -j32 -k
check-fortran
real    3m27.875s -> gcc_runtest_parallelize_counter_minor == 02 (several 
testsuite errors: binding_label_tests_10_main.f03, 
binding_label_tests_11_main.f03, class_45b.f03, class_4b.f03, class_4c.f03, 
coarray_29_2.f90, test_common_binding_labels_3_main.f03)
real    3m26.234s -> gcc_runtest_parallelize_counter_minor == 05 (one 
additional testsuite error: whole_file_31.f90)
real    3m36.405s -> gcc_runtest_parallelize_counter_minor == 10
real    3m38.736s -> gcc_runtest_parallelize_counter_minor == 20
check-c
real    8m26.935s
check-c++
real    7m4.165s
check
real    17m45.185s




RE: [PATCH] gcc parallel make check

2014-09-16 Thread VandeVondele Joost
>> > These numbers are useful to try and ensure the overhead (scaling factor) 
>> > is reasonable, thanks.
>>
>> A nice improvement indeed.  The patched result is 15 times faster
>> than the serial unpatched run.  So there is room for improvement
>
> Note, the box used was oldish AMD 16-core, no ht, box, haven't tried it on 
> anything

on a 32 core box, no ht, I see these timings:

time make -j32 -k check >& log.check32 ; time make -j8 -k check >& log.check8

real    18m14.562s
user    260m21.578s
sys     264m26.042s

real    41m33.210s
user    233m4.563s
sys     72m11.429s

so it is not quite reaching the ideal 4x speedup (about 2.3x in wall-clock 
time). Counting the number of 'expect' processes, they are nicely at around 32 
and 8 for the full test, with only a very short tail near the end. So, there 
might be some overhead somewhere. Total user time is similar, but time in sys 
goes up (from about 72m to about 264m).
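
For reference, the ratios can be worked out from the 'time' output above with 
a few lines of Python. This is just arithmetic on the numbers quoted; the 
helper names are made up.

```python
import re


def seconds(stamp):
    """Convert a 'time'-style stamp such as '18m14.562s' to seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if m is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


def ratio(numerator, denominator):
    """Ratio of two 'time'-style stamps."""
    return seconds(numerator) / seconds(denominator)


if __name__ == "__main__":
    wall = ratio("41m33.210s", "18m14.562s")   # -j8 wall time over -j32 wall time
    print(f"wall-clock speedup -j8 -> -j32: {wall:.2f}x (ideal would be 4x)")
    print(f"fraction of ideal: {wall / 4:.0%}")
    print(f"sys time growth:   {ratio('264m26.042s', '72m11.429s'):.2f}x")
    print(f"user time growth:  {ratio('260m21.578s', '233m4.563s'):.2f}x")
```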

msan and gcc ?

2014-10-01 Thread VandeVondele Joost
Hi,

I've noticed that gcc includes a msan_interface.h file, and I'm wondering if 
this implies that memory sanitizer is already part of gcc. If not, are there 
plans to port this useful looking tool to gcc during the current stage 1 ?

Cheers,

Joost


RE: msan and gcc ?

2014-10-01 Thread VandeVondele Joost
> it was certainly worth it. 

since I see msan as a kind of valgrind replacement (similar functionality, but 
~10x the speed, partially at the cost of more difficult deployment), I did a 
quick search in gcc bugzilla. 982 PRs mention valgrind, so such functionality 
is clearly heavily used.

lto and gold

2009-08-15 Thread VandeVondele Joost



I'd like to test lto on a project where objects first go through an 
archive, and so wanted to follow 
http://gcc.gnu.org/wiki/LinkTimeOptimization

using 'gcc -use-linker-plugin'
However, I can't get this to work.

 gfortran  -use-linker-plugin -flto main.f90 test.f90
/data03/vondele/binutils-2.19.1/build/bin/ld: -plugin: unknown option
/data03/vondele/binutils-2.19.1/build/bin/ld: use the --help option for 
usage information

collect2: ld returned 1 exit status

/data03/vondele/binutils-2.19.1/build/bin/ld -v
GNU gold (GNU Binutils 2.19.1) 1.7

I guess this is some configure flag missing, does anybody have a clue?

gcc configured as:

/data03/vondele/gcc_lto/gcc/configure 
--prefix=/data03/vondele/gcc_lto/build 
--with-libelf=/data03/vondele/libelf-0.8.10/build/ --enable-gold 
--enable-languages=c,c++,fortran --disable-multilib -disable-bootstrap


binutils as:
./configure --prefix=/data03/vondele/binutils-2.19.1/build --enable-gold

This is what collect2 sees:
/data03/vondele/gcc_lto/build/libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/collect2 
-plugin 
/data03/vondele/gcc_lto/build/libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/liblto_plugin.so 
-plugin-opt=/data03/vondele/gcc_lto/build/libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/lto-wrapper 
-plugin-opt=gfortran -plugin-opt=-flto -flto --eh-frame-hdr -m elf_x86_64 
-dynamic-linker /lib64/ld-linux-x86-64.so.2 -use-linker-plugin 
/usr/lib/../lib64/crt1.o /usr/lib/../lib64/crti.o 
/data03/vondele/gcc_lto/build/lib/gcc/x86_64-unknown-linux-gnu/4.5.0/crtbegin.o 
-L/data03/vondele/gcc_lto/build/lib/gcc/x86_64-unknown-linux-gnu/4.5.0 
-L/data03/vondele/gcc_lto/build/lib/gcc/x86_64-unknown-linux-gnu/4.5.0/../../../../lib64 
-L/lib/../lib64 -L/usr/lib/../lib64 
-L/data03/vondele/gcc_lto/build/lib/gcc/x86_64-unknown-linux-gnu/4.5.0/../../.. 
/tmp/ccUQ7wr3.o /tmp/cczQrSMz.o -lgfortran -lm -lgcc_s -lgcc -lc -lgcc_s 
-lgcc 
/data03/vondele/gcc_lto/build/lib/gcc/x86_64-unknown-linux-gnu/4.5.0/crtend.o 
/usr/lib/../lib64/crtn.o


Thanks,

Joost



Re: lto and gold

2009-08-16 Thread VandeVondele Joost




>> I guess this is some configure flag missing, does anybody have a clue?
>
> Yes, you must build with --enable-gold --enable-plugins :-)



Is that for gcc or for binutils (neither documents this in ./configure 
--help) ?


I used it for both, but only get this to work with binutils CVS, is that 
correct ?


Now, however, I get the following error:

gfortran -flto -use-linker-plugin main.f90 test1.f90 test2.f90
collect2: ld terminated with signal 6 [Aborted]
ld: /data03/vondele/gcc_lto/gcc/lto-plugin/lto-plugin.c:142: 
parse_table_entry: Assertion `t <= 4' failed.


with

==> main.f90 <==
CALL S1
CALL S2
END

==> test1.f90 <==
SUBROUTINE S1
END SUBROUTINE

==> test2.f90 <==
SUBROUTINE S2
END SUBROUTINE S2

and similar for C based sources.

Thanks,

Joost


Re: Reducing fortran testcase with delta.

2009-10-30 Thread VandeVondele Joost

Hi Li,

I've attached a 'Fortran-aware' delta. It tries to cut a Fortran file 
in more reasonable places (e.g. between subroutine boundaries, after 
enddos). It works reasonably well, but is a hack.


Especially with Fortran90 and modules, iterated delta runs can help a lot 
(i.e. the first run removes 'public/use' module statements, the next round 
cleans more efficiently). It also features 'randomized' bisection. That helps 
to reduce towards a minimized testcase when iterating delta runs.
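
The idea of iterating a randomized reduction can be sketched in a few lines of 
Python. This is not the attached Perl script and knows nothing about Fortran; 
it is a generic line-based illustration, and all names in it are made up.

```python
import random


def reduce_lines(lines, interesting, rng=None, rounds=30):
    """Greedily drop chunks of lines while interesting(candidate) stays true.

    Chunks are tried in random order, and the whole procedure is repeated
    (up to 'rounds' times) until a round no longer shrinks the input.
    """
    rng = rng or random.Random(0)
    current = list(lines)
    for _ in range(rounds):
        before = len(current)
        chunk = max(1, len(current) // 2)
        while chunk >= 1:
            starts = list(range(0, len(current), chunk))
            rng.shuffle(starts)          # the 'randomized' part
            removed = False
            for s in starts:
                candidate = current[:s] + current[s + chunk:]
                if candidate and interesting(candidate):
                    current = candidate
                    removed = True
                    break                # offsets are stale now, start over
            if not removed:
                chunk //= 2              # nothing removable, try smaller chunks
        if len(current) == before:
            break                        # fixed point reached
    return current


if __name__ == "__main__":
    source = ["a", "b", "foo", "c", "d"]
    print(reduce_lines(source, lambda ls: "foo" in ls))
```

Here the 'interesting' callback plays the role of the test script: it should 
return true for a candidate that still shows the bug.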


I usually call it with the following script:


cat do_many

for i in `seq 1 30`
do
  ~/delta-2006.08.03/delta -suffix=.f90 -test=delta.script \
    -cp_minimal=small.f90 bug.f90
  cp small.f90 small.f90.$i
  cp small.f90 bug.f90
done

Cheers,

Joost
#!/usr/bin/perl -w
# delta; see License.txt for copyright and terms of use

use strict;

# 
# Implementation of the delta debugging algorithm:
# http://www.st.cs.uni-sb.de/dd/
# Daniel S. Wilkerson d...@cs.berkeley.edu

# Notes:

# The test script should not depend on the current directory to work.

# Note that 1-minimality does not imply idempotency, so we could
# re-run once it is stuck, perhaps with some randomization.


# Global State 

my @chunks = ();        # Once input, is read only.
my @markers = ();       # Delimits a dynamic subsequence of @chunks being considered.
my %test_cache = ();    # Cached test results.

# Mark boundaries that uniquely determine the marked contents.  This
# is used as a shorter key to hash on than the contents themselves.
# Since Perl hashes retain their keys if you don't do this you get a
# horrible memory leak in the test_cache.
my $mark_signature;

# End of the last marker rendered to the tmp file.  Used to figure out
# if the next one abuts it or not.
my $last_mark_stop;
my @current_markers;    # Markers to be rendered to $tmpinput if answer not in cache.

my $tmpinput;           # Temporary file to render marked subsequence to.
my $last_successful_tmpinput;   # Last one to pass the test.

my $tmp_index = 0;      # Cache the last index used to make a tmp file.
my $tmpdir_index = 0;   # Cache the last index used to make a tmp directory.
my $tmpdir;             # Temporary directory for external programs.
my $logfile = "log";    # File in $tmpdir where log of successful runs is written.
chomp (my $this_dir = `pwd`);   # The current directory.
my $starttime = time;   # The time we started.

my $granularity = "line";   # What is the size of an input chunk?
my $dump_input = 0;         # Dump out the input after reading it in.
my $cp_minimal;             # Copy the minimal successful test to the current dir.
my $verbose = 0;            # Be more verbose.
my $quiet = 0;              # Prints go to /dev/null.
my $suffix = ".c";          # For now, our input files are .c files.
my $test;                   # The script to run as the test.

# when true, all operations on input file are in-place:
#   - don't make a new directory
#   - overwrite the original input file with our constructed inputs
my $in_place = 0;
my $start_file; # name of input/output file for in_place

my $help_message = <<"END"

Delta version 2003.7.14
delta implements the delta-debugging algorithm:
  http://www.st.cs.uni-sb.de/dd/
Implemented by Daniel Wilkerson.

usage: $0 [options] start-file

-test=   Specify the test script.
-suffix= Candidate filename suffix [$suffix]
-dump_input  Dump input after reading
-cp_minimal=   Copy the minimal successful test to the
 current directory
-granularity=lineUse lines as the granularity (default)
-granularity=top_formUse C top-level forms as the granularity
 (currently only works with CIL output)
-log=  Log file for main events
-quiet   Say nothing
-verbose Get more verbose output
-in_placeOverwrite start-file with inputs

-helpGet help

The test program accepts a single argument, the name of the candidate
file to test.  It is run within a directory containing only that file,
and it can make temporary files/directories in that directory.  It
should return zero for a candidate that exhibits the desired property,
and nonzero for one that does not.

Example test program (delta will retain a line containing "foo"):
  #!/bin/sh
  grep 'foo' <"\$1" >/dev/null

END
;

# Functions 

sub output(@) {
print @_ unless $quiet;
}

# Return true if the current_markers pass the interesting test.
sub test {
if (-f "DELTA-STOP") {
output "Stopping because DELTA-STOP file exists\n";
exit 1;
}

my $cached_result = $test_cache{$mark_signature};
if (defined $cached_result) {
output

trunk bootstrap failure?

2008-12-17 Thread VandeVondele Joost

Current trunk fails for me with

/data04/vondele/gcc_trunk/obj/./gcc/xgcc 
-B/data04/vondele/gcc_trunk/obj/./gcc/ 
-B/data04/vondele/gcc_trunk/build/x86_64-unknown-linux-gnu/bin/ 
-B/data04/vondele/gcc_trunk/build/x86_64-unknown-linux-gnu/lib/ -isystem 
/data04/vondele/gcc_trunk/build/x86_64-unknown-linux-gnu/include -isystem 
/data04/vondele/gcc_trunk/build/x86_64-unknown-linux-gnu/sys-include -g 
-O2 -O2  -g -O2 -DIN_GCC   -W -Wall -Wwrite-strings -Wstrict-prototypes 
-Wmissing-prototypes -Wcast-qual -Wold-style-definition  -isystem 
./include  -fPIC -g -DHAVE_GTHR_DEFAULT -DIN_LIBGCC2 
-D__GCC_FLOAT_NOT_NEEDED   -I. -I. -I../.././gcc 
-I/data04/vondele/gcc_trunk/gcc/libgcc 
-I/data04/vondele/gcc_trunk/gcc/libgcc/. 
-I/data04/vondele/gcc_trunk/gcc/libgcc/../gcc 
-I/data04/vondele/gcc_trunk/gcc/libgcc/../include 
-I/data04/vondele/gcc_trunk/gcc/libgcc/config/libbid 
-DENABLE_DECIMAL_BID_FORMAT -DHAVE_CC_TLS -DUSE_TLS -o _trampoline.o -MT 
_trampoline.o -MD -MP -MF _trampoline.dep -DL_trampoline -c 
/data04/vondele/gcc_trunk/gcc/libgcc/../gcc/libgcc2.c \

  -fvisibility=hidden -DHIDE_EXPORTS
In file included from /usr/include/features.h:354,
 from /usr/include/stdio.h:28,
 from 
/data04/vondele/gcc_trunk/gcc/libgcc/../gcc/tsystem.h:90,
 from 
/data04/vondele/gcc_trunk/gcc/libgcc/../gcc/libgcc2.c:33:
/usr/include/gnu/stubs.h:7:27: error: gnu/stubs-32.h: No such file or 
directory

In file included from /usr/include/features.h:354,
 from /usr/include/stdio.h:28,
 from 
/data04/vondele/gcc_trunk/gcc/libgcc/../gcc/tsystem.h:90,
 from 
/data04/vondele/gcc_trunk/gcc/libgcc/../gcc/libgcc2.c:33:
/usr/include/gnu/stubs.h:7:27: error: gnu/stubs-32.h: No such file or 
directory

In file included from /usr/include/features.h:354,
 from /usr/include/stdio.h:28,
 from 
/data04/vondele/gcc_trunk/gcc/libgcc/../gcc/tsystem.h:90,
 from 
/data04/vondele/gcc_trunk/gcc/libgcc/../gcc/libgcc2.c:33:
/usr/include/gnu/stubs.h:7:27: error: gnu/stubs-32.h: No such file or 
directory

In file included from /usr/include/features.h:354,
 from /usr/include/stdio.h:28,
 from 
/data04/vondele/gcc_trunk/gcc/libgcc/../gcc/tsystem.h:90,
 from 
/data04/vondele/gcc_trunk/gcc/libgcc/../gcc/libgcc2.c:33:
/usr/include/gnu/stubs.h:7:27: error: gnu/stubs-32.h: No such file or 
directory

make[5]: *** [_muldi3.o] Error 1
make[5]: *** Waiting for unfinished jobs
/data04/vondele/gcc_trunk/obj/./gcc/xgcc 
-B/data04/vondele/gcc_trunk/obj/./gcc/ 
-B/data04/vondele/gcc_trunk/build/x86_64-unknown-linux-gnu/bin/ 
-B/data04/vondele/gcc_trunk/build/x86_64-unknown-linux-gnu/lib/ -isystem 
/data04/vondele/gcc_trunk/build/x86_64-unknown-linux-gnu/include -isystem 
/data04/vondele/gcc_trunk/build/x86_64-unknown-linux-gnu/sys-include -g 
-O2 -O2  -g -O2 -DIN_GCC   -W -Wall -Wwrite-strings -Wstrict-prototypes 
-Wmissing-prototypes -Wcast-qual -Wold-style-definition  -isystem 
./include  -fPIC -g -DHAVE_GTHR_DEFAULT -DIN_LIBGCC2 
-D__GCC_FLOAT_NOT_NEEDED   -I. -I. -I../.././gcc 
-I/data04/vondele/gcc_trunk/gcc/libgcc 
-I/data04/vondele/gcc_trunk/gcc/libgcc/. 
-I/data04/vondele/gcc_trunk/gcc/libgcc/../gcc 
-I/data04/vondele/gcc_trunk/gcc/libgcc/../include 
-I/data04/vondele/gcc_trunk/gcc/libgcc/config/libbid 
-DENABLE_DECIMAL_BID_FORMAT -DHAVE_CC_TLS -DUSE_TLS -o __main.o -MT 
__main.o -MD -MP -MF __main.dep -DL__main -c 
/data04/vondele/gcc_trunk/gcc/libgcc/../gcc/libgcc2.c \

  -fvisibility=hidden -DHIDE_EXPORTS
make[5]: *** [_negdi2.o] Error 1
/data04/vondele/gcc_trunk/obj/./gcc/xgcc 
-B/data04/vondele/gcc_trunk/obj/./gcc/ 
-B/data04/vondele/gcc_trunk/build/x86_64-unknown-linux-gnu/bin/ 
-B/data04/vondele/gcc_trunk/build/x86_64-unknown-linux-gnu/lib/ -isystem 
/data04/vondele/gcc_trunk/build/x86_64-unknown-linux-gnu/include -isystem 
/data04/vondele/gcc_trunk/build/x86_64-unknown-linux-gnu/sys-include -g 
-O2 -O2  -g -O2 -DIN_GCC   -W -Wall -Wwrite-strings -Wstrict-prototypes 
-Wmissing-prototypes -Wcast-qual -Wold-style-definition  -isystem 
./include  -fPIC -g -DHAVE_GTHR_DEFAULT -DIN_LIBGCC2 
-D__GCC_FLOAT_NOT_NEEDED   -I. -I. -I../.././gcc 
-I/data04/vondele/gcc_trunk/gcc/libgcc 
-I/data04/vondele/gcc_trunk/gcc/libgcc/. 
-I/data04/vondele/gcc_trunk/gcc/libgcc/../gcc 
-I/data04/vondele/gcc_trunk/gcc/libgcc/../include 
-I/data04/vondele/gcc_trunk/gcc/libgcc/config/libbid 
-DENABLE_DECIMAL_BID_FORMAT -DHAVE_CC_TLS -DUSE_TLS -o _absvsi2.o -MT 
_absvsi2.o -MD -MP -MF _absvsi2.dep -DL_absvsi2 -c 
/data04/vondele/gcc_trunk/gcc/libgcc/../gcc/libgcc2.c \

  -fvisibility=hidden -DHIDE_EXPORTS
make[5]: *** [_lshrdi3.o] Error 1
/data04/vondele/gcc_trunk/obj/./gcc/xgcc 
-B/data04/vondele/gcc_trunk/obj/./gcc/ 
-B/data04/vondele/gcc_trunk/build/x86_64-unknown-linux-gnu/bin/ 
-B/data04/vondele/gcc_trunk/build/x86_64-unknown-linux-gnu/lib/ -isystem 
/data04/von

Re: trunk bootstrap failure?

2008-12-17 Thread VandeVondele Joost

>> that is on a standard linux (x86_64) box running opensuse 11.0, and a
>> clean checkout. Is this a known problem?
>
> You haven't installed the 32-bit glibc devel package.



Many thanks, that fixed it.

Would be great if such a thing could be detected at configure time (i.e. 
like missing mpfr.h headers are already detected), with some kind of a 
gentle error message.


Re: trunk bootstrap failure?

2008-12-17 Thread VandeVondele Joost

>> Would be great if such a thing could be detected at configure time (i.e.
>> like missing mpfr.h headers are already detected), with some kind of a
>> gentle error message.
>
> It wouldn't be detected until the target libs are built, since that's the
> first time any 32-bit headers are needed.
>
> Patches welcome.



Is this useful ?

Index: install.texi
===
--- install.texi(revision 142790)
+++ install.texi(working copy)
@@ -4070,6 +4070,7 @@
 (amd64-*-* is an alias for x86_64-*-*) on GNU/Linux, FreeBSD and net...@.
 On GNU/Linux the default is a bi-arch compiler which is able to generate
 both 64-bit x86-64 and 32-bit x86 code (via the @option{-m32} switch).
+This requires that both 32 and 64 bit header files are installed on the system.


 @html
 


also, this likely fixes a typo

Index: cvs.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/cvs.html,v
retrieving revision 1.213
diff -c -p -r1.213 cvs.html
*** cvs.html30 Dec 2007 09:01:19 -  1.213
--- cvs.html17 Dec 2008 12:04:09 -
*** and SSH installed, you can check out the
*** 36,42 
   Set CVS_RSH in your environment to ssh.
   Set CVSROOT in your environment to
   :pserver:c...@gcc.gnu.org:/cvs/gcc. 
!  Alternately add
   -d :pserver:c...@gcc.gnu.org:/cvs/gcc
   immediately after cvs in the commands below.
   The command cvs -qz9 checkout -P wwwdocs,
--- 36,42 
   Set CVS_RSH in your environment to ssh.
   Set CVSROOT in your environment to
   :pserver:c...@gcc.gnu.org:/cvs/gcc. 
!  Alternatively add
   -d :pserver:c...@gcc.gnu.org:/cvs/gcc
   immediately after cvs in the commands below.
   The command cvs -qz9 checkout -P wwwdocs,



Re: gfortran / gdb question

2009-02-02 Thread VandeVondele Joost
actually, I just found out that this seems to be a 4.4 issue; compiled with 4.3 
the gdb session just goes fine... I also seem to be able to debug small examples 
with either 4.3 or 4.4, just CP2K seems to cause troubles (as usual ;-)


I've filed PR39073 for this, somehow hope this can be solved before 
release (ugh.. show it is not Fortran (I've made it debug) and declare it 
P1?)


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39073





optimization question

2009-05-16 Thread VandeVondele Joost
the attached code (see contract__sparse) is a kernel which I hope gets 
optimized well. Unfortunately, compiling (on opteron or core2) it as


gfortran -O3 -march=native -ffast-math -funroll-loops 
-ffree-line-length-200  test.f90



./a.out

 Sparse: time[s]   0.66804099
 New: time[s]   0.20801300
 speedup3.2115347
  Glfops3.1151900
 Error:   1.11022302462515654E-016

shows that the hand-optimized version (see contract__test) is about 3x 
faster. I played around with options, but couldn't get gcc to generate 
fast code for the original source. I think that this would involve 
unrolling a loop and scalarizing the scratch arrays buffer1 and buffer2 
(as done in the hand-optimized version). So, is there any combination of 
options to get that effect?


Second question, even the code generated for the hand-optimized version is 
not quite ideal. The asm of the inner loop appears (like the source) to 
contain about 4*81 multiplies. However, a 'smarter' way to do the 
calculation would be to compute the constants used for multiplying work(i) 
by retaining common subexpressions (i.e. all values of sa_i * sb_j * sc_k 
* sd_l * work[n] can be computed in 9+9+81+81 multiplies instead of the 
current scheme, which has 4*81). That could bring another factor of 2 
speedup. Is there a chance to have gcc see this, or does this need to be 
done on the source level ?
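
To make the multiply counts concrete, here is a small Python sketch. It 
assumes three non-zero coefficients per index (as in the attached kernel) and 
a flat work array indexed in the same order as the products; the function 
names and the test values are made up for illustration.

```python
from itertools import product


def naive(sa, sb, sc, sd, work):
    """Form every sa_i*sb_j*sc_k*sd_l*work[n] from scratch: 4 multiplies each."""
    out, mults = [], 0
    for n, (a, b, c, d) in enumerate(product(sa, sb, sc, sd)):
        out.append(a * b * c * d * work[n])
        mults += 4
    return out, mults


def factored(sa, sb, sc, sd, work):
    """Reuse common subexpressions: 9 + 9 + 81 + 81 multiplies."""
    ab = [a * b for a, b in product(sa, sb)]      # 9 products sa_i*sb_j
    cd = [c * d for c, d in product(sc, sd)]      # 9 products sc_k*sd_l
    abcd = [x * y for x, y in product(ab, cd)]    # 81 combined constants
    out = [x * w for x, w in zip(abcd, work)]     # 81 multiplies with work
    return out, len(ab) + len(cd) + len(abcd) + len(out)


if __name__ == "__main__":
    # Small integers keep the comparison exact (no rounding differences).
    sa, sb, sc, sd = [1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 3, 5]
    work = list(range(1, 82))
    n_out, n_mults = naive(sa, sb, sc, sd, work)
    f_out, f_mults = factored(sa, sb, sc, sd, work)
    print(n_mults, f_mults, n_out == f_out)
```

With floating-point data the two variants associate the products differently, 
so results can differ in the last bits. That is presumably why a compiler 
would need something like -ffast-math before attempting such a rewrite.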


If considered useful, I can add a PR to bugzilla with the testcase.

Joost

MODULE TEST
  IMPLICIT NONE
  INTEGER :: l
  INTEGER, PARAMETER :: dp=8
  INTEGER, PARAMETER :: nco(0:3)=(/((l+1)*(l+2)/2,l=0,3)/)
  INTEGER, PARAMETER :: nso(0:3)=(/(2*l+1,l=0,3)/)
CONTAINS
  SUBROUTINE contract__sparse(work, &
nl_a, nl_b, nl_c, nl_d,&
sphi_a, sphi_b, sphi_c, sphi_d,&
primitives,&
s_offset_a, s_offset_b, s_offset_c, s_offset_d)
REAL(dp), DIMENSION(3*3*3*3), INTENT(IN) :: work
INTEGER :: nl_a, nl_b, nl_c, nl_d
REAL(dp), DIMENSION(3,3*nl_a), INTENT(IN) :: sphi_a
REAL(dp), DIMENSION(3,3*nl_b), INTENT(IN) :: sphi_b
REAL(dp), DIMENSION(3,3*nl_c), INTENT(IN) :: sphi_c
REAL(dp), DIMENSION(3,3*nl_d), INTENT(IN) :: sphi_d
REAL(dp), DIMENSION(3*nl_a, 3*nl_b,3*nl_c,3*nl_d) :: primitives
INTEGER, INTENT(IN) :: s_offset_a, s_offset_b, s_offset_c, s_offset_d
REAL(dp), DIMENSION(3* 3*3*3) :: buffer1, buffer2
INTEGER :: imax,jmax,kmax, ia, ib, ic, id, s_offset_a1, s_offset_b1, s_offset_c1, s_offset_d1,&
  i1 ,i2, i3, i, j, k

s_offset_a1 = 0
DO ia = 1,nl_a
  s_offset_b1 = 0
  DO ib = 1,nl_b
s_offset_c1 = 0
DO ic = 1,nl_c
  s_offset_d1 = 0
  DO id = 1,nl_d
buffer1 = 0.0_dp
imax=3*3*3
jmax=3
kmax=3
DO i=1,imax
buffer1(i+imax*(3-1)) = buffer1(i+imax*(3-1)) + work(1+(i-1)*kmax) * sphi_a(1,3+s_offset_a1)
buffer1(i+imax*(1-1)) = buffer1(i+imax*(1-1)) + work(2+(i-1)*kmax) * sphi_a(2,1+s_offset_a1)
buffer1(i+imax*(2-1)) = buffer1(i+imax*(2-1)) + work(3+(i-1)*kmax) * sphi_a(3,2+s_offset_a1)
ENDDO
buffer2 = 0.0_dp
imax=3*3*3
jmax=3
kmax=3
DO i=1,imax
buffer2(i+imax*(3-1)) = buffer2(i+imax*(3-1)) + buffer1(1+(i-1)*kmax) * sphi_b(1,3+s_offset_b1)
buffer2(i+imax*(1-1)) = buffer2(i+imax*(1-1)) + buffer1(2+(i-1)*kmax) * sphi_b(2,1+s_offset_b1)
buffer2(i+imax*(2-1)) = buffer2(i+imax*(2-1)) + buffer1(3+(i-1)*kmax) * sphi_b(3,2+s_offset_b1)
ENDDO
buffer1 = 0.0_dp
imax=3*3*3
jmax=3
kmax=3
DO i=1,imax
buffer1(i+imax*(3-1)) = buffer1(i+imax*(3-1)) + buffer2(1+(i-1)*kmax) * sphi_c(1,3+s_offset_c1)
buffer1(i+imax*(1-1)) = buffer1(i+imax*(1-1)) + buffer2(2+(i-1)*kmax) * sphi_c(2,1+s_offset_c1)
buffer1(i+imax*(2-1)) = buffer1(i+imax*(2-1)) + buffer2(3+(i-1)*kmax) * sphi_c(3,2+s_offset_c1)
ENDDO
imax=3*3*3
jmax=3
kmax=3
i = 0
DO i1=1,3
DO i2=1,3
DO i3=1,3
  i = i + 1
primitives(s_offset_a1+i3, s_offset_b1+i2, s_offset_c1+i1, s_offset_d1+3) =&
primitives(s_offset_a1+i3, s_offset_b1+i2, s_offset_c1+i1, s_offset_d1+3) &
+ buffer1(1+(i-1)*kmax) * sphi_d(1,3+s_offset_d1)
primitives(s_offset_a1+i3, s_offset_b1+i2, s_offset_c1+i1, s_offset_d1+1) =&
primitives(s_offset_a1+i3, s_offset_b1+i2, s_offset_c1+i1, s_offset_d1+1) &
+ buffer1(2+(i-1)*kmax) * sphi_d(2,1+s_offset_d1)
primitives(s_offset_a1+i3, s_offset_b1+i2, s_offset_c1+i1, s_offset_d1+2) =&
primitives(s_offset_a1+i3, s_offset_b1+i2, s_offset_c1+i1, s_offset_d1+2) &
+ buffer1(3+(i-1)*kmax) * sphi_d(3,2+s_offset_d1)
ENDDO
ENDDO
ENDDO
s_offset_d1 = s_offset_d1 + 3
  END DO
  s_offset_c1 = s_offset_c1 + 3
END DO
s_offset_b1 = s_off

Re: optimization question

2009-05-16 Thread VandeVondele Joost



thanks for the info


> I think it is useful to have a bugzilla here.

will do.

> I tested 4.4, what did you test?



4.3 4.4 4.5

Joost


Re: optimization question

2009-05-16 Thread VandeVondele Joost



> I think it is useful to have a bugzilla here.

will do.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40168

> Btw, complete unrolling is also hindered by the artificial limit of maximally
> unrolling 16 iterations.  Your inner loops iterate 27 times.  Also by the
> artificial limit of the maximal unrolled size.
>
> With --param max-completely-peel-times=27 --param
> max-completely-peeled-insns=666
>
> (values for trunk) the loops are unrolled at -O3.

hmmm. but leading to slower code.
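
For anyone who wants to try the --param settings quoted above on something smaller than the full kernel, here is a minimal sketch. It is not the code from the bug report: the program name, the array sphi and the random input data are made up for illustration, and only the loop shape (a compile-time trip count of 27 with kmax=3) is taken from the kernel in this thread.

```fortran
! Hypothetical stand-alone sketch; not the CP2K kernel or the PR testcase.
PROGRAM unroll_sketch
  IMPLICIT NONE
  INTEGER, PARAMETER :: dp = SELECTED_REAL_KIND(14, 200)
  INTEGER, PARAMETER :: imax = 3*3*3, kmax = 3   ! 27 iterations, as in the kernel
  REAL(dp) :: work(imax*kmax), buffer1(imax*kmax), sphi(3)
  INTEGER :: i

  ! Random input so the compiler cannot fold the loop at compile time
  CALL RANDOM_NUMBER(work)
  CALL RANDOM_NUMBER(sphi)
  buffer1 = 0.0_dp

  ! Constant trip count of 27, above the limit of 16 mentioned above
  DO i = 1, imax
    buffer1(i+imax*(3-1)) = buffer1(i+imax*(3-1)) + work(1+(i-1)*kmax) * sphi(1)
    buffer1(i+imax*(1-1)) = buffer1(i+imax*(1-1)) + work(2+(i-1)*kmax) * sphi(2)
    buffer1(i+imax*(2-1)) = buffer1(i+imax*(2-1)) + work(3+(i-1)*kmax) * sphi(3)
  END DO

  ! Use the result so the loop is not optimized away
  PRINT *, SUM(buffer1)
END PROGRAM unroll_sketch
```

Assuming the file is saved as unroll_sketch.f90, it can be built once with plain 'gfortran -O3 -S unroll_sketch.f90' and once with '--param max-completely-peel-times=27 --param max-completely-peeled-insns=666' added, and the two assembly files compared. Whether the loop is actually unrolled, and whether that is faster, depends on the GCC version; as noted above, it was slower for the real kernel.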



GCC performance with CP2K

2008-04-28 Thread VandeVondele Joost


I've just tested gcc/gfortran with CP2K, which some of you might know from 
PR29975 and other messages to the list, and observed some very pleasing 
evolution in the runtime of the code. In each case the compilation options 
are '-O2 -ffast-math -funroll-loops -ftree-vectorize -march=native' 
(i.e. -march=k8-sse3); the intel reference uses '-O2 -xW -heap-arrays 64'.


version              subroutine  time[s]

out.intel:           CP2K        504.52

out.gfortran.4.2.3:  CP2K        601.35
out.gfortran.4.3.0:  CP2K        569.42
out.gfortran.4.4.0:  CP2K        508.12

I hope that this rate of improvement sets a standard up to gcc 4.95.3 ;-)

Thanks for your efforts...

Cheers,

Joost





bootstrap broken?

2008-08-07 Thread VandeVondele Joost


dwarf2out.c:13496: internal compiler error: in extract_insn, at 
recog.c:1988


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37045



CP2K gcc nightly benchmark / wwwdocs patch

2008-08-12 Thread VandeVondele Joost


A nightly tester has been set up to track the performance of the 
gcc/gfortran compiler (trunk) for typical CP2K runs. Results and code can 
be found at:


http://cp2k.berlios.de/gfortran/

I'll consider your suggestions for improvements.

The following patch could be applied to the wwwdocs

Index: index.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/benchmarks/index.html,v
retrieving revision 1.26
diff -r1.26 index.html
83a84,90

Joost VandeVondele runs a CP2K benchmark with mainline GCC.
Results can be found at
<a href="http://cp2k.berlios.de/gfortran/"
>http://cp2k.berlios.de/gfortran/</a>.





Cheers,

Joost






vectorizer question

2008-08-18 Thread VandeVondele Joost


The attached testcase yields (on a core2 duo, gcc trunk):


gfortran -O3 -ftree-vectorize -ffast-math -march=native test.f90
time ./a.out

real    0m3.414s


ifort -xT -O3  test.f90
time ./a.out

real    0m1.556s

The assembly contains:

        ifort   gfortran
mulpd     140          0
mulsd       0        280

so the reason seems to be that ifort vectorizes the following code (full 
testcase attached):


SUBROUTINE collocate_core_6(res,coef_xyz,pol_x,pol_y,pol_z,cmax,kg,jg)

 IMPLICIT NONE
 INTEGER, PARAMETER :: wp = SELECTED_REAL_KIND ( 14, 200 )
 integer, PARAMETER :: lp=6
real(wp), INTENT(OUT):: res
integer, INTENT(IN) :: cmax,kg,jg
real(wp), INTENT(IN):: pol_x(0:lp,-cmax:cmax)
real(wp), INTENT(IN):: pol_y(1:2,0:lp,-cmax:0)
real(wp), INTENT(IN):: pol_z(1:2,0:lp,-cmax:0)
real(wp), INTENT(IN):: coef_xyz(((lp+1)*(lp+2)*(lp+3))/6)
real(wp) ::  coef_xy(2,(lp+1)*(lp+2)/2)
real(wp) ::  coef_x(4,0:lp)

[...]
coef_x(1:2,4)=coef_x(1:2,4)+coef_xy(1:2,12)*pol_y(1,1,jg)
coef_x(3:4,4)=coef_x(3:4,4)+coef_xy(1:2,12)*pol_y(2,1,jg)
coef_x(1:2,5)=coef_x(1:2,5)+coef_xy(1:2,13)*pol_y(1,1,jg)
coef_x(3:4,5)=coef_x(3:4,5)+coef_xy(1:2,13)*pol_y(2,1,jg)
coef_x(1:2,0)=coef_x(1:2,0)+coef_xy(1:2,14)*pol_y(1,2,jg)
coef_x(3:4,0)=coef_x(3:4,0)+coef_xy(1:2,14)*pol_y(2,2,jg)
coef_x(1:2,1)=coef_x(1:2,1)+coef_xy(1:2,15)*pol_y(1,2,jg)
coef_x(3:4,1)=coef_x(3:4,1)+coef_xy(1:2,15)*pol_y(2,2,jg)
coef_x(1:2,2)=coef_x(1:2,2)+coef_xy(1:2,16)*pol_y(1,2,jg)
coef_x(3:4,2)=coef_x(3:4,2)+coef_xy(1:2,16)*pol_y(2,2,jg)
coef_x(1:2,3)=coef_x(1:2,3)+coef_xy(1:2,17)*pol_y(1,2,jg)
coef_x(3:4,3)=coef_x(3:4,3)+coef_xy(1:2,17)*pol_y(2,2,jg)
coef_x(1:2,4)=coef_x(1:2,4)+coef_xy(1:2,18)*pol_y(1,2,jg)
coef_x(3:4,4)=coef_x(3:4,4)+coef_xy(1:2,18)*pol_y(2,2,jg)
coef_x(1:2,0)=coef_x(1:2,0)+coef_xy(1:2,19)*pol_y(1,3,jg)
coef_x(3:4,0)=coef_x(3:4,0)+coef_xy(1:2,19)*pol_y(2,3,jg)
[...]

either it is able to interpret the short vectors as such, or it realizes 
that these very short implicit loops are nevertheless favourable for 
vectorization.


Is there a trick to get gcc to vectorize these loops, or is some 
technology missing for this?


Should I file a PR for this (this is somewhat similar to PR31079 and 
PR31021)?


Thanks in advance,

Joost
SUBROUTINE collocate_core_6(res,coef_xyz,pol_x,pol_y,pol_z,cmax,kg,jg)

 IMPLICIT NONE
 INTEGER, PARAMETER :: wp = SELECTED_REAL_KIND ( 14, 200 )
 integer, PARAMETER :: lp=6
real(wp), INTENT(OUT):: res
integer, INTENT(IN) :: cmax,kg,jg
real(wp), INTENT(IN):: pol_x(0:lp,-cmax:cmax)
real(wp), INTENT(IN):: pol_y(1:2,0:lp,-cmax:0)
real(wp), INTENT(IN):: pol_z(1:2,0:lp,-cmax:0)
real(wp), INTENT(IN):: coef_xyz(((lp+1)*(lp+2)*(lp+3))/6)
real(wp) ::  coef_xy(2,(lp+1)*(lp+2)/2)
real(wp) ::  coef_x(4,0:lp)

coef_xy=0.0_wp
coef_xy(:,1)=coef_xy(:,1)+coef_xyz(1)*pol_z(:,0,kg)
coef_xy(:,2)=coef_xy(:,2)+coef_xyz(2)*pol_z(:,0,kg)
coef_xy(:,3)=coef_xy(:,3)+coef_xyz(3)*pol_z(:,0,kg)
coef_xy(:,4)=coef_xy(:,4)+coef_xyz(4)*pol_z(:,0,kg)
coef_xy(:,5)=coef_xy(:,5)+coef_xyz(5)*pol_z(:,0,kg)
coef_xy(:,6)=coef_xy(:,6)+coef_xyz(6)*pol_z(:,0,kg)
coef_xy(:,7)=coef_xy(:,7)+coef_xyz(7)*pol_z(:,0,kg)
coef_xy(:,8)=coef_xy(:,8)+coef_xyz(8)*pol_z(:,0,kg)
coef_xy(:,9)=coef_xy(:,9)+coef_xyz(9)*pol_z(:,0,kg)
coef_xy(:,10)=coef_xy(:,10)+coef_xyz(10)*pol_z(:,0,kg)
coef_xy(:,11)=coef_xy(:,11)+coef_xyz(11)*pol_z(:,0,kg)
coef_xy(:,12)=coef_xy(:,12)+coef_xyz(12)*pol_z(:,0,kg)
coef_xy(:,13)=coef_xy(:,13)+coef_xyz(13)*pol_z(:,0,kg)
coef_xy(:,14)=coef_xy(:,14)+coef_xyz(14)*pol_z(:,0,kg)
coef_xy(:,15)=coef_xy(:,15)+coef_xyz(15)*pol_z(:,0,kg)
coef_xy(:,16)=coef_xy(:,16)+coef_xyz(16)*pol_z(:,0,kg)
coef_xy(:,17)=coef_xy(:,17)+coef_xyz(17)*pol_z(:,0,kg)
coef_xy(:,18)=coef_xy(:,18)+coef_xyz(18)*pol_z(:,0,kg)
coef_xy(:,19)=coef_xy(:,19)+coef_xyz(19)*pol_z(:,0,kg)
coef_xy(:,20)=coef_xy(:,20)+coef_xyz(20)*pol_z(:,0,kg)
coef_xy(:,21)=coef_xy(:,21)+coef_xyz(21)*pol_z(:,0,kg)
coef_xy(:,22)=coef_xy(:,22)+coef_xyz(22)*pol_z(:,0,kg)
coef_xy(:,23)=coef_xy(:,23)+coef_xyz(23)*pol_z(:,0,kg)
coef_xy(:,24)=coef_xy(:,24)+coef_xyz(24)*pol_z(:,0,kg)
coef_xy(:,25)=coef_xy(:,25)+coef_xyz(25)*pol_z(:,0,kg)
coef_xy(:,26)=coef_xy(:,26)+coef_xyz(26)*pol_z(:,0,kg)
coef_xy(:,27)=coef_xy(:,27)+coef_xyz(27)*pol_z(:,0,kg)
coef_xy(:,28)=coef_xy(:,28)+coef_xyz(28)*pol_z(:,0,kg)
coef_xy(:,1)=coef_xy(:,1)+coef_xyz(29)*pol_z(:,1,kg)
coef_xy(:,2)=coef_xy(:,2)+coef_xyz(30)*pol_z(:,1,kg)
coef_xy(:,3)=coef_xy(:,3)+coef_xyz(31)*pol_z(:,1,kg)
coef_xy(:,4)=coef_xy(:,4)+coef_xyz(32)*pol_z(:,1,kg)
coef_xy(:,5)=coef_xy(:,5)+coef_xyz(33)*pol_z(:,1,kg)
coef_xy(:,6)=coef_xy(:,6)+coef_xyz(34)*pol_z(:,1,kg)
coef_xy(:,8)=coef_xy(:,8)+coef_xyz(35)*pol_z(:,1,kg)
coef_xy(:,9)=coef_xy(:,9)+coef_xyz(36)*pol_z(:,1,kg)

Re: vectorizer question

2008-08-18 Thread VandeVondele Joost


> It would be nice to have a stand-alone testcase for this, so please
> file a bugreport.



I've opened PR37150 for this.

Thanks,

Joost
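
For reference, the pattern in question can be reduced to something like the sketch below. This is not the testcase attached to the original mail or to the PR; the subroutine and program names, the array sizes and the random input are assumptions chosen only to keep the example small. It keeps the one feature that matters here: every statement is an implicit loop over just two double-precision elements.

```fortran
! Hypothetical reduced sketch of the short-vector update pattern.
! Not the attached testcase; names and sizes are illustrative only.
SUBROUTINE short_updates(coef_x, coef_xy, pol_y)
  IMPLICIT NONE
  INTEGER, PARAMETER :: wp = SELECTED_REAL_KIND(14, 200)
  REAL(wp), INTENT(INOUT) :: coef_x(4,0:1)
  REAL(wp), INTENT(IN)    :: coef_xy(2,2), pol_y(2)

  ! Each statement is an implicit loop over two elements, which could map
  ! to one packed multiply (mulpd) instead of two scalar ones (mulsd).
  coef_x(1:2,0) = coef_x(1:2,0) + coef_xy(1:2,1)*pol_y(1)
  coef_x(3:4,0) = coef_x(3:4,0) + coef_xy(1:2,1)*pol_y(2)
  coef_x(1:2,1) = coef_x(1:2,1) + coef_xy(1:2,2)*pol_y(1)
  coef_x(3:4,1) = coef_x(3:4,1) + coef_xy(1:2,2)*pol_y(2)
END SUBROUTINE short_updates

PROGRAM short_vector_sketch
  IMPLICIT NONE
  INTEGER, PARAMETER :: wp = SELECTED_REAL_KIND(14, 200)
  REAL(wp) :: coef_x(4,0:1), coef_xy(2,2), pol_y(2)

  ! Random input so nothing can be folded at compile time
  CALL RANDOM_NUMBER(coef_xy)
  CALL RANDOM_NUMBER(pol_y)
  coef_x = 0.0_wp
  CALL short_updates(coef_x, coef_xy, pol_y)
  PRINT *, SUM(coef_x)
END PROGRAM short_vector_sketch
```

Assuming the file is saved as short_vector_sketch.f90, compiling with 'gfortran -O3 -ftree-vectorize -ffast-math -march=native -S short_vector_sketch.f90' and counting mulpd versus mulsd in the resulting .s file gives the same kind of comparison as the table in the original question. The counts will depend on the compiler version and target.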