Re: patch to fix PR65647
Hi, The issue is also present in 4.9 branch as explained in: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65647 As 4.9 doesn't contains lra rematerialization passe, we only need to stop updating lra_constraint_new_regno_start when inheritance is switched off. Bootstrapped and tested on x64_64 and cross built and tested on AArch64, arm, armeb and i686. Ok for 4.9 ? Cheers, Yvan 2015-04-05 Yvan Roux Backport from trunk r221867 2015-04-04 Vladimir Makarov PR target/65647 * lra.c (lra): Stop updating lra_constraint_new_regno_start after switching off inheritance. 2015-04-05 Yvan Roux Backport from trunk r221867 2015-04-04 Vladimir Makarov PR target/65647 * gcc.target/arm/pr65647.c: New. * gcc.target/arm/pr65647-2.c: New. On 4 April 2015 at 16:39, Vladimir Makarov wrote: > The following patch fixes > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65647 > > This very complicated problem occurred in rare cases when chain of reloads > involving at least two pseudos and insns were generated and one pseudo was > spilled on later sub-passes after spilling sub-pass and this pattern > happened to be repeated. > > The patch was bootstrapped and tested on x86/x86-64, ppc64, and aarch64. > > Committed as rev.221867. > > 2015-04-04 Vladimir Makarov > > PR target/65647 > * lra-int.h (LRA_MAX_REMATERIALIZATION_PASSES): New. Add its > value checking. > (lra_rematerialization_iter): New. > * lra.c (lra): Initialize lra_rematerialization_iter. > Stop updating lra_constraint_new_regno_start after switching of > inheritance and rematerialization. > * lra-remat.c (lra_rematerialization_iter): New. > (lra_remat): Add printing pass iteration. Do rematerialization > only first LRA_MAX_REMATERIALIZATION_PASSES iterations. > > 2015-04-04 Vladimir Makarov > > PR target/65647 > * gcc.target/arm/pr65647.c: New. > > diff --git a/gcc/lra.c b/gcc/lra.c index 69b08dc..98f0444 100644 --- a/gcc/lra.c +++ b/gcc/lra.c @@ -2417,7 +2417,11 @@ lra (FILE *f) /* Assignment of stack slots changes elimination offsets for some eliminations. So update the offsets here. */ lra_eliminate (false, false); - lra_constraint_new_regno_start = max_reg_num (); + /* After switching off inheritance passe, don't forget reload pseudos + after spilling sub-pass to avoid LRA cycling in some complicated +cases. */ + if (lra_inheritance_iter <= LRA_MAX_INHERITANCE_PASSES) +lra_constraint_new_regno_start = max_reg_num (); lra_constraint_new_insn_uid_start = get_max_uid (); lra_assignment_iter_after_spill = 0; } diff --git a/gcc/testsuite/gcc.target/arm/pr65647-2.c b/gcc/testsuite/gcc.target/arm/pr65647-2.c new file mode 100644 index 000..f2985f8 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/pr65647-2.c @@ -0,0 +1,32 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -marm -march=armv6 -std=c99" } */ + +typedef struct { + int i; +} x264_union32_t; +typedef struct { + int level_idx; +} trellis_node_t; +int a, c, d, f, h, i = (int)&c; +trellis_node_t b[1][1]; +short *e = 0; +short g; +void fn1() { + int k[64 * 8 * 2]; + trellis_node_t *l = b[0]; + for (; i >= d; i--) { +if (e[i]) { + for (int j = 1; j < 8; j++) { +((x264_union32_t *)&k[a])->i = l[j].level_idx; +l[j].level_idx = a; +a++; + } + continue; +} +for (int j;; j++) + ; + } + int m[6] __attribute__((aligned(16))); + for (; h; h++, f++) +g = m[h]; +} diff --git a/gcc/testsuite/gcc.target/arm/pr65647.c b/gcc/testsuite/gcc.target/arm/pr65647.c new file mode 100644 index 000..686eb58 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/pr65647.c @@ -0,0 +1,58 @@ +/* { dg-do compile } */ +/* { dg-options "-march=armv6-m -mthumb -O3 -w" } */ + +a, b, c, e, g = &e, h, i = 7, l = 1, m, n, o, q = &m, r, s = &r, u, w = 9, x, + y = 6, z, t6 = 7, t8, t9 = 1, t11 = 5, t12 = &t8, t13 = 3, t15, + t16 = &t15; +struct { + long long f3; +char f4 +} p = {3} + +, + t = {4}; + +struct S1 { + long long f0; + short f1; +long long f2 +} d; +long long f = 4073709551613, t7 = 8, t14 = 4073709551610; +j[]; +k = j; +v = &d; +*t10 = j; +struct S1 fn1(); +struct S1 fn2() { + signed char t1; + struct S1 t2; + long t3 = x; + short t4 = h; + short *t5 = &l; + fn1(t2, w, 1, o); + if (u) { +l = q; +t1 = a < b ?: b; +z = c >= 2 || t1 << c; + } + *t5 = t4 &= t3; + fn3(y); +} + +fn4() { + t6 = t.f3; + fn5(k, t7); +} + +struct S1 fn1() { + f = 0; + for (; i;) +; + t11 = 0; + t13 = *t10 = t14 || n; + t9 = t12; + for (; p.f4;) +s = t16 <= fn6(); + if (g) +v = 0; +}
Re: [Patch, Fortran, pr60322] was: [Patch 1/2, Fortran, pr60322] [OOP] Incorrect bounds on polymorphic dummy array
Dear Andre, Well, time passed and it didn't get done. Too much going on at the moment! As you say, the patch bootstraps and regtests on x86_64, FC21 in my case. I am now very reluctant to mess around with the gcc-5 release. Thus, I think that this patch must be committed to 5.2 and 6.0, when the are open for business. A few trivial comments: + /* The dummy is returned for pointer, allocatable or assumed rank arrays. + The check for pointerness needs to be repeated here (it is done in + IS_CLASS_ARRAY (), too), because for class arrays that are pointers, as + is the one of the sym, which is incorrect here. */ What does this mean, please? + /* Returning the descriptor for dummy class arrays is hazardous, because + some caller is expecting an expression to apply the component refs to. + Therefore the descriptor is only created and stored in + sym->backend_decl's GFC_DECL_SAVED_DESCRIPTOR. The caller is then + responsible to extract it from there, when the descriptor is + desired. */ + if (IS_CLASS_ARRAY (sym) + && (!DECL_LANG_SPECIFIC (sym->backend_decl) + || !GFC_DECL_SAVED_DESCRIPTOR (sym->backend_decl))) +{ + decl = gfc_build_dummy_array_decl (sym, sym->backend_decl); + /* Prevent the dummy from being detected as unused if it is copied. */ + if (sym->backend_decl != NULL && decl != sym->backend_decl) +DECL_ARTIFICIAL (sym->backend_decl) = 1; + sym->backend_decl = decl; +} The comments, such as the above are often going well beyond column 72, into the 80's. I know that much of the existing code violates this style requirement but there is no need to do so if clarity is not reduced thereby. In trans-stmt.c s/standart/standard/ Don't forget to put the PR numbers in the ChangeLogs. For this submission, I would have appreciated some a description of what each chunk in the patch is doing, just because there is so much of it. I suppose that it was good for my imortal soul to sort it out for myself but it took a little while :-) Cheers and many thanks for the patch. Paul On 27 March 2015 at 13:48, Paul Richard Thomas wrote: > Dear Andre, > > I am in the UK as of last night. Before leaving, I bootstrapped and > regtested your patch and all was well. I must drive to Cambridge this > afternoon to see my mother and will try to get to it either this > evening or tomorrow morning. There is so much of it and it touches > many places; so I must give it a very careful looking over before > giving the green light. Bear with me please. > > Great work though! > > Paul > > On 24 March 2015 at 18:06, Andre Vehreschild wrote: >> Hi all, >> >> I have worked on the comments Mikael gave me. I am now checking for >> class_pointer in the way he pointed out. >> >> Furthermore did I *join the two parts* of the patch into this one, because >> keeping both in sync was no benefit but only tedious and did not prove to be >> reviewed faster. >> >> Paul, Dominique: I have addressed the LOC issue that came up lately. Or >> rather >> the patch addressed it already. I feel like this is not tested very well, not >> the loc() call nor the sizeof() call as given in the 57305 second's download. >> Unfortunately, is that download not runable. I would love to see a test >> similar >> to that download, but couldn't come up with one, that satisfied me. Given >> that >> the patch's review will last some days, I still have enough time to come up >> with something beautiful which I will add then. >> >> Bootstraps and regtests ok on x86_64-linux-gnu/F20. >> >> Regards, >> Andre >> >> >> On Tue, 24 Mar 2015 11:13:27 +0100 >> Paul Richard Thomas wrote: >> >>> Dear Andre, >>> >>> Dominique pointed out to me that the 'loc' patch causes a ICE in the >>> testsuite. It seems that 'loc' should provide the address of the class >>> container in some places and the address of the data in others. I will >>> put my thinking cap on tonight :-) >>> >>> Cheers >>> >>> Paul >>> >>> On 23 March 2015 at 13:43, Andre Vehreschild wrote: >>> > Hi Mikael, >>> > >>> > thanks for looking at the patch. Please note, that Paul has sent an >>> > addendum to the patches for 60322, which I deliberately have attached. >>> > >>> >> 26/02/2015 18:17, Andre Vehreschild a écrit : >>> >> > This first patch is only preparatory and does not change any of the >>> >> > semantics of gfortran at all. >>> >> Sure? >>> > >>> > With the counterexample you found below, this of course is a wrong >>> > statement. >>> > >>> >> > diff --git a/gcc/fortran/expr.c b/gcc/fortran/expr.c >>> >> > index ab6f7a5..d28cf77 100644 >>> >> > --- a/gcc/fortran/expr.c >>> >> > +++ b/gcc/fortran/expr.c >>> >> > @@ -4059,10 +4060,10 @@ gfc_lval_expr_from_sym (gfc_symbol *sym) >>> >> >lval->symtree = gfc_find_symtree (sym->ns->sym_root, sym->name); >>> >> > >>> >> >/* It will always be a full array. */ >>> >> > - lval->rank = sym->as ? sym->as->rank : 0; >>> >> > + as = sym->as; >>> >> >
[patch, fortran, RFC] First steps towards inlining matmul
Hello world, this is a first draft of a patch to inline matmul (PR 37171). This is preliminary, but functional as far as it goes. Definitely for the next stage one :-) Basically, it takes c = matmul(a,b) and converts this into BLOCK integer i,j,k c = 0 do j=0, size(b,2)-1 do k=0, size(a, 2)-1 do i=0, size(a, 1)-1 c(i * stride(c,1) + lbound(c,1), j * stride(c,2) + lbound(c,2)) = c(i * stride(c,1) + lbound(c,1), j * stride(c,2) + lbound(c,2)) + a(i * stride(a,1) + lbound(a,1), k * stride(a,2) + lbound(a,2)) * b(k * stride(b,1) + lbound(b,1), j * stride(b,2) + lbound(b,2)) end do end do end do END BLOCK For this, I wrote something like a scalarizer that only operates on the AST only. I find it rather straigtforward to use (in contrast to the one we have), but that may also be due to the fact that I wrote it myself :-) I chose zero-based loop variables because I didn't want to special-case too many strange array bounds, and I trusted the induction variable optimization and CSE in the middle end and in the RTL optimization. If this approach suboptimizes something, let me know. References are frozen to make sure that no additional evaluations are done on. The patch to simplify.c is not strictly necessary, but it should open up some optimization possiblities and, frankly, makes the -fdump-fortran-optimized output at least halfway readable. This patch needs to be extended in different directions: - Currently, it only handles the case of two-dimensional arguments - Special cases like transpose(a) and spread(a) in the arguments arguments should be added, possibly with a reordering of loops and addition of some more variables. - Temporary handling for arguments and results, which does not depend on allocatable RHS. I am thinking of creating nested blocks, where the inner block has arrays with the bounds of the references. - Cases like matmul(a+1,b) can also be handled in the loop, without a temporary - This needs command-line arguments, -finline-matmul and -finline-matmul-limit=, similar to the handling of BLAS. - Bounds checking could/should also be handled. Currently, this causes the only regression because of a different error message. We need to call gfortran_runtime_error directly from the front end, and to be able to tell the bounds-checking code "do not worry about bounds checking this, it has already been taken care of". - I already have some test cases, but for this, I would really like to cover all code paths. This also needs some more work. So, what do you think about this? Thomas 2015-04-05 Thomas Koenig * frontend-passes.c (create_var): Add optional argument vname as part of the name. Split off block creation into (insert_block): New function. (cfe_expr_0): Use "fcn" as part of tempoary name. (optimize_namesapce): Call optimize_matmul_assign. (combine_array_constructor): Use "constr" as part of temporary name. (get_array_inq_function): New function. (is_function_or_op): New function. (has_function_or_op): New function. (freeze_expr): New function. (freeze_references): New function. (convert_to_index_kind): New function. (create_do_loop): New function. (get_operand): New function. (get_size_1): New function. (scalarize_expr): New function. (optimize_matmul_assign): New function. * simplify.c (simplify_bound): Simplify the case of the lower bound of an assumed-shape argument. Index: frontend-passes.c === --- frontend-passes.c (Revision 221757) +++ frontend-passes.c (Arbeitskopie) @@ -43,7 +43,11 @@ static void doloop_warn (gfc_namespace *); static void optimize_reduction (gfc_namespace *); static int callback_reduction (gfc_expr **, int *, void *); static void realloc_strings (gfc_namespace *); -static gfc_expr *create_var (gfc_expr *); +static gfc_expr *create_var (gfc_expr *, const char *vname=NULL); +static int optimize_matmul_assign (gfc_code **, int *, void *); +static gfc_code * create_do_loop (gfc_expr *, gfc_expr *, gfc_expr *, + locus *, gfc_namespace *, + char *vname=NULL); /* How deep we are inside an argument list. */ @@ -95,6 +99,10 @@ static bool in_assoc_list; /* Entry point - run all passes for a namespace. */ +/* Counter for temporary variables. */ + +static int var_num = 1; + void gfc_run_passes (gfc_namespace *ns) { @@ -157,7 +165,7 @@ realloc_string_callback (gfc_code **c, int *walk_s return 0; current_code = c; - n = create_var (expr2); + n = create_var (expr2, "trim"); co->expr2 = n; return 0; } @@ -524,29 +532,11 @@ constant_string_length (gfc_expr *e) } -/* Returns a new expression (a variable) to be used in place of the old one
Re: [patch, fortran, RFC] First steps towards inlining matmul
On 04/05/2015 05:32 AM, Thomas Koenig wrote: --- snip --- So, what do you think about this? Thomas I am curious about what performance gain results from this? I can see saving a library call to our runtime libraries. Do you have some timing results? Jerry
Re: [patch, fortran, RFC] First steps towards inlining matmul
Hi Jerry, > I am curious about what performance gain results from this? I can see > saving a library call to our runtime libraries. Do you have some timing > results? The speedup can be quite drastic for small matrices which can be completely unrolled by -O3: b1.f90: program main use b2 implicit none real, dimension(3,3) :: a, b, c integer :: i call random_number(a) call random_number(b) do i=1,10**8 c = matmul(a,b) call bar(b,c) end do end program main b2.f90: module b2 contains subroutine bar(b,c) real, dimension(3,3) :: b,c end subroutine bar end module b2 ig25@linux-fd1f:~/Krempel/Matmul> gfortran -O3 -fno-frontend-optimize b2.f90 b1.f90 && time ./a.out real0m15.411s user0m15.404s sys 0m0.001s ig25@linux-fd1f:~/Krempel/Matmul> gfortran -O3 b2.f90 b1.f90 && time ./a.out real0m1.736s user0m1.735s sys 0m0.001s
Re: [patch, fortran, RFC] First steps towards inlining matmul
> > So, what do you think about this? > > I am curious about what performance gain results from this? > I can see saving a library call to our runtime libraries. > Do you have some timing results? > > Jerry IMO the inlining of MATMUL should be restricted to small matrices (less than 4x4, 9x9 or 16x16 depending of your field!-). I have a variant of the polyhedron test induct.f90 in which I have replaced three dot products by the equivalent matmul and which executes ten times slower than the original test. Dominique
Re: [patch, fortran, RFC] First steps towards inlining matmul
Hi Dominique, > IMO the inlining of MATMUL should be restricted to small matrices (less than > 4x4, 9x9 > or 16x16 depending of your field!-) The problem with the library function we have is that it is quite general; it can deal with all the complexity of assumed-shape array arguments. Inlining allows the compiler to take advantage of contiguous memory, compile-time knowledge of array sizes, etc. to allow for vectorization or even complete unrolling. Of course, if you use c=matmul(a,b) where all three are assumed-shape arrays, your advantage over the library function will be minimal at best. It will be interesting to see at what size calling BLAS directly will be better. Thomas
Re: Work around ICE in PR 65648
On Fri, 3 Apr 2015, Jan Hubicka wrote: > -#ifdef ENABLE_CHECKING > + /* This is used only for assert bellow. */ > +#if 0 "below" (instead of "bellow") Gerald
Re: patch to fix PR65647
On 05/04/15 04:45 AM, Yvan Roux wrote: Hi, The issue is also present in 4.9 branch as explained in: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65647 As 4.9 doesn't contains lra rematerialization passe, we only need to stop updating lra_constraint_new_regno_start when inheritance is switched off. Bootstrapped and tested on x64_64 and cross built and tested on AArch64, arm, armeb and i686. Ok for 4.9 ? Yes. The patch is ok for the branch. Thanks, Yvan.
Re: [patch, fortran, RFC] First steps towards inlining matmul
I have done some timings (1) with the test given below, before the patch I get (last column in Gflops) [Book15] f90/bug% gfc -Ofast timing/matmul_tst_sys.f90 -framework Accelerate [Book15] f90/bug% time a.out Time, MATMUL:373.708008 373.69497100014.2815668504139435 Time, MATMUL:172.086609 23.117919169.210381782201068 545.374u 0.537s 6:36.93 137.5% 0+0k 0+0io 0pf+0w [Book15] f90/bug% gfc -Ofast -fexternal-blas timing/matmul_tst_sys.f90 -framework Accelerate [Book15] f90/bug% time a.out Time, MATMUL:176.327881 23.855111867.071577781735002 Time, MATMUL:182.086746 24.453551165.430170039516952 358.059u 0.471s 0:48.43 740.2% 0+0k 0+0io 0pf+0w after the patch [Book15] f90/bug% time a.out Time, MATMUL:392.415436 392.4172874.0772923337669056 Time, MATMUL:171.690399 22.905118269.853383859450091 563.671u 0.551s 6:55.44 135.8% 0+0k 0+0io 0pf+0w [Book15] f90/bug% gfc -Ofast -fexternal-blas timing/matmul_tst_sys.f90 -framework Accelerate [Book15] f90/bug% time a.out Time, MATMUL:392.850342 392.8419074.0728852177349673 Time, MATMUL:174.534821 23.784797167.269861500184334 566.824u 0.674s 6:56.74 136.1% 0+0k 0+0io 0pf+0w which means that -fexternal-blas should disable the inlining. program t2 implicit none REAL time_begin, time_end integer, parameter :: n=2000; integer(8) :: ts, te, rate8, cmax8 real(8) :: elapsed REAL(8) :: a(n,n), b(n,n), c(n,n) integer, parameter :: m = 100 integer :: i call RANDOM_NUMBER(a) call RANDOM_NUMBER(b) call cpu_time(time_begin) call SYSTEM_CLOCK (ts, rate8, cmax8) do i = 1,m a(1,1) = a(1,1) + 0.1 c = MATMUL(a,b) enddo call SYSTEM_CLOCK (te, rate8, cmax8) call cpu_time(time_end) elapsed = real(te-ts, kind=8)/real(rate8, kind=8) PRINT *, 'Time, MATMUL: ',time_end-time_begin, elapsed , 2*m*real(n, kind=8)**3/(10**9*elapsed) call cpu_time(time_begin) call SYSTEM_CLOCK (ts, rate8, cmax8) do i = 1,m a(1,1) = a(1,1) + 0.1 call dgemm('n','n',n, n, n, dble(1.0), a, n, b, n, dble(0.0), c, n) enddo call SYSTEM_CLOCK (te, rate8, cmax8) call cpu_time(time_end) elapsed = real(te-ts, kind=8)/real(rate8, kind=8) PRINT *, 'Time, MATMUL: ',time_end-time_begin, elapsed , 2*m*real(n, kind=8)**3/(10**9*elapsed) end program ! http://groups.google.com/group/comp.lang.fortran/browse_thread/thread/1cba8e6ce5080197 (2) The polyhedron test compiled with -fprotect-parens -Ofast -funroll-loops -ftree-loop-linear -fomit-frame-pointer -fwhole-program -flto runs in ~6.5s. After applying the patch below, it runs in ~66s with/without the patch. Dominique --- induct.f90 2005-10-11 22:53:32.0 +0200 +++ induct_vmc.f90 2015-04-05 19:06:30.0 +0200 @@ -1644,18 +1644,17 @@ contains coil_tmp_vector(1) = -sin(theta) coil_tmp_vector(2) = cos(theta) coil_tmp_vector(3) = 0.0_longreal - coil_current_vec(1) = dot_product(rotate_coil(1,:),coil_tmp_vector(:)) - coil_current_vec(2) = dot_product(rotate_coil(2,:),coil_tmp_vector(:)) - coil_current_vec(3) = dot_product(rotate_coil(3,:),coil_tmp_vector(:)) + coil_current_vec = matmul(rotate_coil,coil_tmp_vector) ! do j = 1, 9 c_vector(3) = 0.5 * h_coil * z1gauss(j) ! ! rotate coil vector into the global coordinate system and translate it ! - rot_c_vector(1) = dot_product(rotate_coil(1,:),c_vector(:)) + dx - rot_c_vector(2) = dot_product(rotate_coil(2,:),c_vector(:)) + dy - rot_c_vector(3) = dot_product(rotate_coil(3,:),c_vector(:)) + dz + rot_c_vector = matmul(rotate_coil,c_vector) + rot_c_vector(1) = rot_c_vector(1) + dx + rot_c_vector(2) = rot_c_vector(2) + dy + rot_c_vector(3) = rot_c_vector(3) + dz ! do k = 1, 9 q_vector(1) = 0.5_longreal * a * (x2gauss(k) + 1.0_longreal) @@ -1664,9 +1663,7 @@ contains ! ! rotate quad vector into the global coordinate system ! - rot_q_vector(1) = dot_product(rotate_quad(1,:),q_vector(:)) - rot_q_vector(2) = dot_product(rotate_quad(2,:),q_vector(:)) - rot_q_vector(3) = dot_product(rotate_quad(3,:),q_vector(:)) + rot_q_vector = matmul(rotate_quad,q_vector) ! ! compute and add in quadrature term ! @@ -1756,18 +1753,17 @@ contains coil_tmp_vector(1) = -sin(theta) coil_tmp_vector(2) = cos(theta) coil_tmp_vector(3) = 0.0_longreal - coil_current_vec(1) = dot_product(rotate_coil(1,:),coil_tmp_vector(:)) - coil_current_vec(2) = dot_product(rotate_coil(2,:),coil_tmp_vector(:)) - coil_current_vec(3) = dot_product(rotate_coil(3,:),coil_tmp_vector(:)) +
pr59016
Hi, The attached patch, type 0, has been discussed a little at: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59016 Yet, the final version submitted is slightly different from the ones discussed and tested in the above link. Having read the GNU Coding Conventions, I created a patch using svn diff (“-up”) and I added a small test case. Yet, the function names aren’t printed into the context (surrounding changes). Further, I've run the “check_GNU_style.sh” script to ensure that the source code meets the GNU style requirements. Also, I don’t have DejaGNU installed and thus I think that I cannot run all the tests. In other words, evaluation is up to the GNU team. Finally, it might be obvious that the patch is for trunk (not gcc-4.9.2). Regards, Ev. Drikos gcc-5.0-pr59016.patch Description: Binary data
Re: [patch, fortran, RFC] First steps towards inlining matmul
Hi Dominique, > which means that -fexternal-blas should disable the inlining. It is not surprising that a higly tuned BLAS library is better than a simple inlining for large matrices. I did some tests by adjusting n; it seems the inline version is faster for n<=22, which is not too bad. Regarding your other test case: This tests matrix*vector multiplication, which is not implemented yet :-) Regards, Thomas
pr59016
Hi, The attached patch, type 0, has been discussed a little at: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59016 Yet, the final version submitted is slightly different from the ones discussed and tested in the above link. Having read the GNU Coding Conventions, I created a patch using svn diff (“-up”) and I added a small test case. Further, I've run the “check_GNU_style.sh” script to ensure that the source code meets the GNU style requirements. Also, I don’t have DejaGNU installed and thus I think that I cannot run all the tests. In other words, evaluation is up to the GNU team. Finally, it might be obvious that the patch is for trunk (not gcc-4.9.2). Regards, Ev. Drikos gcc-5.0-pr59016.c_log Description: Binary data gcc-5.0-pr59016.patch Description: Binary data
Re: [patch, fortran, RFC] First steps towards inlining matmul
The patch causes the following regressions: FAIL: gfortran.dg/coarray/dummy_1.f90 -fcoarray=single -O2 -latomic (internal compiler error) FAIL: gfortran.dg/coarray/dummy_1.f90 -fcoarray=single -O2 -latomic (test for excess errors) FAIL: gfortran.dg/coarray/dummy_1.f90 -fcoarray=lib -O2 -lcaf_single -latomic (internal compiler error) FAIL: gfortran.dg/coarray/dummy_1.f90 -fcoarray=lib -O2 -lcaf_single -latomic (test for excess errors) FAIL: gfortran.dg/array_function_3.f90 -O (internal compiler error) FAIL: gfortran.dg/array_function_3.f90 -O (test for excess errors) FAIL: gfortran.dg/bind_c_vars.f90 -g -flto (test for excess errors) FAIL: gfortran.dg/bound_2.f90 -O0 (internal compiler error) FAIL: gfortran.dg/bound_2.f90 -O0 (test for excess errors) FAIL: gfortran.dg/bound_2.f90 -O1 (internal compiler error) FAIL: gfortran.dg/bound_2.f90 -O1 (test for excess errors) FAIL: gfortran.dg/bound_2.f90 -O2 (internal compiler error) FAIL: gfortran.dg/bound_2.f90 -O2 (test for excess errors) FAIL: gfortran.dg/bound_2.f90 -O3 -fomit-frame-pointer (internal compiler error) FAIL: gfortran.dg/bound_2.f90 -O3 -fomit-frame-pointer (test for excess errors) FAIL: gfortran.dg/bound_2.f90 -O3 -fomit-frame-pointer -funroll-loops (internal compiler error) FAIL: gfortran.dg/bound_2.f90 -O3 -fomit-frame-pointer -funroll-loops (test for excess errors) FAIL: gfortran.dg/bound_2.f90 -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions (internal compiler error) FAIL: gfortran.dg/bound_2.f90 -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions (test for excess errors) FAIL: gfortran.dg/bound_2.f90 -O3 -g (internal compiler error) FAIL: gfortran.dg/bound_2.f90 -O3 -g (test for excess errors) FAIL: gfortran.dg/bound_2.f90 -Os (internal compiler error) FAIL: gfortran.dg/bound_2.f90 -Os (test for excess errors) FAIL: gfortran.dg/bound_2.f90 -g -flto (internal compiler error) FAIL: gfortran.dg/bound_7.f90 -O0 (test for excess errors) FAIL: gfortran.dg/bound_7.f90 -O1 (internal compiler error) FAIL: gfortran.dg/bound_7.f90 -O1 (test for excess errors) FAIL: gfortran.dg/bound_7.f90 -O2 (internal compiler error) FAIL: gfortran.dg/bound_7.f90 -O2 (test for excess errors) FAIL: gfortran.dg/bound_7.f90 -O3 -fomit-frame-pointer (internal compiler error) FAIL: gfortran.dg/bound_7.f90 -O3 -fomit-frame-pointer (test for excess errors) FAIL: gfortran.dg/bound_7.f90 -O3 -fomit-frame-pointer -funroll-loops (internal compiler error) FAIL: gfortran.dg/bound_7.f90 -O3 -fomit-frame-pointer -funroll-loops (test for excess errors) FAIL: gfortran.dg/bound_7.f90 -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions (internal compiler error) FAIL: gfortran.dg/bound_7.f90 -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions (test for excess errors) FAIL: gfortran.dg/bound_7.f90 -O3 -g (internal compiler error) FAIL: gfortran.dg/bound_7.f90 -O3 -g (test for excess errors) FAIL: gfortran.dg/bound_7.f90 -Os (internal compiler error) FAIL: gfortran.dg/bound_7.f90 -Os (test for excess errors) FAIL: gfortran.dg/bound_7.f90 -g -flto (internal compiler error) FAIL: gfortran.dg/bound_7.f90 -g -flto (test for excess errors) FAIL: gfortran.dg/bound_8.f90 -O0 (internal compiler error) FAIL: gfortran.dg/bound_8.f90 -O0 (test for excess errors) FAIL: gfortran.dg/bound_8.f90 -O1 (internal compiler error) FAIL: gfortran.dg/bound_8.f90 -O1 (test for excess errors) FAIL: gfortran.dg/bound_8.f90 -O2 (internal compiler error) FAIL: gfortran.dg/bound_8.f90 -O2 (test for excess errors) FAIL: gfortran.dg/bound_8.f90 -O3 -fomit-frame-pointer (internal compiler error) FAIL: gfortran.dg/bound_8.f90 -O3 -fomit-frame-pointer (test for excess errors) FAIL: gfortran.dg/bound_8.f90 -O3 -fomit-frame-pointer -funroll-loops (internal compiler error) FAIL: gfortran.dg/bound_8.f90 -O3 -fomit-frame-pointer -funroll-loops (test for excess errors) FAIL: gfortran.dg/bound_8.f90 -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions (internal compiler error) FAIL: gfortran.dg/bound_8.f90 -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions (test for excess errors) FAIL: gfortran.dg/bound_8.f90 -O3 -g (internal compiler error) FAIL: gfortran.dg/bound_8.f90 -O3 -g (test for excess errors) FAIL: gfortran.dg/bound_8.f90 -Os (internal compiler error) FAIL: gfortran.dg/bound_8.f90 -Os (test for excess errors) FAIL: gfortran.dg/bound_8.f90 -g -flto (internal compiler error) FAIL: gfortran.dg/bound_8.f90 -g -flto (test for excess errors) I think the line if (!upper && as->type == AS_ASSUMED_SHAPE && dim) should be something such as (untested) if (!upper && dim && as && as->type == AS_ASSUMED_SHAPE) Dominique > Le 5 avr. 2015 à 22:37, Thomas Koenig a écrit : > > Hi Dominique, > >> which means that -fexternal-blas should disable the inlining. > > It
Re: [libstdc++/65033] Give alignment info to libatomic
On Fri, 3 Apr 2015, Jonathan Wakely wrote: > On 03/04/15 05:24 -0400, Hans-Peter Nilsson wrote: > > On Thu, 2 Apr 2015, Hans-Peter Nilsson wrote: > > > Why then use __alignof(_M_i) (the object-alignment) > > > instead of _S_alignment (the deduced alas insufficiently > > > increased type-alignment)? > > Isn't the object aligned to _S_alignment? We did specify that with the alignas. Is the alignof always exactly the same as an alignas, if one is specified? (And will that not change in a future amendment, standard and/or implementation?) Either way, is there a test-case to guard all this? Those questions wouldn't even be asked if we used _S_alignment for the fake-pointer too, just as a matter of defensive programming. > Instead of changing every case in the condition to include sizeof why > not just do it afterwards using sizeof(_Tp), in the _S_alignment > calculation? Doh. > We know sizeof(_Tp) == sizeof(corresponding integer type) because > that's the whole point of the conditionals! See attachment. > > > @@ -216,7 +216,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > > is_lock_free() const noexcept > > { > > // Produce a fake, minimally aligned pointer. > > - void *__a = reinterpret_cast(-__alignof(_M_i)); > > + void *__a = reinterpret_cast(-_S_alignment); > > return __atomic_is_lock_free(sizeof(_M_i), __a); > > } > > If _M_i is aligned to _S_alignment then what difference does the > change above make? > > It doesn't matter if the value is per-object if we've forced all such > objects to have the same alignment, does it? > > Or is it different if a std::atomic is included in some other > struct and the user forces a different alignment on it? I don't think > we really need to support that, users shouldn't be doing that. Why do we even need to ask those questions, when the patch takes care of the per-type business without doubt? > The attached patch against trunk should have the same result with much > less effort. > > It doesn't include the changes to the reinterpret_cast > expressions to produce a minimally aligned pointer, but I think this > is progress, thanks :-) Progress is good. :) brgds, H-P
Re: [PATCH, libmpx, i386, PR driver/65444] Pass '-z bndplt' when building dynamic objects with MPX
On 04/03/2015 01:34 PM, Joseph Myers wrote: On Tue, 31 Mar 2015, Ilya Enkovich wrote: +library. It also passes '-z bndplt' to a linker in case it supports this +option (which is checked on libmpx configuration). Note that old versions +of linker may ignore option. Gold linker doesn't support '-z bndplt' +option. With no '-z bndplt' support in linker all calls to dynamic libraries +lose passed bounds reducing overall protection level. It's highly +recommended to use linker with '-z bndplt' support. In case such linker +is not available it is adviced to always use @option{-static-libmpxwrappers} +for better protection level or use @option{-static} to completely avoid +external calls to dynamic libraries. MPX-based instrumentation Use @samp{-z bndplt} rather than '' quoting (but Sandra may have further advice on the substance of this documentation). To tell the truth, I can't figure out what this means from a user perspective. How does a user know whether the linker option is being ignored, or if they have a new enough linker? If the linker available at configuration time doesn't support the option, does that mean the option will never be passed and users will never know that there are gaping holes in the pointer bounds checking? My suggestion would be to pass the option unconditionally and make the documentation say something like It also passes @option{-z bndplt} to the linker. LD version xxx or later is required to use this feature. If no linker support for @option{-z bndplt} is available, you should link with @option{-static-libmpxwrappers} or @option{-static} instead; otherwise calls to dynamic libraries lose bounds checking protection. ... where you need to fill in "version xxx" appropriately. -Sandra
Re: [PATCH, libmpx, i386, PR driver/65444] Pass '-z bndplt' when building dynamic objects with MPX
On Sun, Apr 5, 2015 at 6:44 PM, Sandra Loosemore wrote: > On 04/03/2015 01:34 PM, Joseph Myers wrote: >> >> On Tue, 31 Mar 2015, Ilya Enkovich wrote: >> >>> +library. It also passes '-z bndplt' to a linker in case it supports >>> this >>> +option (which is checked on libmpx configuration). Note that old >>> versions >>> +of linker may ignore option. Gold linker doesn't support '-z bndplt' >>> +option. With no '-z bndplt' support in linker all calls to dynamic >>> libraries >>> +lose passed bounds reducing overall protection level. It's highly >>> +recommended to use linker with '-z bndplt' support. In case such linker >>> +is not available it is adviced to always use >>> @option{-static-libmpxwrappers} >>> +for better protection level or use @option{-static} to completely avoid >>> +external calls to dynamic libraries. MPX-based instrumentation >> >> >> Use @samp{-z bndplt} rather than '' quoting (but Sandra may have further >> advice on the substance of this documentation). > > > To tell the truth, I can't figure out what this means from a user > perspective. How does a user know whether the linker option is being > ignored, or if they have a new enough linker? If the linker available at > configuration time doesn't support the option, does that mean the option > will never be passed and users will never know that there are gaping holes > in the pointer bounds checking? > > My suggestion would be to pass the option unconditionally and make the > documentation say something like I totally agree with it. > It also passes @option{-z bndplt} to the linker. LD version xxx or later is > required to use this feature. If no linker support for @option{-z bndplt} > is available, you should link with @option{-static-libmpxwrappers} or > @option{-static} instead; otherwise calls to dynamic libraries lose bounds > checking protection. > This implies that -static-libmpxwrappers will cover all dynamic libraries, which isn't true. -static-libmpxwrappers only covers calls to functions defined in libmpxwrappers.so and leaves calls to functions defined in other dynamic libraries open to buffer overflow. -- H.J.
Re: [PATCH, libmpx, i386, PR driver/65444] Pass '-z bndplt' when building dynamic objects with MPX
On 04/05/2015 08:35 PM, H.J. Lu wrote: On Sun, Apr 5, 2015 at 6:44 PM, Sandra Loosemore wrote: [snip snip] To tell the truth, I can't figure out what this means from a user perspective. How does a user know whether the linker option is being ignored, or if they have a new enough linker? If the linker available at configuration time doesn't support the option, does that mean the option will never be passed and users will never know that there are gaping holes in the pointer bounds checking? My suggestion would be to pass the option unconditionally and make the documentation say something like I totally agree with it. It also passes @option{-z bndplt} to the linker. LD version xxx or later is required to use this feature. If no linker support for @option{-z bndplt} is available, you should link with @option{-static-libmpxwrappers} or @option{-static} instead; otherwise calls to dynamic libraries lose bounds checking protection. This implies that -static-libmpxwrappers will cover all dynamic libraries, which isn't true. -static-libmpxwrappers only covers calls to functions defined in libmpxwrappers.so and leaves calls to functions defined in other dynamic libraries open to buffer overflow. See, I said I didn't really understand the details. ;-) Just strike the reference to -static-libmpxwrappers entirely, then. -Sandra