Re: [patch, fortran] Make ABI ready for BACK argument of MINLOC and MAXLOC
On Mon, Jan 8, 2018 at 1:23 AM, Thomas Koenig wrote: > Hello world, > > the attached patch is a step towards the implementation of the BACK > argument for the MINLOC and MAXLOC intrinsics, a part of F2008. > This readies the ABI for a later date. Makes sense. > In order to avoid combinatorial explosion in the library, I have > opted to always add the BACK argument to the library version. > The additional overhead should be small, this is only a scalar > LOGICAL. We currently have 216 versions of minloc in the library, > I don't want this to be 432 :-) Yes, I agree. > Of course, the actual implementation of BACK is still missing, as > are the standard-dependent checks. This will be done at a later > date. In this version, the library functions always get a .false. > value, which is equivalent to the current behavior. > > Regression-tested. OK for trunk? If I understand it correctly, in the library the back argument is always a LOGICAL(kind=4). But in the frontend, the back argument is coerced to gfc_default_logical_kind. So this doesn't work if one uses the -fdefault-integer-8 option, because then gfc_default_logical_kind will be 8. I suggest you create a constant "gfc_logical4_kind" and use that in the frontend. -- Janne Blomqvist
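To illustrate the mismatch at the C level of the library, here is a minimal sketch; the function name and prototype are illustrative rather than an actual libgfortran entry point, but GFC_LOGICAL_4 matches the library's 4-byte logical type:

#include <stdint.h>

typedef int32_t GFC_LOGICAL_4;  /* the library's LOGICAL(kind=4) */
typedef int64_t GFC_LOGICAL_8;  /* what coercing BACK to
                                   gfc_default_logical_kind passes
                                   under -fdefault-integer-8 */

/* Hypothetical library entry: BACK is always LOGICAL(kind=4).  */
extern void minloc_example (int32_t *ret, const int32_t *array, int n,
                            GFC_LOGICAL_4 back);

A caller compiled with -fdefault-integer-8 would pass a GFC_LOGICAL_8 by value where the library reads a GFC_LOGICAL_4, which is exactly the ABI mismatch that a gfc_logical4_kind constant in the frontend avoids.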
[PATCH] Fix PR83517
Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2018-01-08 Richard Biener PR middle-end/83517 * match.pd ((t * 2) / 2) -> t): Add missing :c. * gcc.dg/pr83517.c: New testcase. Index: gcc/match.pd === --- gcc/match.pd(revision 256275) +++ gcc/match.pd(working copy) @@ -510,7 +510,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) /* Simplify (t * 2) / 2) -> t. */ (for div (trunc_div ceil_div floor_div round_div exact_div) (simplify - (div (mult @0 @1) @1) + (div (mult:c @0 @1) @1) (if (ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type)) @0))) Index: gcc/testsuite/gcc.dg/pr83517.c === --- gcc/testsuite/gcc.dg/pr83517.c (nonexistent) +++ gcc/testsuite/gcc.dg/pr83517.c (working copy) @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-fdump-tree-original" } */ + +int test(int x) +{ + return (x+x)/x; +} + +/* { dg-final { scan-tree-dump "return 2;" "original" } } */
Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=
* H. J. Lu: > Add -mindirect-branch-loop= option to control loop filler in call and > return thunks generated by -mindirect-branch=. 'lfence' uses "lfence" > as loop filler. 'pause' uses "pause" as loop filler. 'nop' uses "nop" > as loop filler. The default is 'lfence'. Why is the loop needed? Doesn't ud2 or cpuid stop speculative execution?
[PATCH] Fix PR83580
The following fixes PR83580. split_constant_offset is a somewhat odd beast, replicating SCEV code and tree-affine a bit. It also uses tricks similar to theirs with regard to looking through conversions, but while being pedantic about overflow it simply strips sign conversions. That is of course wrong, so the following patch removes that. I do expect eventual fallout (in missed optimizations), so it needs some baking on trunk before backporting. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2018-01-08 Richard Biener PR middle-end/83580 * tree-data-ref.c (split_constant_offset): Remove STRIP_NOPS. * gcc.dg/torture/pr83580.c: New testcase. Index: gcc/tree-data-ref.c === --- gcc/tree-data-ref.c (revision 256275) +++ gcc/tree-data-ref.c (working copy) @@ -723,23 +723,21 @@ split_constant_offset_1 (tree type, tree void split_constant_offset (tree exp, tree *var, tree *off) { - tree type = TREE_TYPE (exp), otype, op0, op1, e, o; + tree type = TREE_TYPE (exp), op0, op1, e, o; enum tree_code code; *var = exp; *off = ssize_int (0); - STRIP_NOPS (exp); if (tree_is_chrec (exp) || get_gimple_rhs_class (TREE_CODE (exp)) == GIMPLE_TERNARY_RHS) return; - otype = TREE_TYPE (exp); code = TREE_CODE (exp); extract_ops_from_tree (exp, &code, &op0, &op1); - if (split_constant_offset_1 (otype, op0, code, op1, &e, &o)) + if (split_constant_offset_1 (type, op0, code, op1, &e, &o)) { - *var = fold_convert (type, e); + *var = e; *off = o; } } Index: gcc/testsuite/gcc.dg/torture/pr83580.c === --- gcc/testsuite/gcc.dg/torture/pr83580.c (nonexistent) +++ gcc/testsuite/gcc.dg/torture/pr83580.c (working copy) @@ -0,0 +1,16 @@ +/* { dg-do run } */ + +int a[2] = { 0, 1 }; +int x = 129; + +int +main () +{ + volatile int v = 0; + int t = x, i; + for (i = 0; i < 1 + v + v + v + v + v + v + v + v + a[a[0]]; i++) +t = a[(signed char) (130 - x)]; + if (t != 1) +__builtin_abort (); + return 0; +}
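As a minimal illustration of why a sign conversion cannot simply be stripped when splitting an offset (a sketch, not part of the patch): the conversion changes the value of negative operands, so the stripped form describes a different address.

extern char a[300];

char *
addr (signed char i)
{
  /* Stripping the (unsigned char) conversion would wrongly treat this
     as &a[i]; for i == -1 the element is a[255], not a[-1].  */
  return &a[(unsigned char) i];
}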
Re: [wwwdocs] readings.html - libre.adacore.com is gone
On 01/08/2018 01:39 AM, Gerald Pfeifer wrote: ...so adjust to where it redirects. Applied. Ah, good catch. Thank you Gerald! -- Pierre-Marie de Rodat
Re: [PING 2][PATCH] enhance -Wrestrict to handle string built-ins (PR 78918)
On 12/17/2017 01:01 AM, Martin Sebor wrote: * c-c++-common/Wrestrict.c: New test. 681/* The following doesn't overlap but it should trigger -Wstringop-overflow 682 for writing past the end. */ 683T ("012", a + sizeof a, a); For nvptx, the warning actually shows up and is classified as an excess error: ... gcc/testsuite/c-c++-common/Wrestrict.c:683:3: warning: '__builtin_memcpy' writing 4 bytes into a region of size 0 overflows the destination [-Wstringop-overflow=] ... 760r = SR (DIFF_MAX - 2, DIFF_MAX - 1); 761T (8, "012", a + r, a);/* { dg-warning "accessing 4 bytes at offsets \\\[\[0-9\]+, \[0-9\]+] and 0 overlaps" "strcpy" } */ 762 Likewise, the warning triggers here: ... gcc/testsuite/c-c++-common/Wrestrict.c:761:3: warning: '__builtin_memcpy' writing 4 bytes into a region of size 0 overflows the destination [-Wstringop-overflow=] ... >>> * c-c++-common/Warray-bounds-4.c: New test. 66TM ("0123", "", ma.a5 + i, ma.a5); /* { dg-warning "offset 6 from the object at .ma. is out of the bounds of referenced subobject .a5. with type .char\\\[5]. at offset 0" "strcpy" { xfail *-*-* } } */ 67TM ("", "012345", ma.a7 + i, ma.a7);/* { dg-warning "offset 13 from the object at .ma. is out of the bounds of referenced subobject .\(MA::\)?a7. with type .char ?\\\[7]. at offset 5" } */ And this warning fails to trigger: ... FAIL: c-c++-common/Warray-bounds-4.c -Wc++-compat (test for warnings, line 67) ... Thanks, - Tom
Re: [PATCH 3/5] x86: Add -mfunction-return=
On 01/07/2018 11:59 PM, H.J. Lu wrote: > Function return thunk is the same as memory thunk for -mindirect-branch= > where the return address is at the top of the stack: > > __x86_return_thunk: > call L2 > L1: > lfence > jmp L1 > L2: > lea 8(%rsp), %rsp|lea 4(%esp), %esp > ret > > and function return becomes > > jmp __x86_return_thunk Hello. Can you please explain the intended usage of the option in more detail? Is it to prevent speculative execution of the 'ret' instruction (which is an indirect branch), as described in [1]? The paper mentions that return stack predictors are commonly implemented in some form. It looks like the current version of the Linux patches does not use the option. Thanks, Martin [1] https://support.google.com/faqs/answer/7625886
[testsuite] Require alloca for some test-cases
Hi, This patch requires alloca for some test-cases. Tested on x86_64 and committed. Thanks, - Tom Require alloca for some test-cases 2018-01-08 Tom de Vries * c-c++-common/builtins.c: Require effective target alloca. * gcc.dg/Wrestrict.c: Same. * gcc.dg/tree-ssa/loop-interchange-15.c: Same. --- gcc/testsuite/c-c++-common/builtins.c | 3 ++- gcc/testsuite/gcc.dg/Wrestrict.c| 3 ++- gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-15.c | 1 + 3 files changed, 5 insertions(+), 2 deletions(-) diff --git a/gcc/testsuite/c-c++-common/builtins.c b/gcc/testsuite/c-c++-common/builtins.c index 673fcad..3f1ef11 100644 --- a/gcc/testsuite/c-c++-common/builtins.c +++ b/gcc/testsuite/c-c++-common/builtins.c @@ -2,7 +2,8 @@ with no prototype do not cause an ICE. { dg-do compile } { dg-options "-O2 -Wall -Wextra" } - { dg-prune-output "warning" } */ + { dg-prune-output "warning" } + { dg-require-effective-target alloca } */ typedef __SIZE_TYPE__ size_t; diff --git a/gcc/testsuite/gcc.dg/Wrestrict.c b/gcc/testsuite/gcc.dg/Wrestrict.c index 076f878..266443f 100644 --- a/gcc/testsuite/gcc.dg/Wrestrict.c +++ b/gcc/testsuite/gcc.dg/Wrestrict.c @@ -1,6 +1,7 @@ /* Test to verify that VLAs are handled gracefully by -Wrestrict { dg-do compile } - { dg-options "-O2 -Wrestrict" } */ + { dg-options "-O2 -Wrestrict" } + { dg-require-effective-target alloca } */ typedef __SIZE_TYPE__ size_t; diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-15.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-15.c index 420880e..63e5bb7 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-15.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-15.c @@ -1,6 +1,7 @@ /* PR tree-optimization/83337 */ /* { dg-do run { target int32plus } } */ /* { dg-options "-O2 -floop-interchange" } */ +/* { dg-require-effective-target alloca } */ /* Copied from graphite/interchange-5.c */
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On 01/08/2018 01:29 AM, H.J. Lu wrote: > 1. They need to be backportable to GCC 7/6/5/4.x. I must admit this is a very important constraint. To be honest, we're planning to backport the patchset to GCC 4.3. Martin
Re: [PATCH 1/5] x86: Add -mindirect-branch=
On 01/07/2018 11:59 PM, H.J. Lu wrote: > +static void > +output_indirect_thunk_function (bool need_bnd_p, int regno) > +{ > + char name[32]; > + tree decl; > + > + /* Create __x86_indirect_thunk/__x86_indirect_thunk_bnd. */ > + indirect_thunk_name (name, regno, need_bnd_p); > + decl = build_decl (BUILTINS_LOCATION, FUNCTION_DECL, > + get_identifier (name), > + build_function_type_list (void_type_node, NULL_TREE)); > + DECL_RESULT (decl) = build_decl (BUILTINS_LOCATION, RESULT_DECL, > +NULL_TREE, void_type_node); > + TREE_PUBLIC (decl) = 1; > + TREE_STATIC (decl) = 1; > + DECL_IGNORED_P (decl) = 1; > + > +#if TARGET_MACHO > + if (TARGET_MACHO) > +{ > + switch_to_section (darwin_sections[picbase_thunk_section]); > + fputs ("\t.weak_definition\t", asm_out_file); > + assemble_name (asm_out_file, name); > + fputs ("\n\t.private_extern\t", asm_out_file); > + assemble_name (asm_out_file, name); > + putc ('\n', asm_out_file); > + ASM_OUTPUT_LABEL (asm_out_file, name); > + DECL_WEAK (decl) = 1; > +} > + else > +#endif > +if (USE_HIDDEN_LINKONCE) > + { > + cgraph_node::create (decl)->set_comdat_group (DECL_ASSEMBLER_NAME > (decl)); > + > + targetm.asm_out.unique_section (decl, 0); > + switch_to_section (get_named_section (decl, NULL, 0)); > + > + targetm.asm_out.globalize_label (asm_out_file, name); > + fputs ("\t.hidden\t", asm_out_file); > + assemble_name (asm_out_file, name); > + putc ('\n', asm_out_file); > + ASM_DECLARE_FUNCTION_NAME (asm_out_file, name, decl); > + } > +else > + { > + switch_to_section (text_section); > + ASM_OUTPUT_LABEL (asm_out_file, name); > + } > + > + DECL_INITIAL (decl) = make_node (BLOCK); > + current_function_decl = decl; > + allocate_struct_function (decl, false); > + init_function_start (decl); > + /* We're about to hide the function body from callees of final_* by > + emitting it directly; tell them we're a thunk, if they care. */ > + cfun->is_thunk = true; > + first_function_block_is_cold = false; > + /* Make sure unwind info is emitted for the thunk if needed. */ > + final_start_function (emit_barrier (), asm_out_file, 1); > + > + output_indirect_thunk (need_bnd_p, regno); > + > + final_end_function (); > + init_insn_lengths (); > + free_after_compilation (cfun); > + set_cfun (NULL); > + current_function_decl = NULL; > +} > + I'm wondering whether thunk creation can be a good target-independent generalization? I guess we can emit the function declaration without direct writes to asm_out_file? And the emission of function body can be potentially a target hook? What about emitting body of the function with RTL instructions instead of direct assembly write? My knowledge of RTL is quite small, but maybe it can bring some generalization and reusability for other targets? Thank you, Martin
[PATCH] -mjsr option bug fix
Hi, The -mjsr option on RX should ensure that the BSR instruction is not generated; only the JSR instruction should be generated. However, this does not work as expected: the BSR instruction still gets generated even if -mjsr is passed on the command line. This is reproducible even with test cases from the GCC testsuite, for example: gcc.c-torture\compile\920625-1.c gcc.c-torture\compile\20051216-1.c gcc.dg\torture\builtin-explog-1.c The following patch fixes this issue by adding a new constraint to call_internal and call_value_internal. The patch also contains a test case which I created as follows: 1. I copied gcc.c-torture\compile\20051216-1.c to gcc.target\rx and renamed it to mjsr.c 2. added the following lines to scan the assembly files for BSR. If BSR is present the test fails. /* { dg-do compile } */ /* { dg-options "-O2 -mjsr" } */ /* { dg-final { scan-assembler-not "bsr" } } */ Regression testing is OK, tested with the following command: make -k check-gcc RUNTESTFLAGS=--target_board=rx-sim Please let me know if this is OK. Thank you! Best Regards, Sebastian Index: ChangeLog === --- ChangeLog (revision 256278) +++ ChangeLog (working copy) @@ -1,3 +1,10 @@ +2018-01-05 Sebastian Perta + + * config/rx/constraints.md: added new constraint CALL_OP_SYMBOL_REF + to allow or block "symbol_ref" depending on value of TARGET_JSR + * config/rx/rx.md: use CALL_OP_SYMBOL_REF in call_internal and + call_value_internal insns + 2018-01-05 Richard Sandiford * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Don't Index: config/rx/constraints.md === --- config/rx/constraints.md(revision 256278) +++ config/rx/constraints.md(working copy) @@ -106,3 +106,9 @@ ) ) ) + +(define_constraint "CALL_OP_SYMBOL_REF" +"constraint for call instructions using symbol ref" +(and (match_test "!TARGET_JSR") + (match_code "symbol_ref")) +) Index: config/rx/rx.md === --- config/rx/rx.md (revision 256278) +++ config/rx/rx.md (working copy) @@ -438,7 +438,7 @@ ) (define_insn "call_internal" - [(call (mem:QI (match_operand:SI 0 "rx_call_operand" "r,Symbol")) + [(call (mem:QI (match_operand:SI 0 "rx_call_operand" "r,CALL_OP_SYMBOL_REF")) (const_int 0)) (clobber (reg:CC CC_REG))] "" @@ -466,7 +466,7 @@ (define_insn "call_value_internal" [(set (match_operand 0 "register_operand" "=r,r") - (call (mem:QI (match_operand:SI 1 "rx_call_operand" "r,Symbol")) + (call (mem:QI (match_operand:SI 1 "rx_call_operand" "r,CALL_OP_SYMBOL_REF")) (const_int 0))) (clobber (reg:CC CC_REG))] "" Index: testsuite/gcc.target/rx/mjsr.c === --- testsuite/gcc.target/rx/mjsr.c (nonexistent) +++ testsuite/gcc.target/rx/mjsr.c (working copy) @@ -0,0 +1,134 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mjsr" } */ + +void *malloc (__SIZE_TYPE__); +void *realloc (void *, __SIZE_TYPE__); + +struct A { double x, y; }; +struct B { double x0, y0, x1, y1; }; +struct C { int n_points; int dir; struct B bbox; struct A *points; }; +struct D { int n_segs; struct C segs[1]; }; + +void foo (int, int, int *, int, int *, struct A **, int *, int *, + struct D *, int *, struct D **, int *, int **); +int baz (struct A, struct A, struct A, struct A); + +static void +bar (struct D *svp, int *n_points_max, + struct A p, int *seg_map, int *active_segs, int i) +{ + int asi, n_points; + struct C *seg; + + asi = seg_map[active_segs[i]]; + seg = &svp->segs[asi]; + n_points = seg->n_points; + seg->points = ((struct A *) + realloc (seg->points, (n_points_max[asi] <<= 1) * sizeof (struct A))); + seg->points[n_points] = p; + seg->bbox.y1 = p.y; + seg->n_points++; +} + +struct D * +test 
(struct D *vp) +{ + int *active_segs, n_active_segs, *cursor, seg_idx; + double y, share_x; + int tmp1, tmp2, asi, i, j, *n_ips, *n_ips_max, n_segs_max; + struct A **ips, p_curs, *pts; + struct D *new_vp; + int *n_points_max, *seg_map, first_share; + + n_segs_max = 16; + new_vp = (struct D *) malloc (sizeof (struct D) + + (n_segs_max - 1) * sizeof (struct C)); + new_vp->n_segs = 0; + + if (vp->n_segs == 0) +return new_vp; + + active_segs = ((int *) malloc ((vp->n_segs) * sizeof (int))); + cursor = ((int *) malloc ((vp->n_segs) * sizeof (int))); + + seg_map = ((int *) malloc ((vp->n_segs) * sizeof (int))); + n_ips = ((int *) malloc ((vp->n_segs) * sizeof (int))); + n_ips_max = ((int *) malloc ((vp->n_segs) * sizeof (int))); + ips = ((struct A * *) malloc ((vp->n_segs) * sizeof (struct A *))); + + n_points_max = ((int *) malloc ((n_segs_max) * sizeof (int))); + + n_active_segs = 0; + seg_idx = 0; + y = vp->segs[0].
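As a minimal illustration of the class of call the constraint change affects (not part of the patch): a plain direct call like the one below could previously be emitted as BSR even under -mjsr; with CALL_OP_SYMBOL_REF refusing a symbol_ref when TARGET_JSR is set, the address must be loaded into a register and called via JSR.

extern void callee (void);

void
caller (void)
{
  callee ();  /* with -mjsr: JSR through a register, never BSR */
}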
Re: [PATCH, libgcc] Fix PowerPC libgcc issues with -mabi=ieeelongdouble
On Thu, Dec 14, 2017 at 11:10:13PM -0500, Michael Meissner wrote: > I am working on some patches to optionally enable multilibs for the PowerPC > long double support to be switchable between IBM extended double and IEEE > 128-bit floating point. While the patches to actually enable the multlibs > need > some more tweaking, it did point up an issue in the libgcc _Float128 and IBM > extended double functions. > > These patches use the correct types for IBM extended double and __float128 if > the IEEE default is used. I have built the compiler with bootstrap builds and > there were no regressions in running the tests on a little endian power8 > system. > > In addition, I had fixed the previous changes to _divkc3.c and _mulkc3.c so > that these functions now include soft-fp.h and quad-float128.h, which provides > the appropriate prototypes. > > I have also done a bootstrap build with my preliminary multilib patches, and > it > built fine with both -mabi=ieeelongdouble and -mabi=ibmlongdouble > configurations. > > Can I apply these patches to libgcc? As far as I can follow, it's okay for trunk. Please apply. Thanks, Segher > 2017-12-14 Michael Meissner > > * config/rs6000/_divkc3.c: Switch to using soft-fp.h and > quad-float128.h include files and use the types that they define, > instead of hand-rolling the types. > * config/rs6000/_mulkc3.c: Likewise. > * config/rs6000/ibm-ldouble.c: If we have __float128/_Float128, > use __ieee128 for the IBM extended double type instead of long double. > Change all functions. > * config/rs6000/quad-float128.h (IBM128_TYPE): Always use > __ieee128. > (CVT_FLOAT128_TO_IBM128): Use long double instead of __float128 on > systems where the default long double is IEEE 128-bit floating > point. > * config/rs6000/extendkftf2-sw.c (extendkftf2_sw): Likewise. > * config/rs6000/trunctfkf2-sw.c (__trunctfkf2_sw): Likewise.
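For readers unfamiliar with the two formats, a trivial PowerPC-only sketch; these type keywords are specific to the rs6000 port, and which of the two long double denotes is exactly what -mabi=ibmlongdouble vs. -mabi=ieeelongdouble selects:

/* PowerPC-only: the two 128-bit floating-point formats.  */
__ibm128   ibm;   /* IBM extended double: a double-double pair */
__float128 ieee;  /* IEEE 754 binary128 */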
Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=
On Mon, Jan 8, 2018 at 12:20 AM, Florian Weimer wrote: > * H. J. Lu: > >> Add -mindirect-branch-loop= option to control loop filler in call and >> return thunks generated by -mindirect-branch=. 'lfence' uses "lfence" >> as loop filler. 'pause' uses "pause" as loop filler. 'nop' uses "nop" >> as loop filler. The default is 'lfence'. > > Why is the loop needed? Doesn't ud2 or cpuid stop speculative > execution? My understanding is that a loop works better. -- H.J.
[Patch, fortran] PDT bugs PRs 83611 and 83731
This patch: (i) adds default initializers for parameterized arrays; (ii) fixes ordinary assignment of PDTs by implementing the same deep copy mechanism as for derived types with allocatable components; and (iii) fixes the len parameter checking, which failed where the dummy type had an assumed parameter. I have in fact committed this patch as revision 256335 since it is safe for anything other than PDTs. 2018-01-08 Paul Thomas PR fortran/83611 * decl.c (gfc_get_pdt_instance): If parameterized arrays have an initializer, convert the kind parameters and add to the component of the instance. * trans-array.c (structure_alloc_comps): Add 'is_pdt_type' and use it with case COPY_ALLOC_COMP. Call 'duplicate_allocatable' for parameterized arrays. Clean up typos in comments. Convert parameterized array initializers and copy into the array. * trans-expr.c (gfc_trans_scalar_assign): Do a deep copy for parameterized types. * trans-stmt.c (trans_associate_var): Deallocate associate vars as necessary, when they are PDT function results for example. PR fortran/83731 * trans-array.c (structure_alloc_comps): Only compare len parms when they are declared explicitly. 2018-01-08 Paul Thomas PR fortran/83611 * gfortran.dg/pdt_15.f03 : Bump count of 'n.data = 0B' to 8. * gfortran.dg/pdt_26.f03 : Bump count of '_malloc' to 9. * gfortran.dg/pdt_27.f03 : New test. PR fortran/83731 * gfortran.dg/pdt_28.f03 : New test. Cheers Paul
Re: [PATCH improve early strlen range folding (PR 83671)
On Sat, Jan 6, 2018 at 11:04 PM, Martin Sebor wrote: > Bug 83671 - Fix for false positive reported by -Wstringop-overflow > does not work at -O1, points out that the string length range > optimization implemented as a solution for bug 83373 doesn't help > at -O1. The root cause is that the fix was added to the strlen > pass that doesn't run at -O1. > > The string length range computation doesn't depend on the strlen > pass, and so the range can be set earlier, in gimple-fold, and > its results made available even at -O1. The attached patch > changes the gimple_fold_builtin_strlen() function to do that. > > While testing the change I came across a number of other simple > strlen cases that currently aren't handled, some at -O1, others > not at all. I added code to handle some of the simplest of them > and opened bugs to remind us/myself to get back to the rest in > the future (pr83693 and pr83702). The significant enhancement > is handling arrays of arrays with non-constant indices and > pointers to such things, such as in: > > char a[2][7]; > > void f (int i) > { > if (strlen (a[i]) > 6) // eliminated with the patch > abort (); > } > > Attached is a near-minimal patch to handle PR 83671. Please don't use set_range_info from inside fold_stmt (); this is IMHO a layering violation. Why not restrict -Wstringop-overflow to -O2+? Richard. > Martin
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Sun, Jan 7, 2018 at 8:07 PM, Sandra Loosemore wrote: > On 01/07/2018 03:58 PM, H.J. Lu wrote: >> >> This set of patches for GCC 8 mitigates variant #2 of the speculative >> execution >> vulnerabilities on x86 processors identified by CVE-2017-5715, aka >> Spectre. They >> convert indirect branches to call and return thunks to avoid speculative >> execution >> via indirect call and jmp. > > > I have a general documentation issue with all the new command-line options > and attributes added by this patch set: the documentation is very > implementor-speaky and doesn't explain what user-level problem they're > trying to solve. Do you have any suggestions? > E.g. to take just one example > >> +@item function_return("@var{choice}") >> +@cindex @code{function_return} function attribute, x86 >> +On x86 targets, the @code{function_return} attribute causes the compiler >> +to convert function return with @var{choice}. @samp{keep} keeps function >> +return unmodified. @samp{thunk} converts function return to call and >> +return thunk. @samp{thunk-inline} converts function return to inlined >> +call and return thunk. @samp{thunk-extern} converts function return to >> +external call and return thunk provided in a separate object file. > > > Why would you want to mess with call and return code generation in this way? > The documentation doesn't say. > > For thunk-extern, is the programmer supposed to provide the thunk? How > would you go about implementing the missing bit of code? What should it do? > I'm compiler implementor and I wouldn't even know how to use this feature by > reading the manual; how would an ordinary application programmer who isn't > familiar with GCC internals know how to use it? This option was requested by the Linux kernel people. The Linux kernel may choose different thunks at load time. If a programmer doesn't know how to write an external thunk, he/she shouldn't use it. > If the goal here is to tell GCC to produce code that is protected against > the Spectre vulnerability, perhaps simplify this to adding just one option > that controls all the things you've given separate options and attributes > for? -mindirect-branch=thunk does the job. Other options/choices are for fine tuning. Thanks. -- H.J.
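To make the thunk-extern part concrete, here is a hedged sketch of what a user-supplied external return thunk could look like, written as top-level asm in a C file; the body simply mirrors the __x86_return_thunk sequence quoted in patch 3/5 and assumes x86-64, so treat it as an illustration rather than a blessed implementation:

/* Hypothetical separately-compiled return thunk for
   -mfunction-return=thunk-extern; x86-64 only, body taken from the
   sequence quoted in patch 3/5.  */
__asm__ (
    "\t.globl __x86_return_thunk\n"
    "__x86_return_thunk:\n"
    "\tcall 2f\n"
    "1:\tlfence\n"
    "\tjmp 1b\n"
    "2:\tlea 8(%rsp), %rsp\n"
    "\tret\n");

Linking an object built from such a file is what thunk-extern assumes; the kernel can then swap in a different body at load time.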
Re: [PATCH] fold strlen of constant aggregates (PR 83693)
On Mon, Jan 8, 2018 at 3:11 AM, Martin Sebor wrote: > GCC is able to fold references to members of global aggregate > constants in many expressions but it doesn't know how to do it > for strlen() arguments. As a result, strlen calls with arguments > such as the one below are not optimized: > > const struct { char a[4]; } s = { "123" }; > > void f (void) > { > if (s.a[3]) abort (); // folded/eliminated > } > > void g (void) > { > if (strlen (s.a) != 3) abort (); // not folded > } > > The attached patch enables gimple_fold_builtin_strlen() to extract > constant strings from aggregate initializers, analogously to how > it extracts data of other types. Hmm. You do diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c index fe3e0b4..f277324 100644 --- a/gcc/gimple-fold.c +++ b/gcc/gimple-fold.c @@ -3517,8 +3517,14 @@ gimple_fold_builtin_strlen (gimple_stmt_iterator *gsi) wide_int minlen; wide_int maxlen; + /* Try to extract a constant from an object's CONSTRUCTOR first. */ + tree arg = gimple_call_arg (stmt, 0); + if (TREE_CODE (arg) == ADDR_EXPR) +if (tree str = fold_const_aggregate_ref (TREE_OPERAND (arg, 0))) + arg = str; + (patch is not against trunk?) but then fold_const_aggregate_ref of, say, &s.a[2] will simply return '3', which then yields a bogus result? So I'm not sure this flies as-is, or at least it needs a comment on how such a simplification will end up _not_ disturbing the following code (maybe by noticing that '3', aka an INTEGER_CST, isn't a valid string and thus not folding). Richard. > Martin
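To spell out the concern with a concrete (hypothetical) extension of Martin's example:

/* For strlen (&s.a[2]) the correct result is 1, but
   fold_const_aggregate_ref on s.a[2] yields the INTEGER_CST for the
   character '3', which is not a string and must not be treated as
   the strlen argument.  */
const struct { char a[4]; } s = { "123" };

unsigned long
h (void)
{
  return __builtin_strlen (&s.a[2]);  /* must fold to 1 */
}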
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Sun, Jan 7, 2018 at 10:55 PM, Markus Trippelsdorf wrote: > On 2018.01.07 at 21:07 -0700, Sandra Loosemore wrote: >> On 01/07/2018 03:58 PM, H.J. Lu wrote: >> > This set of patches for GCC 8 mitigates variant #2 of the speculative >> > execution >> > vulnerabilities on x86 processors identified by CVE-2017-5715, aka >> > Spectre. They >> > convert indirect branches to call and return thunks to avoid speculative >> > execution >> > via indirect call and jmp. >> >> I have a general documentation issue with all the new command-line >> options and attributes added by this patch set: the documentation is >> very implementor-speaky and doesn't explain what user-level problem >> they're trying to solve. >> >> E.g. to take just one example >> >> > +@item function_return("@var{choice}") >> > +@cindex @code{function_return} function attribute, x86 >> > +On x86 targets, the @code{function_return} attribute causes the compiler >> > +to convert function return with @var{choice}. @samp{keep} keeps function >> > +return unmodified. @samp{thunk} converts function return to call and >> > +return thunk. @samp{thunk-inline} converts function return to inlined >> > +call and return thunk. @samp{thunk-extern} converts function return to >> > +external call and return thunk provided in a separate object file. >> >> Why would you want to mess with call and return code generation in this >> way? The documentation doesn't say. >> >> For thunk-extern, is the programmer supposed to provide the thunk? How >> would you go about implementing the missing bit of code? What should it >> do? I'm compiler implementor and I wouldn't even know how to use this >> feature by reading the manual; how would an ordinary application >> programmer who isn't familiar with GCC internals know how to use it? >> >> If the goal here is to tell GCC to produce code that is protected >> against the Spectre vulnerability, perhaps simplify this to adding just >> one option that controls all the things you've given separate options >> and attributes for? > > Also it would be good to coordinate with the LLVM guys: They currently > use -mretpoline and -mretpoline_external_thunk. > I agree with Sandra that a single master option like -mretpoline would > be better. Our current goal is to compile the Linux kernel. We won't change the generated code. We will change the command options only if we add a late generic RTL pass. -- H.J.
Re: [PATCH][PR rtl-optimization/81308] Conditionally cleanup the CFG after insn splitting
On Mon, Jan 8, 2018 at 5:22 AM, Jeff Law wrote: > > This patch fixes the original problem reported in 81308. Namely that > g++.dg/pr62079.C will trigger a checking failure on 32bit x86. > > As Jakub noted in the BZ the problem is we had an insn with an EH region > note. That insn gets split and the split insns do not have an EH region > note (nor do they need one AFAICT). > > With the EH region note gone, the actual EH region becomes unreachable > and we get a verification error in the dominance code. > > My solution is relatively simple. During splitting we track if the > current insn has an EH region note. If it does and we end splitting the > insn or deleting it as a nop-move, then we note that we're going to need > a cfg cleanup. > > After splitting all insns, we conditionally cleanup the CFG. > > This should keep the overhead relatively low -- primarily it's the cost > to look for the EH region note on each insn as I don't expect the cfg > cleanup is often needed. > > If we could prove to ourselves that the situation only occurs with > -fnon-call-exceptions, then we could further reduce the overhead by > exploiting that invariant as well. I haven't really thought too much > about this. > > No new testcase as the existing pr62079.C will trigger on x86. > > Bootstrapped and regression tested on x86_64. Verified pr62079.C now > passes by hand in 32 bit mode. > > OK for the trunk? Ok. Richard. > Jeff > > PR rtl-optimization/81308 > * recog.c (split_all_insns): Conditionally cleanup the CFG after > splitting insns. > > diff --git a/gcc/recog.c b/gcc/recog.c > index d6aa903..cc28b71 100644 > --- a/gcc/recog.c > +++ b/gcc/recog.c > @@ -2931,6 +2931,7 @@ void > split_all_insns (void) > { >bool changed; > + bool need_cfg_cleanup = false; >basic_block bb; > >auto_sbitmap blocks (last_basic_block_for_fn (cfun)); > @@ -2949,6 +2950,18 @@ split_all_insns (void) > CODE_LABELS and short-out basic blocks. */ > next = NEXT_INSN (insn); > finish = (insn == BB_END (bb)); > + > + /* If INSN has a REG_EH_REGION note and we split INSN, the > +resulting split may not have/need REG_EH_REGION notes. > + > +If that happens and INSN was the last reference to the > +given EH region, then the EH region will become unreachable. > +We can not leave the unreachable blocks in the CFG as that > +will trigger a checking failure. > + > +So track if INSN has a REG_EH_REGION note. If so and we > +split INSN, then trigger a CFG cleanup. */ > + rtx note = find_reg_note (insn, REG_EH_REGION, NULL_RTX); > if (INSN_P (insn)) > { > rtx set = single_set (insn); > @@ -2965,6 +2978,8 @@ split_all_insns (void) > nops then anyways. */ > if (reload_completed) > delete_insn_and_edges (insn); > + if (note) > + need_cfg_cleanup = true; > } > else > { > @@ -2972,6 +2987,8 @@ split_all_insns (void) > { > bitmap_set_bit (blocks, bb->index); > changed = true; > + if (note) > + need_cfg_cleanup = true; > } > } > } > @@ -2980,7 +2997,16 @@ split_all_insns (void) > >default_rtl_profile (); >if (changed) > -find_many_sub_basic_blocks (blocks); > +{ > + find_many_sub_basic_blocks (blocks); > + > + /* Splitting could drop an REG_EH_REGION if it potentially > +trapped in its original form, but does not in its split > +form. Consider a FLOAT_TRUNCATE which splits into a memory > +store/load pair and -fnon-call-exceptions. */ > + if (need_cfg_cleanup) > + cleanup_cfg (0); > +} > >checking_verify_flow_info (); > } >
Re: [PATCH][PR rtl-optimization/81308] Conditionally cleanup the CFG after switch conversion
On Mon, Jan 8, 2018 at 5:45 AM, Jeff Law wrote: > This patch fixes the second testcase in 81308 and the duplicate in 83724. > > For those cases we have a switch statement where one or more case labels > are marked as __builtin_unreachable. > > Switch conversion calls group_case_labels which can drop the edges from > the switch to the case labels that are marked as __builtin_unreachable. > > That leaves those blocks unreachable and thus we again trigger the > assertion failure in the dominance code. > > My solution here is again to note if a change was made that ought to > trigger a CFG cleanup (group_case_labels returns a suitable boolean). > If such a change was made then the pass returns TODO_cleanup_cfg. > > Bootstrapped and regression tested on x86_64. OK for the trunk? Ok. Richard. > Jeff > > > PR rtl-optimization/81308 > * tree-switch-conversion.c (cfg_altered): New file scoped static. > (process_switch): If group_case_labels makes a change, then set > cfg_altered. > (pass_convert_switch::execute): If a switch is converted, then > set cfg_altered. Return TODO_cfg_cleanup if cfg_altered is true. > > > PR rtl-optimization/81308 > * g++.dg/pr81308-1.C: New test. > * g++.dg/pr81308-2.C: New test. > > diff --git a/gcc/testsuite/g++.dg/pr81308-1.C > b/gcc/testsuite/g++.dg/pr81308-1.C > new file mode 100644 > index 000..508372b > --- /dev/null > +++ b/gcc/testsuite/g++.dg/pr81308-1.C > @@ -0,0 +1,67 @@ > +/* { dg-do compile } */ > +/* { dg-options "-w -O2 -fno-exceptions -std=c++11 -fpermissive" } */ > + > +namespace a { > +template struct d { static constexpr b e = c; }; > +template struct f : d {}; > +} > +typedef long g; > +template struct h { static const bool e = a::f::e; }; > +namespace a { > +template struct ah; > +template class ai; > +} > +class i { > +public: > + operator[](long) const {} > +}; > +template class am : public i {}; > +class an; > +class k : public am, h>>::e> {}; > +class l { > +public: > + aq(); > +}; > +class ar extern as; > +typedef k at; > +class m { > + virtual bool av(int, unsigned &, at &, int &, g &, bool); > +}; > +class ar { > +public: > + typedef m *aw(const &, int &, const &, const &); > +}; > +struct ax { > + static ay(ar::aw); > +}; > +template struct n { > + n(ar) { ax::ay(ba); } > + static m *ba(const &bb, int &bc, const &bd, const &be) { az(bb, bc, bd, > be); } > +}; > +namespace { > +class G : m { > + unsigned bi(const at &, l &); > + bool av(int, unsigned &, at &, int &, g &, bool); > + > +public: > + G(const, int, const, const) {} > +}; > +} > +bool G::av(int, unsigned &, at &bl, int &, g &, bool) { > + l bo; > + bi(bl, bo); > +} > +o() { n bp(as); } > +namespace { > +enum { bq, br }; > +} > +unsigned G::bi(const at &bl, l &bo) { > + unsigned bs; > + for (char *j;; j += 2) > +switch (*j) { > +case bq: > + bl[bs]; > +case br: > + bo.aq(); > +} > +} > diff --git a/gcc/testsuite/g++.dg/pr81308-2.C > b/gcc/testsuite/g++.dg/pr81308-2.C > new file mode 100644 > index 000..97e3409 > --- /dev/null > +++ b/gcc/testsuite/g++.dg/pr81308-2.C > @@ -0,0 +1,38 @@ > +/* { dg-do compile } */ > +/* { dg-options "-w -O2" } */ > + > +struct A { > + int operator[](int) const {} > +}; > +struct B { > + void m_fn1(); > +}; > +struct C { > + virtual bool m_fn2(int, unsigned &, A &, int &, unsigned long &, bool); > +}; > +template struct D { > + D(int) { MCAsmParserImpl(0, 0, 0, 0); } > +}; > +int a; > +namespace { > +struct F : C { > + bool m_fn2(int, unsigned &, A &, int &, unsigned long &, bool); > + unsigned m_fn3(const A &, B &); > + F(int, int, int, int) {} > +}; > 
+} > +bool F::m_fn2(int, unsigned &, A &p3, int &, unsigned long &, bool) { > + B b; > + m_fn3(p3, b); > +} > +void fn1() { D(0); } > +unsigned F::m_fn3(const A &p1, B &p2) { > + for (int *p;; p++) > +switch (*p) { > +case 0: > + p1[a]; > +case 1: > + p2.m_fn1(); > +} > +} > + > diff --git a/gcc/tree-switch-conversion.c b/gcc/tree-switch-conversion.c > index fdec59e..b384e4d 100644 > --- a/gcc/tree-switch-conversion.c > +++ b/gcc/tree-switch-conversion.c > @@ -60,6 +60,10 @@ Software Foundation, 51 Franklin Street, Fifth Floor, > Boston, MA > targetm.case_values_threshold(), or be its own param. */ > #define MAX_CASE_BIT_TESTS 3 > > +/* Track whether or not we have altered the CFG and thus may need to > + cleanup the CFG when complete. */ > +bool cfg_altered; > + > /* Split the basic block at the statement pointed to by GSIP, and insert > a branch to the target basic block of E_TRUE conditional on tree > expression COND. > @@ -1492,7 +1496,7 @@ process_switch (gswitch *swtch) > >/* Group case labels so that we get the right results from the heuristics > that decide on the code generation approach for this switch. */ > - group_case_labels_stmt (swtch); > + cfg_altered |= group_case_labels_stmt (swtch); > >/* If
Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()
On Mon, Jan 8, 2018 at 5:47 AM, Jeff Law wrote: > On 01/07/2018 07:20 PM, Bill Schmidt wrote: >> Hi Richard, >> >> Unfortunately, I don't see any way that this will be useful for the ppc >> targets. We don't >> have a way to force resolution of a condition prior to continuing >> speculation, so this >> will just introduce another comparison that we would speculate past. For >> our mitigation >> we will have to introduce an instruction that halts all speculation at that >> point, and place >> it in front of all dangerous loads. I wish it were otherwise. > So could you have an expander for __builtin_load_no_speculate that just > emits the magic insn that halts all speculation and essentially ignores > the additional stuff that __builtin_load_no_speculate might be able to > do on other platforms? I think you at least need to expand the builtin semantically, given that, as designed, it might consume the condition entirely in the source code. I also think the user documentation in extend.texi should contain examples of how to actually use the builtin to mitigate the Spectre attack, that is, code before and after using it. And somebody might want to set up a spectre.html page and some NEWS item at some point. Richard. > > jeff
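Something like the following might serve as such an example; this is a hedged sketch that assumes the four-argument form (ptr, lower bound, upper bound, failval) from patch 1/3, so the exact spelling may differ in the final builtin:

/* Hypothetical before/after for a Spectre-variant-1 gadget.  */
int
lookup (int *array, unsigned long idx, unsigned long bound)
{
  /* Before: return idx < bound ? array[idx] : 0;
     After: the load cannot be performed speculatively with an
     out-of-range index, and 0 is returned for out-of-range IDX.  */
  return __builtin_load_no_speculate (&array[idx], array,
                                      array + bound, 0);
}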
[testsuite] Require stack size for some test-cases
Hi, this patch requires stack size for some test-cases that are currently failing for nvptx with error message: ... nvptx-run: error launching kernel: invalid argument (CUDA_ERROR_INVALID_VALUE, 1) ... Tested on nvptx. Committed. Thanks, - Tom Require stack size for some test-cases 2018-01-08 Tom de Vries * gcc.dg/graphite/interchange-7.c: Add dg-require-stack-size. * gcc.dg/graphite/run-id-1.c: Same. * gcc.dg/tree-ssa/loop-interchange-4.c: Same. --- gcc/testsuite/gcc.dg/graphite/interchange-7.c | 1 + gcc/testsuite/gcc.dg/graphite/run-id-1.c | 1 + gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-4.c | 1 + 3 files changed, 3 insertions(+) diff --git a/gcc/testsuite/gcc.dg/graphite/interchange-7.c b/gcc/testsuite/gcc.dg/graphite/interchange-7.c index 81a6d83..50f7dd7 100644 --- a/gcc/testsuite/gcc.dg/graphite/interchange-7.c +++ b/gcc/testsuite/gcc.dg/graphite/interchange-7.c @@ -1,4 +1,5 @@ /* { dg-require-effective-target size32plus } */ +/* { dg-require-stack-size "8*111*" } */ /* Formerly known as ltrans-8.c */ diff --git a/gcc/testsuite/gcc.dg/graphite/run-id-1.c b/gcc/testsuite/gcc.dg/graphite/run-id-1.c index a58c090..d2fc3c5 100644 --- a/gcc/testsuite/gcc.dg/graphite/run-id-1.c +++ b/gcc/testsuite/gcc.dg/graphite/run-id-1.c @@ -1,5 +1,6 @@ /* { dg-options "-Wl,--stack,12582912" { target *-*-mingw* *-*-cygwin* } } */ /* { dg-require-effective-target size32plus } */ +/* { dg-require-stack-size "4*1000*1000" } */ void abort (void); diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-4.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-4.c index a919a6c..4e64275 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-4.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-4.c @@ -1,5 +1,6 @@ /* { dg-do run } */ /* { dg-options "-O2 -floop-interchange -fdump-tree-linterchange-details" } */ +/* { dg-require-stack-size "8*111*" } */ /* Copied from graphite/interchange-7.c */
Re: [PATCH 1/5] x86: Add -mindirect-branch=
On Mon, Jan 8, 2018 at 2:10 AM, Martin Liška wrote: > On 01/07/2018 11:59 PM, H.J. Lu wrote: >> +static void >> +output_indirect_thunk_function (bool need_bnd_p, int regno) >> +{ >> + char name[32]; >> + tree decl; >> + >> + /* Create __x86_indirect_thunk/__x86_indirect_thunk_bnd. */ >> + indirect_thunk_name (name, regno, need_bnd_p); >> + decl = build_decl (BUILTINS_LOCATION, FUNCTION_DECL, >> + get_identifier (name), >> + build_function_type_list (void_type_node, NULL_TREE)); >> + DECL_RESULT (decl) = build_decl (BUILTINS_LOCATION, RESULT_DECL, >> +NULL_TREE, void_type_node); >> + TREE_PUBLIC (decl) = 1; >> + TREE_STATIC (decl) = 1; >> + DECL_IGNORED_P (decl) = 1; >> + >> +#if TARGET_MACHO >> + if (TARGET_MACHO) >> +{ >> + switch_to_section (darwin_sections[picbase_thunk_section]); >> + fputs ("\t.weak_definition\t", asm_out_file); >> + assemble_name (asm_out_file, name); >> + fputs ("\n\t.private_extern\t", asm_out_file); >> + assemble_name (asm_out_file, name); >> + putc ('\n', asm_out_file); >> + ASM_OUTPUT_LABEL (asm_out_file, name); >> + DECL_WEAK (decl) = 1; >> +} >> + else >> +#endif >> +if (USE_HIDDEN_LINKONCE) >> + { >> + cgraph_node::create (decl)->set_comdat_group (DECL_ASSEMBLER_NAME >> (decl)); >> + >> + targetm.asm_out.unique_section (decl, 0); >> + switch_to_section (get_named_section (decl, NULL, 0)); >> + >> + targetm.asm_out.globalize_label (asm_out_file, name); >> + fputs ("\t.hidden\t", asm_out_file); >> + assemble_name (asm_out_file, name); >> + putc ('\n', asm_out_file); >> + ASM_DECLARE_FUNCTION_NAME (asm_out_file, name, decl); >> + } >> +else >> + { >> + switch_to_section (text_section); >> + ASM_OUTPUT_LABEL (asm_out_file, name); >> + } >> + >> + DECL_INITIAL (decl) = make_node (BLOCK); >> + current_function_decl = decl; >> + allocate_struct_function (decl, false); >> + init_function_start (decl); >> + /* We're about to hide the function body from callees of final_* by >> + emitting it directly; tell them we're a thunk, if they care. */ >> + cfun->is_thunk = true; >> + first_function_block_is_cold = false; >> + /* Make sure unwind info is emitted for the thunk if needed. */ >> + final_start_function (emit_barrier (), asm_out_file, 1); >> + >> + output_indirect_thunk (need_bnd_p, regno); >> + >> + final_end_function (); >> + init_insn_lengths (); >> + free_after_compilation (cfun); >> + set_cfun (NULL); >> + current_function_decl = NULL; >> +} >> + > > I'm wondering whether thunk creation can be a good target-independent > generalization? I guess > we can emit the function declaration without direct writes to asm_out_file? > And the emission > of function body can be potentially a target hook? > > What about emitting body of the function with RTL instructions instead of > direct assembly write? > My knowledge of RTL is quite small, but maybe it can bring some > generalization and reusability > for other targets? Thunks are x86 specific and they are created the same way as 32-bit PIC thunks. I don't see how a target hook is used. -- H.J.
Re: [PATCH 3/5] x86: Add -mfunction-return=
On Mon, Jan 8, 2018 at 1:56 AM, Martin Liška wrote: > On 01/07/2018 11:59 PM, H.J. Lu wrote: >> Function return thunk is the same as memory thunk for -mindirect-branch= >> where the return address is at the top of the stack: >> >> __x86_return_thunk: >> call L2 >> L1: >> lfence >> jmp L1 >> L2: >> lea 8(%rsp), %rsp|lea 4(%esp), %esp >> ret >> >> and function return becomes >> >> jmp __x86_return_thunk > Hello. > > Can you please explain the intended usage of the option in more detail? Is it to prevent speculative > execution of the 'ret' instruction (which is an indirect branch), as described in > [1]? > The paper mentions that return stack predictors are commonly implemented in > some form. > It looks like the current version of the Linux patches does not use the option. > This option was requested by the Linux kernel people. It may be used in the future. -- H.J.
Re: [PATCH 1/5] x86: Add -mindirect-branch=
On Mon, Jan 08, 2018 at 03:55:52AM -0800, H.J. Lu wrote: > > I'm wondering whether thunk creation can be a good target-independent > > generalization? I guess > > we can emit the function declaration without direct writes to asm_out_file? > > And the emission > > of function body can be potentially a target hook? > > > > What about emitting body of the function with RTL instructions instead of > > direct assembly write? > > My knowledge of RTL is quite small, but maybe it can bring some > > generalization and reusability > > for other targets? > > Thunks are x86 specific and they are created the same way as 32-bit PIC > thunks. > I don't see how a target hook is used. Talking about PIC thunks, those have, I believe, a '.' character in their symbols, so that they can't be confused with user functions. Any reason these retpoline thunks don't? Jakub
Re: [PATCH 1/5] x86: Add -mindirect-branch=
On Mon, Jan 8, 2018 at 4:00 AM, Jakub Jelinek wrote: > On Mon, Jan 08, 2018 at 03:55:52AM -0800, H.J. Lu wrote: >> > I'm wondering whether thunk creation can be a good target-independent >> > generalization? I guess >> > we can emit the function declaration without direct writes to >> > asm_out_file? And the emission >> > of function body can be potentially a target hook? >> > >> > What about emitting body of the function with RTL instructions instead of >> > direct assembly write? >> > My knowledge of RTL is quite small, but maybe it can bring some >> > generalization and reusability >> > for other targets? >> >> Thunks are x86 specific and they are created the same way as 32-bit PIC >> thunks. >> I don't see how a target hook is used. > > Talking about PIC thunks, those have, I believe, a '.' character in their symbols, > so that they can't be confused with user functions. Any reason these > retpoline thunks don't? > They used to have '.'. It was changed at the last minute since the kernel needs to export them as regular symbols. -- H.J.
[PATCH] PR 78534 Regression on 32-bit targets
By switching from int to size_t in order to handle larger values, r256322 introduced a bug that manifested itself on 32-bit targets. Fixed by using the correct type to store the result of a next_array_record call. Regtested on x86_64-pc-linux-gnu and i686-pc-linux-gnu, committed to trunk as obvious. libgfortran/ChangeLog: 2018-01-08 Janne Blomqvist PR 78534, bugfix for r256322 * io/transfer.c (next_record_w): Use correct type for return value of next_array_record. --- libgfortran/io/transfer.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/libgfortran/io/transfer.c b/libgfortran/io/transfer.c index f9c8696..7e076de 100644 --- a/libgfortran/io/transfer.c +++ b/libgfortran/io/transfer.c @@ -3691,7 +3691,7 @@ next_record_w (st_parameter_dt *dtp, int done) { char *p; /* Internal unit, so must fit in memory. */ - size_t length, m, record; + size_t length, m; size_t max_pos = max_pos_off; if (is_array_io (dtp)) { @@ -3730,14 +3730,16 @@ next_record_w (st_parameter_dt *dtp, int done) memset (p, ' ', length); /* Now that the current record has been padded out, -determine where the next record in the array is. */ - record = next_array_record (dtp, dtp->u.p.current_unit->ls, - &finished); +determine where the next record in the array is. +Note that this can return a negative value, so it +needs to be assigned to a signed value. */ + gfc_offset record = next_array_record + (dtp, dtp->u.p.current_unit->ls, &finished); if (finished) dtp->u.p.current_unit->endfile = AT_ENDFILE; /* Now seek to this record */ - record = record * ((size_t) dtp->u.p.current_unit->recl); + record = record * dtp->u.p.current_unit->recl; if (sseek (dtp->u.p.current_unit->s, record, SEEK_SET) < 0) { -- 2.7.4
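A minimal, self-contained illustration of the bug class fixed here; next_record is a stand-in for next_array_record:

#include <stdio.h>
#include <stddef.h>

static long
next_record (void)   /* stand-in for next_array_record */
{
  return -1;         /* e.g. signals a special condition */
}

int
main (void)
{
  size_t record = next_record ();  /* wraps to SIZE_MAX; 4294967295 on
                                      32-bit targets */
  printf ("%zu\n", record);
  return 0;
}

Keeping the value in a signed type (gfc_offset in the patch) preserves the sign through the subsequent record arithmetic and seek.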
Re: [PATCH, rs6000] Add vec_mergee, vec_mergeo, vec_float2 builtin support
Hi Carl, On Mon, Dec 18, 2017 at 04:10:06PM -0800, Carl Love wrote: > --- a/gcc/config/rs6000/altivec.md > +++ b/gcc/config/rs6000/altivec.md > @@ -1,3 +1,4 @@ > + > ;; AltiVec patterns. > ;; Copyright (C) 2002-2017 Free Software Foundation, Inc. > ;; Contributed by Aldy Hernandez (a...@quesejoda.com) Please lose this change. > +;; Power8 vector merge two V2DF/V2DI even words to V2DF > +(define_expand "p8_vmrgew_" > + [(use (match_operand:VSX_D 0 "vsx_register_operand" "")) > + (use (match_operand:VSX_D 1 "vsx_register_operand" "")) > + (use (match_operand:VSX_D 2 "vsx_register_operand" ""))] Drop the empty default field please (here and elsewhere, certainly in all expanders). It looks good to me otherwise. Okay for trunk. Thanks! Segher
[PATCH] Fix PR83719
Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2018-01-08 Richard Biener PR lto/83719 * dwarf2out.c (output_indirect_strings): Handle empty skeleton_debug_str_hash. (dwarf2out_early_finish): Index strings for -gsplit-dwarf. * gcc.dg/lto/pr83719_0.c: New testcase. Index: gcc/dwarf2out.c === --- gcc/dwarf2out.c (revision 256329) +++ gcc/dwarf2out.c (working copy) @@ -27795,8 +27795,9 @@ output_indirect_strings (void) unsigned int offset = 0; unsigned int cur_idx = 0; - skeleton_debug_str_hash->traverse (DW_FORM_strp); + if (skeleton_debug_str_hash) +skeleton_debug_str_hash->traverse (DW_FORM_strp); switch_to_section (debug_str_offsets_section); debug_str_hash->traverse_noresize @@ -30819,6 +30820,12 @@ dwarf2out_early_finish (const char *file save_macinfo_strings (); + if (dwarf_split_debug_info) +{ + unsigned int index = 0; + debug_str_hash->traverse_noresize (&index); +} + /* Output all of the compilation units. We put the main one last so that the offsets are available to output_pubnames. */ for (limbo_die_node *node = limbo_die_list; node; node = node->next) Index: gcc/testsuite/gcc.dg/lto/pr83719_0.c === --- gcc/testsuite/gcc.dg/lto/pr83719_0.c(nonexistent) +++ gcc/testsuite/gcc.dg/lto/pr83719_0.c(working copy) @@ -0,0 +1,4 @@ +/* { dg-lto-do assemble } */ +/* { dg-lto-options { { -flto -g -gsplit-dwarf } } } */ + +/* Empty. */
[PATCH] Fix PR83685
Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2018-01-08 Richard Biener PR tree-optimization/83685 * tree-ssa-pre.c (create_expression_by_pieces): Do not insert references to abnormals. * gcc.dg/torture/pr83685.c: New testcase. Index: gcc/tree-ssa-pre.c === --- gcc/tree-ssa-pre.c (revision 256329) +++ gcc/tree-ssa-pre.c (working copy) @@ -2697,6 +2697,8 @@ create_expression_by_pieces (basic_block that value numbering saw through. */ case NAME: folded = PRE_EXPR_NAME (expr); + if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (folded)) + return NULL_TREE; if (useless_type_conversion_p (exprtype, TREE_TYPE (folded))) return folded; break; Index: gcc/testsuite/gcc.dg/torture/pr83685.c === --- gcc/testsuite/gcc.dg/torture/pr83685.c (nonexistent) +++ gcc/testsuite/gcc.dg/torture/pr83685.c (working copy) @@ -0,0 +1,20 @@ +/* { dg-do compile } */ + +int _setjmp (void *); +void foo (int); + +void +bar (int e, int b, char c, void *d) +{ + while (b) +{ + if (_setjmp (d)) + foo (e); + if (c) + { + e--; + foo (0); + } + e++; +} +}
[PATCH] Fix PR83713
Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk sofar. Richard. 2018-01-08 Richard Biener PR middle-end/83713 * convert.c (do_narrow): Properly guard TYPE_OVERFLOW_WRAPS checks. * g++.dg/torture/pr83713.C: New testcase. Index: gcc/convert.c === --- gcc/convert.c (revision 256329) +++ gcc/convert.c (working copy) @@ -471,8 +471,10 @@ do_narrow (location_t loc, type in case the operation in outprec precision could overflow. Otherwise, we would introduce signed-overflow undefinedness. */ - || ((!TYPE_OVERFLOW_WRAPS (TREE_TYPE (arg0)) - || !TYPE_OVERFLOW_WRAPS (TREE_TYPE (arg1))) + || ((!(INTEGRAL_TYPE_P (TREE_TYPE (arg0)) +&& TYPE_OVERFLOW_WRAPS (TREE_TYPE (arg0))) + || !(INTEGRAL_TYPE_P (TREE_TYPE (arg1)) + && TYPE_OVERFLOW_WRAPS (TREE_TYPE (arg1 && ((TYPE_PRECISION (TREE_TYPE (arg0)) * 2u > outprec) || (TYPE_PRECISION (TREE_TYPE (arg1)) * 2u Index: gcc/testsuite/g++.dg/torture/pr83713.C === --- gcc/testsuite/g++.dg/torture/pr83713.C (nonexistent) +++ gcc/testsuite/g++.dg/torture/pr83713.C (working copy) @@ -0,0 +1,12 @@ +// { dg-do compile } + +class a +{ + char b; + void c (); +}; +void +a::c () +{ + &b + ((long long) &b & 0); +}
Re: [PATCH][AArch64] Use LDP/STP in shrinkwrapping
Segher Boessenkool wrote: > On Fri, Jan 05, 2018 at 12:22:44PM +, Wilco Dijkstra wrote: >> An example epilog in a shrinkwrapped function before: >> >> ldp x21, x22, [sp,#16] >> ldr x23, [sp,#32] >> ldr x24, [sp,#40] >> ldp x25, x26, [sp,#48] >> ldr x27, [sp,#64] >> ldr x28, [sp,#72] >> ldr x30, [sp,#80] >> ldr d8, [sp,#88] >> ldp x19, x20, [sp],#96 >> ret > > In this example, the compiler already can make a ldp for both x23/x24 and > x27/x28 just fine (if not in emit_epilogue_components, then simply in a > peephole); why did that not work? Or is this not the actual generated > machine code (and there are labels between the insns, for example)? This block originally had a label in it; two blocks emitted identical restores and then branched to the final epilogue. The final epilogue was then duplicated, so we end up with two almost identical epilogues of 10 instructions (almost, since there were 1-2 unrelated instructions in both blocks). Peepholing is very conservative about instructions using SP and won't touch anything frame-related. If this were working better, the backend could just emit single loads/stores and let peepholing generate LDP/STP. However, this is not the real issue. In the worst case the current code may only emit LDR and STR. If there are multiple callee-saves in a block, we want to use LDP/STP, and if there is an odd number of registers, we want to add a callee-save from an inner block. Another issue is that after the pro_and_epilogue pass I see multiple restores of the same registers followed by a branch to the same block. We should try to avoid the unnecessary duplication. Wilco
[testsuite] Xfail ssa-dom-cse-2.c for nvptx
Hi, For nvptx we have: ... FAIL: gcc.dg/tree-ssa/ssa-dom-cse-2.c scan-tree-dump optimized "return 28;" ... The test-case is compiled with -O3, which implies -ftree-loop-vectorize and -ftree-slp-vectorize. I've investigated the test-case on x86_64, and there the test-case fails when specifying -fno-tree-loop-vectorize, but passes again when adding -fno-tree-slp-vectorize. For nvptx, loop vectorization does nothing but slp vectorization manages to do a transformation, which matches the failing scenario on x86_64, and with similar gimple code. So, I think we expect this scan test to fail for nvptx. Tested on nvptx and committed. Thanks, - Tom Xfail ssa-dom-cse-2.c for nvptx 2018-01-08 Tom de Vries * gcc.dg/tree-ssa/ssa-dom-cse-2.c: Xfail scan for nvptx. --- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c index a660e82..7e88516 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c @@ -25,4 +25,4 @@ foo () but the loop reads only one element at a time, and DOM cannot resolve these. The same happens on powerpc depending on the SIMD support available. */ -/* { dg-final { scan-tree-dump "return 28;" "optimized" { xfail { { alpha*-*-* hppa*64*-*-* } || { lp64 && { powerpc*-*-* sparc*-*-* riscv*-*-* } } } } } } */ +/* { dg-final { scan-tree-dump "return 28;" "optimized" { xfail { { alpha*-*-* hppa*64*-*-* nvptx*-*-* } || { lp64 && { powerpc*-*-* sparc*-*-* riscv*-*-* } } } } } } */
Re: [v3 PATCH] Make optional conditionally trivially_{copy,move}_{constructible,assignable}
On 25/12/17 23:59 +0200, Ville Voutilainen wrote: In the midst of the holiday season, the king and ruler of all elves, otherwise known as The Elf, was told by little elves that users are complaining that stlstl and libc++ make optional's copy and move operations conditionally trivial, but libstdc++ doesn't. This made The Elf fairly angry, and he spoke "this will not stand". Tested on Linux-PPC64. The change is an ABI break due to changing optional to a trivially copyable type. It's perhaps better to get that ABI break in now rather than later. Agreed, but a few comments and questions below. @@ -203,6 +200,39 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION this->_M_construct(std::move(__other._M_payload)); } + _Optional_payload& + operator=(const _Optional_payload& __other) + { +if (this->_M_engaged && __other._M_engaged) + this->_M_get() = __other._M_get(); +else + { + if (__other._M_engaged) + this->_M_construct(__other._M_get()); + else + this->_M_reset(); + } + +return *this; + } + + _Optional_payload& + operator=(_Optional_payload&& __other) + noexcept(__and_, + is_nothrow_move_assignable<_Tp>>()) + { + if (this->_M_engaged && __other._M_engaged) + this->_M_get() = std::move(__other._M_get()); + else + { + if (__other._M_engaged) + this->_M_construct(std::move(__other._M_get())); + else + this->_M_reset(); + } + return *this; + } Please make the whitespace before the return statement consistent in these two assignment operators (one has a blank line and uses spaces, one uses a tab). @@ -226,95 +256,86 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _Stored_type(std::forward<_Args>(__args)...); this->_M_engaged = true; } -}; - - // Payload for non-constexpr optionals with trivial destructor. - template -struct _Optional_payload<_Tp, false, true> -{ - constexpr _Optional_payload() - : _M_empty() {} - - template - constexpr _Optional_payload(in_place_t, _Args&&... __args) - : _M_payload(std::forward<_Args>(__args)...), - _M_engaged(true) {} - - template - constexpr _Optional_payload(std::initializer_list<_Up> __il, - _Args&&... __args) - : _M_payload(__il, std::forward<_Args>(__args)...), - _M_engaged(true) {} - constexpr - _Optional_payload(bool __engaged, const _Optional_payload& __other) - : _Optional_payload(__other) - {} - constexpr - _Optional_payload(bool __engaged, _Optional_payload&& __other) - : _Optional_payload(std::move(__other)) - {} + // The _M_get operations have _M_engaged as a precondition. + constexpr _Tp& + _M_get() noexcept + { + return this->_M_payload; + } - constexpr _Optional_payload(const _Optional_payload& __other) + constexpr const _Tp& + _M_get() const noexcept { - if (__other._M_engaged) - this->_M_construct(__other._M_payload); + return this->_M_payload; } - constexpr _Optional_payload(_Optional_payload&& __other) + // _M_reset is a 'safe' operation with no precondition. + void + _M_reset() Should this be noexcept? { - if (__other._M_engaged) - this->_M_construct(std::move(__other._M_payload)); + if (this->_M_engaged) + { + this->_M_engaged = false; + this->_M_payload.~_Stored_type(); + } } + }; This closing brace seems to be indented incorrectly. - using _Stored_type = remove_const_t<_Tp>; - struct _Empty_byte { }; - union { - _Empty_byte _M_empty; - _Stored_type _M_payload; - }; - bool _M_engaged = false; + template +class _Optional_base_impl + { + protected: And this whole class body should be indented to line up with the "class" keyword.
+using _Stored_type = remove_const_t<_Tp>; + +// The _M_construct operation has !_M_engaged as a precondition +// while _M_destruct has _M_engaged as a precondition. +template<typename... _Args> +void +_M_construct(_Args&&... __args) + noexcept(is_nothrow_constructible<_Stored_type, _Args...>()) +{ + ::new + (std::__addressof(static_cast<_Dp*>(this)->_M_payload._M_payload)) + _Stored_type(std::forward<_Args>(__args)...); + static_cast<_Dp*>(this)->_M_payload._M_engaged = true; +} - template<typename... _Args> -void -_M_construct(_Args&&... __args) -noexcept(is_nothrow_constructible<_Stored_type, _Args...>()) -{ - ::new ((void *) std::__addressof(this->_M_payload)) -_Stored_type(std::forward<_Args>(__args)...); - this->_M_engaged = true; -} -}; +void +_M_destruct() noexcept? +
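To make concrete what "conditionally trivial" buys, here is a minimal sketch of the observable effect of the patch (an illustrative test, not part of the patch; compile with -std=c++17):

#include <optional>
#include <type_traits>

// With the patch, triviality of optional's copy/move operations follows
// the contained type: a trivial payload gives trivial optional operations.
static_assert(std::is_trivially_copy_constructible_v<std::optional<int>>);
static_assert(std::is_trivially_copy_assignable_v<std::optional<int>>);

// A non-trivial payload still gives non-trivial optional operations.
struct NonTrivial { NonTrivial(const NonTrivial&) { } };
static_assert(!std::is_trivially_copy_constructible_v<std::optional<NonTrivial>>);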
[wwwdocs] Add GCC 7.3 section
Index: htdocs/gcc-7/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v retrieving revision 1.96 diff -u -r1.96 changes.html --- htdocs/gcc-7/changes.html 4 Aug 2017 12:44:54 - 1.96 +++ htdocs/gcc-7/changes.html 8 Jan 2018 13:42:20 - @@ -1283,5 +1283,22 @@ double has been added. + +GCC 7.3 + +This is the <a href="https://gcc.gnu.org/bugzilla/buglist.cgi?bug_status=RESOLVED&resolution=FIXED&target_milestone=7.3">list +of problem reports (PRs)</a> from GCC's bug tracking system that are +known to be fixed in the 7.3 release. This list might not be +complete (that is, it is possible that some PRs that have been fixed +are not listed here). + + +Operating Systems + +RTEMS + + Support for EPIPHANY has been added. + + -- Sebastian Huber, embedded brains GmbH Address : Dornierstr. 4, D-82178 Puchheim, Germany Phone : +49 89 189 47 41-16 Fax : +49 89 189 47 41-09 E-Mail : sebastian.hu...@embedded-brains.de PGP : Public key available on request. This message is not a commercial communication within the meaning of the German EHUG.
Re: std::forward_list optim for always equal allocator
On 23/11/17 22:22 +0100, François Dumont wrote: Gentle reminder for this patch. I looked when the constructor got unused and I think it is back in June 2015 in git commit: commit debb6aabb771ed02cb7256a7719555e5fbd7d3f7 Author: redi Date: Wed Jun 17 17:45:45 2015 + * include/bits/forward_list.h (_Fwd_list_base(const _Node_alloc_type&)): Change parameter to rvalue-reference. Hmm, I should have put that same change on the gcc-5-branch too. If you fear abi breaking change I can restore it in a !_GLIBCXX_INLINE_VERSION section. I think if there was a problem here my June 2015 change would already have caused it (when I changed the _Fwd_list_base constructor signatures). So let's assume it's OK to remove the constructor. @@ -533,15 +560,13 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER /** * @brief The %forward_list move constructor. - * @param __list A %forward_list of identical element and allocator - * types. + * @param A %forward_list of identical element and allocator types. This change is wrong, you can't just remove the parameter name, because now Doxygen will document a parameter called "A" (and complain that there is no such parameter). It would be better to leave the name __list there and just get the warning. Otherwise the patch is OK for trunk (please ensure to update the Copyright dates in the test files to 2018). Thanks.
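For reference, a sketch of the Doxygen-friendly form being asked for — keep the parameter name even though the implementation no longer uses it (the declaration shown here is illustrative, not the actual patch text):

  /**
   *  @brief  The %forward_list move constructor.
   *  @param  __list  A %forward_list of identical element and allocator
   *                  types.
   */
  forward_list(forward_list&& __list) noexcept = default;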
Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()
On 08/01/18 02:20, Bill Schmidt wrote: > Hi Richard, > > Unfortunately, I don't see any way that this will be useful for the ppc > targets. We don't > have a way to force resolution of a condition prior to continuing > speculation, so this > will just introduce another comparison that we would speculate past. For our > mitigation > we will have to introduce an instruction that halts all speculation at that > point, and place > it in front of all dangerous loads. I wish it were otherwise. So can't you make the builtin expand to (in pseudo code): if (bounds_check) { __asm ("barrier"); result = *ptr; } else result = failval; R. > > Thanks, > Bill > >> On Jan 4, 2018, at 7:58 AM, Richard Earnshaw >> wrote: >> >> >> This patch adds generic support for the new builtin >> __builtin_load_no_speculate. It provides the overloading of the >> different access sizes and a default fall-back expansion for targets >> that do not support a mechanism for inhibiting speculation. >> >> * builtin_types.def (BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): >> New builtin type signature. >> (BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise. >> (BT_FN_I4_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise. >> (BT_FN_I8_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise. >> (BT_FN_I16_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise. >> * builtins.def (BUILT_IN_LOAD_NO_SPECULATE_N): New builtin. >> (BUILT_IN_LOAD_NO_SPECULATE_1): Likewise. >> (BUILT_IN_LOAD_NO_SPECULATE_2): Likewise. >> (BUILT_IN_LOAD_NO_SPECULATE_4): Likewise. >> (BUILT_IN_LOAD_NO_SPECULATE_8): Likewise. >> (BUILT_IN_LOAD_NO_SPECULATE_16): Likewise. >> * target.def (inhibit_load_speculation): New hook. >> * doc/tm.texi.in (TARGET_INHIBIT_LOAD_SPECULATION): Add to >> documentation. >> * doc/tm.texi: Regenerated. >> * doc/cpp.texi: Document __HAVE_LOAD_NO_SPECULATE. >> * doc/extend.texi: Document __builtin_load_no_speculate. >> * c-family/c-common.c (load_no_speculate_resolve_size): New function. >> (load_no_speculate_resolve_params): New function. >> (load_no_speculate_resolve_return): New function. >> (resolve_overloaded_builtin): Handle overloading >> __builtin_load_no_speculate. >> * builtins.c (expand_load_no_speculate): New function. >> (expand_builtin): Handle new no-speculation builtins. >> * targhooks.h (default_inhibit_load_speculation): Declare. >> * targhooks.c (default_inhibit_load_speculation): New function. >> --- >> gcc/builtin-types.def | 16 + >> gcc/builtins.c | 99 ++ >> gcc/builtins.def| 22 ++ >> gcc/c-family/c-common.c | 164 >> >> gcc/c-family/c-cppbuiltin.c | 5 +- >> gcc/doc/cpp.texi| 4 ++ >> gcc/doc/extend.texi | 53 ++ >> gcc/doc/tm.texi | 6 ++ >> gcc/doc/tm.texi.in | 2 + >> gcc/target.def | 20 ++ >> gcc/targhooks.c | 69 +++ >> gcc/targhooks.h | 3 + >> 12 files changed, 462 insertions(+), 1 deletion(-) >> >> <0001-builtins-Generic-support-for-__builtin_load_no_specu.patch> >
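Spelled out as compilable C, the suggested expansion is roughly the following (a sketch only; the empty asm below is a stand-in for whatever speculation-halting barrier instruction the target provides, e.g. a CSDB-style sequence):

static inline long
load_no_speculate_sketch (const long *ptr, int bounds_check, long failval)
{
  long result;
  if (bounds_check)
    {
      /* Placeholder for the target's speculation barrier instruction;
         this empty asm only prevents compiler reordering.  */
      __asm__ __volatile__ ("" : : : "memory");
      result = *ptr;
    }
  else
    result = failval;
  return result;
}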
Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()
> On Jan 7, 2018, at 10:47 PM, Jeff Law wrote: > > On 01/07/2018 07:20 PM, Bill Schmidt wrote: >> Hi Richard, >> >> Unfortunately, I don't see any way that this will be useful for the ppc >> targets. We don't >> have a way to force resolution of a condition prior to continuing >> speculation, so this >> will just introduce another comparison that we would speculate past. For >> our mitigation >> we will have to introduce an instruction that halts all speculation at that >> point, and place >> it in front of all dangerous loads. I wish it were otherwise. > So could you have an expander for __builtin_load_no_speculate that just > emits the magic insn that halts all speculation and essentially ignores > the additional stuff that __builtin_load_no_speculate might be able to > do on other platforms? This is possible, but the builtin documentation is completely misleading for powerpc. We will not provide the semantics that this builtin claims to provide. So at a minimum we would need the documentation to indicate that the additional range-checking is target-specific behavior as well, not just the speculation code. At that point it isn't really a very target-neutral solution. What about other targets? This builtin seems predicated on specific behavior of ARM architecture; I don't know whether other targets have a guaranteed speculation-rectifying conditional test. For POWER, all we would need, or be able to exploit, is void __builtin_speculation_barrier () or some such. If there are two dangerous loads in one block, a single call to this suffices, but a generic solution involving range checks for specific loads would require one per load. Thanks, Bill > > jeff >
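For illustration, usage of such a barrier-only builtin on a guarded load could look like this (a sketch: the builtin is only proposed above, and the surrounding names are hypothetical):

  if (untrusted_index < table_size)
    {
      __builtin_speculation_barrier ();  /* halt all speculation here */
      value = table[untrusted_index];    /* the dangerous load */
    }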
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Sun, Jan 07, 2018 at 04:36:20PM -0700, Jeff Law wrote: > On 01/07/2018 03:58 PM, H.J. Lu wrote: > > This set of patches for GCC 8 mitigates variant #2 of the speculative > > execution > > vulnerabilities on x86 processors identified by CVE-2017-5715, aka Spectre. [snip] > My fundamental problem with this patchkit is that it is 100% x86/x86_64 > specific. It's possible that x86 needs spectre variant 2 mitigation that isn't necessary on other modern processors like ARM and PowerPC, so let's not rush into general solutions designed around x86.. Here's a quick overview of Spectre. For more, see https://spectreattack.com/spectre.pdf https://googleprojectzero.blogspot.com.au/2018/01/reading-privileged-memory-with-side.html https://developer.arm.com/-/media/Files/pdf/Cache_Speculation_Side-channels.pdf The simplest example of ideal "gadget" code that can be exploited by an attacker who can control the value of x, perhaps as a parameter to some service provided by the victim is: char *array1, *array2; y = array2[array1[x] * cache_line_size]; The idea being that with the appropriate x, array1[x] can load any byte of interest in the victim, with the array2 load evicting a cache line detectable by the attacker. The value of the byte of interest can then be inferred by which cache line was affected. Typical code of course checks user input. if (x < array1_size) y = array2[array1[x] * cache_line_size]; Spectre variant 1 preloads the branch predictor to make the condition predict as true. Then when the out-of-range value of x is given, speculative execution runs the gadget code affecting the cache. Even though this speculative execution is rolled back, the cache remains affected.. Spectre variant 2 preloads the branch target predictor for indirect branches so that some indirect branch in the victim, eg. a PLT call, speculatively executes gadget code found somewhere in the victim. So, to mitigate Spectre variant 1, ensure that speculative execution doesn't get as far as the array2 load. You could do that by modifying the above code to if (x < array1_size) { /* speculation barrier goes here */ y = array2[array1[x] * cache_line_size]; } But you could also do if (x < array1_size) { tmp = array1[x] * cache_line_size; /* speculation barrier goes here */ y = array2[tmp]; } This has the advantage of killing variant 2 attacks for this gadget too. If you ensure there are no gadgets anywhere, then variant 2 attacks are not possible. Besides compiler changes to prevent gadgets being emitted you also need compiler and linker changes to not emit read-only data in executable segments, because data might just happen to be a gadget when executed. However, x86 has the additional problem of variable length instructions. Gadgets might be hiding in code when executed at an offset from the start of the "real" instructions. Which is why x86 is more at risk from this attack than other processors, and why x86 needs something like the posted variant 2 mitigation, slowing down all indirect branches. -- Alan Modra Australia Development Lab, IBM
[Patch, fortran] PR52162 - Bogus -fcheck=bounds with realloc on assignment to unallocated LHS
I posted this patch early last year and did not submit it because I was up to my eyeballs in PR34640. I just forgot about it until it came up on clf a few days ago. Bootstraps and regtests on FC23/x86_64 - OK for trunk? Paul 2018-01-08 Paul Thomas PR fortran/52162 * trans-expr.c (gfc_trans_scalar_assign): Flag is_alloc_lhs if the rhs expression is neither an elemental nor a conversion function. 2018-01-08 Paul Thomas PR fortran/52162 * gfortran.dg/bounds_check_19.f90: New test. Index: gcc/fortran/trans-expr.c === *** gcc/fortran/trans-expr.c(revision 256335) --- gcc/fortran/trans-expr.c(working copy) *** gfc_trans_assignment_1 (gfc_expr * expr1 *** 9924,9932  /* Walk the lhs. */ lss = gfc_walk_expr (expr1); if (gfc_is_reallocatable_lhs (expr1) ! && !(expr2->expr_type == EXPR_FUNCTION !&& expr2->value.function.isym != NULL)) lss->is_alloc_lhs = 1; rss = NULL; if ((expr1->ts.type == BT_DERIVED) --- 9924,9935  /* Walk the lhs. */ lss = gfc_walk_expr (expr1); if (gfc_is_reallocatable_lhs (expr1) ! && !(expr2->expr_type == EXPR_FUNCTION ! && expr2->value.function.isym != NULL ! && !(expr2->value.function.isym->elemental ! || expr2->value.function.isym->conversion))) lss->is_alloc_lhs = 1; + rss = NULL; if ((expr1->ts.type == BT_DERIVED) Index: gcc/testsuite/gfortran.dg/bounds_check_19.f90 === *** gcc/testsuite/gfortran.dg/bounds_check_19.f90 (nonexistent) --- gcc/testsuite/gfortran.dg/bounds_check_19.f90 (working copy) *** *** 0 --- 1,24 + ! { dg-do run } + ! { dg-options "-fbounds-check" } + ! + ! Test the fix for PR52162 in which the elemental and conversion + ! intrinsics in lines 14 and 19 would cause the bounds check to fail. + ! + ! Contributed by Dominique d'Humieres + ! + integer(4), allocatable :: a(:) + integer(8), allocatable :: b(:) + real, allocatable :: c(:) + allocate (b(7:11), source = [7_8,8_8,9_8,10_8,11_8]) + + a = b ! Implicit conversion + + if (lbound (a, 1) .ne. lbound(b, 1)) call abort + if (ubound (a, 1) .ne. ubound(b, 1)) call abort + + c = sin(real(b(9:11))/100_8) ! Elemental intrinsic + + if ((ubound(c, 1) - lbound(c, 1)) .ne. 2) call abort + if (any (nint(asin(c)*100.0) .ne. b(9:11))) call abort + deallocate (a, b, c) + end
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Mon, Jan 8, 2018 at 6:23 AM, Alan Modra wrote: > On Sun, Jan 07, 2018 at 04:36:20PM -0700, Jeff Law wrote: >> On 01/07/2018 03:58 PM, H.J. Lu wrote: >> > This set of patches for GCC 8 mitigates variant #2 of the speculative >> > execution >> > vulnerabilities on x86 processors identified by CVE-2017-5715, aka Spectre. > [snip] >> My fundamental problem with this patchkit is that it is 100% x86/x86_64 >> specific. > > It's possible that x86 needs spectre variant 2 mitigation that isn't > necessary on other modern processors like ARM and PowerPC, so let's > not rush into general solutions designed around x86.. > > Here's a quick overview of Spectre. For more, see > https://spectreattack.com/spectre.pdf > https://googleprojectzero.blogspot.com.au/2018/01/reading-privileged-memory-with-side.html > https://developer.arm.com/-/media/Files/pdf/Cache_Speculation_Side-channels.pdf > > The simplest example of ideal "gadget" code that can be exploited by > an attacker who can control the value of x, perhaps as a parameter to > some service provided by the victim is: > > char *array1, *array2; > y = array2[array1[x] * cache_line_size]; > > The idea being that with the appropriate x, array1[x] can load any > byte of interest in the victim, with the array2 load evicting a cache > line detectable by the attacker. The value of the byte of interest > can then be inferred by which cache line was affected. > > Typical code of course checks user input. > > if (x < array1_size) > y = array2[array1[x] * cache_line_size]; > > Spectre variant 1 preloads the branch predictor to make the condition > predict as true. Then when the out-of-range value of x is given, > speculative execution runs the gadget code affecting the cache. Even > though this speculative execution is rolled back, the cache remains > affected.. > > Spectre variant 2 preloads the branch target predictor for indirect > branches so that some indirect branch in the victim, eg. a PLT call, > speculatively executes gadget code found somewhere in the victim. > > > So, to mitigate Spectre variant 1, ensure that speculative execution > doesn't get as far as the array2 load. You could do that by modifying > the above code to > > if (x < array1_size) > { > /* speculation barrier goes here */ > y = array2[array1[x] * cache_line_size]; > } > > But you could also do > > if (x < array1_size) > { > tmp = array1[x] * cache_line_size; > /* speculation barrier goes here */ > y = array2[tmp]; > } > > This has the advantage of killing variant 2 attacks for this gadget > too. If you ensure there are no gadgets anywhere, then variant 2 > attacks are not possible. Besides compiler changes to prevent gadgets > being emitted you also need compiler and linker changes to not emit > read-only data in executable segments, because data might just happen > to be a gadget when executed. See: https://sourceware.org/ml/binutils/2017-11/msg00369.html -- H.J.
[PATCH] Fix PR83563
Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2018-01-08 Richard Biener PR tree-optimization/83563 * graphite.c (canonicalize_loop_closed_ssa_form): Reset the SCEV cache. * gcc.dg/graphite/pr83563.c: New testcase. Index: gcc/graphite.c === --- gcc/graphite.c (revision 256329) +++ gcc/graphite.c (working copy) @@ -322,6 +323,10 @@ canonicalize_loop_closed_ssa_form (void) FOR_EACH_LOOP (loop, LI_FROM_INNERMOST) canonicalize_loop_closed_ssa (loop); + /* We can end up releasing duplicate exit PHIs and also introduce + additional copies so the cached information isn't correct anymore. */ + scev_reset (); + checking_verify_loop_closed_ssa (true); } Index: gcc/testsuite/gcc.dg/graphite/pr83563.c === --- gcc/testsuite/gcc.dg/graphite/pr83563.c (nonexistent) +++ gcc/testsuite/gcc.dg/graphite/pr83563.c (working copy) @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fgraphite -ftree-loop-distribution -fno-tree-dominator-opts -fno-tree-sink -fno-tree-dce" } */ + +void +sy (void) +{ + int hb; + + for (hb = 1; hb != 0; hb += hb) +{ +} + + while (hb < 1) +++hb; +}
[PR83663] Revert r255946
Hello, This patch reverts the changes introduced by r255946 and the further changes made to it by r256195, as the former causes a large number of regressions on aarch64_be* targets. This should be respun with the mismatch in lane numbering in AArch64 and GCC's numbering fixed as explained in PR83663. OK for trunk? VP. ChangeLog: gcc/ PR target/83663 - Revert r255946 * config/aarch64/aarch64.c (aarch64_expand_vector_init): Modify code generation for cases where splatting a value is not useful. * simplify-rtx.c (simplify_ternary_operation): Simplify vec_merge across a vec_duplicate and a paradoxical subreg forming a vector mode to a vec_concat. gcc/testsuite/ PR target/83663 - Revert r255946 * gcc.target/aarch64/vect-slp-dup.c: New. diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index a189605..03a92b6 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -12129,51 +12129,9 @@ aarch64_expand_vector_init (rtx target, rtx vals) maxv = matches[i][1]; } - /* Create a duplicate of the most common element, unless all elements - are equally useless to us, in which case just immediately set the - vector register using the first element. */ - - if (maxv == 1) - { - /* For vectors of two 64-bit elements, we can do even better. */ - if (n_elts == 2 - && (inner_mode == E_DImode - || inner_mode == E_DFmode)) - - { - rtx x0 = XVECEXP (vals, 0, 0); - rtx x1 = XVECEXP (vals, 0, 1); - /* Combine can pick up this case, but handling it directly - here leaves clearer RTL. - - This is load_pair_lanes, and also gives us a clean-up - for store_pair_lanes. */ - if (memory_operand (x0, inner_mode) - && memory_operand (x1, inner_mode) - && !STRICT_ALIGNMENT - && rtx_equal_p (XEXP (x1, 0), - plus_constant (Pmode, - XEXP (x0, 0), - GET_MODE_SIZE (inner_mode)))) - { - rtx t; - if (inner_mode == DFmode) - t = gen_load_pair_lanesdf (target, x0, x1); - else - t = gen_load_pair_lanesdi (target, x0, x1); - emit_insn (t); - return; - } - } - rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, 0)); - aarch64_emit_move (target, lowpart_subreg (mode, x, inner_mode)); - maxelement = 0; - } - else - { - rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, maxelement)); - aarch64_emit_move (target, gen_vec_duplicate (mode, x)); - } + /* Create a duplicate of the most common element. */ + rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, maxelement)); + aarch64_emit_move (target, gen_vec_duplicate (mode, x)); /* Insert the rest. */ for (int i = 0; i < n_elts; i++) diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c index 6cb5a6e..b052fbb 100644 --- a/gcc/simplify-rtx.c +++ b/gcc/simplify-rtx.c @@ -5888,57 +5888,6 @@ simplify_ternary_operation (enum rtx_code code, machine_mode mode, return simplify_gen_binary (VEC_CONCAT, mode, newop0, newop1); } - /* Replace: - - (vec_merge:outer (vec_duplicate:outer x:inner) - (subreg:outer y:inner 0) - (const_int N)) - - with (vec_concat:outer x:inner y:inner) if N == 1, - or (vec_concat:outer y:inner x:inner) if N == 2. - We assume that degenrate cases (N == 0 or N == 3), which - represent taking all elements from either input, are handled - elsewhere. - - Implicitly, this means we have a paradoxical subreg, but such - a check is cheap, so make it anyway. - - Only applies for vectors of two elements. 
*/ - - if ((GET_CODE (op0) == VEC_DUPLICATE - || GET_CODE (op1) == VEC_DUPLICATE) - && GET_MODE (op0) == GET_MODE (op1) - && known_eq (GET_MODE_NUNITS (GET_MODE (op0)), 2) - && known_eq (GET_MODE_NUNITS (GET_MODE (op1)), 2) - && IN_RANGE (sel, 1, 2)) - { - rtx newop0 = op0, newop1 = op1; - - /* Canonicalize locally such that the VEC_DUPLICATE is always - the first operand. */ - if (GET_CODE (newop1) == VEC_DUPLICATE) - { - std::swap (newop0, newop1); - /* If we swap the operand order, we also need to swap - the selector mask. */ - sel = sel == 1 ? 2 : 1; - } - - if (GET_CODE (newop1) == SUBREG - && paradoxical_subreg_p (newop1) - && subreg_lowpart_p (newop1) - && GET_MODE (SUBREG_REG (newop1)) - == GET_MODE (XEXP (newop0, 0))) - { - newop0 = XEXP (newop0, 0); - newop1 = SUBREG_REG (newop1); - if (sel == 2) - std::swap (newop0, newop1); - return simplify_gen_binary (VEC_CONCAT, mode, - newop0, newop1); - } - } - /* Replace (vec_merge (vec_duplicate x) (vec_duplicate y) (const_int n)) with (vec_concat x y) or (vec_concat y x) depending on value diff --git a/gcc/testsuite/gcc.target/aarch64/vect-slp-dup.c b/gcc/testsuite/gcc.target/aarch64/vect-slp-dup.c deleted file mode 100644 index 0541
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Mon, Jan 08, 2018 at 07:00:11AM -0800, H.J. Lu wrote: > See: > > https://sourceware.org/ml/binutils/2017-11/msg00369.html Program Headers: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align LOAD 0x00 0x 0x 0x00200 0x00200 R 0x20 LOAD 0x000fd0 0x00200fd0 0x00200fd0 0x0002b 0x0002b R E 0x20 LOAD 0x001000 0x00201000 0x00201000 0x00058 0x00058 R 0x20 LOAD 0x200f80 0x00400f80 0x00400f80 0x000a0 0x000a0 RW 0x20 DYNAMIC0x200f80 0x00400f80 0x00400f80 0x00080 0x00080 RW 0x4 GNU_STACK 0x00 0x 0x 0x0 0x0 RW 0x10 GNU_RELRO 0x200f80 0x00400f80 0x00400f80 0x00080 0x00080 R 0x1 Uh, 3 read-only LOADs instead of 2? Shouldn't then all the read-only non-executable sections be emitted together, so that you have a R, then R E, then RW PT_LOADs? Jakub
Re: [PATCH] rs6000: Cleanup bdz/bdnz insn/splitter, add new insn/splitter for bdzt/bdzf/bdnzt/bdnzf
On Fri, 2017-12-01 at 16:45 -0600, Segher Boessenkool wrote: > Looks good otherwise. I'll ok it when there is a user (or a > testcase). > It shouldn't go in before the canonicalize_condition patch, of > course. The canonicalize_condition patch is in, so I have checked in this cleanup and addition to the patterns and splitters for the branch decrement instructions as 256344. 2018-01-08 Aaron Sawdey * config/rs6000/rs6000.md (cceq_ior_compare): Remove * so I can use it to generate rtl. (cceq_ior_compare_complement): Give it a name so I can use it, and change boolean_or_operator predicate to boolean_operator so it can be used to generate a crand. (eqne): New code iterator. (bd/bd_neg): New code_attrs. (_): New name for ctr_internal[12] now combined into a single define_insn. (tf_): A new insn pattern for the conditional form branch decrement (bdnzt/bdnzf/bdzt/bdzf). * config/rs6000/rs6000.c (rs6000_legitimate_combined_insn): Updated with the new names of the branch decrement patterns, and added the names of the branch decrement conditional patterns. -- Aaron Sawdey, Ph.D. acsaw...@linux.vnet.ibm.com 050-2/C113 (507) 253-7520 home: 507/263-0782 IBM Linux Technology Center - PPC ToolchainIndex: gcc/config/rs6000/rs6000.md === --- gcc/config/rs6000/rs6000.md (revision 256216) +++ gcc/config/rs6000/rs6000.md (working copy) @@ -12797,7 +12797,7 @@ ; which are generated by the branch logic. ; Prefer destructive operations where BT = BB (for crXX BT,BA,BB) -(define_insn "*cceq_ior_compare" +(define_insn "cceq_ior_compare" [(set (match_operand:CCEQ 0 "cc_reg_operand" "=y,?y") (compare:CCEQ (match_operator:SI 1 "boolean_operator" [(match_operator:SI 2 @@ -12817,9 +12817,9 @@ ; Why is the constant -1 here, but 1 in the previous pattern? ; Because ~1 has all but the low bit set. -(define_insn "" +(define_insn "cceq_ior_compare_complement" [(set (match_operand:CCEQ 0 "cc_reg_operand" "=y,?y") -(compare:CCEQ (match_operator:SI 1 "boolean_or_operator" +(compare:CCEQ (match_operator:SI 1 "boolean_operator" [(not:SI (match_operator:SI 2 "branch_positive_comparison_operator" [(match_operand 3 @@ -13036,34 +13036,13 @@ ;; rs6000_legitimate_combined_insn prevents combine creating any of ;; the ctr insns. 
-(define_insn "ctr_internal1" - [(set (pc) - (if_then_else (ne (match_operand:P 1 "register_operand" "c,*b,*b,*b") - (const_int 1)) - (label_ref (match_operand 0)) - (pc))) - (set (match_operand:P 2 "nonimmediate_operand" "=1,*r,m,*d*wi*c*l") - (plus:P (match_dup 1) - (const_int -1))) - (clobber (match_scratch:CC 3 "=X,&x,&x,&x")) - (clobber (match_scratch:P 4 "=X,X,&r,r"))] - "" -{ - if (which_alternative != 0) -return "#"; - else if (get_attr_length (insn) == 4) -return "bdnz %l0"; - else -return "bdz $+8\;b %l0"; -} - [(set_attr "type" "branch") - (set_attr "length" "*,16,20,20")]) +(define_code_iterator eqne [eq ne]) +(define_code_attr bd [(eq "bdz") (ne "bdnz")]) +(define_code_attr bd_neg [(eq "bdnz") (ne "bdz")]) -;; Similar but use EQ - -(define_insn "ctr_internal2" +(define_insn "_" [(set (pc) - (if_then_else (eq (match_operand:P 1 "register_operand" "c,*b,*b,*b") + (if_then_else (eqne (match_operand:P 1 "register_operand" "c,*b,*b,*b") (const_int 1)) (label_ref (match_operand 0)) (pc))) @@ -13077,15 +13056,14 @@ if (which_alternative != 0) return "#"; else if (get_attr_length (insn) == 4) -return "bdz %l0"; +return " %l0"; else -return "bdnz $+8\;b %l0"; +return " $+8\;b %l0"; } [(set_attr "type" "branch") (set_attr "length" "*,16,20,20")]) -;; Now the splitters if we could not allocate the CTR register - +;; Now the splitter if we could not allocate the CTR register (define_split [(set (pc) (if_then_else (match_operator 2 "comparison_operator" @@ -13093,19 +13071,13 @@ (const_int 1)]) (match_operand 5) (match_operand 6))) - (set (match_operand:P 0 "int_reg_operand") + (set (match_operand:P 0 "nonimmediate_operand") (plus:P (match_dup 1) (const_int -1))) (clobber (match_scratch:CC 3)) (clobber (match_scratch:P 4))] "reload_completed" - [(set (match_dup 3) - (compare:CC (match_dup 1) - (const_int 1))) - (set (match_dup 0) - (plus:P (match_dup 1) - (const_int -1))) - (set (pc) + [(set (pc) (if_then_else (match_dup 7) (match_dup 5) (match_dup 6)))] @@ -13112,37 +13084,124 @@ { operands[7] = gen_rtx_fmt_ee (GET_CODE (operands[2]), VOIDmode, operands[3], const0_rtx); + emit_insn (gen_rtx_SET (operands[3], + gen_rtx_COMPARE (CCmode, operands[1], const1_rtx))); + if (gpc_reg_operand (operands[0], mode)) +emit_insn (gen_add3 (
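For readers unfamiliar with .md code iterators, a rough sketch of how the new names in this patch expand (commentary only, not part of the patch):

;; (define_code_iterator eqne [eq ne]) makes one define_insn stand for two
;; patterns, one with EQ and one with NE in the RTL.  The code_attr
;; (define_code_attr bd [(eq "bdz") (ne "bdnz")]) then lets "<bd>" in the
;; pattern name and output template print "bdz" for the EQ variant and
;; "bdnz" for the NE variant, so a pattern named "<bd>_<mode>" yields
;; bdz_si, bdnz_si, bdz_di and bdnz_di.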
[PATCH][arm] Add -march=armv8.3-a and dotprod multilib selection rules
Hi all, We don't have the t-aprofile, t-multilib and t-arm-elf mapping rules for multilibs when using the variants of -march=armv8.3-a and the dotproduct extension. This patch adds them. -march=armv8.3-a behaves in the same way as -march=armv8.2-a in this regard. Bootstrapped and tested with the aprofile multilib list. Checked that --print-multi-directory gives sensible results with armv8.3-a options and extensions. I've also added some armv8.3-a, fp16 and dotprod combination tests to multilib.exp Committing to trunk. Thanks, Kyrill 2018-01-08 Kyrylo Tkachov * config/arm/t-aprofile (MULTILIB_MATCHES): Add mapping rules for -march=armv8.3-a variants. * config/arm/t-multilib: Likewise. * config/arm/t-arm-elf: Likewise. Handle dotprod extension. 2018-01-08 Kyrylo Tkachov * gcc.target/arm/multilib.exp: Add fp16, dotprod and armv8.3-a combination tests. commit 1ddc344d07ed643926e0f91c8467b3f0973483c0 Author: Kyrylo Tkachov Date: Mon Dec 18 19:57:22 2017 + [arm] Add -march=armv8.3-a and dotprod multilib selection rules diff --git a/gcc/config/arm/t-aprofile b/gcc/config/arm/t-aprofile index 3e92c62..6c34c09 100644 --- a/gcc/config/arm/t-aprofile +++ b/gcc/config/arm/t-aprofile @@ -88,9 +88,13 @@ MULTILIB_MATCHES += $(foreach ARCH, $(v8_1_a_simd_variants), \ # Baseline v8.2-a: map down to baseline v8-a MULTILIB_MATCHES += march?armv8-a=march?armv8.2-a -# Map all v8.2-a SIMD variants to v8-a+simd +# Baseline v8.3-a: map down to baseline v8-a +MULTILIB_MATCHES += march?armv8-a=march?armv8.3-a + +# Map all v8.2-a and v8.3-a SIMD variants to v8-a+simd MULTILIB_MATCHES += $(foreach ARCH, $(v8_2_a_simd_variants), \ - march?armv8-a+simd=march?armv8.2-a$(ARCH)) + march?armv8-a+simd=march?armv8.2-a$(ARCH) \ + march?armv8-a+simd=march?armv8.3-a$(ARCH)) # Use Thumb libraries for everything. diff --git a/gcc/config/arm/t-arm-elf b/gcc/config/arm/t-arm-elf index ace4736..189ab9b 100644 --- a/gcc/config/arm/t-arm-elf +++ b/gcc/config/arm/t-arm-elf @@ -36,7 +36,7 @@ v7ve_fps := vfpv3-d16 vfpv3 vfpv3-d16-fp16 vfpv3-fp16 vfpv4 neon \ # Not all these permutations exist for all architecture variants, but # it seems to work ok. -v8_fps := simd fp16 crypto fp16+crypto +v8_fps := simd fp16 crypto fp16+crypto dotprod # We don't do anything special with these. Pre-v4t probably doesn't work. all_early_nofp := armv2 armv2a armv3 armv3m armv4 armv4t armv5 armv5t @@ -46,7 +46,7 @@ all_early_arch := armv5e armv5tej armv6 armv6j armv6k armv6z armv6kz \ all_v7_a_r := armv7-a armv7ve armv7-r -all_v8_archs := armv8-a armv8-a+crc armv8.1-a armv8.2-a +all_v8_archs := armv8-a armv8-a+crc armv8.1-a armv8.2-a armv8.3-a # No floating point variants, require thumb1 softfp all_nofp_t := armv6-m armv6s-m armv8-m.base diff --git a/gcc/config/arm/t-multilib b/gcc/config/arm/t-multilib index 58448a9..c0ca255 100644 --- a/gcc/config/arm/t-multilib +++ b/gcc/config/arm/t-multilib @@ -139,9 +139,13 @@ MULTILIB_MATCHES += $(foreach ARCH, $(v8_1_a_simd_variants), \ # Baseline v8.2-a: map down to baseline v8-a MULTILIB_MATCHES += march?armv7=march?armv8.2-a -# Map all v8.2-a SIMD variants +# Baseline v8.3-a: map down to baseline v8-a +MULTILIB_MATCHES += march?armv7=march?armv8.3-a + +# Map all v8.2-a SIMD variants. v8.3-a SIMD variants have the same mappings MULTILIB_MATCHES += $(foreach ARCH, $(v8_2_a_simd_variants), \ - march?armv7+fp=march?armv8.2-a$(ARCH)) + march?armv7+fp=march?armv8.2-a$(ARCH) \ + march?armv7+fp=march?armv8.3-a$(ARCH)) # Use Thumb libraries for everything. 
diff --git a/gcc/testsuite/gcc.target/arm/multilib.exp b/gcc/testsuite/gcc.target/arm/multilib.exp index 5de87c4..c3c8e1f 100644 --- a/gcc/testsuite/gcc.target/arm/multilib.exp +++ b/gcc/testsuite/gcc.target/arm/multilib.exp @@ -72,6 +72,26 @@ if {[multilib_config "aprofile"] } { {-march=armv8.2-a+simd+crypto -mfloat-abi=softfp} "thumb/v8-a+simd/softfp" {-march=armv8.2-a+simd+crypto+nofp -mfloat-abi=softfp} "thumb/v8-a/nofp" {-march=armv8.2-a+simd+nofp+crypto -mfloat-abi=softfp} "thumb/v8-a+simd/softfp" + {-march=armv8.2-a+fp16 -mfloat-abi=soft} "thumb/v8-a/nofp" + {-march=armv8.2-a+simd+fp16 -mfloat-abi=softfp} "thumb/v8-a+simd/softfp" + {-march=armv8.2-a+simd+fp16+nofp -mfloat-abi=softfp} "thumb/v8-a/nofp" + {-march=armv8.2-a+simd+nofp+fp16 -mfloat-abi=softfp} "thumb/v8-a+simd/softfp" + {-march=armv8.2-a+dotprod -mfloat-abi=soft} "thumb/v8-a/nofp" + {-march=armv8.2-a+simd+dotprod -mfloat-abi=softfp} "thumb/v8-a+simd/softfp" + {-march=armv8.2-a+simd+dotprod+nofp -mfloat-abi=softfp} "thumb/v8-a/nofp" + {-march=armv8.2-a+simd+nofp+dotprod -mfloat-abi=softfp} "thumb/v8-a+simd/softfp" + {-march=armv8.3-a+crypto -mfloat-abi=soft} "thumb/v8-a/nofp" + {-march=armv8.3-a+simd+crypto -mfloat-abi=softfp} "thumb/v8-a+simd/softfp" + {-march=armv8.3-a+simd+crypto+nofp -mfloat-abi=softfp} "thumb/v8-a/nofp" + {-march=armv8.3-a+simd+nofp+crypto -mfloat-abi=softfp}
RE: [PATCH 00/10][ARC] Critical fixes
> [ARC][LRA] Use TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV. > [ARC] Don't allow the last ZOL insn to be in a delay slot. > [ARC] Add trap instruction. > [ARC] Update legitimate constant hook. > [ARC] Enable unaligned access. > [ARC] Revamp trampoline implementation. > [ARC][ZOL] Update uses for hw-loop labels. > [ARC] Add ARCv2 core3 tune option. > [ARC][FIX] Consider command line ffixed- option. > [ARC] Update (u)maddsidi patterns. Hi Andrew, Thank you for reviewing this batch of fixes. Any chance you could also check these ones? They have been hanging there for a long time now: https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00078.html https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00081.html https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00080.html https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00079.html https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00084.html https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00083.html https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00082.html Thank you, Claudiu
Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()
On 08/01/18 14:19, Bill Schmidt wrote: > >> On Jan 7, 2018, at 10:47 PM, Jeff Law wrote: >> >> On 01/07/2018 07:20 PM, Bill Schmidt wrote: >>> Hi Richard, >>> >>> Unfortunately, I don't see any way that this will be useful for the ppc >>> targets. We don't >>> have a way to force resolution of a condition prior to continuing >>> speculation, so this >>> will just introduce another comparison that we would speculate past. For >>> our mitigation >>> we will have to introduce an instruction that halts all speculation at that >>> point, and place >>> it in front of all dangerous loads. I wish it were otherwise. >> So could you have an expander for __builtin_load_no_speculate that just >> emits the magic insn that halts all speculation and essentially ignores >> the additional stuff that __builtin_load_no_speculate might be able to >> do on other platforms? > > This is possible, but the builtin documentation is completely misleading for > powerpc. We will not provide the semantics that this builtin claims to > provide. > So at a minimum we would need the documentation to indicate that the > additional > range-checking is target-specific behavior as well, not just the speculation > code. > At that point it isn't really a very target-neutral solution. > > What about other targets? This builtin seems predicated on specific behavior > of ARM architecture; I don't know whether other targets have a guaranteed > speculation-rectifying conditional test. > > For POWER, all we would need, or be able to exploit, is > > void __builtin_speculation_barrier () > > or some such. If there are two dangerous loads in one block, a single call > to this suffices, but a generic solution involving range checks for specific > loads would require one per load. > Do you have any data to suggest that multiple /independent/ vulnerable accesses occur under a single guarding condition more often than 'once in a blue moon'? It seems to me that would be highly unlikely. R.
Re: [PATCH 0/3] Add __builtin_load_no_speculate
Hi Guys, It seems to me that it might be worth taking a step back here and considering adding a security framework to gcc. Mitigations for CVEs in the past have resulted in individual patches being added to gcc, often in a target-specific manner, and with no real framework to support them, document them, or indicate to an external tool that they have been applied. In addition, security fixes often result in the generation of less optimal code, and so it might be useful to have a way to tell other parts of gcc that a particular sequence should not be altered. Not that I am an expert in this area, but I do think that it is something that should be discussed... Cheers Nick
Re: [PATCH, rs6000] Fix PR83677 (incorrect generation of xxpermr)
Hi! On Thu, Jan 04, 2018 at 08:16:06AM -0600, Bill Schmidt wrote: > https://gcc.gnu.org/PR83677 reports that generation of xxpermr is always > wrong. It effectively inverts the order of the two input registers from > what they should be. This patch addresses that and provides a test case > modified from the original report. How confusing. Maybe it would be easier to read if the operands number 1 and 2 in the pattern had swapped numbers. > Bootstrapped and tested on powerpc64le-linux-gnu with no regressions. > Is this okay for trunk and shortly for backport to GCC 7? I will check > on 6, but I'm pretty certain this was introduced in 7, as 6 has only > minimal POWER9 support. Okay for trunk and all branches where it is needed. Thanks! One minor testcase thingie: > --- gcc/testsuite/gcc.target/powerpc/pr83677.c(nonexistent) > +++ gcc/testsuite/gcc.target/powerpc/pr83677.c(working copy) > @@ -0,0 +1,166 @@ > +/* { dg-do run { target { powerpc64*-*-* && { lp64 && p9vector_hw } } } } */ powerpc*-*-* please; or why would that not work? Segher
Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()
On Jan 8, 2018, at 8:06 AM, Richard Earnshaw (lists) wrote: > > On 08/01/18 02:20, Bill Schmidt wrote: >> Hi Richard, >> >> Unfortunately, I don't see any way that this will be useful for the ppc >> targets. We don't >> have a way to force resolution of a condition prior to continuing >> speculation, so this >> will just introduce another comparison that we would speculate past. For >> our mitigation >> we will have to introduce an instruction that halts all speculation at that >> point, and place >> it in front of all dangerous loads. I wish it were otherwise. > > So can't you make the builtin expand to (in pseudo code): > > if (bounds_check) > { > __asm ("barrier"); > result = *ptr; > } >else > result = failval; Could, but this just generates unnecessary code for Power. We would instead generate __asm ("barrier"); result = *ptr; without any checks. We would ignore everything but the first argument. Thanks, Bill > > R. > >> >> Thanks, >> Bill >> >>> On Jan 4, 2018, at 7:58 AM, Richard Earnshaw >>> wrote: >>> >>> >>> This patch adds generic support for the new builtin >>> __builtin_load_no_speculate. It provides the overloading of the >>> different access sizes and a default fall-back expansion for targets >>> that do not support a mechanism for inhibiting speculation. >>> >>> * builtin_types.def (BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): >>> New builtin type signature. >>> (BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise. >>> (BT_FN_I4_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise. >>> (BT_FN_I8_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise. >>> (BT_FN_I16_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise. >>> * builtins.def (BUILT_IN_LOAD_NO_SPECULATE_N): New builtin. >>> (BUILT_IN_LOAD_NO_SPECULATE_1): Likewise. >>> (BUILT_IN_LOAD_NO_SPECULATE_2): Likewise. >>> (BUILT_IN_LOAD_NO_SPECULATE_4): Likewise. >>> (BUILT_IN_LOAD_NO_SPECULATE_8): Likewise. >>> (BUILT_IN_LOAD_NO_SPECULATE_16): Likewise. >>> * target.def (inhibit_load_speculation): New hook. >>> * doc/tm.texi.in (TARGET_INHIBIT_LOAD_SPECULATION): Add to >>> documentation. >>> * doc/tm.texi: Regenerated. >>> * doc/cpp.texi: Document __HAVE_LOAD_NO_SPECULATE. >>> * doc/extend.texi: Document __builtin_load_no_speculate. >>> * c-family/c-common.c (load_no_speculate_resolve_size): New function. >>> (load_no_speculate_resolve_params): New function. >>> (load_no_speculate_resolve_return): New function. >>> (resolve_overloaded_builtin): Handle overloading >>> __builtin_load_no_speculate. >>> * builtins.c (expand_load_no_speculate): New function. >>> (expand_builtin): Handle new no-speculation builtins. >>> * targhooks.h (default_inhibit_load_speculation): Declare. >>> * targhooks.c (default_inhibit_load_speculation): New function. >>> --- >>> gcc/builtin-types.def | 16 + >>> gcc/builtins.c | 99 ++ >>> gcc/builtins.def| 22 ++ >>> gcc/c-family/c-common.c | 164 >>> >>> gcc/c-family/c-cppbuiltin.c | 5 +- >>> gcc/doc/cpp.texi| 4 ++ >>> gcc/doc/extend.texi | 53 ++ >>> gcc/doc/tm.texi | 6 ++ >>> gcc/doc/tm.texi.in | 2 + >>> gcc/target.def | 20 ++ >>> gcc/targhooks.c | 69 +++ >>> gcc/targhooks.h | 3 + >>> 12 files changed, 462 insertions(+), 1 deletion(-) >>> >>> <0001-builtins-Generic-support-for-__builtin_load_no_specu.patch> >> >
Re: C++ PATCH to add a test for c++/81860
Hi Nathan, > On 01/02/2018 09:36 AM, Marek Polacek wrote: >> This test exercising inheriting a template constructor in this PR got >> fixed with >> r251426. As I don't see any lambdas here, I thought it worth to add it. >> >> Tested on x86_64-linux, ok for trunk? >> >> 2018-01-02 Marek Polacek >> >> PR c++/81860 >> * g++.dg/cpp0x/inh-ctor30.C: New test. >> > > yes thanks this test FAILs on a couple of targets: i386-pc-solaris2.1[01], sparc-sun-solaris2.11, powerpc-ibm-aix7.2.0.0, x86_64-apple-darwin15.6.0. The former two have _ZN1AIjEC1Ev instead of _ZN1AIjEC2Ev which demangle the same. Should it accept both? Thanks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
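If the decision is to accept both manglings, a directive along these lines might do (a sketch; the actual directive in inh-ctor30.C may differ):

// Accept both the complete-object (C1) and base-object (C2) constructor.
// { dg-final { scan-assembler "_ZN1AIjEC\[12\]Ev" } }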
Re: [PATCH 0/3] Add __builtin_load_no_speculate
I thought about your new builtin again, and I wonder if something like that might work as well? cat despec.s .arch armv7-a .eabi_attribute 28, 1 .eabi_attribute 20, 1 .eabi_attribute 21, 1 .eabi_attribute 23, 3 .eabi_attribute 24, 1 .eabi_attribute 25, 1 .eabi_attribute 26, 2 .eabi_attribute 30, 4 .eabi_attribute 34, 1 .eabi_attribute 18, 4 .section.text.startup,"ax",%progbits .align 2 .global despec_i .syntax unified .arm .fpu vfpv3-d16 despec_i: cmp r0,#0 beq L0 ldr r0,[r1] moveq r0,r2 nop {0x14} @ CSDB str r0,[r1] mov r0,#1 bx lr L0: mov r0,#0 bx lr cat test.c extern int despec_i(int predicate, int *untrusted, int fallback); #define N 8 int a[N] = {1,2,3,4,5,7,8,9}; int a2[0x200]; int test(int untrust) { int x = 0; if (despec_i(untrust >= 0 && untrust < N, &untrust, 0)) { int v = a[untrust] & 0x1 ? 0x100 : 0x0; x = a2[v]; } return x; } So this should feed the predicate through the builtin, and clear the untrusted value when the condition has been been mis-predicted. Wouldn't that be more flexible to use? Or am I missing something? As a side note: I noticed that "nop {0x14}" seems to produce the correct assembler opcode without the need for using a .insn code. Bernd.
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Mon, Jan 8, 2018 at 7:06 AM, Jakub Jelinek wrote: > On Mon, Jan 08, 2018 at 07:00:11AM -0800, H.J. Lu wrote: >> See: >> >> https://sourceware.org/ml/binutils/2017-11/msg00369.html > > Program Headers: > Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align > LOAD 0x00 0x 0x 0x00200 0x00200 R 0x20 > LOAD 0x000fd0 0x00200fd0 0x00200fd0 0x0002b 0x0002b R E 0x20 > LOAD 0x001000 0x00201000 0x00201000 0x00058 0x00058 R 0x20 > LOAD 0x200f80 0x00400f80 0x00400f80 0x000a0 0x000a0 RW 0x20 > DYNAMIC0x200f80 0x00400f80 0x00400f80 0x00080 0x00080 RW 0x4 > GNU_STACK 0x00 0x 0x 0x0 0x0 RW 0x10 > GNU_RELRO 0x200f80 0x00400f80 0x00400f80 0x00080 0x00080 R 0x1 > > Uh, 3 read-only LOADs instead of 2? Shouldn't then all the read-only > non-executable sections be emitted together, so that you have a R, then R E, > then RW PT_LOADs? It is done on purpose since the second RO segment will be merged with the RELRO segment at load time: Elf file type is EXEC (Executable file) Entry point 0x401ea0 There are 11 program headers, starting at offset 52 Program Headers: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align PHDR 0x34 0x00400034 0x00400034 0x00160 0x00160 R 0x4 INTERP 0x000194 0x00400194 0x00400194 0x0001a 0x0001a R 0x1 [Requesting program interpreter: /libx32/ld-linux-x32.so.2] LOAD 0x00 0x0040 0x0040 0x0037c 0x0037c R 0x1000 LOAD 0x000e68 0x00401e68 0x00401e68 0x00195 0x00195 R E 0x1000 LOAD 0x001000 0x00402000 0x00402000 0x00124 0x00124 R 0x1000 LOAD 0x001ef0 0x00402ef0 0x00402ef0 0x00134 0x00138 RW 0x1000 DYNAMIC0x001ef8 0x00402ef8 0x00402ef8 0x000f8 0x000f8 RW 0x4 NOTE 0x0001b0 0x004001b0 0x004001b0 0x00044 0x00044 R 0x4 GNU_EH_FRAME 0x001008 0x00402008 0x00402008 0x00034 0x00034 R 0x4 GNU_STACK 0x00 0x 0x 0x0 0x0 RW 0x10 GNU_RELRO 0x001ef0 0x00402ef0 0x00402ef0 0x00110 0x00110 R 0x1 Section to Segment mapping: Segment Sections... 00 01 .interp 02 .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt 03 .init .plt .text .fini 04 .rodata .eh_frame_hdr .eh_frame 05 .init_array .fini_array .dynamic .got .got.plt .data .bss 06 .dynamic 07 .note.ABI-tag .note.gnu.build-id 08 .eh_frame_hdr 09 10 .init_array .fini_array .dynamic .got -- H.J.
Re: [PATCH], Add optional IEEE/IBM long double multilib support
On Thu, Jan 04, 2018 at 06:05:55PM -0500, Michael Meissner wrote: > This patch is the beginning step to switching the PowerPC long double support > from IBM extended double to IEEE 128-bit floating point on PowerPC servers. > It > will be necessary to have this patch or a similar patch to allow the GLIBC > team > to begin their modifications in GLIBC 2.28, so that by the time GCC 9 comes > out, we can decide to switch the default. It is likely, the default will only > be switched on the 64-bit little endian PowerPC systems, when a distribution > goes through a major level, such that they can contemplate major changes. I would hope the default changes for BE systems at the same time (at least those with VSX, but ideally *all*). > If you do not use the configuration option --with-long-double-format=ieee or > --with-long-double-format=ibm, the system will not build multilibs, and just > build normal libraries with the default set to IBM extended double. If you do > use either of the switches, and allow multilibs, it will build two sets of > multilibs, one for -mabi=ieeelongdouble and one for -mabi=ibmlongdouble. Huh. Why not always, then? There already is an option to turn off multilibs, for people who really really want that. Segher
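For concreteness, the configuration and per-file ABI selection described above would look roughly like this (a sketch based on the option names quoted in the message):

  # build GCC with both long double multilibs
  .../configure --with-long-double-format=ieee ...

  # then pick the ABI per compilation
  gcc -mabi=ieeelongdouble -c uses_ieee128.c
  gcc -mabi=ibmlongdouble  -c uses_ibm_extended.c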
[patch,avr] Implement PR83737
This PR is about skipping the saving of any registers in main(). Attribute OS_main can do this as well; however, we can just drop any saves / restores in all optimized compilations -- not even the test suite needs these saves. The feature can still be switched off by the new -mno-OS_main option. Ok for trunk? gcc/ Don't save registers in main(). PR target/83737 * doc/invoke.texi (AVR Options) [-mOS_main]: Document it. * config/avr/avr.opt (-mOS_main): New target option. * config/avr/avr.c (avr_in_main_p): New static function. (avr_regs_to_save) [avr_in_main_p]: Return 0. (avr_prologue_setup_frame): Don't save any regs if avr_in_main_p. (avr_expand_epilogue): Same. * common/config/avr/avr-common.c (avr_option_optimization_table): Switch on -mOS_main for optimizing compilations. Index: common/config/avr/avr-common.c === --- common/config/avr/avr-common.c (revision 256338) +++ common/config/avr/avr-common.c (working copy) @@ -31,6 +31,7 @@ static const struct default_options avr_ // a frame without need when it tries to be smart around calls. { OPT_LEVELS_ALL, OPT_fcaller_saves, NULL, 0 }, { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_mgas_isr_prologues, NULL, 1 }, +{ OPT_LEVELS_1_PLUS, OPT_mOS_main, NULL, 1 }, { OPT_LEVELS_NONE, 0, NULL, 0 } }; Index: config/avr/avr.c === --- config/avr/avr.c (revision 256338) +++ config/avr/avr.c (working copy) @@ -1167,6 +1167,23 @@ avr_starting_frame_offset (void) } +/* Return true if we are supposed to be in main(). This is only used + to determine if the callee-saved registers don't need to be saved + because the caller of main (crt*.o) doesn't use any of them. */ + +static bool +avr_in_main_p () +{ + return (TARGET_OS_MAIN + && MAIN_NAME_P (DECL_NAME (current_function_decl)) + // FIXME: We'd like to also test `flag_hosted' which is only + // available in the C-ish fronts, so no such test for now. + // Instead, we test the return type of "main" which is not exactly + // the same but good enough. + && INTEGRAL_TYPE_P (TREE_TYPE (TREE_TYPE (current_function_decl)))); +} + + /* Return the number of hard registers to push/pop in the prologue/epilogue of the current function, and optionally store these registers in SET. */ @@ -1180,10 +1197,11 @@ avr_regs_to_save (HARD_REG_SET *set) CLEAR_HARD_REG_SET (*set); count = 0; - /* No need to save any registers if the function never returns or - has the "OS_task" or "OS_main" attribute. */ + /* No need to save any registers if the function never returns or has the + "OS_task" or "OS_main" attribute. Ditto if we are in "main". */ if (TREE_THIS_VOLATILE (current_function_decl) + || avr_in_main_p() || cfun->machine->is_OS_task || cfun->machine->is_OS_main) return 0; @@ -1651,6 +1669,7 @@ avr_prologue_setup_frame (HOST_WIDE_INT && size < size_max && live_seq && !isr_p + && !avr_in_main_p() && !cfun->machine->is_OS_task && !cfun->machine->is_OS_main && !AVR_TINY); @@ -1713,7 +1732,9 @@ avr_prologue_setup_frame (HOST_WIDE_INT emit_push_byte (reg, true); if (frame_pointer_needed - && (!(cfun->machine->is_OS_task || cfun->machine->is_OS_main))) + && (!(avr_in_main_p() +|| cfun->machine->is_OS_task +|| cfun->machine->is_OS_main))) { /* Push frame pointer. Always be consistent about the ordering of pushes -- epilogue_restores expects the @@ -1834,6 +1855,9 @@ avr_prologue_setup_frame (HOST_WIDE_INT if (cfun->machine->is_interrupt) irq_state = 1; + /* IRQs might be on when entering "main", hence avr_in_main_p + is *not* included in the following test. 
*/ + if (TARGET_NO_INTERRUPTS || cfun->machine->is_signal || cfun->machine->is_OS_main) @@ -2122,6 +2146,7 @@ avr_expand_epilogue (bool sibcall_p) && !isr_p && !cfun->machine->is_OS_task && !cfun->machine->is_OS_main + && !avr_in_main_p() && !AVR_TINY); if (minimize @@ -2227,7 +2252,9 @@ avr_expand_epilogue (bool sibcall_p) } /* size != 0 */ if (frame_pointer_needed - && !(cfun->machine->is_OS_task || cfun->machine->is_OS_main)) + && !(avr_in_main_p() + || cfun->machine->is_OS_task + || cfun->machine->is_OS_main)) { /* Restore previous frame_pointer. See avr_expand_prologue for rationale for not using pophi. */ Index: config/avr/avr.opt === --- co
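A quick illustration of how the new option interacts with optimization (command lines are a sketch; device selection omitted for brevity):

  avr-gcc -O2 main.c               # -mOS_main is on at -O1 and above: no
                                   # pushes/pops of call-saved regs in main()
  avr-gcc -O2 -mno-OS_main main.c  # keep the old save/restore behaviour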
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Mon, Jan 08, 2018 at 08:17:27AM -0800, H.J. Lu wrote: > On Mon, Jan 8, 2018 at 7:06 AM, Jakub Jelinek wrote: > > On Mon, Jan 08, 2018 at 07:00:11AM -0800, H.J. Lu wrote: > >> See: > >> > >> https://sourceware.org/ml/binutils/2017-11/msg00369.html > > > > Program Headers: > > Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align > > LOAD 0x00 0x 0x 0x00200 0x00200 R 0x20 > > LOAD 0x000fd0 0x00200fd0 0x00200fd0 0x0002b 0x0002b R E 0x20 > > LOAD 0x001000 0x00201000 0x00201000 0x00058 0x00058 R 0x20 > > LOAD 0x200f80 0x00400f80 0x00400f80 0x000a0 0x000a0 RW 0x20 > > DYNAMIC0x200f80 0x00400f80 0x00400f80 0x00080 0x00080 RW 0x4 > > GNU_STACK 0x00 0x 0x 0x0 0x0 RW 0x10 > > GNU_RELRO 0x200f80 0x00400f80 0x00400f80 0x00080 0x00080 R 0x1 > > > > Uh, 3 read-only LOADs instead of 2? Shouldn't then all the read-only > > non-executable sections be emitted together, so that you have a R, then R E, > > then RW PT_LOADs? > > It is done on purpose since the second RO segment will be merged with the > RELRO > segment at load time: That doesn't look like an advantage over not introducing it. > Elf file type is EXEC (Executable file) > Entry point 0x401ea0 > There are 11 program headers, starting at offset 52 > > Program Headers: > Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align > PHDR 0x34 0x00400034 0x00400034 0x00160 0x00160 R 0x4 > INTERP 0x000194 0x00400194 0x00400194 0x0001a 0x0001a R 0x1 > [Requesting program interpreter: /libx32/ld-linux-x32.so.2] > LOAD 0x00 0x0040 0x0040 0x0037c 0x0037c R 0x1000 > LOAD 0x000e68 0x00401e68 0x00401e68 0x00195 0x00195 R E 0x1000 > LOAD 0x001000 0x00402000 0x00402000 0x00124 0x00124 R 0x1000 > LOAD 0x001ef0 0x00402ef0 0x00402ef0 0x00134 0x00138 RW 0x1000 > DYNAMIC0x001ef8 0x00402ef8 0x00402ef8 0x000f8 0x000f8 RW 0x4 > NOTE 0x0001b0 0x004001b0 0x004001b0 0x00044 0x00044 R 0x4 > GNU_EH_FRAME 0x001008 0x00402008 0x00402008 0x00034 0x00034 R 0x4 > GNU_STACK 0x00 0x 0x 0x0 0x0 RW 0x10 > GNU_RELRO 0x001ef0 0x00402ef0 0x00402ef0 0x00110 0x00110 R 0x1 PHDR 0x34 0x00400034 0x00400034 0x00160 0x00160 R 0x4 INTERP 0x000194 0x00400194 0x00400194 0x0001a 0x0001a R 0x1 [Requesting program interpreter: /libx32/ld-linux-x32.so.2] LOAD 0x00 0x0040 0x0040 0x004a0 0x004a0 R 0x1000 LOAD 0x000e68 0x00401e68 0x00401e68 0x00195 0x00195 R E 0x1000 LOAD 0x001ef0 0x00402ef0 0x00402ef0 0x00134 0x00138 RW 0x1000 DYNAMIC0x001ef8 0x00402ef8 0x00402ef8 0x000f8 0x000f8 RW 0x4 NOTE 0x0001b0 0x004001b0 0x004001b0 0x00044 0x00044 R 0x4 GNU_EH_FRAME 0x001008 0x00402008 0x00402008 0x00034 0x00034 R 0x4 GNU_STACK 0x00 0x 0x 0x0 0x0 RW 0x10 GNU_RELRO 0x001ef0 0x00402ef0 0x00402ef0 0x00110 0x00110 R 0x1 you could even move the second PT_LOAD earlier to make the gaps on disk smaller. Jakub
Re: [PATCH, PR83327] Fix liveness analysis in lra for spilled-into hard regs
On 12/18/2017 05:57 PM, Vladimir Makarov wrote: On 12/15/2017 06:25 AM, Tom de Vries wrote: Proposed Solution: The patch addresses the problem, by: - marking the hard regs that have been used in lra_spill in hard_regs_spilled_into - using hard_regs_spilled_into in lra_create_live_ranges to make sure those registers are marked in the conflict_hard_regs of pseudos that overlap with the spill register usage [ I've also tried an approach where I didn't use hard_regs_spilled_into, but tried to propagate all hard regs. I figured out that I needed to mask out eliminable_regset. Also I needed to masked out lra_no_alloc_regs, but that could be due to gcn-specific problems (pointers take 2 hard regs), I'm not yet sure. Anyway, in the submitted patch I tried to avoid these problems and went for the more minimal approach. ] Tom, thank you for the detail explanation of the problem and solutions you considered. It helped me a lot. Your simple solution is adequate as the most transformations and allocation are done on the 1st LRA subpasses iteration. In order to get the patch accepted for trunk, I think we need: - bootstrap and reg-test on x86_64 - build and reg-test on mips (the only primary platform that has the spill_class hook enabled) Any comments? The patch looks ok to me. You can commit it after successful testing on x86-64 and mips but I am sure there will be no problems with x86-64 as it does not use spill_class currently (actually your patch might help to switch it on again for x86-64. spill_class was quite useful for x86-64 performance on Intel processors). Hi Matthew, there's an lra optimization that is currently enabled for MIPS, and not for any other primary or secondary target. This (already approved) patch fixes a bug in that optimization, and needs to be tested on MIPS. Unfortunately, the optimization is only enabled for MIPS16, and we don't have a current setup to test this. Could you help us out here and test this patch for MIPS16 on trunk? Thanks, - Tom
Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()
On Jan 8, 2018, at 9:23 AM, Richard Earnshaw (lists) wrote: > > On 08/01/18 14:19, Bill Schmidt wrote: >> >>> On Jan 7, 2018, at 10:47 PM, Jeff Law wrote: >>> >>> On 01/07/2018 07:20 PM, Bill Schmidt wrote: Hi Richard, Unfortunately, I don't see any way that this will be useful for the ppc targets. We don't have a way to force resolution of a condition prior to continuing speculation, so this will just introduce another comparison that we would speculate past. For our mitigation we will have to introduce an instruction that halts all speculation at that point, and place it in front of all dangerous loads. I wish it were otherwise. >>> So could you have an expander for __builtin_load_no_speculate that just >>> emits the magic insn that halts all speculation and essentially ignores >>> the additional stuff that __builtin_load_no_speculate might be able to >>> do on other platforms? >> >> This is possible, but the builtin documentation is completely misleading for >> powerpc. We will not provide the semantics that this builtin claims to >> provide. >> So at a minimum we would need the documentation to indicate that the >> additional >> range-checking is target-specific behavior as well, not just the speculation >> code. >> At that point it isn't really a very target-neutral solution. >> >> What about other targets? This builtin seems predicated on specific behavior >> of ARM architecture; I don't know whether other targets have a guaranteed >> speculation-rectifying conditional test. >> >> For POWER, all we would need, or be able to exploit, is >> >> void __builtin_speculation_barrier () >> >> or some such. If there are two dangerous loads in one block, a single call >> to this suffices, but a generic solution involving range checks for specific >> loads would require one per load. >> > > Do you have any data to suggest that multiple /independent/ vulnerable > accesses occur under a single guarding condition more often than 'once > in a blue moon'? It seems to me that would be highly unlikely. No, I agree with that. This is more thinking ahead about the problem of trying to identify these cases automatically. Anything we do there is going to be necessarily conservative, so more loads may look dangerous than a human user can identify. But for something like that then you probably aren't looking at using __builtin_load_no_speculate anyhow. Thanks, Bill > > > R.
Location wrappers vs decls that change type (was Re: [v2 of PATCH 03/14] C++: add location_t wrapper nodes during parsing (minimal impl))
On Fri, 2018-01-05 at 17:20 -0500, David Malcolm wrote: > On Fri, 2018-01-05 at 15:29 -0500, Jason Merrill wrote: > > On 12/29/2017 12:06 PM, David Malcolm wrote: > > > One issue I ran into was that fold_for_warn doesn't eliminate > > > location wrappers when processing_template_decl, leading to > > > failures of the template-based cases in > > > g++.dg/warn/Wmemset-transposed-args-1.C. > > > > > > This is due to the early bailout when processing_template_decl > > > within cp_fold: > > > > > > 2078  if (processing_template_decl > > > 2079      || (EXPR_P (x) && (!TREE_TYPE (x) || TREE_TYPE (x) == error_mark_node))) > > > 2080    return x; > > > > > > which dates back to the merger of the C++ delayed folding branch. > > > > > > I've fixed that in this version of the patch by removing that > > > "processing_template_decl ||" condition from that cp_fold early > > > bailout. > > > > Hmm, that makes me nervous.  We might want to fold in templates when > > called from fold_for_warn, but not for other occurrences.  But I see > > that we check processing_template_decl in cp_fully_fold and in the call > > to cp_fold_function, so I guess this is fine. > > (I wondered if it would be possible to add a new flag to the various > fold* calls to request folding in templates, but given that the API is > partially shared with C doing so doesn't seem to make sense) > > > > +    case VIEW_CONVERT_EXPR: > > > +    case NON_LVALUE_EXPR: > > >     case CAST_EXPR: > > >     case REINTERPRET_CAST_EXPR: > > >     case CONST_CAST_EXPR: > > > @@ -14937,6 +14940,15 @@ tsubst_copy (tree t, tree args, tsubst_flags_t complain, tree in_decl) > > >     case CONVERT_EXPR: > > >     case NOP_EXPR: > > >       { > > > +  if (location_wrapper_p (t)) > > > +    { > > > +      /* Handle location wrappers by substituting the wrapped node > > > +         first, *then* reusing the resulting type.  Doing the type > > > +         first ensures that we handle template parameters and > > > +         parameter pack expansions.  */ > > > +      tree op0 = tsubst_copy (TREE_OPERAND (t, 0), args, complain, in_decl); > > > +      return build1 (code, TREE_TYPE (op0), op0); > > > +    } > > > > I'd rather handle location wrappers separately, and abort if > > VIEW_CONVERT_EXPR or NON_LVALUE_EXPR appear other than as wrappers. > > OK.  I'm testing an updated version which does so.

Doing so uncovered an issue which I'm not sure how to resolve: it's possible for a decl to change type during parsing, after location wrappers may have been created, which changes location_wrapper_p on those wrappers from true to false. Here's the most minimal reproducer I've generated so far:

 1  template <typename _CharT>
 2  struct basic_string {
 3    static const _CharT _S_terminal;
 4    static void assign(const _CharT& __c2);
 5    void _M_set_length_and_sharable() {
 6      assign(_S_terminal);
 7    }
 8  };
 9
10  template <typename _CharT>
11  const _CharT basic_string<_CharT>::_S_terminal = _CharT();
12
13  void getline(basic_string<char>& __str) {
14    __str._M_set_length_and_sharable();
15  }

Asserting that the only VIEW_CONVERT_EXPR or NON_LVALUE_EXPR seen in tsubst_copy and tsubst_copy_and_build are location_wrapper_p leads to an ICE on the above code. What's happening is as follows. First, in the call:

 6      assign(_S_terminal);
               ^~~

the VAR_DECL "_S_terminal" gains a VIEW_CONVERT_EXPR location wrapper node to express the underline shown above.
Later, during parsing of this init-declarator:

10  template <typename _CharT>
11  const _CharT basic_string<_CharT>::_S_terminal = _CharT();
                                       ^~~

...cp_parser_init_declarator calls start_decl, which calls duplicate_decls, merging the "_S_terminal" seen here:

 1  template <typename _CharT>
 2  struct basic_string {
 3    static const _CharT _S_terminal;
                          ^~~

with that seen here:

10  template <typename _CharT>
11  const _CharT basic_string<_CharT>::_S_terminal = _CharT();
                                       ^~~

Both "_S_terminal" VAR_DECLs have a "_CharT" TEMPLATE_TYPE_PARM, but these types are different tree nodes. Hence the type of the first VAR_DECL changes in duplicate_decls here:

2152      TREE_TYPE (newdecl) = TREE_TYPE (olddecl) = newtype;

...changing type to the TEMPLATE_TYPE_PARM of the second VAR_DECL. At this point, the location wrapper node at the callsite still has the *old* type, and hence location_wrapper_p (wrapper) changes from true to false, as its type no longer matches that of the decl it's wrapping. Hence my rewritten code in tsubst_copy_and_build fails the assertion here:

18306       case VIEW_CONVERT_EXPR:
18307       case NON_LVALUE_EXPR:
18308         gcc_assert (location_wrapper_p (t));
18309         RETURN (RECUR (TREE_OPERAND (t, 0)));

Assuming I'm correctly understanding the above, I'm not sure what the best solution is. Some ideas:

* don't add location wrappers if processing a template
* introduce a new tree node for location wrappers (gah)
* something I haven't thought of
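[For readers following the thread, the inference that breaks can be seen in a simplified sketch of the predicate -- a reduced illustration of the type-identity check described here, not the exact definition in tree.h:]

/* Simplified sketch: wrapperness is inferred from type identity, so
   retyping the wrapped decl in duplicate_decls silently flips the
   result from true to false for an already-built wrapper.  */
bool
location_wrapper_p (tree t)
{
  return ((TREE_CODE (t) == VIEW_CONVERT_EXPR
           || TREE_CODE (t) == NON_LVALUE_EXPR)
          && TREE_TYPE (t) == TREE_TYPE (TREE_OPERAND (t, 0)));
}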
Re: Location wrappers vs decls that change type (was Re: [v2 of PATCH 03/14] C++: add location_t wrapper nodes during parsing (minimal impl))
On 01/08/2018 12:02 PM, David Malcolm wrote: On Fri, 2018-01-05 at 17:20 -0500, David Malcolm wrote: Doing so uncovered an issue which I'm not sure how to resolve: it's possible for a decl to change type during parsing, after location wrappers may have been created, which changes location_wrapper_p on those wrappers from true to false. Asserting that the only VIEW_CONVERT_EXPR or NON_LVALUE_EXPR seen in tsubst_copy and tsubst_copy_and_build are location_wrapper_p leads to an ICE on the above code. What's happening is as follows. First, in the call: 6 assign(_S_terminal); ^~~ the VAR_DECL "_S_terminal" gains a VIEW_CONVERT_EXPR location wrapper node to express the underline shown above. Later, during parsing of this init-declarator: 10 template 11 const _CharT basic_string<_CharT>::_S_terminal = _CharT(); ^~~ ...cp_parser_init_declarator calls start_decl, which calls duplicate_decls, merging the "_S_terminal" seen here: ... Both "_S_terminal" VAR_DECLs have a "_CharT" TEMPLATE_TYPE_PARM, but these types are different tree nodes. correct. they are not EQ but are EQUAL (same_type_p will be true). Hence the type of the first VAR_DECL changes in duplicate_decls here: 2152 TREE_TYPE (newdecl) = TREE_TYPE (olddecl) = newtype; ...changing type to the TEMPLATE_TYPE_PARM of the second VAR_DECL. 18306 case VIEW_CONVERT_EXPR: 18307 case NON_LVALUE_EXPR: 18308 gcc_assert (location_wrapper_p (t)); 18309 RETURN (RECUR (TREE_OPERAND (t, 0))); Assuming I'm correctly understanding the above, I'm not sure what the best solution is. Some ideas: * don't add location wrappers if processing a template * introduce a new tree node for location wrappers (gah) * something I haven't thought of Add a flag on the VIEW_CONVERT/NON_LVALUE expr explicitly noting its wrapperness (rather than infer it from TREE_TYPE == TREE_TYPE (TREE_OPERAND)). TREE_LANG_FLAG_0 looks available? nathan -- Nathan Sidwell
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
Hi,

On Mon, 8 Jan 2018, Jakub Jelinek wrote:
> On Mon, Jan 08, 2018 at 07:00:11AM -0800, H.J. Lu wrote:
> > See:
> >
> > https://sourceware.org/ml/binutils/2017-11/msg00369.html
>
> Program Headers:
>  Type       Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
>  LOAD       0x00     0x         0x         0x00200 0x00200 R   0x20
>  LOAD       0x000fd0 0x00200fd0 0x00200fd0 0x0002b 0x0002b R E 0x20
>  LOAD       0x001000 0x00201000 0x00201000 0x00058 0x00058 R   0x20
>  LOAD       0x200f80 0x00400f80 0x00400f80 0x000a0 0x000a0 RW  0x20
>  DYNAMIC    0x200f80 0x00400f80 0x00400f80 0x00080 0x00080 RW  0x4
>  GNU_STACK  0x00     0x         0x         0x0     0x0     RW  0x10
>  GNU_RELRO  0x200f80 0x00400f80 0x00400f80 0x00080 0x00080 R   0x1
>
> Uh, 3 read-only LOADs instead of 2?  Shouldn't then all the read-only
> non-executable sections be emitted together, so that you have a R, then R E,
> then RW PT_LOADs?

See also my subthread starting at H.J.'s first version of the set:

https://sourceware.org/ml/binutils/2017-11/msg00218.html

where some of the issues are hashed through.

Ciao,
Michael.
Re: Location wrappers vs decls that change type (was Re: [v2 of PATCH 03/14] C++: add location_t wrapper nodes during parsing (minimal impl))
On Mon, Jan 08, 2018 at 12:10:50PM -0500, Nathan Sidwell wrote: > > Both "_S_terminal" VAR_DECLs have a "_CharT" TEMPLATE_TYPE_PARM, but > > these types are different tree nodes. > > correct. they are not EQ but are EQUAL (same_type_p will be true). So perhaps location_wrapper_p could use that instead of pointer comparison. Though it would be expensive. > > Some ideas: > > > * don't add location wrappers if processing a template > > > > * introduce a new tree node for location wrappers (gah) > > > > * something I haven't thought of > > Add a flag on the VIEW_CONVERT/NON_LVALUE expr explicitly noting its > wrapperness (rather than infer it from TREE_TYPE == TREE_TYPE > (TREE_OPERAND)). TREE_LANG_FLAG_0 looks available? Yeah, I think most if not all lang flags are still available for those two tree codes and checking that should be quite cheap. Jakub
Re: [PATCH 1/5] x86: Add -mindirect-branch=
Hi, On Mon, 8 Jan 2018, H.J. Lu wrote: > On Mon, Jan 8, 2018 at 4:00 AM, Jakub Jelinek wrote: > > On Mon, Jan 08, 2018 at 03:55:52AM -0800, H.J. Lu wrote: > >> > I'm wondering whether thunk creation can be a good target-independent > >> > generalization? I guess > >> > we can emit the function declaration without direct writes to > >> > asm_out_file? And the emission > >> > of function body can be potentially a target hook? > >> > > >> > What about emitting body of the function with RTL instructions instead > >> > of direct assembly write? > >> > My knowledge of RTL is quite small, but maybe it can bring some > >> > generalization and reusability > >> > for other targets? > >> > >> Thunks are x86 specific and they are created the same way as 32-bit PIC > >> thunks. > >> I don't see how a target hook is used. > > > > Talking about PIC thunks, those have I believe . character in their symbols, > > so that they can't be confused with user functions. Any reason these > > retpoline thunks aren't? > > > > They used to have '.'. It was changed at the last minute since kernel > needs to export them as regular symbols. That can be done via asm aliases or direct assembler use; the kernel doesn't absolutely have to access them via C compatible symbol names. Ciao, Michael.
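[For what it's worth, the asm-alias route mentioned above can be sketched in plain GNU C. This is a hypothetical illustration; the dotted symbol name below is made up and is not the actual thunk naming scheme.]

/* Hypothetical sketch: refer to a dot-named assembler symbol through a
   C-compatible declaration by attaching a GNU asm label to it.  */
extern void indirect_thunk_rax (void) __asm__ ("__x86.indirect_thunk.rax");

void
call_it (void)
{
  indirect_thunk_rax ();   /* emits a call to the dotted symbol */
}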
Re: [PATCH 1/5] x86: Add -mindirect-branch=
On Mon, Jan 8, 2018 at 9:18 AM, Michael Matz wrote: > Hi, > > On Mon, 8 Jan 2018, H.J. Lu wrote: > >> On Mon, Jan 8, 2018 at 4:00 AM, Jakub Jelinek wrote: >> > On Mon, Jan 08, 2018 at 03:55:52AM -0800, H.J. Lu wrote: >> >> > I'm wondering whether thunk creation can be a good target-independent >> >> > generalization? I guess >> >> > we can emit the function declaration without direct writes to >> >> > asm_out_file? And the emission >> >> > of function body can be potentially a target hook? >> >> > >> >> > What about emitting body of the function with RTL instructions instead >> >> > of direct assembly write? >> >> > My knowledge of RTL is quite small, but maybe it can bring some >> >> > generalization and reusability >> >> > for other targets? >> >> >> >> Thunks are x86 specific and they are created the same way as 32-bit PIC >> >> thunks. >> >> I don't see how a target hook is used. >> > >> > Talking about PIC thunks, those have I believe . character in their >> > symbols, >> > so that they can't be confused with user functions. Any reason these >> > retpoline thunks aren't? >> > >> >> They used to have '.'. It was changed at the last minute since kernel >> needs to export them as regular symbols. > > That can be done via asm aliases or direct assembler use; the kernel > doesn't absolutely have to access them via C compatible symbol names. > Hi David, Can you comment on this? -- H.J.
Re: Location wrappers vs decls that change type (was Re: [v2 of PATCH 03/14] C++: add location_t wrapper nodes during parsing (minimal impl))
On 01/08/2018 12:14 PM, Jakub Jelinek wrote: On Mon, Jan 08, 2018 at 12:10:50PM -0500, Nathan Sidwell wrote: Both "_S_terminal" VAR_DECLs have a "_CharT" TEMPLATE_TYPE_PARM, but these types are different tree nodes. correct. they are not EQ but are EQUAL (same_type_p will be true). So perhaps location_wrapper_p could use that instead of pointer comparison. Though it would be expensive. If TYPE_STRUCTURAL_COMPARISON_P (or however it's spelt) is true, it'll be expensive. Otherwise it's a function call, a couple of indirections and a pointer compare. But still more expensive than ... Add a flag on the VIEW_CONVERT/NON_LVALUE expr explicitly noting its wrapperness (rather than infer it from TREE_TYPE == TREE_TYPE (TREE_OPERAND)). TREE_LANG_FLAG_0 looks available? Yeah, I think most if not all lang flags are still available for those two tree codes and checking that should be quite cheap. ... a bit test on the node itself. location_wrapper_p could contain something like

  bool result = TREE_LANG_FLAG_$FOO (t);
  gcc_checking_assert (result == same_type_p (TREE_TYPE (t),
                                              TREE_TYPE (TREE_OPERAND (t, 0))));
  return result;

for the paranoid.

nathan -- Nathan Sidwell
Re: [PATCH], Add optional IEEE/IBM long double multilib support
On Mon, Jan 08, 2018 at 10:17:06AM -0600, Segher Boessenkool wrote: > On Thu, Jan 04, 2018 at 06:05:55PM -0500, Michael Meissner wrote: > > This patch is the beginning step to switching the PowerPC long double > > support > > from IBM extended double to IEEE 128-bit floating point on PowerPC servers. > > It > > will be necessary to have this patch or a similar patch to allow the GLIBC > > team > > to begin their modifications in GLIBC 2.28, so that by the time GCC 9 comes > > out, we can decide to switch the default. It is likely, the default will > > only > > be switched on the 64-bit little endian PowerPC systems, when a distribution > > goes through a major level, such that they can contemplate major changes. > > I would hope the default changes for BE systems at the same time (at > least those with VSX, but ideally *all*). Note, the change has to be on a system by system basis. We will need to support distributions that use the IBM extended double for the long double format, and we will need to support distributions for the IEEE 128-bit format. It all depends on what the host system uses. While the work can be done, I don't know of any BE distribution that will be using GCC 8 as their main compiler. > > If you do not use the configuration option --with-long-double-format=ieee or > > --with-long-double-format=ibm, the system will not build multilibs, and just > > build normal libraries with the default set to IBM extended double. If you > > do > > use either of the switches, and allow multilibs, it will build two sets of > > multilibs, one for -mabi=ieeelongdouble and one for -mabi=ibmlongdouble. > > Huh. Why not always, then? There already is an option to turn off > multilibs, for people who really really want that. I'm trying not to surprise people building compilers for a setup that does not work. In the GCC 9 timeframe, when there is GLIBC support for it, we can make it default (assuming we keep the multilibs). It is a chicken and egg problem. Real users (as opposed to GCC and GLIBC developers) would need GLIBC 2.28 in order to use the IEEE multilib. But if we don't provide the switch or multilib as an option, it makes the GLIBC work harder. I suspect that libstdc++-v3 may be more of an issue than GLIBC, since we have people starting to look at the GLIBC work, but we can't really do anything about libstdc++-v3 until we have a GLIBC. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
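[To see which format a given multilib selects, a small probe program works. A sketch: LDBL_MANT_DIG is 106 for IBM extended double and 113 for IEEE 128-bit binary128 -- facts about the formats rather than about this patch -- and the -mabi= options are the ones introduced above.]

/* Build once with -mabi=ibmlongdouble and once with -mabi=ieeelongdouble;
   the mantissa width distinguishes the two 16-byte formats.  */
#include <float.h>
#include <stdio.h>

int
main (void)
{
  printf ("LDBL_MANT_DIG = %d\n", LDBL_MANT_DIG);   /* 106 = IBM, 113 = IEEE */
  printf ("sizeof (long double) = %zu\n", sizeof (long double));
  return 0;
}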
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
* H. J. Lu: > This set of patches for GCC 8 mitigates variant #2 of the > speculative execution vulnerabilities on x86 processors identified > by CVE-2017-5715, aka Spectre. They convert indirect branches to > call and return thunks to avoid speculative execution via indirect > call and jmp. Would it make sense to add a mode which relies on an empty return stack cache? Or will CPUs use the regular branch predictor if the return stack is empty? With an empty return stack cache and no branch predictor, a simple PUSH/RET sequence cannot be predicted, so the complex CALL sequence with a speculation barrier is not needed.
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
* Sandra Loosemore: > I have a general documentation issue with all the new command-line > options and attributes added by this patch set: the documentation is > very implementor-speaky and doesn't explain what user-level problem > they're trying to solve. Agreed. Ideally, the documentation would also list the CPU models/model groups where it is known to have the desired effect, and if firmware updates are needed. For some users, it may be useful to be able to advertise that they have built their binaries with hardening, but another group of users is interested in hardening which actually works to stop all potential exploits.
Re: [patch,avr] Implement PR83737
2018-01-08 20:19 GMT+04:00 Georg-Johann Lay : > This PR skips saving of any registers in main. > > Attribute OS_main can do this as well, however we can just drop > any saves / restores in all optimized compilation -- not even > the test suite needs these saves. > > The feature can still be switched off by new -mno-OS_main > > Ok for trunk? I like it. Please commit. > > > gcc/ > Don't save registers in main(). > > PR target/83737 > * doc/invoke.texi (AVR Options) [-mOS_main]: Document it. > * config/avr/avr.opt (-mOS_main): New target option. > * config/avr/avr.c (avr_in_main_p): New static function. > (avr_regs_to_save) [avr_in_main_p]: Return 0. > (avr_prologue_setup_frame): Don't save any regs if avr_in_main_p. > (avr_expand_epilogue): Same. > * common/config/avr/avr-common.c (avr_option_optimization_table): > Switch on -mOS_main for optimizing compilations.
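[For context, the behavior being enabled by default here has long been available per-function via the avr OS_main attribute; a minimal sketch of that spelling follows. The attribute is documented for the target, while skipping the saves unconditionally in optimized builds is what this patch adds.]

/* Marking main with OS_main tells GCC it need not save/restore any
   registers on main's behalf; this patch makes optimizing builds
   behave that way by default, controllable via -mno-OS_main.  */
int main (void) __attribute__ ((OS_main));

int
main (void)
{
  /* ... application code ... */
  return 0;
}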
Re: [PATCH 1/5] x86: Add -mindirect-branch=
On Mon, 2018-01-08 at 09:25 -0800, H.J. Lu wrote: > On Mon, Jan 8, 2018 at 9:18 AM, Michael Matz wrote: > > > > Hi, > > > > On Mon, 8 Jan 2018, H.J. Lu wrote: > > > > > > > > On Mon, Jan 8, 2018 at 4:00 AM, Jakub Jelinek wrote: > > > > > > > > On Mon, Jan 08, 2018 at 03:55:52AM -0800, H.J. Lu wrote: > > > > > > > > > > > > > > > > > I'm wondering whether thunk creation can be a good > > > > > > target-independent generalization? I guess > > > > > > we can emit the function declaration without direct writes to > > > > > > asm_out_file? And the emission > > > > > > of function body can be potentially a target hook? > > > > > > > > > > > > What about emitting body of the function with RTL instructions > > > > > > instead of direct assembly write? > > > > > > My knowledge of RTL is quite small, but maybe it can bring some > > > > > > generalization and reusability > > > > > > for other targets? > > > > > Thunks are x86 specific and they are created the same way as 32-bit > > > > > PIC thunks. > > > > > I don't see how a target hook is used. > > > > Talking about PIC thunks, those have I believe . character in their > > > > symbols, > > > > so that they can't be confused with user functions. Any reason these > > > > retpoline thunks aren't? > > > > > > > They used to have '.'. It was changed at the last minute since kernel > > > needs to export them as regular symbols. > > That can be done via asm aliases or direct assembler use; the kernel > > doesn't absolutely have to access them via C compatible symbol names. > > > Hi David, > > Can you comment on this? It ends up being a real pain for the CONFIG_TRIM_UNUSED_SYMBOLS mechanism in the kernel, which really doesn't cope well with the dots. It *does* assume that exported symbols have C-compatible names. MODVERSIONS too, although we had a simpler "just shut up the warnings" solution for that. It was CONFIG_TRIM_UNUSED_SYMBOLS which was the really horrid one. I went a little way down the rabbit-hole of trying to make it cope, but it was far from pretty: https://patchwork.kernel.org/patch/10148081/ If there's a way to make it work sanely, I'm up for that. But if the counter-argument is "But someone might genuinely want to make their own C function called __x86_indirect_thunk_rax"... I'm not so receptive to that argument :)
Re: [PATCH 0/3] Add __builtin_load_no_speculate
On 08/01/18 16:10, Bernd Edlinger wrote: > I thought about your new builtin again, and I wonder if > something like that might work as well? > > > cat despec.s > .arch armv7-a > .eabi_attribute 28, 1 > .eabi_attribute 20, 1 > .eabi_attribute 21, 1 > .eabi_attribute 23, 3 > .eabi_attribute 24, 1 > .eabi_attribute 25, 1 > .eabi_attribute 26, 2 > .eabi_attribute 30, 4 > .eabi_attribute 34, 1 > .eabi_attribute 18, 4 > .section .text.startup,"ax",%progbits > .align 2 > .global despec_i > .syntax unified > .arm > .fpu vfpv3-d16 > > despec_i: > cmp r0,#0 > beq L0 > ldr r0,[r1] > moveq r0,r2 > nop {0x14} @ CSDB > str r0,[r1] > mov r0,#1 > bx lr > L0: mov r0,#0 > bx lr > > cat test.c > extern int despec_i(int predicate, int *untrusted, int fallback); > #define N 8 > int a[N] = {1,2,3,4,5,7,8,9}; > int a2[0x200]; > int test(int untrust) > { >    int x = 0; >    if (despec_i(untrust >= 0 && untrust < N, &untrust, 0)) >    { >      int v = a[untrust] & 0x1 ? 0x100 : 0x0; >      x = a2[v]; >    } >    return x; > } > > > So this should feed the predicate through the builtin, and > clear the untrusted value when the condition has been > mis-predicted. > > Wouldn't that be more flexible to use? > Or am I missing something? Yes, if you modified your test case to be something like: int test(int untrust) { int x = 0; if (untrust < 0) return x; if (despec_i(untrust >= 0 && untrust < N, &untrust, 0)) { int v = a[untrust] & 0x1 ? 0x100 : 0x0; x = a2[v]; } return x; } then the compiler can (and will) optimize the condition to if (despec (true && untrust < N, &untrust, 0)) and suddenly you don't have the protection against speculative execution that you thought you had. That makes the API exceedingly dangerous as we can't easily detect and inhibit all the sources of information that the compiler might use to make such deductions. What happens, for example, if your original case is inlined into a wider function that has additional tests? R. > > > As a side note: I noticed that "nop {0x14}" seems to produce the correct > assembler opcode without the need for using a .insn code. > > > Bernd. >
[PATCH][arm][2/3] Implement fp16fml extension for ARMv8.4-A
Hi all, This patch adds the +fp16fml extension that enables some half-precision floating-point Advanced SIMD instructions, available through arm_neon.h intrinsics. This extension is on by default for armv8.4-a if fp16 is available, so it can be enabled by -march=armv8.4-a+fp16. fp16fml is also available for armv8.2-a and armv8.3-a through the +fp16fml option that is added for these architectures. The new instructions that this patch adds support for are: vfmal.f16 Dr, Sm, Sn vfmal.f16 Qr, Dm, Dn vfmsl.f16 Dr, Sm, Sn vfmsl.f16 Qr, Dm, Dn They interpret their input registers as a vector of half-precision floating-point values, extend them to single-precision vectors and perform a fused multiply-add or subtract of them with the destination vector. This patch exposes these instructions through arm_neon.h intrinsics. The set of intrinsics allows us to do stuff such as perform the multiply-add/subtract operation on the low or top half of float16x4_t and float16x8_t values. This maps naturally in aarch64 to the FMLAL and FMLAL2 instructions but on arm we have to use the fact that consecutive NEON registers overlap the wider register (i.e. d0 is s0 plus s1, q0 is d0 plus d1 etc). This just means we have to be careful to use the right subreg operand print code. New arm-specific builtins are defined to expand to the new patterns. I've managed to compress the define_expands using code, mode and int iterators but the define_insns don't compress very well without two-tiered iterators (iterator attributes expanding to iterators) which we don't support. Bootstrapped and tested on arm-none-linux-gnueabihf and also on armeb-none-eabi. Thanks, Kyrill 2018-01-08 Kyrylo Tkachov * config/arm/arm-cpus.in (fp16fml): New feature. (ALL_SIMD): Add fp16fml. (armv8.2-a): Add fp16fml as an option. (armv8.3-a): Likewise. (armv8.4-a): Add fp16fml as part of fp16. * config/arm/arm.h (TARGET_FP16FML): Define. * config/arm/arm-c.c (arm_cpu_builtins): Define __ARM_FEATURE_FP16_FML when appropriate. * config/arm/arm-modes.def (V2HF): Define. * config/arm/arm_neon.h (vfmlal_low_u32, vfmlsl_low_u32, vfmlal_high_u32, vfmlsl_high_u32, vfmlalq_low_u32, vfmlslq_low_u32, vfmlalq_high_u32, vfmlslq_high_u32): Define. * config/arm/arm_neon_builtins.def (vfmal_low, vfmal_high, vfmsl_low, vfmsl_high): New set of builtins. * config/arm/iterators.md (PLUSMINUS): New code iterator. (vfml_op): New code attribute. (VFMLHALVES): New int iterator. (VFML, VFMLSEL): New mode attributes. (V_reg): Define mapping for V2HF. (V_hi, V_lo): New mode attributes. (VF_constraint): Likewise. (vfml_half, vfml_half_selector): New int attributes. * config/arm/neon.md (neon_vfml_): New define_expand. (vfmal_low_intrinsic, vfmsl_high_intrinsic, vfmal_high_intrinsic, vfmsl_low_intrinsic): New define_insn. * config/arm/t-arm-elf (v8_fps): Add fp16fml. * config/arm/t-multilib (v8_2_a_simd_variants): Add fp16fml. * config/arm/unspecs.md (UNSPEC_VFML_LO, UNSPEC_VFML_HI): New unspecs. * doc/invoke.texi (ARM Options): Document fp16fml. Update armv8.4-a documentation. * doc/sourcebuild.texi (arm_fp16fml_neon_ok, arm_fp16fml_neon): Document new effective target and option set. 2018-01-08 Kyrylo Tkachov * gcc.target/arm/multilib.exp: Add combination tests for fp16fml. * gcc.target/arm/simd/fp16fml_high.c: New test. * gcc.target/arm/simd/fp16fml_low.c: Likewise. * lib/target-supports.exp (check_effective_target_arm_fp16fml_neon_ok_nocache, check_effective_target_arm_fp16fml_neon_ok, add_options_for_arm_fp16fml_neon): New procedures. 
diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c index 635bc3c1c38de79802041fc50229b90defd2e467..46dc8d51ffcd80983a70f1bd283caa3688648c9b 100644 --- a/gcc/config/arm/arm-c.c +++ b/gcc/config/arm/arm-c.c @@ -160,6 +160,7 @@ arm_cpu_builtins (struct cpp_reader* pfile) TARGET_VFP_FP16INST); def_or_undef_macro (pfile, "__ARM_FEATURE_FP16_VECTOR_ARITHMETIC", TARGET_NEON_FP16INST); + def_or_undef_macro (pfile, "__ARM_FEATURE_FP16_FML", TARGET_FP16FML); def_or_undef_macro (pfile, "__ARM_FEATURE_FMA", TARGET_FMA); def_or_undef_macro (pfile, "__ARM_NEON__", TARGET_NEON); diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in index 0967b9d2277a0d211452b7cd4d579db1774f29b3..7b9224b6b0791a9a7a315e1807b439604a3c0929 100644 --- a/gcc/config/arm/arm-cpus.in +++ b/gcc/config/arm/arm-cpus.in @@ -165,6 +165,9 @@ define feature fp16 # Dot Product instructions extension to ARMv8.2-a. define feature dotprod +# Half-precision floating-point instructions in ARMv8.4-A. +define feature fp16fml + # ISA Quirks (errata?). Don't forget to add this to the fgroup # ALL_QUIRKS below. @@ -202,7 +205,7 @@ define fgroup ALL_CRYPTO crypto # strip off 32 D-registers, but does not remove support for # double-precision FP. define fgroup ALL_SIMD_INT
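[A hypothetical usage sketch of the new non-lane intrinsics, with names and signatures as listed in the ChangeLog above; compiling it assumes a compiler with this patch and something like -march=armv8.2-a+fp16fml.]

#include <arm_neon.h>

/* Widening fused multiply-add of the low f16 halves into an f32
   accumulator, and the subtracting form on the high halves.  */
float32x2_t
fma_low (float32x2_t r, float16x4_t a, float16x4_t b)
{
  return vfmlal_low_u32 (r, a, b);
}

float32x4_t
fms_high (float32x4_t r, float16x8_t a, float16x8_t b)
{
  return vfmlslq_high_u32 (r, a, b);
}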
[PATCH][arm][3/3] Implement fp16fml lane intrinsics
Hi all, This patch implements the lane-wise fp16fml intrinsics. There's quite a few of them so I've split them up from the other simpler fp16fml intrinsics. These ones expose instructions such as vfmal.f16 Dd, Sn, Sm[] 0 <= index <= 1 vfmal.f16 Qd, Dn, Dm[] 0 <= index <= 3 vfmsl.f16 Dd, Sn, Sm[] 0 <= index <= 1 vfmsl.f16 Qd, Dn, Dm[] 0 <= index <= 3 These instructions extract a single half-precision floating-point value from one of the source regs and perform a vfmal/vfmsl operation as per the normal variant with that value. The nuance here is that some of the intrinsics want to do things like: float32x2_t vfmlal_laneq_low_u32 (float32x2_t __r, float16x4_t __a, float16x8_t __b, const int __index) where the float16x8_t value of '__b' is held in a Q register, so we need to be a bit smart about finding the right D or S sub-register and translating the lane number to a lane in that sub-register, instead of just passing the language-level const-int down to the assembly instruction. That's where most of the complexity of this patch comes from but hopefully it's orthogonal enough to make sense. Bootstrapped and tested on arm-none-linux-gnueabihf as well as armeb-none-eabi. Thanks, Kyrill 2018-01-08 Kyrylo Tkachov * config/arm/arm_neon.h (vfmlal_lane_low_u32, vfmlal_lane_high_u32, vfmlalq_laneq_low_u32, vfmlalq_lane_low_u32, vfmlal_laneq_low_u32, vfmlalq_laneq_high_u32, vfmlalq_lane_high_u32, vfmlal_laneq_high_u32, vfmlsl_lane_low_u32, vfmlsl_lane_high_u32, vfmlslq_laneq_low_u32, vfmlslq_lane_low_u32, vfmlsl_laneq_low_u32, vfmlslq_laneq_high_u32, vfmlslq_lane_high_u32, vfmlsl_laneq_high_u32): Define. * config/arm/arm_neon_builtins.def (vfmal_lane_low, vfmal_lane_lowv4hf, vfmal_lane_lowv8hf, vfmal_lane_high, vfmal_lane_highv4hf, vfmal_lane_highv8hf, vfmsl_lane_low, vfmsl_lane_lowv4hf, vfmsl_lane_lowv8hf, vfmsl_lane_high, vfmsl_lane_highv4hf, vfmsl_lane_highv8hf): New sets of builtins. * config/arm/iterators.md (VFMLSEL2, vfmlsel2): New mode attributes. (V_lane_reg): Likewise. * config/arm/neon.md (neon_vfml_lane_): New define_expand. (neon_vfml_lane_): Likewise. (vfmal_lane_low_intrinsic, vfmal_lane_low_intrinsic, vfmal_lane_high_intrinsic, vfmal_lane_high_intrinsic, vfmsl_lane_low_intrinsic, vfmsl_lane_low_intrinsic, vfmsl_lane_high_intrinsic, vfmsl_lane_high_intrinsic): New define_insns. 2018-01-08 Kyrylo Tkachov * gcc.target/arm/simd/fp16fml_lane_high.c: New test. * gcc.target/arm/simd/fp16fml_lane_low.c: New test. 
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index 01324096d673187b504b89a6d68785275b445b1b..a8aae4464aa02b4286751c116fae493517056e99 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -18164,6 +18164,150 @@ vfmlslq_high_u32 (float32x4_t __r, float16x8_t __a, float16x8_t __b) return __builtin_neon_vfmsl_highv4sf (__r, __a, __b); } +__extension__ extern __inline float32x2_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vfmlal_lane_low_u32 (float32x2_t __r, float16x4_t __a, float16x4_t __b, + const int __index) +{ + __builtin_arm_lane_check (4, __index); + return __builtin_neon_vfmal_lane_lowv2sf (__r, __a, __b, __index); +} + +__extension__ extern __inline float32x2_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vfmlal_lane_high_u32 (float32x2_t __r, float16x4_t __a, float16x4_t __b, + const int __index) +{ + __builtin_arm_lane_check (4, __index); + return __builtin_neon_vfmal_lane_highv2sf (__r, __a, __b, __index); +} + +__extension__ extern __inline float32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vfmlalq_laneq_low_u32 (float32x4_t __r, float16x8_t __a, float16x8_t __b, + const int __index) +{ + __builtin_arm_lane_check (8, __index); + return __builtin_neon_vfmal_lane_lowv4sf (__r, __a, __b, __index); +} + +__extension__ extern __inline float32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vfmlalq_lane_low_u32 (float32x4_t __r, float16x8_t __a, float16x4_t __b, + const int __index) +{ + __builtin_arm_lane_check (4, __index); + return __builtin_neon_vfmal_lane_lowv4hfv4sf (__r, __a, __b, __index); +} + +__extension__ extern __inline float32x2_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vfmlal_laneq_low_u32 (float32x2_t __r, float16x4_t __a, float16x8_t __b, + const int __index) +{ + __builtin_arm_lane_check (8, __index); + return __builtin_neon_vfmal_lane_lowv8hfv2sf (__r, __a, __b, __index); +} + +__extension__ extern __inline float32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vfmlalq_laneq_high_u32 (float32x4_t __r, float16x8_t __a, float16x8_t __b, + const int __index) +{ + __builtin_arm_lane_check (8, __index); + return __builtin_neon_vfmal_lane_highv4sf (__r, __a, __b, __index); +} + +__extension__ extern __
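[A similar hypothetical sketch for the lane forms, using the signatures visible in the arm_neon.h hunk above, and again assuming a patched compiler with the fp16fml target options.]

#include <arm_neon.h>

/* Multiply each low-half lane of a by lane 2 of b, widening to f32 and
   accumulating into r; the lane index must be a compile-time constant
   within the checked range.  */
float32x2_t
fma_low_lane2 (float32x2_t r, float16x4_t a, float16x4_t b)
{
  return vfmlal_lane_low_u32 (r, a, b, 2);
}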
[PATCH][arm][1/3] Add -march=armv8.4-a option
[resending due to mailer problems...] Hi all, This patch adds support for the Armv8.4-A architecture [1] in the arm backend. This is done through the new -march=armv8.4-a option. With this patch armv8.4-a is recognised as an argument and supports the extensions: simd, fp16, crypto, nocrypto, nofp with the familiar meaning of these options. Worth noting that there is no dotprod option like in armv8.2-a and armv8.3-a because Dot Product support is mandatory in Armv8.4-A when simd is available, so when using +simd (or fp16, which enables +simd), the +dotprod is implied. The various multilib selection makefile fragments are updated too and the multilib.exp test gets a few armv8.4-a combination tests. Bootstrapped and tested on arm-none-linux-gnueabihf. Christophe: Can I ask you for a huge favour to give these 3 patches a run through your testing infrastructure if you get the chance? The changes should be fairly self-contained (i.e. touching only -march=armv8.4-a support) but I've gotten various edge cases with testsuite setup wrong in the past... Thanks, Kyrill [1] https://community.arm.com/processors/b/blog/posts/introducing-2017s-extensions-to-the-arm-architecture 2018-01-08 Kyrylo Tkachov * config/arm/arm-cpus.in (armv8_4): New feature. (ARMv8_4a): New fgroup. (armv8.4-a): New arch. * config/arm/arm-tables.opt: Regenerate. * config/arm/t-aprofile: Add matching rules for -march=armv8.4-a. * config/arm/t-arm-elf (all_v8_archs): Add armv8.4-a. * config/arm/t-multilib (v8_4_a_simd_variants): New variable. Add matching rules for -march=armv8.4-a and extensions. * doc/invoke.texi (ARM Options): Document -march=armv8.4-a. 2018-01-08 Kyrylo Tkachov * gcc.target/arm/multilib.exp: Add some -march=armv8.4-a combination tests. diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in index 281ec162db8c982128462d8efac2be1d21959cf7..0967b9d2277a0d211452b7cd4d579db1774f29b3 100644 --- a/gcc/config/arm/arm-cpus.in +++ b/gcc/config/arm/arm-cpus.in @@ -120,6 +120,9 @@ define feature armv8_2 # Architecture rel 8.3. define feature armv8_3 +# Architecture rel 8.4. +define feature armv8_4 + # M-Profile security extensions.
define feature cmse @@ -242,6 +245,7 @@ define fgroup ARMv8a ARMv7ve armv8 define fgroup ARMv8_1aARMv8a crc32 armv8_1 define fgroup ARMv8_2aARMv8_1a armv8_2 define fgroup ARMv8_3aARMv8_2a armv8_3 +define fgroup ARMv8_4aARMv8_3a armv8_4 define fgroup ARMv8m_base ARMv6m armv8 cmse tdiv define fgroup ARMv8m_main ARMv7m armv8 cmse define fgroup ARMv8r ARMv8a @@ -597,6 +601,19 @@ begin arch armv8.3-a option dotprod add FP_ARMv8 DOTPROD end arch armv8.3-a +begin arch armv8.4-a + tune for cortex-a53 + tune flags CO_PROC + base 8A + profile A + isa ARMv8_4a + option simd add FP_ARMv8 DOTPROD + option fp16 add fp16 FP_ARMv8 DOTPROD + option crypto add FP_ARMv8 CRYPTO DOTPROD + option nocrypto remove ALL_CRYPTO + option nofp remove ALL_FP +end arch armv8.4-a + begin arch armv8-m.base tune for cortex-m23 base 8M_BASE diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt index f7937256cd79296ba33d109232bcf0d6f7b03917..b8ebec668b1404fd3f9a71dd1f0d48d1261bcf53 100644 --- a/gcc/config/arm/arm-tables.opt +++ b/gcc/config/arm/arm-tables.opt @@ -455,19 +455,22 @@ EnumValue Enum(arm_arch) String(armv8.3-a) Value(29) EnumValue -Enum(arm_arch) String(armv8-m.base) Value(30) +Enum(arm_arch) String(armv8.4-a) Value(30) EnumValue -Enum(arm_arch) String(armv8-m.main) Value(31) +Enum(arm_arch) String(armv8-m.base) Value(31) EnumValue -Enum(arm_arch) String(armv8-r) Value(32) +Enum(arm_arch) String(armv8-m.main) Value(32) EnumValue -Enum(arm_arch) String(iwmmxt) Value(33) +Enum(arm_arch) String(armv8-r) Value(33) EnumValue -Enum(arm_arch) String(iwmmxt2) Value(34) +Enum(arm_arch) String(iwmmxt) Value(34) + +EnumValue +Enum(arm_arch) String(iwmmxt2) Value(35) Enum Name(arm_fpu) Type(enum fpu_type) diff --git a/gcc/config/arm/t-aprofile b/gcc/config/arm/t-aprofile index a4bf04794e71381256e1489cdad71e966306477f..167a49d16e468be3c222a50abec57b6a68bc561e 100644 --- a/gcc/config/arm/t-aprofile +++ b/gcc/config/arm/t-aprofile @@ -96,6 +96,13 @@ MULTILIB_MATCHES += $(foreach ARCH, $(v8_2_a_simd_variants), \ march?armv8-a+simd=march?armv8.2-a$(ARCH) \ march?armv8-a+simd=march?armv8.3-a$(ARCH)) +# Baseline v8.4-a: map down to baseline v8-a +MULTILIB_MATCHES += march?armv8-a=march?armv8.4-a + +# Map all v8.4-a SIMD variants to v8-a+simd +MULTILIB_MATCHES += $(foreach ARCH, $(v8_4_a_simd_variants), \ + march?armv8-a+simd=march?armv8.4-a$(ARCH)) + # Use Thumb libraries for everything. MULTILIB_REUSE += mthumb/march.armv7-a/mfloat-abi.soft=marm/march.armv7-a/mfloat-abi.soft diff --git a/gcc/config/arm/t-arm-elf b/gcc/config/arm/t-arm-elf index a15fb2df12f7b0d637976f3912432740ecd104bd..3e721ec789806335c6097d4088642150abf1003a 100644 --- a/gcc/config/arm/t-a
Re: Location wrappers vs decls that change type (was Re: [v2 of PATCH 03/14] C++: add location_t wrapper nodes during parsing (minimal impl))
On Mon, 2018-01-08 at 12:25 -0500, Nathan Sidwell wrote: > On 01/08/2018 12:14 PM, Jakub Jelinek wrote: > > On Mon, Jan 08, 2018 at 12:10:50PM -0500, Nathan Sidwell wrote: > > > > Both "_S_terminal" VAR_DECLs have a "_CharT" > > > > TEMPLATE_TYPE_PARM, but > > > > these types are different tree nodes. > > > > > > correct. they are not EQ but are EQUAL (same_type_p will be > > > true). > > > > So perhaps location_wrapper_p could use that instead of pointer > > comparison. > > Though it would be expensive. > > If TYPE_STRUCTURAL_COMPARISON_P (or however it's spelt) is true, > it'll > be expensive. Otherwise it's a function call, a couple of > indirections > and a pointer compare. But still more expensive than ... > > > > Add a flag on the VIEW_CONVERT/NON_LVALUE expr explicitly noting > > > its > > > wrapperness (rather than infer it from TREE_TYPE == TREE_TYPE > > > (TREE_OPERAND)). TREE_LANG_FLAG_0 looks available? > > > > Yeah, I think most if not all lang flags are still available for > > those two > > tree codes and checking that should be quite cheap. > > ... a bit test on the node itself. > > location_wrapper_p could contain something like >bool result = TREE_LANG_FLAG_$FOO (t); >gcc_checking_assert (result == same_type_p (TREE_TYPE (t), > TREE_TYPE > (TREE_OPERAND (t, 0))); >return result; > > for the paranoid. > > nathan Thanks Nathan and Jakub: a quick smoketest using TREE_LANG_FLAG_0 worked, and fixes this issue. However, should I be using a TREE_LANG_FLAG for something that's in tree.h/c, rather than just in the "cp" subdir? (the wrapper nodes are only added to C++ in this patch kit, but given that e.g. STRIP_NOPS needs to remove them, the lang-independent code needs to handle them, and ultimately we may want wrapper nodes in other FEs). Dave
Re: Location wrappers vs decls that change type (was Re: [v2 of PATCH 03/14] C++: add location_t wrapper nodes during parsing (minimal impl))
On Mon, Jan 08, 2018 at 01:02:37PM -0500, David Malcolm wrote: > Thanks Nathan and Jakub: a quick smoketest using TREE_LANG_FLAG_0 > worked, and fixes this issue. > > However, should I be using a TREE_LANG_FLAG for something that's in > tree.h/c, rather than just in the "cp" subdir? (the wrapper nodes are > only added to C++ in this patch kit, but given that e.g. STRIP_NOPS > needs to remove them, the lang-independent code needs to handle them, > and ultimately we may want wrapper nodes in other FEs). If location_wrapper_p etc. are in generic code rather than only in the FE, sure, you'd need to use some generic flag rather than FE specific. I bet e.g. private_flag and protected_flag aren't used yet for VIEW_CONVERT_EXPR/NON_LVALUE_EXPR, but they are used on some other expressions, e.g. CALL_EXPR, which suggests that it could be used for that. Jakub
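[A sketch of what the generic-flag variant might look like. Assumptions: reusing the existing TREE_PRIVATE accessor for private_flag follows Jakub's suggestion above, and the macro name and exact placement are illustrative rather than the final patch.]

/* Hypothetical generic wrapper test reusing private_flag, which is
   noted above as unused on these two tree codes.  */
#define EXPR_LOCATION_WRAPPER_P(NODE) TREE_PRIVATE (NODE)

static inline bool
location_wrapper_p (const_tree t)
{
  return ((TREE_CODE (t) == VIEW_CONVERT_EXPR
           || TREE_CODE (t) == NON_LVALUE_EXPR)
          && EXPR_LOCATION_WRAPPER_P (t));
}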
Re: [PATCH][AArch64] Use LDP/STP in shrinkwrapping
On Mon, Jan 08, 2018 at 01:27:24PM +, Wilco Dijkstra wrote: > Segher Boessenkool wrote: > > On Fri, Jan 05, 2018 at 12:22:44PM +, Wilco Dijkstra wrote: > >> An example epilog in a shrinkwrapped function before: > >> > >> ldp x21, x22, [sp,#16] > >> ldr x23, [sp,#32] > >> ldr x24, [sp,#40] > >> ldp x25, x26, [sp,#48] > >> ldr x27, [sp,#64] > >> ldr x28, [sp,#72] > >> ldr x30, [sp,#80] > >> ldr d8, [sp,#88] > >> ldp x19, x20, [sp],#96 > >> ret > > > > In this example, the compiler already can make a ldp for both x23/x24 and > > x27/x28 just fine (if not in emit_epilogue_components, then simply in a > > peephole); why did that not work? Or is this not the actual generated > > machine code (and there are labels between the insns, for example)? > > This block originally had a label in it, 2 blocks emitted identical restores > and > then branched to the final epilog. The final epilogue was then duplicated so > we end up with 2 almost identical epilogs of 10 instructions (almost since > there were 1-2 unrelated instructions in both blocks). > > Peepholing is very conservative about instructions using SP and won't touch > anything frame related. If this was working better then the backend could just > emit single loads/stores and let peepholing generate LDP/STP. How unfortunate; that should definitely be improved then. Always pairing two registers together *also* degrades code quality. > Another issue is that after pro_and_epilogue pass I see multiple restores > of the same registers and then a branch to the same block. We should try > to avoid the unnecessary duplication. It already does that if *all* predecessors of that block do that. If you want to do it in other cases, you end up with more jumps. That may be beneficial in some cases, of course, but it is not an obvious win (and in the general case it is, hrm let's use nice words, "terrible"). Segher
Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=
* H. J. Lu: > On Mon, Jan 8, 2018 at 12:20 AM, Florian Weimer wrote: >> * H. J. Lu: >> >>> Add -mindirect-branch-loop= option to control loop filler in call and >>> return thunks generated by -mindirect-branch=. 'lfence' uses "lfence" >>> as loop filler. 'pause' uses "pause" as loop filler. 'nop' uses "nop" >>> as loop filler. The default is 'lfence'. >> >> Why is the loop needed? Doesn't ud2 or cpuid stop speculative >> execution? > > My understanding is that a loop works better. Better how? What about far jumps? I think they prevent some forms of prefetch on i386, so perhaps long mode is similar in that regard?
Re: [PATCH] PR 78534 Change character length from int to size_t
On 1/3/18 11:43 AM, Janne Blomqvist wrote: On Wed, Jan 3, 2018 at 8:34 PM, Bob Deen wrote: On 12/29/17 5:31 AM, Janne Blomqvist wrote: In order to handle large character lengths on (L)LP64 targets, switch the GFortran character length from an int to a size_t. This is an ABI change, as procedures with character arguments take hidden arguments with the character length. Did this change not make it into gcc 7 then? No, it caused regressions on big endian targets, and it was so late in the gcc 7 cycle that there was no time to fix them (at the time I didn't have access to a big endian target to test on, and getting said access took time), so I had to revert it. Those regressions have now been fixed, and the ABI has been broken anyway due to other changes, so I'm trying again for gcc 8. Okay. If for some reason it doesn't make 8, it would be nice to know. I lurk here and try to pay attention but obviously I missed that it was pulled from 7. (which is why I'd prefer something specific to test rather than a version number, but it is what it is). Thanks... -Bob I am one of those who still use these hidden arguments for Fortran <-> C interfaces. Based on discussions a year ago, I added this to my code:

#if defined(__GNUC__) && (__GNUC__ > 6)
#include <stddef.h>
#define FORSTR_STDARG_TYPE size_t
#else
#define FORSTR_STDARG_TYPE int
#endif

I infer from this thread that I should change this to __GNUC__ > 7 now. Is this still the correct/best way to determine the hidden argument size? Yes, I would say so. (note that we're still using 4.x... ugh, don't ask... so the >6 check hasn't actually been used yet, I just want to future-proof things as much as possible without having to rewrite the entire Fortran <-> C interface.) Thanks... -Bob Bob Deen @ NASA-JPL Multimission Image Processing Lab bob.d...@jpl.nasa.gov
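[Given Janne's confirmation, the updated guard would presumably read as follows -- a sketch that simply moves the cut-over from 6 to 7 as discussed in the thread; the <stddef.h> include is assumed as the natural home of size_t.]

/* Hidden Fortran character-length argument type: size_t from gcc 8
   onward, int for gcc 7 and earlier.  */
#if defined(__GNUC__) && (__GNUC__ > 7)
#include <stddef.h>
#define FORSTR_STDARG_TYPE size_t
#else
#define FORSTR_STDARG_TYPE int
#endif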
Re: Fix Bug 83566 - cyl_bessel_j returns wrong result for x>1000 for high orders
Formatting fixed.

diff --git a/libstdc++-v3/include/tr1/bessel_function.tcc b/libstdc++-v3/include/tr1/bessel_function.tcc index 7ac733d..5f8fc9f 100644 --- a/libstdc++-v3/include/tr1/bessel_function.tcc +++ b/libstdc++-v3/include/tr1/bessel_function.tcc @@ -27,6 +27,10 @@ * Do not attempt to use it directly. @headername{tr1/cmath} */
+/* __cyl_bessel_jn_asymp adapted from GNU GSL version 2.4 specfunc/bessel_j.c
+ * Copyright (C) 1996-2003 Gerard Jungman
+ */
+
// // ISO C++ 14882 TR1: 5.2 Special functions // @@ -358,16 +362,42 @@ namespace tr1 void __cyl_bessel_jn_asymp(_Tp __nu, _Tp __x, _Tp & __Jnu, _Tp & __Nnu) {
-      const _Tp __mu = _Tp(4) * __nu * __nu;
-      const _Tp __mum1 = __mu - _Tp(1);
-      const _Tp __mum9 = __mu - _Tp(9);
-      const _Tp __mum25 = __mu - _Tp(25);
-      const _Tp __mum49 = __mu - _Tp(49);
-      const _Tp __xx = _Tp(64) * __x * __x;
-      const _Tp __P = _Tp(1) - __mum1 * __mum9 / (_Tp(2) * __xx)
-                    * (_Tp(1) - __mum25 * __mum49 / (_Tp(12) * __xx));
-      const _Tp __Q = __mum1 / (_Tp(8) * __x)
-                    * (_Tp(1) - __mum9 * __mum25 / (_Tp(6) * __xx));
+      const _Tp __mu = _Tp(4) * __nu * __nu;
+      const _Tp __8x = _Tp(8) * __x;
+
+      _Tp __P = _Tp(0);
+      _Tp __Q = _Tp(0);
+
+      _Tp __k = _Tp(0);
+      _Tp __term = _Tp(1);
+
+      int __epsP = 0;
+      int __epsQ = 0;
+
+      _Tp __eps = std::numeric_limits<_Tp>::epsilon();
+
+      do
+        {
+          __term *= (__k == 0
+                     ? _Tp(1)
+                     : -(__mu - (2 * __k - 1) * (2 * __k - 1)) / (__k * __8x));
+
+          __epsP = std::abs(__term) < std::abs(__eps * __P);
+          __P += __term;
+
+          __k++;
+
+          __term *= (__mu - (2 * __k - 1) * (2 * __k - 1)) / (__k * __8x);
+          __epsQ = std::abs(__term) < std::abs(__eps * __Q);
+          __Q += __term;
+
+          if (__epsP && __epsQ && __k > __nu / 2.)
+            break;
+
+          __k++;
+        }
+      while (__k < 1000);
+
const _Tp __chi = __x - (__nu + _Tp(0.5L)) * __numeric_constants<_Tp>::__pi_2();

On 01/06/2018 10:23 AM, Paolo Carlini wrote: Hi, On 05/01/2018 23:46, Michele Pezzutti wrote: + __term *= (__k == 0) ? _Tp(1) : -(__mu - (2 * __k - 1) * (2 * __k - 1)) + / (__k * __8x); In such cases, per the Coding Standards, you want an outer level of parentheses wrapping the whole right side expression. See toward the end of https://www.gnu.org/prep/standards/html_node/Formatting.html#Formatting. I see that many other places will need fixing, at some point - this rather straightforward rule is often overlooked, leading to brittle formatting of many expressions, probably because it's really obvious in practice only together with Emacs, I'm not sure. Also - this kind of stylistic nitpicking is partially personal taste - the parentheses around (__k == 0) seem definitely redundant to me, and I don't think you would find many examples of that in our code. About the Assignment, please be patient. For example, it used to be the case that when RMS was traveling he couldn't process Assignments. It can be slow for many reasons; it's 100% normal. Paolo..
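[For reference, the series being summed incrementally above is the standard Hankel asymptotic expansion (Abramowitz & Stegun 9.2.5-9.2.10); with $\mu = 4\nu^2$ and $\chi = x - (\tfrac{\nu}{2} + \tfrac{1}{4})\pi$ it reads:]

  J_\nu(x) \sim \sqrt{\frac{2}{\pi x}}\,\bigl[P(\nu,x)\cos\chi - Q(\nu,x)\sin\chi\bigr],

  P(\nu,x) \sim \sum_{k\ge 0} (-1)^k \frac{\prod_{j=1}^{2k}\bigl(\mu-(2j-1)^2\bigr)}{(2k)!\,(8x)^{2k}},
  \qquad
  Q(\nu,x) \sim \sum_{k\ge 0} (-1)^k \frac{\prod_{j=1}^{2k+1}\bigl(\mu-(2j-1)^2\bigr)}{(2k+1)!\,(8x)^{2k+1}},

[and N_\nu(x) takes the companion form with $P\sin\chi + Q\cos\chi$. The loop builds each product term from the previous one, which is why the running __term is multiplied by $(\mu-(2k-1)^2)/(k\,8x)$ at every step.]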
Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=
On Mon, Jan 8, 2018 at 10:32 AM, Florian Weimer wrote: > * H. J. Lu: > >> On Mon, Jan 8, 2018 at 12:20 AM, Florian Weimer wrote: >>> * H. J. Lu: >>> Add -mindirect-branch-loop= option to control loop filler in call and return thunks generated by -mindirect-branch=. 'lfence' uses "lfence" as loop filler. 'pause' uses "pause" as loop filler. 'nop' uses "nop" as loop filler. The default is 'lfence'. >>> >>> Why is the loop needed? Doesn't ud2 or cpuid stop speculative >>> execution? >> >> My understanding is that a loop works better. > > Better how? > > What about far jumps? I think they prevent some forms of prefetch on > i386, so perhaps long mode is similar in that regard? These are more expensive and we can't guarantee that they are effective, hence the short loop . -- H.J.
Re: [PATCH] lto, testsuite: Fix ICE in -Wodr (PR lto/83121)
On Sat, 2018-01-06 at 08:44 +0100, Richard Biener wrote: > On January 5, 2018 11:55:11 PM GMT+01:00, David Malcolm <dmalc...@redhat.com> wrote: > > On Fri, 2018-01-05 at 10:36 +0100, Richard Biener wrote: > > > On Thu, Jan 4, 2018 at 10:52 PM, David Malcolm <dmalc...@redhat.com> wrote: > > > > PR lto/83121 reports an ICE deep inside the linemap code when -Wodr > > > > reports on a type mismatch. > > > > > > > > The root cause is that the warning can access the DECL_SOURCE_LOCATION > > > > of a streamed-in decl before the lto_location_cache has been applied. > > > > > > > > lto_location_cache::input_location stores RESERVED_LOCATION_COUNT (==2) > > > > as a poison value until the cache is applied: > > > > 250 /* Keep value RESERVED_LOCATION_COUNT in *loc as linemap lookups will > > > > 251 ICE on it. */ > > > > > > > > The fix is relatively simple: apply the cache before reading the > > > > DECL_SOURCE_LOCATION. > > > > > > > > (I wonder if we should instead have a INVALID_LOCATION value to handle > > > > this case more explicitly? e.g. 0x? or reserve 2 in libcpp for > > > > that purpose, and have the non-reserved locations start at 3? Either > > > > would be more invasive, though) > > > > > > > > Triggering the ICE was fiddly: it seems to be affected by many things, > > > > including the order of files, and (I think) by filenames. My theory is > > > > that it's affected by the ordering of the tree nodes in the LTO stream: > > > > for the ICE to occur, the types in question need to be compared before > > > > some other operation flushes the lto_location_cache. This ordering > > > > is affected by the hash-based ordering in DFS in lto-streamer-out.c, which > > > > might explain why r255066 seemed to trigger the bug; the only relevant > > > > change to LTO there seemed to be: > > > > * lto-streamer-out.c (hash_tree): Hash TYPE_EMPTY_P and DECL_PADDING_P. > > > > If so, then the bug was presumably already present, but hidden. > > > > > > > > The patch also adds regression test coverage for the ICE, which is more > > > > involved - as far as I can tell, we don't have an existing way to verify > > > > diagnostics emitted during link-time optimization. > > > > > > > > Hence the patch adds some machinery to lib/lto.exp to support two new > > > > directives: dg-lto-warning and dg-lto-message, corresponding to > > > > dg-warning and dg-message respectively, where the diagnostics are > > > > expected to be emitted at link-time. > > > > > > > > The test case includes examples of LTO warnings and notes in both the > > > > primary and secondary source files > > > > > > > > Doing so required reusing the logic from DejaGnu for handling diagnostics. > > > > Unfortunately the pertinent code is a 50 line loop within a ~200 line Tcl > > > > function in dg.exp (dg-test), so I had to copy it from DejaGnu, making > > > > various changes as necessary (see lto_handle_diagnostics_for_file in the > > > > patch; for example the LTO version supports multiple source files, > > > > identifying which source file emitted a diagnostic). > > > > > > > > For non-LTO > > > > diagnostics we currently ignore surplus "note" diagnostics.
> > > > This patch updates lto_prune_warns to follow this behavior (since > > > > otherwise we'd need numerous dg-lto-message directives for the motivating > > > > test case). > > > > > > > > The patch adds these PASS results to g++.sum: > > > > > > > > PASS: g++.dg/lto/pr83121 cp_lto_pr83121_0.o assemble, -O0 -flto > > > > PASS: g++.dg/lto/pr83121 cp_lto_pr83121_1.o assemble, -O0 -flto > > > > PASS: g++.dg/lto/pr83121 (test for LTO warnings, pr83121_0.C line 6) > > > > PASS: g++.dg/lto/pr83121 (test for LTO warnings, pr83121_0.C line 8) > > > > PASS: g++.dg/lto/pr83121 (test for LTO warnings, pr83121_1.C line 2) > > > > PASS: g++.dg/lto/pr83121 (test for LTO warnings, pr83121_1.C line 3) > > > > PASS: g++.dg/lto/pr83121 cp_lto_pr83121_0.o-cp_lto_pr83121_1.o link, -O0 -flto > > > > > > > > The output for dg-lto-message above refers to "warnings", rather than > > > > "messages" but that's the same as for the non-LTO case, where dg-message > > > > also refers to "warnings". > > > > > > > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. > > > > > > > > OK for trunk? > > > > > > Hmm, but we do this in warn_odr already? How's that not enough? > > > > > > At least it seems the place you add this isn't ideal (not at the > > > "root cause"). > > > > > > Richard. > > > > [CCing Honza] > > > > Yes, warn_odr does apply the cache, but this warning is coming from > > warn_types_mismatch
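[To make the new directives concrete, here is a hypothetical testcase fragment modeled on the existing dg-warning/dg-message syntax; the warning text and file layout are illustrative, not copied from the actual pr83121 testcase.]

/* Primary file of a two-file LTO test; the sibling file defines
   'struct s' with a different layout, so the ODR checker warns at
   link time.  */
struct s { int i; };   /* { dg-lto-warning "violates the C\\+\\+ One Definition Rule" } */

int
get (struct s *p)
{
  return p->i;
}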
Re: [PATCH, rs6000] generate loop code for memcmp inline expansion
On Tue, 2017-12-12 at 10:13 -0600, Segher Boessenkool wrote: > Please fix those trivialities, and it's okay for trunk (after the > rtlanal patch is approved too). Thanks! Here's the final version of this, which is committed as 256351. 2018-01-08 Aaron Sawdey * config/rs6000/rs6000-string.c (do_load_for_compare_from_addr): New function. (do_ifelse): New function. (do_isel): New function. (do_sub3): New function. (do_add3): New function. (do_load_mask_compare): New function. (do_overlap_load_compare): New function. (expand_compare_loop): New function. (expand_block_compare): Call expand_compare_loop() when appropriate. * config/rs6000/rs6000.opt (-mblock-compare-inline-limit): Change option description. (-mblock-compare-inline-loop-limit): New option. -- Aaron Sawdey, Ph.D. acsaw...@linux.vnet.ibm.com 050-2/C113 (507) 253-7520 home: 507/263-0782 IBM Linux Technology Center - PPC ToolchainIndex: gcc/config/rs6000/rs6000-string.c === --- gcc/config/rs6000/rs6000-string.c (revision 256350) +++ gcc/config/rs6000/rs6000-string.c (working copy) @@ -303,6 +303,959 @@ return MIN (base_align, offset & -offset); } +/* Prepare address and then do a load. + + MODE is the mode to use for the load. + DEST is the destination register for the data. + ADDR is the address to be loaded. + ORIG_ADDR is the original address expression. */ +static void +do_load_for_compare_from_addr (machine_mode mode, rtx dest, rtx addr, + rtx orig_addr) +{ + rtx mem = gen_rtx_MEM (mode, addr); + MEM_COPY_ATTRIBUTES (mem, orig_addr); + set_mem_size (mem, GET_MODE_SIZE (mode)); + do_load_for_compare (dest, mem, mode); + return; +} + +/* Do a branch for an if/else decision. + + CMPMODE is the mode to use for the comparison. + COMPARISON is the rtx code for the compare needed. + A is the first thing to be compared. + B is the second thing to be compared. + CR is the condition code reg input, or NULL_RTX. + TRUE_LABEL is the label to branch to if the condition is true. + + The return value is the CR used for the comparison. + If CR is null_rtx, then a new register of CMPMODE is generated. + If A and B are both null_rtx, then CR must not be null, and the + compare is not generated so you can use this with a dot form insn. */ + +static void +do_ifelse (machine_mode cmpmode, rtx_code comparison, + rtx a, rtx b, rtx cr, rtx true_label) +{ + gcc_assert ((a == NULL_RTX && b == NULL_RTX && cr != NULL_RTX) + || (a != NULL_RTX && b != NULL_RTX)); + + if (cr != NULL_RTX) +gcc_assert (GET_MODE (cr) == cmpmode); + else +cr = gen_reg_rtx (cmpmode); + + rtx label_ref = gen_rtx_LABEL_REF (VOIDmode, true_label); + + if (a != NULL_RTX) +emit_move_insn (cr, gen_rtx_COMPARE (cmpmode, a, b)); + + rtx cmp_rtx = gen_rtx_fmt_ee (comparison, VOIDmode, cr, const0_rtx); + + rtx ifelse = gen_rtx_IF_THEN_ELSE (VOIDmode, cmp_rtx, label_ref, pc_rtx); + rtx j = emit_jump_insn (gen_rtx_SET (pc_rtx, ifelse)); + JUMP_LABEL (j) = true_label; + LABEL_NUSES (true_label) += 1; +} + +/* Emit an isel of the proper mode for DEST. + + DEST is the isel destination register. + SRC1 is the isel source if CR is true. + SRC2 is the isel source if CR is false. + CR is the condition for the isel. */ +static void +do_isel (rtx dest, rtx cmp, rtx src_t, rtx src_f, rtx cr) +{ + if (GET_MODE (dest) == DImode) +emit_insn (gen_isel_signed_di (dest, cmp, src_t, src_f, cr)); + else +emit_insn (gen_isel_signed_si (dest, cmp, src_t, src_f, cr)); +} + +/* Emit a subtract of the proper mode for DEST. + + DEST is the destination register for the subtract. + SRC1 is the first subtract input. 
+ SRC2 is the second subtract input. + + Computes DEST = SRC1-SRC2. */ +static void +do_sub3 (rtx dest, rtx src1, rtx src2) +{ + if (GET_MODE (dest) == DImode) +emit_insn (gen_subdi3 (dest, src1, src2)); + else +emit_insn (gen_subsi3 (dest, src1, src2)); +} + +/* Emit an add of the proper mode for DEST. + + DEST is the destination register for the add. + SRC1 is the first add input. + SRC2 is the second add input. + + Computes DEST = SRC1+SRC2. */ +static void +do_add3 (rtx dest, rtx src1, rtx src2) +{ + if (GET_MODE (dest) == DImode) +emit_insn (gen_adddi3 (dest, src1, src2)); + else +emit_insn (gen_addsi3 (dest, src1, src2)); +} + +/* Generate rtl for a load, shift, and compare of less than a full word. + + LOAD_MODE is the machine mode for the loads. + DIFF is the reg for the difference. + CMP_REM is the reg containing the remaining bytes to compare. + DCOND is the CCUNS reg for the compare if we are doing P9 code with setb. + SRC1_ADDR is the first source address. + SRC2_ADDR is the second source address. + ORIG_SRC1 is the original first source block's address rtx. + ORIG_SRC2 is the origin
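[To recap what the new option pair controls at the user level, a hypothetical example follows; the option names come from the ChangeLog above, while the particular size and limits are illustrative.]

/* With a fixed-size comparison like this, lengths above
   -mblock-compare-inline-limit but within the new
   -mblock-compare-inline-loop-limit are now expanded as an inline
   loop instead of a call to memcmp.  */
#include <string.h>

int
cmp64 (const void *a, const void *b)
{
  return memcmp (a, b, 64);
}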
Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()
On 01/08/2018 07:19 AM, Bill Schmidt wrote:
>
>> On Jan 7, 2018, at 10:47 PM, Jeff Law wrote:
>>
>> On 01/07/2018 07:20 PM, Bill Schmidt wrote:
>>> Hi Richard,
>>>
>>> Unfortunately, I don't see any way that this will be useful for the ppc
>>> targets.  We don't have a way to force resolution of a condition prior
>>> to continuing speculation, so this will just introduce another comparison
>>> that we would speculate past.  For our mitigation we will have to
>>> introduce an instruction that halts all speculation at that point, and
>>> place it in front of all dangerous loads.  I wish it were otherwise.
>> So could you have an expander for __builtin_load_no_speculate that just
>> emits the magic insn that halts all speculation and essentially ignores
>> the additional stuff that __builtin_load_no_speculate might be able to
>> do on other platforms?
>
> This is possible, but the builtin documentation is completely misleading for
> powerpc.  We will not provide the semantics that this builtin claims to
> provide.  So at a minimum we would need the documentation to indicate that
> the additional range-checking is target-specific behavior as well, not just
> the speculation code.  At that point it isn't really a very target-neutral
> solution.
>
> What about other targets?  This builtin seems predicated on specific behavior
> of ARM architecture; I don't know whether other targets have a guaranteed
> speculation-rectifying conditional test.
>
> For POWER, all we would need, or be able to exploit, is
>
>    void __builtin_speculation_barrier ()
>
> or some such.  If there are two dangerous loads in one block, a single call
> to this suffices, but a generic solution involving range checks for specific
> loads would require one per load.
So my concern is that if we have multiple builtins to deal with this
problem, then we're forcing the pain of figuring out which one to use
onto the users.

I'd really like there to be a single builtin to address this problem.
Otherwise the kernel (and anyone else that wants to use this stuff) is
stuck with using both, conditional compilation or something along those
lines, which seems like a huge lose.

We'd really like them to be able to add one appropriate
__builtin_whatever call at the key site(s) that does the right thing
regardless of the architecture.

I think that implies we're likely to have arguments that are unused on
some architectures.  I can live with that.  But it also implies we need
better language around the semantics.

As you mention -- there's some belief that we're going to want to do
something for automatic detection in the future.  I think a builtin for
that could well be different than what we provide to the kernel folks in
the immediate term.  I want to focus first on getting a builtin the
kernel guys can use today as needed though.

Jeff
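For concreteness, here is one way the builtin under discussion would be
used, going by the five-argument form referred to above ("ignoring 4 of
the 5 arguments"). The signature and argument order below are taken from
the proposal being reviewed and should be treated as provisional, not as
a committed API:

  /* Sketch only: a bounds-checked load via the proposed builtin,
     assuming the (ptr, lower, upper, failval, cmpptr) argument order
     from the RFC.  */
  int
  lookup (int *base, unsigned long n, unsigned long untrusted_idx)
  {
    /* Yields base[untrusted_idx] when its address lies inside
       [base, base + n); otherwise yields 0, and in neither case may
       the out-of-range load be observable speculatively.  */
    return __builtin_load_no_speculate (&base[untrusted_idx],
					base, base + n,
					0, &base[untrusted_idx]);
  }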
[PATCH, combine]: Use correct mode for ASHIFT in force_int_to_mode
Hello!

Attached patch corrects a wrong mode argument in the call to force_to_mode
for the ASHIFT operator. The patch uses "mode" mode, the same as for all
binop and unop operators in the force_int_to_mode function. Also, the
unpatched function would force the operand to op_mode and later truncate
to op_mode again, so it all looks like a typo to me.

2018-01-08  Uros Bizjak

	PR target/83628
	* combine.c (force_int_to_mode) <ASHIFT>: Use mode instead of
	op_mode in the force_to_mode call.

Together with a follow-up target patch, the patch fixes
gcc.target/alpha/pr83628-2.c scan-asm failures on alpha.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

OK for mainline and branches?

Uros.

diff --git a/gcc/combine.c b/gcc/combine.c
index 3a42de53455c..6adc0a7d6f85 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -8908,7 +8908,7 @@ force_int_to_mode (rtx x, scalar_int_mode mode, scalar_int_mode xmode,
 	mask = fuller_mask;
 
       op0 = gen_lowpart_or_truncate (op_mode,
-				     force_to_mode (XEXP (x, 0), op_mode,
+				     force_to_mode (XEXP (x, 0), mode,
						    mask, next_select));
 
       if (op_mode != xmode || op0 != XEXP (x, 0))
Re: [PATCH][AArch64] Use LDP/STP in shrinkwrapping
Segher Boessenkool wrote:
> On Mon, Jan 08, 2018 at 01:27:24PM +0000, Wilco Dijkstra wrote:
>
>> Peepholing is very conservative about instructions using SP and won't touch
>> anything frame related. If this was working better then the backend could
>> just emit single loads/stores and let peepholing generate LDP/STP.
>
> How unfortunate; that should definitely be improved then.

Improving that helps indeed but won't fix the issue. The epilog may not
always be duplicated and merged like in my example. Any subset of saves
and restores may not be pairable, so the worst case still has twice as
many memory accesses.

> Always pairing two registers together *also* degrades code quality.

No, while it's not optimal, it means smaller code and fewer memory accesses.

>> Another issue is that after pro_and_epilogue pass I see multiple restores
>> of the same registers and then a branch to the same block. We should try
>> to avoid the unnecessary duplication.
>
> It already does that if *all* predecessors of that block do that. If you
> want to do it in other cases, you end up with more jumps. That may be
> beneficial in some cases, of course, but it is not an obvious win (and in
> the general case it is, hrm let's use nice words, "terrible").

That may well be the problem. So if there are N predecessors, of which N-1
need to restore the same set of callee saves, but one was shrinkwrapped,
N-1 copies of the same restores might be emitted. N could be the number
of blocks in a function - I really hope it doesn't work out like that...

Wilco
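To make the trade-off in this exchange concrete, a hand-written AArch64
sketch (not compiler output; offsets are illustrative) of an epilogue
touching two callee-saved registers:

	// Pairing x19/x20 into one shrink-wrapping component: a path
	// that needs both registers restores them with one insn,
	ldp	x19, x20, [sp, 16]
	// but a path that needs only x19 still executes the paired
	// load - one more register restored than necessary (Segher's
	// "more memory accesses") - where an unpaired component could
	// have used just:
	ldr	x19, [sp, 16]
	// Conversely, with per-register components the both-registers
	// path issues two ldr insns where one ldp would have done
	// (Wilco's "twice as many memory accesses").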
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Mon, 2018-01-08 at 09:27 +0100, Florian Weimer wrote:
> * H. J. Lu:
>
> > This set of patches for GCC 8 mitigates variant #2 of the
> > speculative execution vulnerabilities on x86 processors identified
> > by CVE-2017-5715, aka Spectre. They convert indirect branches to
> > call and return thunks to avoid speculative execution via indirect
> > call and jmp.
>
> Would it make sense to add a mode which relies on an empty return
> stack cache? Or will CPUs use the regular branch predictor if the
> return stack is empty?
>
> With an empty return stack cache and no branch predictor, a simple
> PUSH/RET sequence cannot be predicted, so the complex CALL sequence
> with a speculation barrier is not needed.

Some CPUs will use the regular branch predictor if the RSB is empty.
Others just round-robin the RSB and will use the *oldest* entry if
they underflow.
[Committed] Fix typo in comment.
Committed as obvious.

Index: ChangeLog
===
--- ChangeLog	(revision 256351)
+++ ChangeLog	(working copy)
@@ -1,3 +1,7 @@
+2018-01-08  Steven G. Kargl
+
+	* expr.c (gfc_check_pointer_assign): Fix typo in comment.
+
 2018-01-08  Paul Thomas
 
 	PR fortran/83611

Index: expr.c
===
--- expr.c	(revision 256351)
+++ expr.c	(working copy)
@@ -3911,7 +3911,7 @@ gfc_check_pointer_assign (gfc_expr *lvalue, gfc_expr *
 
   /* Error for assignments of contiguous pointers to targets which is not
      contiguous.  Be lenient in the definition of what counts as
-     congiguous.  */
+     contiguous.  */
 
   if (lhs_attr.contiguous && !gfc_is_simply_contiguous (rvalue, false, true))
     gfc_error ("Assignment to contiguous pointer from non-contiguous "

--
Steve
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Sun, 2018-01-07 at 16:36 -0700, Jeff Law wrote:
>
> My fundamental problem with this patchkit is that it is 100% x86/x86_64
> specific.
>
> ISTM we want a target independent mechanism (ie, new standard patterns,
> options, etc) then an x86/x86_64 implementation using that target
> independent framework (ie, the actual implementation of those new
> standard patterns).

From the kernel point of view, I'm not too worried about GCC internal
implementation details. What would be really useful to agree in short
order is the command-line options that invoke this behaviour, and the
ABI for the thunks.

Once that's done, we can push the patches to Linus and people can build
safe kernels, and we can build with HJ's existing patch set for the time
being. And you can bikeshed the rest to your collective hearts'
content... :)
Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=
On Mon, 2018-01-08 at 09:20 +0100, Florian Weimer wrote:
> * H. J. Lu:
>
> > Add -mindirect-branch-loop= option to control loop filler in call and
> > return thunks generated by -mindirect-branch=. 'lfence' uses "lfence"
> > as loop filler. 'pause' uses "pause" as loop filler. 'nop' uses "nop"
> > as loop filler. The default is 'lfence'.
>
> Why is the loop needed? Doesn't ud2 or cpuid stop speculative
> execution?

The idea is not to stop it per se, but to capture it. We trick the
speculative execution into *thinking* it's going to return back to that
endless loop, which prevents it from doing the branch prediction which
would otherwise have got into trouble.

There has been a fair amount of bikeshedding of precisely what goes in
there already, and '1: pause; jmp 1b' is the best option that hasn't
been shot down in flames by the CPU architects.

HJ, do we still actually need the options for lfence and nop? I thought
those were originally just for testing and could possibly be dropped
now? Not that I care for Linux since I'm providing my own external
thunk anyway...
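For readers new to the construction: the "capture" described here is the
heart of the retpoline sequence. A hand-written sketch of a call thunk
follows (label names and exact layout are illustrative, not necessarily
what GCC emits; the loop body is precisely the filler being debated):

  __x86_indirect_thunk_rax:		# sketch, for a target held in %rax
	call	.Lset_target		# RSB now predicts a return to .Lcapture
  .Lcapture:
	pause				# speculation that follows the predicted
	jmp	.Lcapture		#   return is trapped in this loop
  .Lset_target:
	mov	%rax, (%rsp)		# replace return address with real target
	ret				# architecturally jumps to *%rax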
Re: [v2 of PATCH 03/14] C++: add location_t wrapper nodes during parsing (minimal impl)
On Fri, 2018-01-05 at 15:29 -0500, Jason Merrill wrote:
> On 12/29/2017 12:06 PM, David Malcolm wrote:
> > One issue I ran into was that fold_for_warn doesn't eliminate
> > location wrappers when processing_template_decl, leading to
> > failures of the template-based cases in
> > g++.dg/warn/Wmemset-transposed-args-1.C.
> >
> > This is due to the early bailout when processing_template_decl
> > within cp_fold:
> >
> > 2078	  if (processing_template_decl
> > 2079	      || (EXPR_P (x) && (!TREE_TYPE (x) || TREE_TYPE (x) == error_mark_node)))
> > 2080	    return x;
> >
> > which dates back to the merger of the C++ delayed folding branch.
> >
> > I've fixed that in this version of the patch by removing that
> > "processing_template_decl ||" condition from that cp_fold early
> > bailout.
>
> Hmm, that makes me nervous.  We might want to fold in templates when
> called from fold_for_warn, but not for other occurrences.  But I see
> that we check processing_template_decl in cp_fully_fold and in the call
> to cp_fold_function, so I guess this is fine.
>
> > +    case VIEW_CONVERT_EXPR:
> > +    case NON_LVALUE_EXPR:
> >      case CAST_EXPR:
> >      case REINTERPRET_CAST_EXPR:
> >      case CONST_CAST_EXPR:
> > @@ -14937,6 +14940,15 @@ tsubst_copy (tree t, tree args, tsubst_flags_t complain, tree in_decl)
> >      case CONVERT_EXPR:
> >      case NOP_EXPR:
> >        {
> > +	if (location_wrapper_p (t))
> > +	  {
> > +	    /* Handle location wrappers by substituting the wrapped node
> > +	       first, *then* reusing the resulting type.  Doing the type
> > +	       first ensures that we handle template parameters and
> > +	       parameter pack expansions.  */
> > +	    tree op0 = tsubst_copy (TREE_OPERAND (t, 0), args, complain, in_decl);
> > +	    return build1 (code, TREE_TYPE (op0), op0);
> > +	  }
>
> I'd rather handle location wrappers separately, and abort if
> VIEW_CONVERT_EXPR or NON_LVALUE_EXPR appear other than as wrappers.

Once I fixed the issue with location_wrapper_p with decls changing type,
it turns out that trunk is already passing VIEW_CONVERT_EXPR to
tsubst_copy_and_build for non-wrapper nodes (and from there to
tsubst_copy), where the default case currently handles them.  Adding an
assert turns this into an ICE.  g++.dg/delayedfold/builtin1.C is the
only instance of it I found in our test suite, where it's used here:

  class RegionLock {
    template <int> void m_fn1();
    int spinlock;
  } acquire_zero;
  int acquire_one;
  template <int> void RegionLock::m_fn1() {
    __atomic_compare_exchange(&spinlock, &acquire_zero, &acquire_one,
                              false, 2, 2);
    ^
  }

(gdb) call debug_tree (t)
  [tree dump mangled in the archive: its angle-bracketed nodes were
   stripped.  What survives shows a VIEW_CONVERT_EXPR of an unsigned
   DImode type, spanning g++.dg/delayedfold/builtin1.C:10:40-10:52.]

(This one is just for VIEW_CONVERT_EXPR; I don't yet know of any
existing places where NON_LVALUE_EXPR can be passed to tsubst_*.)

Given that, is it OK to go with the approach in this (v2) patch?
(presumably requiring the fix to location_wrapper_p to use a flag
rather than a matching type)
> > @@ -24998,6 +25018,8 @@ build_non_dependent_expr (tree expr)
> >         && !expanding_concept ())
> >      fold_non_dependent_expr (expr);
> >
> > +  STRIP_ANY_LOCATION_WRAPPER (expr);
>
> Why is this needed?

https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00349.html

> Jason

Thanks
Dave
Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()
On Jan 8, 2018, at 1:40 PM, Jeff Law wrote:
>
> On 01/08/2018 07:19 AM, Bill Schmidt wrote:
>>
>>> On Jan 7, 2018, at 10:47 PM, Jeff Law wrote:
>>>
>>> On 01/07/2018 07:20 PM, Bill Schmidt wrote:
>>>> Hi Richard,
>>>>
>>>> Unfortunately, I don't see any way that this will be useful for the ppc
>>>> targets.  We don't have a way to force resolution of a condition prior
>>>> to continuing speculation, so this will just introduce another comparison
>>>> that we would speculate past.  For our mitigation we will have to
>>>> introduce an instruction that halts all speculation at that point, and
>>>> place it in front of all dangerous loads.  I wish it were otherwise.
>>> So could you have an expander for __builtin_load_no_speculate that just
>>> emits the magic insn that halts all speculation and essentially ignores
>>> the additional stuff that __builtin_load_no_speculate might be able to
>>> do on other platforms?
>>
>> This is possible, but the builtin documentation is completely misleading
>> for powerpc.  We will not provide the semantics that this builtin claims
>> to provide.  So at a minimum we would need the documentation to indicate
>> that the additional range-checking is target-specific behavior as well,
>> not just the speculation code.  At that point it isn't really a very
>> target-neutral solution.
>>
>> What about other targets?  This builtin seems predicated on specific
>> behavior of ARM architecture; I don't know whether other targets have a
>> guaranteed speculation-rectifying conditional test.
>>
>> For POWER, all we would need, or be able to exploit, is
>>
>>    void __builtin_speculation_barrier ()
>>
>> or some such.  If there are two dangerous loads in one block, a single
>> call to this suffices, but a generic solution involving range checks for
>> specific loads would require one per load.
> So my concern is that if we have multiple builtins to deal with this
> problem, then we're forcing the pain of figuring out which one to use
> onto the users.
>
> I'd really like there to be a single builtin to address this problem.
> Otherwise the kernel (and anyone else that wants to use this stuff) is
> stuck with using both, conditional compilation or something along those
> lines, which seems like a huge lose.
>
> We'd really like them to be able to add one appropriate
> __builtin_whatever call at the key site(s) that does the right thing
> regardless of the architecture.
>
> I think that implies we're likely to have arguments that are unused on
> some architectures.  I can live with that.  But it also implies we need
> better language around the semantics.
>
> As you mention -- there's some belief that we're going to want to do
> something for automatic detection in the future.  I think a builtin for
> that could well be different than what we provide to the kernel folks in
> the immediate term.  I want to focus first on getting a builtin the
> kernel guys can use today as needed though.

Hi Jeff,

I agree 100% with this approach.  I just wanted to raise the point in
case other architectures have different needs.  Power can work around
this by just ignoring 4 of the 5 arguments.  As long as nobody else
needs *additional* arguments, this should work out just fine.

But I want to be clear that the only guarantee of the semantics for
everybody is that "speculation stops here," while on some processors it
may be "speculation stops here if out of range."  If we can write this
into the documentation, then I'm fine writing a target expander for
Power as discussed.
I had a brief interchange with Richi last week, and he suggested that for the automatic detection we might look into flagging MEM_REFs rather than inserting a built-in; a target hook can still handle such a flag. That has some advantages and some disadvantages that I can think of, so we'll have to talk that out on the list over time after we get through the crisis mode reactions. Thanks! Bill > > Jeff
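The one-barrier-per-block point is easiest to see side by side, using the
__builtin_speculation_barrier name from Bill's sketch above (hypothetical
at this point; no such builtin exists yet):

  /* Hypothetical: the barrier builtin is only a proposal in this thread. */
  int
  sum2 (int *a, int *b, unsigned long i, unsigned long j)
  {
    __builtin_speculation_barrier ();	/* one call fences both loads */
    return a[i] + b[j];
  }

With a per-load, range-checking builtin, the same function would instead
need two calls, one wrapping a[i] and one wrapping b[j].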
Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=
On Mon, Jan 8, 2018 at 1:00 PM, David Woodhouse wrote: > On Mon, 2018-01-08 at 09:20 +0100, Florian Weimer wrote: >> * H. J. Lu: >> >> > Add -mindirect-branch-loop= option to control loop filler in call and >> > return thunks generated by -mindirect-branch=. 'lfence' uses "lfence" >> > as loop filler. 'pause' uses "pause" as loop filler. 'nop' uses "nop" >> > as loop filler. The default is 'lfence'. >> >> Why is the loop needed? Doesn't ud2 or cpuid stop speculative >> execution? > > The idea is not to stop it per se, but to capture it. We trick the > speculative execution into *thinking* it's going to return back to that > endless loop, which prevents it from doing the branch prediction which > would otherwise have got into trouble. > > There has been a fair amount of bikeshedding of precisely what goes in > there already, and '1: pause; jmp 1b' is the best option that hasn't > been shot down in flames by the CPU architects. > > HJ, do we still actually need the options for lfence and nop? I thought > those were originally just for testing and could possibly be dropped > now? This is a trial change. It may be useful later. But I can drop it and hardcode it to "pause". -- H.J.
Re: [PATCH 3/5] x86: Add -mfunction-return=
On Mon, 2018-01-08 at 03:59 -0800, H.J. Lu wrote:
> On Mon, Jan 8, 2018 at 1:56 AM, Martin Liška wrote:
> >
> > On 01/07/2018 11:59 PM, H.J. Lu wrote:
> > >
> > > Function return thunk is the same as memory thunk for -mindirect-branch=
> > > where the return address is at the top of the stack:
> > >
> > > __x86_return_thunk:
> > > 	call L2
> > > L1:
> > > 	lfence
> > > 	jmp L1
> > > L2:
> > > 	lea 8(%rsp), %rsp|lea 4(%esp), %esp
> > > 	ret
> > >
> > > and function return becomes
> > >
> > > 	jmp __x86_return_thunk
> >
> > Hello.
> >
> > Can you please explain more about the usage of the option? Is it to
> > prevent speculative execution of the 'ret' instruction (which is an
> > indirect call), as described in [1]?  The paper mentions that return
> > stack predictors are commonly implemented in some form.
> > Looks like the current version of the Linux patches does not use the
> > option.
>
> This option is requested by Linux kernel people.  It may be used in
> the future.

Right. Essentially the new set of vulnerabilities are all about
speculative execution. Instructions which *don't* get committed, and
are supposed to be like they never happened, actually *do* have
side-effects and can leak information.

This is *particularly* problematic for Intel CPUs where the CPU
architects said "ah, screw it, let's not do memory permission checks in
advance; as long as we make sure it's done before we *commit* an
instruction it'll be fine". With the result that you can now basically
read *all* of kernel memory, and hence all of physical memory, directly
from userspace on Intel CPUs. Oops :)

The fix for *that* one is to actually remove the kernel pages from the
page tables while running userspace, instead of just setting the
permissions to prevent access. Hence the whole Kernel Page Table
Isolation thing.

The next interesting attack is the so-called "variant 2" where the
attacker pollutes the branch predictors so that in *kernel* mode the
CPU *speculatively* runs... well, whatever the attacker wants. This is
one that affects lots of vendors, not just Intel. We mitigate this by
eliminating *all* the indirect branches in the kernel, to make it
immune to such an attack.

This is all very well, but *some* CPUs also pull in predictions from
the generic branch target predictor when the return stack buffer has
underflowed (e.g. if there was a call stack of more than X depth).
Hence, in some cases we may yet end up needing this -mfunction-return=
thunk too. As you (Martin) note, we don't use it *yet*. The full set of
mitigations for the various attacks are still being put together, and
the optimal choice for each CPU family does end up being different.
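By way of illustration, a compile-line sketch of how the options in this
series compose (option spellings as proposed in the patches; treat the
exact values as provisional). The thunk-extern variant leaves the
__x86_indirect_thunk* and __x86_return_thunk bodies to be supplied
externally, which is what is meant above by providing one's own thunk:

  gcc -O2 -mindirect-branch=thunk-extern -mfunction-return=thunk-extern -c foo.c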
Re: [PATCH 1/5] x86: Add -mindirect-branch=
"H.J. Lu" writes: >> >> Talking about PIC thunks, those have I believe . character in their symbols, >> so that they can't be confused with user functions. Any reason these >> retpoline thunks aren't? >> > > They used to have '.'. It was changed at the last minute since kernel needs > to > export them as regular symbols. The kernel doesn't actually need that to export the symbols. While symbol CRCs cannot be generated for symbols with '.', CRCs are not needed and there were already patches to hide the resulting warnings. -Andi
Re: [PATCH][AArch64] Use LDP/STP in shrinkwrapping
On Mon, Jan 08, 2018 at 08:25:47PM +0000, Wilco Dijkstra wrote:
> > Always pairing two registers together *also* degrades code quality.
>
> No, while it's not optimal, it means smaller code and fewer memory accesses.

It means you execute *more* memory accesses.  Always.  This may be
sometimes hidden, sure.  I'm not saying you do not want more ldp's; I'm
saying this particular strategy is very far from ideal.

> >> Another issue is that after pro_and_epilogue pass I see multiple restores
> >> of the same registers and then a branch to the same block. We should try
> >> to avoid the unnecessary duplication.
> >
> > It already does that if *all* predecessors of that block do that.  If you
> > want to do it in other cases, you end up with more jumps.  That may be
> > beneficial in some cases, of course, but it is not an obvious win (and in
> > the general case it is, hrm let's use nice words, "terrible").
>
> That may well be the problem. So if there are N predecessors, of which N-1
> need to restore the same set of callee saves, but one was shrinkwrapped,
> N-1 copies of the same restores might be emitted. N could be the number
> of blocks in a function - I really hope it doesn't work out like that...

In the worst case it would.  OTOH, joining every combo into blocks costs
O(2**C) (where C is the # components) bb's worst case.  It isn't a
simple problem.  The current tuning works pretty well for us, but no
doubt it can be improved!

Segher
[PATCH] PR libstdc++/83709 don't rehash if no insertion
Hi

Bug confirmed, limited to range insertion on unordered_set and
unordered_map. I had to specialize _M_insert_range for those containers.
Now this method maintains the theoretical number of elements to insert,
which is used only if an insertion takes place. I also took this
opportunity to introduce a small change in __distance_fw to report 0 only
if __first == __last and do nothing in this case.

	* include/bits/hashtable_policy.h
	(__distance_fw(_Iterator, _Iterator, input_iterator_tag)): Return 1
	if __first != __last.
	(_Insert_base::_M_insert_range(_Ite, _Ite, _NodeGetter, true_type)):
	New.
	(_Insert_base::_M_insert_range(_Ite, _Ite, _NodeGetter, false_type)):
	Add false_type parameter.
	(_Insert_base::insert): Adapt.
	* include/bits/hashtable.h (_Hashtable::operator=(initializer_list<>)):
	Adapt.
	(_Hashtable::_M_insert_unique_node): Add __n_elt parameter, defaulted
	to 1.
	(_Hashtable::_M_insert(_Arg&&, const _NodeGen&, true_type, size_t)):
	Likewise.
	(_Hashtable::_M_merge_unique): Pass target number of elements to add
	to produce only 1 rehash if necessary.
	* testsuite/23_containers/unordered_map/insert/83709.cc: New.
	* testsuite/23_containers/unordered_set/insert/83709.cc: New.

Tested under Linux x86_64 normal and debug modes.

Ok to commit?

François

diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index e6108a6..1245d79 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -490,7 +490,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  __reuse_or_alloc_node_type __roan(_M_begin(), *this);
 	  _M_before_begin._M_nxt = nullptr;
 	  clear();
-	  this->_M_insert_range(__l.begin(), __l.end(), __roan);
+	  this->_M_insert_range(__l.begin(), __l.end(), __roan, __unique_keys());
 	  return *this;
 	}
@@ -678,7 +678,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       // deallocate it on exception.
       iterator
       _M_insert_unique_node(size_type __bkt, __hash_code __code,
-			    __node_type* __n);
+			    __node_type* __n, size_type __n_elt = 1);
 
       // Insert node with hash code __code. Take ownership of the node,
       // deallocate it on exception.
@@ -707,12 +707,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
       template<typename _Arg, typename _NodeGenerator>
	std::pair<iterator, bool>
-	_M_insert(_Arg&&, const _NodeGenerator&, std::true_type);
+	_M_insert(_Arg&&, const _NodeGenerator&, true_type, size_type = 1);
 
       template<typename _Arg, typename _NodeGenerator>
	iterator
	_M_insert(_Arg&& __arg, const _NodeGenerator& __node_gen,
-		  std::false_type __uk)
+		  false_type __uk)
	{
	  return _M_insert(cend(), std::forward<_Arg>(__arg), __node_gen,
			   __uk);
@@ -722,7 +722,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       template<typename _Arg, typename _NodeGenerator>
	iterator
	_M_insert(const_iterator, _Arg&& __arg,
-		  const _NodeGenerator& __node_gen, std::true_type __uk)
+		  const _NodeGenerator& __node_gen, true_type __uk)
	{
	  return
	    _M_insert(std::forward<_Arg>(__arg), __node_gen, __uk).first;
@@ -732,7 +732,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       template<typename _Arg, typename _NodeGenerator>
	iterator
	_M_insert(const_iterator, _Arg&&,
-		  const _NodeGenerator&, std::false_type);
+		  const _NodeGenerator&, false_type);
 
       size_type
       _M_erase(std::true_type, const key_type&);
@@ -884,6 +884,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
		      node_type>, "Node types are compatible");
	  __glibcxx_assert(get_allocator() == __src.get_allocator());
 
+	  auto __n_elt = __src.size();
	  for (auto __i = __src.begin(), __end = __src.end(); __i != __end;)
	    {
	      auto __pos = __i++;
@@ -893,9 +894,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
	      if (_M_find_node(__bkt, __k, __code) == nullptr)
		{
		  auto __nh = __src.extract(__pos);
-		  _M_insert_unique_node(__bkt, __code, __nh._M_ptr);
+		  _M_insert_unique_node(__bkt, __code, __nh._M_ptr, __n_elt);
		  __nh._M_ptr = nullptr;
+		  __n_elt = 1;
		}
+	      else if (__n_elt != 1)
+		--__n_elt;
	    }
	}
@@ -1721,12 +1725,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
    _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
	       _H1, _H2, _Hash, _RehashPolicy, _Traits>::
    _M_insert_unique_node(size_type __bkt, __hash_code __code,
-			  __node_type* __node)
+			  __node_type* __node, size_type __n_elt)
    -> iterator
    {
      const __rehash_state& __saved_state = _M_rehash_policy._M_state();
      std::pair<bool, std::size_t> __do_rehash
-	= _M_rehash_policy._M_need_rehash(_M_bucket_count, _M_element_count, 1);
+	= _M_rehash_policy._M_need_rehash(_M_bucket_count, _M_element_count,
+					  __n_elt);
 
      __try
	{
@@ -1824,7 +1829,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
    auto
    _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
	       _H1, _H2, _Hash, _RehashPolicy, _Traits>::
-    _M_insert(_Arg&& __v, const _NodeGenerator& __node_gen, std::true_type)
+    _M_insert(_Arg&& __v, const _NodeGenerator& __node_gen, true_type,
+	      size_type __n_elt)
    -> pair<iterator, bool>
    {
      const key_type& __k = this->_M_e
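As a reader's sketch of the scenario the patch addresses (my
reconstruction from the description above, not the committed 83709.cc
testcases):

  // Range-inserting keys that are all already present inserts nothing,
  // so with the fix it must not trigger a rehash either; before the fix
  // the container could rehash anyway, invalidating iterators.
  #include <cassert>
  #include <unordered_set>

  int main()
  {
    std::unordered_set<int> s{1, 2, 3};
    const auto buckets = s.bucket_count();
    int dup[] = {1, 2, 3};
    s.insert(dup, dup + 3);			// inserts nothing
    assert(s.bucket_count() == buckets);	// so no rehash either
  }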
Re: [PATCH 1/5] x86: Add -mindirect-branch=
On Mon, Jan 8, 2018 at 8:46 AM, Andi Kleen wrote: > "H.J. Lu" writes: >>> >>> Talking about PIC thunks, those have I believe . character in their symbols, >>> so that they can't be confused with user functions. Any reason these >>> retpoline thunks aren't? >>> >> >> They used to have '.'. It was changed at the last minute since kernel needs >> to >> export them as regular symbols. > > The kernel doesn't actually need that to export the symbols. > > While symbol CRCs cannot be generated for symbols with '.', CRCs are not > needed and there were already patches to hide the resulting warnings. > Andi, can you work it out with David? -- H.J.
Re: [PATCH 1/5] x86: Add -mindirect-branch=
On Mon, 2018-01-08 at 13:32 -0800, H.J. Lu wrote:
> On Mon, Jan 8, 2018 at 8:46 AM, Andi Kleen wrote:
> > "H.J. Lu" writes:
> > >
> > > > Talking about PIC thunks, those have I believe . character in their
> > > > symbols, so that they can't be confused with user functions. Any
> > > > reason these retpoline thunks aren't?
> > >
> > > They used to have '.'.  It was changed at the last minute since kernel
> > > needs to export them as regular symbols.
> >
> > The kernel doesn't actually need that to export the symbols.
> >
> > While symbol CRCs cannot be generated for symbols with '.', CRCs are not
> > needed and there were already patches to hide the resulting warnings.
>
> Andi, can you work it out with David?

It wasn't CONFIG_MODVERSIONS but CONFIG_TRIM_UNUSED_SYMBOLS which was
the straw that broke the camel's back on that one. I'm open to a
solution for that one, but I couldn't see one that didn't make my eyes
bleed. Except for making the symbols not have dots in.

https://patchwork.kernel.org/patch/10148081/