inline_small_functions speedup
After inlining A into B, inline_small_functions updates the information for (most) callees and callers of the new B: update_callee_keys (&edge_heap, where, updated_nodes); [...] /* Our profitability metric can depend on local properties such as number of inlinable calls and size of the function body. After inlining these properties might change for the function we inlined into (since it's body size changed) and for the functions called by function we inlined (since number of it inlinable callers might change). */ update_caller_keys (&edge_heap, where, updated_nodes, NULL); These functions in turn call can_inline_edge_p for most of the associated edges: if (can_inline_edge_p (edge, false) && want_inline_small_function_p (edge, false)) update_edge_key (heap, edge); can_inline_edge_p indirectly calls estimate_calls_size_and_time on the caller node, which seems to recursively process all callee edges rooted at the node. It looks from this like algorithm can be at least quadratic in the worst case. Maybe there's something we can do to make can_inline_edge_p cheaper, but since neither of these two calls is responsible for reporting an inline failure reason, it seems cheaper to test want_inline_small_function_p first, so that we don't calculate an estimate for something that we already know isn't a "small function". I think the only change needed to make that work is to check for CIF_FINAL_ERROR in want_inline_small_function_p; at the moment we rely on can_inline_edge_p to make that check. This cuts the time to build optabs.ii by over 4% with an --enable-checking=release compiler on x86_64-linux-gnu. I've seen more dramatic wins on aarch64-linux-gnu due to the NUM_POLY_INT_COEFFS==2 thing. The patch doesn't affect the output code. Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu. OK to install? Richard 2018-01-13 Richard Sandiford gcc/ * ipa-inline.c (want_inline_small_function_p): Return false if inlining has already failed with CIF_FINAL_ERROR. (update_caller_keys): Call want_inline_small_function_p before can_inline_edge_p. (update_callee_keys): Likewise. Index: gcc/ipa-inline.c === --- gcc/ipa-inline.c2018-01-09 14:29:35.151550415 + +++ gcc/ipa-inline.c2018-01-14 08:43:35.653122186 + @@ -706,7 +706,11 @@ want_inline_small_function_p (struct cgr bool want_inline = true; struct cgraph_node *callee = e->callee->ultimate_alias_target (); - if (DECL_DISREGARD_INLINE_LIMITS (callee->decl)) + /* Allow this function to be called before can_inline_edge_p, + since it's usually cheaper. */ + if (cgraph_inline_failed_type (e->inline_failed) == CIF_FINAL_ERROR) +want_inline = false; + else if (DECL_DISREGARD_INLINE_LIMITS (callee->decl)) ; else if (!DECL_DECLARED_INLINE_P (callee->decl) && !opt_for_fn (e->caller->decl, flag_inline_small_functions)) @@ -1312,8 +1316,8 @@ update_caller_keys (edge_heap_t *heap, s if (!check_inlinablity_for || check_inlinablity_for == edge) { - if (can_inline_edge_p (edge, false) - && want_inline_small_function_p (edge, false)) + if (want_inline_small_function_p (edge, false) + && can_inline_edge_p (edge, false)) update_edge_key (heap, edge); else if (edge->aux) { @@ -1356,8 +1360,8 @@ update_callee_keys (edge_heap_t *heap, s && avail >= AVAIL_AVAILABLE && !bitmap_bit_p (updated_nodes, callee->uid)) { - if (can_inline_edge_p (e, false) - && want_inline_small_function_p (e, false)) + if (want_inline_small_function_p (e, false) + && can_inline_edge_p (e, false)) update_edge_key (heap, e); else if (e->aux) {
RE: [PATCH 2/5] x86: Add -mindirect-branch-loop=
Hi Arjan, > -Original Message- > From: Van De Ven, Arjan [mailto:arjan.van.de@intel.com] > Sent: Saturday, January 13, 2018 10:16 PM > To: David Woodhouse ; Kumar, Venkataramanan > ; H.J. Lu ; Jeff > Law ; Paul Turner ; Mallick, Asit K > > Cc: Nagarajan, Muthu kumar raj ; > GCC Patches ; Martin Jambor > ; Uros Bizjak (ubiz...@gmail.com) > ; Jan Hubicka ; Dharmakan, Rohit arul raj > > Subject: RE: [PATCH 2/5] x86: Add -mindirect-branch-loop= > > > > If everyone is ok, my suggestion is to use "lfence" as the default > > > loop filler for retpoline. > > can we do BOTH a pause and lfence. > (that way on cpu's where pause is the power stop, it works, and on cpus > where it's a fallthrough (AMD) it goes to the lfence) > I checked with our Architect. Having just "pause" is the concern. It should also be fine for AMD to use "pause" followed by "lfence" in the loop of retpoline. Regards, Venkat.
Re: PR81703 and Martin's fix for PR83501
On 12 January 2018 at 23:25, Jakub Jelinek wrote: > On Fri, Jan 12, 2018 at 10:38:39AM -0700, Jeff Law wrote: >> >>> Thanks for pointing it out. I see it there as well with >> >>> Prathamesh's test case, though not with the test case in >> >>> bug 83543. It is the same root cause in both. I agree >> >>> that enhancing the strlen pass to handle this case would >> >>> be preferable to just xfailing the test. I'm just not >> >>> sure it's possible before stage 3 closes. If not, I'll >> >>> work on it in GCC 9. Although the details are target- >> >>> specific, the limitation affects all targets and so >> >>> having a solution will benefit all all of them. >> >> Indeed, however for now I am not sure what would be the best approach ? >> >> If the test-case starts failing for many targets, not sure if XFAIL >> >> would be the right choice. >> >> Should I just restrict it to x86_64 target for now ? >> > >> > That sounds like a good approach in the interim, until we have >> > a general solution. It will avoid having to maintain a list >> > of targets where it's known to fail. >> Agreed and pre-approved. > > Just please test with > RUNTESTFLAGS='--target_board=unix\{-m64,-m32,-m32/-mno-sse\} > dg.exp=strlenopt-*.c' > and restrict to { i?86-*-* x86_64-*-* }, e.g. on Solaris it is i?86-*-* > canonical target, even when it supports -m64 multilib. > If you need x86_64 64-bit, that would be { { i?86-*-* x86_64-*-* } && lp64 } > or ! ia32, depending on if -mx32 works or not. Thanks, committed in r256657 after verifying that -m32 works. Thanks, Prathamesh > > Jakub
Re: [PATCH] Add new verification for profile-count.h.
On 01/12/2018 09:44 AM, Jan Hubicka wrote: Hi. Following patch adds new sanitization checks for profile_quality. Problem is that zero initialization of a struct with profile_count will lead to an invalid counter. This can help to catch them. Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. Ready to be installed? OK, thanks! Honza Martin >From edec114cf1dd29bb571855a80e1b45ae040da200 Mon Sep 17 00:00:00 2001 From: marxin Date: Wed, 10 Jan 2018 14:46:08 +0100 Subject: [PATCH] Add new verification for profile-count.h. gcc/ChangeLog: 2018-01-12 Martin Liska * profile-count.h (enum profile_quality): Use 0 as invalid enum value of profile_quality. --- gcc/profile-count.h | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/gcc/profile-count.h b/gcc/profile-count.h index 3c5f720ee81..7a43917ebbc 100644 --- a/gcc/profile-count.h +++ b/gcc/profile-count.h @@ -30,27 +30,27 @@ enum profile_quality { or may not match reality. It is local to function and can not be compared inter-procedurally. Never used by probabilities (they are always local). */ - profile_guessed_local = 0, + profile_guessed_local = 1, /* Profile was read by feedback and was 0, we used local heuristics to guess better. This is the case of functions not run in profile fedback. Never used by probabilities. */ - profile_guessed_global0 = 1, + profile_guessed_global0 = 2, /* Same as profile_guessed_global0 but global count is adjusted 0. */ - profile_guessed_global0adjusted = 2, + profile_guessed_global0adjusted = 3, /* Profile is based on static branch prediction heuristics. It may or may not reflect the reality but it can be compared interprocedurally (for example, we inlined function w/o profile feedback into function with feedback and propagated from that). Never used by probablities. */ - profile_guessed = 3, + profile_guessed = 4, /* Profile was determined by autofdo. */ - profile_afdo = 4, + profile_afdo = 5, /* Profile was originally based on feedback but it was adjusted by code duplicating optimization. It may not precisely reflect the particular code path. */ - profile_adjusted = 5, + profile_adjusted = 6, /* Profile was read from profile feedback or determined by accurate static method. */ profile_precise = 7 @@ -505,6 +505,8 @@ public: /* Return false if profile_probability is bogus. */ bool verify () const { + gcc_checking_assert (profile_guessed_local <= m_quality + && m_quality <= profile_precise); Hi, FYI, in a no-bootstrap build, I'm seeing a lot of new warnings like this: ... ../../src/gcc/profile-count.h: In member function ‘bool profile_probability::verify() const’: ../../src/gcc/profile-count.h:509:20: warning: comparison is always true due to limited range of data type [-Wtype-limits] && m_quality <= profile_precise); ^ ../../src/gcc/system.h:742:14: note: in definition of macro ‘gcc_assert’ ((void)(!(EXPR) ? fancy_abort (__FILE__, __LINE__, __FUNCTION__), 0 : 0)) ^ ../../src/gcc/profile-count.h:508:7: note: in expansion of macro ‘gcc_checking_assert’ gcc_checking_assert (profile_guessed_local <= m_quality ... Indeed, profile_precise is 7 and m_quality is a 3 bits wide bitfield. Thanks, - Tom if (m_val == uninitialized_probability) return m_quality == profile_guessed; else if (m_quality < profile_guessed) @@ -784,6 +786,8 @@ public: /* Return false if profile_count is bogus. */ bool verify () const { + gcc_checking_assert (profile_guessed_local <= m_quality + && m_quality <= profile_precise); return m_val != uninitialized_count || m_quality == profile_guessed_local; } -- 2.14.3
Re: [PATCH] document -Wclass-memaccess suppression by casting (PR 81327)
On Sat, Jan 13, 2018 at 04:14:38PM -0700, Martin Sebor wrote: > -The @option{-Wclass-memaccess} option is enabled by @option{-Wall}. > +The @option{-Wclass-memaccess} option is enabled by @option{-Wall}. Casting Perhaps "Explicitly casting" instead? The implicit cast doesn't suppress it and occurs whenever there isn't an explicit cast. > +the pointer to the class object to @code{void *} or to a type that can be > +safely accessed by the raw memory function suppresses the warning. > > @item -Wnon-virtual-dtor @r{(C++ and Objective-C++ only)} > @opindex Wnon-virtual-dtor Jakub
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
> Hi HJ, > > > -Original Message- > > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > > ow...@gcc.gnu.org] On Behalf Of H.J. Lu > > Sent: Sunday, January 14, 2018 9:07 AM > > To: gcc-patches@gcc.gnu.org > > Subject: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre > > > > This set of patches for GCC 8 mitigates variant #2 of the speculative > > execution vulnerabilities on x86 processors identified by CVE-2017-5715, aka > > Spectre. They convert indirect branches and function returns to call and > > return thunks to avoid speculative execution via indirect call, jmp and ret. > > > > H.J. Lu (5): > > x86: Add -mindirect-branch= > > x86: Add -mfunction-return= > > x86: Add -mindirect-branch-register > > x86: Add 'V' register operand modifier > > x86: Disallow -mindirect-branch=/-mfunction-return= with > > -mcmodel=large > > Current set of patches don't seem to have any option to generate "lfence" as > the loop filler in "retpoline", which is required by AMD. > Can you please clarify the plan. We would like to get this checked-in GCC 8. > Since thunks are output as strings, it is easy to add the option on the top of patch #1 of the series. I do not fully understand the reason for choosing pause over lfence for Intel, but if we need to do both, we need to have command line option (and possibly attribute). What would be reasonable name for it? Honza
Re: [PATCH] PR 78534, 83704 Large character lengths
On Sat, Jan 13, 2018 at 7:35 PM, Dominique d'Humières wrote: > I have finally bootstrapped gfortran with the two patches applied and the > spurious warnings with -Wall are now gone (limited testing), but I see a > regression for gfortran.dg/string_1.f90 due to an additional error > > /opt/gcc/_clean/gcc/testsuite/gfortran.dg/string_1.f90:13:15: > >print *, len(s) >1 > Error: Result of LEN overflows its kind at (1) > > No idea why this was not emitted without the patch. Thanks for testing, I'll look into this. Was it on a 32 or 64-bit target? (I suspect the issue could be that the 32-bit truncated value fits in [0,INT_MAX] and hence no problem is detected, but I'll have to look into it further to say something definite). >> Le 12 janv. 2018 à 21:43, Joseph Myers a écrit : >> >> On Fri, 12 Jan 2018, Janne Blomqvist wrote: >> >>> (I don't know why HOST_WIDE_INT_PRINT_DEC doesn't work with the >>> diagnostics machinery on darwin, but IMHO at this point it's too late >> >> HOST_WIDE_INT_PRINT_DEC is a host-specific printf format; for example, it >> might use %I64d on Windows host or %lld elsewhere. The formats used in >> diagnostic functions must *not* be host-specific or use macros anywhere in >> them at all, because that won't work with translation. Diagnostic >> functions have their own host-independent formats; %wd is the correct >> format for HOST_WIDE_INT in diagnostic functions. >> >> -- >> Joseph S. Myers >> jos...@codesourcery.com > > I have tried to use %wp, but it didn’t work: > > ../../work/gcc/fortran/decl.c: In function 'void > gfc_set_constant_character_len(gfc_charlen_t, gfc_expr*, gfc_charlen_t)': > ../../work/gcc/fortran/decl.c:1567:5: error: unknown conversion type > character 'w' in format [-Werror=format=] > "CHARACTER expression at %L is being truncated " > ^~~~ > "(%wd/%wd)", &expr->where, > ~~~ > ../../work/gcc/fortran/decl.c:1567:5: error: unknown conversion type > character 'w' in format [-Werror=format=] > ../../work/gcc/fortran/decl.c:1567:5: error: too many arguments for format > [-Werror=format-extra-args] > ../../work/gcc/fortran/decl.c:1575:17: error: unknown conversion type > character 'w' in format [-Werror=format=] > gfc_error_now ("The CHARACTER elements of the array constructor " > ^~ > "at %L must have the same length (%wd/%wd)", > ~~~ > ../../work/gcc/fortran/decl.c:1575:17: error: unknown conversion type > character 'w' in format [-Werror=format=] > ../../work/gcc/fortran/decl.c:1575:17: error: too many arguments for format > [-Werror=format-extra-args] Yes, same for the on x86-64 Linux. Might be something weird on the gfortran diagnostics stuff, as it seems %wd is used in the middle end. Anyway, that's stage1 stuff IMHO. -- Janne Blomqvist
Re: [PATCH 4/5] x86: Add 'V' register operand modifier
> Add 'V', a special modifier which prints the name of the full integer > register without '%'. For > > extern void (*func_p) (void); > > void > foo (void) > { > asm ("call __x86_indirect_thunk_%V0" : : "a" (func_p)); > } > > it generates: > > foo: > movqfunc_p(%rip), %rax > call__x86_indirect_thunk_rax > ret > > gcc/ > > * config/i386/i386.c (print_reg): Print the name of the full > integer register without '%'. > (ix86_print_operand): Handle 'V'. >* doc/extend.texi: Document 'V' modifier. > > gcc/testsuite/ > > * gcc.target/i386/indirect-thunk-register-4.c: New test. OK. Honza
Re: [PATCH 3/5] x86: Add -mindirect-branch-register
> Add -mindirect-branch-register to force indirect branch via register. > This is implemented by disabling patterns of indirect branch via memory, > similar to TARGET_X32. > > -mindirect-branch= and -mfunction-return= tests are updated with > -mno-indirect-branch-register to avoid false test failures when > -mindirect-branch-register is added to RUNTESTFLAGS for "make check". > > gcc/ > > * config/i386/constraints.md (Bs): Disallow memory operand for > -mindirect-branch-register. > (Bw): Likewise. > * config/i386/predicates.md (indirect_branch_operand): Likewise. > (GOT_memory_operand): Likewise. > (call_insn_operand): Likewise. > (sibcall_insn_operand): Likewise. > (GOT32_symbol_operand): Likewise. > * config/i386/i386.md (indirect_jump): Call convert_memory_address > for -mindirect-branch-register. > (tablejump): Likewise. > (*sibcall_memory): Likewise. > (*sibcall_value_memory): Likewise. > Disallow peepholes of indirect call and jump via memory for > -mindirect-branch-register. > (*call_pop): Replace m with Bw. > (*call_value_pop): Likewise. > (*sibcall_pop_memory): Replace m with Bs. > * config/i386/i386.opt (mindirect-branch-register): New option. > * doc/invoke.texi: Document -mindirect-branch-register option. OK. Honza
Re: [PATCH 1/5] x86: Add -mindirect-branch=
> gcc/ > > * config/i386/i386-opts.h (indirect_branch): New. > * config/i386/i386-protos.h (ix86_output_indirect_jmp): Likewise. > * config/i386/i386.c (ix86_using_red_zone): Disallow red-zone > with local indirect jump when converting indirect call and jump. > (ix86_set_indirect_branch_type): New. > (ix86_set_current_function): Call ix86_set_indirect_branch_type. > (indirectlabelno): New. > (indirect_thunk_needed): Likewise. > (indirect_thunk_bnd_needed): Likewise. > (indirect_thunks_used): Likewise. > (indirect_thunks_bnd_used): Likewise. > (INDIRECT_LABEL): Likewise. > (indirect_thunk_name): Likewise. > (output_indirect_thunk): Likewise. > (output_indirect_thunk_function): Likewise. > (ix86_output_indirect_branch): Likewise. > (ix86_output_indirect_jmp): Likewise. > (ix86_code_end): Call output_indirect_thunk_function if needed. > (ix86_output_call_insn): Call ix86_output_indirect_branch if > needed. > (ix86_handle_fndecl_attribute): Handle indirect_branch. > (ix86_attribute_table): Add indirect_branch. > * config/i386/i386.h (machine_function): Add indirect_branch_type > and has_local_indirect_jump. > * config/i386/i386.md (indirect_jump): Set has_local_indirect_jump > to true. > (tablejump): Likewise. > (*indirect_jump): Use ix86_output_indirect_jmp. > (*tablejump_1): Likewise. > (simple_return_indirect_internal): Likewise. > * config/i386/i386.opt (mindirect-branch=): New option. > (indirect_branch): New. > (keep): Likewise. > (thunk): Likewise. > (thunk-inline): Likewise. > (thunk-extern): Likewise. > * doc/extend.texi: Document indirect_branch function attribute. > * doc/invoke.texi: Document -mindirect-branch= option. > > gcc/testsuite/ > > * gcc.target/i386/indirect-thunk-1.c: New test. > * gcc.target/i386/indirect-thunk-2.c: Likewise. > * gcc.target/i386/indirect-thunk-3.c: Likewise. > * gcc.target/i386/indirect-thunk-4.c: Likewise. > * gcc.target/i386/indirect-thunk-5.c: Likewise. > * gcc.target/i386/indirect-thunk-6.c: Likewise. > * gcc.target/i386/indirect-thunk-7.c: Likewise. > * gcc.target/i386/indirect-thunk-attr-1.c: Likewise. > * gcc.target/i386/indirect-thunk-attr-2.c: Likewise. > * gcc.target/i386/indirect-thunk-attr-3.c: Likewise. > * gcc.target/i386/indirect-thunk-attr-4.c: Likewise. > * gcc.target/i386/indirect-thunk-attr-5.c: Likewise. > * gcc.target/i386/indirect-thunk-attr-6.c: Likewise. > * gcc.target/i386/indirect-thunk-attr-7.c: Likewise. > * gcc.target/i386/indirect-thunk-attr-8.c: Likewise. > * gcc.target/i386/indirect-thunk-bnd-1.c: Likewise. > * gcc.target/i386/indirect-thunk-bnd-2.c: Likewise. > * gcc.target/i386/indirect-thunk-bnd-3.c: Likewise. > * gcc.target/i386/indirect-thunk-bnd-4.c: Likewise. > * gcc.target/i386/indirect-thunk-extern-1.c: Likewise. > * gcc.target/i386/indirect-thunk-extern-2.c: Likewise. > * gcc.target/i386/indirect-thunk-extern-3.c: Likewise. > * gcc.target/i386/indirect-thunk-extern-4.c: Likewise. > * gcc.target/i386/indirect-thunk-extern-5.c: Likewise. > * gcc.target/i386/indirect-thunk-extern-6.c: Likewise. > * gcc.target/i386/indirect-thunk-extern-7.c: Likewise. > * gcc.target/i386/indirect-thunk-inline-1.c: Likewise. > * gcc.target/i386/indirect-thunk-inline-2.c: Likewise. > * gcc.target/i386/indirect-thunk-inline-3.c: Likewise. > * gcc.target/i386/indirect-thunk-inline-4.c: Likewise. > * gcc.target/i386/indirect-thunk-inline-5.c: Likewise. > * gcc.target/i386/indirect-thunk-inline-6.c: Likewise. > * gcc.target/i386/indirect-thunk-inline-7.c: Likewise. > + > + /* Pause . */ > + fprintf (asm_out_file, "\tpause\n"); OK, but please prepare incremental patches to choose between pause and lefence as needed for AMD CPUs and check for large code model.
Re: [PATCH 5/5] x86: Disallow -mindirect-branch=/-mfunction-return= with -mcmodel=large
> Since the thunk function may not be reachable in large code model, > -mcmodel=large is incompatible with -mindirect-branch=thunk, > -mindirect-branch=thunk-extern, -mfunction-return=thunk and > -mfunction-return=thunk-extern. Issue an error when they are used with > -mcmodel=large. > > gcc/ > > * config/i386/i386.c (ix86_set_indirect_branch_type): Disallow > -mcmodel=large with -mindirect-branch=thunk, > -mindirect-branch=thunk-extern, -mfunction-return=thunk and > -mfunction-return=thunk-extern. > * doc/invoke.texi: Document -mcmodel=large is incompatible with > -mindirect-branch=thunk, -mindirect-branch=thunk-extern, > -mfunction-return=thunk and -mfunction-return=thunk-extern. > > gcc/testsuite/ > > * gcc.target/i386/indirect-thunk-10.c: New test. > * gcc.target/i386/indirect-thunk-8.c: Likewise. > * gcc.target/i386/indirect-thunk-9.c: Likewise. > * gcc.target/i386/indirect-thunk-attr-10.c: Likewise. > * gcc.target/i386/indirect-thunk-attr-11.c: Likewise. > * gcc.target/i386/indirect-thunk-attr-9.c: Likewise. > * gcc.target/i386/ret-thunk-17.c: Likewise. > * gcc.target/i386/ret-thunk-18.c: Likewise. > * gcc.target/i386/ret-thunk-19.c: Likewise. > * gcc.target/i386/ret-thunk-20.c: Likewise. > * gcc.target/i386/ret-thunk-21.c: Likewise. OK. Honza
Re: [PATCH 2/5] x86: Add -mfunction-return=
> Add -mfunction-return= option to convert function return to call and > return thunks. The default is 'keep', which keeps function return > unmodified. 'thunk' converts function return to call and return thunk. > 'thunk-inline' converts function return to inlined call and return thunk. > 'thunk-extern' converts function return to external call and return > thunk provided in a separate object file. You can control this behavior > for a specific function by using the function attribute function_return. > > Function return thunk is the same as memory thunk for -mindirect-branch= > where the return address is at the top of the stack: > > __x86_return_thunk: > call L2 > L1: > pause > jmp L1 > L2: > lea 8(%rsp), %rsp|lea 4(%esp), %esp > ret > > and function return becomes > > jmp __x86_return_thunk > > -mindirect-branch= tests are updated with -mfunction-return=keep to > avoid false test failures when -mfunction-return=thunk is added to > RUNTESTFLAGS for "make check". > > gcc/ > > * config/i386/i386-protos.h (ix86_output_function_return): New. > * config/i386/i386.c (ix86_set_indirect_branch_type): Also > set function_return_type. > (indirect_thunk_name): Add ret_p to indicate thunk for function > return. > (output_indirect_thunk_function): Pass false to > indirect_thunk_name. > (ix86_output_indirect_branch): Likewise. > (output_indirect_thunk_function): Create alias for function > return thunk if regno < 0. > (ix86_output_function_return): New function. > (ix86_handle_fndecl_attribute): Handle function_return. > (ix86_attribute_table): Add function_return. > * config/i386/i386.h (machine_function): Add > function_return_type. > * config/i386/i386.md (simple_return_internal): Use > ix86_output_function_return. > (simple_return_internal_long): Likewise. > * config/i386/i386.opt (mfunction-return=): New option. > (indirect_branch): Mention -mfunction-return=. > * doc/extend.texi: Document function_return function attribute. > * doc/invoke.texi: Document -mfunction-return= option. The implementation is reasonable, but I still do not quite understand if we really need this change. Celarly killing every return is going to have significant impact and it is apparently not used by Linux kernel. What is the main motivation for it? Honza
Re: inline_small_functions speedup
> After inlining A into B, inline_small_functions updates the information > for (most) callees and callers of the new B: > > update_callee_keys (&edge_heap, where, updated_nodes); > [...] > /* Our profitability metric can depend on local properties >such as number of inlinable calls and size of the function body. >After inlining these properties might change for the function we >inlined into (since it's body size changed) and for the functions >called by function we inlined (since number of it inlinable callers >might change). */ > update_caller_keys (&edge_heap, where, updated_nodes, NULL); > > These functions in turn call can_inline_edge_p for most of the associated > edges: > > if (can_inline_edge_p (edge, false) > && want_inline_small_function_p (edge, false)) > update_edge_key (heap, edge); > > can_inline_edge_p indirectly calls estimate_calls_size_and_time > on the caller node, which seems to recursively process all callee > edges rooted at the node. It looks from this like algorithm can be > at least quadratic in the worst case. Yep, I have patch to add overall size into summary that makes it constant time. Will need to clean it up for mainline. > > Maybe there's something we can do to make can_inline_edge_p cheaper, but > since neither of these two calls is responsible for reporting an inline > failure reason, it seems cheaper to test want_inline_small_function_p > first, so that we don't calculate an estimate for something that we > already know isn't a "small function". I think the only change > needed to make that work is to check for CIF_FINAL_ERROR in > want_inline_small_function_p; at the moment we rely on can_inline_edge_p > to make that check. > > This cuts the time to build optabs.ii by over 4% with an > --enable-checking=release compiler on x86_64-linux-gnu. I've seen more > dramatic wins on aarch64-linux-gnu due to the NUM_POLY_INT_COEFFS==2 > thing. The patch doesn't affect the output code. > > Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu. > OK to install? Anyway this seems to make sense to do. > > Richard > > > 2018-01-13 Richard Sandiford > > gcc/ > * ipa-inline.c (want_inline_small_function_p): Return false if > inlining has already failed with CIF_FINAL_ERROR. > (update_caller_keys): Call want_inline_small_function_p before > can_inline_edge_p. > (update_callee_keys): Likewise. OK, thanks! Honza
Re: [PATCH] PR 78534, 83704 Large character lengths
Am 14.01.2018 um 11:45 schrieb Janne Blomqvist: I have tried to use %wp, but it didn’t work: ../../work/gcc/fortran/decl.c: In function 'void gfc_set_constant_character_len(gfc_charlen_t, gfc_expr*, gfc_charlen_t)': ../../work/gcc/fortran/decl.c:1567:5: error: unknown conversion type character 'w' in format [-Werror=format=] "CHARACTER expression at %L is being truncated " ^~~~ gfortran diagnostics stuff, as it seems %wd is used in the middle end. Anyway, that's stage1 stuff IMHO. I'd suggest using %ld and casting the value to long int, or %lld and casting to long long int, at least for the time being. Regards Thomas
[patch, fortran] Change ABI for F2008 - minloc/maxloc BACK argument
Hello world, here is the latest take on the min/maxloc ABI change for BACK. This version now passes BACK as a GFC_LOGCIAL_4 by value in all cases. I did this by using the existing %VAL mechanism. I also added another test case which crashed during one stage of development. So, OK for trunk? Regards Thomas 2018-01-14 Thomas Koenig PR fortran/54613 * gfortran.h (gfc_check_f): Rename f4ml to f5ml. (gfc_logical_4_kind): New macro * intrinsic.h (gfc_simplify_minloc): Add a gfc_expr *argument. (gfc_simplify_maxloc): Likewise. (gfc_resolve_maxloc): Likewise. (gfc_resolve_minloc): Likewise. * check.c (gfc_check_minloc_maxloc): Add checking for "back" argument; also raise error if it is used (for now). Add it if it isn't present. * intrinsic.c (add_sym_4ml): Rename to (add_sym_5ml), adjust for extra argument. (add_functions): Add "back" constant. Adjust maxloc and minloc for back argument. * iresolve.c (gfc_resolve_maxloc): Add back argument. If back is not of gfc_logical_4_kind, convert. (gfc_resolve_minloc): Likewise. * simplify.c (gfc_simplify_minloc): Add back argument. (gfc_simplify_maxloc): Likewise. * trans-intinsic.c (gfc_conv_intrinsic_minmaxloc): Rename last argument to %VAL to ensure passing by value. (gfc_conv_intrinsic_function): Call gfc_conv_intrinsic_minmaxloc also for library calls. 2018-01-14 Thomas Koenig PR fortran/54613 * m4/iparm.m4: Add back_arg macro if in minloc or maxloc. * m4/iforeach-s.m4: Add optional argument back with back_arg macro. Improve m4 quoting. If HAVE_BACK_ARG is defined, assert that back is non-true. * m4/iforeach.m4: Likewise. * m4/ifunction-s.m4: Likewise. * m4/ifunction.m4: Likewise. * m4/maxloc0.m4: Include assert.h * m4/minloc0.m4: Likewise. * m4/maxloc0s.m4: #define HAVE_BACK_ARG. * m4/minloc0s.m4: Likewise. * m4/maxloc1s.m4: Likewise. * m4/minloc1s.m4: Likewise. * m4/maxloc1.m4: Include assert.h, #define HAVE_BACK_ARG. * m4/minloc1.m4: Likewise. * m4/maxloc2s.m4: Add assert.h, add back_arg, assert that back is non-true. * m4/minloc2s.m4: Likewise. * generated/iall_i1.c: Regenerated. * generated/iall_i16.c: Regenerated. * generated/iall_i2.c: Regenerated. * generated/iall_i4.c: Regenerated. * generated/iall_i8.c: Regenerated. * generated/iany_i1.c: Regenerated. * generated/iany_i16.c: Regenerated. * generated/iany_i2.c: Regenerated. * generated/iany_i4.c: Regenerated. * generated/iany_i8.c: Regenerated. * generated/iparity_i1.c: Regenerated. * generated/iparity_i16.c: Regenerated. * generated/iparity_i2.c: Regenerated. * generated/iparity_i4.c: Regenerated. * generated/iparity_i8.c: Regenerated. * generated/maxloc0_16_i1.c: Regenerated. * generated/maxloc0_16_i16.c: Regenerated. * generated/maxloc0_16_i2.c: Regenerated. * generated/maxloc0_16_i4.c: Regenerated. * generated/maxloc0_16_i8.c: Regenerated. * generated/maxloc0_16_r10.c: Regenerated. * generated/maxloc0_16_r16.c: Regenerated. * generated/maxloc0_16_r4.c: Regenerated. * generated/maxloc0_16_r8.c: Regenerated. * generated/maxloc0_16_s1.c: Regenerated. * generated/maxloc0_16_s4.c: Regenerated. * generated/maxloc0_4_i1.c: Regenerated. * generated/maxloc0_4_i16.c: Regenerated. * generated/maxloc0_4_i2.c: Regenerated. * generated/maxloc0_4_i4.c: Regenerated. * generated/maxloc0_4_i8.c: Regenerated. * generated/maxloc0_4_r10.c: Regenerated. * generated/maxloc0_4_r16.c: Regenerated. * generated/maxloc0_4_r4.c: Regenerated. * generated/maxloc0_4_r8.c: Regenerated. * generated/maxloc0_4_s1.c: Regenerated. * generated/maxloc0_4_s4.c: Regenerated. * generated/maxloc0_8_i1.c: Regenerated. * generated/maxloc0_8_i16.c: Regenerated. * generated/maxloc0_8_i2.c: Regenerated. * generated/maxloc0_8_i4.c: Regenerated. * generated/maxloc0_8_i8.c: Regenerated. * generated/maxloc0_8_r10.c: Regenerated. * generated/maxloc0_8_r16.c: Regenerated. * generated/maxloc0_8_r4.c: Regenerated. * generated/maxloc0_8_r8.c: Regenerated. * generated/maxloc0_8_s1.c: Regenerated. * generated/maxloc0_8_s4.c: Regenerated. * generated/maxloc1_16_i1.c: Regenerated. * generated/maxloc1_16_i16.c: Regenerated. * generated/maxloc1_16_i2.c: Regenerated. * generated/maxloc1_16_i4.c: Regenerated. * generated/maxloc1_16_i8.c: Regenerated. * generated/maxloc1_16_r10.c: Regenerated.
Re: [PATCH 1/5] x86: Add -mindirect-branch=
On 2018.01.14 at 11:46 +0100, Jan Hubicka wrote: > > gcc/ > > > > * config/i386/i386-opts.h (indirect_branch): New. > > * config/i386/i386-protos.h (ix86_output_indirect_jmp): Likewise. > > * config/i386/i386.c (ix86_using_red_zone): Disallow red-zone > > with local indirect jump when converting indirect call and jump. > > (ix86_set_indirect_branch_type): New. > > (ix86_set_current_function): Call ix86_set_indirect_branch_type. > > (indirectlabelno): New. > > (indirect_thunk_needed): Likewise. > > (indirect_thunk_bnd_needed): Likewise. > > (indirect_thunks_used): Likewise. > > (indirect_thunks_bnd_used): Likewise. > > (INDIRECT_LABEL): Likewise. > > (indirect_thunk_name): Likewise. > > (output_indirect_thunk): Likewise. > > (output_indirect_thunk_function): Likewise. > > (ix86_output_indirect_branch): Likewise. > > (ix86_output_indirect_jmp): Likewise. > > (ix86_code_end): Call output_indirect_thunk_function if needed. > > (ix86_output_call_insn): Call ix86_output_indirect_branch if > > needed. > > (ix86_handle_fndecl_attribute): Handle indirect_branch. > > (ix86_attribute_table): Add indirect_branch. > > * config/i386/i386.h (machine_function): Add indirect_branch_type > > and has_local_indirect_jump. > > * config/i386/i386.md (indirect_jump): Set has_local_indirect_jump > > to true. > > (tablejump): Likewise. > > (*indirect_jump): Use ix86_output_indirect_jmp. > > (*tablejump_1): Likewise. > > (simple_return_indirect_internal): Likewise. > > * config/i386/i386.opt (mindirect-branch=): New option. > > (indirect_branch): New. > > (keep): Likewise. > > (thunk): Likewise. > > (thunk-inline): Likewise. > > (thunk-extern): Likewise. > > * doc/extend.texi: Document indirect_branch function attribute. > > * doc/invoke.texi: Document -mindirect-branch= option. > > + > > + /* Pause . */ > > + fprintf (asm_out_file, "\tpause\n"); > > OK, but please prepare incremental patches to choose between pause and lefence > as needed for AMD CPUs and check for large code model. Why not use both? That would make everybody happy, no? pause lfence jmp -- Markus
Re: [PATCH 1/5] x86: Add -mindirect-branch=
> On 2018.01.14 at 11:46 +0100, Jan Hubicka wrote: > > > gcc/ > > > > > > * config/i386/i386-opts.h (indirect_branch): New. > > > * config/i386/i386-protos.h (ix86_output_indirect_jmp): Likewise. > > > * config/i386/i386.c (ix86_using_red_zone): Disallow red-zone > > > with local indirect jump when converting indirect call and jump. > > > (ix86_set_indirect_branch_type): New. > > > (ix86_set_current_function): Call ix86_set_indirect_branch_type. > > > (indirectlabelno): New. > > > (indirect_thunk_needed): Likewise. > > > (indirect_thunk_bnd_needed): Likewise. > > > (indirect_thunks_used): Likewise. > > > (indirect_thunks_bnd_used): Likewise. > > > (INDIRECT_LABEL): Likewise. > > > (indirect_thunk_name): Likewise. > > > (output_indirect_thunk): Likewise. > > > (output_indirect_thunk_function): Likewise. > > > (ix86_output_indirect_branch): Likewise. > > > (ix86_output_indirect_jmp): Likewise. > > > (ix86_code_end): Call output_indirect_thunk_function if needed. > > > (ix86_output_call_insn): Call ix86_output_indirect_branch if > > > needed. > > > (ix86_handle_fndecl_attribute): Handle indirect_branch. > > > (ix86_attribute_table): Add indirect_branch. > > > * config/i386/i386.h (machine_function): Add indirect_branch_type > > > and has_local_indirect_jump. > > > * config/i386/i386.md (indirect_jump): Set has_local_indirect_jump > > > to true. > > > (tablejump): Likewise. > > > (*indirect_jump): Use ix86_output_indirect_jmp. > > > (*tablejump_1): Likewise. > > > (simple_return_indirect_internal): Likewise. > > > * config/i386/i386.opt (mindirect-branch=): New option. > > > (indirect_branch): New. > > > (keep): Likewise. > > > (thunk): Likewise. > > > (thunk-inline): Likewise. > > > (thunk-extern): Likewise. > > > * doc/extend.texi: Document indirect_branch function attribute. > > > * doc/invoke.texi: Document -mindirect-branch= option. > > > + > > > + /* Pause . */ > > > + fprintf (asm_out_file, "\tpause\n"); > > > > OK, but please prepare incremental patches to choose between pause and > > lefence > > as needed for AMD CPUs and check for large code model. > > Why not use both? That would make everybody happy, no? > > pause > lfence > jmp Yes, according to my understanding we want both pause and lefence by default (it is expensive anyway) + command line option to control which one to use? Probably thus the first patch should default to both. Honza > > -- > Markus
Do not ICE on roundoff errors in ipa-inline
Hi, this patch fixes ICE on the testcase where we estimate function body to be rather slow and get roundoff error out of sreal. Honza PR ipa/83051 * gcc.c-torture/compile/pr83051.c: New testcase. * ipa-inline.c (edge_badness): Tolerate roundoff errors. Index: testsuite/gcc.c-torture/compile/pr83051.c === --- testsuite/gcc.c-torture/compile/pr83051.c (revision 0) +++ testsuite/gcc.c-torture/compile/pr83051.c (working copy) @@ -0,0 +1,31 @@ + +int a[1], b, c, d, e, f, g, h; + +void fn1 (int p) +{ + b = b >> 8 ^ a[b ^ (c & 5)] >> 8 ^ a[(b ^ c) & 5]; + b = b >> 8 ^ a[(b ^ c) & 5]; +} + +static void fn2 () +{ + int k; + while (1) +while (e) + { +while (g) + while (h) +for (k = 0; k < 6; k++) + while (f) +fn1 (0); +fn1 (0); +fn1 (0); +fn1 (0); + } +} + +int main () +{ + fn2 (); + return 0; +} Index: ipa-inline.c === --- ipa-inline.c(revision 256647) +++ ipa-inline.c(working copy) @@ -993,7 +993,8 @@ edge_badness (struct cgraph_edge *edge, /* Check that inlined time is better, but tolerate some roundoff issues. FIXME: When callee profile drops to 0 we account calls more. This should be fixed by never doing that. */ - gcc_checking_assert ((edge_time - callee_info->time).to_int () <= 0 + gcc_checking_assert ((edge_time * 100 + - callee_info->time * 101).to_int () <= 0 || callee->count.ipa ().initialized_p ()); gcc_checking_assert (growth <= callee_info->size);
Re: [PATCH 2/5] x86: Add -mfunction-return=
On Sun, Jan 14, 2018 at 2:48 AM, Jan Hubicka wrote: >> Add -mfunction-return= option to convert function return to call and >> return thunks. The default is 'keep', which keeps function return >> unmodified. 'thunk' converts function return to call and return thunk. >> 'thunk-inline' converts function return to inlined call and return thunk. >> 'thunk-extern' converts function return to external call and return >> thunk provided in a separate object file. You can control this behavior >> for a specific function by using the function attribute function_return. >> >> Function return thunk is the same as memory thunk for -mindirect-branch= >> where the return address is at the top of the stack: >> >> __x86_return_thunk: >> call L2 >> L1: >> pause >> jmp L1 >> L2: >> lea 8(%rsp), %rsp|lea 4(%esp), %esp >> ret >> >> and function return becomes >> >> jmp __x86_return_thunk >> >> -mindirect-branch= tests are updated with -mfunction-return=keep to >> avoid false test failures when -mfunction-return=thunk is added to >> RUNTESTFLAGS for "make check". >> >> gcc/ >> >> * config/i386/i386-protos.h (ix86_output_function_return): New. >> * config/i386/i386.c (ix86_set_indirect_branch_type): Also >> set function_return_type. >> (indirect_thunk_name): Add ret_p to indicate thunk for function >> return. >> (output_indirect_thunk_function): Pass false to >> indirect_thunk_name. >> (ix86_output_indirect_branch): Likewise. >> (output_indirect_thunk_function): Create alias for function >> return thunk if regno < 0. >> (ix86_output_function_return): New function. >> (ix86_handle_fndecl_attribute): Handle function_return. >> (ix86_attribute_table): Add function_return. >> * config/i386/i386.h (machine_function): Add >> function_return_type. >> * config/i386/i386.md (simple_return_internal): Use >> ix86_output_function_return. >> (simple_return_internal_long): Likewise. >> * config/i386/i386.opt (mfunction-return=): New option. >> (indirect_branch): Mention -mfunction-return=. >> * doc/extend.texi: Document function_return function attribute. >> * doc/invoke.texi: Document -mfunction-return= option. > > The implementation is reasonable, but I still do not quite understand > if we really need this change. Celarly killing every return is going > to have significant impact and it is apparently not used by Linux kernel. > What is the main motivation for it? > > Honza See: https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00540.html I will check it in. -- H.J.
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Sun, Jan 14, 2018 at 11:40 AM, Jan Hubicka wrote: >> Hi HJ, >> >> > -Original Message- >> > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- >> > ow...@gcc.gnu.org] On Behalf Of H.J. Lu >> > Sent: Sunday, January 14, 2018 9:07 AM >> > To: gcc-patches@gcc.gnu.org >> > Subject: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre >> > >> > This set of patches for GCC 8 mitigates variant #2 of the speculative >> > execution vulnerabilities on x86 processors identified by CVE-2017-5715, >> > aka >> > Spectre. They convert indirect branches and function returns to call and >> > return thunks to avoid speculative execution via indirect call, jmp and >> > ret. >> > >> > H.J. Lu (5): >> > x86: Add -mindirect-branch= >> > x86: Add -mfunction-return= >> > x86: Add -mindirect-branch-register >> > x86: Add 'V' register operand modifier >> > x86: Disallow -mindirect-branch=/-mfunction-return= with >> > -mcmodel=large >> >> Current set of patches don't seem to have any option to generate "lfence" as >> the loop filler in "retpoline", which is required by AMD. >> Can you please clarify the plan. We would like to get this checked-in GCC 8. > > Since thunks are output as strings, it is easy to add the option > on the top of patch #1 of the series. I do not fully understand > the reason for choosing pause over lfence for Intel, but if we need > to do both, we need to have command line option (and possibly attribute). > What would be reasonable name for it? Looking at the kernel patch [1], the loop filler should be "pause;lfence" sequence, and should be universally accepted for Intel and AMD targets. [1] https://www.spinics.net/lists/kernel/msg2697507.html Uros.
Re: [PATCH] PR 78534, 83704 Large character lengths
On Sun, Jan 14, 2018 at 12:45 PM, Janne Blomqvist wrote: > On Sat, Jan 13, 2018 at 7:35 PM, Dominique d'Humières > wrote: >> I have finally bootstrapped gfortran with the two patches applied and the >> spurious warnings with -Wall are now gone (limited testing), but I see a >> regression for gfortran.dg/string_1.f90 due to an additional error >> >> /opt/gcc/_clean/gcc/testsuite/gfortran.dg/string_1.f90:13:15: >> >>print *, len(s) >>1 >> Error: Result of LEN overflows its kind at (1) >> >> No idea why this was not emitted without the patch. > > Thanks for testing, I'll look into this. Was it on a 32 or 64-bit target? I can't recreate this on i686-pc-linux-gnu. What host/target are you using exactly, is it part of the normal testsuite run or do you compile it with some special options? In any case, that line isn't particularly relevant to what the testcase is testing, so it could be removed completely, or by specifying kind=8 to len(). -- Janne Blomqvist
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
> On Sun, Jan 14, 2018 at 11:40 AM, Jan Hubicka wrote: > >> Hi HJ, > >> > >> > -Original Message- > >> > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > >> > ow...@gcc.gnu.org] On Behalf Of H.J. Lu > >> > Sent: Sunday, January 14, 2018 9:07 AM > >> > To: gcc-patches@gcc.gnu.org > >> > Subject: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre > >> > > >> > This set of patches for GCC 8 mitigates variant #2 of the speculative > >> > execution vulnerabilities on x86 processors identified by CVE-2017-5715, > >> > aka > >> > Spectre. They convert indirect branches and function returns to call and > >> > return thunks to avoid speculative execution via indirect call, jmp and > >> > ret. > >> > > >> > H.J. Lu (5): > >> > x86: Add -mindirect-branch= > >> > x86: Add -mfunction-return= > >> > x86: Add -mindirect-branch-register > >> > x86: Add 'V' register operand modifier > >> > x86: Disallow -mindirect-branch=/-mfunction-return= with > >> > -mcmodel=large > >> > >> Current set of patches don't seem to have any option to generate "lfence" > >> as the loop filler in "retpoline", which is required by AMD. > >> Can you please clarify the plan. We would like to get this checked-in GCC > >> 8. > > > > Since thunks are output as strings, it is easy to add the option > > on the top of patch #1 of the series. I do not fully understand > > the reason for choosing pause over lfence for Intel, but if we need > > to do both, we need to have command line option (and possibly attribute). > > What would be reasonable name for it? > > Looking at the kernel patch [1], the loop filler should be > "pause;lfence" sequence, and should be universally accepted for Intel > and AMD targets. > > [1] https://www.spinics.net/lists/kernel/msg2697507.html Yep, I would say we should go with pause;lfence now and see if we want to add argument eventually. HJ, does it sound OK? Honza > > Uros.
Re: [PATCH 1/5] x86: Add -mindirect-branch=
On Sun, Jan 14, 2018 at 3:06 AM, Jan Hubicka wrote: >> On 2018.01.14 at 11:46 +0100, Jan Hubicka wrote: >> > > gcc/ >> > > >> > > * config/i386/i386-opts.h (indirect_branch): New. >> > > * config/i386/i386-protos.h (ix86_output_indirect_jmp): Likewise. >> > > * config/i386/i386.c (ix86_using_red_zone): Disallow red-zone >> > > with local indirect jump when converting indirect call and jump. >> > > (ix86_set_indirect_branch_type): New. >> > > (ix86_set_current_function): Call ix86_set_indirect_branch_type. >> > > (indirectlabelno): New. >> > > (indirect_thunk_needed): Likewise. >> > > (indirect_thunk_bnd_needed): Likewise. >> > > (indirect_thunks_used): Likewise. >> > > (indirect_thunks_bnd_used): Likewise. >> > > (INDIRECT_LABEL): Likewise. >> > > (indirect_thunk_name): Likewise. >> > > (output_indirect_thunk): Likewise. >> > > (output_indirect_thunk_function): Likewise. >> > > (ix86_output_indirect_branch): Likewise. >> > > (ix86_output_indirect_jmp): Likewise. >> > > (ix86_code_end): Call output_indirect_thunk_function if needed. >> > > (ix86_output_call_insn): Call ix86_output_indirect_branch if >> > > needed. >> > > (ix86_handle_fndecl_attribute): Handle indirect_branch. >> > > (ix86_attribute_table): Add indirect_branch. >> > > * config/i386/i386.h (machine_function): Add indirect_branch_type >> > > and has_local_indirect_jump. >> > > * config/i386/i386.md (indirect_jump): Set has_local_indirect_jump >> > > to true. >> > > (tablejump): Likewise. >> > > (*indirect_jump): Use ix86_output_indirect_jmp. >> > > (*tablejump_1): Likewise. >> > > (simple_return_indirect_internal): Likewise. >> > > * config/i386/i386.opt (mindirect-branch=): New option. >> > > (indirect_branch): New. >> > > (keep): Likewise. >> > > (thunk): Likewise. >> > > (thunk-inline): Likewise. >> > > (thunk-extern): Likewise. >> > > * doc/extend.texi: Document indirect_branch function attribute. >> > > * doc/invoke.texi: Document -mindirect-branch= option. >> > > + >> > > + /* Pause . */ >> > > + fprintf (asm_out_file, "\tpause\n"); >> > >> > OK, but please prepare incremental patches to choose between pause and >> > lefence >> > as needed for AMD CPUs and check for large code model. >> >> Why not use both? That would make everybody happy, no? >> >> pause >> lfence >> jmp > > Yes, according to my understanding we want both pause and lefence by default > (it is expensive anyway) + command line option to control which one to use? > > Probably thus the first patch should default to both. Done. This is the patch I am checking in. -- H.J. From de23211f9a2fae02fe003f7b32016b94be9c8407 Mon Sep 17 00:00:00 2001 From: "H.J. Lu" Date: Sat, 6 Jan 2018 22:29:55 -0800 Subject: [PATCH 1/6] x86: Add -mindirect-branch= Add -mindirect-branch= option to convert indirect call and jump to call and return thunks. The default is 'keep', which keeps indirect call and jump unmodified. 'thunk' converts indirect call and jump to call and return thunk. 'thunk-inline' converts indirect call and jump to inlined call and return thunk. 'thunk-extern' converts indirect call and jump to external call and return thunk provided in a separate object file. You can control this behavior for a specific function by using the function attribute indirect_branch. 2 kinds of thunks are geneated. Memory thunk where the function address is at the top of the stack: __x86_indirect_thunk: call L2 L1: pause lfence jmp L1 L2: lea 8(%rsp), %rsp|lea 4(%esp), %esp ret Indirect jmp via memory, "jmp mem", is converted to push memory jmp __x86_indirect_thunk Indirect call via memory, "call mem", is converted to jmp L2 L1: push [mem] jmp __x86_indirect_thunk L2: call L1 Register thunk where the function address is in a register, reg: __x86_indirect_thunk_reg: call L2 L1: pause lfence jmp L1 L2: movq %reg, (%rsp)|movl%reg, (%esp) ret where reg is one of (r|e)ax, (r|e)dx, (r|e)cx, (r|e)bx, (r|e)si, (r|e)di, (r|e)bp, r8, r9, r10, r11, r12, r13, r14 and r15. Indirect jmp via register, "jmp reg", is converted to jmp __x86_indirect_thunk_reg Indirect call via register, "call reg", is converted to call __x86_indirect_thunk_reg gcc/ * config/i386/i386-opts.h (indirect_branch): New. * config/i386/i386-protos.h (ix86_output_indirect_jmp): Likewise. * config/i386/i386.c (ix86_using_red_zone): Disallow red-zone with local indirect jump when converting indirect call and jump. (ix86_set_indirect_branch_type): New. (ix86_set_current_function): Call ix86_set_indirect_branch_type. (indirectlabelno): New. (indirect_thunk_needed): Likewise. (indirect_thunk_bnd_needed): Likewise. (indirect_thunks_used): Likewise. (indirect_thunks_bnd_used): Likewise. (INDIRECT_LABEL): Likewise. (indirect_thunk_name): Likewise. (output_indirect_thunk): Likewise. (output_indirect_thunk_function): Likewise. (ix86_output_indirect_branch): Likewise. (ix86_output_indirect_jmp): Likewise.
Re: [PATCH] x86: Disallow -mindirect-branch=/-mfunction-return= with -mshstk
On Sat, Jan 13, 2018 at 8:09 AM, Jeff Law wrote: > On 01/12/2018 09:16 AM, H.J. Lu wrote: >> On Thu, Jan 11, 2018 at 3:00 PM, H.J. Lu wrote: >>> On Thu, Jan 11, 2018 at 2:46 PM, Jeff Law wrote: >> Do you want to mention that CET and retpolines are inherently >>> >>> I will document it. >>> incompatible? Should an attempt to use them together generate a compile-time error? >>> >>> Compile-time error sounds a good idea. >>> >> >> Here is the patch on my current patch set. Any comments? > Seems reasonable to me. Jan, Uros, Does it look OK: https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01072.html Thanks. -- H.J.
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Sun, Jan 14, 2018 at 4:42 AM, Jan Hubicka wrote: >> On Sun, Jan 14, 2018 at 11:40 AM, Jan Hubicka wrote: >> >> Hi HJ, >> >> >> >> > -Original Message- >> >> > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- >> >> > ow...@gcc.gnu.org] On Behalf Of H.J. Lu >> >> > Sent: Sunday, January 14, 2018 9:07 AM >> >> > To: gcc-patches@gcc.gnu.org >> >> > Subject: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre >> >> > >> >> > This set of patches for GCC 8 mitigates variant #2 of the speculative >> >> > execution vulnerabilities on x86 processors identified by >> >> > CVE-2017-5715, aka >> >> > Spectre. They convert indirect branches and function returns to call >> >> > and >> >> > return thunks to avoid speculative execution via indirect call, jmp and >> >> > ret. >> >> > >> >> > H.J. Lu (5): >> >> > x86: Add -mindirect-branch= >> >> > x86: Add -mfunction-return= >> >> > x86: Add -mindirect-branch-register >> >> > x86: Add 'V' register operand modifier >> >> > x86: Disallow -mindirect-branch=/-mfunction-return= with >> >> > -mcmodel=large >> >> >> >> Current set of patches don't seem to have any option to generate "lfence" >> >> as the loop filler in "retpoline", which is required by AMD. >> >> Can you please clarify the plan. We would like to get this checked-in GCC >> >> 8. >> > >> > Since thunks are output as strings, it is easy to add the option >> > on the top of patch #1 of the series. I do not fully understand >> > the reason for choosing pause over lfence for Intel, but if we need >> > to do both, we need to have command line option (and possibly attribute). >> > What would be reasonable name for it? >> >> Looking at the kernel patch [1], the loop filler should be >> "pause;lfence" sequence, and should be universally accepted for Intel >> and AMD targets. >> >> [1] https://www.spinics.net/lists/kernel/msg2697507.html > > Yep, I would say we should go with pause;lfence now and see if we want to add > argument > eventually. > HJ, does it sound OK? Yes, I am checking a patch to default to "pause; lfence". -- H.J.
Re: [PATCH 2/5] x86: Add -mfunction-return=
On Sun, Jan 14, 2018 at 3:31 AM, H.J. Lu wrote: > On Sun, Jan 14, 2018 at 2:48 AM, Jan Hubicka wrote: >>> Add -mfunction-return= option to convert function return to call and >>> return thunks. The default is 'keep', which keeps function return >>> unmodified. 'thunk' converts function return to call and return thunk. >>> 'thunk-inline' converts function return to inlined call and return thunk. >>> 'thunk-extern' converts function return to external call and return >>> thunk provided in a separate object file. You can control this behavior >>> for a specific function by using the function attribute function_return. >>> >>> Function return thunk is the same as memory thunk for -mindirect-branch= >>> where the return address is at the top of the stack: >>> >>> __x86_return_thunk: >>> call L2 >>> L1: >>> pause >>> jmp L1 >>> L2: >>> lea 8(%rsp), %rsp|lea 4(%esp), %esp >>> ret >>> >>> and function return becomes >>> >>> jmp __x86_return_thunk >>> >>> -mindirect-branch= tests are updated with -mfunction-return=keep to >>> avoid false test failures when -mfunction-return=thunk is added to >>> RUNTESTFLAGS for "make check". >>> >>> gcc/ >>> >>> * config/i386/i386-protos.h (ix86_output_function_return): New. >>> * config/i386/i386.c (ix86_set_indirect_branch_type): Also >>> set function_return_type. >>> (indirect_thunk_name): Add ret_p to indicate thunk for function >>> return. >>> (output_indirect_thunk_function): Pass false to >>> indirect_thunk_name. >>> (ix86_output_indirect_branch): Likewise. >>> (output_indirect_thunk_function): Create alias for function >>> return thunk if regno < 0. >>> (ix86_output_function_return): New function. >>> (ix86_handle_fndecl_attribute): Handle function_return. >>> (ix86_attribute_table): Add function_return. >>> * config/i386/i386.h (machine_function): Add >>> function_return_type. >>> * config/i386/i386.md (simple_return_internal): Use >>> ix86_output_function_return. >>> (simple_return_internal_long): Likewise. >>> * config/i386/i386.opt (mfunction-return=): New option. >>> (indirect_branch): Mention -mfunction-return=. >>> * doc/extend.texi: Document function_return function attribute. >>> * doc/invoke.texi: Document -mfunction-return= option. >> >> The implementation is reasonable, but I still do not quite understand >> if we really need this change. Celarly killing every return is going >> to have significant impact and it is apparently not used by Linux kernel. >> What is the main motivation for it? >> >> Honza > > See: > > https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00540.html > > I will check it in. > Here is the patch I am checking in. -- H.J. From 2f05046a78afb173f301298df0a9a44b6fb38969 Mon Sep 17 00:00:00 2001 From: "H.J. Lu" Date: Sat, 6 Jan 2018 22:29:56 -0800 Subject: [PATCH 2/6] x86: Add -mfunction-return= Add -mfunction-return= option to convert function return to call and return thunks. The default is 'keep', which keeps function return unmodified. 'thunk' converts function return to call and return thunk. 'thunk-inline' converts function return to inlined call and return thunk. 'thunk-extern' converts function return to external call and return thunk provided in a separate object file. You can control this behavior for a specific function by using the function attribute function_return. Function return thunk is the same as memory thunk for -mindirect-branch= where the return address is at the top of the stack: __x86_return_thunk: call L2 L1: pause lfence jmp L1 L2: lea 8(%rsp), %rsp|lea 4(%esp), %esp ret and function return becomes jmp __x86_return_thunk -mindirect-branch= tests are updated with -mfunction-return=keep to avoid false test failures when -mfunction-return=thunk is added to RUNTESTFLAGS for "make check". gcc/ * config/i386/i386-protos.h (ix86_output_function_return): New. * config/i386/i386.c (ix86_set_indirect_branch_type): Also set function_return_type. (indirect_thunk_name): Add ret_p to indicate thunk for function return. (output_indirect_thunk_function): Pass false to indirect_thunk_name. (ix86_output_indirect_branch): Likewise. (output_indirect_thunk_function): Create alias for function return thunk if regno < 0. (ix86_output_function_return): New function. (ix86_handle_fndecl_attribute): Handle function_return. (ix86_attribute_table): Add function_return. * config/i386/i386.h (machine_function): Add function_return_type. * config/i386/i386.md (simple_return_internal): Use ix86_output_function_return. (simple_return_internal_long): Likewise. * config/i386/i386.opt (mfunction-return=): New option. (indirect_branch): Mention -mfunction-return=. * doc/extend.texi: Document function_return function attribute. * doc/invoke.texi: Document -mfunction-return= option. gcc/testsuite/ * gcc.target/i386/indirect-thunk-1.c (dg-option
Re: [PATCH 5/5] x86: Disallow -mindirect-branch=/-mfunction-return= with -mcmodel=large
On Sun, Jan 14, 2018 at 2:47 AM, Jan Hubicka wrote: >> Since the thunk function may not be reachable in large code model, >> -mcmodel=large is incompatible with -mindirect-branch=thunk, >> -mindirect-branch=thunk-extern, -mfunction-return=thunk and >> -mfunction-return=thunk-extern. Issue an error when they are used with >> -mcmodel=large. >> >> gcc/ >> >> * config/i386/i386.c (ix86_set_indirect_branch_type): Disallow >> -mcmodel=large with -mindirect-branch=thunk, >> -mindirect-branch=thunk-extern, -mfunction-return=thunk and >> -mfunction-return=thunk-extern. >> * doc/invoke.texi: Document -mcmodel=large is incompatible with >> -mindirect-branch=thunk, -mindirect-branch=thunk-extern, >> -mfunction-return=thunk and -mfunction-return=thunk-extern. >> >> gcc/testsuite/ >> >> * gcc.target/i386/indirect-thunk-10.c: New test. >> * gcc.target/i386/indirect-thunk-8.c: Likewise. >> * gcc.target/i386/indirect-thunk-9.c: Likewise. >> * gcc.target/i386/indirect-thunk-attr-10.c: Likewise. >> * gcc.target/i386/indirect-thunk-attr-11.c: Likewise. >> * gcc.target/i386/indirect-thunk-attr-9.c: Likewise. >> * gcc.target/i386/ret-thunk-17.c: Likewise. >> * gcc.target/i386/ret-thunk-18.c: Likewise. >> * gcc.target/i386/ret-thunk-19.c: Likewise. >> * gcc.target/i386/ret-thunk-20.c: Likewise. >> * gcc.target/i386/ret-thunk-21.c: Likewise. > > OK. > Honza This is the patch I am checking in to only run tests on LP64 since -mcmodel=large doesn't work with -mx32. Thanks. -- H.J. From b1ccfb7ff73f3ceacc4bcc0a5737d92494ac0c37 Mon Sep 17 00:00:00 2001 From: "H.J. Lu" Date: Sat, 13 Jan 2018 18:01:54 -0800 Subject: [PATCH 5/6] x86: Disallow -mindirect-branch=/-mfunction-return= with -mcmodel=large Since the thunk function may not be reachable in large code model, -mcmodel=large is incompatible with -mindirect-branch=thunk, -mindirect-branch=thunk-extern, -mfunction-return=thunk and -mfunction-return=thunk-extern. Issue an error when they are used with -mcmodel=large. gcc/ * config/i386/i386.c (ix86_set_indirect_branch_type): Disallow -mcmodel=large with -mindirect-branch=thunk, -mindirect-branch=thunk-extern, -mfunction-return=thunk and -mfunction-return=thunk-extern. * doc/invoke.texi: Document -mcmodel=large is incompatible with -mindirect-branch=thunk, -mindirect-branch=thunk-extern, -mfunction-return=thunk and -mfunction-return=thunk-extern. gcc/testsuite/ * gcc.target/i386/indirect-thunk-10.c: New test. * gcc.target/i386/indirect-thunk-8.c: Likewise. * gcc.target/i386/indirect-thunk-9.c: Likewise. * gcc.target/i386/indirect-thunk-attr-10.c: Likewise. * gcc.target/i386/indirect-thunk-attr-11.c: Likewise. * gcc.target/i386/indirect-thunk-attr-9.c: Likewise. * gcc.target/i386/ret-thunk-17.c: Likewise. * gcc.target/i386/ret-thunk-18.c: Likewise. * gcc.target/i386/ret-thunk-19.c: Likewise. * gcc.target/i386/ret-thunk-20.c: Likewise. * gcc.target/i386/ret-thunk-21.c: Likewise. --- gcc/config/i386/i386.c | 26 ++ gcc/doc/invoke.texi| 11 + gcc/testsuite/gcc.target/i386/indirect-thunk-10.c | 7 ++ gcc/testsuite/gcc.target/i386/indirect-thunk-8.c | 7 ++ gcc/testsuite/gcc.target/i386/indirect-thunk-9.c | 7 ++ .../gcc.target/i386/indirect-thunk-attr-10.c | 9 .../gcc.target/i386/indirect-thunk-attr-11.c | 9 .../gcc.target/i386/indirect-thunk-attr-9.c| 9 gcc/testsuite/gcc.target/i386/ret-thunk-17.c | 7 ++ gcc/testsuite/gcc.target/i386/ret-thunk-18.c | 8 +++ gcc/testsuite/gcc.target/i386/ret-thunk-19.c | 8 +++ gcc/testsuite/gcc.target/i386/ret-thunk-20.c | 9 gcc/testsuite/gcc.target/i386/ret-thunk-21.c | 9 13 files changed, 126 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-10.c create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-8.c create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-9.c create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-attr-10.c create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-attr-11.c create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-attr-9.c create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-17.c create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-18.c create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-19.c create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-20.c create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-21.c diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 46cc2563859..3bdbe088182 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -5836,6 +5836,19 @@ ix86_set_indirect_branch_type (tree fndecl) } else cfun->machine->indirect_branch_type = ix86_indirect_bra
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
> > Hi HJ, > > > > > -Original Message- > > > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > > > ow...@gcc.gnu.org] On Behalf Of H.J. Lu > > > Sent: Sunday, January 14, 2018 9:07 AM > > > To: gcc-patches@gcc.gnu.org > > > Subject: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre > > > > > > This set of patches for GCC 8 mitigates variant #2 of the speculative > > > execution vulnerabilities on x86 processors identified by CVE-2017-5715, > > > aka > > > Spectre. They convert indirect branches and function returns to call and > > > return thunks to avoid speculative execution via indirect call, jmp and > > > ret. > > > > > > H.J. Lu (5): > > > x86: Add -mindirect-branch= > > > x86: Add -mfunction-return= > > > x86: Add -mindirect-branch-register > > > x86: Add 'V' register operand modifier > > > x86: Disallow -mindirect-branch=/-mfunction-return= with > > > -mcmodel=large > > > > Current set of patches don't seem to have any option to generate "lfence" > > as the loop filler in "retpoline", which is required by AMD. > > Can you please clarify the plan. We would like to get this checked-in GCC > > 8. > > Since thunks are output as strings, it is easy to add the option > on the top of patch #1 of the series. I do not fully understand > the reason for choosing pause over lfence for Intel, but if we need > to do both, we need to have command line option (and possibly attribute). > What would be reasonable name for it? I forgot there is -mindirect-branch-loop for that in the original patchset. So for now we should be happy with having both lfence and pause in there or do we still need it? Honza > > Honza
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Sun, Jan 14, 2018 at 6:20 AM, Jan Hubicka wrote: >> > Hi HJ, >> > >> > > -Original Message- >> > > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- >> > > ow...@gcc.gnu.org] On Behalf Of H.J. Lu >> > > Sent: Sunday, January 14, 2018 9:07 AM >> > > To: gcc-patches@gcc.gnu.org >> > > Subject: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre >> > > >> > > This set of patches for GCC 8 mitigates variant #2 of the speculative >> > > execution vulnerabilities on x86 processors identified by CVE-2017-5715, >> > > aka >> > > Spectre. They convert indirect branches and function returns to call and >> > > return thunks to avoid speculative execution via indirect call, jmp and >> > > ret. >> > > >> > > H.J. Lu (5): >> > > x86: Add -mindirect-branch= >> > > x86: Add -mfunction-return= >> > > x86: Add -mindirect-branch-register >> > > x86: Add 'V' register operand modifier >> > > x86: Disallow -mindirect-branch=/-mfunction-return= with >> > > -mcmodel=large >> > >> > Current set of patches don't seem to have any option to generate "lfence" >> > as the loop filler in "retpoline", which is required by AMD. >> > Can you please clarify the plan. We would like to get this checked-in GCC >> > 8. >> >> Since thunks are output as strings, it is easy to add the option >> on the top of patch #1 of the series. I do not fully understand >> the reason for choosing pause over lfence for Intel, but if we need >> to do both, we need to have command line option (and possibly attribute). >> What would be reasonable name for it? > > I forgot there is -mindirect-branch-loop for that in the original patchset. > So for now we should be happy with having both lfence and pause in there > or do we still need it? > I suggest we leave it out for the time being. -- H.J.
[PATCH] C/C++: Add -Waddress-of-packed-member
When address of packed member of struct or union is taken, it may result in an unaligned pointer value. This patch adds -Waddress-of-packed-member to warn it: $ cat x.i struct pair_t { char c; int i; } __attribute__ ((packed)); extern struct pair_t p; int *addr = &p.i; $ gcc -O2 -S x.i x.i:8:13: warning: initialization of 'int *' from address of packed member of 'struct pair_t' may result in an unaligned pointer value [-Waddress-of-packed-member] int *addr = &p.i; ^ $ This warning is enabled by default. Tested on i686 and x86-64. OK for trunk? H.J. --- gcc/c/ PR c/51628 * doc/invoke.texi: Document -Wno-address-of-packed-member. gcc/c-family/ PR c/51628 * c-common.h (warn_for_address_of_packed_member): New. * c-warn.c (warn_for_address_of_packed_member): New function. * c.opt: Add -Wno-address-of-packed-member. gcc/c/ PR c/51628 * c-typeck.c (convert_for_assignment): Call warn_for_address_of_packed_member. Issue an warning if address of packed member is taken. gcc/cp/ PR c/51628 * call.c (convert_for_arg_passing): Call warn_for_address_of_packed_member. Issue an warning if address of packed member is taken. * typeck.c (convert_for_assignment): Likewise. gcc/testsuite/ PR c/51628 * c-c++-common/pr51628-1.c: New tests. * c-c++-common/pr51628-2.c: Likewise. * c-c++-common/pr51628-3.c: Likewise. * c-c++-common/pr51628-4.c: Likewise. * c-c++-common/pr51628-5.c: Likewise. * c-c++-common/pr51628-6.c: Likewise. * c-c++-common/ubsan/align-2.c: Add -Wno-address-of-packed-member. * c-c++-common/ubsan/align-4.c: Likewise. * c-c++-common/ubsan/align-6.c: Likewise. * c-c++-common/ubsan/align-7.c: Likewise. * c-c++-common/ubsan/align-8.c: Likewise. * g++.dg/ubsan/align-2.C: Likewise. --- gcc/c-family/c-common.h| 1 + gcc/c-family/c-warn.c | 38 +++ gcc/c-family/c.opt | 4 +++ gcc/c/c-typeck.c | 40 - gcc/cp/call.c | 8 ++ gcc/cp/typeck.c| 41 ++ gcc/doc/invoke.texi| 11 ++-- gcc/testsuite/c-c++-common/pr51628-1.c | 29 + gcc/testsuite/c-c++-common/pr51628-2.c | 29 + gcc/testsuite/c-c++-common/pr51628-3.c | 35 + gcc/testsuite/c-c++-common/pr51628-4.c | 35 + gcc/testsuite/c-c++-common/pr51628-5.c | 35 + gcc/testsuite/c-c++-common/pr51628-6.c | 35 + gcc/testsuite/c-c++-common/ubsan/align-2.c | 2 +- gcc/testsuite/c-c++-common/ubsan/align-4.c | 2 +- gcc/testsuite/c-c++-common/ubsan/align-6.c | 2 +- gcc/testsuite/c-c++-common/ubsan/align-7.c | 2 +- gcc/testsuite/c-c++-common/ubsan/align-8.c | 2 +- gcc/testsuite/g++.dg/ubsan/align-2.C | 2 +- 19 files changed, 344 insertions(+), 9 deletions(-) create mode 100644 gcc/testsuite/c-c++-common/pr51628-1.c create mode 100644 gcc/testsuite/c-c++-common/pr51628-2.c create mode 100644 gcc/testsuite/c-c++-common/pr51628-3.c create mode 100644 gcc/testsuite/c-c++-common/pr51628-4.c create mode 100644 gcc/testsuite/c-c++-common/pr51628-5.c create mode 100644 gcc/testsuite/c-c++-common/pr51628-6.c diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h index d090881e95d..ef31e4d0aa2 100644 --- a/gcc/c-family/c-common.h +++ b/gcc/c-family/c-common.h @@ -1384,6 +1384,7 @@ extern void c_do_switch_warnings (splay_tree, location_t, tree, tree, bool, bool); extern void warn_for_omitted_condop (location_t, tree); extern void warn_for_restrict (unsigned, tree *, unsigned); +extern tree warn_for_address_of_packed_member (tree type, tree rhs); /* Places where an lvalue, or modifiable lvalue, may be required. Used to select diagnostic messages in lvalue_error and diff --git a/gcc/c-family/c-warn.c b/gcc/c-family/c-warn.c index 7d87c455ec0..ac0a2ffcb42 100644 --- a/gcc/c-family/c-warn.c +++ b/gcc/c-family/c-warn.c @@ -2576,3 +2576,41 @@ warn_for_multistatement_macros (location_t body_loc, location_t next_loc, inform (guard_loc, "some parts of macro expansion are not guarded by " "this %qs clause", guard_tinfo_to_string (keyword)); } + +/* Return struct or union type if the right hand type, RHS, is the + address of packed member of struct or union when assigning to TYPE. + Otherwise, return NULL_TREE. */ + +tree +warn_for_address_of_packed_member (tree type, tree rhs) +{ + if (!warn_address_of_packed_member) +return NULL_TREE; + + if (TREE_CODE (rhs) == ADDR_EXPR && POINTER_TYPE_P (type)) +{ + tree base = TREE_OPER
[PATCH 0/3] GCC 7: i386: Move struct ix86_frame to machine_function
This patch set makes ix86_frame available to i386 code generation. They are needed to backport the patch set of -mindirect-branch= to mitigate variant #2 of the speculative execution vulnerabilities on x86 processors identified by CVE-2017-5715, aka Spectre. H.J. Lu (2): i386: Move struct ix86_frame to machine_function i386: More use reference of struct ix86_frame to avoid copy hjl (1): i386: Use reference of struct ix86_frame to avoid copy gcc/config/i386/i386.c | 73 +- gcc/config/i386/i386.h | 53 +++- 2 files changed, 65 insertions(+), 61 deletions(-) -- 2.14.3
[PATCH 1/3] GCC 7: i386: Move struct ix86_frame to machine_function
Make ix86_frame available to i386 code generation. This is needed to backport the patch set of -mindirect-branch= to mitigate variant #2 of the speculative execution vulnerabilities on x86 processors identified by CVE-2017-5715, aka Spectre. Backport from mainline * config/i386/i386.c (ix86_frame): Moved to ... * config/i386/i386.h (ix86_frame): Here. (machine_function): Add frame. * config/i386/i386.c (ix86_compute_frame_layout): Repace the frame argument with &cfun->machine->frame. (ix86_can_use_return_insn_p): Don't pass &frame to ix86_compute_frame_layout. Copy frame from cfun->machine->frame. (ix86_can_eliminate): Likewise. (ix86_expand_prologue): Likewise. (ix86_expand_epilogue): Likewise. (ix86_expand_split_stack_prologue): Likewise. --- gcc/config/i386/i386.c | 68 ++ gcc/config/i386/i386.h | 53 ++- 2 files changed, 65 insertions(+), 56 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 8a3782c0298..813337242d8 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2444,53 +2444,6 @@ struct GTY(()) stack_local_entry { struct stack_local_entry *next; }; -/* Structure describing stack frame layout. - Stack grows downward: - - [arguments] - <- ARG_POINTER - saved pc - - saved static chain if ix86_static_chain_on_stack - - saved frame pointer if frame_pointer_needed - <- HARD_FRAME_POINTER - [saved regs] - <- regs_save_offset - [padding0] - - [saved SSE regs] - <- sse_regs_save_offset - [padding1] | - |<- FRAME_POINTER - [va_arg registers] | - | - [frame]| - | - [padding2] | = to_allocate - <- STACK_POINTER - */ -struct ix86_frame -{ - int nsseregs; - int nregs; - int va_arg_size; - int red_zone_size; - int outgoing_arguments_size; - - /* The offsets relative to ARG_POINTER. */ - HOST_WIDE_INT frame_pointer_offset; - HOST_WIDE_INT hard_frame_pointer_offset; - HOST_WIDE_INT stack_pointer_offset; - HOST_WIDE_INT hfp_save_offset; - HOST_WIDE_INT reg_save_offset; - HOST_WIDE_INT sse_reg_save_offset; - - /* When save_regs_using_mov is set, emit prologue using - move instead of push instructions. */ - bool save_regs_using_mov; -}; - /* Which cpu are we scheduling for. */ enum attr_cpu ix86_schedule; @@ -2582,7 +2535,7 @@ static unsigned int ix86_function_arg_boundary (machine_mode, const_tree); static rtx ix86_static_chain (const_tree, bool); static int ix86_function_regparm (const_tree, const_tree); -static void ix86_compute_frame_layout (struct ix86_frame *); +static void ix86_compute_frame_layout (void); static bool ix86_expand_vector_init_one_nonzero (bool, machine_mode, rtx, rtx, int); static void ix86_add_new_builtins (HOST_WIDE_INT, HOST_WIDE_INT); @@ -11903,7 +11856,8 @@ ix86_can_use_return_insn_p (void) if (crtl->args.pops_args && crtl->args.size >= 32768) return 0; - ix86_compute_frame_layout (&frame); + ix86_compute_frame_layout (); + frame = cfun->machine->frame; return (frame.stack_pointer_offset == UNITS_PER_WORD && (frame.nregs + frame.nsseregs) == 0); } @@ -12389,8 +12343,8 @@ ix86_can_eliminate (const int from, const int to) HOST_WIDE_INT ix86_initial_elimination_offset (int from, int to) { - struct ix86_frame frame; - ix86_compute_frame_layout (&frame); + ix86_compute_frame_layout (); + struct ix86_frame frame = cfun->machine->frame; if (from == ARG_POINTER_REGNUM && to == HARD_FRAME_POINTER_REGNUM) return frame.hard_frame_pointer_offset; @@ -12429,8 +12383,9 @@ ix86_builtin_setjmp_frame_value (void) /* Fill structure ix86_frame about frame of currently computed function. */ static void -ix86_compute_frame_layout (struct ix86_frame *frame) +ix86_compute_frame_layout (void) { + struct ix86_frame *frame = &cfun->machine->frame; unsigned HOST_WIDE_INT stack_alignment_needed; HOST_WIDE_INT offset; unsigned HOST_WIDE_INT preferred_alignment; @@ -13737,7 +13692,8 @@ ix86_expand_prologue (void) m->fs.sp_offset = INCOMING_FRAME_SP_OFFSET; m->fs.sp_valid = true; - ix86_compute_frame_layout (&frame); + ix86_compute_frame_layout (); + frame = m->frame; if (!TARGET_64BIT && ix86_function_ms_hook_prologue (current_function_decl)) { @@ -14405,7 +14361,8 @@ ix86_expand_epilogue (int style) bool using_drap; ix86_finalize_stack_realign_flags (); - ix86_compute_frame_layout (&frame); + ix86_compute_frame_layout ()
[PATCH 3/3] GCC 7: i386: More use reference of struct ix86_frame to avoid copy
When there is no need to make a copy of ix86_frame, we can use reference of struct ix86_frame to avoid copy. Backport from mainline * config/i386/i386.c (ix86_expand_prologue): Use reference of struct ix86_frame. (ix86_expand_epilogue): Likewise. --- gcc/config/i386/i386.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 397ef7cac26..986e6d79584 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -13667,7 +13667,6 @@ ix86_expand_prologue (void) { struct machine_function *m = cfun->machine; rtx insn, t; - struct ix86_frame frame; HOST_WIDE_INT allocate; bool int_registers_saved; bool sse_registers_saved; @@ -13691,7 +13690,7 @@ ix86_expand_prologue (void) m->fs.sp_valid = true; ix86_compute_frame_layout (); - frame = m->frame; + struct ix86_frame &frame = cfun->machine->frame; if (!TARGET_64BIT && ix86_function_ms_hook_prologue (current_function_decl)) { @@ -14354,13 +14353,12 @@ ix86_expand_epilogue (int style) { struct machine_function *m = cfun->machine; struct machine_frame_state frame_state_save = m->fs; - struct ix86_frame frame; bool restore_regs_via_mov; bool using_drap; ix86_finalize_stack_realign_flags (); ix86_compute_frame_layout (); - frame = m->frame; + struct ix86_frame &frame = cfun->machine->frame; m->fs.sp_valid = (!frame_pointer_needed || (crtl->sp_is_unchanging -- 2.14.3
[PATCH 2/3] GCC 7: i386: Use reference of struct ix86_frame to avoid copy
From: hjl When there is no need to make a copy of ix86_frame, we can use reference of struct ix86_frame to avoid copy. Tested on x86-64. Backport from mainline * config/i386/i386.c (ix86_can_use_return_insn_p): Use reference of struct ix86_frame. (ix86_initial_elimination_offset): Likewise. (ix86_expand_split_stack_prologue): Likewise. --- gcc/config/i386/i386.c | 9 +++-- 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 813337242d8..397ef7cac26 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -11843,8 +11843,6 @@ symbolic_reference_mentioned_p (rtx op) bool ix86_can_use_return_insn_p (void) { - struct ix86_frame frame; - /* Don't use `ret' instruction in interrupt handler. */ if (! reload_completed || frame_pointer_needed @@ -11857,7 +11855,7 @@ ix86_can_use_return_insn_p (void) return 0; ix86_compute_frame_layout (); - frame = cfun->machine->frame; + struct ix86_frame &frame = cfun->machine->frame; return (frame.stack_pointer_offset == UNITS_PER_WORD && (frame.nregs + frame.nsseregs) == 0); } @@ -12344,7 +12342,7 @@ HOST_WIDE_INT ix86_initial_elimination_offset (int from, int to) { ix86_compute_frame_layout (); - struct ix86_frame frame = cfun->machine->frame; + struct ix86_frame &frame = cfun->machine->frame; if (from == ARG_POINTER_REGNUM && to == HARD_FRAME_POINTER_REGNUM) return frame.hard_frame_pointer_offset; @@ -14860,7 +14858,6 @@ static GTY(()) rtx split_stack_fn_large; void ix86_expand_split_stack_prologue (void) { - struct ix86_frame frame; HOST_WIDE_INT allocate; unsigned HOST_WIDE_INT args_size; rtx_code_label *label; @@ -14873,7 +14870,7 @@ ix86_expand_split_stack_prologue (void) ix86_finalize_stack_realign_flags (); ix86_compute_frame_layout (); - frame = cfun->machine->frame; + struct ix86_frame &frame = cfun->machine->frame; allocate = frame.stack_pointer_offset - INCOMING_FRAME_SP_OFFSET; /* This is the label we will branch to if we have enough stack -- 2.14.3
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
Now my patch set has been checked into trunk. Here is a patch set to move struct ix86_frame to machine_function on GCC 7, which is needed to backport the patch set to GCC 7: https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01239.html https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01240.html https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01241.html OK for gcc-7-branch? Thanks. -- H.J.
Re: [PATCH 1/5] x86: Add -mindirect-branch=
On Sun, Jan 14, 2018 at 4:52 AM, H.J. Lu wrote: > On Sun, Jan 14, 2018 at 3:06 AM, Jan Hubicka wrote: >>> On 2018.01.14 at 11:46 +0100, Jan Hubicka wrote: >>> > > gcc/ >>> > > >>> > > * config/i386/i386-opts.h (indirect_branch): New. >>> > > * config/i386/i386-protos.h (ix86_output_indirect_jmp): Likewise. >>> > > * config/i386/i386.c (ix86_using_red_zone): Disallow red-zone >>> > > with local indirect jump when converting indirect call and jump. >>> > > (ix86_set_indirect_branch_type): New. >>> > > (ix86_set_current_function): Call ix86_set_indirect_branch_type. >>> > > (indirectlabelno): New. >>> > > (indirect_thunk_needed): Likewise. >>> > > (indirect_thunk_bnd_needed): Likewise. >>> > > (indirect_thunks_used): Likewise. >>> > > (indirect_thunks_bnd_used): Likewise. >>> > > (INDIRECT_LABEL): Likewise. >>> > > (indirect_thunk_name): Likewise. >>> > > (output_indirect_thunk): Likewise. >>> > > (output_indirect_thunk_function): Likewise. >>> > > (ix86_output_indirect_branch): Likewise. >>> > > (ix86_output_indirect_jmp): Likewise. >>> > > (ix86_code_end): Call output_indirect_thunk_function if needed. >>> > > (ix86_output_call_insn): Call ix86_output_indirect_branch if >>> > > needed. >>> > > (ix86_handle_fndecl_attribute): Handle indirect_branch. >>> > > (ix86_attribute_table): Add indirect_branch. >>> > > * config/i386/i386.h (machine_function): Add indirect_branch_type >>> > > and has_local_indirect_jump. >>> > > * config/i386/i386.md (indirect_jump): Set has_local_indirect_jump >>> > > to true. >>> > > (tablejump): Likewise. >>> > > (*indirect_jump): Use ix86_output_indirect_jmp. >>> > > (*tablejump_1): Likewise. >>> > > (simple_return_indirect_internal): Likewise. >>> > > * config/i386/i386.opt (mindirect-branch=): New option. >>> > > (indirect_branch): New. >>> > > (keep): Likewise. >>> > > (thunk): Likewise. >>> > > (thunk-inline): Likewise. >>> > > (thunk-extern): Likewise. >>> > > * doc/extend.texi: Document indirect_branch function attribute. >>> > > * doc/invoke.texi: Document -mindirect-branch= option. >>> > > + >>> > > + /* Pause . */ >>> > > + fprintf (asm_out_file, "\tpause\n"); >>> > >>> > OK, but please prepare incremental patches to choose between pause and >>> > lefence >>> > as needed for AMD CPUs and check for large code model. >>> >>> Why not use both? That would make everybody happy, no? >>> >>> pause >>> lfence >>> jmp >> >> Yes, according to my understanding we want both pause and lefence by default >> (it is expensive anyway) + command line option to control which one to use? >> >> Probably thus the first patch should default to both. > > Done. This is the patch I am checking in. > Here is the backport for GCC 7. OK for gcc-7-branch? -- H.J. From 09f7c546376f7ed6770fc64f24aed77229f95f67 Mon Sep 17 00:00:00 2001 From: "H.J. Lu" Date: Sat, 6 Jan 2018 22:29:55 -0800 Subject: [PATCH 1/5] x86: Add -mindirect-branch= Add -mindirect-branch= option to convert indirect call and jump to call and return thunks. The default is 'keep', which keeps indirect call and jump unmodified. 'thunk' converts indirect call and jump to call and return thunk. 'thunk-inline' converts indirect call and jump to inlined call and return thunk. 'thunk-extern' converts indirect call and jump to external call and return thunk provided in a separate object file. You can control this behavior for a specific function by using the function attribute indirect_branch. 2 kinds of thunks are geneated. Memory thunk where the function address is at the top of the stack: __x86_indirect_thunk: call L2 L1: pause lfence jmp L1 L2: lea 8(%rsp), %rsp|lea 4(%esp), %esp ret Indirect jmp via memory, "jmp mem", is converted to push memory jmp __x86_indirect_thunk Indirect call via memory, "call mem", is converted to jmp L2 L1: push [mem] jmp __x86_indirect_thunk L2: call L1 Register thunk where the function address is in a register, reg: __x86_indirect_thunk_reg: call L2 L1: pause lfence jmp L1 L2: movq %reg, (%rsp)|movl%reg, (%esp) ret where reg is one of (r|e)ax, (r|e)dx, (r|e)cx, (r|e)bx, (r|e)si, (r|e)di, (r|e)bp, r8, r9, r10, r11, r12, r13, r14 and r15. Indirect jmp via register, "jmp reg", is converted to jmp __x86_indirect_thunk_reg Indirect call via register, "call reg", is converted to call __x86_indirect_thunk_reg gcc/ Backport from mainline * config/i386/i386-opts.h (indirect_branch): New. * config/i386/i386-protos.h (ix86_output_indirect_jmp): Likewise. * config/i386/i386.c (ix86_using_red_zone): Disallow red-zone with local indirect jump when converting indirect call and jump. (ix86_set_indirect_branch_type): New. (ix86_set_current_function): Call ix86_set_indirect_branch_type. (indirectlabelno): New. (indirect_thunk_needed): Likewise. (indirect_thunk_bnd_needed): Likewise. (indirect_thunks_used): Likewise. (indirect_thunks_bnd_used): Likewise. (INDIRECT_LABEL): Likewise.
Re: [PATCH 2/5] x86: Add -mfunction-return=
On Sun, Jan 14, 2018 at 4:54 AM, H.J. Lu wrote: > On Sun, Jan 14, 2018 at 3:31 AM, H.J. Lu wrote: >> On Sun, Jan 14, 2018 at 2:48 AM, Jan Hubicka wrote: Add -mfunction-return= option to convert function return to call and return thunks. The default is 'keep', which keeps function return unmodified. 'thunk' converts function return to call and return thunk. 'thunk-inline' converts function return to inlined call and return thunk. 'thunk-extern' converts function return to external call and return thunk provided in a separate object file. You can control this behavior for a specific function by using the function attribute function_return. Function return thunk is the same as memory thunk for -mindirect-branch= where the return address is at the top of the stack: __x86_return_thunk: call L2 L1: pause jmp L1 L2: lea 8(%rsp), %rsp|lea 4(%esp), %esp ret and function return becomes jmp __x86_return_thunk -mindirect-branch= tests are updated with -mfunction-return=keep to avoid false test failures when -mfunction-return=thunk is added to RUNTESTFLAGS for "make check". gcc/ * config/i386/i386-protos.h (ix86_output_function_return): New. * config/i386/i386.c (ix86_set_indirect_branch_type): Also set function_return_type. (indirect_thunk_name): Add ret_p to indicate thunk for function return. (output_indirect_thunk_function): Pass false to indirect_thunk_name. (ix86_output_indirect_branch): Likewise. (output_indirect_thunk_function): Create alias for function return thunk if regno < 0. (ix86_output_function_return): New function. (ix86_handle_fndecl_attribute): Handle function_return. (ix86_attribute_table): Add function_return. * config/i386/i386.h (machine_function): Add function_return_type. * config/i386/i386.md (simple_return_internal): Use ix86_output_function_return. (simple_return_internal_long): Likewise. * config/i386/i386.opt (mfunction-return=): New option. (indirect_branch): Mention -mfunction-return=. * doc/extend.texi: Document function_return function attribute. * doc/invoke.texi: Document -mfunction-return= option. >>> >>> The implementation is reasonable, but I still do not quite understand >>> if we really need this change. Celarly killing every return is going >>> to have significant impact and it is apparently not used by Linux kernel. >>> What is the main motivation for it? >>> >>> Honza >> >> See: >> >> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00540.html >> >> I will check it in. >> > > Here is the patch I am checking in. > > Here is the backport for GCC 7. OK for gcc-7-branch? -- H.J.
Re: [PATCH 3/5] x86: Add -mindirect-branch-register
On Sun, Jan 14, 2018 at 2:46 AM, Jan Hubicka wrote: >> Add -mindirect-branch-register to force indirect branch via register. >> This is implemented by disabling patterns of indirect branch via memory, >> similar to TARGET_X32. >> >> -mindirect-branch= and -mfunction-return= tests are updated with >> -mno-indirect-branch-register to avoid false test failures when >> -mindirect-branch-register is added to RUNTESTFLAGS for "make check". >> >> gcc/ >> >> * config/i386/constraints.md (Bs): Disallow memory operand for >> -mindirect-branch-register. >> (Bw): Likewise. >> * config/i386/predicates.md (indirect_branch_operand): Likewise. >> (GOT_memory_operand): Likewise. >> (call_insn_operand): Likewise. >> (sibcall_insn_operand): Likewise. >> (GOT32_symbol_operand): Likewise. >> * config/i386/i386.md (indirect_jump): Call convert_memory_address >> for -mindirect-branch-register. >> (tablejump): Likewise. >> (*sibcall_memory): Likewise. >> (*sibcall_value_memory): Likewise. >> Disallow peepholes of indirect call and jump via memory for >> -mindirect-branch-register. >> (*call_pop): Replace m with Bw. >> (*call_value_pop): Likewise. >> (*sibcall_pop_memory): Replace m with Bs. >> * config/i386/i386.opt (mindirect-branch-register): New option. >> * doc/invoke.texi: Document -mindirect-branch-register option. > > OK. > Honza Here is the backport for GCC 7. OK for gcc-7-branch? -- H.J. From 139dd2c61a11430263f91030910e2b63a73a11e7 Mon Sep 17 00:00:00 2001 From: "H.J. Lu" Date: Sat, 6 Jan 2018 22:29:56 -0800 Subject: [PATCH 3/5] x86: Add -mindirect-branch-register Add -mindirect-branch-register to force indirect branch via register. This is implemented by disabling patterns of indirect branch via memory, similar to TARGET_X32. -mindirect-branch= and -mfunction-return= tests are updated with -mno-indirect-branch-register to avoid false test failures when -mindirect-branch-register is added to RUNTESTFLAGS for "make check". gcc/ Backport from mainline * config/i386/constraints.md (Bs): Disallow memory operand for -mindirect-branch-register. (Bw): Likewise. * config/i386/predicates.md (indirect_branch_operand): Likewise. (GOT_memory_operand): Likewise. (call_insn_operand): Likewise. (sibcall_insn_operand): Likewise. (GOT32_symbol_operand): Likewise. * config/i386/i386.md (indirect_jump): Call convert_memory_address for -mindirect-branch-register. (tablejump): Likewise. (*sibcall_memory): Likewise. (*sibcall_value_memory): Likewise. Disallow peepholes of indirect call and jump via memory for -mindirect-branch-register. (*call_pop): Replace m with Bw. (*call_value_pop): Likewise. (*sibcall_pop_memory): Replace m with Bs. * config/i386/i386.opt (mindirect-branch-register): New option. * doc/invoke.texi: Document -mindirect-branch-register option. gcc/testsuite/ Backport from mainline * gcc.target/i386/indirect-thunk-1.c (dg-options): Add -mno-indirect-branch-register. * gcc.target/i386/indirect-thunk-2.c: Likewise. * gcc.target/i386/indirect-thunk-3.c: Likewise. * gcc.target/i386/indirect-thunk-4.c: Likewise. * gcc.target/i386/indirect-thunk-5.c: Likewise. * gcc.target/i386/indirect-thunk-6.c: Likewise. * gcc.target/i386/indirect-thunk-7.c: Likewise. * gcc.target/i386/indirect-thunk-attr-1.c: Likewise. * gcc.target/i386/indirect-thunk-attr-2.c: Likewise. * gcc.target/i386/indirect-thunk-attr-3.c: Likewise. * gcc.target/i386/indirect-thunk-attr-4.c: Likewise. * gcc.target/i386/indirect-thunk-attr-5.c: Likewise. * gcc.target/i386/indirect-thunk-attr-6.c: Likewise. * gcc.target/i386/indirect-thunk-attr-7.c: Likewise. * gcc.target/i386/indirect-thunk-bnd-1.c: Likewise. * gcc.target/i386/indirect-thunk-bnd-2.c: Likewise. * gcc.target/i386/indirect-thunk-bnd-3.c: Likewise. * gcc.target/i386/indirect-thunk-bnd-4.c: Likewise. * gcc.target/i386/indirect-thunk-extern-1.c: Likewise. * gcc.target/i386/indirect-thunk-extern-2.c: Likewise. * gcc.target/i386/indirect-thunk-extern-3.c: Likewise. * gcc.target/i386/indirect-thunk-extern-4.c: Likewise. * gcc.target/i386/indirect-thunk-extern-5.c: Likewise. * gcc.target/i386/indirect-thunk-extern-6.c: Likewise. * gcc.target/i386/indirect-thunk-extern-7.c: Likewise. * gcc.target/i386/indirect-thunk-inline-1.c: Likewise. * gcc.target/i386/indirect-thunk-inline-2.c: Likewise. * gcc.target/i386/indirect-thunk-inline-3.c: Likewise. * gcc.target/i386/indirect-thunk-inline-4.c: Likewise. * gcc.target/i386/indirect-thunk-inline-5.c: Likewise. * gcc.target/i386/indirect-thunk-inline-6.c: Likewise. * gcc.target/i386/indirect-thunk-inline-7.c: Likewise. * gcc.target/i386/ret-thunk-10.c: Likewise. * gcc.target/i386/ret-thunk-11.c: Likewise. * gcc.target/i386/ret-thunk-12.c: Likewise. * gcc.target/i386/ret-thunk-13.c: Likewise. * gcc.target/i386/ret-thunk-14.c: Likewise. * gcc.target/i386/ret-thunk-15.c: Likewise. * g
Re: [PATCH 4/5] x86: Add 'V' register operand modifier
On Sun, Jan 14, 2018 at 2:47 AM, Jan Hubicka wrote: >> Add 'V', a special modifier which prints the name of the full integer >> register without '%'. For >> >> extern void (*func_p) (void); >> >> void >> foo (void) >> { >> asm ("call __x86_indirect_thunk_%V0" : : "a" (func_p)); >> } >> >> it generates: >> >> foo: >> movqfunc_p(%rip), %rax >> call__x86_indirect_thunk_rax >> ret >> >> gcc/ >> >> * config/i386/i386.c (print_reg): Print the name of the full >> integer register without '%'. >> (ix86_print_operand): Handle 'V'. >>* doc/extend.texi: Document 'V' modifier. >> >> gcc/testsuite/ >> >> * gcc.target/i386/indirect-thunk-register-4.c: New test. > > OK. > Honza Here is the backport for GCC 7. OK for gcc-7-branch? -- H.J. From 5e977dfedb93e764dc480c0e0674500590ef5604 Mon Sep 17 00:00:00 2001 From: "H.J. Lu" Date: Sat, 6 Jan 2018 22:29:56 -0800 Subject: [PATCH 4/5] x86: Add 'V' register operand modifier Add 'V', a special modifier which prints the name of the full integer register without '%'. For extern void (*func_p) (void); void foo (void) { asm ("call __x86_indirect_thunk_%V0" : : "a" (func_p)); } it generates: foo: movq func_p(%rip), %rax call __x86_indirect_thunk_rax ret gcc/ Backport from mainline * config/i386/i386.c (print_reg): Print the name of the full integer register without '%'. (ix86_print_operand): Handle 'V'. * doc/extend.texi: Document 'V' modifier. gcc/testsuite/ Backport from mainline * gcc.target/i386/indirect-thunk-register-4.c: New test. --- gcc/config/i386/i386.c| 13 - gcc/doc/extend.texi | 3 +++ gcc/testsuite/gcc.target/i386/indirect-thunk-register-4.c | 13 + 3 files changed, 28 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-register-4.c diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 4bfe2fa8c1d..e32de13688a 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -17925,6 +17925,7 @@ put_condition_code (enum rtx_code code, machine_mode mode, bool reverse, If CODE is 'h', pretend the reg is the 'high' byte register. If CODE is 'y', print "st(0)" instead of "st", if the reg is stack op. If CODE is 'd', duplicate the operand for AVX instruction. + If CODE is 'V', print naked full integer register name without %. */ void @@ -17935,7 +17936,7 @@ print_reg (rtx x, int code, FILE *file) unsigned int regno; bool duplicated; - if (ASSEMBLER_DIALECT == ASM_ATT) + if (ASSEMBLER_DIALECT == ASM_ATT && code != 'V') putc ('%', file); if (x == pc_rtx) @@ -17983,6 +17984,14 @@ print_reg (rtx x, int code, FILE *file) return; } + if (code == 'V') +{ + if (GENERAL_REGNO_P (regno)) + msize = GET_MODE_SIZE (word_mode); + else + error ("'V' modifier on non-integer register"); +} + duplicated = code == 'd' && TARGET_AVX; switch (msize) @@ -18102,6 +18111,7 @@ print_reg (rtx x, int code, FILE *file) & -- print some in-use local-dynamic symbol name. H -- print a memory address offset by 8; used for sse high-parts Y -- print condition for XOP pcom* instruction. + V -- print naked full integer register name without %. + -- print a branch hint as 'cs' or 'ds' prefix ; -- print a semicolon (after prefixes due to bug in older gas). ~ -- print "i" if TARGET_AVX2, "f" otherwise. @@ -18326,6 +18336,7 @@ ix86_print_operand (FILE *file, rtx x, int code) case 'X': case 'P': case 'p': + case 'V': break; case 's': diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 46e0a3623a6..9db9e0e27e9 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -8778,6 +8778,9 @@ The table below shows the list of supported modifiers and their effects. @tab @code{2} @end multitable +@code{V} is a special modifier which prints the name of the full integer +register without @code{%}. + @anchor{x86floatingpointasmoperands} @subsubsection x86 Floating-Point @code{asm} Operands diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-register-4.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-register-4.c new file mode 100644 index 000..f0cd9b75be8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-register-4.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mindirect-branch=keep -fno-pic" } */ + +extern void (*func_p) (void); + +void +foo (void) +{ + asm("call __x86_indirect_thunk_%V0" : : "a" (func_p)); +} + +/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_eax" { target ia32 } } } */ +/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_rax" { target { ! ia32 } } } } */ -- 2.14.3
Re: [PATCH 5/5] x86: Disallow -mindirect-branch=/-mfunction-return= with -mcmodel=large
On Sun, Jan 14, 2018 at 4:57 AM, H.J. Lu wrote: > On Sun, Jan 14, 2018 at 2:47 AM, Jan Hubicka wrote: >>> Since the thunk function may not be reachable in large code model, >>> -mcmodel=large is incompatible with -mindirect-branch=thunk, >>> -mindirect-branch=thunk-extern, -mfunction-return=thunk and >>> -mfunction-return=thunk-extern. Issue an error when they are used with >>> -mcmodel=large. >>> >>> gcc/ >>> >>> * config/i386/i386.c (ix86_set_indirect_branch_type): Disallow >>> -mcmodel=large with -mindirect-branch=thunk, >>> -mindirect-branch=thunk-extern, -mfunction-return=thunk and >>> -mfunction-return=thunk-extern. >>> * doc/invoke.texi: Document -mcmodel=large is incompatible with >>> -mindirect-branch=thunk, -mindirect-branch=thunk-extern, >>> -mfunction-return=thunk and -mfunction-return=thunk-extern. >>> >>> gcc/testsuite/ >>> >>> * gcc.target/i386/indirect-thunk-10.c: New test. >>> * gcc.target/i386/indirect-thunk-8.c: Likewise. >>> * gcc.target/i386/indirect-thunk-9.c: Likewise. >>> * gcc.target/i386/indirect-thunk-attr-10.c: Likewise. >>> * gcc.target/i386/indirect-thunk-attr-11.c: Likewise. >>> * gcc.target/i386/indirect-thunk-attr-9.c: Likewise. >>> * gcc.target/i386/ret-thunk-17.c: Likewise. >>> * gcc.target/i386/ret-thunk-18.c: Likewise. >>> * gcc.target/i386/ret-thunk-19.c: Likewise. >>> * gcc.target/i386/ret-thunk-20.c: Likewise. >>> * gcc.target/i386/ret-thunk-21.c: Likewise. >> >> OK. >> Honza > > This is the patch I am checking in to only run tests on LP64 since > -mcmodel=large doesn't work with -mx32. > Here is the backport for GCC 7. OK for gcc-7-branch? -- H.J. From 13dce7cceef28026c4fc2e505d724526141fe4c1 Mon Sep 17 00:00:00 2001 From: "H.J. Lu" Date: Sat, 13 Jan 2018 18:01:54 -0800 Subject: [PATCH 5/5] x86: Disallow -mindirect-branch=/-mfunction-return= with -mcmodel=large Since the thunk function may not be reachable in large code model, -mcmodel=large is incompatible with -mindirect-branch=thunk, -mindirect-branch=thunk-extern, -mfunction-return=thunk and -mfunction-return=thunk-extern. Issue an error when they are used with -mcmodel=large. gcc/ Backport from mainline * config/i386/i386.c (ix86_set_indirect_branch_type): Disallow -mcmodel=large with -mindirect-branch=thunk, -mindirect-branch=thunk-extern, -mfunction-return=thunk and -mfunction-return=thunk-extern. * doc/invoke.texi: Document -mcmodel=large is incompatible with -mindirect-branch=thunk, -mindirect-branch=thunk-extern, -mfunction-return=thunk and -mfunction-return=thunk-extern. gcc/testsuite/ Backport from mainline * gcc.target/i386/indirect-thunk-10.c: New test. * gcc.target/i386/indirect-thunk-8.c: Likewise. * gcc.target/i386/indirect-thunk-9.c: Likewise. * gcc.target/i386/indirect-thunk-attr-10.c: Likewise. * gcc.target/i386/indirect-thunk-attr-11.c: Likewise. * gcc.target/i386/indirect-thunk-attr-9.c: Likewise. * gcc.target/i386/ret-thunk-17.c: Likewise. * gcc.target/i386/ret-thunk-18.c: Likewise. * gcc.target/i386/ret-thunk-19.c: Likewise. * gcc.target/i386/ret-thunk-20.c: Likewise. * gcc.target/i386/ret-thunk-21.c: Likewise. --- gcc/config/i386/i386.c | 26 ++ gcc/doc/invoke.texi| 11 + gcc/testsuite/gcc.target/i386/indirect-thunk-10.c | 7 ++ gcc/testsuite/gcc.target/i386/indirect-thunk-8.c | 7 ++ gcc/testsuite/gcc.target/i386/indirect-thunk-9.c | 7 ++ .../gcc.target/i386/indirect-thunk-attr-10.c | 9 .../gcc.target/i386/indirect-thunk-attr-11.c | 9 .../gcc.target/i386/indirect-thunk-attr-9.c| 9 gcc/testsuite/gcc.target/i386/ret-thunk-17.c | 7 ++ gcc/testsuite/gcc.target/i386/ret-thunk-18.c | 8 +++ gcc/testsuite/gcc.target/i386/ret-thunk-19.c | 8 +++ gcc/testsuite/gcc.target/i386/ret-thunk-20.c | 9 gcc/testsuite/gcc.target/i386/ret-thunk-21.c | 9 13 files changed, 126 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-10.c create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-8.c create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-9.c create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-attr-10.c create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-attr-11.c create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-attr-9.c create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-17.c create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-18.c create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-19.c create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-20.c create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-21.c diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index e32de13688a..318a71840c9 100644 --- a/gcc/conf
Re: [PATCH 0/5] Make std::future::wait_* use std::chrono::steady_clock when required
Hi Torvald, Thanks for reviewing this change. On Saturday 13 January 2018 at 16:29:57 +0100, Torvald Riegel wrote: > On Sun, 2018-01-07 at 20:55 +, Mike Crowe wrote: > > This is a first attempt to make std::future::wait_until and > > std::future::wait_for make correct use of > > std::chrono::steady_clock/CLOCK_MONOTONIC. It also makes > > std::future::wait_until react to changes to CLOCK_REALTIME during the > > wait, but only when passed a std::chrono::system_clock time point. > > I have comments on the design. > > First, I don't think we should not change > __atomic_futex_unsigned_base::_M_futex_wait_until, as there's a risk > that we'll change behavior of existing applications that work as > expected. I assume you mean "I don't think we should change" or "I think we should not change"... :-) The only way I can see that behaviour will change for existing programs is when the system clock changes (i.e. when someone calls settimeofday.) In the existing code, the maximum wait time is fixed once gettimeofday is called to calculate the relative timeout. When using FUTEX_CLOCK_REALTIME, the maximum wait can change based on changes to the system clock after that point. It appears that glibc made this transition successfully and currently uses FUTEX_CLOCK_REALTIME. I think that the new behaviour is better than the old behaviour. Or perhaps I've missed another possibility. Did you have another risk in mind? > Instead, ISTM we should additionally expose the two options we have at > the level of futexes: > * Relative timeout using CLOCK_MONOTONIC > * Absolute timeout using CLOCK_REALTIME (which will fall back to the > former on old kernels, which is fine I think). > > Then we do the following translations from functions that programs would > call to the new futex functions: > > 1) wait_for is a loop in which we load the current time from the steady > clock, then call the relative futex wait, and if that returns for a > spurious reason (ie, neither timeout nor is the expected value present), > we reduce the prior relative amount by the difference between the time > before the futex wait and the current time. If we're going to loop on a relative timeout it sounds safer to convert it to an absolute (steady clock) timeout. That way we won't risk increasing the timeout if the scheduler decides not to run us at an inopportune moment between waits. _M_load_when_equal_for already does this. _M_load_and_test_until already has a loop for spurious wakeup. I think that it makes sense to only loop at one level. That loop relies on the timeout being absolute, which is why my _M_load_and_test_until_steady also uses an absolute timeout. > 2) wait_until using the steady clock is a loop similar to wait_for, just > that we additionally compute the initial relative timeout. Clearly an absolute wait can be implemented in terms of a relative one and vice-versa, but at least in my attempts to write them I find the code easier to understand (and therefore get right) if the fundamental wait is the absolute one and the relative one is implemented on top of it. > 3) wait_until using the system clock is a loop that uses > absolute-timeout futex wait. > > 4) For wait_until using an unknown clock, I'd say that synching to the > system clock is the right approach. Using wait_until indicates that the > programmer wanted to have a point in time, not a duration. For my embedded and desktop point of view, the system clock should not be trusted, can suddenly change in any direction and doesn't necessarily help identify a point in real time. If we assume that the non-standard clock is advancing steadily too, then steady_clock is a better match than system_clock. If you have a machine that has its system clock locked with PTP to an atomic clock then you might think the opposite. However, even in that situation you're reliant on steady_clock being reliable enough for short periods of time anyway, because that shares the same local clock as system_time. > Does this work for you? Not yet, but maybe there's parts that I don't fully understand the reasoning behind. > If so, could you provide a revised patch that uses this approach and > includes this approach in the documentation? > (Sorry for the lack of comments in the current code). I'm definitely willing to improve the current code rather than just add to it. Mike.
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Fri, Jan 12, 2018 at 9:01 AM, Uros Bizjak wrote: > On Thu, Jan 11, 2018 at 2:28 PM, H.J. Lu wrote: > >> Hi Uros, >> >> Can you take a look at my x86 backend changes so that they are ready >> to check in once we have consensus. > > Please finish the talks about the correct approach first. Once the > consensus is reached, please post the final version of the patches for > review. > > BTW: I have no detailed insight in these issues, so I'll look mostly > at the implementation details, probably early next week. One general remark is on the usage of -1 as an invalid register number. We have INVALID_REGNUM definition for this, and many tests, like: if (regno >= 0) could become much more informative: if (regno != INVALID_REGNUM) Uros.
[committed] Small i?86/x86_64-linux config.gcc extra_headers fixes
Hi! I've noticed that i?86-*-* has 2 gfniintrin.h entries in extra_headers and x86_64-*-* even 3, one should be enough. Fixed thusly, committed to trunk as obvious. 2018-01-14 Jakub Jelinek * config.gcc (i[34567]86-*-*): Remove one duplicate gfniintrin.h entry from extra_headers. (x86_64-*-*): Remove two duplicate gfniintrin.h entries from extra_headers, make the list bitwise identical to the i?86-*-* one. --- gcc/config.gcc (revision 25) +++ gcc/config.gcc (revision 256667) @@ -381,9 +381,8 @@ i[34567]86-*-*) clzerointrin.h pkuintrin.h sgxintrin.h cetintrin.h gfniintrin.h cet.h avx512vbmi2intrin.h avx512vbmi2vlintrin.h avx512vnniintrin.h - avx512vnnivlintrin.h gfniintrin.h vaesintrin.h - vpclmulqdqintrin.h avx512vpopcntdqvlintrin.h - avx512bitalgintrin.h" + avx512vnnivlintrin.h vaesintrin.h vpclmulqdqintrin.h + avx512vpopcntdqvlintrin.h avx512bitalgintrin.h" ;; x86_64-*-*) cpu_type=i386 @@ -396,7 +395,7 @@ x86_64-*-*) nmmintrin.h bmmintrin.h fma4intrin.h wmmintrin.h immintrin.h x86intrin.h avxintrin.h xopintrin.h ia32intrin.h cross-stdarg.h lwpintrin.h popcntintrin.h - lzcntintrin.h bmiintrin.h tbmintrin.h bmi2intrin.h + lzcntintrin.h bmiintrin.h bmi2intrin.h tbmintrin.h avx2intrin.h avx512fintrin.h fmaintrin.h f16cintrin.h rtmintrin.h xtestintrin.h rdseedintrin.h prfchwintrin.h adxintrin.h fxsrintrin.h xsaveintrin.h xsaveoptintrin.h @@ -410,8 +409,7 @@ x86_64-*-*) clzerointrin.h pkuintrin.h sgxintrin.h cetintrin.h gfniintrin.h cet.h avx512vbmi2intrin.h avx512vbmi2vlintrin.h avx512vnniintrin.h - avx512vnnivlintrin.h gfniintrin.h vaesintrin.h - vpclmulqdqintrin.h gfniintrin.h + avx512vnnivlintrin.h vaesintrin.h vpclmulqdqintrin.h avx512vpopcntdqvlintrin.h avx512bitalgintrin.h" ;; ia64-*-*) Jakub
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Sun, Jan 14, 2018 at 8:19 AM, Uros Bizjak wrote: > On Fri, Jan 12, 2018 at 9:01 AM, Uros Bizjak wrote: >> On Thu, Jan 11, 2018 at 2:28 PM, H.J. Lu wrote: >> >>> Hi Uros, >>> >>> Can you take a look at my x86 backend changes so that they are ready >>> to check in once we have consensus. >> >> Please finish the talks about the correct approach first. Once the >> consensus is reached, please post the final version of the patches for >> review. >> >> BTW: I have no detailed insight in these issues, so I'll look mostly >> at the implementation details, probably early next week. > > One general remark is on the usage of -1 as an invalid register This has been rewritten. The checked in patch no longer does that. > number. We have INVALID_REGNUM definition for this, and many tests, > like: > > if (regno >= 0) > > could become much more informative: > > if (regno != INVALID_REGNUM) -- H.J.
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Sun, Jan 14, 2018 at 5:35 PM, H.J. Lu wrote: > On Sun, Jan 14, 2018 at 8:19 AM, Uros Bizjak wrote: >> On Fri, Jan 12, 2018 at 9:01 AM, Uros Bizjak wrote: >>> On Thu, Jan 11, 2018 at 2:28 PM, H.J. Lu wrote: >>> Hi Uros, Can you take a look at my x86 backend changes so that they are ready to check in once we have consensus. >>> >>> Please finish the talks about the correct approach first. Once the >>> consensus is reached, please post the final version of the patches for >>> review. >>> >>> BTW: I have no detailed insight in these issues, so I'll look mostly >>> at the implementation details, probably early next week. >> >> One general remark is on the usage of -1 as an invalid register > > This has been rewritten. The checked in patch no longer does that. Another issue: +static void +indirect_thunk_name (char name[32], int regno, bool need_bnd_p) +{ + if (USE_HIDDEN_LINKONCE) +{ + const char *bnd = need_bnd_p ? "_bnd" : ""; + if (regno >= 0) +{ + const char *reg_prefix; + if (LEGACY_INT_REGNO_P (regno)) +reg_prefix = TARGET_64BIT ? "r" : "e"; + else +reg_prefix = ""; + sprintf (name, "__x86_indirect_thunk%s_%s%s", + bnd, reg_prefix, reg_names[regno]); +} + else +sprintf (name, "__x86_indirect_thunk%s", bnd); +} What is the benefit of reg_prefix? Can't we just live with e.g.: __x86_indirect_thunk_ax which is the true register name and is valid for 32bit and 64bit targets. Uros.
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Sun, Jan 14, 2018 at 8:39 AM, Uros Bizjak wrote: > On Sun, Jan 14, 2018 at 5:35 PM, H.J. Lu wrote: >> On Sun, Jan 14, 2018 at 8:19 AM, Uros Bizjak wrote: >>> On Fri, Jan 12, 2018 at 9:01 AM, Uros Bizjak wrote: On Thu, Jan 11, 2018 at 2:28 PM, H.J. Lu wrote: > Hi Uros, > > Can you take a look at my x86 backend changes so that they are ready > to check in once we have consensus. Please finish the talks about the correct approach first. Once the consensus is reached, please post the final version of the patches for review. BTW: I have no detailed insight in these issues, so I'll look mostly at the implementation details, probably early next week. >>> >>> One general remark is on the usage of -1 as an invalid register >> >> This has been rewritten. The checked in patch no longer does that. > > Another issue: > > +static void > +indirect_thunk_name (char name[32], int regno, bool need_bnd_p) > +{ > + if (USE_HIDDEN_LINKONCE) > +{ > + const char *bnd = need_bnd_p ? "_bnd" : ""; > + if (regno >= 0) > +{ > + const char *reg_prefix; > + if (LEGACY_INT_REGNO_P (regno)) > +reg_prefix = TARGET_64BIT ? "r" : "e"; > + else > +reg_prefix = ""; > + sprintf (name, "__x86_indirect_thunk%s_%s%s", > + bnd, reg_prefix, reg_names[regno]); > +} > + else > +sprintf (name, "__x86_indirect_thunk%s", bnd); > +} > > What is the benefit of reg_prefix? Can't we just live with e.g.: > > __x86_indirect_thunk_ax > > which is the true register name and is valid for 32bit and 64bit targets. They are used in asm statements in kernel: extern void (*func_p) (void); void foo (void) { asm ("call __x86_indirect_thunk_%V0" : : "a" (func_p)); } it generates: foo: movq func_p(%rip), %rax call __x86_indirect_thunk_rax ret -- H.J.
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Sun, Jan 14, 2018 at 5:35 PM, H.J. Lu wrote: > On Sun, Jan 14, 2018 at 8:19 AM, Uros Bizjak wrote: >> On Fri, Jan 12, 2018 at 9:01 AM, Uros Bizjak wrote: >>> On Thu, Jan 11, 2018 at 2:28 PM, H.J. Lu wrote: >>> Hi Uros, Can you take a look at my x86 backend changes so that they are ready to check in once we have consensus. >>> >>> Please finish the talks about the correct approach first. Once the >>> consensus is reached, please post the final version of the patches for >>> review. >>> >>> BTW: I have no detailed insight in these issues, so I'll look mostly >>> at the implementation details, probably early next week. >> >> One general remark is on the usage of -1 as an invalid register > > This has been rewritten. The checked in patch no longer does that. I'm looking directly into current indirect_thunk_name, output_indirect_thunk and output_indirect_thunk_function functions in i386.c which have plenty of the mentioned checks. Uros.
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Sun, Jan 14, 2018 at 08:41:54AM -0800, H.J. Lu wrote: > They are used in asm statements in kernel: > > extern void (*func_p) (void); > > void > foo (void) > { > asm ("call __x86_indirect_thunk_%V0" : : "a" (func_p)); Well, using it just with a single register classes wouldn't make much sense, then you can just use "call __x86_indirect_thunk_rax" or "call __x86_indirect_thunk_eax" depending on __x86_64__, you wouldn't need to extend anything. But supposedly if you use it with "r" or "q" or similar class this will be different. Jakub
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Sun, Jan 14, 2018 at 5:41 PM, H.J. Lu wrote: > On Sun, Jan 14, 2018 at 8:39 AM, Uros Bizjak wrote: >> On Sun, Jan 14, 2018 at 5:35 PM, H.J. Lu wrote: >>> On Sun, Jan 14, 2018 at 8:19 AM, Uros Bizjak wrote: On Fri, Jan 12, 2018 at 9:01 AM, Uros Bizjak wrote: > On Thu, Jan 11, 2018 at 2:28 PM, H.J. Lu wrote: > >> Hi Uros, >> >> Can you take a look at my x86 backend changes so that they are ready >> to check in once we have consensus. > > Please finish the talks about the correct approach first. Once the > consensus is reached, please post the final version of the patches for > review. > > BTW: I have no detailed insight in these issues, so I'll look mostly > at the implementation details, probably early next week. One general remark is on the usage of -1 as an invalid register >>> >>> This has been rewritten. The checked in patch no longer does that. >> >> Another issue: >> >> +static void >> +indirect_thunk_name (char name[32], int regno, bool need_bnd_p) >> +{ >> + if (USE_HIDDEN_LINKONCE) >> +{ >> + const char *bnd = need_bnd_p ? "_bnd" : ""; >> + if (regno >= 0) >> +{ >> + const char *reg_prefix; >> + if (LEGACY_INT_REGNO_P (regno)) >> +reg_prefix = TARGET_64BIT ? "r" : "e"; >> + else >> +reg_prefix = ""; >> + sprintf (name, "__x86_indirect_thunk%s_%s%s", >> + bnd, reg_prefix, reg_names[regno]); >> +} >> + else >> +sprintf (name, "__x86_indirect_thunk%s", bnd); >> +} >> >> What is the benefit of reg_prefix? Can't we just live with e.g.: >> >> __x86_indirect_thunk_ax >> >> which is the true register name and is valid for 32bit and 64bit targets. > > They are used in asm statements in kernel: > > extern void (*func_p) (void); > > void > foo (void) > { > asm ("call __x86_indirect_thunk_%V0" : : "a" (func_p)); > } > > it generates: > > foo: > movq func_p(%rip), %rax > call __x86_indirect_thunk_rax > ret Please fix %V to output reg_name instead. Uros.
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Sun, Jan 14, 2018 at 8:48 AM, Uros Bizjak wrote: > On Sun, Jan 14, 2018 at 5:41 PM, H.J. Lu wrote: >> On Sun, Jan 14, 2018 at 8:39 AM, Uros Bizjak wrote: >>> On Sun, Jan 14, 2018 at 5:35 PM, H.J. Lu wrote: On Sun, Jan 14, 2018 at 8:19 AM, Uros Bizjak wrote: > On Fri, Jan 12, 2018 at 9:01 AM, Uros Bizjak wrote: >> On Thu, Jan 11, 2018 at 2:28 PM, H.J. Lu wrote: >> >>> Hi Uros, >>> >>> Can you take a look at my x86 backend changes so that they are ready >>> to check in once we have consensus. >> >> Please finish the talks about the correct approach first. Once the >> consensus is reached, please post the final version of the patches for >> review. >> >> BTW: I have no detailed insight in these issues, so I'll look mostly >> at the implementation details, probably early next week. > > One general remark is on the usage of -1 as an invalid register This has been rewritten. The checked in patch no longer does that. >>> >>> Another issue: >>> >>> +static void >>> +indirect_thunk_name (char name[32], int regno, bool need_bnd_p) >>> +{ >>> + if (USE_HIDDEN_LINKONCE) >>> +{ >>> + const char *bnd = need_bnd_p ? "_bnd" : ""; >>> + if (regno >= 0) >>> +{ >>> + const char *reg_prefix; >>> + if (LEGACY_INT_REGNO_P (regno)) >>> +reg_prefix = TARGET_64BIT ? "r" : "e"; >>> + else >>> +reg_prefix = ""; >>> + sprintf (name, "__x86_indirect_thunk%s_%s%s", >>> + bnd, reg_prefix, reg_names[regno]); >>> +} >>> + else >>> +sprintf (name, "__x86_indirect_thunk%s", bnd); >>> +} >>> >>> What is the benefit of reg_prefix? Can't we just live with e.g.: >>> >>> __x86_indirect_thunk_ax >>> >>> which is the true register name and is valid for 32bit and 64bit targets. >> >> They are used in asm statements in kernel: >> >> extern void (*func_p) (void); >> >> void >> foo (void) >> { >> asm ("call __x86_indirect_thunk_%V0" : : "a" (func_p)); >> } >> >> it generates: >> >> foo: >> movq func_p(%rip), %rax >> call __x86_indirect_thunk_rax >> ret > > Please fix %V to output reg_name instead. > David, please comment. -- H.J.
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Sun, Jan 14, 2018 at 8:45 AM, Jakub Jelinek wrote: > On Sun, Jan 14, 2018 at 08:41:54AM -0800, H.J. Lu wrote: >> They are used in asm statements in kernel: >> >> extern void (*func_p) (void); >> >> void >> foo (void) >> { >> asm ("call __x86_indirect_thunk_%V0" : : "a" (func_p)); > > Well, using it just with a single register classes wouldn't make much sense, > then you can just use "call __x86_indirect_thunk_rax" > or "call __x86_indirect_thunk_eax" depending on __x86_64__, you wouldn't > need to extend anything. > But supposedly if you use it with "r" or "q" or similar class this will be > different. > I believe "r" is allowed. -- H.J.
Re: [RFC][PATCH] PR preprocessor/83173: Additional check before decrementing highest_location
On 01/12/2018 06:16 PM, David Malcolm wrote: > On Fri, 2017-12-01 at 17:57 -0500, Mike Gulick wrote: >> I've come up with some patches that fix PR preprocessor/83173, which >> I reported >> a couple of weeks ago. >> >> The first patch is a test case. The second and third patches are two >> versions >> of the fix. The first version is simpler, but it may still leave in >> place some >> subtle incorrect behavior that happens when the current source >> location is less >> than LINE_MAP_MAX_COLUMN_NUMBER. The second version tries to handle >> that case >> as well, however I'm less comfortable with it as I don't know whether >> I'm >> computing the source_location of the *end* of the current line >> correctly in all >> cases. Both of these pass the gcc/g++ test suites with no >> regressions. >> >> Thanks in advance for the review/feedback! >> >> -Mike > > Hi Mike; sorry about the delay in reviewing this. > > Do you have the gcc contributor paperwork in place? Hi Dave. I don't have any gcc contributor paperwork in place, however I just finished that process for a gdb patch so I think it could be done pretty quickly. >> From 6ff0068284c346c8db08c4b6b4d9a66d8464aeac Mon Sep 17 00:00:00 2001 >> From: Mike Gulick >> Date: Thu, 30 Nov 2017 18:35:48 -0500 >> Subject: [PATCH 1/2] PR preprocessor/83173: New test >> >> 2017-12-01 Mike Gulick >> >> PR preprocessor/83173 >> * gcc.dg/plugin/pr83173.c: New test. >> * gcc.dg/plugin/pr83173.h: Header for pr83173.c >> * gcc.dg/plugin/pr83173-1.h: Header for pr83173.c >> * gcc.dg/plugin/pr83173-2.h: Header for pr83173.c >> * gcc.dg/plugin/location_overflow_pp_plugin.c: New plugin to >> override line_table->highest_location for preprocessor. >> --- >> .../gcc.dg/plugin/location_overflow_pp_plugin.c| 44 >> ++ >> gcc/testsuite/gcc.dg/plugin/plugin.exp | 1 + >> gcc/testsuite/gcc.dg/plugin/pr83173-1.h| 2 + >> gcc/testsuite/gcc.dg/plugin/pr83173-2.h| 2 + >> gcc/testsuite/gcc.dg/plugin/pr83173.c | 21 +++ >> gcc/testsuite/gcc.dg/plugin/pr83173.h | 2 + >> 6 files changed, 72 insertions(+) >> create mode 100644 gcc/testsuite/gcc.dg/plugin/location_overflow_pp_plugin.c >> create mode 100644 gcc/testsuite/gcc.dg/plugin/pr83173-1.h >> create mode 100644 gcc/testsuite/gcc.dg/plugin/pr83173-2.h >> create mode 100644 gcc/testsuite/gcc.dg/plugin/pr83173.c >> create mode 100644 gcc/testsuite/gcc.dg/plugin/pr83173.h >> >> diff --git a/gcc/testsuite/gcc.dg/plugin/location_overflow_pp_plugin.c >> b/gcc/testsuite/gcc.dg/plugin/location_overflow_pp_plugin.c >> new file mode 100644 >> index 000..ba5a795b937 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.dg/plugin/location_overflow_pp_plugin.c >> @@ -0,0 +1,44 @@ >> +/* Plugin for testing how gracefully we degrade in the face of very >> + large source files. */ >> + >> +#include "config.h" >> +#include "gcc-plugin.h" >> +#include "system.h" >> +#include "coretypes.h" >> +#include "diagnostic.h" >> + >> +int plugin_is_GPL_compatible; >> + >> +static location_t base_location; >> + >> +/* Callback handler for the PLUGIN_PRAGMAS event. This is used to set the >> + initial line table offset for the preprocessor, to make it appear as if >> we >> + had parsed a very large file. PRAGMA_START_UNIT is not suitable here as >> is > > PLUGIN_START_UNIT, presumably? > Sorry, yes that is right. >> + not invoked during the preprocessor stage. */ > > This new test plugin seems almost identical to the existing > location_overflow_plugin.c. Yes, I was originally hoping to use location_overflow_plugin.c until I realized that PLUGIN_START_UNIT event was not triggered during the preprocessor. Additionally, location_overflow_plugin.c has some restrictions that allow only a couple of specific initial offsets to be accepted, although I don't see any reason why that couldn't be changed. > > I tested changing the existing plugin to use PLUGIN_PRAGMAS rather than > PLUGIN_START_UNIT, and it works fine for that event, so if that's the > only difference, then maybe we don't need this new plugin? > I imagine it would work, although it does seem like PLUGIN_PRAGMAS is being used for something other than its intended purpose here, which I was trying to keep to a minimum. Its been a while since I wrote this patch, and I can't recall whether I noticed any other side-effects of using PLUGIN_PRAGMAS that could impact the existing location overflow tests. I'll take a look at it again and update the patch or report back any concerns. > [...snip...] > >> diff --git a/gcc/testsuite/gcc.dg/plugin/pr83173.c >> b/gcc/testsuite/gcc.dg/plugin/pr83173.c >> new file mode 100644 >> index 000..ff1858a2b33 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.dg/plugin/pr83173.c >> @@ -0,0 +1,21 @@ >> +/* >> + { dg-options "-fplugin-arg-location_overflow_pp_plugin-value=0x6001" } > > This hardcodes a location
ia64: Update libstdc++ baseline_symbols.txt
Installed as obvious. Andreas. PR libstdc++/81092 * config/abi/post/ia64-linux-gnu/baseline_symbols.txt: Update. diff --git a/libstdc++-v3/config/abi/post/ia64-linux-gnu/baseline_symbols.txt b/libstdc++-v3/config/abi/post/ia64-linux-gnu/baseline_symbols.txt index 0b166d5e2a..0411f410f6 100644 --- a/libstdc++-v3/config/abi/post/ia64-linux-gnu/baseline_symbols.txt +++ b/libstdc++-v3/config/abi/post/ia64-linux-gnu/baseline_symbols.txt @@ -444,7 +444,7 @@ FUNC:_ZNKSt13basic_fstreamIwSt11char_traitsIwEE7is_openEv@GLIBCXX_3.4 FUNC:_ZNKSt13basic_istreamIwSt11char_traitsIwEE6gcountEv@@GLIBCXX_3.4 FUNC:_ZNKSt13basic_istreamIwSt11char_traitsIwEE6sentrycvbEv@@GLIBCXX_3.4 FUNC:_ZNKSt13basic_ostreamIwSt11char_traitsIwEE6sentrycvbEv@@GLIBCXX_3.4 -FUNC:_ZNKSt13random_device13_M_getentropyEv@@GLIBCXX_3.4.24 +FUNC:_ZNKSt13random_device13_M_getentropyEv@@GLIBCXX_3.4.25 FUNC:_ZNKSt13runtime_error4whatEv@@GLIBCXX_3.4 FUNC:_ZNKSt14basic_ifstreamIcSt11char_traitsIcEE5rdbufEv@@GLIBCXX_3.4 FUNC:_ZNKSt14basic_ifstreamIcSt11char_traitsIcEE7is_openEv@@GLIBCXX_3.4.5 @@ -4006,6 +4006,7 @@ OBJECT:0:GLIBCXX_3.4.21 OBJECT:0:GLIBCXX_3.4.22 OBJECT:0:GLIBCXX_3.4.23 OBJECT:0:GLIBCXX_3.4.24 +OBJECT:0:GLIBCXX_3.4.25 OBJECT:0:GLIBCXX_3.4.3 OBJECT:0:GLIBCXX_3.4.4 OBJECT:0:GLIBCXX_3.4.5 -- 2.15.1 -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different."
[PATCH, rs6000] Executable tests for -msafe-indirect-jumps
Hi, It was pointed out off-list that I should add some executable tests for the new -msafe-indirect-jumps implementation. This patch adds three such tests to demonstrate correct behavior. Tested on powerpc64-linux-gnu and powerpc64le-linux-gnu. Are these tests okay for trunk after the other patch is approved? Thanks, Bill 2018-01-14 Bill Schmidt * gcc.target/powerpc/safe-indirect-jump-4.c: New file. * gcc.target/powerpc/safe-indirect-jump-5.c: New file. * gcc.target/powerpc/safe-indirect-jump-6.c: New file. Index: gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-4.c === --- gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-4.c (nonexistent) +++ gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-4.c (working copy) @@ -0,0 +1,25 @@ +/* { dg-do run { target { powerpc64le-*-* } } } */ +/* { dg-additional-options "-msafe-indirect-jumps" } */ + +/* Test for deliberate misprediction of indirect calls for ELFv2. */ + +int (*f)(); + +int __attribute__((noinline)) bar () +{ + return (*f) (); +} + +int g () +{ + return 26; +} + +int main () +{ + f = &g; + if (bar () != 26) +__builtin_abort (); + + return 0; +} Index: gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-5.c === --- gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-5.c (nonexistent) +++ gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-5.c (working copy) @@ -0,0 +1,55 @@ +/* { dg-do run { target { powerpc*-*-* } } } */ +/* { dg-additional-options "-msafe-indirect-jumps -Wno-pedantic" } */ + +/* Test for deliberate misprediction of computed goto. */ + +int __attribute__((noinline)) bar (int i) +{ + return 1960 + i; +} + +int __attribute__((noinline)) baz (int i) +{ + return i * i; +} + +int __attribute__((noinline)) spaz (int i) +{ + return i + 1; +} + +int foo (int x) +{ + static void *labptr[] = { &&lab0, &&lab1, &&lab2 }; + + if (x < 0 || x > 2) +return -1; + + goto *labptr[x]; + + lab0: + return bar (x); + + lab1: + return baz (x) + 1; + + lab2: + return spaz (x) / 2; +} + +int main () +{ + if (foo (0) != 1960) +__builtin_abort (); + + if (foo (1) != 2) +__builtin_abort (); + + if (foo (2) != 1) +__builtin_abort (); + + if (foo (3) != -1) +__builtin_abort (); + + return 0; +} Index: gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-6.c === --- gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-6.c (nonexistent) +++ gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-6.c (working copy) @@ -0,0 +1,80 @@ +/* { dg-do run { target { powerpc*-*-* } } } */ +/* { dg-additional-options "-msafe-indirect-jumps" } */ + +/* Test for deliberate misprediction of jump tables. */ + +void __attribute__((noinline)) bar () +{ +} + +int foo (int x) +{ + int a; + + switch (x) +{ +default: + a = -1; + break; +case 0: + a = x * x + 3; + break; +case 1: + a = x + 1; + break; +case 2: + a = x + x; + break; +case 3: + a = x << 3; + break; +case 4: + a = x >> 1; + break; +case 5: + a = x; + break; +case 6: + a = 0; + break; +case 7: + a = x * x + x; + break; +} + + bar(); + + return a; +} + +int main () +{ + if (foo (0) != 3) +__builtin_abort (); + + if (foo (1) != 2) +__builtin_abort (); + + if (foo (2) != 4) +__builtin_abort (); + + if (foo (3) != 24) +__builtin_abort (); + + if (foo (4) != 2) +__builtin_abort (); + + if (foo (5) != 5) +__builtin_abort (); + + if (foo (6) != 0) +__builtin_abort (); + + if (foo (7) != 56) +__builtin_abort (); + + if (foo (8) != -1) +__builtin_abort (); + + return 0; +}
[patch, committed] PR83811 fortran 'e' format broken for single digit exponents
Hello all, I committed the following as trivial. Regression tested on x86_64-pc-linux-gnu. This is a regression on 7 so I will backport. Regards, Jerry 2018-01-18 Jerry DeLisle PR libgfortran/83811 * write.c (select_buffer): Adjust buffer size up by 1. diff --git a/libgfortran/io/write.c b/libgfortran/io/write.c index 8021a1e9c4b..28ea852867b 100644 --- a/libgfortran/io/write.c +++ b/libgfortran/io/write.c @@ -1519,8 +1519,9 @@ select_buffer (st_parameter_dt *dtp, const fnode *f, int precision, { char *result; - /* The buffer needs at least one more byte to allow room for normalizing. */ - *size = size_from_kind (dtp, f, kind) + precision + 1; + /* The buffer needs at least one more byte to allow room for + normalizing and 1 to hold null terminator. */ + *size = size_from_kind (dtp, f, kind) + precision + 1 + 1; if (*size > BUF_STACK_SZ) result = xmalloc (*size); New test case: ! { dg-do run } ! PR83811 fortran 'e' format broken for single digit exponents program test character(25) :: s write(s, '(1pe5.0e1)') 1.e-4 if (s.ne."1.E-4") call abort write(s, '(e5.1e1)') 1.e12 if (s.ne."*") call abort end
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
This won't make the list; I'll send a more coherent and less HTML-afflicted version later. The bare 'ax' naming made it painful to instantiate the external thunks for 32-bit and 64-bot code because we had to put the e/r back again inside the .irp reg ax bx... code. We could probably have lived with that but it would be painful to change now that Linux and Xen patches with the current ABI are all lined up. I appreciate they weren't in GCC yet so we get little sympathy but these are strange times and we had to move fast. I'd really like *not* to change it now. Having the thunk name actually include the name of the register it's using does seem nicer anyway... On 14 Jan 2018 17:58, "H.J. Lu" wrote: On Sun, Jan 14, 2018 at 8:45 AM, Jakub Jelinek wrote: > On Sun, Jan 14, 2018 at 08:41:54AM -0800, H.J. Lu wrote: >> They are used in asm statements in kernel: >> >> extern void (*func_p) (void); >> >> void >> foo (void) >> { >> asm ("call __x86_indirect_thunk_%V0" : : "a" (func_p)); > > Well, using it just with a single register classes wouldn't make much sense, > then you can just use "call __x86_indirect_thunk_rax" > or "call __x86_indirect_thunk_eax" depending on __x86_64__, you wouldn't > need to extend anything. > But supposedly if you use it with "r" or "q" or similar class this will be > different. > I believe "r" is allowed. -- H.J. Amazon Web Services UK Limited. Registered in England and Wales with registration number 08650665 and which has its registered office at 60 Holborn Viaduct, London EC1A 2FD, United Kingdom.
Re: [PATCH 3/5] x86: Add -mindirect-branch-register
- (ior (and (not (match_test "TARGET_X32")) + (ior (and (not (match_test "TARGET_X32 + || ix86_indirect_branch_thunk_register")) (match_operand 0 "sibcall_memory_operand")) - (and (match_test "TARGET_X32 && Pmode == DImode") + (and (match_test "TARGET_X32 && Pmode == DImode + && !ix86_indirect_branch_thunk_register") (match_operand 0 "GOT_memory_operand" Is this patch just trying to disable the predicate when ix86_indirect_branch_thunk_register is set? Because this is what this convoluted logic does. While there is some sense for "TARGET_X32 && Pmode == DImode", we try to use RTX codes as much as possible in constraints.md and predicates.md. Uros.
Re: [PATCH 3/5] x86: Add -mindirect-branch-register
On Sun, Jan 14, 2018 at 9:51 AM, Uros Bizjak wrote: > - (ior (and (not (match_test "TARGET_X32")) > + (ior (and (not (match_test "TARGET_X32 > + || ix86_indirect_branch_thunk_register")) > (match_operand 0 "sibcall_memory_operand")) > - (and (match_test "TARGET_X32 && Pmode == DImode") > + (and (match_test "TARGET_X32 && Pmode == DImode > + && !ix86_indirect_branch_thunk_register") > (match_operand 0 "GOT_memory_operand" > > Is this patch just trying to disable the predicate when > ix86_indirect_branch_thunk_register is set? Because this is what this > convoluted logic does. Yes, we want to disable all indirect branch via memory with -mindirect-branch-register, just like -mx32. We could do #idefine TARGET_INDIRECT_BRANCH_REGISTER \ (TARGER_X32 || ix86_indirect_branch_thunk_register) > While there is some sense for "TARGET_X32 && Pmode == DImode", we try > to use RTX codes as much as possible in constraints.md and > predicates.md. > > Uros. -- H.J.
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Sun, Jan 14, 2018 at 6:44 PM, Woodhouse, David wrote: > This won't make the list; I'll send a more coherent and less HTML-afflicted > version later. > > The bare 'ax' naming made it painful to instantiate the external thunks for > 32-bit and 64-bot code because we had to put the e/r back again inside the > .irp reg ax bx... code. > > We could probably have lived with that but it would be painful to change now > that Linux and Xen patches with the current ABI are all lined up. I > appreciate they weren't in GCC yet so we get little sympathy but these are > strange times and we had to move fast. > > I'd really like *not* to change it now. Having the thunk name actually > include the name of the register it's using does seem nicer anyway... That's unfortunate... I suspect that in the future, one will need #ifdef __x86_64__ around eventual calls to thunks from c code because of this decision, since thunks for x86_64 target will have different names than thunks for x86_32 target. I don't know if the (single?) case of mixing 32 and 64 bit assembly in the highly specialized part of the kernel really warrants this decision. Future programmers will be grateful if kernel people can re-consider their choice in not-yet-release ABI. Uros.
Re: [PATCH 3/5] x86: Add -mindirect-branch-register
On Sun, Jan 14, 2018 at 7:08 PM, H.J. Lu wrote: > On Sun, Jan 14, 2018 at 9:51 AM, Uros Bizjak wrote: >> - (ior (and (not (match_test "TARGET_X32")) >> + (ior (and (not (match_test "TARGET_X32 >> + || ix86_indirect_branch_thunk_register")) >> (match_operand 0 "sibcall_memory_operand")) >> - (and (match_test "TARGET_X32 && Pmode == DImode") >> + (and (match_test "TARGET_X32 && Pmode == DImode >> + && !ix86_indirect_branch_thunk_register") >> (match_operand 0 "GOT_memory_operand" >> >> Is this patch just trying to disable the predicate when >> ix86_indirect_branch_thunk_register is set? Because this is what this >> convoluted logic does. > > Yes, we want to disable all indirect branch via memory with > -mindirect-branch-register, just like -mx32. We could do > > #idefine TARGET_INDIRECT_BRANCH_REGISTER \ > (TARGER_X32 || ix86_indirect_branch_thunk_register) Index: predicates.md === --- predicates.md (revision 25) +++ predicates.md (working copy) @@ -710,11 +710,10 @@ (ior (match_test "constant_call_address_operand (op, mode == VOIDmode ? mode : Pmode)") (match_operand 0 "call_register_no_elim_operand") - (ior (and (not (match_test "TARGET_X32 - || ix86_indirect_branch_thunk_register")) + (and (not (match_test "ix86_indirect_branch_thunk_register")) + (ior (and (not (match_test "TARGET_X32"))) (match_operand 0 "memory_operand")) - (and (match_test "TARGET_X32 && Pmode == DImode - && !ix86_indirect_branch_thunk_register") +(and (match_test "TARGET_X32 && Pmode == DImode") (match_operand 0 "GOT_memory_operand") or something like that. Uros.
Re: [PATCH 1/5] x86: Add -mindirect-branch=
On Sun, 2018-01-14 at 12:06 +0100, Jan Hubicka wrote: > > Yes, according to my understanding we want both pause and lefence by default > (it is expensive anyway) + command line option to control which one to use? > > Probably thus the first patch should default to both. In the kernel we're going to just do both. I'm not sure I see the merit in a command line option. smime.p7s Description: S/MIME cryptographic signature
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Sun, Jan 14, 2018 at 7:22 PM, Uros Bizjak wrote: > On Sun, Jan 14, 2018 at 6:44 PM, Woodhouse, David wrote: >> This won't make the list; I'll send a more coherent and less HTML-afflicted >> version later. >> >> The bare 'ax' naming made it painful to instantiate the external thunks for >> 32-bit and 64-bot code because we had to put the e/r back again inside the >> .irp reg ax bx... code. >> >> We could probably have lived with that but it would be painful to change now >> that Linux and Xen patches with the current ABI are all lined up. I >> appreciate they weren't in GCC yet so we get little sympathy but these are >> strange times and we had to move fast. >> >> I'd really like *not* to change it now. Having the thunk name actually >> include the name of the register it's using does seem nicer anyway... > > That's unfortunate... I suspect that in the future, one will need > #ifdef __x86_64__ around eventual calls to thunks from c code because > of this decision, since thunks for x86_64 target will have different > names than thunks for x86_32 target. I don't know if the (single?) > case of mixing 32 and 64 bit assembly in the highly specialized part > of the kernel really warrants this decision. Future programmers will > be grateful if kernel people can re-consider their choice in > not-yet-release ABI. A quick look through latest x86/pti update [1] shows: +#ifdef CONFIG_X86_32 +#define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_e ## reg(void); +#else +#define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_r ## reg(void); +INDIRECT_THUNK(8) +INDIRECT_THUNK(9) +INDIRECT_THUNK(10) +INDIRECT_THUNK(11) +INDIRECT_THUNK(12) +INDIRECT_THUNK(13) +INDIRECT_THUNK(14) +INDIRECT_THUNK(15) +#endif +INDIRECT_THUNK(ax) +INDIRECT_THUNK(bx) +INDIRECT_THUNK(cx) +INDIRECT_THUNK(dx) +INDIRECT_THUNK(si) +INDIRECT_THUNK(di) +INDIRECT_THUNK(bp) +INDIRECT_THUNK(sp) and: +/* + * Despite being an assembler file we can't just use .irp here + * because __KSYM_DEPS__ only uses the C preprocessor and would + * only see one instance of "__x86_indirect_thunk_\reg" rather + * than one per register with the correct names. So we do it + * the simple and nasty way... + */ +#define EXPORT_THUNK(reg) EXPORT_SYMBOL(__x86_indirect_thunk_ ## reg) +#define GENERATE_THUNK(reg) THUNK reg ; EXPORT_THUNK(reg) + +GENERATE_THUNK(_ASM_AX) +GENERATE_THUNK(_ASM_BX) +GENERATE_THUNK(_ASM_CX) +GENERATE_THUNK(_ASM_DX) +GENERATE_THUNK(_ASM_SI) +GENERATE_THUNK(_ASM_DI) +GENERATE_THUNK(_ASM_BP) +GENERATE_THUNK(_ASM_SP) +#ifdef CONFIG_64BIT +GENERATE_THUNK(r8) +GENERATE_THUNK(r9) +GENERATE_THUNK(r10) +GENERATE_THUNK(r11) +GENERATE_THUNK(r12) +GENERATE_THUNK(r13) +GENERATE_THUNK(r14) +GENERATE_THUNK(r15) I have a feeling that using e.g. __x86_indirect_thunk_ax would be more convenient in both cases. [1] https://www.spinics.net/lists/kernel/msg2697606.html Uros.
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Sun, 2018-01-14 at 21:21 +0100, Uros Bizjak wrote: > A quick look through latest x86/pti update [1] shows: > > +#ifdef CONFIG_X86_32 > +#define INDIRECT_THUNK(reg) extern asmlinkage void > __x86_indirect_thunk_e ## reg(void); > +#else > +#define INDIRECT_THUNK(reg) extern asmlinkage void > __x86_indirect_thunk_r ## reg(void); > +INDIRECT_THUNK(8) > +INDIRECT_THUNK(9) > +INDIRECT_THUNK(10) > +INDIRECT_THUNK(11) > +INDIRECT_THUNK(12) > +INDIRECT_THUNK(13) > +INDIRECT_THUNK(14) > +INDIRECT_THUNK(15) > +#endif > +INDIRECT_THUNK(ax) > +INDIRECT_THUNK(bx) > +INDIRECT_THUNK(cx) > +INDIRECT_THUNK(dx) > +INDIRECT_THUNK(si) > +INDIRECT_THUNK(di) > +INDIRECT_THUNK(bp) > +INDIRECT_THUNK(sp) Yeah, that one is purely for the CONFIG_MODVERSIONS system, which I'm hoping to fix properly by not having to have fake (and clearly incorrect) C prototypes for the thunks which aren't actually C functions. It's intended to go away. > and: > > +/* > + * Despite being an assembler file we can't just use .irp here > + * because __KSYM_DEPS__ only uses the C preprocessor and would > + * only see one instance of "__x86_indirect_thunk_\reg" rather > + * than one per register with the correct names. So we do it > + * the simple and nasty way... > + */ > +#define EXPORT_THUNK(reg) EXPORT_SYMBOL(__x86_indirect_thunk_ ## reg) > +#define GENERATE_THUNK(reg) THUNK reg ; EXPORT_THUNK(reg) > + > +GENERATE_THUNK(_ASM_AX) > +GENERATE_THUNK(_ASM_BX) > +GENERATE_THUNK(_ASM_CX) > +GENERATE_THUNK(_ASM_DX) > +GENERATE_THUNK(_ASM_SI) > +GENERATE_THUNK(_ASM_DI) > +GENERATE_THUNK(_ASM_BP) > +GENERATE_THUNK(_ASM_SP) > +#ifdef CONFIG_64BIT > +GENERATE_THUNK(r8) > +GENERATE_THUNK(r9) > +GENERATE_THUNK(r10) > +GENERATE_THUNK(r11) > +GENERATE_THUNK(r12) > +GENERATE_THUNK(r13) > +GENERATE_THUNK(r14) > +GENERATE_THUNK(r15) > > I have a feeling that using e.g. __x86_indirect_thunk_ax would be more > convenient in both cases. Likewise, the CONFIG_TRIM_UNUSED_SYMBOLS mechanism in the kernel passes .S files through the preprocessor and looks for EXPORT_SYMBOL, so it wasn't working well with my .irp-based implementation like the one in Xen. So I've swapped it out for this one for now. Again, I was hoping to clean that up and make it do something saner, and then this could switch back too. But sure, right now it isn't that might of a difference for me; my implementation has changed since I made that reqeust. I have no fundamental technical objection to the bare 'ax' naming. We can live with either. It's just that we've been asking for an agreement on the basics (the command line we use, and the thunk names) for some days now, and this is the first time we've had this discussion, and Linus has just taken the patches. That's still fine. I know we get no sympathy, and we *can* change the Linux kernel between -rc8 and -final if we must, and change the Xen patches too. I'd just rather not. smime.p7s Description: S/MIME cryptographic signature
Re: [PATCH 0/5] Make std::future::wait_* use std::chrono::steady_clock when required
On Sunday 14 January 2018 at 16:08:09 +, Mike Crowe wrote: > Hi Torvald, > > Thanks for reviewing this change. > > On Saturday 13 January 2018 at 16:29:57 +0100, Torvald Riegel wrote: > > On Sun, 2018-01-07 at 20:55 +, Mike Crowe wrote: > > > This is a first attempt to make std::future::wait_until and > > > std::future::wait_for make correct use of > > > std::chrono::steady_clock/CLOCK_MONOTONIC. It also makes > > > std::future::wait_until react to changes to CLOCK_REALTIME during the > > > wait, but only when passed a std::chrono::system_clock time point. > > > > I have comments on the design. > > > > First, I don't think we should not change > > __atomic_futex_unsigned_base::_M_futex_wait_until, as there's a risk > > that we'll change behavior of existing applications that work as > > expected. > > I assume you mean "I don't think we should change" or "I think we should > not change"... :-) > > The only way I can see that behaviour will change for existing programs is > when the system clock changes (i.e. when someone calls settimeofday.) In > the existing code, the maximum wait time is fixed once gettimeofday is > called to calculate the relative timeout. When using FUTEX_CLOCK_REALTIME, > the maximum wait can change based on changes to the system clock after that > point. It appears that glibc made this transition successfully and > currently uses FUTEX_CLOCK_REALTIME. I think that the new behaviour is > better than the old behaviour. > > Or perhaps I've missed another possibility. Did you have another risk in > mind? > > > Instead, ISTM we should additionally expose the two options we have at > > the level of futexes: > > * Relative timeout using CLOCK_MONOTONIC > > * Absolute timeout using CLOCK_REALTIME (which will fall back to the > > former on old kernels, which is fine I think). > > > > Then we do the following translations from functions that programs would > > call to the new futex functions: > > > > 1) wait_for is a loop in which we load the current time from the steady > > clock, then call the relative futex wait, and if that returns for a > > spurious reason (ie, neither timeout nor is the expected value present), > > we reduce the prior relative amount by the difference between the time > > before the futex wait and the current time. > > If we're going to loop on a relative timeout it sounds safer to convert it > to an absolute (steady clock) timeout. That way we won't risk increasing > the timeout if the scheduler decides not to run us at an inopportune moment > between waits. _M_load_when_equal_for already does this. > > _M_load_and_test_until already has a loop for spurious wakeup. I think that > it makes sense to only loop at one level. That loop relies on the timeout > being absolute, which is why my _M_load_and_test_until_steady also uses an > absolute timeout. > > > 2) wait_until using the steady clock is a loop similar to wait_for, just > > that we additionally compute the initial relative timeout. > > Clearly an absolute wait can be implemented in terms of a relative one and > vice-versa, but at least in my attempts to write them I find the code > easier to understand (and therefore get right) if the fundamental wait is > the absolute one and the relative one is implemented on top of it. I had a quick go at implementing at least the first part of your design, as I understood it. (I've kept the loops inside atomic_futex_unsigned - and I think that you wanted to move them out to the client code.) I've not tested it much. I think that this implementation of _M_load_and_test_for is rather more error-prone than my previous _M_load_and_test_until_steady. That's probably partly because the type-safe duration has already been separated into seconds and nanoseconds. It would be nice to push this separation as deeply as possible in the code, but I'm afraid that would break ABI compatibility. Thanks. Mike. --8<-- diff --git a/libstdc++-v3/include/bits/atomic_futex.h b/libstdc++-v3/include/bits/atomic_futex.h index ad9437da4e2..fa4a4382c79 100644 --- a/libstdc++-v3/include/bits/atomic_futex.h +++ b/libstdc++-v3/include/bits/atomic_futex.h @@ -57,6 +57,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _M_futex_wait_until(unsigned *__addr, unsigned __val, bool __has_timeout, chrono::seconds __s, chrono::nanoseconds __ns); +// Returns false iff a timeout occurred. +bool +_M_futex_wait_for(unsigned *__addr, unsigned __val, bool __has_timeout, + chrono::seconds __s, chrono::nanoseconds __ns); + // This can be executed after the object has been destroyed. static void _M_futex_notify_all(unsigned* __addr); }; @@ -110,6 +115,40 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } } +// If a timeout occurs, returns a current value after the timeout; +// otherwise, returns the operand's value if equal is true or a different +// value if equal is false. +// The assumed value is the caller's assumption about the current value +
Re: [PATCH 3/5] x86: Add -mindirect-branch-register
On Sun, Jan 14, 2018 at 10:52 AM, Uros Bizjak wrote: > On Sun, Jan 14, 2018 at 7:08 PM, H.J. Lu wrote: >> On Sun, Jan 14, 2018 at 9:51 AM, Uros Bizjak wrote: >>> - (ior (and (not (match_test "TARGET_X32")) >>> + (ior (and (not (match_test "TARGET_X32 >>> + || ix86_indirect_branch_thunk_register")) >>> (match_operand 0 "sibcall_memory_operand")) >>> - (and (match_test "TARGET_X32 && Pmode == DImode") >>> + (and (match_test "TARGET_X32 && Pmode == DImode >>> + && !ix86_indirect_branch_thunk_register") >>> (match_operand 0 "GOT_memory_operand" >>> >>> Is this patch just trying to disable the predicate when >>> ix86_indirect_branch_thunk_register is set? Because this is what this >>> convoluted logic does. >> >> Yes, we want to disable all indirect branch via memory with >> -mindirect-branch-register, just like -mx32. We could do >> >> #idefine TARGET_INDIRECT_BRANCH_REGISTER \ >> (TARGER_X32 || ix86_indirect_branch_thunk_register) > > Index: predicates.md > === > --- predicates.md (revision 25) > +++ predicates.md (working copy) > @@ -710,11 +710,10 @@ >(ior (match_test "constant_call_address_operand > (op, mode == VOIDmode ? mode : Pmode)") > (match_operand 0 "call_register_no_elim_operand") > - (ior (and (not (match_test "TARGET_X32 > - || ix86_indirect_branch_thunk_register")) > + (and (not (match_test "ix86_indirect_branch_thunk_register")) > + (ior (and (not (match_test "TARGET_X32"))) > (match_operand 0 "memory_operand")) > - (and (match_test "TARGET_X32 && Pmode == DImode > - && !ix86_indirect_branch_thunk_register") > +(and (match_test "TARGET_X32 && Pmode == DImode") > (match_operand 0 "GOT_memory_operand") > > or something like that. > LGTM. Thanks. -- H.J.
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Sun, Jan 14, 2018 at 9:34 PM, David Woodhouse wrote: > Likewise, the CONFIG_TRIM_UNUSED_SYMBOLS mechanism in the kernel passes > .S files through the preprocessor and looks for EXPORT_SYMBOL, so it > wasn't working well with my .irp-based implementation like the one in > Xen. So I've swapped it out for this one for now. > > Again, I was hoping to clean that up and make it do something saner, > and then this could switch back too. > > But sure, right now it isn't that might of a difference for me; my > implementation has changed since I made that reqeust. I have no > fundamental technical objection to the bare 'ax' naming. We can live > with either. > > It's just that we've been asking for an agreement on the basics (the > command line we use, and the thunk names) for some days now, and this > is the first time we've had this discussion, and Linus has just taken > the patches. > > That's still fine. I know we get no sympathy, and we *can* change the > Linux kernel between -rc8 and -final if we must, and change the Xen > patches too. I'd just rather not. Well, you did say that these are strange times ;) >From the user perspective, it would be more convenient to use the thunk names that are the same for 32bit and 64bit targets. If we ignore this fact, the difference is only a couple of lines in the compiler source which we also can live with. But please discuss my proposal also in the kernel community, and weight the benefits and drawbacks of each approach before the final decision. Please pass the final decision to gcc community, and we'll implement it. Uros.
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Sun, 2018-01-14 at 21:52 +0100, Uros Bizjak wrote: > On Sun, Jan 14, 2018 at 9:34 PM, David Woodhouse wrote: > > But sure, right now it isn't that might of a difference for me; my > > implementation has changed since I made that reqeust. I have no > > fundamental technical objection to the bare 'ax' naming. We can live > > with either. > > > > It's just that we've been asking for an agreement on the basics (the > > command line we use, and the thunk names) for some days now, and this > > is the first time we've had this discussion, and Linus has just taken > > the patches. > > > > That's still fine. I know we get no sympathy, and we *can* change the > > Linux kernel between -rc8 and -final if we must, and change the Xen > > patches too. I'd just rather not. > Well, you did say that these are strange times ;) > > From the user perspective, it would be more convenient to use the > thunk names that are the same for 32bit and 64bit targets. If we > ignore this fact, the difference is only a couple of lines in the > compiler source which we also can live with. But please discuss my > proposal also in the kernel community, and weight the benefits and > drawbacks of each approach before the final decision. > > Please pass the final decision to gcc community, and we'll implement it. +Linus, Thomas. Review on the GCC patches has led to a request that the thunk symbols be changed from e.g. __x86_indirect_thunk_rax to __x86_indirect_thunk_ax without the 'r'. If we're going to change the thunk names, it's best to do it *right* now before the 4.15-rc8 release. I genuinely don't care at this point what the thunk names are. It's just that Linus is probably preparing the -rc8 release as we speak, and I'd want to do a new compiler build and set of tests if we make the change. For that reason alone, I'm inclined to answer that we should leave them as they are. smime.p7s Description: S/MIME cryptographic signature
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Sun, Jan 14, 2018 at 1:02 PM, David Woodhouse wrote: > Review on the GCC patches has led to a request that the thunk symbols > be changed from e.g. __x86_indirect_thunk_rax to > __x86_indirect_thunk_ax without the 'r'. Ok. I think that just makes it easier for us, since then the names are independent of 32-vs/64, and we don't need to use the _ASM_XY names. What about r8-r15? I'm assuming 'r' there is used? Mind sending me a tested patch? I'll was indeed planning on generating rc8, but I might as well go grocery shopping now instead, and do rc8 later in the evening. Linus
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Sun, 2018-01-14 at 13:07 -0800, Linus Torvalds wrote: > On Sun, Jan 14, 2018 at 1:02 PM, David Woodhouse wrote: > > Review on the GCC patches has led to a request that the thunk symbols > > be changed from e.g. __x86_indirect_thunk_rax to > > __x86_indirect_thunk_ax without the 'r'. > > Ok. I think that just makes it easier for us, since then the names are > independent of 32-vs/64, and we don't need to use the _ASM_XY names. > > What about r8-r15? I'm assuming 'r' there is used? > > Mind sending me a tested patch? I'll was indeed planning on generating > rc8, but I might as well go grocery shopping now instead, and do rc8 > later in the evening. I'll kick off a compiler build now... smime.p7s Description: S/MIME cryptographic signature
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Sun, 14 Jan 2018, David Woodhouse wrote: > On Sun, 2018-01-14 at 13:07 -0800, Linus Torvalds wrote: > > On Sun, Jan 14, 2018 at 1:02 PM, David Woodhouse > > wrote: > > > Review on the GCC patches has led to a request that the thunk symbols > > > be changed from e.g. __x86_indirect_thunk_rax to > > > __x86_indirect_thunk_ax without the 'r'. > > > > Ok. I think that just makes it easier for us, since then the names are > > independent of 32-vs/64, and we don't need to use the _ASM_XY names. > > > > What about r8-r15? I'm assuming 'r' there is used? > > > > Mind sending me a tested patch? I'll was indeed planning on generating > > rc8, but I might as well go grocery shopping now instead, and do rc8 > > later in the evening. > > I'll kick off a compiler build now... Send the patch to me/LKML. I'm queueing the compile time warning removal and then can add that one on top so Linus can pull the lot. Thanks, tglx
Re: [PATCH 3/5] x86: Add -mindirect-branch-register
On Sun, Jan 14, 2018 at 10:52 AM, Uros Bizjak wrote: > On Sun, Jan 14, 2018 at 7:08 PM, H.J. Lu wrote: >> On Sun, Jan 14, 2018 at 9:51 AM, Uros Bizjak wrote: >>> - (ior (and (not (match_test "TARGET_X32")) >>> + (ior (and (not (match_test "TARGET_X32 >>> + || ix86_indirect_branch_thunk_register")) >>> (match_operand 0 "sibcall_memory_operand")) >>> - (and (match_test "TARGET_X32 && Pmode == DImode") >>> + (and (match_test "TARGET_X32 && Pmode == DImode >>> + && !ix86_indirect_branch_thunk_register") >>> (match_operand 0 "GOT_memory_operand" >>> >>> Is this patch just trying to disable the predicate when >>> ix86_indirect_branch_thunk_register is set? Because this is what this >>> convoluted logic does. >> >> Yes, we want to disable all indirect branch via memory with >> -mindirect-branch-register, just like -mx32. We could do >> >> #idefine TARGET_INDIRECT_BRANCH_REGISTER \ >> (TARGER_X32 || ix86_indirect_branch_thunk_register) > > Index: predicates.md > === > --- predicates.md (revision 25) > +++ predicates.md (working copy) > @@ -710,11 +710,10 @@ >(ior (match_test "constant_call_address_operand > (op, mode == VOIDmode ? mode : Pmode)") > (match_operand 0 "call_register_no_elim_operand") > - (ior (and (not (match_test "TARGET_X32 > - || ix86_indirect_branch_thunk_register")) > + (and (not (match_test "ix86_indirect_branch_thunk_register")) > + (ior (and (not (match_test "TARGET_X32"))) > (match_operand 0 "memory_operand")) > - (and (match_test "TARGET_X32 && Pmode == DImode > - && !ix86_indirect_branch_thunk_register") > +(and (match_test "TARGET_X32 && Pmode == DImode") > (match_operand 0 "GOT_memory_operand") > > or something like that. > I am testing this patch. OK for trunk if there is no regression? -- H.J. From e332a970f82439c5169e6f87c77f15123e7ec49e Mon Sep 17 00:00:00 2001 From: "H.J. Lu" Date: Sun, 14 Jan 2018 13:19:37 -0800 Subject: [PATCH] x86: Rewrite ix86_indirect_branch_thunk_register logic Rewrite ix86_indirect_branch_thunk_register logic with (and (not (match_test "ix86_indirect_branch_thunk_register")) (original condition before r256662)) * config/i386/predicates.md (indirect_branch_operand): Rewrite ix86_indirect_branch_thunk_register logic. (constant_call_address_operand): Likewise. (sibcall_insn_operand): Likewise. --- gcc/config/i386/predicates.md | 30 ++ 1 file changed, 14 insertions(+), 16 deletions(-) diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md index 5ae443231b8..68556d8b827 100644 --- a/gcc/config/i386/predicates.md +++ b/gcc/config/i386/predicates.md @@ -664,10 +664,10 @@ ;; Test for a valid operand for indirect branch. (define_predicate "indirect_branch_operand" - (ior (match_operand 0 "register_operand") - (and (not (match_test "TARGET_X32 - || ix86_indirect_branch_thunk_register")) - (match_operand 0 "memory_operand" + (and (not (match_test "ix86_indirect_branch_thunk_register")) + (ior (match_operand 0 "register_operand") + (and (not (match_test "TARGET_X32")) + (match_operand 0 "memory_operand") ;; Return true if OP is a memory operands that can be used in sibcalls. ;; Since sibcall never returns, we can only use call-clobbered register @@ -710,24 +710,22 @@ (ior (match_test "constant_call_address_operand (op, mode == VOIDmode ? mode : Pmode)") (match_operand 0 "call_register_no_elim_operand") - (ior (and (not (match_test "TARGET_X32 - || ix86_indirect_branch_thunk_register")) - (match_operand 0 "memory_operand")) - (and (match_test "TARGET_X32 && Pmode == DImode - && !ix86_indirect_branch_thunk_register") - (match_operand 0 "GOT_memory_operand") + (and (not (match_test "ix86_indirect_branch_thunk_register")) + (ior (and (not (match_test "TARGET_X32")) + (match_operand 0 "memory_operand")) + (and (match_test "TARGET_X32 && Pmode == DImode") + (match_operand 0 "GOT_memory_operand")) ;; Similarly, but for tail calls, in which we cannot allow memory references. (define_special_predicate "sibcall_insn_operand" (ior (match_test "constant_call_address_operand (op, mode == VOIDmode ? mode : Pmode)") (match_operand 0 "register_no_elim_operand") - (ior (and (not (match_test "TARGET_X32 - || ix86_indirect_branch_thunk_register")) - (match_operand 0 "sibcall_memory_operand")) - (and (match_test "TARGET_X32 && Pmode == DImode - && !ix86_indirect_branch_thunk_register") - (match_operand 0 "GOT_memory_operand") + (and (not (match_test "ix86_indirect_branch_thunk_register")) + (ior (and (not (match_test "TARGET_X32")) +
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Sun, Jan 14, 2018 at 1:07 PM, Linus Torvalds wrote: > On Sun, Jan 14, 2018 at 1:02 PM, David Woodhouse wrote: >> Review on the GCC patches has led to a request that the thunk symbols >> be changed from e.g. __x86_indirect_thunk_rax to >> __x86_indirect_thunk_ax without the 'r'. > > Ok. I think that just makes it easier for us, since then the names are > independent of 32-vs/64, and we don't need to use the _ASM_XY names. > > What about r8-r15? I'm assuming 'r' there is used? They will remain r8-r15. > Mind sending me a tested patch? I'll was indeed planning on generating > rc8, but I might as well go grocery shopping now instead, and do rc8 > later in the evening. > > Linus -- H.J.
Re: [patch, committed] PR83811 fortran 'e' format broken for single digit exponents
On 01/14/2018 09:42 AM, Jerry DeLisle wrote: > Hello all, > > I committed the following as trivial. > > Regression tested on x86_64-pc-linux-gnu. > > This is a regression on 7 so I will backport. > > Regards, > > Jerry > > 2018-01-18 Jerry DeLisle > > PR libgfortran/83811 > * write.c (select_buffer): Adjust buffer size up by 1. > Committed to gcc7 after regression testing and I fixed the dates I had in the ChangeLogs. Cheers, Jerry
Re: [PATCH] document -Wclass-memaccess suppression by casting (PR 81327)
On 01/14/2018 03:31 AM, Jakub Jelinek wrote: On Sat, Jan 13, 2018 at 04:14:38PM -0700, Martin Sebor wrote: -The @option{-Wclass-memaccess} option is enabled by @option{-Wall}. +The @option{-Wclass-memaccess} option is enabled by @option{-Wall}. Casting Perhaps "Explicitly casting" instead? The implicit cast doesn't suppress it and occurs whenever there isn't an explicit cast. +the pointer to the class object to @code{void *} or to a type that can be +safely accessed by the raw memory function suppresses the warning. Sure. I've committed r256677. Martin
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Sun, 2018-01-14 at 13:07 -0800, Linus Torvalds wrote: > On Sun, Jan 14, 2018 at 1:02 PM, David Woodhouse wrote: > > > > Review on the GCC patches has led to a request that the thunk symbols > > be changed from e.g. __x86_indirect_thunk_rax to > > __x86_indirect_thunk_ax without the 'r'. > Ok. I think that just makes it easier for us, since then the names are > independent of 32-vs/64, and we don't need to use the _ASM_XY names. > > What about r8-r15? I'm assuming 'r' there is used? Ah yes, *this* is why I hated it... for 'r8' onwards that is indeed the register names as well as the suffix of the thunk name. But for the legacy registers I have to prepend 'e' or 'r' myself in the macro. So it ends up looking like this: .macro THUNK reg .section .text.__x86.indirect_thunk.\reg ENTRY(__x86_indirect_thunk_\reg) CFI_STARTPROC $done = 0 .irp xreg r8 r9 r10 r11 r12 r13 r14 r15 .ifeqs "\reg", "\xreg" JMP_NOSPEC %\reg $done = 1 .endif .endr .if $done != 1 JMP_NOSPEC %__ASM_REG(\reg) .endif CFI_ENDPROC ENDPROC(__x86_indirect_thunk_\reg) .endm /* * Despite being an assembler file we can't just use .irp here * because __KSYM_DEPS__ only uses the C preprocessor and would * only see one instance of "__x86_indirect_thunk_\reg" rather * than one per register with the correct names. So we do it * the simple and nasty way... */ #define EXPORT_THUNK(reg) EXPORT_SYMBOL(__x86_indirect_thunk_ ## reg) #define GENERATE_THUNK(reg) THUNK reg ; EXPORT_THUNK(reg) GENERATE_THUNK(ax) GENERATE_THUNK(bx) GENERATE_THUNK(cx) GENERATE_THUNK(dx) GENERATE_THUNK(si) GENERATE_THUNK(di) GENERATE_THUNK(bp) #ifdef CONFIG_64BIT GENERATE_THUNK(r8) GENERATE_THUNK(r9) GENERATE_THUNK(r10) GENERATE_THUNK(r11) GENERATE_THUNK(r12) GENERATE_THUNK(r13) GENERATE_THUNK(r14) GENERATE_THUNK(r15) #endif And *that* was the point at which I asked HJ to just use the proper bloody register names :) smime.p7s Description: S/MIME cryptographic signature
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Sun, Jan 14, 2018 at 1:58 PM, David Woodhouse wrote: > On Sun, 2018-01-14 at 13:07 -0800, Linus Torvalds wrote: >> On Sun, Jan 14, 2018 at 1:02 PM, David Woodhouse wrote: >> > >> > Review on the GCC patches has led to a request that the thunk symbols >> > be changed from e.g. __x86_indirect_thunk_rax to >> > __x86_indirect_thunk_ax without the 'r'. >> Ok. I think that just makes it easier for us, since then the names are >> independent of 32-vs/64, and we don't need to use the _ASM_XY names. >> >> What about r8-r15? I'm assuming 'r' there is used? > > Ah yes, *this* is why I hated it... for 'r8' onwards that is indeed the > register names as well as the suffix of the thunk name. But for the > legacy registers I have to prepend 'e' or 'r' myself in the macro. So > it ends up looking like this: > > > .macro THUNK reg > .section .text.__x86.indirect_thunk.\reg > > ENTRY(__x86_indirect_thunk_\reg) > CFI_STARTPROC > $done = 0 > .irp xreg r8 r9 r10 r11 r12 r13 r14 r15 > .ifeqs "\reg", "\xreg" > JMP_NOSPEC %\reg > $done = 1 > .endif > .endr > .if $done != 1 > JMP_NOSPEC %__ASM_REG(\reg) > .endif > CFI_ENDPROC > ENDPROC(__x86_indirect_thunk_\reg) > .endm > > /* > * Despite being an assembler file we can't just use .irp here > * because __KSYM_DEPS__ only uses the C preprocessor and would > * only see one instance of "__x86_indirect_thunk_\reg" rather > * than one per register with the correct names. So we do it > * the simple and nasty way... > */ > #define EXPORT_THUNK(reg) EXPORT_SYMBOL(__x86_indirect_thunk_ ## reg) > #define GENERATE_THUNK(reg) THUNK reg ; EXPORT_THUNK(reg) > > GENERATE_THUNK(ax) > GENERATE_THUNK(bx) > GENERATE_THUNK(cx) > GENERATE_THUNK(dx) > GENERATE_THUNK(si) > GENERATE_THUNK(di) > GENERATE_THUNK(bp) > #ifdef CONFIG_64BIT > GENERATE_THUNK(r8) > GENERATE_THUNK(r9) > GENERATE_THUNK(r10) > GENERATE_THUNK(r11) > GENERATE_THUNK(r12) > GENERATE_THUNK(r13) > GENERATE_THUNK(r14) > GENERATE_THUNK(r15) > #endif > > > And *that* was the point at which I asked HJ to just use the proper > bloody register names :) Please let me know if I should make the change to ax,..., r8,..r15. -- H.J.
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Sun, 2018-01-14 at 14:03 -0800, H.J. Lu wrote: > > > And *that* was the point at which I asked HJ to just use the proper > > bloody register names :) > > Please let me know if I should make the change to ax,..., r8,..r15. This is what I'm building my compiler with now, to make that change: http://git.infradead.org/users/dwmw2/gcc-retpoline.git/shortlog/refs/heads/retpoline-regnames At this point I'm inclined to suggest we don't make the change. I'll finish and test it anyway. I *can* change my GENERATE_THUNK macro to take two arguments — the suffix of the thunk name, and the register name to use. That lets me ditch the clever but ugly loop trick. I *did* want to get this file back to using .irp in the end, by fixing up other kernel infrastructure to do things properly. But I can live without that too if I must. smime.p7s Description: S/MIME cryptographic signature
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Sun, Jan 14, 2018 at 2:09 PM, David Woodhouse wrote: > On Sun, 2018-01-14 at 14:03 -0800, H.J. Lu wrote: >> >> > And *that* was the point at which I asked HJ to just use the proper >> > bloody register names :) >> >> Please let me know if I should make the change to ax,..., r8,..r15. > > This is what I'm building my compiler with now, to make that change: > http://git.infradead.org/users/dwmw2/gcc-retpoline.git/shortlog/refs/heads/retpoline-regnames > > At this point I'm inclined to suggest we don't make the change. I'll > finish and test it anyway. I *can* change my GENERATE_THUNK macro to > take two arguments — the suffix of the thunk name, and the register > name to use. That lets me ditch the clever but ugly loop trick. > > I *did* want to get this file back to using .irp in the end, by fixing > up other kernel infrastructure to do things properly. But I can live > without that too if I must. Please use this GCC patch instead. -- H.J. From 223c07edf531eaa05c7fff564ce6b5dc48e6a49b Mon Sep 17 00:00:00 2001 From: "H.J. Lu" Date: Sun, 14 Jan 2018 14:04:55 -0800 Subject: [PATCH] x86: Change register names in __x86_indirect_thunk_reg Change names of the lower 8 integer registers in __x86_indirect_thunk_reg to ax, dx, cx, bx, si, di and bp. gcc/ * config/i386/i386.c (indirect_thunk_name): Don't check LEGACY_INT_REGNO_P. (print_reg): Use reg_names[regno] for 'V'. * doc/extend.texi: Replace "the full integer" with "the integer" for 'V'. gcc/testsuite/ * gcc.target/i386/indirect-thunk-1.c: Updated. * gcc.target/i386/indirect-thunk-2.c: Likewise. * gcc.target/i386/indirect-thunk-3.c: Likewise. * gcc.target/i386/indirect-thunk-4.c: Likewise. * gcc.target/i386/indirect-thunk-7.c: Likewise. * gcc.target/i386/indirect-thunk-attr-1.c: Likewise. * gcc.target/i386/indirect-thunk-attr-2.c: Likewise. * gcc.target/i386/indirect-thunk-attr-5.c: Likewise. * gcc.target/i386/indirect-thunk-attr-6.c: Likewise. * gcc.target/i386/indirect-thunk-attr-7.c: Likewise. * gcc.target/i386/indirect-thunk-extern-1.c: Likewise. * gcc.target/i386/indirect-thunk-extern-2.c: Likewise. * gcc.target/i386/indirect-thunk-extern-3.c: Likewise. * gcc.target/i386/indirect-thunk-extern-4.c: Likewise. * gcc.target/i386/indirect-thunk-extern-7.c: Likewise. * gcc.target/i386/indirect-thunk-register-1.c: Likewise. * gcc.target/i386/indirect-thunk-register-3.c: Likewise. * gcc.target/i386/indirect-thunk-register-4.c: Likewise. * gcc.target/i386/ret-thunk-10.c: Likewise. * gcc.target/i386/ret-thunk-11.c: Likewise. * gcc.target/i386/ret-thunk-12.c: Likewise. * gcc.target/i386/ret-thunk-13.c: Likewise. * gcc.target/i386/ret-thunk-14.c: Likewise. * gcc.target/i386/ret-thunk-15.c: Likewise. * gcc.target/i386/ret-thunk-9.c: Likewise. --- gcc/config/i386/i386.c | 20 gcc/doc/extend.texi | 2 +- gcc/testsuite/gcc.target/i386/indirect-thunk-1.c | 2 +- gcc/testsuite/gcc.target/i386/indirect-thunk-2.c | 2 +- gcc/testsuite/gcc.target/i386/indirect-thunk-3.c | 2 +- gcc/testsuite/gcc.target/i386/indirect-thunk-4.c | 2 +- gcc/testsuite/gcc.target/i386/indirect-thunk-7.c | 2 +- .../gcc.target/i386/indirect-thunk-attr-1.c | 2 +- .../gcc.target/i386/indirect-thunk-attr-2.c | 2 +- .../gcc.target/i386/indirect-thunk-attr-5.c | 2 +- .../gcc.target/i386/indirect-thunk-attr-6.c | 2 +- .../gcc.target/i386/indirect-thunk-attr-7.c | 2 +- .../gcc.target/i386/indirect-thunk-extern-1.c| 2 +- .../gcc.target/i386/indirect-thunk-extern-2.c| 2 +- .../gcc.target/i386/indirect-thunk-extern-3.c| 2 +- .../gcc.target/i386/indirect-thunk-extern-4.c| 2 +- .../gcc.target/i386/indirect-thunk-extern-7.c| 2 +- .../gcc.target/i386/indirect-thunk-register-1.c | 2 +- .../gcc.target/i386/indirect-thunk-register-3.c | 2 +- .../gcc.target/i386/indirect-thunk-register-4.c | 3 +-- gcc/testsuite/gcc.target/i386/ret-thunk-10.c | 4 ++-- gcc/testsuite/gcc.target/i386/ret-thunk-11.c | 4 ++-- gcc/testsuite/gcc.target/i386/ret-thunk-12.c | 4 ++-- gcc/testsuite/gcc.target/i386/ret-thunk-13.c | 2 +- gcc/testsuite/gcc.target/i386/ret-thunk-14.c | 2 +- gcc/testsuite/gcc.target/i386/ret-thunk-15.c | 2 +- gcc/testsuite/gcc.target/i386/ret-thunk-9.c | 2 +- 27 files changed, 37 insertions(+), 42 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 5e4f845a1bd..890bd701cd1 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -10787,15 +10787,8 @@ indirect_thunk_name (char name[32], int regno, bool need_bnd_p, { const char *bnd = need_bnd_p ? "_bnd" : ""; if (regno >= 0) - { - const char *reg_prefix; - if (LEGACY_INT_REGNO_P (regno)) - reg_prefix = TARGET_64BIT ? "r" : "e"; - else - reg_prefix = "";
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Sun, 2018-01-14 at 14:12 -0800, H.J. Lu wrote: > Please use this GCC patch instead. Building now; thanks. This is the kernel patch I'll test as soon as the compiler is done. It's slightly less horrid than the "clever" one I sent out earlier, but does still end up needing those _ASM_AX etc. macros in *addition* to the bare "ax" that goes in the symbol names. I'm not convinced we want to do this, but I'll defer to Linus. From 755f50731a99b0ce0890e478e6a2d6ebd647da15 Mon Sep 17 00:00:00 2001 From: David Woodhouse Date: Sun, 14 Jan 2018 22:21:02 + Subject: [PATCH] x86/retpoline: Switch thunk names to match final GCC patches At the last minute, they were switched from __x86_indirect_thunk_rax to __x86_indirect_thunk_ax without the 'r' or 'e' on the register name. This is not entirely an improvement, IMO. Signed-off-by: David Woodhouse --- arch/x86/include/asm/asm-prototypes.h | 24 ++-- arch/x86/lib/retpoline.S | 41 +-- 2 files changed, 31 insertions(+), 34 deletions(-) diff --git a/arch/x86/include/asm/asm-prototypes.h b/arch/x86/include/asm/asm-prototypes.h index 0927cdc4f946..df80478fb682 100644 --- a/arch/x86/include/asm/asm-prototypes.h +++ b/arch/x86/include/asm/asm-prototypes.h @@ -18,19 +18,7 @@ extern void cmpxchg8b_emu(void); #endif #ifdef CONFIG_RETPOLINE -#ifdef CONFIG_X86_32 -#define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_e ## reg(void); -#else -#define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_r ## reg(void); -INDIRECT_THUNK(8) -INDIRECT_THUNK(9) -INDIRECT_THUNK(10) -INDIRECT_THUNK(11) -INDIRECT_THUNK(12) -INDIRECT_THUNK(13) -INDIRECT_THUNK(14) -INDIRECT_THUNK(15) -#endif +#define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_ ## reg(void); INDIRECT_THUNK(ax) INDIRECT_THUNK(bx) INDIRECT_THUNK(cx) @@ -39,4 +27,14 @@ INDIRECT_THUNK(si) INDIRECT_THUNK(di) INDIRECT_THUNK(bp) INDIRECT_THUNK(sp) +#ifdef CONFIG_64BIT +INDIRECT_THUNK(r8) +INDIRECT_THUNK(r9) +INDIRECT_THUNK(r10) +INDIRECT_THUNK(r11) +INDIRECT_THUNK(r12) +INDIRECT_THUNK(r13) +INDIRECT_THUNK(r14) +INDIRECT_THUNK(r15) +#endif /* CONFIG_64BIT */ #endif /* CONFIG_RETPOLINE */ diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S index cb45c6cb465f..7da2c9035836 100644 --- a/arch/x86/lib/retpoline.S +++ b/arch/x86/lib/retpoline.S @@ -8,14 +8,14 @@ #include #include -.macro THUNK reg - .section .text.__x86.indirect_thunk.\reg +.macro THUNK reg suffix + .section .text.__x86.indirect_thunk.\suffix -ENTRY(__x86_indirect_thunk_\reg) +ENTRY(__x86_indirect_thunk_\suffix) CFI_STARTPROC JMP_NOSPEC %\reg CFI_ENDPROC -ENDPROC(__x86_indirect_thunk_\reg) +ENDPROC(__x86_indirect_thunk_\suffix) .endm /* @@ -26,23 +26,22 @@ ENDPROC(__x86_indirect_thunk_\reg) * the simple and nasty way... */ #define EXPORT_THUNK(reg) EXPORT_SYMBOL(__x86_indirect_thunk_ ## reg) -#define GENERATE_THUNK(reg) THUNK reg ; EXPORT_THUNK(reg) +#define GENERATE_THUNK(reg, suffix) THUNK reg suffix; EXPORT_THUNK(suffix) -GENERATE_THUNK(_ASM_AX) -GENERATE_THUNK(_ASM_BX) -GENERATE_THUNK(_ASM_CX) -GENERATE_THUNK(_ASM_DX) -GENERATE_THUNK(_ASM_SI) -GENERATE_THUNK(_ASM_DI) -GENERATE_THUNK(_ASM_BP) -GENERATE_THUNK(_ASM_SP) +GENERATE_THUNK(_ASM_AX, ax) +GENERATE_THUNK(_ASM_BX, bx) +GENERATE_THUNK(_ASM_CX, cx) +GENERATE_THUNK(_ASM_DX, dx) +GENERATE_THUNK(_ASM_SI, si) +GENERATE_THUNK(_ASM_DI, di) +GENERATE_THUNK(_ASM_BP, bp) #ifdef CONFIG_64BIT -GENERATE_THUNK(r8) -GENERATE_THUNK(r9) -GENERATE_THUNK(r10) -GENERATE_THUNK(r11) -GENERATE_THUNK(r12) -GENERATE_THUNK(r13) -GENERATE_THUNK(r14) -GENERATE_THUNK(r15) +GENERATE_THUNK(r8, r8) +GENERATE_THUNK(r9, r9) +GENERATE_THUNK(r10, r10) +GENERATE_THUNK(r11, r11) +GENERATE_THUNK(r12, r12) +GENERATE_THUNK(r13, r13) +GENERATE_THUNK(r14, r14) +GENERATE_THUNK(r15, r15) #endif -- 2.14.3 smime.p7s Description: S/MIME cryptographic signature
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Sun, Jan 14, 2018 at 2:39 PM, David Woodhouse wrote: > > I'm not convinced we want to do this, but I'll defer to Linus. Well, I guess we have no choice, if gcc ends up using the stupid names. And yes, apparently this just made our macros worse instead of cleaning anything up. Oh well. I do have one (possible) solution: just export both names. So you'd export __x86_indirect_thunk_ax __x86_indirect_thunk_rax .. __x86_indirect_thunk_8 __x86_indirect_thunk_r8 as symbols (same code, obviously), and then (a) the macros would be simpler (b) it just happens to work with even the old gcc patch But at this point I don't really care. Linus
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Sun, 2018-01-14 at 15:02 -0800, Linus Torvalds wrote: > On Sun, Jan 14, 2018 at 2:39 PM, David Woodhouse > wrote: > > > > I'm not convinced we want to do this, but I'll defer to Linus. > > Well, I guess we have no choice, if gcc ends up using the stupid > names. At this point, they'll do what we ask. See Uros's earlier message: > ... the difference is only a couple of lines in the > compiler source which we also can live with. But please discuss my > proposal also in the kernel community, and weight the benefits and > drawbacks of each approach before the final decision. > > Please pass the final decision to gcc community, and we'll implement it. I think we should stick with what we have now, with the names of the thunks actually being the *full* name of the register (rax, eax, etc.) that they use. smime.p7s Description: S/MIME cryptographic signature
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Sun, Jan 14, 2018 at 3:09 PM, David Woodhouse wrote: > > I think we should stick with what we have now, with the names of the > thunks actually being the *full* name of the register (rax, eax, etc.) > that they use. It that works for the gcc people, then yes, I agree. The mixed "sometimes full, sometimes not" approach just seems broken. Linus
Re: [PATCH][RFC] Radically simplify emission of balanced tree for switch statements.
On 10 January 2018 15:59:28 CET, "Martin Liška" wrote: >On 01/10/2018 02:13 PM, Richard Biener wrote: >> On Tue, Jan 9, 2018 at 7:29 PM, Jeff Law wrote: >>> On 01/09/2018 07:43 AM, Martin Liška wrote: On 09/20/2017 05:00 PM, Jeff Law wrote: > On 09/20/2017 01:24 AM, Martin Liška wrote: > >> >> Hello. >> >> Thank you Jeff for very verbose explanation what's happening. I'm >planning to do >> follow-up of this patch that will include clustering for >bit-tests and jump tables. >> Maybe that will make aforementioned issues even more difficult, >but we'll see. > FWIW, the DOM changes to simplify the conditionals seem to help >both > cases, trigger reasonably consistently in a bootstrap and for some > subset of the triggers actually result in transformations that >allow > other passes to do a better job in the common (-O2) case. So my > inclination is to polish them a bit further get them on the trunk. > > My recommendation is to ignore the two regressions for now and >focus on > the cleanups you're trying to do. > > jeff > Hello. Some time ago I've decided that I'll make patch submission of >switch clustering in next stage1. However, this patch can be applied as is in this >stage3. Would it be possible or is it too late? >>> I'll let Richi make the call here. FWIW, the DOM changes to avoid >the >>> two missed-optimization regressions you ran into are on the trunk, >so >>> that's no longer a blocking issue. >> >> If you are fine with waiting then please wait ;) > >Yep, it's not urgent. Can you please post CSiBE numbers? Ideally throwing in gcc-3.4.6 numbers too? thanks,
[PATCH] x86/retpoline: Switch thunk names to match final GCC patches
At the last minute, they were switched from __x86_indirect_thunk_rax to __x86_indirect_thunk_ax without the 'r' or 'e' on the register name. Except for the _r[89..] versions, obviously. This is not entirely an improvement, IMO. Reluctantly-signed-off-by: David Woodhouse --- I think we *shouldn't* do this. Uros said we could look at it and make a decision, and GCC would implement what we decide. Up to Linus. I'm sending this because I've built the compiler with the proposed changes and tested it, and it's quarter past Monday and I'm done. Not because I really want it. arch/x86/entry/entry_64.S | 2 +- arch/x86/include/asm/asm-prototypes.h | 24 ++-- arch/x86/lib/retpoline.S | 41 +-- 3 files changed, 32 insertions(+), 35 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 59874bc..e393163 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -269,7 +269,7 @@ entry_SYSCALL_64_fastpath: */ #ifdef CONFIG_RETPOLINE movqsys_call_table(, %rax, 8), %rax - call__x86_indirect_thunk_rax + call__x86_indirect_thunk_ax #else call*sys_call_table(, %rax, 8) #endif diff --git a/arch/x86/include/asm/asm-prototypes.h b/arch/x86/include/asm/asm-prototypes.h index 0927cdc..df80478 100644 --- a/arch/x86/include/asm/asm-prototypes.h +++ b/arch/x86/include/asm/asm-prototypes.h @@ -18,19 +18,7 @@ extern void cmpxchg8b_emu(void); #endif #ifdef CONFIG_RETPOLINE -#ifdef CONFIG_X86_32 -#define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_e ## reg(void); -#else -#define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_r ## reg(void); -INDIRECT_THUNK(8) -INDIRECT_THUNK(9) -INDIRECT_THUNK(10) -INDIRECT_THUNK(11) -INDIRECT_THUNK(12) -INDIRECT_THUNK(13) -INDIRECT_THUNK(14) -INDIRECT_THUNK(15) -#endif +#define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_ ## reg(void); INDIRECT_THUNK(ax) INDIRECT_THUNK(bx) INDIRECT_THUNK(cx) @@ -39,4 +27,14 @@ INDIRECT_THUNK(si) INDIRECT_THUNK(di) INDIRECT_THUNK(bp) INDIRECT_THUNK(sp) +#ifdef CONFIG_64BIT +INDIRECT_THUNK(r8) +INDIRECT_THUNK(r9) +INDIRECT_THUNK(r10) +INDIRECT_THUNK(r11) +INDIRECT_THUNK(r12) +INDIRECT_THUNK(r13) +INDIRECT_THUNK(r14) +INDIRECT_THUNK(r15) +#endif /* CONFIG_64BIT */ #endif /* CONFIG_RETPOLINE */ diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S index cb45c6c..7da2c90 100644 --- a/arch/x86/lib/retpoline.S +++ b/arch/x86/lib/retpoline.S @@ -8,14 +8,14 @@ #include #include -.macro THUNK reg - .section .text.__x86.indirect_thunk.\reg +.macro THUNK reg suffix + .section .text.__x86.indirect_thunk.\suffix -ENTRY(__x86_indirect_thunk_\reg) +ENTRY(__x86_indirect_thunk_\suffix) CFI_STARTPROC JMP_NOSPEC %\reg CFI_ENDPROC -ENDPROC(__x86_indirect_thunk_\reg) +ENDPROC(__x86_indirect_thunk_\suffix) .endm /* @@ -26,23 +26,22 @@ ENDPROC(__x86_indirect_thunk_\reg) * the simple and nasty way... */ #define EXPORT_THUNK(reg) EXPORT_SYMBOL(__x86_indirect_thunk_ ## reg) -#define GENERATE_THUNK(reg) THUNK reg ; EXPORT_THUNK(reg) +#define GENERATE_THUNK(reg, suffix) THUNK reg suffix; EXPORT_THUNK(suffix) -GENERATE_THUNK(_ASM_AX) -GENERATE_THUNK(_ASM_BX) -GENERATE_THUNK(_ASM_CX) -GENERATE_THUNK(_ASM_DX) -GENERATE_THUNK(_ASM_SI) -GENERATE_THUNK(_ASM_DI) -GENERATE_THUNK(_ASM_BP) -GENERATE_THUNK(_ASM_SP) +GENERATE_THUNK(_ASM_AX, ax) +GENERATE_THUNK(_ASM_BX, bx) +GENERATE_THUNK(_ASM_CX, cx) +GENERATE_THUNK(_ASM_DX, dx) +GENERATE_THUNK(_ASM_SI, si) +GENERATE_THUNK(_ASM_DI, di) +GENERATE_THUNK(_ASM_BP, bp) #ifdef CONFIG_64BIT -GENERATE_THUNK(r8) -GENERATE_THUNK(r9) -GENERATE_THUNK(r10) -GENERATE_THUNK(r11) -GENERATE_THUNK(r12) -GENERATE_THUNK(r13) -GENERATE_THUNK(r14) -GENERATE_THUNK(r15) +GENERATE_THUNK(r8, r8) +GENERATE_THUNK(r9, r9) +GENERATE_THUNK(r10, r10) +GENERATE_THUNK(r11, r11) +GENERATE_THUNK(r12, r12) +GENERATE_THUNK(r13, r13) +GENERATE_THUNK(r14, r14) +GENERATE_THUNK(r15, r15) #endif -- 2.7.4
Re: [PATCH] x86/retpoline: Switch thunk names to match final GCC patches
On Sun, Jan 14, 2018 at 3:23 PM, David Woodhouse wrote: > I think we *shouldn't* do this. Uros said we could look at it and make > a decision, and GCC would implement what we decide. Up to Linus. Regardless of whether we end up having to do this, I'm not doing rc8 with it, and let's hope we can just skip it entirely. It seems silly to have the 'r' for the r8-r15 case, but not the legacy registers. Linus
[PATCH] handle multiple flexible array members (PR 83588)
The attached patch fixes PR c++/83588 - struct with two flexible arrays causes an internal compiler error. The ICE is caused by the same assertion in varasm.c that has led to other similar reports in the past: /* Given a non-empty initialization, this field had better be last. Given a flexible array member, the next field on the chain is a TYPE_DECL of the enclosing struct. */ const_tree next = DECL_CHAIN (local->field); gcc_assert (!fieldsize || !next || TREE_CODE (next) != FIELD_DECL); The fix is simply to also detect when a class defines more than one flexible array member and treat the subsequent array as any other member, and reject such class definitions to make sure they never reach the assertion above. Martin PR c++/83588 - struct with two flexible arrays causes an internal compiler error gcc/cp/ChangeLog: PR c++/83588 * class.c (find_flexarrays): Make a record of multiple flexible array members. gcc/testsuite/ChangeLog: PR c++/83588 * g++.dg/ext/flexary28.C: New test. Index: gcc/cp/class.c === --- gcc/cp/class.c (revision 256674) +++ gcc/cp/class.c (working copy) @@ -6566,14 +6566,17 @@ find_flexarrays (tree t, flexmems_t *fmem, bool ba /* Flexible array members have no upper bound. */ if (fmem->array) { - /* Replace the zero-length array if it's been stored and - reset the after pointer. */ if (TYPE_DOMAIN (TREE_TYPE (fmem->array))) { + /* Replace the zero-length array if it's been stored and + reset the after pointer. */ fmem->after[bool (pun)] = NULL_TREE; fmem->array = fld; fmem->enclosing = pstr; } + else if (!fmem->after[bool (pun)]) + /* Make a record of another flexible array member. */ + fmem->after[bool (pun)] = fld; } else { Index: gcc/testsuite/g++.dg/ext/flexary28.C === --- gcc/testsuite/g++.dg/ext/flexary28.C (nonexistent) +++ gcc/testsuite/g++.dg/ext/flexary28.C (working copy) @@ -0,0 +1,46 @@ +// PR c++/83588 - struct with two flexible arrays causes an internal compiler +// error +// { dg-do compile } +// { dg-options "-Wno-pedantic" } + +struct A { + int i; + int a[]; // { dg-error "flexible array member .A::a. not at end of .struct A." } + int b[]; +}; + +struct B { + int i; + int a[]; // { dg-error "flexible array member .B::a. not at end of .struct B." } + int j; + int b[][2]; +}; + +struct C { + int i; + struct { +int a[];// { dg-error "flexible array member .Ca. not at end of .struct C." } + }; + int b[]; +}; + +struct D { + int i; + struct { +int a[];// { dg-error "flexible array member .Da. not at end of .struct D." } + } b[]; + int c[]; +}; + +struct E { + int i; + int a[0]; + int b[]; // { dg-error "flexible array member .E::b. not at end of .struct E." } + int d[]; +}; + +struct F { + int i; + int a[]; // { dg-error "flexible array member .F::a. not at end of .struct F." } + int b[], c[], d[]; +};
Re: [PATCH 3/5] x86: Add -mindirect-branch-register
On Sun, Jan 14, 2018 at 1:23 PM, H.J. Lu wrote: > On Sun, Jan 14, 2018 at 10:52 AM, Uros Bizjak wrote: >> On Sun, Jan 14, 2018 at 7:08 PM, H.J. Lu wrote: >>> On Sun, Jan 14, 2018 at 9:51 AM, Uros Bizjak wrote: - (ior (and (not (match_test "TARGET_X32")) + (ior (and (not (match_test "TARGET_X32 + || ix86_indirect_branch_thunk_register")) (match_operand 0 "sibcall_memory_operand")) - (and (match_test "TARGET_X32 && Pmode == DImode") + (and (match_test "TARGET_X32 && Pmode == DImode + && !ix86_indirect_branch_thunk_register") (match_operand 0 "GOT_memory_operand" Is this patch just trying to disable the predicate when ix86_indirect_branch_thunk_register is set? Because this is what this convoluted logic does. >>> >>> Yes, we want to disable all indirect branch via memory with >>> -mindirect-branch-register, just like -mx32. We could do >>> >>> #idefine TARGET_INDIRECT_BRANCH_REGISTER \ >>> (TARGER_X32 || ix86_indirect_branch_thunk_register) >> >> Index: predicates.md >> === >> --- predicates.md (revision 25) >> +++ predicates.md (working copy) >> @@ -710,11 +710,10 @@ >>(ior (match_test "constant_call_address_operand >> (op, mode == VOIDmode ? mode : Pmode)") >> (match_operand 0 "call_register_no_elim_operand") >> - (ior (and (not (match_test "TARGET_X32 >> - || ix86_indirect_branch_thunk_register")) >> + (and (not (match_test "ix86_indirect_branch_thunk_register")) >> + (ior (and (not (match_test "TARGET_X32"))) >> (match_operand 0 "memory_operand")) >> - (and (match_test "TARGET_X32 && Pmode == DImode >> - && !ix86_indirect_branch_thunk_register") >> +(and (match_test "TARGET_X32 && Pmode == DImode") >> (match_operand 0 "GOT_memory_operand") >> >> or something like that. >> > > I am testing this patch. OK for trunk if there is no regression? > Here is the updated patch. Tested on i686 and x86-64. OK for trunk? -- H.J. From b76722e4050241d0b4c0a46bd12015d3973b118b Mon Sep 17 00:00:00 2001 From: "H.J. Lu" Date: Sun, 14 Jan 2018 13:19:37 -0800 Subject: [PATCH] x86: Rewrite ix86_indirect_branch_thunk_register logic Rewrite ix86_indirect_branch_thunk_register logic with (and (not (match_test "ix86_indirect_branch_thunk_register")) (original condition before r256662)) * config/i386/predicates.md (constant_call_address_operand): Rewrite ix86_indirect_branch_thunk_register logic. (sibcall_insn_operand): Likewise. --- gcc/config/i386/predicates.md | 22 ++ 1 file changed, 10 insertions(+), 12 deletions(-) diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md index 5ae443231b8..23bf564f58c 100644 --- a/gcc/config/i386/predicates.md +++ b/gcc/config/i386/predicates.md @@ -710,24 +710,22 @@ (ior (match_test "constant_call_address_operand (op, mode == VOIDmode ? mode : Pmode)") (match_operand 0 "call_register_no_elim_operand") - (ior (and (not (match_test "TARGET_X32 - || ix86_indirect_branch_thunk_register")) - (match_operand 0 "memory_operand")) - (and (match_test "TARGET_X32 && Pmode == DImode - && !ix86_indirect_branch_thunk_register") - (match_operand 0 "GOT_memory_operand") + (and (not (match_test "ix86_indirect_branch_thunk_register")) + (ior (and (not (match_test "TARGET_X32")) + (match_operand 0 "memory_operand")) + (and (match_test "TARGET_X32 && Pmode == DImode") + (match_operand 0 "GOT_memory_operand")) ;; Similarly, but for tail calls, in which we cannot allow memory references. (define_special_predicate "sibcall_insn_operand" (ior (match_test "constant_call_address_operand (op, mode == VOIDmode ? mode : Pmode)") (match_operand 0 "register_no_elim_operand") - (ior (and (not (match_test "TARGET_X32 - || ix86_indirect_branch_thunk_register")) - (match_operand 0 "sibcall_memory_operand")) - (and (match_test "TARGET_X32 && Pmode == DImode - && !ix86_indirect_branch_thunk_register") - (match_operand 0 "GOT_memory_operand") + (and (not (match_test "ix86_indirect_branch_thunk_register")) + (ior (and (not (match_test "TARGET_X32")) + (match_operand 0 "sibcall_memory_operand")) + (and (match_test "TARGET_X32 && Pmode == DImode") + (match_operand 0 "GOT_memory_operand")) ;; Return true if OP is a 32-bit GOT symbol operand. (define_predicate "GOT32_symbol_operand" -- 2.14.3
RE: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
Hi > -Original Message- > From: H.J. Lu [mailto:hjl.to...@gmail.com] > Sent: Sunday, January 14, 2018 7:52 PM > To: Jan Hubicka > Cc: Kumar, Venkataramanan ; gcc- > patc...@gcc.gnu.org; Dharmakan, Rohit arul raj > ; Nagarajan, Muthu kumar raj > ; Uros Bizjak (ubiz...@gmail.com) > > Subject: Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre > > On Sun, Jan 14, 2018 at 6:20 AM, Jan Hubicka wrote: > >> > Hi HJ, > >> > > >> > > -Original Message- > >> > > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > >> > > ow...@gcc.gnu.org] On Behalf Of H.J. Lu > >> > > Sent: Sunday, January 14, 2018 9:07 AM > >> > > To: gcc-patches@gcc.gnu.org > >> > > Subject: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre > >> > > > >> > > This set of patches for GCC 8 mitigates variant #2 of the > >> > > speculative execution vulnerabilities on x86 processors > >> > > identified by CVE-2017-5715, aka Spectre. They convert indirect > >> > > branches and function returns to call and return thunks to avoid > speculative execution via indirect call, jmp and ret. > >> > > > >> > > H.J. Lu (5): > >> > > x86: Add -mindirect-branch= > >> > > x86: Add -mfunction-return= > >> > > x86: Add -mindirect-branch-register > >> > > x86: Add 'V' register operand modifier > >> > > x86: Disallow -mindirect-branch=/-mfunction-return= with > >> > > -mcmodel=large > >> > > >> > Current set of patches don't seem to have any option to generate > "lfence" as the loop filler in "retpoline", which is required by AMD. > >> > Can you please clarify the plan. We would like to get this checked-in GCC > 8. > >> > >> Since thunks are output as strings, it is easy to add the option on > >> the top of patch #1 of the series. I do not fully understand the > >> reason for choosing pause over lfence for Intel, but if we need to do > >> both, we need to have command line option (and possibly attribute). > >> What would be reasonable name for it? > > > > I forgot there is -mindirect-branch-loop for that in the original patchset. > > So for now we should be happy with having both lfence and pause in > > there or do we still need it? > > > > I suggest we leave it out for the time being. Yes as of now having both "lfence" and "pause" is Ok. Hope we can add "- indirect-branch-loop" option later if required. Regards, Venkat, > > > -- > H.J.
Re: [PATCH] MicroBlaze resolve section change issues for ident output
On 01/11/2018 06:54 AM, Nathan Rossi wrote: > 2018-01-11 Nathan Rossi > > PR target/83013 > * config/microblaze/microblaze.c (microblaze_asm_output_ident): > Use .pushsection/.popsection THanks. Installed on the trunk. jeff
Re: [PATCH 1/2] Fix unstable sort
On 01/12/2018 01:58 PM, li...@coryfields.com wrote: > From: Cory Fields > > 2018-01-12 Cory Fields >* tree-ssa-loop-im.c (sort_bbs_in_loop_postorder_cmp): stabilize sort Thanks. Installed onto the trunk. jeff
Re: [PATCH 2/2] Fix unstable sort
On 01/12/2018 01:58 PM, li...@coryfields.com wrote: > From: Cory Fields > > 2018-01-12 Cory Fields >* tree-ira.c (allocno_hard_regs_compare): stabilize sort Thanks. I fixed the ChangeLog entry and installed hte patch on the trunk. jeff
Re: [PATCH] suppress -Wstringop-overflow when no-warning is set (PR 83508)
On 01/11/2018 04:49 PM, Martin Sebor wrote: > On 01/11/2018 04:24 PM, Jeff Law wrote: >> On 01/10/2018 01:26 PM, Martin Sebor wrote: >>> To avoid issuing duplicate warnings for the same function call >>> in the source code the -Wrestrict warning code makes sure >>> the no-warning bit is propagated between trees and GIMPLE and >>> tested before issuing a warning. But the warning also detects >>> some of the same problems as -Wstringop-overflow, and that >>> warning was not updated to pay attention to the no-warning bit. >>> This can in turn lead to two warnings for what boils down to >>> the same bug. The warnings can be confusing when the first >>> one references the function as it appears in the source code >>> and the second one the one it was transformed to by GCC after >>> the first warning was issued. >>> >>> The attached patch corrects this oversight by having >>> the buffer overflow checker test the no-warning bit and skip >>> issuing a diagnostic. (The function that does the overflow >>> checking still runs so that it can continue to indicate to its >>> callers whether an overflow has been detected.) >>> >>> Bootstrap on x86_64-linux in progress. >>> >>> Martin >>> >>> gcc-83508.diff >>> >>> >>> PR other/83508 - c-c++-common/Wrestrict.c fails since r255836 >>> >>> gcc/testsuite/ChangeLog: >>> >>> PR other/83508 >>> * gcc.dg/Wstringop-overflow-2.c: New test. >>> >>> gcc/ChangeLog: >>> >>> PR other/83508 >>> * builtins.c (check_access): Avoid warning when the no-warning bit >>> is set. >> Would it be better to check for TREE_NO_WARNING at the very start of >> check_access rather than sprinkling it at various sites later in the >> code? > > The function is called from a couple of places to check that > there is no overflow and avoid expansion (we discussed this > not too long ago when I enhanced compute_objsize() to handle > ranges). I didn't want to change that (I mention it in the > last sentence above.) OK. Just wanted to raise it as a possibility. > > Looking forward, I would like to see these middle-end checkers > used to avoid making certain kinds of transformations that we > know are dangerous (we discussed emitting __builtin_trap or > __builtin_unreachable) for some such cases. I realize they > aren't suitable for it quite it yet and will need work to make > them so (assuming we can agree on that approach), but I mention > it to explain what I was thinking when I sprinkled the tests > in check_access() the way I did. Understood. > > As an aside, even though I think GCC should issue only one > warning per function, I'm not sure that -Wrestrict should > trump -Wstringop-overflow. It seems like it should be the > other way around because the latter is more severe. That's > also how it worked before -Wrestrict was moved into its own > pass. To make it work like that again, -Wstringop-overflow > would need to run either before -Wrestrict or at the same > time. Maybe it should be moved in GCC 9. I think we're OK for gcc-8. We can revisit ordering and such for gcc-9. FWIW, I went ahead and committed your patch to the trunk. jeff
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Sun, 2018-01-14 at 21:52 +0100, Uros Bizjak wrote: > > Well, you did say that these are strange times ;) > > From the user perspective, it would be more convenient to use the > thunk names that are the same for 32bit and 64bit targets. If we > ignore this fact, the difference is only a couple of lines in the > compiler source which we also can live with. But please discuss my > proposal also in the kernel community, and weight the benefits and > drawbacks of each approach before the final decision. > > Please pass the final decision to gcc community, and we'll implement > it. I think you watched this happen, but just to be explicitly clear: We weighed the benefits and tested this, and we concluded that we don't want it. Let's stick with e.g. __x86_indirect_thunk_rax please. Thank you for being flexible. smime.p7s Description: S/MIME cryptographic signature