inline_small_functions speedup

2018-01-14 Thread Richard Sandiford
After inlining A into B, inline_small_functions updates the information
for (most) callees and callers of the new B:

  update_callee_keys (&edge_heap, where, updated_nodes);
  [...]
  /* Our profitability metric can depend on local properties
 such as number of inlinable calls and size of the function body.
 After inlining these properties might change for the function we
 inlined into (since it's body size changed) and for the functions
 called by function we inlined (since number of it inlinable callers
 might change).  */
  update_caller_keys (&edge_heap, where, updated_nodes, NULL);

These functions in turn call can_inline_edge_p for most of the associated
edges:

if (can_inline_edge_p (edge, false)
&& want_inline_small_function_p (edge, false))
  update_edge_key (heap, edge);

can_inline_edge_p indirectly calls estimate_calls_size_and_time
on the caller node, which seems to recursively process all callee
edges rooted at the node.  It looks from this like algorithm can be
at least quadratic in the worst case.

Maybe there's something we can do to make can_inline_edge_p cheaper, but
since neither of these two calls is responsible for reporting an inline
failure reason, it seems cheaper to test want_inline_small_function_p
first, so that we don't calculate an estimate for something that we
already know isn't a "small function".  I think the only change
needed to make that work is to check for CIF_FINAL_ERROR in
want_inline_small_function_p; at the moment we rely on can_inline_edge_p
to make that check.

This cuts the time to build optabs.ii by over 4% with an
--enable-checking=release compiler on x86_64-linux-gnu.  I've seen more
dramatic wins on aarch64-linux-gnu due to the NUM_POLY_INT_COEFFS==2
thing.  The patch doesn't affect the output code.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
OK to install?

Richard


2018-01-13  Richard Sandiford  

gcc/
* ipa-inline.c (want_inline_small_function_p): Return false if
inlining has already failed with CIF_FINAL_ERROR.
(update_caller_keys): Call want_inline_small_function_p before
can_inline_edge_p.
(update_callee_keys): Likewise.

Index: gcc/ipa-inline.c
===
--- gcc/ipa-inline.c2018-01-09 14:29:35.151550415 +
+++ gcc/ipa-inline.c2018-01-14 08:43:35.653122186 +
@@ -706,7 +706,11 @@ want_inline_small_function_p (struct cgr
   bool want_inline = true;
   struct cgraph_node *callee = e->callee->ultimate_alias_target ();
 
-  if (DECL_DISREGARD_INLINE_LIMITS (callee->decl))
+  /* Allow this function to be called before can_inline_edge_p,
+ since it's usually cheaper.  */
+  if (cgraph_inline_failed_type (e->inline_failed) == CIF_FINAL_ERROR)
+want_inline = false;
+  else if (DECL_DISREGARD_INLINE_LIMITS (callee->decl))
 ;
   else if (!DECL_DECLARED_INLINE_P (callee->decl)
   && !opt_for_fn (e->caller->decl, flag_inline_small_functions))
@@ -1312,8 +1316,8 @@ update_caller_keys (edge_heap_t *heap, s
 if (!check_inlinablity_for
|| check_inlinablity_for == edge)
  {
-   if (can_inline_edge_p (edge, false)
-   && want_inline_small_function_p (edge, false))
+   if (want_inline_small_function_p (edge, false)
+   && can_inline_edge_p (edge, false))
  update_edge_key (heap, edge);
else if (edge->aux)
  {
@@ -1356,8 +1360,8 @@ update_callee_keys (edge_heap_t *heap, s
&& avail >= AVAIL_AVAILABLE
&& !bitmap_bit_p (updated_nodes, callee->uid))
  {
-   if (can_inline_edge_p (e, false)
-   && want_inline_small_function_p (e, false))
+   if (want_inline_small_function_p (e, false)
+   && can_inline_edge_p (e, false))
  update_edge_key (heap, e);
else if (e->aux)
  {


RE: [PATCH 2/5] x86: Add -mindirect-branch-loop=

2018-01-14 Thread Kumar, Venkataramanan
Hi Arjan, 

> -Original Message-
> From: Van De Ven, Arjan [mailto:arjan.van.de@intel.com]
> Sent: Saturday, January 13, 2018 10:16 PM
> To: David Woodhouse ; Kumar, Venkataramanan
> ; H.J. Lu ; Jeff
> Law ; Paul Turner ; Mallick, Asit K
> 
> Cc: Nagarajan, Muthu kumar raj ;
> GCC Patches ; Martin Jambor
> ; Uros Bizjak (ubiz...@gmail.com)
> ; Jan Hubicka ; Dharmakan, Rohit arul raj
> 
> Subject: RE: [PATCH 2/5] x86: Add -mindirect-branch-loop=
> 
> > > If everyone is ok, my suggestion is to use  "lfence" as the default
> > > loop filler for retpoline.
> 
> can we do BOTH a pause and lfence.
> (that way on cpu's where pause is the power stop, it works, and on cpus
> where it's a fallthrough (AMD) it goes to the lfence)
> 

I checked with our Architect. Having just "pause" is the concern. 
It should also be fine for AMD to use "pause" followed by "lfence" in the loop 
of retpoline.

Regards,
Venkat.


Re: PR81703 and Martin's fix for PR83501

2018-01-14 Thread Prathamesh Kulkarni
On 12 January 2018 at 23:25, Jakub Jelinek  wrote:
> On Fri, Jan 12, 2018 at 10:38:39AM -0700, Jeff Law wrote:
>> >>> Thanks for pointing it out.  I see it there as well with
>> >>> Prathamesh's test case, though not with the test case in
>> >>> bug 83543.  It is the same root cause in both.  I agree
>> >>> that enhancing the strlen pass to handle this case would
>> >>> be preferable to just xfailing the test.  I'm just not
>> >>> sure it's possible before stage 3 closes.  If not, I'll
>> >>> work on it in GCC 9.  Although the details are target-
>> >>> specific, the limitation affects all targets and so
>> >>> having a solution will benefit all all of them.
>> >> Indeed, however for now I am not sure what would be the best approach ?
>> >> If the test-case starts failing for many targets, not sure if XFAIL
>> >> would be the right choice.
>> >> Should I just restrict it to x86_64 target for now ?
>> >
>> > That sounds like a good approach in the interim, until we have
>> > a general solution.  It will avoid having to maintain a list
>> > of targets where it's known to fail.
>> Agreed and pre-approved.
>
> Just please test with
> RUNTESTFLAGS='--target_board=unix\{-m64,-m32,-m32/-mno-sse\} 
> dg.exp=strlenopt-*.c'
> and restrict to { i?86-*-* x86_64-*-* }, e.g. on Solaris it is i?86-*-*
> canonical target, even when it supports -m64 multilib.
> If you need x86_64 64-bit, that would be { { i?86-*-* x86_64-*-* } && lp64 }
> or ! ia32, depending on if -mx32 works or not.
Thanks, committed in r256657 after verifying that -m32 works.

Thanks,
Prathamesh
>
> Jakub


Re: [PATCH] Add new verification for profile-count.h.

2018-01-14 Thread Tom de Vries

On 01/12/2018 09:44 AM, Jan Hubicka wrote:

Hi.

Following patch adds new sanitization checks for profile_quality.
Problem is that zero initialization of a struct with profile_count will
lead to an invalid counter. This can help to catch them.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?

OK,
thanks!
Honza

Martin



>From edec114cf1dd29bb571855a80e1b45ae040da200 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 10 Jan 2018 14:46:08 +0100
Subject: [PATCH] Add new verification for profile-count.h.

gcc/ChangeLog:

2018-01-12  Martin Liska  

* profile-count.h (enum profile_quality): Use 0 as invalid
enum value of profile_quality.
---
  gcc/profile-count.h | 16 ++--
  1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/profile-count.h b/gcc/profile-count.h
index 3c5f720ee81..7a43917ebbc 100644
--- a/gcc/profile-count.h
+++ b/gcc/profile-count.h
@@ -30,27 +30,27 @@ enum profile_quality {
   or may not match reality.  It is local to function and can not be 
compared
   inter-procedurally.  Never used by probabilities (they are always local).
 */
-  profile_guessed_local = 0,
+  profile_guessed_local = 1,
/* Profile was read by feedback and was 0, we used local heuristics to guess
   better.  This is the case of functions not run in profile fedback.
   Never used by probabilities.  */
-  profile_guessed_global0 = 1,
+  profile_guessed_global0 = 2,
  
/* Same as profile_guessed_global0 but global count is adjusted 0.  */

-  profile_guessed_global0adjusted = 2,
+  profile_guessed_global0adjusted = 3,
  
/* Profile is based on static branch prediction heuristics.  It may or may

   not reflect the reality but it can be compared interprocedurally
   (for example, we inlined function w/o profile feedback into function
with feedback and propagated from that).
   Never used by probablities.  */
-  profile_guessed = 3,
+  profile_guessed = 4,
/* Profile was determined by autofdo.  */
-  profile_afdo = 4,
+  profile_afdo = 5,
/* Profile was originally based on feedback but it was adjusted
   by code duplicating optimization.  It may not precisely reflect the
   particular code path.  */
-  profile_adjusted = 5,
+  profile_adjusted = 6,
/* Profile was read from profile feedback or determined by accurate static
   method.  */
profile_precise = 7
@@ -505,6 +505,8 @@ public:
/* Return false if profile_probability is bogus.  */
bool verify () const
  {
+  gcc_checking_assert (profile_guessed_local <= m_quality
+  && m_quality <= profile_precise);


Hi,

FYI, in a no-bootstrap build, I'm seeing a lot of new warnings like this:
...
../../src/gcc/profile-count.h: In member function ‘bool 
profile_probability::verify() const’:
../../src/gcc/profile-count.h:509:20: warning: comparison is always true 
due to limited range of data type [-Wtype-limits]

   && m_quality <= profile_precise);
^
../../src/gcc/system.h:742:14: note: in definition of macro ‘gcc_assert’
((void)(!(EXPR) ? fancy_abort (__FILE__, __LINE__, __FUNCTION__), 0 
: 0))

  ^
../../src/gcc/profile-count.h:508:7: note: in expansion of macro 
‘gcc_checking_assert’

   gcc_checking_assert (profile_guessed_local <= m_quality
...

Indeed, profile_precise is 7 and m_quality is a 3 bits wide bitfield.

Thanks,
- Tom


if (m_val == uninitialized_probability)
return m_quality == profile_guessed;
else if (m_quality < profile_guessed)
@@ -784,6 +786,8 @@ public:
/* Return false if profile_count is bogus.  */
bool verify () const
  {
+  gcc_checking_assert (profile_guessed_local <= m_quality
+  && m_quality <= profile_precise);
return m_val != uninitialized_count || m_quality == 
profile_guessed_local;
  }
  
--

2.14.3







Re: [PATCH] document -Wclass-memaccess suppression by casting (PR 81327)

2018-01-14 Thread Jakub Jelinek
On Sat, Jan 13, 2018 at 04:14:38PM -0700, Martin Sebor wrote:
> -The @option{-Wclass-memaccess} option is enabled by @option{-Wall}.
> +The @option{-Wclass-memaccess} option is enabled by @option{-Wall}.  Casting

Perhaps "Explicitly casting" instead?  The implicit cast doesn't suppress it
and occurs whenever there isn't an explicit cast.

> +the pointer to the class object to @code{void *} or to a type that can be
> +safely accessed by the raw memory function suppresses the warning.
>  
>  @item -Wnon-virtual-dtor @r{(C++ and Objective-C++ only)}
>  @opindex Wnon-virtual-dtor


Jakub


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread Jan Hubicka
> Hi HJ, 
> 
> > -Original Message-
> > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> > ow...@gcc.gnu.org] On Behalf Of H.J. Lu
> > Sent: Sunday, January 14, 2018 9:07 AM
> > To: gcc-patches@gcc.gnu.org
> > Subject: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
> > 
> > This set of patches for GCC 8 mitigates variant #2 of the speculative
> > execution vulnerabilities on x86 processors identified by CVE-2017-5715, aka
> > Spectre.  They convert indirect branches and function returns to call and
> > return thunks to avoid speculative execution via indirect call, jmp and ret.
> > 
> > H.J. Lu (5):
> >   x86: Add -mindirect-branch=
> >   x86: Add -mfunction-return=
> >   x86: Add -mindirect-branch-register
> >   x86: Add 'V' register operand modifier
> >   x86: Disallow -mindirect-branch=/-mfunction-return= with
> > -mcmodel=large
> 
> Current set of patches don't seem to have any option to generate "lfence" as 
> the loop filler in "retpoline",  which is required by AMD.
> Can you please clarify the plan. We would like to get this checked-in GCC 8.  
>  

Since thunks are output as strings, it is easy to add the option
on the top of patch #1 of the series.  I do not fully understand
the reason for choosing pause over lfence for Intel, but if we need
to do both, we need to have command line option (and possibly attribute).
What would be reasonable name for it?

Honza


Re: [PATCH] PR 78534, 83704 Large character lengths

2018-01-14 Thread Janne Blomqvist
On Sat, Jan 13, 2018 at 7:35 PM, Dominique d'Humières
 wrote:
> I have finally bootstrapped gfortran with the two patches applied and the 
> spurious warnings with -Wall are now gone (limited testing), but I see a 
> regression for gfortran.dg/string_1.f90 due to an additional error
>
> /opt/gcc/_clean/gcc/testsuite/gfortran.dg/string_1.f90:13:15:
>
>print *, len(s)
>1
> Error: Result of LEN overflows its kind at (1)
>
> No idea why this was not emitted without the patch.

Thanks for testing, I'll look into this. Was it on a 32 or 64-bit target?

(I suspect the issue could be that the 32-bit truncated value fits in
[0,INT_MAX] and hence no problem is detected, but I'll have to look
into it further to say something definite).

>> Le 12 janv. 2018 à 21:43, Joseph Myers  a écrit :
>>
>> On Fri, 12 Jan 2018, Janne Blomqvist wrote:
>>
>>> (I don't know why HOST_WIDE_INT_PRINT_DEC doesn't work with the
>>> diagnostics machinery on darwin, but IMHO at this point it's too late
>>
>> HOST_WIDE_INT_PRINT_DEC is a host-specific printf format; for example, it
>> might use %I64d on Windows host or %lld elsewhere.  The formats used in
>> diagnostic functions must *not* be host-specific or use macros anywhere in
>> them at all, because that won't work with translation.  Diagnostic
>> functions have their own host-independent formats; %wd is the correct
>> format for HOST_WIDE_INT in diagnostic functions.
>>
>> --
>> Joseph S. Myers
>> jos...@codesourcery.com
>
> I have tried to use %wp, but it didn’t work:
>
> ../../work/gcc/fortran/decl.c: In function 'void 
> gfc_set_constant_character_len(gfc_charlen_t, gfc_expr*, gfc_charlen_t)':
> ../../work/gcc/fortran/decl.c:1567:5: error: unknown conversion type 
> character 'w' in format [-Werror=format=]
>  "CHARACTER expression at %L is being truncated "
>  ^~~~
>  "(%wd/%wd)", &expr->where,
>  ~~~
> ../../work/gcc/fortran/decl.c:1567:5: error: unknown conversion type 
> character 'w' in format [-Werror=format=]
> ../../work/gcc/fortran/decl.c:1567:5: error: too many arguments for format 
> [-Werror=format-extra-args]
> ../../work/gcc/fortran/decl.c:1575:17: error: unknown conversion type 
> character 'w' in format [-Werror=format=]
>   gfc_error_now ("The CHARACTER elements of the array constructor "
>  ^~
>   "at %L must have the same length (%wd/%wd)",
>   ~~~
> ../../work/gcc/fortran/decl.c:1575:17: error: unknown conversion type 
> character 'w' in format [-Werror=format=]
> ../../work/gcc/fortran/decl.c:1575:17: error: too many arguments for format 
> [-Werror=format-extra-args]

Yes, same for the on x86-64 Linux. Might be something weird on the
gfortran diagnostics stuff, as it seems %wd is used in the middle end.
Anyway, that's stage1 stuff IMHO.



-- 
Janne Blomqvist


Re: [PATCH 4/5] x86: Add 'V' register operand modifier

2018-01-14 Thread Jan Hubicka
> Add 'V', a special modifier which prints the name of the full integer
> register without '%'.  For
> 
> extern void (*func_p) (void);
> 
> void
> foo (void)
> {
>   asm ("call __x86_indirect_thunk_%V0" : : "a" (func_p));
> }
> 
> it generates:
> 
> foo:
>   movqfunc_p(%rip), %rax
>   call__x86_indirect_thunk_rax
>   ret
> 
> gcc/
> 
>   * config/i386/i386.c (print_reg): Print the name of the full
>   integer register without '%'.
>   (ix86_print_operand): Handle 'V'.
>* doc/extend.texi: Document 'V' modifier.
> 
> gcc/testsuite/
> 
>   * gcc.target/i386/indirect-thunk-register-4.c: New test.

OK.
Honza


Re: [PATCH 3/5] x86: Add -mindirect-branch-register

2018-01-14 Thread Jan Hubicka
> Add -mindirect-branch-register to force indirect branch via register.
> This is implemented by disabling patterns of indirect branch via memory,
> similar to TARGET_X32.
> 
> -mindirect-branch= and -mfunction-return= tests are updated with
> -mno-indirect-branch-register to avoid false test failures when
> -mindirect-branch-register is added to RUNTESTFLAGS for "make check".
> 
> gcc/
> 
>   * config/i386/constraints.md (Bs): Disallow memory operand for
>   -mindirect-branch-register.
>   (Bw): Likewise.
>   * config/i386/predicates.md (indirect_branch_operand): Likewise.
>   (GOT_memory_operand): Likewise.
>   (call_insn_operand): Likewise.
>   (sibcall_insn_operand): Likewise.
>   (GOT32_symbol_operand): Likewise.
>   * config/i386/i386.md (indirect_jump): Call convert_memory_address
>   for -mindirect-branch-register.
>   (tablejump): Likewise.
>   (*sibcall_memory): Likewise.
>   (*sibcall_value_memory): Likewise.
>   Disallow peepholes of indirect call and jump via memory for
>   -mindirect-branch-register.
>   (*call_pop): Replace m with Bw.
>   (*call_value_pop): Likewise.
>   (*sibcall_pop_memory): Replace m with Bs.
>   * config/i386/i386.opt (mindirect-branch-register): New option.
>   * doc/invoke.texi: Document -mindirect-branch-register option.

OK.
Honza


Re: [PATCH 1/5] x86: Add -mindirect-branch=

2018-01-14 Thread Jan Hubicka
> gcc/
> 
>   * config/i386/i386-opts.h (indirect_branch): New.
>   * config/i386/i386-protos.h (ix86_output_indirect_jmp): Likewise.
>   * config/i386/i386.c (ix86_using_red_zone): Disallow red-zone
>   with local indirect jump when converting indirect call and jump.
>   (ix86_set_indirect_branch_type): New.
>   (ix86_set_current_function): Call ix86_set_indirect_branch_type.
>   (indirectlabelno): New.
>   (indirect_thunk_needed): Likewise.
>   (indirect_thunk_bnd_needed): Likewise.
>   (indirect_thunks_used): Likewise.
>   (indirect_thunks_bnd_used): Likewise.
>   (INDIRECT_LABEL): Likewise.
>   (indirect_thunk_name): Likewise.
>   (output_indirect_thunk): Likewise.
>   (output_indirect_thunk_function): Likewise.
>   (ix86_output_indirect_branch): Likewise.
>   (ix86_output_indirect_jmp): Likewise.
>   (ix86_code_end): Call output_indirect_thunk_function if needed.
>   (ix86_output_call_insn): Call ix86_output_indirect_branch if
>   needed.
>   (ix86_handle_fndecl_attribute): Handle indirect_branch.
>   (ix86_attribute_table): Add indirect_branch.
>   * config/i386/i386.h (machine_function): Add indirect_branch_type
>   and has_local_indirect_jump.
>   * config/i386/i386.md (indirect_jump): Set has_local_indirect_jump
>   to true.
>   (tablejump): Likewise.
>   (*indirect_jump): Use ix86_output_indirect_jmp.
>   (*tablejump_1): Likewise.
>   (simple_return_indirect_internal): Likewise.
>   * config/i386/i386.opt (mindirect-branch=): New option.
>   (indirect_branch): New.
>   (keep): Likewise.
>   (thunk): Likewise.
>   (thunk-inline): Likewise.
>   (thunk-extern): Likewise.
>   * doc/extend.texi: Document indirect_branch function attribute.
>   * doc/invoke.texi: Document -mindirect-branch= option.
> 
> gcc/testsuite/
> 
>   * gcc.target/i386/indirect-thunk-1.c: New test.
>   * gcc.target/i386/indirect-thunk-2.c: Likewise.
>   * gcc.target/i386/indirect-thunk-3.c: Likewise.
>   * gcc.target/i386/indirect-thunk-4.c: Likewise.
>   * gcc.target/i386/indirect-thunk-5.c: Likewise.
>   * gcc.target/i386/indirect-thunk-6.c: Likewise.
>   * gcc.target/i386/indirect-thunk-7.c: Likewise.
>   * gcc.target/i386/indirect-thunk-attr-1.c: Likewise.
>   * gcc.target/i386/indirect-thunk-attr-2.c: Likewise.
>   * gcc.target/i386/indirect-thunk-attr-3.c: Likewise.
>   * gcc.target/i386/indirect-thunk-attr-4.c: Likewise.
>   * gcc.target/i386/indirect-thunk-attr-5.c: Likewise.
>   * gcc.target/i386/indirect-thunk-attr-6.c: Likewise.
>   * gcc.target/i386/indirect-thunk-attr-7.c: Likewise.
>   * gcc.target/i386/indirect-thunk-attr-8.c: Likewise.
>   * gcc.target/i386/indirect-thunk-bnd-1.c: Likewise.
>   * gcc.target/i386/indirect-thunk-bnd-2.c: Likewise.
>   * gcc.target/i386/indirect-thunk-bnd-3.c: Likewise.
>   * gcc.target/i386/indirect-thunk-bnd-4.c: Likewise.
>   * gcc.target/i386/indirect-thunk-extern-1.c: Likewise.
>   * gcc.target/i386/indirect-thunk-extern-2.c: Likewise.
>   * gcc.target/i386/indirect-thunk-extern-3.c: Likewise.
>   * gcc.target/i386/indirect-thunk-extern-4.c: Likewise.
>   * gcc.target/i386/indirect-thunk-extern-5.c: Likewise.
>   * gcc.target/i386/indirect-thunk-extern-6.c: Likewise.
>   * gcc.target/i386/indirect-thunk-extern-7.c: Likewise.
>   * gcc.target/i386/indirect-thunk-inline-1.c: Likewise.
>   * gcc.target/i386/indirect-thunk-inline-2.c: Likewise.
>   * gcc.target/i386/indirect-thunk-inline-3.c: Likewise.
>   * gcc.target/i386/indirect-thunk-inline-4.c: Likewise.
>   * gcc.target/i386/indirect-thunk-inline-5.c: Likewise.
>   * gcc.target/i386/indirect-thunk-inline-6.c: Likewise.
>   * gcc.target/i386/indirect-thunk-inline-7.c: Likewise.
> +
> +  /* Pause .  */
> +  fprintf (asm_out_file, "\tpause\n");

OK, but please prepare incremental patches to choose between pause and lefence
as needed for AMD CPUs and check for large code model.


Re: [PATCH 5/5] x86: Disallow -mindirect-branch=/-mfunction-return= with -mcmodel=large

2018-01-14 Thread Jan Hubicka
> Since the thunk function may not be reachable in large code model,
> -mcmodel=large is incompatible with -mindirect-branch=thunk,
> -mindirect-branch=thunk-extern, -mfunction-return=thunk and
> -mfunction-return=thunk-extern.  Issue an error when they are used with
> -mcmodel=large.
> 
> gcc/
> 
>   * config/i386/i386.c (ix86_set_indirect_branch_type): Disallow
>   -mcmodel=large with -mindirect-branch=thunk,
>   -mindirect-branch=thunk-extern, -mfunction-return=thunk and
>   -mfunction-return=thunk-extern.
>   * doc/invoke.texi: Document -mcmodel=large is incompatible with
>   -mindirect-branch=thunk, -mindirect-branch=thunk-extern,
>   -mfunction-return=thunk and -mfunction-return=thunk-extern.
> 
> gcc/testsuite/
> 
>   * gcc.target/i386/indirect-thunk-10.c: New test.
>   * gcc.target/i386/indirect-thunk-8.c: Likewise.
>   * gcc.target/i386/indirect-thunk-9.c: Likewise.
>   * gcc.target/i386/indirect-thunk-attr-10.c: Likewise.
>   * gcc.target/i386/indirect-thunk-attr-11.c: Likewise.
>   * gcc.target/i386/indirect-thunk-attr-9.c: Likewise.
>   * gcc.target/i386/ret-thunk-17.c: Likewise.
>   * gcc.target/i386/ret-thunk-18.c: Likewise.
>   * gcc.target/i386/ret-thunk-19.c: Likewise.
>   * gcc.target/i386/ret-thunk-20.c: Likewise.
>   * gcc.target/i386/ret-thunk-21.c: Likewise.

OK.
Honza


Re: [PATCH 2/5] x86: Add -mfunction-return=

2018-01-14 Thread Jan Hubicka
> Add -mfunction-return= option to convert function return to call and
> return thunks.  The default is 'keep', which keeps function return
> unmodified.  'thunk' converts function return to call and return thunk.
> 'thunk-inline' converts function return to inlined call and return thunk.
> 'thunk-extern' converts function return to external call and return
> thunk provided in a separate object file.  You can control this behavior
> for a specific function by using the function attribute function_return.
> 
> Function return thunk is the same as memory thunk for -mindirect-branch=
> where the return address is at the top of the stack:
> 
> __x86_return_thunk:
>   call L2
> L1:
>   pause
>   jmp L1
> L2:
>   lea 8(%rsp), %rsp|lea 4(%esp), %esp
>   ret
> 
> and function return becomes
> 
>   jmp __x86_return_thunk
> 
> -mindirect-branch= tests are updated with -mfunction-return=keep to
> avoid false test failures when -mfunction-return=thunk is added to
> RUNTESTFLAGS for "make check".
> 
> gcc/
> 
>   * config/i386/i386-protos.h (ix86_output_function_return): New.
>   * config/i386/i386.c (ix86_set_indirect_branch_type): Also
>   set function_return_type.
>   (indirect_thunk_name): Add ret_p to indicate thunk for function
>   return.
>   (output_indirect_thunk_function): Pass false to
>   indirect_thunk_name.
>   (ix86_output_indirect_branch): Likewise.
>   (output_indirect_thunk_function): Create alias for function
>   return thunk if regno < 0.
>   (ix86_output_function_return): New function.
>   (ix86_handle_fndecl_attribute): Handle function_return.
>   (ix86_attribute_table): Add function_return.
>   * config/i386/i386.h (machine_function): Add
>   function_return_type.
>   * config/i386/i386.md (simple_return_internal): Use
>   ix86_output_function_return.
>   (simple_return_internal_long): Likewise.
>   * config/i386/i386.opt (mfunction-return=): New option.
>   (indirect_branch): Mention -mfunction-return=.
>   * doc/extend.texi: Document function_return function attribute.
>   * doc/invoke.texi: Document -mfunction-return= option.

The implementation is reasonable, but I still do not quite understand
if we really need this change. Celarly killing every return is going
to have significant impact and it is apparently not used by Linux kernel.
What is the main motivation for it?

Honza


Re: inline_small_functions speedup

2018-01-14 Thread Jan Hubicka
> After inlining A into B, inline_small_functions updates the information
> for (most) callees and callers of the new B:
> 
> update_callee_keys (&edge_heap, where, updated_nodes);
>   [...]
>   /* Our profitability metric can depend on local properties
>such as number of inlinable calls and size of the function body.
>After inlining these properties might change for the function we
>inlined into (since it's body size changed) and for the functions
>called by function we inlined (since number of it inlinable callers
>might change).  */
>   update_caller_keys (&edge_heap, where, updated_nodes, NULL);
> 
> These functions in turn call can_inline_edge_p for most of the associated
> edges:
> 
>   if (can_inline_edge_p (edge, false)
>   && want_inline_small_function_p (edge, false))
> update_edge_key (heap, edge);
> 
> can_inline_edge_p indirectly calls estimate_calls_size_and_time
> on the caller node, which seems to recursively process all callee
> edges rooted at the node.  It looks from this like algorithm can be
> at least quadratic in the worst case.

Yep, I have patch to add overall size into summary that makes it constant
time.  Will need to clean it up for mainline.
> 
> Maybe there's something we can do to make can_inline_edge_p cheaper, but
> since neither of these two calls is responsible for reporting an inline
> failure reason, it seems cheaper to test want_inline_small_function_p
> first, so that we don't calculate an estimate for something that we
> already know isn't a "small function".  I think the only change
> needed to make that work is to check for CIF_FINAL_ERROR in
> want_inline_small_function_p; at the moment we rely on can_inline_edge_p
> to make that check.
> 
> This cuts the time to build optabs.ii by over 4% with an
> --enable-checking=release compiler on x86_64-linux-gnu.  I've seen more
> dramatic wins on aarch64-linux-gnu due to the NUM_POLY_INT_COEFFS==2
> thing.  The patch doesn't affect the output code.
> 
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> OK to install?

Anyway this seems to make sense to do.
> 
> Richard
> 
> 
> 2018-01-13  Richard Sandiford  
> 
> gcc/
>   * ipa-inline.c (want_inline_small_function_p): Return false if
>   inlining has already failed with CIF_FINAL_ERROR.
>   (update_caller_keys): Call want_inline_small_function_p before
>   can_inline_edge_p.
>   (update_callee_keys): Likewise.

OK,
thanks!
Honza


Re: [PATCH] PR 78534, 83704 Large character lengths

2018-01-14 Thread Thomas Koenig

Am 14.01.2018 um 11:45 schrieb Janne Blomqvist:

I have tried to use %wp, but it didn’t work:

../../work/gcc/fortran/decl.c: In function 'void 
gfc_set_constant_character_len(gfc_charlen_t, gfc_expr*, gfc_charlen_t)':
../../work/gcc/fortran/decl.c:1567:5: error: unknown conversion type character 
'w' in format [-Werror=format=]
  "CHARACTER expression at %L is being truncated "
  ^~~~



gfortran diagnostics stuff, as it seems %wd is used in the middle end.
Anyway, that's stage1 stuff IMHO.


I'd suggest using %ld and casting the value to long int, or %lld and
casting to long long int, at least for the time being.

Regards

Thomas


[patch, fortran] Change ABI for F2008 - minloc/maxloc BACK argument

2018-01-14 Thread Thomas Koenig

Hello world,

here is the latest take on the min/maxloc ABI change for BACK.
This version now passes BACK as a GFC_LOGCIAL_4 by value in all cases.
I did this by using the existing %VAL mechanism. I also added
another test case which crashed during one stage of development.

So, OK for trunk?

Regards

Thomas

2018-01-14  Thomas Koenig  

PR fortran/54613
* gfortran.h (gfc_check_f): Rename f4ml to f5ml.
(gfc_logical_4_kind): New macro
* intrinsic.h (gfc_simplify_minloc): Add a gfc_expr *argument.
(gfc_simplify_maxloc): Likewise.
(gfc_resolve_maxloc): Likewise.
(gfc_resolve_minloc): Likewise.
* check.c (gfc_check_minloc_maxloc): Add checking for "back"
argument; also raise error if it is used (for now). Add it
if it isn't present.
* intrinsic.c (add_sym_4ml): Rename to
(add_sym_5ml), adjust for extra argument.
(add_functions): Add "back" constant. Adjust maxloc and minloc
for back argument.
* iresolve.c (gfc_resolve_maxloc): Add back argument. If back is
not of gfc_logical_4_kind, convert.
(gfc_resolve_minloc): Likewise.
* simplify.c (gfc_simplify_minloc): Add back argument.
(gfc_simplify_maxloc): Likewise.
* trans-intinsic.c (gfc_conv_intrinsic_minmaxloc): Rename last
argument to %VAL to ensure passing by value.
(gfc_conv_intrinsic_function): Call gfc_conv_intrinsic_minmaxloc
also for library calls.

2018-01-14  Thomas Koenig  

PR fortran/54613
* m4/iparm.m4: Add back_arg macro if in minloc or maxloc.
* m4/iforeach-s.m4: Add optional argument back with back_arg
macro. Improve m4 quoting. If HAVE_BACK_ARG is defined, assert
that back is non-true.
* m4/iforeach.m4: Likewise.
* m4/ifunction-s.m4: Likewise.
* m4/ifunction.m4: Likewise.
* m4/maxloc0.m4: Include assert.h
* m4/minloc0.m4: Likewise.
* m4/maxloc0s.m4: #define HAVE_BACK_ARG.
* m4/minloc0s.m4: Likewise.
* m4/maxloc1s.m4: Likewise.
* m4/minloc1s.m4: Likewise.
* m4/maxloc1.m4: Include assert.h, #define HAVE_BACK_ARG.
* m4/minloc1.m4: Likewise.
* m4/maxloc2s.m4: Add assert.h, add back_arg, assert that
back is non-true.
* m4/minloc2s.m4: Likewise.
* generated/iall_i1.c: Regenerated.
* generated/iall_i16.c: Regenerated.
* generated/iall_i2.c: Regenerated.
* generated/iall_i4.c: Regenerated.
* generated/iall_i8.c: Regenerated.
* generated/iany_i1.c: Regenerated.
* generated/iany_i16.c: Regenerated.
* generated/iany_i2.c: Regenerated.
* generated/iany_i4.c: Regenerated.
* generated/iany_i8.c: Regenerated.
* generated/iparity_i1.c: Regenerated.
* generated/iparity_i16.c: Regenerated.
* generated/iparity_i2.c: Regenerated.
* generated/iparity_i4.c: Regenerated.
* generated/iparity_i8.c: Regenerated.
* generated/maxloc0_16_i1.c: Regenerated.
* generated/maxloc0_16_i16.c: Regenerated.
* generated/maxloc0_16_i2.c: Regenerated.
* generated/maxloc0_16_i4.c: Regenerated.
* generated/maxloc0_16_i8.c: Regenerated.
* generated/maxloc0_16_r10.c: Regenerated.
* generated/maxloc0_16_r16.c: Regenerated.
* generated/maxloc0_16_r4.c: Regenerated.
* generated/maxloc0_16_r8.c: Regenerated.
* generated/maxloc0_16_s1.c: Regenerated.
* generated/maxloc0_16_s4.c: Regenerated.
* generated/maxloc0_4_i1.c: Regenerated.
* generated/maxloc0_4_i16.c: Regenerated.
* generated/maxloc0_4_i2.c: Regenerated.
* generated/maxloc0_4_i4.c: Regenerated.
* generated/maxloc0_4_i8.c: Regenerated.
* generated/maxloc0_4_r10.c: Regenerated.
* generated/maxloc0_4_r16.c: Regenerated.
* generated/maxloc0_4_r4.c: Regenerated.
* generated/maxloc0_4_r8.c: Regenerated.
* generated/maxloc0_4_s1.c: Regenerated.
* generated/maxloc0_4_s4.c: Regenerated.
* generated/maxloc0_8_i1.c: Regenerated.
* generated/maxloc0_8_i16.c: Regenerated.
* generated/maxloc0_8_i2.c: Regenerated.
* generated/maxloc0_8_i4.c: Regenerated.
* generated/maxloc0_8_i8.c: Regenerated.
* generated/maxloc0_8_r10.c: Regenerated.
* generated/maxloc0_8_r16.c: Regenerated.
* generated/maxloc0_8_r4.c: Regenerated.
* generated/maxloc0_8_r8.c: Regenerated.
* generated/maxloc0_8_s1.c: Regenerated.
* generated/maxloc0_8_s4.c: Regenerated.
* generated/maxloc1_16_i1.c: Regenerated.
* generated/maxloc1_16_i16.c: Regenerated.
* generated/maxloc1_16_i2.c: Regenerated.
* generated/maxloc1_16_i4.c: Regenerated.
* generated/maxloc1_16_i8.c: Regenerated.
* generated/maxloc1_16_r10.c: Regenerated.
 

Re: [PATCH 1/5] x86: Add -mindirect-branch=

2018-01-14 Thread Markus Trippelsdorf
On 2018.01.14 at 11:46 +0100, Jan Hubicka wrote:
> > gcc/
> >
> > * config/i386/i386-opts.h (indirect_branch): New.
> > * config/i386/i386-protos.h (ix86_output_indirect_jmp): Likewise.
> > * config/i386/i386.c (ix86_using_red_zone): Disallow red-zone
> > with local indirect jump when converting indirect call and jump.
> > (ix86_set_indirect_branch_type): New.
> > (ix86_set_current_function): Call ix86_set_indirect_branch_type.
> > (indirectlabelno): New.
> > (indirect_thunk_needed): Likewise.
> > (indirect_thunk_bnd_needed): Likewise.
> > (indirect_thunks_used): Likewise.
> > (indirect_thunks_bnd_used): Likewise.
> > (INDIRECT_LABEL): Likewise.
> > (indirect_thunk_name): Likewise.
> > (output_indirect_thunk): Likewise.
> > (output_indirect_thunk_function): Likewise.
> > (ix86_output_indirect_branch): Likewise.
> > (ix86_output_indirect_jmp): Likewise.
> > (ix86_code_end): Call output_indirect_thunk_function if needed.
> > (ix86_output_call_insn): Call ix86_output_indirect_branch if
> > needed.
> > (ix86_handle_fndecl_attribute): Handle indirect_branch.
> > (ix86_attribute_table): Add indirect_branch.
> > * config/i386/i386.h (machine_function): Add indirect_branch_type
> > and has_local_indirect_jump.
> > * config/i386/i386.md (indirect_jump): Set has_local_indirect_jump
> > to true.
> > (tablejump): Likewise.
> > (*indirect_jump): Use ix86_output_indirect_jmp.
> > (*tablejump_1): Likewise.
> > (simple_return_indirect_internal): Likewise.
> > * config/i386/i386.opt (mindirect-branch=): New option.
> > (indirect_branch): New.
> > (keep): Likewise.
> > (thunk): Likewise.
> > (thunk-inline): Likewise.
> > (thunk-extern): Likewise.
> > * doc/extend.texi: Document indirect_branch function attribute.
> > * doc/invoke.texi: Document -mindirect-branch= option.
> > +
> > +  /* Pause .  */
> > +  fprintf (asm_out_file, "\tpause\n");
>
> OK, but please prepare incremental patches to choose between pause and lefence
> as needed for AMD CPUs and check for large code model.

Why not use both? That would make everybody happy, no?

 pause
 lfence
 jmp

-- 
Markus


Re: [PATCH 1/5] x86: Add -mindirect-branch=

2018-01-14 Thread Jan Hubicka
> On 2018.01.14 at 11:46 +0100, Jan Hubicka wrote:
> > > gcc/
> > >
> > >   * config/i386/i386-opts.h (indirect_branch): New.
> > >   * config/i386/i386-protos.h (ix86_output_indirect_jmp): Likewise.
> > >   * config/i386/i386.c (ix86_using_red_zone): Disallow red-zone
> > >   with local indirect jump when converting indirect call and jump.
> > >   (ix86_set_indirect_branch_type): New.
> > >   (ix86_set_current_function): Call ix86_set_indirect_branch_type.
> > >   (indirectlabelno): New.
> > >   (indirect_thunk_needed): Likewise.
> > >   (indirect_thunk_bnd_needed): Likewise.
> > >   (indirect_thunks_used): Likewise.
> > >   (indirect_thunks_bnd_used): Likewise.
> > >   (INDIRECT_LABEL): Likewise.
> > >   (indirect_thunk_name): Likewise.
> > >   (output_indirect_thunk): Likewise.
> > >   (output_indirect_thunk_function): Likewise.
> > >   (ix86_output_indirect_branch): Likewise.
> > >   (ix86_output_indirect_jmp): Likewise.
> > >   (ix86_code_end): Call output_indirect_thunk_function if needed.
> > >   (ix86_output_call_insn): Call ix86_output_indirect_branch if
> > >   needed.
> > >   (ix86_handle_fndecl_attribute): Handle indirect_branch.
> > >   (ix86_attribute_table): Add indirect_branch.
> > >   * config/i386/i386.h (machine_function): Add indirect_branch_type
> > >   and has_local_indirect_jump.
> > >   * config/i386/i386.md (indirect_jump): Set has_local_indirect_jump
> > >   to true.
> > >   (tablejump): Likewise.
> > >   (*indirect_jump): Use ix86_output_indirect_jmp.
> > >   (*tablejump_1): Likewise.
> > >   (simple_return_indirect_internal): Likewise.
> > >   * config/i386/i386.opt (mindirect-branch=): New option.
> > >   (indirect_branch): New.
> > >   (keep): Likewise.
> > >   (thunk): Likewise.
> > >   (thunk-inline): Likewise.
> > >   (thunk-extern): Likewise.
> > >   * doc/extend.texi: Document indirect_branch function attribute.
> > >   * doc/invoke.texi: Document -mindirect-branch= option.
> > > +
> > > +  /* Pause .  */
> > > +  fprintf (asm_out_file, "\tpause\n");
> >
> > OK, but please prepare incremental patches to choose between pause and 
> > lefence
> > as needed for AMD CPUs and check for large code model.
> 
> Why not use both? That would make everybody happy, no?
> 
>  pause
>  lfence
>  jmp

Yes, according to my understanding we want both pause and lefence by default
(it is expensive anyway) + command line option to control which one to use?

Probably thus the first patch should default to both.

Honza
> 
> -- 
> Markus


Do not ICE on roundoff errors in ipa-inline

2018-01-14 Thread Jan Hubicka
Hi,
this patch fixes ICE on the testcase where we estimate function body
to be rather slow and get roundoff error out of sreal.

Honza

PR ipa/83051
* gcc.c-torture/compile/pr83051.c: New testcase.
* ipa-inline.c (edge_badness): Tolerate roundoff errors.

Index: testsuite/gcc.c-torture/compile/pr83051.c
===
--- testsuite/gcc.c-torture/compile/pr83051.c   (revision 0)
+++ testsuite/gcc.c-torture/compile/pr83051.c   (working copy)
@@ -0,0 +1,31 @@
+
+int a[1], b, c, d, e, f, g, h;
+
+void fn1 (int p)
+{ 
+  b = b >> 8 ^ a[b ^ (c & 5)] >> 8 ^ a[(b ^ c) & 5];
+  b = b >> 8 ^ a[(b ^ c) & 5];
+}
+
+static void fn2 ()
+{ 
+  int k;
+  while (1)
+while (e)
+  { 
+while (g)
+  while (h)
+for (k = 0; k < 6; k++)
+  while (f)
+fn1 (0);
+fn1 (0);
+fn1 (0);
+fn1 (0);
+  }
+}
+
+int main ()
+{ 
+  fn2 ();
+  return 0;
+}
Index: ipa-inline.c
===
--- ipa-inline.c(revision 256647)
+++ ipa-inline.c(working copy)
@@ -993,7 +993,8 @@ edge_badness (struct cgraph_edge *edge,
   /* Check that inlined time is better, but tolerate some roundoff issues.
  FIXME: When callee profile drops to 0 we account calls more.  This
  should be fixed by never doing that.  */
-  gcc_checking_assert ((edge_time - callee_info->time).to_int () <= 0
+  gcc_checking_assert ((edge_time * 100
+   - callee_info->time * 101).to_int () <= 0
|| callee->count.ipa ().initialized_p ());
   gcc_checking_assert (growth <= callee_info->size);
 


Re: [PATCH 2/5] x86: Add -mfunction-return=

2018-01-14 Thread H.J. Lu
On Sun, Jan 14, 2018 at 2:48 AM, Jan Hubicka  wrote:
>> Add -mfunction-return= option to convert function return to call and
>> return thunks.  The default is 'keep', which keeps function return
>> unmodified.  'thunk' converts function return to call and return thunk.
>> 'thunk-inline' converts function return to inlined call and return thunk.
>> 'thunk-extern' converts function return to external call and return
>> thunk provided in a separate object file.  You can control this behavior
>> for a specific function by using the function attribute function_return.
>>
>> Function return thunk is the same as memory thunk for -mindirect-branch=
>> where the return address is at the top of the stack:
>>
>> __x86_return_thunk:
>>   call L2
>> L1:
>>   pause
>>   jmp L1
>> L2:
>>   lea 8(%rsp), %rsp|lea 4(%esp), %esp
>>   ret
>>
>> and function return becomes
>>
>>   jmp __x86_return_thunk
>>
>> -mindirect-branch= tests are updated with -mfunction-return=keep to
>> avoid false test failures when -mfunction-return=thunk is added to
>> RUNTESTFLAGS for "make check".
>>
>> gcc/
>>
>>   * config/i386/i386-protos.h (ix86_output_function_return): New.
>>   * config/i386/i386.c (ix86_set_indirect_branch_type): Also
>>   set function_return_type.
>>   (indirect_thunk_name): Add ret_p to indicate thunk for function
>>   return.
>>   (output_indirect_thunk_function): Pass false to
>>   indirect_thunk_name.
>>   (ix86_output_indirect_branch): Likewise.
>>   (output_indirect_thunk_function): Create alias for function
>>   return thunk if regno < 0.
>>   (ix86_output_function_return): New function.
>>   (ix86_handle_fndecl_attribute): Handle function_return.
>>   (ix86_attribute_table): Add function_return.
>>   * config/i386/i386.h (machine_function): Add
>>   function_return_type.
>>   * config/i386/i386.md (simple_return_internal): Use
>>   ix86_output_function_return.
>>   (simple_return_internal_long): Likewise.
>>   * config/i386/i386.opt (mfunction-return=): New option.
>>   (indirect_branch): Mention -mfunction-return=.
>>   * doc/extend.texi: Document function_return function attribute.
>>   * doc/invoke.texi: Document -mfunction-return= option.
>
> The implementation is reasonable, but I still do not quite understand
> if we really need this change. Celarly killing every return is going
> to have significant impact and it is apparently not used by Linux kernel.
> What is the main motivation for it?
>
> Honza

See:

https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00540.html

I will check it in.

-- 
H.J.


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread Uros Bizjak
On Sun, Jan 14, 2018 at 11:40 AM, Jan Hubicka  wrote:
>> Hi HJ,
>>
>> > -Original Message-
>> > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
>> > ow...@gcc.gnu.org] On Behalf Of H.J. Lu
>> > Sent: Sunday, January 14, 2018 9:07 AM
>> > To: gcc-patches@gcc.gnu.org
>> > Subject: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
>> >
>> > This set of patches for GCC 8 mitigates variant #2 of the speculative
>> > execution vulnerabilities on x86 processors identified by CVE-2017-5715, 
>> > aka
>> > Spectre.  They convert indirect branches and function returns to call and
>> > return thunks to avoid speculative execution via indirect call, jmp and 
>> > ret.
>> >
>> > H.J. Lu (5):
>> >   x86: Add -mindirect-branch=
>> >   x86: Add -mfunction-return=
>> >   x86: Add -mindirect-branch-register
>> >   x86: Add 'V' register operand modifier
>> >   x86: Disallow -mindirect-branch=/-mfunction-return= with
>> > -mcmodel=large
>>
>> Current set of patches don't seem to have any option to generate "lfence" as 
>> the loop filler in "retpoline",  which is required by AMD.
>> Can you please clarify the plan. We would like to get this checked-in GCC 8.
>
> Since thunks are output as strings, it is easy to add the option
> on the top of patch #1 of the series.  I do not fully understand
> the reason for choosing pause over lfence for Intel, but if we need
> to do both, we need to have command line option (and possibly attribute).
> What would be reasonable name for it?

Looking at the kernel patch [1], the loop filler should be
"pause;lfence" sequence, and should be universally accepted for Intel
and AMD targets.

[1] https://www.spinics.net/lists/kernel/msg2697507.html

Uros.


Re: [PATCH] PR 78534, 83704 Large character lengths

2018-01-14 Thread Janne Blomqvist
On Sun, Jan 14, 2018 at 12:45 PM, Janne Blomqvist
 wrote:
> On Sat, Jan 13, 2018 at 7:35 PM, Dominique d'Humières
>  wrote:
>> I have finally bootstrapped gfortran with the two patches applied and the 
>> spurious warnings with -Wall are now gone (limited testing), but I see a 
>> regression for gfortran.dg/string_1.f90 due to an additional error
>>
>> /opt/gcc/_clean/gcc/testsuite/gfortran.dg/string_1.f90:13:15:
>>
>>print *, len(s)
>>1
>> Error: Result of LEN overflows its kind at (1)
>>
>> No idea why this was not emitted without the patch.
>
> Thanks for testing, I'll look into this. Was it on a 32 or 64-bit target?

I can't recreate this on i686-pc-linux-gnu. What host/target are you
using exactly, is it part of the normal testsuite run or do you
compile it with some special options?

In any case, that line isn't particularly relevant to what the
testcase is testing, so it could be removed completely, or by
specifying kind=8 to len().




-- 
Janne Blomqvist


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread Jan Hubicka
> On Sun, Jan 14, 2018 at 11:40 AM, Jan Hubicka  wrote:
> >> Hi HJ,
> >>
> >> > -Original Message-
> >> > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> >> > ow...@gcc.gnu.org] On Behalf Of H.J. Lu
> >> > Sent: Sunday, January 14, 2018 9:07 AM
> >> > To: gcc-patches@gcc.gnu.org
> >> > Subject: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
> >> >
> >> > This set of patches for GCC 8 mitigates variant #2 of the speculative
> >> > execution vulnerabilities on x86 processors identified by CVE-2017-5715, 
> >> > aka
> >> > Spectre.  They convert indirect branches and function returns to call and
> >> > return thunks to avoid speculative execution via indirect call, jmp and 
> >> > ret.
> >> >
> >> > H.J. Lu (5):
> >> >   x86: Add -mindirect-branch=
> >> >   x86: Add -mfunction-return=
> >> >   x86: Add -mindirect-branch-register
> >> >   x86: Add 'V' register operand modifier
> >> >   x86: Disallow -mindirect-branch=/-mfunction-return= with
> >> > -mcmodel=large
> >>
> >> Current set of patches don't seem to have any option to generate "lfence" 
> >> as the loop filler in "retpoline",  which is required by AMD.
> >> Can you please clarify the plan. We would like to get this checked-in GCC 
> >> 8.
> >
> > Since thunks are output as strings, it is easy to add the option
> > on the top of patch #1 of the series.  I do not fully understand
> > the reason for choosing pause over lfence for Intel, but if we need
> > to do both, we need to have command line option (and possibly attribute).
> > What would be reasonable name for it?
> 
> Looking at the kernel patch [1], the loop filler should be
> "pause;lfence" sequence, and should be universally accepted for Intel
> and AMD targets.
> 
> [1] https://www.spinics.net/lists/kernel/msg2697507.html

Yep, I would say we should go with pause;lfence now and see if we want to add 
argument
eventually.
HJ, does it sound OK?

Honza
> 
> Uros.


Re: [PATCH 1/5] x86: Add -mindirect-branch=

2018-01-14 Thread H.J. Lu
On Sun, Jan 14, 2018 at 3:06 AM, Jan Hubicka  wrote:
>> On 2018.01.14 at 11:46 +0100, Jan Hubicka wrote:
>> > > gcc/
>> > >
>> > >   * config/i386/i386-opts.h (indirect_branch): New.
>> > >   * config/i386/i386-protos.h (ix86_output_indirect_jmp): Likewise.
>> > >   * config/i386/i386.c (ix86_using_red_zone): Disallow red-zone
>> > >   with local indirect jump when converting indirect call and jump.
>> > >   (ix86_set_indirect_branch_type): New.
>> > >   (ix86_set_current_function): Call ix86_set_indirect_branch_type.
>> > >   (indirectlabelno): New.
>> > >   (indirect_thunk_needed): Likewise.
>> > >   (indirect_thunk_bnd_needed): Likewise.
>> > >   (indirect_thunks_used): Likewise.
>> > >   (indirect_thunks_bnd_used): Likewise.
>> > >   (INDIRECT_LABEL): Likewise.
>> > >   (indirect_thunk_name): Likewise.
>> > >   (output_indirect_thunk): Likewise.
>> > >   (output_indirect_thunk_function): Likewise.
>> > >   (ix86_output_indirect_branch): Likewise.
>> > >   (ix86_output_indirect_jmp): Likewise.
>> > >   (ix86_code_end): Call output_indirect_thunk_function if needed.
>> > >   (ix86_output_call_insn): Call ix86_output_indirect_branch if
>> > >   needed.
>> > >   (ix86_handle_fndecl_attribute): Handle indirect_branch.
>> > >   (ix86_attribute_table): Add indirect_branch.
>> > >   * config/i386/i386.h (machine_function): Add indirect_branch_type
>> > >   and has_local_indirect_jump.
>> > >   * config/i386/i386.md (indirect_jump): Set has_local_indirect_jump
>> > >   to true.
>> > >   (tablejump): Likewise.
>> > >   (*indirect_jump): Use ix86_output_indirect_jmp.
>> > >   (*tablejump_1): Likewise.
>> > >   (simple_return_indirect_internal): Likewise.
>> > >   * config/i386/i386.opt (mindirect-branch=): New option.
>> > >   (indirect_branch): New.
>> > >   (keep): Likewise.
>> > >   (thunk): Likewise.
>> > >   (thunk-inline): Likewise.
>> > >   (thunk-extern): Likewise.
>> > >   * doc/extend.texi: Document indirect_branch function attribute.
>> > >   * doc/invoke.texi: Document -mindirect-branch= option.
>> > > +
>> > > +  /* Pause .  */
>> > > +  fprintf (asm_out_file, "\tpause\n");
>> >
>> > OK, but please prepare incremental patches to choose between pause and 
>> > lefence
>> > as needed for AMD CPUs and check for large code model.
>>
>> Why not use both? That would make everybody happy, no?
>>
>>  pause
>>  lfence
>>  jmp
>
> Yes, according to my understanding we want both pause and lefence by default
> (it is expensive anyway) + command line option to control which one to use?
>
> Probably thus the first patch should default to both.

Done.  This is the patch I am checking in.

-- 
H.J.
From de23211f9a2fae02fe003f7b32016b94be9c8407 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Sat, 6 Jan 2018 22:29:55 -0800
Subject: [PATCH 1/6] x86: Add -mindirect-branch=

Add -mindirect-branch= option to convert indirect call and jump to call
and return thunks.  The default is 'keep', which keeps indirect call and
jump unmodified.  'thunk' converts indirect call and jump to call and
return thunk.  'thunk-inline' converts indirect call and jump to inlined
call and return thunk.  'thunk-extern' converts indirect call and jump to
external call and return thunk provided in a separate object file.  You
can control this behavior for a specific function by using the function
attribute indirect_branch.

2 kinds of thunks are geneated.  Memory thunk where the function address
is at the top of the stack:

__x86_indirect_thunk:
	call L2
L1:
	pause
	lfence
	jmp L1
L2:
	lea 8(%rsp), %rsp|lea 4(%esp), %esp
	ret

Indirect jmp via memory, "jmp mem", is converted to

	push memory
	jmp __x86_indirect_thunk

Indirect call via memory, "call mem", is converted to

	jmp L2
L1:
	push [mem]
	jmp __x86_indirect_thunk
L2:
	call L1

Register thunk where the function address is in a register, reg:

__x86_indirect_thunk_reg:
	call	L2
L1:
	pause
	lfence
	jmp	L1
L2:
	movq	%reg, (%rsp)|movl%reg, (%esp)
	ret

where reg is one of (r|e)ax, (r|e)dx, (r|e)cx, (r|e)bx, (r|e)si, (r|e)di,
(r|e)bp, r8, r9, r10, r11, r12, r13, r14 and r15.

Indirect jmp via register, "jmp reg", is converted to

	jmp __x86_indirect_thunk_reg

Indirect call via register, "call reg", is converted to

	call __x86_indirect_thunk_reg

gcc/

	* config/i386/i386-opts.h (indirect_branch): New.
	* config/i386/i386-protos.h (ix86_output_indirect_jmp): Likewise.
	* config/i386/i386.c (ix86_using_red_zone): Disallow red-zone
	with local indirect jump when converting indirect call and jump.
	(ix86_set_indirect_branch_type): New.
	(ix86_set_current_function): Call ix86_set_indirect_branch_type.
	(indirectlabelno): New.
	(indirect_thunk_needed): Likewise.
	(indirect_thunk_bnd_needed): Likewise.
	(indirect_thunks_used): Likewise.
	(indirect_thunks_bnd_used): Likewise.
	(INDIRECT_LABEL): Likewise.
	(indirect_thunk_name): Likewise.
	(output_indirect_thunk): Likewise.
	(output_indirect_thunk_function): Likewise.
	(ix86_output_indirect_branch): Likewise.
	(ix86_output_indirect_jmp): Likewise.

Re: [PATCH] x86: Disallow -mindirect-branch=/-mfunction-return= with -mshstk

2018-01-14 Thread H.J. Lu
On Sat, Jan 13, 2018 at 8:09 AM, Jeff Law  wrote:
> On 01/12/2018 09:16 AM, H.J. Lu wrote:
>> On Thu, Jan 11, 2018 at 3:00 PM, H.J. Lu  wrote:
>>> On Thu, Jan 11, 2018 at 2:46 PM, Jeff Law  wrote:
>>
 Do you want to mention that CET and retpolines are inherently
>>>
>>> I will document it.
>>>
 incompatible?  Should an attempt to use them together generate a
 compile-time error?

>>>
>>> Compile-time error sounds a good idea.
>>>
>>
>> Here is the patch on my current patch set.  Any comments?
> Seems reasonable to me.

Jan, Uros,

Does it look OK:

https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01072.html

Thanks.

-- 
H.J.


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread H.J. Lu
On Sun, Jan 14, 2018 at 4:42 AM, Jan Hubicka  wrote:
>> On Sun, Jan 14, 2018 at 11:40 AM, Jan Hubicka  wrote:
>> >> Hi HJ,
>> >>
>> >> > -Original Message-
>> >> > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
>> >> > ow...@gcc.gnu.org] On Behalf Of H.J. Lu
>> >> > Sent: Sunday, January 14, 2018 9:07 AM
>> >> > To: gcc-patches@gcc.gnu.org
>> >> > Subject: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
>> >> >
>> >> > This set of patches for GCC 8 mitigates variant #2 of the speculative
>> >> > execution vulnerabilities on x86 processors identified by 
>> >> > CVE-2017-5715, aka
>> >> > Spectre.  They convert indirect branches and function returns to call 
>> >> > and
>> >> > return thunks to avoid speculative execution via indirect call, jmp and 
>> >> > ret.
>> >> >
>> >> > H.J. Lu (5):
>> >> >   x86: Add -mindirect-branch=
>> >> >   x86: Add -mfunction-return=
>> >> >   x86: Add -mindirect-branch-register
>> >> >   x86: Add 'V' register operand modifier
>> >> >   x86: Disallow -mindirect-branch=/-mfunction-return= with
>> >> > -mcmodel=large
>> >>
>> >> Current set of patches don't seem to have any option to generate "lfence" 
>> >> as the loop filler in "retpoline",  which is required by AMD.
>> >> Can you please clarify the plan. We would like to get this checked-in GCC 
>> >> 8.
>> >
>> > Since thunks are output as strings, it is easy to add the option
>> > on the top of patch #1 of the series.  I do not fully understand
>> > the reason for choosing pause over lfence for Intel, but if we need
>> > to do both, we need to have command line option (and possibly attribute).
>> > What would be reasonable name for it?
>>
>> Looking at the kernel patch [1], the loop filler should be
>> "pause;lfence" sequence, and should be universally accepted for Intel
>> and AMD targets.
>>
>> [1] https://www.spinics.net/lists/kernel/msg2697507.html
>
> Yep, I would say we should go with pause;lfence now and see if we want to add 
> argument
> eventually.
> HJ, does it sound OK?

Yes, I am checking a patch to default to "pause; lfence".


-- 
H.J.


Re: [PATCH 2/5] x86: Add -mfunction-return=

2018-01-14 Thread H.J. Lu
On Sun, Jan 14, 2018 at 3:31 AM, H.J. Lu  wrote:
> On Sun, Jan 14, 2018 at 2:48 AM, Jan Hubicka  wrote:
>>> Add -mfunction-return= option to convert function return to call and
>>> return thunks.  The default is 'keep', which keeps function return
>>> unmodified.  'thunk' converts function return to call and return thunk.
>>> 'thunk-inline' converts function return to inlined call and return thunk.
>>> 'thunk-extern' converts function return to external call and return
>>> thunk provided in a separate object file.  You can control this behavior
>>> for a specific function by using the function attribute function_return.
>>>
>>> Function return thunk is the same as memory thunk for -mindirect-branch=
>>> where the return address is at the top of the stack:
>>>
>>> __x86_return_thunk:
>>>   call L2
>>> L1:
>>>   pause
>>>   jmp L1
>>> L2:
>>>   lea 8(%rsp), %rsp|lea 4(%esp), %esp
>>>   ret
>>>
>>> and function return becomes
>>>
>>>   jmp __x86_return_thunk
>>>
>>> -mindirect-branch= tests are updated with -mfunction-return=keep to
>>> avoid false test failures when -mfunction-return=thunk is added to
>>> RUNTESTFLAGS for "make check".
>>>
>>> gcc/
>>>
>>>   * config/i386/i386-protos.h (ix86_output_function_return): New.
>>>   * config/i386/i386.c (ix86_set_indirect_branch_type): Also
>>>   set function_return_type.
>>>   (indirect_thunk_name): Add ret_p to indicate thunk for function
>>>   return.
>>>   (output_indirect_thunk_function): Pass false to
>>>   indirect_thunk_name.
>>>   (ix86_output_indirect_branch): Likewise.
>>>   (output_indirect_thunk_function): Create alias for function
>>>   return thunk if regno < 0.
>>>   (ix86_output_function_return): New function.
>>>   (ix86_handle_fndecl_attribute): Handle function_return.
>>>   (ix86_attribute_table): Add function_return.
>>>   * config/i386/i386.h (machine_function): Add
>>>   function_return_type.
>>>   * config/i386/i386.md (simple_return_internal): Use
>>>   ix86_output_function_return.
>>>   (simple_return_internal_long): Likewise.
>>>   * config/i386/i386.opt (mfunction-return=): New option.
>>>   (indirect_branch): Mention -mfunction-return=.
>>>   * doc/extend.texi: Document function_return function attribute.
>>>   * doc/invoke.texi: Document -mfunction-return= option.
>>
>> The implementation is reasonable, but I still do not quite understand
>> if we really need this change. Celarly killing every return is going
>> to have significant impact and it is apparently not used by Linux kernel.
>> What is the main motivation for it?
>>
>> Honza
>
> See:
>
> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00540.html
>
> I will check it in.
>

Here is the patch I am checking in.


-- 
H.J.
From 2f05046a78afb173f301298df0a9a44b6fb38969 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Sat, 6 Jan 2018 22:29:56 -0800
Subject: [PATCH 2/6] x86: Add -mfunction-return=

Add -mfunction-return= option to convert function return to call and
return thunks.  The default is 'keep', which keeps function return
unmodified.  'thunk' converts function return to call and return thunk.
'thunk-inline' converts function return to inlined call and return thunk.
'thunk-extern' converts function return to external call and return
thunk provided in a separate object file.  You can control this behavior
for a specific function by using the function attribute function_return.

Function return thunk is the same as memory thunk for -mindirect-branch=
where the return address is at the top of the stack:

__x86_return_thunk:
	call L2
L1:
	pause
	lfence
	jmp L1
L2:
	lea 8(%rsp), %rsp|lea 4(%esp), %esp
	ret

and function return becomes

	jmp __x86_return_thunk

-mindirect-branch= tests are updated with -mfunction-return=keep to
avoid false test failures when -mfunction-return=thunk is added to
RUNTESTFLAGS for "make check".

gcc/

	* config/i386/i386-protos.h (ix86_output_function_return): New.
	* config/i386/i386.c (ix86_set_indirect_branch_type): Also
	set function_return_type.
	(indirect_thunk_name): Add ret_p to indicate thunk for function
	return.
	(output_indirect_thunk_function): Pass false to
	indirect_thunk_name.
	(ix86_output_indirect_branch): Likewise.
	(output_indirect_thunk_function): Create alias for function
	return thunk if regno < 0.
	(ix86_output_function_return): New function.
	(ix86_handle_fndecl_attribute): Handle function_return.
	(ix86_attribute_table): Add function_return.
	* config/i386/i386.h (machine_function): Add
	function_return_type.
	* config/i386/i386.md (simple_return_internal): Use
	ix86_output_function_return.
	(simple_return_internal_long): Likewise.
	* config/i386/i386.opt (mfunction-return=): New option.
	(indirect_branch): Mention -mfunction-return=.
	* doc/extend.texi: Document function_return function attribute.
	* doc/invoke.texi: Document -mfunction-return= option.

gcc/testsuite/

	* gcc.target/i386/indirect-thunk-1.c (dg-option

Re: [PATCH 5/5] x86: Disallow -mindirect-branch=/-mfunction-return= with -mcmodel=large

2018-01-14 Thread H.J. Lu
On Sun, Jan 14, 2018 at 2:47 AM, Jan Hubicka  wrote:
>> Since the thunk function may not be reachable in large code model,
>> -mcmodel=large is incompatible with -mindirect-branch=thunk,
>> -mindirect-branch=thunk-extern, -mfunction-return=thunk and
>> -mfunction-return=thunk-extern.  Issue an error when they are used with
>> -mcmodel=large.
>>
>> gcc/
>>
>>   * config/i386/i386.c (ix86_set_indirect_branch_type): Disallow
>>   -mcmodel=large with -mindirect-branch=thunk,
>>   -mindirect-branch=thunk-extern, -mfunction-return=thunk and
>>   -mfunction-return=thunk-extern.
>>   * doc/invoke.texi: Document -mcmodel=large is incompatible with
>>   -mindirect-branch=thunk, -mindirect-branch=thunk-extern,
>>   -mfunction-return=thunk and -mfunction-return=thunk-extern.
>>
>> gcc/testsuite/
>>
>>   * gcc.target/i386/indirect-thunk-10.c: New test.
>>   * gcc.target/i386/indirect-thunk-8.c: Likewise.
>>   * gcc.target/i386/indirect-thunk-9.c: Likewise.
>>   * gcc.target/i386/indirect-thunk-attr-10.c: Likewise.
>>   * gcc.target/i386/indirect-thunk-attr-11.c: Likewise.
>>   * gcc.target/i386/indirect-thunk-attr-9.c: Likewise.
>>   * gcc.target/i386/ret-thunk-17.c: Likewise.
>>   * gcc.target/i386/ret-thunk-18.c: Likewise.
>>   * gcc.target/i386/ret-thunk-19.c: Likewise.
>>   * gcc.target/i386/ret-thunk-20.c: Likewise.
>>   * gcc.target/i386/ret-thunk-21.c: Likewise.
>
> OK.
> Honza

This is the patch I am checking in to only run tests on LP64 since
-mcmodel=large doesn't work with -mx32.

Thanks.

-- 
H.J.
From b1ccfb7ff73f3ceacc4bcc0a5737d92494ac0c37 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Sat, 13 Jan 2018 18:01:54 -0800
Subject: [PATCH 5/6] x86: Disallow -mindirect-branch=/-mfunction-return= with
 -mcmodel=large

Since the thunk function may not be reachable in large code model,
-mcmodel=large is incompatible with -mindirect-branch=thunk,
-mindirect-branch=thunk-extern, -mfunction-return=thunk and
-mfunction-return=thunk-extern.  Issue an error when they are used with
-mcmodel=large.

gcc/

	* config/i386/i386.c (ix86_set_indirect_branch_type): Disallow
	-mcmodel=large with -mindirect-branch=thunk,
	-mindirect-branch=thunk-extern, -mfunction-return=thunk and
	-mfunction-return=thunk-extern.
	* doc/invoke.texi: Document -mcmodel=large is incompatible with
	-mindirect-branch=thunk, -mindirect-branch=thunk-extern,
	-mfunction-return=thunk and -mfunction-return=thunk-extern.

gcc/testsuite/

	* gcc.target/i386/indirect-thunk-10.c: New test.
	* gcc.target/i386/indirect-thunk-8.c: Likewise.
	* gcc.target/i386/indirect-thunk-9.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-10.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-11.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-9.c: Likewise.
	* gcc.target/i386/ret-thunk-17.c: Likewise.
	* gcc.target/i386/ret-thunk-18.c: Likewise.
	* gcc.target/i386/ret-thunk-19.c: Likewise.
	* gcc.target/i386/ret-thunk-20.c: Likewise.
	* gcc.target/i386/ret-thunk-21.c: Likewise.
---
 gcc/config/i386/i386.c | 26 ++
 gcc/doc/invoke.texi| 11 +
 gcc/testsuite/gcc.target/i386/indirect-thunk-10.c  |  7 ++
 gcc/testsuite/gcc.target/i386/indirect-thunk-8.c   |  7 ++
 gcc/testsuite/gcc.target/i386/indirect-thunk-9.c   |  7 ++
 .../gcc.target/i386/indirect-thunk-attr-10.c   |  9 
 .../gcc.target/i386/indirect-thunk-attr-11.c   |  9 
 .../gcc.target/i386/indirect-thunk-attr-9.c|  9 
 gcc/testsuite/gcc.target/i386/ret-thunk-17.c   |  7 ++
 gcc/testsuite/gcc.target/i386/ret-thunk-18.c   |  8 +++
 gcc/testsuite/gcc.target/i386/ret-thunk-19.c   |  8 +++
 gcc/testsuite/gcc.target/i386/ret-thunk-20.c   |  9 
 gcc/testsuite/gcc.target/i386/ret-thunk-21.c   |  9 
 13 files changed, 126 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-9.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-attr-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-attr-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-attr-9.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-17.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-18.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-19.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-20.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-21.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 46cc2563859..3bdbe088182 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -5836,6 +5836,19 @@ ix86_set_indirect_branch_type (tree fndecl)
 	}
   else
 	cfun->machine->indirect_branch_type = ix86_indirect_bra

Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread Jan Hubicka
> > Hi HJ, 
> > 
> > > -Original Message-
> > > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> > > ow...@gcc.gnu.org] On Behalf Of H.J. Lu
> > > Sent: Sunday, January 14, 2018 9:07 AM
> > > To: gcc-patches@gcc.gnu.org
> > > Subject: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
> > > 
> > > This set of patches for GCC 8 mitigates variant #2 of the speculative
> > > execution vulnerabilities on x86 processors identified by CVE-2017-5715, 
> > > aka
> > > Spectre.  They convert indirect branches and function returns to call and
> > > return thunks to avoid speculative execution via indirect call, jmp and 
> > > ret.
> > > 
> > > H.J. Lu (5):
> > >   x86: Add -mindirect-branch=
> > >   x86: Add -mfunction-return=
> > >   x86: Add -mindirect-branch-register
> > >   x86: Add 'V' register operand modifier
> > >   x86: Disallow -mindirect-branch=/-mfunction-return= with
> > > -mcmodel=large
> > 
> > Current set of patches don't seem to have any option to generate "lfence" 
> > as the loop filler in "retpoline",  which is required by AMD.
> > Can you please clarify the plan. We would like to get this checked-in GCC 
> > 8.   
> 
> Since thunks are output as strings, it is easy to add the option
> on the top of patch #1 of the series.  I do not fully understand
> the reason for choosing pause over lfence for Intel, but if we need
> to do both, we need to have command line option (and possibly attribute).
> What would be reasonable name for it?

I forgot there is -mindirect-branch-loop for that in the original patchset.
So for now we should be happy with having both lfence and pause in there
or do we still need it?

Honza
> 
> Honza


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread H.J. Lu
On Sun, Jan 14, 2018 at 6:20 AM, Jan Hubicka  wrote:
>> > Hi HJ,
>> >
>> > > -Original Message-
>> > > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
>> > > ow...@gcc.gnu.org] On Behalf Of H.J. Lu
>> > > Sent: Sunday, January 14, 2018 9:07 AM
>> > > To: gcc-patches@gcc.gnu.org
>> > > Subject: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
>> > >
>> > > This set of patches for GCC 8 mitigates variant #2 of the speculative
>> > > execution vulnerabilities on x86 processors identified by CVE-2017-5715, 
>> > > aka
>> > > Spectre.  They convert indirect branches and function returns to call and
>> > > return thunks to avoid speculative execution via indirect call, jmp and 
>> > > ret.
>> > >
>> > > H.J. Lu (5):
>> > >   x86: Add -mindirect-branch=
>> > >   x86: Add -mfunction-return=
>> > >   x86: Add -mindirect-branch-register
>> > >   x86: Add 'V' register operand modifier
>> > >   x86: Disallow -mindirect-branch=/-mfunction-return= with
>> > > -mcmodel=large
>> >
>> > Current set of patches don't seem to have any option to generate "lfence" 
>> > as the loop filler in "retpoline",  which is required by AMD.
>> > Can you please clarify the plan. We would like to get this checked-in GCC 
>> > 8.
>>
>> Since thunks are output as strings, it is easy to add the option
>> on the top of patch #1 of the series.  I do not fully understand
>> the reason for choosing pause over lfence for Intel, but if we need
>> to do both, we need to have command line option (and possibly attribute).
>> What would be reasonable name for it?
>
> I forgot there is -mindirect-branch-loop for that in the original patchset.
> So for now we should be happy with having both lfence and pause in there
> or do we still need it?
>

I suggest we leave it out for the time being.


-- 
H.J.


[PATCH] C/C++: Add -Waddress-of-packed-member

2018-01-14 Thread H.J. Lu
When address of packed member of struct or union is taken, it may result
in an unaligned pointer value.  This patch adds -Waddress-of-packed-member
to warn it:

$ cat x.i
struct pair_t
{
  char c;
  int i;
} __attribute__ ((packed));

extern struct pair_t p;
int *addr = &p.i;
$ gcc -O2 -S x.i
x.i:8:13: warning: initialization of 'int *' from address of packed member of 
'struct pair_t' may result in an unaligned pointer value 
[-Waddress-of-packed-member]
 int *addr = &p.i;
 ^
$

This warning is enabled by default.

Tested on i686 and x86-64.  OK for trunk?

H.J.
---
gcc/c/

PR c/51628
* doc/invoke.texi: Document -Wno-address-of-packed-member.

gcc/c-family/

PR c/51628
* c-common.h (warn_for_address_of_packed_member): New.
* c-warn.c (warn_for_address_of_packed_member): New function.
* c.opt: Add -Wno-address-of-packed-member.

gcc/c/

PR c/51628
* c-typeck.c (convert_for_assignment): Call
warn_for_address_of_packed_member.  Issue an warning if address
of packed member is taken.

gcc/cp/

PR c/51628
* call.c (convert_for_arg_passing): Call
warn_for_address_of_packed_member.  Issue an warning if address
of packed member is taken.
* typeck.c (convert_for_assignment): Likewise.

gcc/testsuite/

PR c/51628
* c-c++-common/pr51628-1.c: New tests.
* c-c++-common/pr51628-2.c: Likewise.
* c-c++-common/pr51628-3.c: Likewise.
* c-c++-common/pr51628-4.c: Likewise.
* c-c++-common/pr51628-5.c: Likewise.
* c-c++-common/pr51628-6.c: Likewise.
* c-c++-common/ubsan/align-2.c: Add -Wno-address-of-packed-member.
* c-c++-common/ubsan/align-4.c: Likewise.
* c-c++-common/ubsan/align-6.c: Likewise.
* c-c++-common/ubsan/align-7.c: Likewise.
* c-c++-common/ubsan/align-8.c: Likewise.
* g++.dg/ubsan/align-2.C: Likewise.
---
 gcc/c-family/c-common.h|  1 +
 gcc/c-family/c-warn.c  | 38 +++
 gcc/c-family/c.opt |  4 +++
 gcc/c/c-typeck.c   | 40 -
 gcc/cp/call.c  |  8 ++
 gcc/cp/typeck.c| 41 ++
 gcc/doc/invoke.texi| 11 ++--
 gcc/testsuite/c-c++-common/pr51628-1.c | 29 +
 gcc/testsuite/c-c++-common/pr51628-2.c | 29 +
 gcc/testsuite/c-c++-common/pr51628-3.c | 35 +
 gcc/testsuite/c-c++-common/pr51628-4.c | 35 +
 gcc/testsuite/c-c++-common/pr51628-5.c | 35 +
 gcc/testsuite/c-c++-common/pr51628-6.c | 35 +
 gcc/testsuite/c-c++-common/ubsan/align-2.c |  2 +-
 gcc/testsuite/c-c++-common/ubsan/align-4.c |  2 +-
 gcc/testsuite/c-c++-common/ubsan/align-6.c |  2 +-
 gcc/testsuite/c-c++-common/ubsan/align-7.c |  2 +-
 gcc/testsuite/c-c++-common/ubsan/align-8.c |  2 +-
 gcc/testsuite/g++.dg/ubsan/align-2.C   |  2 +-
 19 files changed, 344 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/pr51628-1.c
 create mode 100644 gcc/testsuite/c-c++-common/pr51628-2.c
 create mode 100644 gcc/testsuite/c-c++-common/pr51628-3.c
 create mode 100644 gcc/testsuite/c-c++-common/pr51628-4.c
 create mode 100644 gcc/testsuite/c-c++-common/pr51628-5.c
 create mode 100644 gcc/testsuite/c-c++-common/pr51628-6.c

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index d090881e95d..ef31e4d0aa2 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1384,6 +1384,7 @@ extern void c_do_switch_warnings (splay_tree, location_t, 
tree, tree, bool,
  bool);
 extern void warn_for_omitted_condop (location_t, tree);
 extern void warn_for_restrict (unsigned, tree *, unsigned);
+extern tree warn_for_address_of_packed_member (tree type, tree rhs);
 
 /* Places where an lvalue, or modifiable lvalue, may be required.
Used to select diagnostic messages in lvalue_error and
diff --git a/gcc/c-family/c-warn.c b/gcc/c-family/c-warn.c
index 7d87c455ec0..ac0a2ffcb42 100644
--- a/gcc/c-family/c-warn.c
+++ b/gcc/c-family/c-warn.c
@@ -2576,3 +2576,41 @@ warn_for_multistatement_macros (location_t body_loc, 
location_t next_loc,
 inform (guard_loc, "some parts of macro expansion are not guarded by "
"this %qs clause", guard_tinfo_to_string (keyword));
 }
+
+/* Return struct or union type if the right hand type, RHS, is the
+   address of packed member of struct or union when assigning to TYPE.
+   Otherwise, return NULL_TREE.  */
+
+tree
+warn_for_address_of_packed_member (tree type, tree rhs)
+{
+  if (!warn_address_of_packed_member)
+return NULL_TREE;
+
+  if (TREE_CODE (rhs) == ADDR_EXPR && POINTER_TYPE_P (type))
+{
+  tree base = TREE_OPER

[PATCH 0/3] GCC 7: i386: Move struct ix86_frame to machine_function

2018-01-14 Thread H.J. Lu
This patch set makes ix86_frame available to i386 code generation.  They
are needed to backport the patch set of -mindirect-branch= to mitigate
variant #2 of the speculative execution vulnerabilities on x86 processors
identified by CVE-2017-5715, aka Spectre.

H.J. Lu (2):
  i386: Move struct ix86_frame to machine_function
  i386: More use reference of struct ix86_frame to avoid copy

hjl (1):
  i386: Use reference of struct ix86_frame to avoid copy

 gcc/config/i386/i386.c | 73 +-
 gcc/config/i386/i386.h | 53 +++-
 2 files changed, 65 insertions(+), 61 deletions(-)

-- 
2.14.3



[PATCH 1/3] GCC 7: i386: Move struct ix86_frame to machine_function

2018-01-14 Thread H.J. Lu
Make ix86_frame available to i386 code generation.  This is needed to
backport the patch set of -mindirect-branch= to mitigate variant #2 of
the speculative execution vulnerabilities on x86 processors identified
by CVE-2017-5715, aka Spectre.

Backport from mainline
* config/i386/i386.c (ix86_frame): Moved to ...
* config/i386/i386.h (ix86_frame): Here.
(machine_function): Add frame.
* config/i386/i386.c (ix86_compute_frame_layout): Repace the
frame argument with &cfun->machine->frame.
(ix86_can_use_return_insn_p): Don't pass &frame to
ix86_compute_frame_layout.  Copy frame from cfun->machine->frame.
(ix86_can_eliminate): Likewise.
(ix86_expand_prologue): Likewise.
(ix86_expand_epilogue): Likewise.
(ix86_expand_split_stack_prologue): Likewise.
---
 gcc/config/i386/i386.c | 68 ++
 gcc/config/i386/i386.h | 53 ++-
 2 files changed, 65 insertions(+), 56 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 8a3782c0298..813337242d8 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2444,53 +2444,6 @@ struct GTY(()) stack_local_entry {
   struct stack_local_entry *next;
 };
 
-/* Structure describing stack frame layout.
-   Stack grows downward:
-
-   [arguments]
-   <- ARG_POINTER
-   saved pc
-
-   saved static chain  if ix86_static_chain_on_stack
-
-   saved frame pointer if frame_pointer_needed
-   <- HARD_FRAME_POINTER
-   [saved regs]
-   <- regs_save_offset
-   [padding0]
-
-   [saved SSE regs]
-   <- sse_regs_save_offset
-   [padding1]  |
-  |<- FRAME_POINTER
-   [va_arg registers]  |
-  |
-   [frame]|
-  |
-   [padding2] | = to_allocate
-   <- STACK_POINTER
-  */
-struct ix86_frame
-{
-  int nsseregs;
-  int nregs;
-  int va_arg_size;
-  int red_zone_size;
-  int outgoing_arguments_size;
-
-  /* The offsets relative to ARG_POINTER.  */
-  HOST_WIDE_INT frame_pointer_offset;
-  HOST_WIDE_INT hard_frame_pointer_offset;
-  HOST_WIDE_INT stack_pointer_offset;
-  HOST_WIDE_INT hfp_save_offset;
-  HOST_WIDE_INT reg_save_offset;
-  HOST_WIDE_INT sse_reg_save_offset;
-
-  /* When save_regs_using_mov is set, emit prologue using
- move instead of push instructions.  */
-  bool save_regs_using_mov;
-};
-
 /* Which cpu are we scheduling for.  */
 enum attr_cpu ix86_schedule;
 
@@ -2582,7 +2535,7 @@ static unsigned int ix86_function_arg_boundary 
(machine_mode,
const_tree);
 static rtx ix86_static_chain (const_tree, bool);
 static int ix86_function_regparm (const_tree, const_tree);
-static void ix86_compute_frame_layout (struct ix86_frame *);
+static void ix86_compute_frame_layout (void);
 static bool ix86_expand_vector_init_one_nonzero (bool, machine_mode,
 rtx, rtx, int);
 static void ix86_add_new_builtins (HOST_WIDE_INT, HOST_WIDE_INT);
@@ -11903,7 +11856,8 @@ ix86_can_use_return_insn_p (void)
   if (crtl->args.pops_args && crtl->args.size >= 32768)
 return 0;
 
-  ix86_compute_frame_layout (&frame);
+  ix86_compute_frame_layout ();
+  frame = cfun->machine->frame;
   return (frame.stack_pointer_offset == UNITS_PER_WORD
  && (frame.nregs + frame.nsseregs) == 0);
 }
@@ -12389,8 +12343,8 @@ ix86_can_eliminate (const int from, const int to)
 HOST_WIDE_INT
 ix86_initial_elimination_offset (int from, int to)
 {
-  struct ix86_frame frame;
-  ix86_compute_frame_layout (&frame);
+  ix86_compute_frame_layout ();
+  struct ix86_frame frame = cfun->machine->frame;
 
   if (from == ARG_POINTER_REGNUM && to == HARD_FRAME_POINTER_REGNUM)
 return frame.hard_frame_pointer_offset;
@@ -12429,8 +12383,9 @@ ix86_builtin_setjmp_frame_value (void)
 /* Fill structure ix86_frame about frame of currently computed function.  */
 
 static void
-ix86_compute_frame_layout (struct ix86_frame *frame)
+ix86_compute_frame_layout (void)
 {
+  struct ix86_frame *frame = &cfun->machine->frame;
   unsigned HOST_WIDE_INT stack_alignment_needed;
   HOST_WIDE_INT offset;
   unsigned HOST_WIDE_INT preferred_alignment;
@@ -13737,7 +13692,8 @@ ix86_expand_prologue (void)
   m->fs.sp_offset = INCOMING_FRAME_SP_OFFSET;
   m->fs.sp_valid = true;
 
-  ix86_compute_frame_layout (&frame);
+  ix86_compute_frame_layout ();
+  frame = m->frame;
 
   if (!TARGET_64BIT && ix86_function_ms_hook_prologue (current_function_decl))
 {
@@ -14405,7 +14361,8 @@ ix86_expand_epilogue (int style)
   bool using_drap;
 
   ix86_finalize_stack_realign_flags ();
-  ix86_compute_frame_layout (&frame);
+  ix86_compute_frame_layout ()

[PATCH 3/3] GCC 7: i386: More use reference of struct ix86_frame to avoid copy

2018-01-14 Thread H.J. Lu
When there is no need to make a copy of ix86_frame, we can use reference
of struct ix86_frame to avoid copy.

Backport from mainline
* config/i386/i386.c (ix86_expand_prologue): Use reference of
struct ix86_frame.
(ix86_expand_epilogue): Likewise.
---
 gcc/config/i386/i386.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 397ef7cac26..986e6d79584 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13667,7 +13667,6 @@ ix86_expand_prologue (void)
 {
   struct machine_function *m = cfun->machine;
   rtx insn, t;
-  struct ix86_frame frame;
   HOST_WIDE_INT allocate;
   bool int_registers_saved;
   bool sse_registers_saved;
@@ -13691,7 +13690,7 @@ ix86_expand_prologue (void)
   m->fs.sp_valid = true;
 
   ix86_compute_frame_layout ();
-  frame = m->frame;
+  struct ix86_frame &frame = cfun->machine->frame;
 
   if (!TARGET_64BIT && ix86_function_ms_hook_prologue (current_function_decl))
 {
@@ -14354,13 +14353,12 @@ ix86_expand_epilogue (int style)
 {
   struct machine_function *m = cfun->machine;
   struct machine_frame_state frame_state_save = m->fs;
-  struct ix86_frame frame;
   bool restore_regs_via_mov;
   bool using_drap;
 
   ix86_finalize_stack_realign_flags ();
   ix86_compute_frame_layout ();
-  frame = m->frame;
+  struct ix86_frame &frame = cfun->machine->frame;
 
   m->fs.sp_valid = (!frame_pointer_needed
|| (crtl->sp_is_unchanging
-- 
2.14.3



[PATCH 2/3] GCC 7: i386: Use reference of struct ix86_frame to avoid copy

2018-01-14 Thread H.J. Lu
From: hjl 

When there is no need to make a copy of ix86_frame, we can use reference
of struct ix86_frame to avoid copy.

Tested on x86-64.

Backport from mainline
* config/i386/i386.c (ix86_can_use_return_insn_p): Use reference
of struct ix86_frame.
(ix86_initial_elimination_offset): Likewise.
(ix86_expand_split_stack_prologue): Likewise.
---
 gcc/config/i386/i386.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 813337242d8..397ef7cac26 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -11843,8 +11843,6 @@ symbolic_reference_mentioned_p (rtx op)
 bool
 ix86_can_use_return_insn_p (void)
 {
-  struct ix86_frame frame;
-
   /* Don't use `ret' instruction in interrupt handler.  */
   if (! reload_completed
   || frame_pointer_needed
@@ -11857,7 +11855,7 @@ ix86_can_use_return_insn_p (void)
 return 0;
 
   ix86_compute_frame_layout ();
-  frame = cfun->machine->frame;
+  struct ix86_frame &frame = cfun->machine->frame;
   return (frame.stack_pointer_offset == UNITS_PER_WORD
  && (frame.nregs + frame.nsseregs) == 0);
 }
@@ -12344,7 +12342,7 @@ HOST_WIDE_INT
 ix86_initial_elimination_offset (int from, int to)
 {
   ix86_compute_frame_layout ();
-  struct ix86_frame frame = cfun->machine->frame;
+  struct ix86_frame &frame = cfun->machine->frame;
 
   if (from == ARG_POINTER_REGNUM && to == HARD_FRAME_POINTER_REGNUM)
 return frame.hard_frame_pointer_offset;
@@ -14860,7 +14858,6 @@ static GTY(()) rtx split_stack_fn_large;
 void
 ix86_expand_split_stack_prologue (void)
 {
-  struct ix86_frame frame;
   HOST_WIDE_INT allocate;
   unsigned HOST_WIDE_INT args_size;
   rtx_code_label *label;
@@ -14873,7 +14870,7 @@ ix86_expand_split_stack_prologue (void)
 
   ix86_finalize_stack_realign_flags ();
   ix86_compute_frame_layout ();
-  frame = cfun->machine->frame;
+  struct ix86_frame &frame = cfun->machine->frame;
   allocate = frame.stack_pointer_offset - INCOMING_FRAME_SP_OFFSET;
 
   /* This is the label we will branch to if we have enough stack
-- 
2.14.3



Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread H.J. Lu
Now my patch set has been checked into trunk.  Here is a patch set
to move struct ix86_frame to machine_function on GCC 7, which is
needed to backport the patch set to GCC 7:

https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01239.html
https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01240.html
https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01241.html

OK for gcc-7-branch?

Thanks.


-- 
H.J.


Re: [PATCH 1/5] x86: Add -mindirect-branch=

2018-01-14 Thread H.J. Lu
On Sun, Jan 14, 2018 at 4:52 AM, H.J. Lu  wrote:
> On Sun, Jan 14, 2018 at 3:06 AM, Jan Hubicka  wrote:
>>> On 2018.01.14 at 11:46 +0100, Jan Hubicka wrote:
>>> > > gcc/
>>> > >
>>> > >   * config/i386/i386-opts.h (indirect_branch): New.
>>> > >   * config/i386/i386-protos.h (ix86_output_indirect_jmp): Likewise.
>>> > >   * config/i386/i386.c (ix86_using_red_zone): Disallow red-zone
>>> > >   with local indirect jump when converting indirect call and jump.
>>> > >   (ix86_set_indirect_branch_type): New.
>>> > >   (ix86_set_current_function): Call ix86_set_indirect_branch_type.
>>> > >   (indirectlabelno): New.
>>> > >   (indirect_thunk_needed): Likewise.
>>> > >   (indirect_thunk_bnd_needed): Likewise.
>>> > >   (indirect_thunks_used): Likewise.
>>> > >   (indirect_thunks_bnd_used): Likewise.
>>> > >   (INDIRECT_LABEL): Likewise.
>>> > >   (indirect_thunk_name): Likewise.
>>> > >   (output_indirect_thunk): Likewise.
>>> > >   (output_indirect_thunk_function): Likewise.
>>> > >   (ix86_output_indirect_branch): Likewise.
>>> > >   (ix86_output_indirect_jmp): Likewise.
>>> > >   (ix86_code_end): Call output_indirect_thunk_function if needed.
>>> > >   (ix86_output_call_insn): Call ix86_output_indirect_branch if
>>> > >   needed.
>>> > >   (ix86_handle_fndecl_attribute): Handle indirect_branch.
>>> > >   (ix86_attribute_table): Add indirect_branch.
>>> > >   * config/i386/i386.h (machine_function): Add indirect_branch_type
>>> > >   and has_local_indirect_jump.
>>> > >   * config/i386/i386.md (indirect_jump): Set has_local_indirect_jump
>>> > >   to true.
>>> > >   (tablejump): Likewise.
>>> > >   (*indirect_jump): Use ix86_output_indirect_jmp.
>>> > >   (*tablejump_1): Likewise.
>>> > >   (simple_return_indirect_internal): Likewise.
>>> > >   * config/i386/i386.opt (mindirect-branch=): New option.
>>> > >   (indirect_branch): New.
>>> > >   (keep): Likewise.
>>> > >   (thunk): Likewise.
>>> > >   (thunk-inline): Likewise.
>>> > >   (thunk-extern): Likewise.
>>> > >   * doc/extend.texi: Document indirect_branch function attribute.
>>> > >   * doc/invoke.texi: Document -mindirect-branch= option.
>>> > > +
>>> > > +  /* Pause .  */
>>> > > +  fprintf (asm_out_file, "\tpause\n");
>>> >
>>> > OK, but please prepare incremental patches to choose between pause and 
>>> > lefence
>>> > as needed for AMD CPUs and check for large code model.
>>>
>>> Why not use both? That would make everybody happy, no?
>>>
>>>  pause
>>>  lfence
>>>  jmp
>>
>> Yes, according to my understanding we want both pause and lefence by default
>> (it is expensive anyway) + command line option to control which one to use?
>>
>> Probably thus the first patch should default to both.
>
> Done.  This is the patch I am checking in.
>

Here is the backport for GCC 7.  OK for gcc-7-branch?

-- 
H.J.
From 09f7c546376f7ed6770fc64f24aed77229f95f67 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Sat, 6 Jan 2018 22:29:55 -0800
Subject: [PATCH 1/5] x86: Add -mindirect-branch=

Add -mindirect-branch= option to convert indirect call and jump to call
and return thunks.  The default is 'keep', which keeps indirect call and
jump unmodified.  'thunk' converts indirect call and jump to call and
return thunk.  'thunk-inline' converts indirect call and jump to inlined
call and return thunk.  'thunk-extern' converts indirect call and jump to
external call and return thunk provided in a separate object file.  You
can control this behavior for a specific function by using the function
attribute indirect_branch.

2 kinds of thunks are geneated.  Memory thunk where the function address
is at the top of the stack:

__x86_indirect_thunk:
	call L2
L1:
	pause
	lfence
	jmp L1
L2:
	lea 8(%rsp), %rsp|lea 4(%esp), %esp
	ret

Indirect jmp via memory, "jmp mem", is converted to

	push memory
	jmp __x86_indirect_thunk

Indirect call via memory, "call mem", is converted to

	jmp L2
L1:
	push [mem]
	jmp __x86_indirect_thunk
L2:
	call L1

Register thunk where the function address is in a register, reg:

__x86_indirect_thunk_reg:
	call	L2
L1:
	pause
	lfence
	jmp	L1
L2:
	movq	%reg, (%rsp)|movl%reg, (%esp)
	ret

where reg is one of (r|e)ax, (r|e)dx, (r|e)cx, (r|e)bx, (r|e)si, (r|e)di,
(r|e)bp, r8, r9, r10, r11, r12, r13, r14 and r15.

Indirect jmp via register, "jmp reg", is converted to

	jmp __x86_indirect_thunk_reg

Indirect call via register, "call reg", is converted to

	call __x86_indirect_thunk_reg

gcc/

	Backport from mainline
	* config/i386/i386-opts.h (indirect_branch): New.
	* config/i386/i386-protos.h (ix86_output_indirect_jmp): Likewise.
	* config/i386/i386.c (ix86_using_red_zone): Disallow red-zone
	with local indirect jump when converting indirect call and jump.
	(ix86_set_indirect_branch_type): New.
	(ix86_set_current_function): Call ix86_set_indirect_branch_type.
	(indirectlabelno): New.
	(indirect_thunk_needed): Likewise.
	(indirect_thunk_bnd_needed): Likewise.
	(indirect_thunks_used): Likewise.
	(indirect_thunks_bnd_used): Likewise.
	(INDIRECT_LABEL): Likewise.
	

Re: [PATCH 2/5] x86: Add -mfunction-return=

2018-01-14 Thread H.J. Lu
On Sun, Jan 14, 2018 at 4:54 AM, H.J. Lu  wrote:
> On Sun, Jan 14, 2018 at 3:31 AM, H.J. Lu  wrote:
>> On Sun, Jan 14, 2018 at 2:48 AM, Jan Hubicka  wrote:
 Add -mfunction-return= option to convert function return to call and
 return thunks.  The default is 'keep', which keeps function return
 unmodified.  'thunk' converts function return to call and return thunk.
 'thunk-inline' converts function return to inlined call and return thunk.
 'thunk-extern' converts function return to external call and return
 thunk provided in a separate object file.  You can control this behavior
 for a specific function by using the function attribute function_return.

 Function return thunk is the same as memory thunk for -mindirect-branch=
 where the return address is at the top of the stack:

 __x86_return_thunk:
   call L2
 L1:
   pause
   jmp L1
 L2:
   lea 8(%rsp), %rsp|lea 4(%esp), %esp
   ret

 and function return becomes

   jmp __x86_return_thunk

 -mindirect-branch= tests are updated with -mfunction-return=keep to
 avoid false test failures when -mfunction-return=thunk is added to
 RUNTESTFLAGS for "make check".

 gcc/

   * config/i386/i386-protos.h (ix86_output_function_return): New.
   * config/i386/i386.c (ix86_set_indirect_branch_type): Also
   set function_return_type.
   (indirect_thunk_name): Add ret_p to indicate thunk for function
   return.
   (output_indirect_thunk_function): Pass false to
   indirect_thunk_name.
   (ix86_output_indirect_branch): Likewise.
   (output_indirect_thunk_function): Create alias for function
   return thunk if regno < 0.
   (ix86_output_function_return): New function.
   (ix86_handle_fndecl_attribute): Handle function_return.
   (ix86_attribute_table): Add function_return.
   * config/i386/i386.h (machine_function): Add
   function_return_type.
   * config/i386/i386.md (simple_return_internal): Use
   ix86_output_function_return.
   (simple_return_internal_long): Likewise.
   * config/i386/i386.opt (mfunction-return=): New option.
   (indirect_branch): Mention -mfunction-return=.
   * doc/extend.texi: Document function_return function attribute.
   * doc/invoke.texi: Document -mfunction-return= option.
>>>
>>> The implementation is reasonable, but I still do not quite understand
>>> if we really need this change. Celarly killing every return is going
>>> to have significant impact and it is apparently not used by Linux kernel.
>>> What is the main motivation for it?
>>>
>>> Honza
>>
>> See:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00540.html
>>
>> I will check it in.
>>
>
> Here is the patch I am checking in.
>
>

Here is the backport for GCC 7.  OK for gcc-7-branch?

-- 
H.J.


Re: [PATCH 3/5] x86: Add -mindirect-branch-register

2018-01-14 Thread H.J. Lu
On Sun, Jan 14, 2018 at 2:46 AM, Jan Hubicka  wrote:
>> Add -mindirect-branch-register to force indirect branch via register.
>> This is implemented by disabling patterns of indirect branch via memory,
>> similar to TARGET_X32.
>>
>> -mindirect-branch= and -mfunction-return= tests are updated with
>> -mno-indirect-branch-register to avoid false test failures when
>> -mindirect-branch-register is added to RUNTESTFLAGS for "make check".
>>
>> gcc/
>>
>>   * config/i386/constraints.md (Bs): Disallow memory operand for
>>   -mindirect-branch-register.
>>   (Bw): Likewise.
>>   * config/i386/predicates.md (indirect_branch_operand): Likewise.
>>   (GOT_memory_operand): Likewise.
>>   (call_insn_operand): Likewise.
>>   (sibcall_insn_operand): Likewise.
>>   (GOT32_symbol_operand): Likewise.
>>   * config/i386/i386.md (indirect_jump): Call convert_memory_address
>>   for -mindirect-branch-register.
>>   (tablejump): Likewise.
>>   (*sibcall_memory): Likewise.
>>   (*sibcall_value_memory): Likewise.
>>   Disallow peepholes of indirect call and jump via memory for
>>   -mindirect-branch-register.
>>   (*call_pop): Replace m with Bw.
>>   (*call_value_pop): Likewise.
>>   (*sibcall_pop_memory): Replace m with Bs.
>>   * config/i386/i386.opt (mindirect-branch-register): New option.
>>   * doc/invoke.texi: Document -mindirect-branch-register option.
>
> OK.
> Honza

Here is the backport for GCC 7.  OK for gcc-7-branch?

-- 
H.J.
From 139dd2c61a11430263f91030910e2b63a73a11e7 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Sat, 6 Jan 2018 22:29:56 -0800
Subject: [PATCH 3/5] x86: Add -mindirect-branch-register

Add -mindirect-branch-register to force indirect branch via register.
This is implemented by disabling patterns of indirect branch via memory,
similar to TARGET_X32.

-mindirect-branch= and -mfunction-return= tests are updated with
-mno-indirect-branch-register to avoid false test failures when
-mindirect-branch-register is added to RUNTESTFLAGS for "make check".

gcc/

	Backport from mainline
	* config/i386/constraints.md (Bs): Disallow memory operand for
	-mindirect-branch-register.
	(Bw): Likewise.
	* config/i386/predicates.md (indirect_branch_operand): Likewise.
	(GOT_memory_operand): Likewise.
	(call_insn_operand): Likewise.
	(sibcall_insn_operand): Likewise.
	(GOT32_symbol_operand): Likewise.
	* config/i386/i386.md (indirect_jump): Call convert_memory_address
	for -mindirect-branch-register.
	(tablejump): Likewise.
	(*sibcall_memory): Likewise.
	(*sibcall_value_memory): Likewise.
	Disallow peepholes of indirect call and jump via memory for
	-mindirect-branch-register.
	(*call_pop): Replace m with Bw.
	(*call_value_pop): Likewise.
	(*sibcall_pop_memory): Replace m with Bs.
	* config/i386/i386.opt (mindirect-branch-register): New option.
	* doc/invoke.texi: Document -mindirect-branch-register option.

gcc/testsuite/

	Backport from mainline
	* gcc.target/i386/indirect-thunk-1.c (dg-options): Add
	-mno-indirect-branch-register.
	* gcc.target/i386/indirect-thunk-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-5.c: Likewise.
	* gcc.target/i386/indirect-thunk-6.c: Likewise.
	* gcc.target/i386/indirect-thunk-7.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-5.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-6.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-7.c: Likewise.
	* gcc.target/i386/indirect-thunk-bnd-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-bnd-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-bnd-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-bnd-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-5.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-6.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-7.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-5.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-6.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-7.c: Likewise.
	* gcc.target/i386/ret-thunk-10.c: Likewise.
	* gcc.target/i386/ret-thunk-11.c: Likewise.
	* gcc.target/i386/ret-thunk-12.c: Likewise.
	* gcc.target/i386/ret-thunk-13.c: Likewise.
	* gcc.target/i386/ret-thunk-14.c: Likewise.
	* gcc.target/i386/ret-thunk-15.c: Likewise.
	* g

Re: [PATCH 4/5] x86: Add 'V' register operand modifier

2018-01-14 Thread H.J. Lu
On Sun, Jan 14, 2018 at 2:47 AM, Jan Hubicka  wrote:
>> Add 'V', a special modifier which prints the name of the full integer
>> register without '%'.  For
>>
>> extern void (*func_p) (void);
>>
>> void
>> foo (void)
>> {
>>   asm ("call __x86_indirect_thunk_%V0" : : "a" (func_p));
>> }
>>
>> it generates:
>>
>> foo:
>>   movqfunc_p(%rip), %rax
>>   call__x86_indirect_thunk_rax
>>   ret
>>
>> gcc/
>>
>>   * config/i386/i386.c (print_reg): Print the name of the full
>>   integer register without '%'.
>>   (ix86_print_operand): Handle 'V'.
>>* doc/extend.texi: Document 'V' modifier.
>>
>> gcc/testsuite/
>>
>>   * gcc.target/i386/indirect-thunk-register-4.c: New test.
>
> OK.
> Honza

Here is the backport for GCC 7.  OK for gcc-7-branch?

-- 
H.J.
From 5e977dfedb93e764dc480c0e0674500590ef5604 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Sat, 6 Jan 2018 22:29:56 -0800
Subject: [PATCH 4/5] x86: Add 'V' register operand modifier

Add 'V', a special modifier which prints the name of the full integer
register without '%'.  For

extern void (*func_p) (void);

void
foo (void)
{
  asm ("call __x86_indirect_thunk_%V0" : : "a" (func_p));
}

it generates:

foo:
	movq	func_p(%rip), %rax
	call	__x86_indirect_thunk_rax
	ret

gcc/

	Backport from mainline
	* config/i386/i386.c (print_reg): Print the name of the full
	integer register without '%'.
	(ix86_print_operand): Handle 'V'.
	 * doc/extend.texi: Document 'V' modifier.

gcc/testsuite/

	Backport from mainline
	* gcc.target/i386/indirect-thunk-register-4.c: New test.
---
 gcc/config/i386/i386.c| 13 -
 gcc/doc/extend.texi   |  3 +++
 gcc/testsuite/gcc.target/i386/indirect-thunk-register-4.c | 13 +
 3 files changed, 28 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-register-4.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 4bfe2fa8c1d..e32de13688a 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -17925,6 +17925,7 @@ put_condition_code (enum rtx_code code, machine_mode mode, bool reverse,
If CODE is 'h', pretend the reg is the 'high' byte register.
If CODE is 'y', print "st(0)" instead of "st", if the reg is stack op.
If CODE is 'd', duplicate the operand for AVX instruction.
+   If CODE is 'V', print naked full integer register name without %.
  */
 
 void
@@ -17935,7 +17936,7 @@ print_reg (rtx x, int code, FILE *file)
   unsigned int regno;
   bool duplicated;
 
-  if (ASSEMBLER_DIALECT == ASM_ATT)
+  if (ASSEMBLER_DIALECT == ASM_ATT && code != 'V')
 putc ('%', file);
 
   if (x == pc_rtx)
@@ -17983,6 +17984,14 @@ print_reg (rtx x, int code, FILE *file)
   return;
 }
 
+  if (code == 'V')
+{
+  if (GENERAL_REGNO_P (regno))
+	msize = GET_MODE_SIZE (word_mode);
+  else
+	error ("'V' modifier on non-integer register");
+}
+
   duplicated = code == 'd' && TARGET_AVX;
 
   switch (msize)
@@ -18102,6 +18111,7 @@ print_reg (rtx x, int code, FILE *file)
& -- print some in-use local-dynamic symbol name.
H -- print a memory address offset by 8; used for sse high-parts
Y -- print condition for XOP pcom* instruction.
+   V -- print naked full integer register name without %.
+ -- print a branch hint as 'cs' or 'ds' prefix
; -- print a semicolon (after prefixes due to bug in older gas).
~ -- print "i" if TARGET_AVX2, "f" otherwise.
@@ -18326,6 +18336,7 @@ ix86_print_operand (FILE *file, rtx x, int code)
 	case 'X':
 	case 'P':
 	case 'p':
+	case 'V':
 	  break;
 
 	case 's':
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 46e0a3623a6..9db9e0e27e9 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8778,6 +8778,9 @@ The table below shows the list of supported modifiers and their effects.
 @tab @code{2}
 @end multitable
 
+@code{V} is a special modifier which prints the name of the full integer
+register without @code{%}.
+
 @anchor{x86floatingpointasmoperands}
 @subsubsection x86 Floating-Point @code{asm} Operands
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-register-4.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-register-4.c
new file mode 100644
index 000..f0cd9b75be8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-register-4.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mindirect-branch=keep -fno-pic" } */
+
+extern void (*func_p) (void);
+
+void
+foo (void)
+{
+  asm("call __x86_indirect_thunk_%V0" : : "a" (func_p));
+}
+
+/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_eax" { target ia32 } } } */
+/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_rax" { target { ! ia32 } } } } */
-- 
2.14.3



Re: [PATCH 5/5] x86: Disallow -mindirect-branch=/-mfunction-return= with -mcmodel=large

2018-01-14 Thread H.J. Lu
On Sun, Jan 14, 2018 at 4:57 AM, H.J. Lu  wrote:
> On Sun, Jan 14, 2018 at 2:47 AM, Jan Hubicka  wrote:
>>> Since the thunk function may not be reachable in large code model,
>>> -mcmodel=large is incompatible with -mindirect-branch=thunk,
>>> -mindirect-branch=thunk-extern, -mfunction-return=thunk and
>>> -mfunction-return=thunk-extern.  Issue an error when they are used with
>>> -mcmodel=large.
>>>
>>> gcc/
>>>
>>>   * config/i386/i386.c (ix86_set_indirect_branch_type): Disallow
>>>   -mcmodel=large with -mindirect-branch=thunk,
>>>   -mindirect-branch=thunk-extern, -mfunction-return=thunk and
>>>   -mfunction-return=thunk-extern.
>>>   * doc/invoke.texi: Document -mcmodel=large is incompatible with
>>>   -mindirect-branch=thunk, -mindirect-branch=thunk-extern,
>>>   -mfunction-return=thunk and -mfunction-return=thunk-extern.
>>>
>>> gcc/testsuite/
>>>
>>>   * gcc.target/i386/indirect-thunk-10.c: New test.
>>>   * gcc.target/i386/indirect-thunk-8.c: Likewise.
>>>   * gcc.target/i386/indirect-thunk-9.c: Likewise.
>>>   * gcc.target/i386/indirect-thunk-attr-10.c: Likewise.
>>>   * gcc.target/i386/indirect-thunk-attr-11.c: Likewise.
>>>   * gcc.target/i386/indirect-thunk-attr-9.c: Likewise.
>>>   * gcc.target/i386/ret-thunk-17.c: Likewise.
>>>   * gcc.target/i386/ret-thunk-18.c: Likewise.
>>>   * gcc.target/i386/ret-thunk-19.c: Likewise.
>>>   * gcc.target/i386/ret-thunk-20.c: Likewise.
>>>   * gcc.target/i386/ret-thunk-21.c: Likewise.
>>
>> OK.
>> Honza
>
> This is the patch I am checking in to only run tests on LP64 since
> -mcmodel=large doesn't work with -mx32.
>

Here is the backport for GCC 7.  OK for gcc-7-branch?

-- 
H.J.
From 13dce7cceef28026c4fc2e505d724526141fe4c1 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Sat, 13 Jan 2018 18:01:54 -0800
Subject: [PATCH 5/5] x86: Disallow -mindirect-branch=/-mfunction-return= with
 -mcmodel=large

Since the thunk function may not be reachable in large code model,
-mcmodel=large is incompatible with -mindirect-branch=thunk,
-mindirect-branch=thunk-extern, -mfunction-return=thunk and
-mfunction-return=thunk-extern.  Issue an error when they are used with
-mcmodel=large.

gcc/

	Backport from mainline
	* config/i386/i386.c (ix86_set_indirect_branch_type): Disallow
	-mcmodel=large with -mindirect-branch=thunk,
	-mindirect-branch=thunk-extern, -mfunction-return=thunk and
	-mfunction-return=thunk-extern.
	* doc/invoke.texi: Document -mcmodel=large is incompatible with
	-mindirect-branch=thunk, -mindirect-branch=thunk-extern,
	-mfunction-return=thunk and -mfunction-return=thunk-extern.

gcc/testsuite/

	Backport from mainline
	* gcc.target/i386/indirect-thunk-10.c: New test.
	* gcc.target/i386/indirect-thunk-8.c: Likewise.
	* gcc.target/i386/indirect-thunk-9.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-10.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-11.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-9.c: Likewise.
	* gcc.target/i386/ret-thunk-17.c: Likewise.
	* gcc.target/i386/ret-thunk-18.c: Likewise.
	* gcc.target/i386/ret-thunk-19.c: Likewise.
	* gcc.target/i386/ret-thunk-20.c: Likewise.
	* gcc.target/i386/ret-thunk-21.c: Likewise.
---
 gcc/config/i386/i386.c | 26 ++
 gcc/doc/invoke.texi| 11 +
 gcc/testsuite/gcc.target/i386/indirect-thunk-10.c  |  7 ++
 gcc/testsuite/gcc.target/i386/indirect-thunk-8.c   |  7 ++
 gcc/testsuite/gcc.target/i386/indirect-thunk-9.c   |  7 ++
 .../gcc.target/i386/indirect-thunk-attr-10.c   |  9 
 .../gcc.target/i386/indirect-thunk-attr-11.c   |  9 
 .../gcc.target/i386/indirect-thunk-attr-9.c|  9 
 gcc/testsuite/gcc.target/i386/ret-thunk-17.c   |  7 ++
 gcc/testsuite/gcc.target/i386/ret-thunk-18.c   |  8 +++
 gcc/testsuite/gcc.target/i386/ret-thunk-19.c   |  8 +++
 gcc/testsuite/gcc.target/i386/ret-thunk-20.c   |  9 
 gcc/testsuite/gcc.target/i386/ret-thunk-21.c   |  9 
 13 files changed, 126 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-9.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-attr-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-attr-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-attr-9.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-17.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-18.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-19.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-20.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-21.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index e32de13688a..318a71840c9 100644
--- a/gcc/conf

Re: [PATCH 0/5] Make std::future::wait_* use std::chrono::steady_clock when required

2018-01-14 Thread Mike Crowe
Hi Torvald,

Thanks for reviewing this change.

On Saturday 13 January 2018 at 16:29:57 +0100, Torvald Riegel wrote:
> On Sun, 2018-01-07 at 20:55 +, Mike Crowe wrote:
> > This is a first attempt to make std::future::wait_until and
> > std::future::wait_for make correct use of
> > std::chrono::steady_clock/CLOCK_MONOTONIC. It also makes
> > std::future::wait_until react to changes to CLOCK_REALTIME during the
> > wait, but only when passed a std::chrono::system_clock time point.
> 
> I have comments on the design.
> 
> First, I don't think we should not change
> __atomic_futex_unsigned_base::_M_futex_wait_until, as there's a risk
> that we'll change behavior of existing applications that work as
> expected.

I assume you mean "I don't think we should change" or "I think we should
not change"... :-)

The only way I can see that behaviour will change for existing programs is
when the system clock changes (i.e. when someone calls settimeofday.) In
the existing code, the maximum wait time is fixed once gettimeofday is
called to calculate the relative timeout. When using FUTEX_CLOCK_REALTIME,
the maximum wait can change based on changes to the system clock after that
point. It appears that glibc made this transition successfully and
currently uses FUTEX_CLOCK_REALTIME. I think that the new behaviour is
better than the old behaviour.

Or perhaps I've missed another possibility. Did you have another risk in
mind?

> Instead, ISTM we should additionally expose the two options we have at
> the level of futexes:
> * Relative timeout using CLOCK_MONOTONIC
> * Absolute timeout using CLOCK_REALTIME (which will fall back to the
> former on old kernels, which is fine I think).
> 
> Then we do the following translations from functions that programs would
> call to the new futex functions:
> 
> 1) wait_for is a loop in which we load the current time from the steady
> clock, then call the relative futex wait, and if that returns for a
> spurious reason (ie, neither timeout nor is the expected value present),
> we reduce the prior relative amount by the difference between the time
> before the futex wait and the current time.

If we're going to loop on a relative timeout it sounds safer to convert it
to an absolute (steady clock) timeout. That way we won't risk increasing
the timeout if the scheduler decides not to run us at an inopportune moment
between waits. _M_load_when_equal_for already does this.

_M_load_and_test_until already has a loop for spurious wakeup. I think that
it makes sense to only loop at one level. That loop relies on the timeout
being absolute, which is why my _M_load_and_test_until_steady also uses an
absolute timeout.

> 2) wait_until using the steady clock is a loop similar to wait_for, just
> that we additionally compute the initial relative timeout.

Clearly an absolute wait can be implemented in terms of a relative one and
vice-versa, but at least in my attempts to write them I find the code
easier to understand (and therefore get right) if the fundamental wait is
the absolute one and the relative one is implemented on top of it.

> 3) wait_until using the system clock is a loop that uses
> absolute-timeout futex wait.
> 
> 4) For wait_until using an unknown clock, I'd say that synching to the
> system clock is the right approach.  Using wait_until indicates that the
> programmer wanted to have a point in time, not a duration.

For my embedded and desktop point of view, the system clock should not be
trusted, can suddenly change in any direction and doesn't necessarily help
identify a point in real time. If we assume that the non-standard clock is
advancing steadily too, then steady_clock is a better match than
system_clock.

If you have a machine that has its system clock locked with PTP to an
atomic clock then you might think the opposite. However, even in that
situation you're reliant on steady_clock being reliable enough for short
periods of time anyway, because that shares the same local clock as
system_time.

> Does this work for you?

Not yet, but maybe there's parts that I don't fully understand the
reasoning behind.

> If so, could you provide a revised patch that uses this approach and
> includes this approach in the documentation?
> (Sorry for the lack of comments in the current code).

I'm definitely willing to improve the current code rather than just add to
it.

Mike.


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread Uros Bizjak
On Fri, Jan 12, 2018 at 9:01 AM, Uros Bizjak  wrote:
> On Thu, Jan 11, 2018 at 2:28 PM, H.J. Lu  wrote:
>
>> Hi Uros,
>>
>> Can you take a look at my x86 backend changes so that they are ready
>> to check in once we have consensus.
>
> Please finish the talks about the correct approach first. Once the
> consensus is reached, please post the final version of the patches for
> review.
>
> BTW: I have no detailed insight in these issues, so I'll look mostly
> at the implementation details, probably early next week.

One general remark is on the usage of -1 as an invalid register
number. We have INVALID_REGNUM definition for this, and many tests,
like:

  if (regno >= 0)

could become much more informative:

  if (regno != INVALID_REGNUM)

Uros.


[committed] Small i?86/x86_64-linux config.gcc extra_headers fixes

2018-01-14 Thread Jakub Jelinek
Hi!

I've noticed that i?86-*-* has 2 gfniintrin.h entries in extra_headers
and x86_64-*-* even 3, one should be enough.  Fixed thusly, committed
to trunk as obvious.

2018-01-14  Jakub Jelinek  

* config.gcc (i[34567]86-*-*): Remove one duplicate gfniintrin.h
entry from extra_headers.
(x86_64-*-*): Remove two duplicate gfniintrin.h entries from
extra_headers, make the list bitwise identical to the i?86-*-* one.

--- gcc/config.gcc  (revision 25)
+++ gcc/config.gcc  (revision 256667)
@@ -381,9 +381,8 @@ i[34567]86-*-*)
   clzerointrin.h pkuintrin.h sgxintrin.h cetintrin.h
   gfniintrin.h cet.h avx512vbmi2intrin.h
   avx512vbmi2vlintrin.h avx512vnniintrin.h
-  avx512vnnivlintrin.h gfniintrin.h vaesintrin.h
-  vpclmulqdqintrin.h avx512vpopcntdqvlintrin.h
-  avx512bitalgintrin.h"
+  avx512vnnivlintrin.h vaesintrin.h vpclmulqdqintrin.h
+  avx512vpopcntdqvlintrin.h avx512bitalgintrin.h"
;;
 x86_64-*-*)
cpu_type=i386
@@ -396,7 +395,7 @@ x86_64-*-*)
   nmmintrin.h bmmintrin.h fma4intrin.h wmmintrin.h
   immintrin.h x86intrin.h avxintrin.h xopintrin.h
   ia32intrin.h cross-stdarg.h lwpintrin.h popcntintrin.h
-  lzcntintrin.h bmiintrin.h tbmintrin.h bmi2intrin.h
+  lzcntintrin.h bmiintrin.h bmi2intrin.h tbmintrin.h
   avx2intrin.h avx512fintrin.h fmaintrin.h f16cintrin.h
   rtmintrin.h xtestintrin.h rdseedintrin.h prfchwintrin.h
   adxintrin.h fxsrintrin.h xsaveintrin.h xsaveoptintrin.h
@@ -410,8 +409,7 @@ x86_64-*-*)
   clzerointrin.h pkuintrin.h sgxintrin.h cetintrin.h
   gfniintrin.h cet.h avx512vbmi2intrin.h
   avx512vbmi2vlintrin.h avx512vnniintrin.h
-  avx512vnnivlintrin.h gfniintrin.h vaesintrin.h
-  vpclmulqdqintrin.h gfniintrin.h
+  avx512vnnivlintrin.h vaesintrin.h vpclmulqdqintrin.h
   avx512vpopcntdqvlintrin.h avx512bitalgintrin.h"
;;
 ia64-*-*)

Jakub


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread H.J. Lu
On Sun, Jan 14, 2018 at 8:19 AM, Uros Bizjak  wrote:
> On Fri, Jan 12, 2018 at 9:01 AM, Uros Bizjak  wrote:
>> On Thu, Jan 11, 2018 at 2:28 PM, H.J. Lu  wrote:
>>
>>> Hi Uros,
>>>
>>> Can you take a look at my x86 backend changes so that they are ready
>>> to check in once we have consensus.
>>
>> Please finish the talks about the correct approach first. Once the
>> consensus is reached, please post the final version of the patches for
>> review.
>>
>> BTW: I have no detailed insight in these issues, so I'll look mostly
>> at the implementation details, probably early next week.
>
> One general remark is on the usage of -1 as an invalid register

This has been rewritten.  The checked in patch no longer does that.

> number. We have INVALID_REGNUM definition for this, and many tests,
> like:
>
>   if (regno >= 0)
>
> could become much more informative:
>
>   if (regno != INVALID_REGNUM)



-- 
H.J.


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread Uros Bizjak
On Sun, Jan 14, 2018 at 5:35 PM, H.J. Lu  wrote:
> On Sun, Jan 14, 2018 at 8:19 AM, Uros Bizjak  wrote:
>> On Fri, Jan 12, 2018 at 9:01 AM, Uros Bizjak  wrote:
>>> On Thu, Jan 11, 2018 at 2:28 PM, H.J. Lu  wrote:
>>>
 Hi Uros,

 Can you take a look at my x86 backend changes so that they are ready
 to check in once we have consensus.
>>>
>>> Please finish the talks about the correct approach first. Once the
>>> consensus is reached, please post the final version of the patches for
>>> review.
>>>
>>> BTW: I have no detailed insight in these issues, so I'll look mostly
>>> at the implementation details, probably early next week.
>>
>> One general remark is on the usage of -1 as an invalid register
>
> This has been rewritten.  The checked in patch no longer does that.

Another issue:

+static void
+indirect_thunk_name (char name[32], int regno, bool need_bnd_p)
+{
+  if (USE_HIDDEN_LINKONCE)
+{
+  const char *bnd = need_bnd_p ? "_bnd" : "";
+  if (regno >= 0)
+{
+  const char *reg_prefix;
+  if (LEGACY_INT_REGNO_P (regno))
+reg_prefix = TARGET_64BIT ? "r" : "e";
+  else
+reg_prefix = "";
+  sprintf (name, "__x86_indirect_thunk%s_%s%s",
+   bnd, reg_prefix, reg_names[regno]);
+}
+  else
+sprintf (name, "__x86_indirect_thunk%s", bnd);
+}

What is the benefit of reg_prefix? Can't we just live with e.g.:

__x86_indirect_thunk_ax

which is the true register name and is valid for 32bit and 64bit targets.

Uros.


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread H.J. Lu
On Sun, Jan 14, 2018 at 8:39 AM, Uros Bizjak  wrote:
> On Sun, Jan 14, 2018 at 5:35 PM, H.J. Lu  wrote:
>> On Sun, Jan 14, 2018 at 8:19 AM, Uros Bizjak  wrote:
>>> On Fri, Jan 12, 2018 at 9:01 AM, Uros Bizjak  wrote:
 On Thu, Jan 11, 2018 at 2:28 PM, H.J. Lu  wrote:

> Hi Uros,
>
> Can you take a look at my x86 backend changes so that they are ready
> to check in once we have consensus.

 Please finish the talks about the correct approach first. Once the
 consensus is reached, please post the final version of the patches for
 review.

 BTW: I have no detailed insight in these issues, so I'll look mostly
 at the implementation details, probably early next week.
>>>
>>> One general remark is on the usage of -1 as an invalid register
>>
>> This has been rewritten.  The checked in patch no longer does that.
>
> Another issue:
>
> +static void
> +indirect_thunk_name (char name[32], int regno, bool need_bnd_p)
> +{
> +  if (USE_HIDDEN_LINKONCE)
> +{
> +  const char *bnd = need_bnd_p ? "_bnd" : "";
> +  if (regno >= 0)
> +{
> +  const char *reg_prefix;
> +  if (LEGACY_INT_REGNO_P (regno))
> +reg_prefix = TARGET_64BIT ? "r" : "e";
> +  else
> +reg_prefix = "";
> +  sprintf (name, "__x86_indirect_thunk%s_%s%s",
> +   bnd, reg_prefix, reg_names[regno]);
> +}
> +  else
> +sprintf (name, "__x86_indirect_thunk%s", bnd);
> +}
>
> What is the benefit of reg_prefix? Can't we just live with e.g.:
>
> __x86_indirect_thunk_ax
>
> which is the true register name and is valid for 32bit and 64bit targets.

They are used in asm statements in kernel:

extern void (*func_p) (void);

void
foo (void)
{
  asm ("call __x86_indirect_thunk_%V0" : : "a" (func_p));
}

it generates:

foo:
movq func_p(%rip), %rax
call __x86_indirect_thunk_rax
ret


-- 
H.J.


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread Uros Bizjak
On Sun, Jan 14, 2018 at 5:35 PM, H.J. Lu  wrote:
> On Sun, Jan 14, 2018 at 8:19 AM, Uros Bizjak  wrote:
>> On Fri, Jan 12, 2018 at 9:01 AM, Uros Bizjak  wrote:
>>> On Thu, Jan 11, 2018 at 2:28 PM, H.J. Lu  wrote:
>>>
 Hi Uros,

 Can you take a look at my x86 backend changes so that they are ready
 to check in once we have consensus.
>>>
>>> Please finish the talks about the correct approach first. Once the
>>> consensus is reached, please post the final version of the patches for
>>> review.
>>>
>>> BTW: I have no detailed insight in these issues, so I'll look mostly
>>> at the implementation details, probably early next week.
>>
>> One general remark is on the usage of -1 as an invalid register
>
> This has been rewritten.  The checked in patch no longer does that.

I'm looking directly into current indirect_thunk_name,
output_indirect_thunk and output_indirect_thunk_function functions in
i386.c which have plenty of the mentioned checks.

Uros.


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread Jakub Jelinek
On Sun, Jan 14, 2018 at 08:41:54AM -0800, H.J. Lu wrote:
> They are used in asm statements in kernel:
> 
> extern void (*func_p) (void);
> 
> void
> foo (void)
> {
>   asm ("call __x86_indirect_thunk_%V0" : : "a" (func_p));

Well, using it just with a single register classes wouldn't make much sense,
then you can just use "call __x86_indirect_thunk_rax"
or "call __x86_indirect_thunk_eax" depending on __x86_64__, you wouldn't
need to extend anything.
But supposedly if you use it with "r" or "q" or similar class this will be
different.

Jakub


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread Uros Bizjak
On Sun, Jan 14, 2018 at 5:41 PM, H.J. Lu  wrote:
> On Sun, Jan 14, 2018 at 8:39 AM, Uros Bizjak  wrote:
>> On Sun, Jan 14, 2018 at 5:35 PM, H.J. Lu  wrote:
>>> On Sun, Jan 14, 2018 at 8:19 AM, Uros Bizjak  wrote:
 On Fri, Jan 12, 2018 at 9:01 AM, Uros Bizjak  wrote:
> On Thu, Jan 11, 2018 at 2:28 PM, H.J. Lu  wrote:
>
>> Hi Uros,
>>
>> Can you take a look at my x86 backend changes so that they are ready
>> to check in once we have consensus.
>
> Please finish the talks about the correct approach first. Once the
> consensus is reached, please post the final version of the patches for
> review.
>
> BTW: I have no detailed insight in these issues, so I'll look mostly
> at the implementation details, probably early next week.

 One general remark is on the usage of -1 as an invalid register
>>>
>>> This has been rewritten.  The checked in patch no longer does that.
>>
>> Another issue:
>>
>> +static void
>> +indirect_thunk_name (char name[32], int regno, bool need_bnd_p)
>> +{
>> +  if (USE_HIDDEN_LINKONCE)
>> +{
>> +  const char *bnd = need_bnd_p ? "_bnd" : "";
>> +  if (regno >= 0)
>> +{
>> +  const char *reg_prefix;
>> +  if (LEGACY_INT_REGNO_P (regno))
>> +reg_prefix = TARGET_64BIT ? "r" : "e";
>> +  else
>> +reg_prefix = "";
>> +  sprintf (name, "__x86_indirect_thunk%s_%s%s",
>> +   bnd, reg_prefix, reg_names[regno]);
>> +}
>> +  else
>> +sprintf (name, "__x86_indirect_thunk%s", bnd);
>> +}
>>
>> What is the benefit of reg_prefix? Can't we just live with e.g.:
>>
>> __x86_indirect_thunk_ax
>>
>> which is the true register name and is valid for 32bit and 64bit targets.
>
> They are used in asm statements in kernel:
>
> extern void (*func_p) (void);
>
> void
> foo (void)
> {
>   asm ("call __x86_indirect_thunk_%V0" : : "a" (func_p));
> }
>
> it generates:
>
> foo:
> movq func_p(%rip), %rax
> call __x86_indirect_thunk_rax
> ret

Please fix %V to output reg_name instead.

Uros.


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread H.J. Lu
On Sun, Jan 14, 2018 at 8:48 AM, Uros Bizjak  wrote:
> On Sun, Jan 14, 2018 at 5:41 PM, H.J. Lu  wrote:
>> On Sun, Jan 14, 2018 at 8:39 AM, Uros Bizjak  wrote:
>>> On Sun, Jan 14, 2018 at 5:35 PM, H.J. Lu  wrote:
 On Sun, Jan 14, 2018 at 8:19 AM, Uros Bizjak  wrote:
> On Fri, Jan 12, 2018 at 9:01 AM, Uros Bizjak  wrote:
>> On Thu, Jan 11, 2018 at 2:28 PM, H.J. Lu  wrote:
>>
>>> Hi Uros,
>>>
>>> Can you take a look at my x86 backend changes so that they are ready
>>> to check in once we have consensus.
>>
>> Please finish the talks about the correct approach first. Once the
>> consensus is reached, please post the final version of the patches for
>> review.
>>
>> BTW: I have no detailed insight in these issues, so I'll look mostly
>> at the implementation details, probably early next week.
>
> One general remark is on the usage of -1 as an invalid register

 This has been rewritten.  The checked in patch no longer does that.
>>>
>>> Another issue:
>>>
>>> +static void
>>> +indirect_thunk_name (char name[32], int regno, bool need_bnd_p)
>>> +{
>>> +  if (USE_HIDDEN_LINKONCE)
>>> +{
>>> +  const char *bnd = need_bnd_p ? "_bnd" : "";
>>> +  if (regno >= 0)
>>> +{
>>> +  const char *reg_prefix;
>>> +  if (LEGACY_INT_REGNO_P (regno))
>>> +reg_prefix = TARGET_64BIT ? "r" : "e";
>>> +  else
>>> +reg_prefix = "";
>>> +  sprintf (name, "__x86_indirect_thunk%s_%s%s",
>>> +   bnd, reg_prefix, reg_names[regno]);
>>> +}
>>> +  else
>>> +sprintf (name, "__x86_indirect_thunk%s", bnd);
>>> +}
>>>
>>> What is the benefit of reg_prefix? Can't we just live with e.g.:
>>>
>>> __x86_indirect_thunk_ax
>>>
>>> which is the true register name and is valid for 32bit and 64bit targets.
>>
>> They are used in asm statements in kernel:
>>
>> extern void (*func_p) (void);
>>
>> void
>> foo (void)
>> {
>>   asm ("call __x86_indirect_thunk_%V0" : : "a" (func_p));
>> }
>>
>> it generates:
>>
>> foo:
>> movq func_p(%rip), %rax
>> call __x86_indirect_thunk_rax
>> ret
>
> Please fix %V to output reg_name instead.
>

David, please comment.

-- 
H.J.


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread H.J. Lu
On Sun, Jan 14, 2018 at 8:45 AM, Jakub Jelinek  wrote:
> On Sun, Jan 14, 2018 at 08:41:54AM -0800, H.J. Lu wrote:
>> They are used in asm statements in kernel:
>>
>> extern void (*func_p) (void);
>>
>> void
>> foo (void)
>> {
>>   asm ("call __x86_indirect_thunk_%V0" : : "a" (func_p));
>
> Well, using it just with a single register classes wouldn't make much sense,
> then you can just use "call __x86_indirect_thunk_rax"
> or "call __x86_indirect_thunk_eax" depending on __x86_64__, you wouldn't
> need to extend anything.
> But supposedly if you use it with "r" or "q" or similar class this will be
> different.
>

I believe "r" is allowed.


-- 
H.J.


Re: [RFC][PATCH] PR preprocessor/83173: Additional check before decrementing highest_location

2018-01-14 Thread Mike Gulick


On 01/12/2018 06:16 PM, David Malcolm wrote:
> On Fri, 2017-12-01 at 17:57 -0500, Mike Gulick wrote:
>> I've come up with some patches that fix PR preprocessor/83173, which
>> I reported
>> a couple of weeks ago.
>>
>> The first patch is a test case.  The second and third patches are two
>> versions
>> of the fix.  The first version is simpler, but it may still leave in
>> place some
>> subtle incorrect behavior that happens when the current source
>> location is less
>> than LINE_MAP_MAX_COLUMN_NUMBER.  The second version tries to handle
>> that case
>> as well, however I'm less comfortable with it as I don't know whether
>> I'm
>> computing the source_location of the *end* of the current line
>> correctly in all
>> cases.  Both of these pass the gcc/g++ test suites with no
>> regressions.
>>
>> Thanks in advance for the review/feedback!
>>
>> -Mike
> 
> Hi Mike; sorry about the delay in reviewing this.
> 
> Do you have the gcc contributor paperwork in place?

Hi Dave.  I don't have any gcc contributor paperwork in place, however I just
finished that process for a gdb patch so I think it could be done pretty
quickly.

>> From 6ff0068284c346c8db08c4b6b4d9a66d8464aeac Mon Sep 17 00:00:00 2001
>> From: Mike Gulick 
>> Date: Thu, 30 Nov 2017 18:35:48 -0500
>> Subject: [PATCH 1/2] PR preprocessor/83173: New test
>>
>> 2017-12-01  Mike Gulick  
>>
>>  PR preprocessor/83173
>>  * gcc.dg/plugin/pr83173.c: New test.
>>  * gcc.dg/plugin/pr83173.h: Header for pr83173.c
>>  * gcc.dg/plugin/pr83173-1.h: Header for pr83173.c
>>  * gcc.dg/plugin/pr83173-2.h: Header for pr83173.c
>>  * gcc.dg/plugin/location_overflow_pp_plugin.c: New plugin to
>>  override line_table->highest_location for preprocessor.
>> ---
>>  .../gcc.dg/plugin/location_overflow_pp_plugin.c| 44 
>> ++
>>  gcc/testsuite/gcc.dg/plugin/plugin.exp |  1 +
>>  gcc/testsuite/gcc.dg/plugin/pr83173-1.h|  2 +
>>  gcc/testsuite/gcc.dg/plugin/pr83173-2.h|  2 +
>>  gcc/testsuite/gcc.dg/plugin/pr83173.c  | 21 +++
>>  gcc/testsuite/gcc.dg/plugin/pr83173.h  |  2 +
>>  6 files changed, 72 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.dg/plugin/location_overflow_pp_plugin.c
>>  create mode 100644 gcc/testsuite/gcc.dg/plugin/pr83173-1.h
>>  create mode 100644 gcc/testsuite/gcc.dg/plugin/pr83173-2.h
>>  create mode 100644 gcc/testsuite/gcc.dg/plugin/pr83173.c
>>  create mode 100644 gcc/testsuite/gcc.dg/plugin/pr83173.h
>>
>> diff --git a/gcc/testsuite/gcc.dg/plugin/location_overflow_pp_plugin.c 
>> b/gcc/testsuite/gcc.dg/plugin/location_overflow_pp_plugin.c
>> new file mode 100644
>> index 000..ba5a795b937
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/plugin/location_overflow_pp_plugin.c
>> @@ -0,0 +1,44 @@
>> +/* Plugin for testing how gracefully we degrade in the face of very
>> +   large source files.  */
>> +
>> +#include "config.h"
>> +#include "gcc-plugin.h"
>> +#include "system.h"
>> +#include "coretypes.h"
>> +#include "diagnostic.h"
>> +
>> +int plugin_is_GPL_compatible;
>> +
>> +static location_t base_location;
>> +
>> +/* Callback handler for the PLUGIN_PRAGMAS event.  This is used to set the
>> +   initial line table offset for the preprocessor, to make it appear as if 
>> we
>> +   had parsed a very large file.  PRAGMA_START_UNIT is not suitable here as 
>> is
> 
> PLUGIN_START_UNIT, presumably?
>

Sorry, yes that is right.

>> +   not invoked during the preprocessor stage.  */
> 
> This new test plugin seems almost identical to the existing
> location_overflow_plugin.c.

Yes, I was originally hoping to use location_overflow_plugin.c until I realized
that PLUGIN_START_UNIT event was not triggered during the preprocessor.
Additionally, location_overflow_plugin.c has some restrictions that allow only a
couple of specific initial offsets to be accepted, although I don't see any
reason why that couldn't be changed.

> 
> I tested changing the existing plugin to use PLUGIN_PRAGMAS rather than
> PLUGIN_START_UNIT, and it works fine for that event, so if that's the
> only difference, then maybe we don't need this new plugin?
>

I imagine it would work, although it does seem like PLUGIN_PRAGMAS is being used
for something other than its intended purpose here, which I was trying to keep
to a minimum.  Its been a while since I wrote this patch, and I can't recall
whether I noticed any other side-effects of using PLUGIN_PRAGMAS that could
impact the existing location overflow tests.  I'll take a look at it again and
update the patch or report back any concerns.

> [...snip...]
> 
>> diff --git a/gcc/testsuite/gcc.dg/plugin/pr83173.c 
>> b/gcc/testsuite/gcc.dg/plugin/pr83173.c
>> new file mode 100644
>> index 000..ff1858a2b33
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/plugin/pr83173.c
>> @@ -0,0 +1,21 @@
>> +/*
>> +  { dg-options "-fplugin-arg-location_overflow_pp_plugin-value=0x6001" }
> 
> This hardcodes a location

ia64: Update libstdc++ baseline_symbols.txt

2018-01-14 Thread Andreas Schwab
Installed as obvious.

Andreas.

PR libstdc++/81092
* config/abi/post/ia64-linux-gnu/baseline_symbols.txt: Update.

diff --git a/libstdc++-v3/config/abi/post/ia64-linux-gnu/baseline_symbols.txt 
b/libstdc++-v3/config/abi/post/ia64-linux-gnu/baseline_symbols.txt
index 0b166d5e2a..0411f410f6 100644
--- a/libstdc++-v3/config/abi/post/ia64-linux-gnu/baseline_symbols.txt
+++ b/libstdc++-v3/config/abi/post/ia64-linux-gnu/baseline_symbols.txt
@@ -444,7 +444,7 @@ 
FUNC:_ZNKSt13basic_fstreamIwSt11char_traitsIwEE7is_openEv@GLIBCXX_3.4
 FUNC:_ZNKSt13basic_istreamIwSt11char_traitsIwEE6gcountEv@@GLIBCXX_3.4
 FUNC:_ZNKSt13basic_istreamIwSt11char_traitsIwEE6sentrycvbEv@@GLIBCXX_3.4
 FUNC:_ZNKSt13basic_ostreamIwSt11char_traitsIwEE6sentrycvbEv@@GLIBCXX_3.4
-FUNC:_ZNKSt13random_device13_M_getentropyEv@@GLIBCXX_3.4.24
+FUNC:_ZNKSt13random_device13_M_getentropyEv@@GLIBCXX_3.4.25
 FUNC:_ZNKSt13runtime_error4whatEv@@GLIBCXX_3.4
 FUNC:_ZNKSt14basic_ifstreamIcSt11char_traitsIcEE5rdbufEv@@GLIBCXX_3.4
 FUNC:_ZNKSt14basic_ifstreamIcSt11char_traitsIcEE7is_openEv@@GLIBCXX_3.4.5
@@ -4006,6 +4006,7 @@ OBJECT:0:GLIBCXX_3.4.21
 OBJECT:0:GLIBCXX_3.4.22
 OBJECT:0:GLIBCXX_3.4.23
 OBJECT:0:GLIBCXX_3.4.24
+OBJECT:0:GLIBCXX_3.4.25
 OBJECT:0:GLIBCXX_3.4.3
 OBJECT:0:GLIBCXX_3.4.4
 OBJECT:0:GLIBCXX_3.4.5
-- 
2.15.1

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


[PATCH, rs6000] Executable tests for -msafe-indirect-jumps

2018-01-14 Thread Bill Schmidt
Hi,

It was pointed out off-list that I should add some executable tests for
the new -msafe-indirect-jumps implementation.  This patch adds three
such tests to demonstrate correct behavior.

Tested on powerpc64-linux-gnu and powerpc64le-linux-gnu.  Are these tests
okay for trunk after the other patch is approved?

Thanks,
Bill


2018-01-14  Bill Schmidt  

* gcc.target/powerpc/safe-indirect-jump-4.c: New file.
* gcc.target/powerpc/safe-indirect-jump-5.c: New file.
* gcc.target/powerpc/safe-indirect-jump-6.c: New file.


Index: gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-4.c
===
--- gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-4.c (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-4.c (working copy)
@@ -0,0 +1,25 @@
+/* { dg-do run { target { powerpc64le-*-* } } } */
+/* { dg-additional-options "-msafe-indirect-jumps" } */
+
+/* Test for deliberate misprediction of indirect calls for ELFv2.  */
+
+int (*f)();
+
+int __attribute__((noinline)) bar ()
+{
+  return (*f) ();
+}
+
+int g ()
+{
+  return 26;
+}
+
+int main ()
+{
+  f = &g;
+  if (bar () != 26)
+__builtin_abort ();
+
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-5.c
===
--- gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-5.c (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-5.c (working copy)
@@ -0,0 +1,55 @@
+/* { dg-do run { target { powerpc*-*-* } } } */
+/* { dg-additional-options "-msafe-indirect-jumps -Wno-pedantic" } */
+
+/* Test for deliberate misprediction of computed goto.  */
+
+int __attribute__((noinline)) bar (int i)
+{
+  return 1960 + i;
+}
+
+int __attribute__((noinline)) baz (int i)
+{
+  return i * i;
+}
+
+int __attribute__((noinline)) spaz (int i)
+{
+  return i + 1;
+}
+
+int foo (int x)
+{
+  static void *labptr[] = { &&lab0, &&lab1, &&lab2 };
+
+  if (x < 0 || x > 2)
+return -1;
+
+  goto *labptr[x];
+
+ lab0:
+  return bar (x);
+
+ lab1:
+  return baz (x) + 1;
+
+ lab2:
+  return spaz (x) / 2;
+}
+
+int main ()
+{
+  if (foo (0) != 1960)
+__builtin_abort ();
+
+  if (foo (1) != 2)
+__builtin_abort ();
+
+  if (foo (2) != 1)
+__builtin_abort ();
+
+  if (foo (3) != -1)
+__builtin_abort ();
+
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-6.c
===
--- gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-6.c (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-6.c (working copy)
@@ -0,0 +1,80 @@
+/* { dg-do run { target { powerpc*-*-* } } } */
+/* { dg-additional-options "-msafe-indirect-jumps" } */
+
+/* Test for deliberate misprediction of jump tables.  */
+
+void __attribute__((noinline)) bar ()
+{
+}
+
+int foo (int x)
+{
+  int a;
+  
+  switch (x)
+{
+default:
+  a = -1;
+  break;
+case 0:
+  a = x * x + 3;
+  break;
+case 1:
+  a = x + 1;
+  break;
+case 2:
+  a = x + x;
+  break;
+case 3:
+  a = x << 3;
+  break;
+case 4:
+  a = x >> 1;
+  break;
+case 5:
+  a = x;
+  break;
+case 6:
+  a = 0;
+  break;
+case 7:
+  a = x * x + x;
+  break;
+}
+
+  bar();
+
+  return a;
+}
+
+int main ()
+{
+  if (foo (0) != 3)
+__builtin_abort ();
+  
+  if (foo (1) != 2)
+__builtin_abort ();
+  
+  if (foo (2) != 4)
+__builtin_abort ();
+  
+  if (foo (3) != 24)
+__builtin_abort ();
+  
+  if (foo (4) != 2)
+__builtin_abort ();
+  
+  if (foo (5) != 5)
+__builtin_abort ();
+  
+  if (foo (6) != 0)
+__builtin_abort ();
+  
+  if (foo (7) != 56)
+__builtin_abort ();
+  
+  if (foo (8) != -1)
+__builtin_abort ();
+  
+  return 0;
+}



[patch, committed] PR83811 fortran 'e' format broken for single digit exponents

2018-01-14 Thread Jerry DeLisle
Hello all,

I committed the following as trivial.

Regression tested on x86_64-pc-linux-gnu.

This is a regression on 7 so I will backport.

Regards,

Jerry

2018-01-18  Jerry DeLisle  

PR libgfortran/83811
* write.c (select_buffer): Adjust buffer size up by 1.

diff --git a/libgfortran/io/write.c b/libgfortran/io/write.c
index 8021a1e9c4b..28ea852867b 100644
--- a/libgfortran/io/write.c
+++ b/libgfortran/io/write.c
@@ -1519,8 +1519,9 @@ select_buffer (st_parameter_dt *dtp, const fnode *f, int
precision,
 {
   char *result;

-  /* The buffer needs at least one more byte to allow room for normalizing.  */
-  *size = size_from_kind (dtp, f, kind) + precision + 1;
+  /* The buffer needs at least one more byte to allow room for
+ normalizing and 1 to hold null terminator.  */
+  *size = size_from_kind (dtp, f, kind) + precision + 1 + 1;

   if (*size > BUF_STACK_SZ)
  result = xmalloc (*size);

New test case:

! { dg-do run }
! PR83811 fortran 'e' format broken for single digit exponents
program test
  character(25) :: s
  write(s, '(1pe5.0e1)') 1.e-4
  if (s.ne."1.E-4") call abort
  write(s, '(e5.1e1)') 1.e12
  if (s.ne."*") call abort
end



Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread Woodhouse, David
This won't make the list; I'll send a more coherent and less HTML-afflicted 
version later.

The bare 'ax' naming made it painful to instantiate the external thunks for 
32-bit and 64-bot code because we had to put the e/r back again inside the .irp 
reg ax bx... code.

We could probably have lived with that but it would be painful to change now 
that Linux and Xen patches with the current ABI are all lined up. I appreciate 
they weren't in GCC yet so we get little sympathy but these are strange times 
and we had to move fast.

I'd really like *not* to change it now. Having the thunk name actually include 
the name of the register it's using does seem nicer anyway...

On 14 Jan 2018 17:58, "H.J. Lu"  wrote:

On Sun, Jan 14, 2018 at 8:45 AM, Jakub Jelinek  wrote:
> On Sun, Jan 14, 2018 at 08:41:54AM -0800, H.J. Lu wrote:
>> They are used in asm statements in kernel:
>>
>> extern void (*func_p) (void);
>>
>> void
>> foo (void)
>> {
>>   asm ("call __x86_indirect_thunk_%V0" : : "a" (func_p));
>
> Well, using it just with a single register classes wouldn't make much sense,
> then you can just use "call __x86_indirect_thunk_rax"
> or "call __x86_indirect_thunk_eax" depending on __x86_64__, you wouldn't
> need to extend anything.
> But supposedly if you use it with "r" or "q" or similar class this will be
> different.
>

I believe "r" is allowed.


--
H.J.




Amazon Web Services UK Limited. Registered in England and Wales with 
registration number 08650665 and which has its registered office at 60 Holborn 
Viaduct, London EC1A 2FD, United Kingdom.


Re: [PATCH 3/5] x86: Add -mindirect-branch-register

2018-01-14 Thread Uros Bizjak
-  (ior (and (not (match_test "TARGET_X32"))
+  (ior (and (not (match_test "TARGET_X32
+  || ix86_indirect_branch_thunk_register"))
 (match_operand 0 "sibcall_memory_operand"))
-   (and (match_test "TARGET_X32 && Pmode == DImode")
+   (and (match_test "TARGET_X32 && Pmode == DImode
+ && !ix86_indirect_branch_thunk_register")
 (match_operand 0 "GOT_memory_operand"

Is this patch just trying to disable the predicate when
ix86_indirect_branch_thunk_register is set? Because this is what this
convoluted logic does.

While there is some sense for "TARGET_X32 && Pmode == DImode", we try
to use RTX codes as much as possible in constraints.md and
predicates.md.

Uros.


Re: [PATCH 3/5] x86: Add -mindirect-branch-register

2018-01-14 Thread H.J. Lu
On Sun, Jan 14, 2018 at 9:51 AM, Uros Bizjak  wrote:
> -  (ior (and (not (match_test "TARGET_X32"))
> +  (ior (and (not (match_test "TARGET_X32
> +  || ix86_indirect_branch_thunk_register"))
>  (match_operand 0 "sibcall_memory_operand"))
> -   (and (match_test "TARGET_X32 && Pmode == DImode")
> +   (and (match_test "TARGET_X32 && Pmode == DImode
> + && !ix86_indirect_branch_thunk_register")
>  (match_operand 0 "GOT_memory_operand"
>
> Is this patch just trying to disable the predicate when
> ix86_indirect_branch_thunk_register is set? Because this is what this
> convoluted logic does.

Yes, we want to disable all indirect branch via memory with
-mindirect-branch-register, just like -mx32.   We could do

#idefine TARGET_INDIRECT_BRANCH_REGISTER \
 (TARGER_X32 ||  ix86_indirect_branch_thunk_register)

> While there is some sense for "TARGET_X32 && Pmode == DImode", we try
> to use RTX codes as much as possible in constraints.md and
> predicates.md.
>
> Uros.



-- 
H.J.


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread Uros Bizjak
On Sun, Jan 14, 2018 at 6:44 PM, Woodhouse, David  wrote:
> This won't make the list; I'll send a more coherent and less HTML-afflicted
> version later.
>
> The bare 'ax' naming made it painful to instantiate the external thunks for
> 32-bit and 64-bot code because we had to put the e/r back again inside the
> .irp reg ax bx... code.
>
> We could probably have lived with that but it would be painful to change now
> that Linux and Xen patches with the current ABI are all lined up. I
> appreciate they weren't in GCC yet so we get little sympathy but these are
> strange times and we had to move fast.
>
> I'd really like *not* to change it now. Having the thunk name actually
> include the name of the register it's using does seem nicer anyway...

That's unfortunate... I suspect that in the future, one will need
#ifdef __x86_64__ around eventual calls to thunks from c code because
of this decision, since thunks for x86_64 target will have different
names than thunks for x86_32 target. I don't know if the (single?)
case of mixing 32 and 64 bit assembly in the highly specialized part
of the kernel really warrants this decision. Future programmers will
be grateful if kernel people can re-consider their choice in
not-yet-release ABI.

Uros.


Re: [PATCH 3/5] x86: Add -mindirect-branch-register

2018-01-14 Thread Uros Bizjak
On Sun, Jan 14, 2018 at 7:08 PM, H.J. Lu  wrote:
> On Sun, Jan 14, 2018 at 9:51 AM, Uros Bizjak  wrote:
>> -  (ior (and (not (match_test "TARGET_X32"))
>> +  (ior (and (not (match_test "TARGET_X32
>> +  || ix86_indirect_branch_thunk_register"))
>>  (match_operand 0 "sibcall_memory_operand"))
>> -   (and (match_test "TARGET_X32 && Pmode == DImode")
>> +   (and (match_test "TARGET_X32 && Pmode == DImode
>> + && !ix86_indirect_branch_thunk_register")
>>  (match_operand 0 "GOT_memory_operand"
>>
>> Is this patch just trying to disable the predicate when
>> ix86_indirect_branch_thunk_register is set? Because this is what this
>> convoluted logic does.
>
> Yes, we want to disable all indirect branch via memory with
> -mindirect-branch-register, just like -mx32.   We could do
>
> #idefine TARGET_INDIRECT_BRANCH_REGISTER \
>  (TARGER_X32 ||  ix86_indirect_branch_thunk_register)

Index: predicates.md
===
--- predicates.md   (revision 25)
+++ predicates.md   (working copy)
@@ -710,11 +710,10 @@
   (ior (match_test "constant_call_address_operand
 (op, mode == VOIDmode ? mode : Pmode)")
(match_operand 0 "call_register_no_elim_operand")
-   (ior (and (not (match_test "TARGET_X32
-  || ix86_indirect_branch_thunk_register"))
+   (and (not (match_test "ix86_indirect_branch_thunk_register"))
+   (ior (and (not (match_test "TARGET_X32")))
 (match_operand 0 "memory_operand"))
-   (and (match_test "TARGET_X32 && Pmode == DImode
- && !ix86_indirect_branch_thunk_register")
+(and (match_test "TARGET_X32 && Pmode == DImode")
 (match_operand 0 "GOT_memory_operand")

or something like that.

Uros.


Re: [PATCH 1/5] x86: Add -mindirect-branch=

2018-01-14 Thread David Woodhouse
On Sun, 2018-01-14 at 12:06 +0100, Jan Hubicka wrote:
> 
> Yes, according to my understanding we want both pause and lefence by default
> (it is expensive anyway) + command line option to control which one to use?
> 
> Probably thus the first patch should default to both.

In the kernel we're going to just do both. I'm not sure I see the merit
in a command line option.

smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread Uros Bizjak
On Sun, Jan 14, 2018 at 7:22 PM, Uros Bizjak  wrote:
> On Sun, Jan 14, 2018 at 6:44 PM, Woodhouse, David  wrote:
>> This won't make the list; I'll send a more coherent and less HTML-afflicted
>> version later.
>>
>> The bare 'ax' naming made it painful to instantiate the external thunks for
>> 32-bit and 64-bot code because we had to put the e/r back again inside the
>> .irp reg ax bx... code.
>>
>> We could probably have lived with that but it would be painful to change now
>> that Linux and Xen patches with the current ABI are all lined up. I
>> appreciate they weren't in GCC yet so we get little sympathy but these are
>> strange times and we had to move fast.
>>
>> I'd really like *not* to change it now. Having the thunk name actually
>> include the name of the register it's using does seem nicer anyway...
>
> That's unfortunate... I suspect that in the future, one will need
> #ifdef __x86_64__ around eventual calls to thunks from c code because
> of this decision, since thunks for x86_64 target will have different
> names than thunks for x86_32 target. I don't know if the (single?)
> case of mixing 32 and 64 bit assembly in the highly specialized part
> of the kernel really warrants this decision. Future programmers will
> be grateful if kernel people can re-consider their choice in
> not-yet-release ABI.

A quick look through latest x86/pti update [1] shows:

+#ifdef CONFIG_X86_32
+#define INDIRECT_THUNK(reg) extern asmlinkage void
__x86_indirect_thunk_e ## reg(void);
+#else
+#define INDIRECT_THUNK(reg) extern asmlinkage void
__x86_indirect_thunk_r ## reg(void);
+INDIRECT_THUNK(8)
+INDIRECT_THUNK(9)
+INDIRECT_THUNK(10)
+INDIRECT_THUNK(11)
+INDIRECT_THUNK(12)
+INDIRECT_THUNK(13)
+INDIRECT_THUNK(14)
+INDIRECT_THUNK(15)
+#endif
+INDIRECT_THUNK(ax)
+INDIRECT_THUNK(bx)
+INDIRECT_THUNK(cx)
+INDIRECT_THUNK(dx)
+INDIRECT_THUNK(si)
+INDIRECT_THUNK(di)
+INDIRECT_THUNK(bp)
+INDIRECT_THUNK(sp)

and:

+/*
+ * Despite being an assembler file we can't just use .irp here
+ * because __KSYM_DEPS__ only uses the C preprocessor and would
+ * only see one instance of "__x86_indirect_thunk_\reg" rather
+ * than one per register with the correct names. So we do it
+ * the simple and nasty way...
+ */
+#define EXPORT_THUNK(reg) EXPORT_SYMBOL(__x86_indirect_thunk_ ## reg)
+#define GENERATE_THUNK(reg) THUNK reg ; EXPORT_THUNK(reg)
+
+GENERATE_THUNK(_ASM_AX)
+GENERATE_THUNK(_ASM_BX)
+GENERATE_THUNK(_ASM_CX)
+GENERATE_THUNK(_ASM_DX)
+GENERATE_THUNK(_ASM_SI)
+GENERATE_THUNK(_ASM_DI)
+GENERATE_THUNK(_ASM_BP)
+GENERATE_THUNK(_ASM_SP)
+#ifdef CONFIG_64BIT
+GENERATE_THUNK(r8)
+GENERATE_THUNK(r9)
+GENERATE_THUNK(r10)
+GENERATE_THUNK(r11)
+GENERATE_THUNK(r12)
+GENERATE_THUNK(r13)
+GENERATE_THUNK(r14)
+GENERATE_THUNK(r15)

I have a feeling that using e.g. __x86_indirect_thunk_ax would be more
convenient in both cases.

[1] https://www.spinics.net/lists/kernel/msg2697606.html

Uros.


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread David Woodhouse
On Sun, 2018-01-14 at 21:21 +0100, Uros Bizjak wrote:
> A quick look through latest x86/pti update [1] shows:
> 
> +#ifdef CONFIG_X86_32
> +#define INDIRECT_THUNK(reg) extern asmlinkage void
> __x86_indirect_thunk_e ## reg(void);
> +#else
> +#define INDIRECT_THUNK(reg) extern asmlinkage void
> __x86_indirect_thunk_r ## reg(void);
> +INDIRECT_THUNK(8)
> +INDIRECT_THUNK(9)
> +INDIRECT_THUNK(10)
> +INDIRECT_THUNK(11)
> +INDIRECT_THUNK(12)
> +INDIRECT_THUNK(13)
> +INDIRECT_THUNK(14)
> +INDIRECT_THUNK(15)
> +#endif
> +INDIRECT_THUNK(ax)
> +INDIRECT_THUNK(bx)
> +INDIRECT_THUNK(cx)
> +INDIRECT_THUNK(dx)
> +INDIRECT_THUNK(si)
> +INDIRECT_THUNK(di)
> +INDIRECT_THUNK(bp)
> +INDIRECT_THUNK(sp)

Yeah, that one is purely for the CONFIG_MODVERSIONS system, which I'm
hoping to fix properly by not having to have fake (and clearly
incorrect) C prototypes for the thunks which aren't actually C
functions. It's intended to go away.


> and:
> 
> +/*
> + * Despite being an assembler file we can't just use .irp here
> + * because __KSYM_DEPS__ only uses the C preprocessor and would
> + * only see one instance of "__x86_indirect_thunk_\reg" rather
> + * than one per register with the correct names. So we do it
> + * the simple and nasty way...
> + */
> +#define EXPORT_THUNK(reg) EXPORT_SYMBOL(__x86_indirect_thunk_ ## reg)
> +#define GENERATE_THUNK(reg) THUNK reg ; EXPORT_THUNK(reg)
> +
> +GENERATE_THUNK(_ASM_AX)
> +GENERATE_THUNK(_ASM_BX)
> +GENERATE_THUNK(_ASM_CX)
> +GENERATE_THUNK(_ASM_DX)
> +GENERATE_THUNK(_ASM_SI)
> +GENERATE_THUNK(_ASM_DI)
> +GENERATE_THUNK(_ASM_BP)
> +GENERATE_THUNK(_ASM_SP)
> +#ifdef CONFIG_64BIT
> +GENERATE_THUNK(r8)
> +GENERATE_THUNK(r9)
> +GENERATE_THUNK(r10)
> +GENERATE_THUNK(r11)
> +GENERATE_THUNK(r12)
> +GENERATE_THUNK(r13)
> +GENERATE_THUNK(r14)
> +GENERATE_THUNK(r15)
> 
> I have a feeling that using e.g. __x86_indirect_thunk_ax would be more
> convenient in both cases.

Likewise, the CONFIG_TRIM_UNUSED_SYMBOLS mechanism in the kernel passes
.S files through the preprocessor and looks for EXPORT_SYMBOL, so it
wasn't working well with my .irp-based implementation like the one in
Xen. So I've swapped it out for this one for now.

Again, I was hoping to clean that up and make it do something saner,
and then this could switch back too.

But sure, right now it isn't that might of a difference for me; my
implementation has changed since I made that reqeust. I have no
fundamental technical objection to the bare 'ax' naming. We can live
with either.

It's just that we've been asking for an agreement on the basics (the
command line we use, and the thunk names) for some days now, and this
is the first time we've had this discussion, and Linus has just taken
the patches. 

That's still fine. I know we get no sympathy, and we *can* change the
Linux kernel between -rc8 and -final if we must, and change the Xen
patches too. I'd just rather not.


smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 0/5] Make std::future::wait_* use std::chrono::steady_clock when required

2018-01-14 Thread Mike Crowe
On Sunday 14 January 2018 at 16:08:09 +, Mike Crowe wrote:
> Hi Torvald,
> 
> Thanks for reviewing this change.
> 
> On Saturday 13 January 2018 at 16:29:57 +0100, Torvald Riegel wrote:
> > On Sun, 2018-01-07 at 20:55 +, Mike Crowe wrote:
> > > This is a first attempt to make std::future::wait_until and
> > > std::future::wait_for make correct use of
> > > std::chrono::steady_clock/CLOCK_MONOTONIC. It also makes
> > > std::future::wait_until react to changes to CLOCK_REALTIME during the
> > > wait, but only when passed a std::chrono::system_clock time point.
> > 
> > I have comments on the design.
> > 
> > First, I don't think we should not change
> > __atomic_futex_unsigned_base::_M_futex_wait_until, as there's a risk
> > that we'll change behavior of existing applications that work as
> > expected.
> 
> I assume you mean "I don't think we should change" or "I think we should
> not change"... :-)
> 
> The only way I can see that behaviour will change for existing programs is
> when the system clock changes (i.e. when someone calls settimeofday.) In
> the existing code, the maximum wait time is fixed once gettimeofday is
> called to calculate the relative timeout. When using FUTEX_CLOCK_REALTIME,
> the maximum wait can change based on changes to the system clock after that
> point. It appears that glibc made this transition successfully and
> currently uses FUTEX_CLOCK_REALTIME. I think that the new behaviour is
> better than the old behaviour.
> 
> Or perhaps I've missed another possibility. Did you have another risk in
> mind?
> 
> > Instead, ISTM we should additionally expose the two options we have at
> > the level of futexes:
> > * Relative timeout using CLOCK_MONOTONIC
> > * Absolute timeout using CLOCK_REALTIME (which will fall back to the
> > former on old kernels, which is fine I think).
> > 
> > Then we do the following translations from functions that programs would
> > call to the new futex functions:
> > 
> > 1) wait_for is a loop in which we load the current time from the steady
> > clock, then call the relative futex wait, and if that returns for a
> > spurious reason (ie, neither timeout nor is the expected value present),
> > we reduce the prior relative amount by the difference between the time
> > before the futex wait and the current time.
> 
> If we're going to loop on a relative timeout it sounds safer to convert it
> to an absolute (steady clock) timeout. That way we won't risk increasing
> the timeout if the scheduler decides not to run us at an inopportune moment
> between waits. _M_load_when_equal_for already does this.
> 
> _M_load_and_test_until already has a loop for spurious wakeup. I think that
> it makes sense to only loop at one level. That loop relies on the timeout
> being absolute, which is why my _M_load_and_test_until_steady also uses an
> absolute timeout.
> 
> > 2) wait_until using the steady clock is a loop similar to wait_for, just
> > that we additionally compute the initial relative timeout.
> 
> Clearly an absolute wait can be implemented in terms of a relative one and
> vice-versa, but at least in my attempts to write them I find the code
> easier to understand (and therefore get right) if the fundamental wait is
> the absolute one and the relative one is implemented on top of it.

I had a quick go at implementing at least the first part of your design, as
I understood it. (I've kept the loops inside atomic_futex_unsigned - and I
think that you wanted to move them out to the client code.) I've not tested
it much.

I think that this implementation of _M_load_and_test_for is rather more
error-prone than my previous _M_load_and_test_until_steady. That's probably
partly because the type-safe duration has already been separated into seconds
and nanoseconds. It would be nice to push this separation as deeply as
possible in the code, but I'm afraid that would break ABI compatibility.

Thanks.

Mike.

--8<--

diff --git a/libstdc++-v3/include/bits/atomic_futex.h 
b/libstdc++-v3/include/bits/atomic_futex.h
index ad9437da4e2..fa4a4382c79 100644
--- a/libstdc++-v3/include/bits/atomic_futex.h
+++ b/libstdc++-v3/include/bits/atomic_futex.h
@@ -57,6 +57,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _M_futex_wait_until(unsigned *__addr, unsigned __val, bool __has_timeout,
chrono::seconds __s, chrono::nanoseconds __ns);

+// Returns false iff a timeout occurred.
+bool
+_M_futex_wait_for(unsigned *__addr, unsigned __val, bool __has_timeout,
+   chrono::seconds __s, chrono::nanoseconds __ns);
+
 // This can be executed after the object has been destroyed.
 static void _M_futex_notify_all(unsigned* __addr);
   };
@@ -110,6 +115,40 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
}
 }

+// If a timeout occurs, returns a current value after the timeout;
+// otherwise, returns the operand's value if equal is true or a different
+// value if equal is false.
+// The assumed value is the caller's assumption about the current value
+ 

Re: [PATCH 3/5] x86: Add -mindirect-branch-register

2018-01-14 Thread H.J. Lu
On Sun, Jan 14, 2018 at 10:52 AM, Uros Bizjak  wrote:
> On Sun, Jan 14, 2018 at 7:08 PM, H.J. Lu  wrote:
>> On Sun, Jan 14, 2018 at 9:51 AM, Uros Bizjak  wrote:
>>> -  (ior (and (not (match_test "TARGET_X32"))
>>> +  (ior (and (not (match_test "TARGET_X32
>>> +  || ix86_indirect_branch_thunk_register"))
>>>  (match_operand 0 "sibcall_memory_operand"))
>>> -   (and (match_test "TARGET_X32 && Pmode == DImode")
>>> +   (and (match_test "TARGET_X32 && Pmode == DImode
>>> + && !ix86_indirect_branch_thunk_register")
>>>  (match_operand 0 "GOT_memory_operand"
>>>
>>> Is this patch just trying to disable the predicate when
>>> ix86_indirect_branch_thunk_register is set? Because this is what this
>>> convoluted logic does.
>>
>> Yes, we want to disable all indirect branch via memory with
>> -mindirect-branch-register, just like -mx32.   We could do
>>
>> #idefine TARGET_INDIRECT_BRANCH_REGISTER \
>>  (TARGER_X32 ||  ix86_indirect_branch_thunk_register)
>
> Index: predicates.md
> ===
> --- predicates.md   (revision 25)
> +++ predicates.md   (working copy)
> @@ -710,11 +710,10 @@
>(ior (match_test "constant_call_address_operand
>  (op, mode == VOIDmode ? mode : Pmode)")
> (match_operand 0 "call_register_no_elim_operand")
> -   (ior (and (not (match_test "TARGET_X32
> -  || ix86_indirect_branch_thunk_register"))
> +   (and (not (match_test "ix86_indirect_branch_thunk_register"))
> +   (ior (and (not (match_test "TARGET_X32")))
>  (match_operand 0 "memory_operand"))
> -   (and (match_test "TARGET_X32 && Pmode == DImode
> - && !ix86_indirect_branch_thunk_register")
> +(and (match_test "TARGET_X32 && Pmode == DImode")
>  (match_operand 0 "GOT_memory_operand")
>
> or something like that.
>

LGTM.

Thanks.

-- 
H.J.


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread Uros Bizjak
On Sun, Jan 14, 2018 at 9:34 PM, David Woodhouse  wrote:

> Likewise, the CONFIG_TRIM_UNUSED_SYMBOLS mechanism in the kernel passes
> .S files through the preprocessor and looks for EXPORT_SYMBOL, so it
> wasn't working well with my .irp-based implementation like the one in
> Xen. So I've swapped it out for this one for now.
>
> Again, I was hoping to clean that up and make it do something saner,
> and then this could switch back too.
>
> But sure, right now it isn't that might of a difference for me; my
> implementation has changed since I made that reqeust. I have no
> fundamental technical objection to the bare 'ax' naming. We can live
> with either.
>
> It's just that we've been asking for an agreement on the basics (the
> command line we use, and the thunk names) for some days now, and this
> is the first time we've had this discussion, and Linus has just taken
> the patches.
>
> That's still fine. I know we get no sympathy, and we *can* change the
> Linux kernel between -rc8 and -final if we must, and change the Xen
> patches too. I'd just rather not.

Well, you did say that these are strange times ;)

>From the user perspective, it would be more convenient to use the
thunk names that are the same for 32bit and 64bit targets. If we
ignore this fact, the difference is only a couple of lines in the
compiler source which we also can live with. But please discuss my
proposal also in the kernel community, and weight the benefits and
drawbacks of each approach before the final decision.

Please pass the final decision to gcc community, and we'll implement it.

Uros.


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread David Woodhouse
On Sun, 2018-01-14 at 21:52 +0100, Uros Bizjak wrote:
> On Sun, Jan 14, 2018 at 9:34 PM, David Woodhouse  wrote:
> > But sure, right now it isn't that might of a difference for me; my
> > implementation has changed since I made that reqeust. I have no
> > fundamental technical objection to the bare 'ax' naming. We can live
> > with either.
> > 
> > It's just that we've been asking for an agreement on the basics (the
> > command line we use, and the thunk names) for some days now, and this
> > is the first time we've had this discussion, and Linus has just taken
> > the patches.
> > 
> > That's still fine. I know we get no sympathy, and we *can* change the
> > Linux kernel between -rc8 and -final if we must, and change the Xen
> > patches too. I'd just rather not.
> Well, you did say that these are strange times ;)
> 
> From the user perspective, it would be more convenient to use the
> thunk names that are the same for 32bit and 64bit targets. If we
> ignore this fact, the difference is only a couple of lines in the
> compiler source which we also can live with. But please discuss my
> proposal also in the kernel community, and weight the benefits and
> drawbacks of each approach before the final decision.
> 
> Please pass the final decision to gcc community, and we'll implement it.

+Linus, Thomas.

Review on the GCC patches has led to a request that the thunk symbols
be changed from e.g. __x86_indirect_thunk_rax to
__x86_indirect_thunk_ax without the 'r'.

If we're going to change the thunk names, it's best to do it *right*
now before the 4.15-rc8 release.

I genuinely don't care at this point what the thunk names are. It's
just that Linus is probably preparing the -rc8 release as we speak, and
I'd want to do a new compiler build and set of tests if we make the
change. For that reason alone, I'm inclined to answer that we should
leave them as they are.

smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread Linus Torvalds
On Sun, Jan 14, 2018 at 1:02 PM, David Woodhouse  wrote:
> Review on the GCC patches has led to a request that the thunk symbols
> be changed from e.g. __x86_indirect_thunk_rax to
> __x86_indirect_thunk_ax without the 'r'.

Ok. I think that just makes it easier for us, since then the names are
independent of 32-vs/64, and we don't need to use the _ASM_XY names.

What about r8-r15? I'm assuming 'r' there is used?

Mind sending me a tested patch? I'll was indeed planning on generating
rc8, but I might as well go grocery shopping now instead, and do rc8
later in the evening.

Linus


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread David Woodhouse
On Sun, 2018-01-14 at 13:07 -0800, Linus Torvalds wrote:
> On Sun, Jan 14, 2018 at 1:02 PM, David Woodhouse  wrote:
> > Review on the GCC patches has led to a request that the thunk symbols
> > be changed from e.g. __x86_indirect_thunk_rax to
> > __x86_indirect_thunk_ax without the 'r'.
> 
> Ok. I think that just makes it easier for us, since then the names are
> independent of 32-vs/64, and we don't need to use the _ASM_XY names.
> 
> What about r8-r15? I'm assuming 'r' there is used?
> 
> Mind sending me a tested patch? I'll was indeed planning on generating
> rc8, but I might as well go grocery shopping now instead, and do rc8
> later in the evening.

I'll kick off a compiler build now...

smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread Thomas Gleixner
On Sun, 14 Jan 2018, David Woodhouse wrote:

> On Sun, 2018-01-14 at 13:07 -0800, Linus Torvalds wrote:
> > On Sun, Jan 14, 2018 at 1:02 PM, David Woodhouse  
> > wrote:
> > > Review on the GCC patches has led to a request that the thunk symbols
> > > be changed from e.g. __x86_indirect_thunk_rax to
> > > __x86_indirect_thunk_ax without the 'r'.
> > 
> > Ok. I think that just makes it easier for us, since then the names are
> > independent of 32-vs/64, and we don't need to use the _ASM_XY names.
> > 
> > What about r8-r15? I'm assuming 'r' there is used?
> > 
> > Mind sending me a tested patch? I'll was indeed planning on generating
> > rc8, but I might as well go grocery shopping now instead, and do rc8
> > later in the evening.
> 
> I'll kick off a compiler build now...

Send the patch to me/LKML. I'm queueing the compile time warning removal
and then can add that one on top so Linus can pull the lot.

Thanks,

tglx


Re: [PATCH 3/5] x86: Add -mindirect-branch-register

2018-01-14 Thread H.J. Lu
On Sun, Jan 14, 2018 at 10:52 AM, Uros Bizjak  wrote:
> On Sun, Jan 14, 2018 at 7:08 PM, H.J. Lu  wrote:
>> On Sun, Jan 14, 2018 at 9:51 AM, Uros Bizjak  wrote:
>>> -  (ior (and (not (match_test "TARGET_X32"))
>>> +  (ior (and (not (match_test "TARGET_X32
>>> +  || ix86_indirect_branch_thunk_register"))
>>>  (match_operand 0 "sibcall_memory_operand"))
>>> -   (and (match_test "TARGET_X32 && Pmode == DImode")
>>> +   (and (match_test "TARGET_X32 && Pmode == DImode
>>> + && !ix86_indirect_branch_thunk_register")
>>>  (match_operand 0 "GOT_memory_operand"
>>>
>>> Is this patch just trying to disable the predicate when
>>> ix86_indirect_branch_thunk_register is set? Because this is what this
>>> convoluted logic does.
>>
>> Yes, we want to disable all indirect branch via memory with
>> -mindirect-branch-register, just like -mx32.   We could do
>>
>> #idefine TARGET_INDIRECT_BRANCH_REGISTER \
>>  (TARGER_X32 ||  ix86_indirect_branch_thunk_register)
>
> Index: predicates.md
> ===
> --- predicates.md   (revision 25)
> +++ predicates.md   (working copy)
> @@ -710,11 +710,10 @@
>(ior (match_test "constant_call_address_operand
>  (op, mode == VOIDmode ? mode : Pmode)")
> (match_operand 0 "call_register_no_elim_operand")
> -   (ior (and (not (match_test "TARGET_X32
> -  || ix86_indirect_branch_thunk_register"))
> +   (and (not (match_test "ix86_indirect_branch_thunk_register"))
> +   (ior (and (not (match_test "TARGET_X32")))
>  (match_operand 0 "memory_operand"))
> -   (and (match_test "TARGET_X32 && Pmode == DImode
> - && !ix86_indirect_branch_thunk_register")
> +(and (match_test "TARGET_X32 && Pmode == DImode")
>  (match_operand 0 "GOT_memory_operand")
>
> or something like that.
>

I am testing this patch.  OK for trunk if there is no regression?


-- 
H.J.
From e332a970f82439c5169e6f87c77f15123e7ec49e Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Sun, 14 Jan 2018 13:19:37 -0800
Subject: [PATCH] x86: Rewrite ix86_indirect_branch_thunk_register logic

Rewrite ix86_indirect_branch_thunk_register logic with

(and (not (match_test "ix86_indirect_branch_thunk_register"))
 (original condition before r256662))

	* config/i386/predicates.md (indirect_branch_operand): Rewrite
	ix86_indirect_branch_thunk_register logic.
	(constant_call_address_operand): Likewise.
	(sibcall_insn_operand): Likewise.
---
 gcc/config/i386/predicates.md | 30 ++
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 5ae443231b8..68556d8b827 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -664,10 +664,10 @@
 
 ;; Test for a valid operand for indirect branch.
 (define_predicate "indirect_branch_operand"
-  (ior (match_operand 0 "register_operand")
-   (and (not (match_test "TARGET_X32
-			  || ix86_indirect_branch_thunk_register"))
-	(match_operand 0 "memory_operand"
+  (and (not (match_test "ix86_indirect_branch_thunk_register"))
+   (ior (match_operand 0 "register_operand")
+	(and (not (match_test "TARGET_X32"))
+		 (match_operand 0 "memory_operand")
 
 ;; Return true if OP is a memory operands that can be used in sibcalls.
 ;; Since sibcall never returns, we can only use call-clobbered register
@@ -710,24 +710,22 @@
   (ior (match_test "constant_call_address_operand
 		 (op, mode == VOIDmode ? mode : Pmode)")
(match_operand 0 "call_register_no_elim_operand")
-   (ior (and (not (match_test "TARGET_X32
-   || ix86_indirect_branch_thunk_register"))
-		 (match_operand 0 "memory_operand"))
-	(and (match_test "TARGET_X32 && Pmode == DImode
-			  && !ix86_indirect_branch_thunk_register")
-		 (match_operand 0 "GOT_memory_operand")
+   (and (not (match_test "ix86_indirect_branch_thunk_register"))
+	(ior (and (not (match_test "TARGET_X32"))
+		  (match_operand 0 "memory_operand"))
+		 (and (match_test "TARGET_X32 && Pmode == DImode")
+		  (match_operand 0 "GOT_memory_operand"))
 
 ;; Similarly, but for tail calls, in which we cannot allow memory references.
 (define_special_predicate "sibcall_insn_operand"
   (ior (match_test "constant_call_address_operand
 		 (op, mode == VOIDmode ? mode : Pmode)")
(match_operand 0 "register_no_elim_operand")
-   (ior (and (not (match_test "TARGET_X32
-   || ix86_indirect_branch_thunk_register"))
-		 (match_operand 0 "sibcall_memory_operand"))
-	(and (match_test "TARGET_X32 && Pmode == DImode
-			  && !ix86_indirect_branch_thunk_register")
-		 (match_operand 0 "GOT_memory_operand")
+   (and (not (match_test "ix86_indirect_branch_thunk_register"))
+	(ior (and (not (match_test "TARGET_X32"))
+		   

Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread H.J. Lu
On Sun, Jan 14, 2018 at 1:07 PM, Linus Torvalds
 wrote:
> On Sun, Jan 14, 2018 at 1:02 PM, David Woodhouse  wrote:
>> Review on the GCC patches has led to a request that the thunk symbols
>> be changed from e.g. __x86_indirect_thunk_rax to
>> __x86_indirect_thunk_ax without the 'r'.
>
> Ok. I think that just makes it easier for us, since then the names are
> independent of 32-vs/64, and we don't need to use the _ASM_XY names.
>
> What about r8-r15? I'm assuming 'r' there is used?

They will remain r8-r15.

> Mind sending me a tested patch? I'll was indeed planning on generating
> rc8, but I might as well go grocery shopping now instead, and do rc8
> later in the evening.
>
> Linus



-- 
H.J.


Re: [patch, committed] PR83811 fortran 'e' format broken for single digit exponents

2018-01-14 Thread Jerry DeLisle
On 01/14/2018 09:42 AM, Jerry DeLisle wrote:
> Hello all,
> 
> I committed the following as trivial.
> 
> Regression tested on x86_64-pc-linux-gnu.
> 
> This is a regression on 7 so I will backport.
> 
> Regards,
> 
> Jerry
> 
> 2018-01-18  Jerry DeLisle  
> 
>   PR libgfortran/83811
>   * write.c (select_buffer): Adjust buffer size up by 1.
> 

Committed to gcc7 after regression testing and I fixed the dates I had in the
ChangeLogs.

Cheers,

Jerry


Re: [PATCH] document -Wclass-memaccess suppression by casting (PR 81327)

2018-01-14 Thread Martin Sebor

On 01/14/2018 03:31 AM, Jakub Jelinek wrote:

On Sat, Jan 13, 2018 at 04:14:38PM -0700, Martin Sebor wrote:

-The @option{-Wclass-memaccess} option is enabled by @option{-Wall}.
+The @option{-Wclass-memaccess} option is enabled by @option{-Wall}.  Casting


Perhaps "Explicitly casting" instead?  The implicit cast doesn't suppress it
and occurs whenever there isn't an explicit cast.


+the pointer to the class object to @code{void *} or to a type that can be
+safely accessed by the raw memory function suppresses the warning.



Sure.  I've committed r256677.

Martin



Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread David Woodhouse
On Sun, 2018-01-14 at 13:07 -0800, Linus Torvalds wrote:
> On Sun, Jan 14, 2018 at 1:02 PM, David Woodhouse  wrote:
> > 
> > Review on the GCC patches has led to a request that the thunk symbols
> > be changed from e.g. __x86_indirect_thunk_rax to
> > __x86_indirect_thunk_ax without the 'r'.
> Ok. I think that just makes it easier for us, since then the names are
> independent of 32-vs/64, and we don't need to use the _ASM_XY names.
> 
> What about r8-r15? I'm assuming 'r' there is used?

Ah yes, *this* is why I hated it... for 'r8' onwards that is indeed the
register names as well as the suffix of the thunk name. But for the
legacy registers I have to prepend 'e' or 'r' myself in the macro. So
it ends up looking like this:


.macro THUNK reg
.section .text.__x86.indirect_thunk.\reg

ENTRY(__x86_indirect_thunk_\reg)
CFI_STARTPROC
$done = 0
.irp xreg r8 r9 r10 r11 r12 r13 r14 r15
.ifeqs "\reg", "\xreg"
JMP_NOSPEC %\reg
$done = 1
.endif
.endr
.if $done != 1
JMP_NOSPEC %__ASM_REG(\reg)
.endif
CFI_ENDPROC
ENDPROC(__x86_indirect_thunk_\reg)
.endm

/*
 * Despite being an assembler file we can't just use .irp here
 * because __KSYM_DEPS__ only uses the C preprocessor and would
 * only see one instance of "__x86_indirect_thunk_\reg" rather
 * than one per register with the correct names. So we do it
 * the simple and nasty way...
 */
#define EXPORT_THUNK(reg) EXPORT_SYMBOL(__x86_indirect_thunk_ ## reg)
#define GENERATE_THUNK(reg) THUNK reg ; EXPORT_THUNK(reg)

GENERATE_THUNK(ax)
GENERATE_THUNK(bx)
GENERATE_THUNK(cx)
GENERATE_THUNK(dx)
GENERATE_THUNK(si)
GENERATE_THUNK(di)
GENERATE_THUNK(bp)
#ifdef CONFIG_64BIT
GENERATE_THUNK(r8)
GENERATE_THUNK(r9)
GENERATE_THUNK(r10)
GENERATE_THUNK(r11)
GENERATE_THUNK(r12)
GENERATE_THUNK(r13)
GENERATE_THUNK(r14)
GENERATE_THUNK(r15)
#endif


And *that* was the point at which I asked HJ to just use the proper
bloody register names :)


smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread H.J. Lu
On Sun, Jan 14, 2018 at 1:58 PM, David Woodhouse  wrote:
> On Sun, 2018-01-14 at 13:07 -0800, Linus Torvalds wrote:
>> On Sun, Jan 14, 2018 at 1:02 PM, David Woodhouse  wrote:
>> >
>> > Review on the GCC patches has led to a request that the thunk symbols
>> > be changed from e.g. __x86_indirect_thunk_rax to
>> > __x86_indirect_thunk_ax without the 'r'.
>> Ok. I think that just makes it easier for us, since then the names are
>> independent of 32-vs/64, and we don't need to use the _ASM_XY names.
>>
>> What about r8-r15? I'm assuming 'r' there is used?
>
> Ah yes, *this* is why I hated it... for 'r8' onwards that is indeed the
> register names as well as the suffix of the thunk name. But for the
> legacy registers I have to prepend 'e' or 'r' myself in the macro. So
> it ends up looking like this:
>
>
> .macro THUNK reg
> .section .text.__x86.indirect_thunk.\reg
>
> ENTRY(__x86_indirect_thunk_\reg)
> CFI_STARTPROC
> $done = 0
> .irp xreg r8 r9 r10 r11 r12 r13 r14 r15
> .ifeqs "\reg", "\xreg"
> JMP_NOSPEC %\reg
> $done = 1
> .endif
> .endr
> .if $done != 1
> JMP_NOSPEC %__ASM_REG(\reg)
> .endif
> CFI_ENDPROC
> ENDPROC(__x86_indirect_thunk_\reg)
> .endm
>
> /*
>  * Despite being an assembler file we can't just use .irp here
>  * because __KSYM_DEPS__ only uses the C preprocessor and would
>  * only see one instance of "__x86_indirect_thunk_\reg" rather
>  * than one per register with the correct names. So we do it
>  * the simple and nasty way...
>  */
> #define EXPORT_THUNK(reg) EXPORT_SYMBOL(__x86_indirect_thunk_ ## reg)
> #define GENERATE_THUNK(reg) THUNK reg ; EXPORT_THUNK(reg)
>
> GENERATE_THUNK(ax)
> GENERATE_THUNK(bx)
> GENERATE_THUNK(cx)
> GENERATE_THUNK(dx)
> GENERATE_THUNK(si)
> GENERATE_THUNK(di)
> GENERATE_THUNK(bp)
> #ifdef CONFIG_64BIT
> GENERATE_THUNK(r8)
> GENERATE_THUNK(r9)
> GENERATE_THUNK(r10)
> GENERATE_THUNK(r11)
> GENERATE_THUNK(r12)
> GENERATE_THUNK(r13)
> GENERATE_THUNK(r14)
> GENERATE_THUNK(r15)
> #endif
>
>
> And *that* was the point at which I asked HJ to just use the proper
> bloody register names :)

Please let me know if I should make the change to ax,..., r8,..r15.

-- 
H.J.


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread David Woodhouse
On Sun, 2018-01-14 at 14:03 -0800, H.J. Lu wrote:
> 
> > And *that* was the point at which I asked HJ to just use the proper
> > bloody register names :)
> 
> Please let me know if I should make the change to ax,..., r8,..r15.

This is what I'm building my compiler with now, to make that change:
http://git.infradead.org/users/dwmw2/gcc-retpoline.git/shortlog/refs/heads/retpoline-regnames

At this point I'm inclined to suggest we don't make the change. I'll
finish and test it anyway. I *can* change my GENERATE_THUNK macro to
take two arguments — the suffix of the thunk name, and the register
name to use. That lets me ditch the clever but ugly loop trick.

I *did* want to get this file back to using .irp in the end, by fixing
up other kernel infrastructure to do things properly. But I can live
without that too if I must.

smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread H.J. Lu
On Sun, Jan 14, 2018 at 2:09 PM, David Woodhouse  wrote:
> On Sun, 2018-01-14 at 14:03 -0800, H.J. Lu wrote:
>>
>> > And *that* was the point at which I asked HJ to just use the proper
>> > bloody register names :)
>>
>> Please let me know if I should make the change to ax,..., r8,..r15.
>
> This is what I'm building my compiler with now, to make that change:
> http://git.infradead.org/users/dwmw2/gcc-retpoline.git/shortlog/refs/heads/retpoline-regnames
>
> At this point I'm inclined to suggest we don't make the change. I'll
> finish and test it anyway. I *can* change my GENERATE_THUNK macro to
> take two arguments — the suffix of the thunk name, and the register
> name to use. That lets me ditch the clever but ugly loop trick.
>
> I *did* want to get this file back to using .irp in the end, by fixing
> up other kernel infrastructure to do things properly. But I can live
> without that too if I must.

Please use this GCC patch instead.

-- 
H.J.
From 223c07edf531eaa05c7fff564ce6b5dc48e6a49b Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Sun, 14 Jan 2018 14:04:55 -0800
Subject: [PATCH] x86: Change register names in __x86_indirect_thunk_reg

Change names of the lower 8 integer registers in __x86_indirect_thunk_reg
to ax, dx, cx, bx, si, di and bp.

gcc/

	* config/i386/i386.c (indirect_thunk_name): Don't check
	LEGACY_INT_REGNO_P.
	(print_reg): Use reg_names[regno] for 'V'.
	* doc/extend.texi: Replace "the full integer" with "the integer"
	for 'V'.

gcc/testsuite/

	* gcc.target/i386/indirect-thunk-1.c: Updated.
	* gcc.target/i386/indirect-thunk-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-7.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-5.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-6.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-7.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-7.c: Likewise.
	* gcc.target/i386/indirect-thunk-register-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-register-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-register-4.c: Likewise.
	* gcc.target/i386/ret-thunk-10.c: Likewise.
	* gcc.target/i386/ret-thunk-11.c: Likewise.
	* gcc.target/i386/ret-thunk-12.c: Likewise.
	* gcc.target/i386/ret-thunk-13.c: Likewise.
	* gcc.target/i386/ret-thunk-14.c: Likewise.
	* gcc.target/i386/ret-thunk-15.c: Likewise.
	* gcc.target/i386/ret-thunk-9.c: Likewise.
---
 gcc/config/i386/i386.c   | 20 
 gcc/doc/extend.texi  |  2 +-
 gcc/testsuite/gcc.target/i386/indirect-thunk-1.c |  2 +-
 gcc/testsuite/gcc.target/i386/indirect-thunk-2.c |  2 +-
 gcc/testsuite/gcc.target/i386/indirect-thunk-3.c |  2 +-
 gcc/testsuite/gcc.target/i386/indirect-thunk-4.c |  2 +-
 gcc/testsuite/gcc.target/i386/indirect-thunk-7.c |  2 +-
 .../gcc.target/i386/indirect-thunk-attr-1.c  |  2 +-
 .../gcc.target/i386/indirect-thunk-attr-2.c  |  2 +-
 .../gcc.target/i386/indirect-thunk-attr-5.c  |  2 +-
 .../gcc.target/i386/indirect-thunk-attr-6.c  |  2 +-
 .../gcc.target/i386/indirect-thunk-attr-7.c  |  2 +-
 .../gcc.target/i386/indirect-thunk-extern-1.c|  2 +-
 .../gcc.target/i386/indirect-thunk-extern-2.c|  2 +-
 .../gcc.target/i386/indirect-thunk-extern-3.c|  2 +-
 .../gcc.target/i386/indirect-thunk-extern-4.c|  2 +-
 .../gcc.target/i386/indirect-thunk-extern-7.c|  2 +-
 .../gcc.target/i386/indirect-thunk-register-1.c  |  2 +-
 .../gcc.target/i386/indirect-thunk-register-3.c  |  2 +-
 .../gcc.target/i386/indirect-thunk-register-4.c  |  3 +--
 gcc/testsuite/gcc.target/i386/ret-thunk-10.c |  4 ++--
 gcc/testsuite/gcc.target/i386/ret-thunk-11.c |  4 ++--
 gcc/testsuite/gcc.target/i386/ret-thunk-12.c |  4 ++--
 gcc/testsuite/gcc.target/i386/ret-thunk-13.c |  2 +-
 gcc/testsuite/gcc.target/i386/ret-thunk-14.c |  2 +-
 gcc/testsuite/gcc.target/i386/ret-thunk-15.c |  2 +-
 gcc/testsuite/gcc.target/i386/ret-thunk-9.c  |  2 +-
 27 files changed, 37 insertions(+), 42 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 5e4f845a1bd..890bd701cd1 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -10787,15 +10787,8 @@ indirect_thunk_name (char name[32], int regno, bool need_bnd_p,
 {
   const char *bnd = need_bnd_p ? "_bnd" : "";
   if (regno >= 0)
-	{
-	  const char *reg_prefix;
-	  if (LEGACY_INT_REGNO_P (regno))
-	reg_prefix = TARGET_64BIT ? "r" : "e";
-	  else
-	reg_prefix = "";

Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread David Woodhouse
On Sun, 2018-01-14 at 14:12 -0800, H.J. Lu wrote:
> Please use this GCC patch instead.

Building now; thanks.

This is the kernel patch I'll test as soon as the compiler is done.
It's slightly less horrid than the "clever" one I sent out earlier, but
does still end up needing those _ASM_AX etc. macros in *addition* to
the bare "ax" that goes in the symbol names.

I'm not convinced we want to do this, but I'll defer to Linus.



From 755f50731a99b0ce0890e478e6a2d6ebd647da15 Mon Sep 17 00:00:00 2001
From: David Woodhouse 
Date: Sun, 14 Jan 2018 22:21:02 +
Subject: [PATCH] x86/retpoline: Switch thunk names to match final GCC patches

At the last minute, they were switched from __x86_indirect_thunk_rax to
__x86_indirect_thunk_ax without the 'r' or 'e' on the register name.

This is not entirely an improvement, IMO.

Signed-off-by: David Woodhouse 
---
 arch/x86/include/asm/asm-prototypes.h | 24 ++--
 arch/x86/lib/retpoline.S  | 41 +--
 2 files changed, 31 insertions(+), 34 deletions(-)

diff --git a/arch/x86/include/asm/asm-prototypes.h 
b/arch/x86/include/asm/asm-prototypes.h
index 0927cdc4f946..df80478fb682 100644
--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -18,19 +18,7 @@ extern void cmpxchg8b_emu(void);
 #endif
 
 #ifdef CONFIG_RETPOLINE
-#ifdef CONFIG_X86_32
-#define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_e ## 
reg(void);
-#else
-#define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_r ## 
reg(void);
-INDIRECT_THUNK(8)
-INDIRECT_THUNK(9)
-INDIRECT_THUNK(10)
-INDIRECT_THUNK(11)
-INDIRECT_THUNK(12)
-INDIRECT_THUNK(13)
-INDIRECT_THUNK(14)
-INDIRECT_THUNK(15)
-#endif
+#define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_ ## 
reg(void);
 INDIRECT_THUNK(ax)
 INDIRECT_THUNK(bx)
 INDIRECT_THUNK(cx)
@@ -39,4 +27,14 @@ INDIRECT_THUNK(si)
 INDIRECT_THUNK(di)
 INDIRECT_THUNK(bp)
 INDIRECT_THUNK(sp)
+#ifdef CONFIG_64BIT
+INDIRECT_THUNK(r8)
+INDIRECT_THUNK(r9)
+INDIRECT_THUNK(r10)
+INDIRECT_THUNK(r11)
+INDIRECT_THUNK(r12)
+INDIRECT_THUNK(r13)
+INDIRECT_THUNK(r14)
+INDIRECT_THUNK(r15)
+#endif /* CONFIG_64BIT */
 #endif /* CONFIG_RETPOLINE */
diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index cb45c6cb465f..7da2c9035836 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -8,14 +8,14 @@
 #include 
 #include 
 
-.macro THUNK reg
-   .section .text.__x86.indirect_thunk.\reg
+.macro THUNK reg suffix
+   .section .text.__x86.indirect_thunk.\suffix
 
-ENTRY(__x86_indirect_thunk_\reg)
+ENTRY(__x86_indirect_thunk_\suffix)
    CFI_STARTPROC
    JMP_NOSPEC %\reg
    CFI_ENDPROC
-ENDPROC(__x86_indirect_thunk_\reg)
+ENDPROC(__x86_indirect_thunk_\suffix)
 .endm
 
 /*
@@ -26,23 +26,22 @@ ENDPROC(__x86_indirect_thunk_\reg)
  * the simple and nasty way...
  */
 #define EXPORT_THUNK(reg) EXPORT_SYMBOL(__x86_indirect_thunk_ ## reg)
-#define GENERATE_THUNK(reg) THUNK reg ; EXPORT_THUNK(reg)
+#define GENERATE_THUNK(reg, suffix) THUNK reg suffix; EXPORT_THUNK(suffix)
 
-GENERATE_THUNK(_ASM_AX)
-GENERATE_THUNK(_ASM_BX)
-GENERATE_THUNK(_ASM_CX)
-GENERATE_THUNK(_ASM_DX)
-GENERATE_THUNK(_ASM_SI)
-GENERATE_THUNK(_ASM_DI)
-GENERATE_THUNK(_ASM_BP)
-GENERATE_THUNK(_ASM_SP)
+GENERATE_THUNK(_ASM_AX, ax)
+GENERATE_THUNK(_ASM_BX, bx)
+GENERATE_THUNK(_ASM_CX, cx)
+GENERATE_THUNK(_ASM_DX, dx)
+GENERATE_THUNK(_ASM_SI, si)
+GENERATE_THUNK(_ASM_DI, di)
+GENERATE_THUNK(_ASM_BP, bp)
 #ifdef CONFIG_64BIT
-GENERATE_THUNK(r8)
-GENERATE_THUNK(r9)
-GENERATE_THUNK(r10)
-GENERATE_THUNK(r11)
-GENERATE_THUNK(r12)
-GENERATE_THUNK(r13)
-GENERATE_THUNK(r14)
-GENERATE_THUNK(r15)
+GENERATE_THUNK(r8, r8)
+GENERATE_THUNK(r9, r9)
+GENERATE_THUNK(r10, r10)
+GENERATE_THUNK(r11, r11)
+GENERATE_THUNK(r12, r12)
+GENERATE_THUNK(r13, r13)
+GENERATE_THUNK(r14, r14)
+GENERATE_THUNK(r15, r15)
 #endif
-- 
2.14.3


smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread Linus Torvalds
On Sun, Jan 14, 2018 at 2:39 PM, David Woodhouse  wrote:
>
> I'm not convinced we want to do this, but I'll defer to Linus.

Well, I guess we have no choice, if gcc ends up using the stupid names.

And yes, apparently this just made our macros worse instead of
cleaning anything up. Oh well.

I do have one (possible) solution: just export both names. So you'd export

  __x86_indirect_thunk_ax
  __x86_indirect_thunk_rax
..
 __x86_indirect_thunk_8
 __x86_indirect_thunk_r8

as symbols (same code, obviously), and then

 (a) the macros would be simpler

 (b) it just happens to work with even the old gcc patch

But at this point I don't really care.

  Linus


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread David Woodhouse
On Sun, 2018-01-14 at 15:02 -0800, Linus Torvalds wrote:
> On Sun, Jan 14, 2018 at 2:39 PM, David Woodhouse  > wrote:
> >
> > I'm not convinced we want to do this, but I'll defer to Linus.
> 
> Well, I guess we have no choice, if gcc ends up using the stupid
> names.

At this point, they'll do what we ask. See Uros's earlier message:

> ... the difference is only a couple of lines in the
> compiler source which we also can live with. But please discuss my
> proposal also in the kernel community, and weight the benefits and
> drawbacks of each approach before the final decision.
> 
> Please pass the final decision to gcc community, and we'll implement it.

I think we should stick with what we have now, with the names of the
thunks actually being the *full* name of the register (rax, eax, etc.)
that they use.


smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread Linus Torvalds
On Sun, Jan 14, 2018 at 3:09 PM, David Woodhouse  wrote:
>
> I think we should stick with what we have now, with the names of the
> thunks actually being the *full* name of the register (rax, eax, etc.)
> that they use.

It that works for the gcc people, then yes, I agree. The mixed
"sometimes full, sometimes not" approach just seems broken.

  Linus


Re: [PATCH][RFC] Radically simplify emission of balanced tree for switch statements.

2018-01-14 Thread Bernhard Reutner-Fischer
On 10 January 2018 15:59:28 CET, "Martin Liška"  wrote:
>On 01/10/2018 02:13 PM, Richard Biener wrote:
>> On Tue, Jan 9, 2018 at 7:29 PM, Jeff Law  wrote:
>>> On 01/09/2018 07:43 AM, Martin Liška wrote:
 On 09/20/2017 05:00 PM, Jeff Law wrote:
> On 09/20/2017 01:24 AM, Martin Liška wrote:
>
>>
>> Hello.
>>
>> Thank you Jeff for very verbose explanation what's happening. I'm
>planning to do
>> follow-up of this patch that will include clustering for
>bit-tests and jump tables.
>> Maybe that will make aforementioned issues even more difficult,
>but we'll see.
> FWIW, the DOM changes to simplify the conditionals seem to help
>both
> cases, trigger reasonably consistently in a bootstrap and for some
> subset of the triggers actually result in transformations that
>allow
> other passes to do a better job in the common (-O2) case.  So my
> inclination is to polish them a bit further get them on the trunk.
>
> My recommendation is to ignore the two regressions for now and
>focus on
> the cleanups you're trying to do.
>
> jeff
>

 Hello.

 Some time ago I've decided that I'll make patch submission of
>switch clustering
 in next stage1. However, this patch can be applied as is in this
>stage3. Would
 it be possible or is it too late?
>>> I'll let Richi make the call here.  FWIW, the DOM changes to avoid
>the
>>> two missed-optimization regressions you ran into are on the trunk,
>so
>>> that's no longer a blocking issue.
>> 
>> If you are fine with waiting then please wait ;)
>
>Yep, it's not urgent.

Can you please post CSiBE numbers? Ideally throwing in gcc-3.4.6 numbers too?

thanks, 


[PATCH] x86/retpoline: Switch thunk names to match final GCC patches

2018-01-14 Thread David Woodhouse
At the last minute, they were switched from __x86_indirect_thunk_rax to
__x86_indirect_thunk_ax without the 'r' or 'e' on the register name.
Except for the _r[89..] versions, obviously.

This is not entirely an improvement, IMO.

Reluctantly-signed-off-by: David Woodhouse 
---
I think we *shouldn't* do this. Uros said we could look at it and make
a decision, and GCC would implement what we decide. Up to Linus.

I'm sending this because I've built the compiler with the proposed
changes and tested it, and it's quarter past Monday and I'm done. Not
because I really want it.

 arch/x86/entry/entry_64.S |  2 +-
 arch/x86/include/asm/asm-prototypes.h | 24 ++--
 arch/x86/lib/retpoline.S  | 41 +--
 3 files changed, 32 insertions(+), 35 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 59874bc..e393163 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -269,7 +269,7 @@ entry_SYSCALL_64_fastpath:
 */
 #ifdef CONFIG_RETPOLINE
movqsys_call_table(, %rax, 8), %rax
-   call__x86_indirect_thunk_rax
+   call__x86_indirect_thunk_ax
 #else
call*sys_call_table(, %rax, 8)
 #endif
diff --git a/arch/x86/include/asm/asm-prototypes.h 
b/arch/x86/include/asm/asm-prototypes.h
index 0927cdc..df80478 100644
--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -18,19 +18,7 @@ extern void cmpxchg8b_emu(void);
 #endif
 
 #ifdef CONFIG_RETPOLINE
-#ifdef CONFIG_X86_32
-#define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_e ## 
reg(void);
-#else
-#define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_r ## 
reg(void);
-INDIRECT_THUNK(8)
-INDIRECT_THUNK(9)
-INDIRECT_THUNK(10)
-INDIRECT_THUNK(11)
-INDIRECT_THUNK(12)
-INDIRECT_THUNK(13)
-INDIRECT_THUNK(14)
-INDIRECT_THUNK(15)
-#endif
+#define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_ ## 
reg(void);
 INDIRECT_THUNK(ax)
 INDIRECT_THUNK(bx)
 INDIRECT_THUNK(cx)
@@ -39,4 +27,14 @@ INDIRECT_THUNK(si)
 INDIRECT_THUNK(di)
 INDIRECT_THUNK(bp)
 INDIRECT_THUNK(sp)
+#ifdef CONFIG_64BIT
+INDIRECT_THUNK(r8)
+INDIRECT_THUNK(r9)
+INDIRECT_THUNK(r10)
+INDIRECT_THUNK(r11)
+INDIRECT_THUNK(r12)
+INDIRECT_THUNK(r13)
+INDIRECT_THUNK(r14)
+INDIRECT_THUNK(r15)
+#endif /* CONFIG_64BIT */
 #endif /* CONFIG_RETPOLINE */
diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index cb45c6c..7da2c90 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -8,14 +8,14 @@
 #include 
 #include 
 
-.macro THUNK reg
-   .section .text.__x86.indirect_thunk.\reg
+.macro THUNK reg suffix
+   .section .text.__x86.indirect_thunk.\suffix
 
-ENTRY(__x86_indirect_thunk_\reg)
+ENTRY(__x86_indirect_thunk_\suffix)
CFI_STARTPROC
JMP_NOSPEC %\reg
CFI_ENDPROC
-ENDPROC(__x86_indirect_thunk_\reg)
+ENDPROC(__x86_indirect_thunk_\suffix)
 .endm
 
 /*
@@ -26,23 +26,22 @@ ENDPROC(__x86_indirect_thunk_\reg)
  * the simple and nasty way...
  */
 #define EXPORT_THUNK(reg) EXPORT_SYMBOL(__x86_indirect_thunk_ ## reg)
-#define GENERATE_THUNK(reg) THUNK reg ; EXPORT_THUNK(reg)
+#define GENERATE_THUNK(reg, suffix) THUNK reg suffix; EXPORT_THUNK(suffix)
 
-GENERATE_THUNK(_ASM_AX)
-GENERATE_THUNK(_ASM_BX)
-GENERATE_THUNK(_ASM_CX)
-GENERATE_THUNK(_ASM_DX)
-GENERATE_THUNK(_ASM_SI)
-GENERATE_THUNK(_ASM_DI)
-GENERATE_THUNK(_ASM_BP)
-GENERATE_THUNK(_ASM_SP)
+GENERATE_THUNK(_ASM_AX, ax)
+GENERATE_THUNK(_ASM_BX, bx)
+GENERATE_THUNK(_ASM_CX, cx)
+GENERATE_THUNK(_ASM_DX, dx)
+GENERATE_THUNK(_ASM_SI, si)
+GENERATE_THUNK(_ASM_DI, di)
+GENERATE_THUNK(_ASM_BP, bp)
 #ifdef CONFIG_64BIT
-GENERATE_THUNK(r8)
-GENERATE_THUNK(r9)
-GENERATE_THUNK(r10)
-GENERATE_THUNK(r11)
-GENERATE_THUNK(r12)
-GENERATE_THUNK(r13)
-GENERATE_THUNK(r14)
-GENERATE_THUNK(r15)
+GENERATE_THUNK(r8, r8)
+GENERATE_THUNK(r9, r9)
+GENERATE_THUNK(r10, r10)
+GENERATE_THUNK(r11, r11)
+GENERATE_THUNK(r12, r12)
+GENERATE_THUNK(r13, r13)
+GENERATE_THUNK(r14, r14)
+GENERATE_THUNK(r15, r15)
 #endif
-- 
2.7.4



Re: [PATCH] x86/retpoline: Switch thunk names to match final GCC patches

2018-01-14 Thread Linus Torvalds
On Sun, Jan 14, 2018 at 3:23 PM, David Woodhouse  wrote:
> I think we *shouldn't* do this. Uros said we could look at it and make
> a decision, and GCC would implement what we decide. Up to Linus.

Regardless of whether we end up having to do this, I'm not doing rc8
with it, and let's hope we can just skip it entirely.

It seems silly to have the 'r' for the r8-r15 case, but not the legacy
registers.

  Linus


[PATCH] handle multiple flexible array members (PR 83588)

2018-01-14 Thread Martin Sebor

The attached patch fixes PR c++/83588 - struct with two flexible
arrays causes an internal compiler error.  The ICE is caused by
the same assertion in varasm.c that has led to other similar
reports in the past:

  /* Given a non-empty initialization, this field had better
 be last.  Given a flexible array member, the next field
 on the chain is a TYPE_DECL of the enclosing struct.  */
  const_tree next = DECL_CHAIN (local->field);
  gcc_assert (!fieldsize || !next || TREE_CODE (next) != FIELD_DECL);

The fix is simply to also detect when a class defines more than
one flexible array member and treat the subsequent array as any
other member, and reject such class definitions to make sure they
never reach the assertion above.

Martin
PR c++/83588 - struct with two flexible arrays causes an internal compiler error

gcc/cp/ChangeLog:

	PR c++/83588
	* class.c (find_flexarrays): Make a record of multiple flexible array
	members.

gcc/testsuite/ChangeLog:

	PR c++/83588
	* g++.dg/ext/flexary28.C: New test.


Index: gcc/cp/class.c
===
--- gcc/cp/class.c	(revision 256674)
+++ gcc/cp/class.c	(working copy)
@@ -6566,14 +6566,17 @@ find_flexarrays (tree t, flexmems_t *fmem, bool ba
 	  /* Flexible array members have no upper bound.  */
 	  if (fmem->array)
 	{
-	  /* Replace the zero-length array if it's been stored and
-		 reset the after pointer.  */
 	  if (TYPE_DOMAIN (TREE_TYPE (fmem->array)))
 		{
+		  /* Replace the zero-length array if it's been stored and
+		 reset the after pointer.  */
 		  fmem->after[bool (pun)] = NULL_TREE;
 		  fmem->array = fld;
 		  fmem->enclosing = pstr;
 		}
+	  else if (!fmem->after[bool (pun)])
+		/* Make a record of another flexible array member.  */
+		fmem->after[bool (pun)] = fld;
 	}
 	  else
 	{
Index: gcc/testsuite/g++.dg/ext/flexary28.C
===
--- gcc/testsuite/g++.dg/ext/flexary28.C	(nonexistent)
+++ gcc/testsuite/g++.dg/ext/flexary28.C	(working copy)
@@ -0,0 +1,46 @@
+// PR c++/83588 - struct with two flexible arrays causes an internal compiler
+// error
+// { dg-do compile }
+// { dg-options "-Wno-pedantic" }
+
+struct A {
+  int i;
+  int a[];  // { dg-error "flexible array member .A::a. not at end of .struct A." }
+  int b[];
+};
+
+struct B {
+  int i;
+  int a[];  // { dg-error "flexible array member .B::a. not at end of .struct B." }
+  int j;
+  int b[][2];
+};
+
+struct C {
+  int i;
+  struct {
+int a[];// { dg-error "flexible array member .Ca. not at end of .struct C." }
+  };
+  int b[];
+};
+
+struct D {
+  int i;
+  struct {
+int a[];// { dg-error "flexible array member .Da. not at end of .struct D." }
+  } b[];
+  int c[];
+};
+
+struct E {
+  int i;
+  int a[0];
+  int b[];  // { dg-error "flexible array member .E::b. not at end of .struct E." }
+  int d[];
+};
+
+struct F {
+  int i;
+  int a[];  // { dg-error "flexible array member .F::a. not at end of .struct F." }
+  int b[], c[], d[];
+};


Re: [PATCH 3/5] x86: Add -mindirect-branch-register

2018-01-14 Thread H.J. Lu
On Sun, Jan 14, 2018 at 1:23 PM, H.J. Lu  wrote:
> On Sun, Jan 14, 2018 at 10:52 AM, Uros Bizjak  wrote:
>> On Sun, Jan 14, 2018 at 7:08 PM, H.J. Lu  wrote:
>>> On Sun, Jan 14, 2018 at 9:51 AM, Uros Bizjak  wrote:
 -  (ior (and (not (match_test "TARGET_X32"))
 +  (ior (and (not (match_test "TARGET_X32
 +  || ix86_indirect_branch_thunk_register"))
  (match_operand 0 "sibcall_memory_operand"))
 -   (and (match_test "TARGET_X32 && Pmode == DImode")
 +   (and (match_test "TARGET_X32 && Pmode == DImode
 + && !ix86_indirect_branch_thunk_register")
  (match_operand 0 "GOT_memory_operand"

 Is this patch just trying to disable the predicate when
 ix86_indirect_branch_thunk_register is set? Because this is what this
 convoluted logic does.
>>>
>>> Yes, we want to disable all indirect branch via memory with
>>> -mindirect-branch-register, just like -mx32.   We could do
>>>
>>> #idefine TARGET_INDIRECT_BRANCH_REGISTER \
>>>  (TARGER_X32 ||  ix86_indirect_branch_thunk_register)
>>
>> Index: predicates.md
>> ===
>> --- predicates.md   (revision 25)
>> +++ predicates.md   (working copy)
>> @@ -710,11 +710,10 @@
>>(ior (match_test "constant_call_address_operand
>>  (op, mode == VOIDmode ? mode : Pmode)")
>> (match_operand 0 "call_register_no_elim_operand")
>> -   (ior (and (not (match_test "TARGET_X32
>> -  || ix86_indirect_branch_thunk_register"))
>> +   (and (not (match_test "ix86_indirect_branch_thunk_register"))
>> +   (ior (and (not (match_test "TARGET_X32")))
>>  (match_operand 0 "memory_operand"))
>> -   (and (match_test "TARGET_X32 && Pmode == DImode
>> - && !ix86_indirect_branch_thunk_register")
>> +(and (match_test "TARGET_X32 && Pmode == DImode")
>>  (match_operand 0 "GOT_memory_operand")
>>
>> or something like that.
>>
>
> I am testing this patch.  OK for trunk if there is no regression?
>

Here is the updated patch.  Tested on i686 and x86-64.  OK for
trunk?


-- 
H.J.
From b76722e4050241d0b4c0a46bd12015d3973b118b Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Sun, 14 Jan 2018 13:19:37 -0800
Subject: [PATCH] x86: Rewrite ix86_indirect_branch_thunk_register logic

Rewrite ix86_indirect_branch_thunk_register logic with

(and (not (match_test "ix86_indirect_branch_thunk_register"))
 (original condition before r256662))

	* config/i386/predicates.md (constant_call_address_operand):
	Rewrite ix86_indirect_branch_thunk_register logic.
	(sibcall_insn_operand): Likewise.
---
 gcc/config/i386/predicates.md | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 5ae443231b8..23bf564f58c 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -710,24 +710,22 @@
   (ior (match_test "constant_call_address_operand
 		 (op, mode == VOIDmode ? mode : Pmode)")
(match_operand 0 "call_register_no_elim_operand")
-   (ior (and (not (match_test "TARGET_X32
-   || ix86_indirect_branch_thunk_register"))
-		 (match_operand 0 "memory_operand"))
-	(and (match_test "TARGET_X32 && Pmode == DImode
-			  && !ix86_indirect_branch_thunk_register")
-		 (match_operand 0 "GOT_memory_operand")
+   (and (not (match_test "ix86_indirect_branch_thunk_register"))
+	(ior (and (not (match_test "TARGET_X32"))
+		  (match_operand 0 "memory_operand"))
+		 (and (match_test "TARGET_X32 && Pmode == DImode")
+		  (match_operand 0 "GOT_memory_operand"))
 
 ;; Similarly, but for tail calls, in which we cannot allow memory references.
 (define_special_predicate "sibcall_insn_operand"
   (ior (match_test "constant_call_address_operand
 		 (op, mode == VOIDmode ? mode : Pmode)")
(match_operand 0 "register_no_elim_operand")
-   (ior (and (not (match_test "TARGET_X32
-   || ix86_indirect_branch_thunk_register"))
-		 (match_operand 0 "sibcall_memory_operand"))
-	(and (match_test "TARGET_X32 && Pmode == DImode
-			  && !ix86_indirect_branch_thunk_register")
-		 (match_operand 0 "GOT_memory_operand")
+   (and (not (match_test "ix86_indirect_branch_thunk_register"))
+	(ior (and (not (match_test "TARGET_X32"))
+		  (match_operand 0 "sibcall_memory_operand"))
+		 (and (match_test "TARGET_X32 && Pmode == DImode")
+		  (match_operand 0 "GOT_memory_operand"))
 
 ;; Return true if OP is a 32-bit GOT symbol operand.
 (define_predicate "GOT32_symbol_operand"
-- 
2.14.3



RE: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread Kumar, Venkataramanan
Hi 

> -Original Message-
> From: H.J. Lu [mailto:hjl.to...@gmail.com]
> Sent: Sunday, January 14, 2018 7:52 PM
> To: Jan Hubicka 
> Cc: Kumar, Venkataramanan ; gcc-
> patc...@gcc.gnu.org; Dharmakan, Rohit arul raj
> ; Nagarajan, Muthu kumar raj
> ; Uros Bizjak (ubiz...@gmail.com)
> 
> Subject: Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
> 
> On Sun, Jan 14, 2018 at 6:20 AM, Jan Hubicka  wrote:
> >> > Hi HJ,
> >> >
> >> > > -Original Message-
> >> > > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> >> > > ow...@gcc.gnu.org] On Behalf Of H.J. Lu
> >> > > Sent: Sunday, January 14, 2018 9:07 AM
> >> > > To: gcc-patches@gcc.gnu.org
> >> > > Subject: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
> >> > >
> >> > > This set of patches for GCC 8 mitigates variant #2 of the
> >> > > speculative execution vulnerabilities on x86 processors
> >> > > identified by CVE-2017-5715, aka Spectre.  They convert indirect
> >> > > branches and function returns to call and return thunks to avoid
> speculative execution via indirect call, jmp and ret.
> >> > >
> >> > > H.J. Lu (5):
> >> > >   x86: Add -mindirect-branch=
> >> > >   x86: Add -mfunction-return=
> >> > >   x86: Add -mindirect-branch-register
> >> > >   x86: Add 'V' register operand modifier
> >> > >   x86: Disallow -mindirect-branch=/-mfunction-return= with
> >> > > -mcmodel=large
> >> >
> >> > Current set of patches don't seem to have any option to generate
> "lfence" as the loop filler in "retpoline",  which is required by AMD.
> >> > Can you please clarify the plan. We would like to get this checked-in GCC
> 8.
> >>
> >> Since thunks are output as strings, it is easy to add the option on
> >> the top of patch #1 of the series.  I do not fully understand the
> >> reason for choosing pause over lfence for Intel, but if we need to do
> >> both, we need to have command line option (and possibly attribute).
> >> What would be reasonable name for it?
> >
> > I forgot there is -mindirect-branch-loop for that in the original patchset.
> > So for now we should be happy with having both lfence and pause in
> > there or do we still need it?
> >
> 
> I suggest we leave it out for the time being.

Yes as of now having both "lfence" and "pause" is Ok.   Hope we can add "- 
indirect-branch-loop" option later if required.  

Regards,
Venkat,
> 
> 
> --
> H.J.


Re: [PATCH] MicroBlaze resolve section change issues for ident output

2018-01-14 Thread Jeff Law
On 01/11/2018 06:54 AM, Nathan Rossi wrote:
> 2018-01-11  Nathan Rossi  
> 
> PR target/83013
> * config/microblaze/microblaze.c (microblaze_asm_output_ident):
> Use .pushsection/.popsection
THanks.  Installed on the trunk.

jeff


Re: [PATCH 1/2] Fix unstable sort

2018-01-14 Thread Jeff Law
On 01/12/2018 01:58 PM, li...@coryfields.com wrote:
> From: Cory Fields 
> 
> 2018-01-12  Cory Fields  
>* tree-ssa-loop-im.c (sort_bbs_in_loop_postorder_cmp): stabilize sort
Thanks.  Installed onto the trunk.
jeff


Re: [PATCH 2/2] Fix unstable sort

2018-01-14 Thread Jeff Law
On 01/12/2018 01:58 PM, li...@coryfields.com wrote:
> From: Cory Fields 
> 
> 2018-01-12  Cory Fields  
>* tree-ira.c (allocno_hard_regs_compare): stabilize sort
Thanks.  I fixed the ChangeLog entry and installed hte patch on the trunk.

jeff


Re: [PATCH] suppress -Wstringop-overflow when no-warning is set (PR 83508)

2018-01-14 Thread Jeff Law
On 01/11/2018 04:49 PM, Martin Sebor wrote:
> On 01/11/2018 04:24 PM, Jeff Law wrote:
>> On 01/10/2018 01:26 PM, Martin Sebor wrote:
>>> To avoid issuing duplicate warnings for the same function call
>>> in the source code the -Wrestrict warning code makes sure
>>> the no-warning bit is propagated between trees and GIMPLE and
>>> tested before issuing a warning.  But the warning also detects
>>> some of the same problems as -Wstringop-overflow, and that
>>> warning was not updated to pay attention to the no-warning bit.
>>> This can in turn lead to two warnings for what boils down to
>>> the same bug.  The warnings can be confusing when the first
>>> one references the function as it appears in the source code
>>> and the second one the one it was transformed to by GCC after
>>> the first warning was issued.
>>>
>>> The attached patch corrects this oversight by having
>>> the buffer overflow checker test the no-warning bit and skip
>>> issuing a diagnostic.  (The function that does the overflow
>>> checking still runs so that it can continue to indicate to its
>>> callers whether an overflow has been detected.)
>>>
>>> Bootstrap on x86_64-linux in progress.
>>>
>>> Martin
>>>
>>> gcc-83508.diff
>>>
>>>
>>> PR other/83508 - c-c++-common/Wrestrict.c fails since r255836
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> PR other/83508
>>> * gcc.dg/Wstringop-overflow-2.c: New test.
>>>
>>> gcc/ChangeLog:
>>>
>>> PR other/83508
>>> * builtins.c (check_access): Avoid warning when the no-warning bit
>>> is set.
>> Would it be better to check for TREE_NO_WARNING at the very start of
>> check_access rather than sprinkling it at various sites later in the
>> code?
> 
> The function is called from a couple of places to check that
> there is no overflow and avoid expansion (we discussed this
> not too long ago when I enhanced compute_objsize() to handle
> ranges).  I didn't want to change that (I mention it in the
> last sentence above.)
OK.  Just wanted to raise it as a possibility.

> 
> Looking forward, I would like to see these middle-end checkers
> used to avoid making certain kinds of transformations that we
> know are dangerous (we discussed emitting __builtin_trap or
> __builtin_unreachable) for some such cases.  I realize they
> aren't suitable for it quite it yet and will need work to make
> them so (assuming we can agree on that approach), but I mention
> it to explain what I was thinking when I sprinkled the tests
> in check_access() the way I did.
Understood.

> 
> As an aside, even though I think GCC should issue only one
> warning per function, I'm not sure that -Wrestrict should
> trump -Wstringop-overflow.  It seems like it should be the
> other way around because the latter is more severe.  That's
> also how it worked before -Wrestrict was moved into its own
> pass.  To make it work like that again, -Wstringop-overflow
> would need to run either before -Wrestrict or at the same
> time.  Maybe it should be moved in GCC 9.
I think we're OK for gcc-8.  We can revisit ordering and such for gcc-9.

FWIW, I went ahead and committed your patch to the trunk.

jeff


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-14 Thread David Woodhouse
On Sun, 2018-01-14 at 21:52 +0100, Uros Bizjak wrote:
> 
> Well, you did say that these are strange times ;)
> 
> From the user perspective, it would be more convenient to use the
> thunk names that are the same for 32bit and 64bit targets. If we
> ignore this fact, the difference is only a couple of lines in the
> compiler source which we also can live with. But please discuss my
> proposal also in the kernel community, and weight the benefits and
> drawbacks of each approach before the final decision.
> 
> Please pass the final decision to gcc community, and we'll implement
> it.

I think you watched this happen, but just to be explicitly clear:

We weighed the benefits and tested this, and we concluded that we don't
want it. Let's stick with e.g. __x86_indirect_thunk_rax please.

Thank you for being flexible.

smime.p7s
Description: S/MIME cryptographic signature