[RX] Add support for atomic operations

2016-05-08 Thread Oleg Endo
Hi,

The attached patch adds some rudimentary support for atomic operations.
On the original RX there is one truly atomic insn "xchg".  All other
operations have to implement atomicity in some other way.  One
straightforward option which is already being done on SH is to disable
interrupts for the duration of the atomic sequence.  This works for
single-core systems that run in privileged mode.  And that's what the
patch does.
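
For readers unfamiliar with the approach, the following is a minimal
sketch of the disable-interrupts idea written as a scope guard.  It is
not code from the patch: FakeCpu, ScopedIrqDisable and
atomic_fetch_add_sketch are hypothetical names, and the interrupt
enable bit of the PSW is simulated with a plain bool, so this only
models the control flow (save state, disable, do the work, restore).

```cpp
#include <cassert>

// Hypothetical stand-in for the CPU state; on real hardware this would be
// the interrupt enable bit in the PSW control register.
struct FakeCpu {
  bool irq_enabled = true;
};

// Scope guard: saves the current interrupt-enable state and disables
// interrupts on construction, restores the saved state on destruction.
class ScopedIrqDisable {
public:
  explicit ScopedIrqDisable(FakeCpu& cpu)
      : cpu_(cpu), prev_enabled_(cpu.irq_enabled) {
    cpu_.irq_enabled = false;
  }
  ~ScopedIrqDisable() { cpu_.irq_enabled = prev_enabled_; }

  ScopedIrqDisable(const ScopedIrqDisable&) = delete;
  ScopedIrqDisable& operator=(const ScopedIrqDisable&) = delete;

private:
  FakeCpu& cpu_;
  bool prev_enabled_;
};

// Read-modify-write that cannot be interrupted on a single-core system,
// because interrupts stay disabled for the whole sequence.
int atomic_fetch_add_sketch(FakeCpu& cpu, int& mem, int val) {
  ScopedIrqDisable guard(cpu);
  assert(!cpu.irq_enabled);  // inside the sequence interrupts are off
  int old = mem;
  mem = old + val;
  return old;
}
```

Because the guard restores the saved state rather than unconditionally
re-enabling interrupts, a sequence entered with interrupts already off
leaves them off afterwards.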

OK for trunk?

Cheers,
Oleg

gcc/ChangeLog:
* config/rx/rx-protos.h (is_interrupt_func, is_fast_interrupt_func):
Forward declare.
(rx_atomic_sequence): New class.
* config/rx/rx.c (rx_print_operand): Use symbolic names for PSW bits.
(is_interrupt_func, is_fast_interrupt_func): Make non-static and
non-inline.
(rx_atomic_sequence::rx_atomic_sequence,
rx_atomic_sequence::~rx_atomic_sequence): New functions.
* config/rx/rx.md (CTRLREG_PSW, CTRLREG_USP, CTRLREG_FPSW, CTRLREG_CPEN,
CTRLREG_BPSW, CTRLREG_BPC, CTRLREG_ISP, CTRLREG_FINTV,
CTRLREG_INTB): New constants.
(FETCHOP): New code iterator.
(fetchop_name, fetchop_name2): New iterator code attributes.
(QIHI): New mode iterator.
(atomic_exchange, atomic_exchangesi, xchg_mem,
atomic_fetch_si, atomic_fetch_nandsi,
atomic__fetchsi, atomic_nand_fetchsi): New patterns.

Index: gcc/config/rx/rx-protos.h
===
--- gcc/config/rx/rx-protos.h	(revision 235992)
+++ gcc/config/rx/rx-protos.h	(working copy)
@@ -26,6 +26,28 @@
 extern void		rx_expand_prologue (void);
 extern int		rx_initial_elimination_offset (int, int);
 
+bool is_interrupt_func (const_tree decl);
+bool is_fast_interrupt_func (const_tree decl);
+
+/* rx_atomic_sequence is used to emit the header and footer
+   of an atomic sequence.  It's supposed to be used in a scope.
+   When constructed, it will emit the atomic sequence header insns.
+   When destructed (goes out of scope), it will emit the
+   corresponding atomic sequence footer insns.  */
+class rx_atomic_sequence
+{
+public:
+  rx_atomic_sequence (const_tree fun_decl);
+  ~rx_atomic_sequence (void);
+
+private:
+  rx_atomic_sequence (void);
+  rx_atomic_sequence (const rx_atomic_sequence&);
+  rx_atomic_sequence& operator = (const rx_atomic_sequence&);
+
+  rtx m_prev_psw_reg;
+};
+
 #ifdef RTX_CODE
 extern int		rx_adjust_insn_length (rtx_insn *, int);
 extern int 		rx_align_for_label (rtx, int);
Index: gcc/config/rx/rx.c
===
--- gcc/config/rx/rx.c	(revision 235992)
+++ gcc/config/rx/rx.c	(working copy)
@@ -630,15 +630,15 @@
   gcc_assert (CONST_INT_P (op));
   switch (INTVAL (op))
 	{
-	case 0:   fprintf (file, "psw"); break;
-	case 2:   fprintf (file, "usp"); break;
-	case 3:   fprintf (file, "fpsw"); break;
-	case 4:   fprintf (file, "cpen"); break;
-	case 8:   fprintf (file, "bpsw"); break;
-	case 9:   fprintf (file, "bpc"); break;
-	case 0xa: fprintf (file, "isp"); break;
-	case 0xb: fprintf (file, "fintv"); break;
-	case 0xc: fprintf (file, "intb"); break;
+	case CTRLREG_PSW:   fprintf (file, "psw"); break;
+	case CTRLREG_USP:   fprintf (file, "usp"); break;
+	case CTRLREG_FPSW:  fprintf (file, "fpsw"); break;
+	case CTRLREG_CPEN:  fprintf (file, "cpen"); break;
+	case CTRLREG_BPSW:  fprintf (file, "bpsw"); break;
+	case CTRLREG_BPC:   fprintf (file, "bpc"); break;
+	case CTRLREG_ISP:   fprintf (file, "isp"); break;
+	case CTRLREG_FINTV: fprintf (file, "fintv"); break;
+	case CTRLREG_INTB:  fprintf (file, "intb"); break;
 	default:
 	  warning (0, "unrecognized control register number: %d - using 'psw'",
 		   (int) INTVAL (op));
@@ -1216,7 +1216,7 @@
 
 /* Returns true if the provided function has the "fast_interrupt" attribute.  */
 
-static inline bool
+bool
 is_fast_interrupt_func (const_tree decl)
 {
   return has_func_attr (decl, "fast_interrupt");
@@ -1224,7 +1224,7 @@
 
 /* Returns true if the provided function has the "interrupt" attribute.  */
 
-static inline bool
+bool
 is_interrupt_func (const_tree decl)
 {
   return has_func_attr (decl, "interrupt");
@@ -3409,6 +3409,29 @@
   return TARGET_ENABLE_LRA;
 }
 
+rx_atomic_sequence::rx_atomic_sequence (const_tree fun_decl)
+{
+  if (is_fast_interrupt_func (fun_decl) || is_interrupt_func (fun_decl))
+{
+  /* If we are inside an interrupt handler, assume that interrupts are
+	 off -- which is the default hardware behavior.  In this case, there
+	 is no need to disable the interrupts.  */
+  m_prev_psw_reg = NULL;
+}
+  else
+{
+  m_prev_psw_reg = gen_reg_rtx (SImode);
+  emit_insn (gen_mvfc (m_prev_psw_reg, GEN_INT (CTRLREG_PSW)));
+  emit_insn (gen_clrpsw (GEN_INT ('I')));
+}
+}
+
+rx_atomic_sequence::~rx_atomic_sequence (void)
+{
+  if (m_prev_psw_reg != NULL)
+emit_insn (gen_mvtc (GEN_INT (CTRLREG_PSW), m_prev_psw_reg));
+}
+
 
 #undef  TARGET_NARROW_VOLA

Error out on -fvtable-verify without --enable-vtable-verify

2016-05-08 Thread Rainer Orth
With the recent change not to install libvtv without
--enable-vtable-verify, I noticed that gcc/g++ would still accept
-fvtable-verify without errors, only to emit obscure link-time errors
about missing vtv_*.o (which hadn't been installed in that situation
before) and libvtv.

It seems to me a much better user experience to emit a clear error
message in this case, which is what this patch does.
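
As a rough illustration of the intended driver behaviour (not the spec
machinery the patch actually uses), the decision can be sketched as a
small function.  The name vtable_verify_diagnostic and the bool
parameter modelling --enable-vtable-verify are assumptions made for
this example; only the wording of the message is taken from the patch.

```cpp
#include <cassert>
#include <string>

// Returns an empty string when the option is acceptable, otherwise the
// diagnostic text.  `vtv_configured` models --enable-vtable-verify and
// `mode` is the value given to -fvtable-verify= (none, std or preinit).
std::string vtable_verify_diagnostic(bool vtv_configured,
                                     const std::string& mode) {
  if (vtv_configured || mode == "none")
    return std::string();
  return "-fvtable-verify=" + mode + " is not supported in this configuration";
}

// Small self-check of the cases described above.
void self_check() {
  assert(vtable_verify_diagnostic(true, "std").empty());
  assert(vtable_verify_diagnostic(false, "none").empty());
  assert(!vtable_verify_diagnostic(false, "preinit").empty());
}
```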

Bootstrapped without regressions (without and with
--enable-vtable-verify) on i386-pc-solaris2.12 and x86_64-pc-linux-gnu.
Manually tested that I get (or don't get) the expected errors for
-fvtable-verify=(none|std|preinit).

Ok for mainline?

Rainer


2016-05-04  Rainer Orth  

* configure.ac (enable_vtable_verify): Handle --enable-vtable-verify.
* configure: Regenerate.
* config.in: Regenerate.
* gcc.c (VTABLE_VERIFICATION_SPEC) [!ENABLE_VTABLE_VERIFY]: Error
on -fvtable-verify.
* config/sol2.h [!ENABLE_VTABLE_VERIFY] (STARTFILE_VTV_SPEC): Define.
(ENDFILE_VTV_SPEC): Define.

# HG changeset patch
# Parent  26d037ddd624a6e7b738c1db2b6dbdeb90c9b01c
Error out on -fvtable-verify without --enable-vtable-verify

diff --git a/gcc/config/sol2.h b/gcc/config/sol2.h
--- a/gcc/config/sol2.h
+++ b/gcc/config/sol2.h
@@ -166,21 +166,26 @@ along with GCC; see the file COPYING3.  
 #define STARTFILE_CRTBEGIN_SPEC	"crtbegin.o%s"
 #endif
 
+#if ENABLE_VTABLE_VERIFY
 #if SUPPORTS_INIT_PRIORITY
 #define STARTFILE_VTV_SPEC \
   "%{fvtable-verify=none:%s; \
  fvtable-verify=preinit:vtv_start_preinit.o%s; \
  fvtable-verify=std:vtv_start.o%s}"
-
 #define ENDFILE_VTV_SPEC \
   "%{fvtable-verify=none:%s; \
  fvtable-verify=preinit:vtv_end_preinit.o%s; \
  fvtable-verify=std:vtv_end.o%s}"
-#else
+#else /* !SUPPORTS_INIT_PRIORITY */
 #define STARTFILE_VTV_SPEC \
-  "%{fvtable-verify:%e-fvtable-verify is not supported in this configuration}"
+  "%{fvtable-verify=*: \
+ %e-fvtable-verify=%* is not supported in this configuration}"
 #define ENDFILE_VTV_SPEC ""
-#endif
+#endif /* !SUPPORTS_INIT_PRIORITY */
+#else /* !ENABLE_VTABLE_VERIFY */
+#define STARTFILE_VTV_SPEC ""
+#define ENDFILE_VTV_SPEC ""
+#endif /* !ENABLE_VTABLE_VERIFY */
 
 /* We don't use the standard svr4 STARTFILE_SPEC because it's wrong for us.  */
 #undef STARTFILE_SPEC
diff --git a/gcc/configure.ac b/gcc/configure.ac
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -865,6 +865,19 @@ Valid choices are 'yes' and 'no'.]) ;;
   esac
 ], [enable_tls=''])
 
+AC_ARG_ENABLE(vtable-verify,
+[AS_HELP_STRING([--enable-vtable-verify],
+		[enable vtable verification feature])],
+[case "$enableval" in
+ yes) enable_vtable_verify=yes ;;
+ no)  enable_vtable_verify=no ;;
+ *)   enable_vtable_verify=no;;
+ esac],
+[enable_vtable_verify=no])
+vtable_verify=`if test $enable_vtable_verify != no; then echo 1; else echo 0; fi`
+AC_DEFINE_UNQUOTED(ENABLE_VTABLE_VERIFY, $vtable_verify,
+[Define 0/1 if vtable verification feature is enabled.])
+
 AC_ARG_ENABLE(objc-gc,
 [AS_HELP_STRING([--enable-objc-gc],
 		[enable the use of Boehm's garbage collector with
diff --git a/gcc/gcc.c b/gcc/gcc.c
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -989,9 +989,18 @@ proper position among the other output f
 the vtable verification runtime functions are in libstdc++, so we use
 the spec just below this one.  */
 #ifndef VTABLE_VERIFICATION_SPEC
+#if ENABLE_VTABLE_VERIFY
 #define VTABLE_VERIFICATION_SPEC "\
 %{!nostdlib:%{fvtable-verify=std: -lvtv -u_vtable_map_vars_start -u_vtable_map_vars_end}\
 %{fvtable-verify=preinit: -lvtv -u_vtable_map_vars_start -u_vtable_map_vars_end}}"
+#else
+#define VTABLE_VERIFICATION_SPEC "\
+%{fvtable-verify=none:} \
+%{fvtable-verify=std: \
+  %e-fvtable-verify=std is not supported in this configuration} \
+%{fvtable-verify=preinit: \
+  %e-fvtable-verify=preinit is not supported in this configuration}"
+#endif
 #endif
 
 #ifndef CHKP_SPEC

-- 
Rainer Orth, Center for Biotechnology, Bielefeld University


[v3 PATCH] Avoid endless run-time recursion for copying single-element tuples where the element type is by-value constructible from any type

2016-05-08 Thread Ville Voutilainen
Tested on Linux-PPC64.

2016-05-08  Ville Voutilainen  

Avoid endless run-time recursion for copying single-element
tuples where the element type is by-value constructible
from any type.
 * include/std/tuple (_TC<>::_NotSameTuple): New.
 * include/std/tuple (tuple(_UElements&&...): Use it.
* testsuite/20_util/tuple/cons/element_accepts_anything_byval.cc: New.
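
The idea behind the new constraint can be sketched outside of
std::tuple.  The example below is a simplified model, not the
libstdc++ code: Box is a hypothetical single-element wrapper, and the
only constraint shown is the "not the same type" check (the
constructibility checks of the real tuple are omitted).  Without that
check, copying a non-const Box<Something> lvalue would select the
forwarding constructor, which would pass the Box to Something(T) by
value, which would copy the Box again, and so on.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Counts calls to the "accepts anything" constructor.
static int g_converting_calls = 0;

struct Something {
  Something() {}
  template <typename T> Something(T) { ++g_converting_calls; }
};

// Hypothetical single-element wrapper standing in for a one-element tuple.
template <typename T>
class Box {
public:
  Box() : value_() {}

  // Forwarding constructor, disabled when U is Box itself after removing
  // the reference and top-level const (the check the patch adds).
  template <typename U,
            typename std::enable_if<
                !std::is_same<Box<T>,
                              typename std::remove_const<
                                  typename std::remove_reference<U>::type
                              >::type>::value,
                bool>::type = true>
  Box(U&& u) : value_(std::forward<U>(u)) {}

private:
  T value_;
};

// Copies a non-const lvalue; with the guard the implicit copy constructor
// is chosen, so Something's converting constructor is never called.
int converting_calls_on_copy() {
  g_converting_calls = 0;
  Box<Something> b1;
  Box<Something> b2 = b1;
  (void)b2;
  return g_converting_calls;
}

// Constructing from an unrelated type still uses the forwarding constructor.
int converting_calls_on_convert() {
  g_converting_calls = 0;
  Box<Something> b(42);
  (void)b;
  return g_converting_calls;
}
```
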
diff --git a/libstdc++-v3/include/std/tuple b/libstdc++-v3/include/std/tuple
index 53f3184..7522e43 100644
--- a/libstdc++-v3/include/std/tuple
+++ b/libstdc++-v3/include/std/tuple
@@ -500,6 +500,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  __not_>
   >::value;
 }
+template
+static constexpr bool _NotSameTuple()
+{
+  return  __not_,
+typename remove_const<
+  typename remove_reference<_UElements...>::type
+  >::type>>::value;
+}
   };
 
   template
@@ -534,6 +542,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 {
   return true;
 }
+template
+static constexpr bool _NotSameTuple()
+{
+  return  true;
+}
   };
 
   /// Primary class template, tuple
@@ -611,7 +624,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _Elements...>;
 
   template::template
+  enable_if<
+ _TC::template
+   _NotSameTuple<_UElements...>()
+ && _TMC<_UElements...>::template
 _MoveConstructibleTuple<_UElements...>()
   && _TMC<_UElements...>::template
 _ImplicitlyMoveConvertibleTuple<_UElements...>()
@@ -621,7 +637,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 : _Inherited(std::forward<_UElements>(__elements)...) { }
 
   template::template
+enable_if<
+ _TC::template
+   _NotSameTuple<_UElements...>()
+ && _TMC<_UElements...>::template
 _MoveConstructibleTuple<_UElements...>()
   && !_TMC<_UElements...>::template
 _ImplicitlyMoveConvertibleTuple<_UElements...>()
diff --git a/libstdc++-v3/testsuite/20_util/tuple/cons/element_accepts_anything_byval.cc b/libstdc++-v3/testsuite/20_util/tuple/cons/element_accepts_anything_byval.cc
new file mode 100644
index 000..fe9bea6
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/tuple/cons/element_accepts_anything_byval.cc
@@ -0,0 +1,30 @@
+// Copyright (C) 2016 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+#include <tuple>
+using namespace std;
+
+struct Something {
+    Something() { }
+    template <typename T> Something(T) { }
+};
+
+int main() {
+    tuple<Something> t1;
+    tuple<Something> t2 = t1;
+}
+


Re: [v3 PATCH] Avoid endless run-time recursion for copying single-element tuples where the element type is by-value constructible from any type

2016-05-08 Thread Daniel Krügler
Have you considered to test against decay instead of
remove_reference/remove_const? That would be similar to other places
in the standard. (I also believe that your fix actually should be
submitted as an LWG issue)

- Daniel

2016-05-08 13:43 GMT+02:00 Ville Voutilainen :
> Tested on Linux-PPC64.
>
> 2016-05-08  Ville Voutilainen  
>
> Avoid endless run-time recursion for copying single-element
> tuples where the element type is by-value constructible
> from any type.
>  * include/std/tuple (_TC<>::_NotSameTuple): New.
>  * include/std/tuple (tuple(_UElements&&...): Use it.
> * testsuite/20_util/tuple/cons/element_accepts_anything_byval.cc: New.






Re: [v3 PATCH] Avoid endless run-time recursion for copying single-element tuples where the element type is by-value constructible from any type

2016-05-08 Thread Ville Voutilainen
On 8 May 2016 at 14:48, Daniel Krügler  wrote:
> Have you considered to test against decay instead of
> remove_reference/remove_const? That would be similar to other places
> in the standard. (I also believe that your fix actually should be
> submitted as an LWG issue)


STL is about to submit the fix as an LWG issue. He seemed to agree that
the fix is what he intends to submit. Using decay instead of
remove_reference/remove_const is fine by me, but I suppose it shouldn't
make a difference here other than notational brevity?


Re: [v3 PATCH] Avoid endless run-time recursion for copying single-element tuples where the element type is by-value constructible from any type

2016-05-08 Thread Ville Voutilainen
On 8 May 2016 at 14:51, Ville Voutilainen  wrote:
> On 8 May 2016 at 14:48, Daniel Krügler  wrote:
>> Have you considered to test against decay instead of
>> remove_reference/remove_const? That would be similar to other places
>> in the standard. (I also believe that your fix actually should be
>> submitted as an LWG issue)
>
>
> STL is about to submit the fix as an LWG issue. He seemed to agree
> that the fix is what
> he intends to submit. Using decay instead of
> remove_reference/remove_const is fine by
> me, but I suppose it shouldn't make a difference here other than
> notational brevity?

For what it's worth, I have the tiniest preference against using decay
here; whenever I see decay, I wonder whether array/function decay is
significant. While it doesn't make a difference here, I still prefer
just doing remove_reference+remove_const. It's up to Jonathan; I'll
change it to decay if he so advises.
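
To make the comparison concrete, here is a small compile-time sketch
of where the two spellings agree and where they differ.  The helper
names strip_ref_const and same_as_decay and the struct S are invented
for this example; the traits themselves are the standard ones.

```cpp
#include <type_traits>

// The spelling used in the patch: strip the reference, then top-level const.
template <typename T>
using strip_ref_const = typename std::remove_const<
    typename std::remove_reference<T>::type>::type;

// True when std::decay and the patch's spelling yield the same type.
template <typename T>
constexpr bool same_as_decay() {
  return std::is_same<typename std::decay<T>::type,
                      strip_ref_const<T>>::value;
}

struct S {};

// Plain class types and (const) references to them: no difference.
static_assert(same_as_decay<S>(), "");
static_assert(same_as_decay<const S&>(), "");
static_assert(same_as_decay<S&&>(), "");

// Arrays and functions: decay converts them to pointers.
static_assert(!same_as_decay<int[3]>(), "array decays to int*");
static_assert(!same_as_decay<int (&)[3]>(), "array reference decays to int*");
static_assert(!same_as_decay<void(int)>(), "function decays to pointer");

// decay also drops volatile, which remove_const does not.
static_assert(!same_as_decay<volatile S&>(), "decay removes volatile too");
```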


[SH][committed] Various cleanups

2016-05-08 Thread Oleg Endo
Hi,

The attached patch performs various cleanups in the SH code.  No
functional changes.

Tested on sh-elf with
make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,
-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

Committed as r236008.

Cheers,
Oleg

gcc/ChangeLog:
* config/sh/sh-protos.h (sh_media_register_for_return): Remove.
* config/sh/sh.c: Define and declare variables on first use throughout
the file.
(current_function_interrupt): Change to bool type.
(frame_insn): Rename to emit_frame_insn and update users.
(push_regs): Use bool for 'interrupt_handler' argument.
(save_schedule_s): Remove.
(TARGET_ASM_UNALIGNED_DI_OP, TARGET_ASM_ALIGNED_DI_OP): Remove.
(sh_option_override): Don't nullify targetm.asm_out.aligned_op.di and
targetm.asm_out.unaligned_op.di.
(gen_far_branch): Remove redundant forward declaration.
(sh_media_register_for_return, MAX_SAVED_REGS, save_entry_s, save_entry,
MAX_TEMPS, save_schedule_s, save_schedule): Remove.
(sh_set_return_address, sh_function_ok_for_sibcall,
scavenge_reg): Update comments.
(sh_builtin_saveregs): Use TARGET_FPU_ANY condition.
(sh2a_get_function_vector_number, sh2a_function_vector_p): Use for loop.
(sh_attr_renesas_p): Remove unnecessary parentheses.
(branch_dest): Simplify.
* config/sh/sh.h (sh_args): Remove byref, byref_regs, stack_regs fields.
Change force_mem, prototype_p, outgoing, renesas_abi fields to bool.
(CUMULATIVE_ARGS): Change macro to typedef.
(current_function_interrupt): Change to bool type.
(sh_arg_class, sh_args, CUMULATIVE_ARGS, current_function_interrupt):
Surround with __cplusplus ifdef.
(sh_compare_op0, sh_compare_op1): Remove.
(EPILOGUE_USES): Use TARGET_FPU_ANY condition.

diff --git a/gcc/config/sh/sh-protos.h b/gcc/config/sh/sh-protos.h
index d302394..fecbb88 100644
--- a/gcc/config/sh/sh-protos.h
+++ b/gcc/config/sh/sh-protos.h
@@ -366,7 +366,6 @@ extern void sh_cpu_cpp_builtins (cpp_reader* pfile);
 
 extern const char *output_jump_label_table (void);
 extern rtx get_t_reg_rtx (void);
-extern int sh_media_register_for_return (void);
 extern void sh_expand_prologue (void);
 extern void sh_expand_epilogue (bool);
 extern void sh_set_return_address (rtx, rtx);
diff --git a/gcc/config/sh/sh.c b/gcc/config/sh/sh.c
index 6d1d1a3..51f983c 100644
--- a/gcc/config/sh/sh.c
+++ b/gcc/config/sh/sh.c
@@ -80,8 +80,9 @@ int code_for_indirect_jump_scratch = CODE_FOR_indirect_jump_scratch;
 		  ? (DECL_ATTRIBUTES (decl)) \
 		  : TYPE_ATTRIBUTES (TREE_TYPE (decl))
 
-/* Set to 1 by expand_prologue() when the function is an interrupt handler.  */
-int current_function_interrupt;
+/* Set to true by expand_prologue() when the function is an
+   interrupt handler.  */
+bool current_function_interrupt;
 
 tree sh_deferred_function_attributes;
 tree *sh_deferred_function_attributes_tail = &sh_deferred_function_attributes;
@@ -180,10 +181,10 @@ static void sh_reorg (void);
 static void sh_option_override (void);
 static void sh_override_options_after_change (void);
 static void output_stack_adjust (int, rtx, int, HARD_REG_SET *, bool);
-static rtx_insn *frame_insn (rtx);
+static rtx_insn* emit_frame_insn (rtx);
 static rtx push (int);
 static void pop (int);
-static void push_regs (HARD_REG_SET *, int);
+static void push_regs (HARD_REG_SET* mask, bool interrupt_handler);
 static int calc_live_regs (HARD_REG_SET *);
 static HOST_WIDE_INT rounded_frame_size (int);
 static bool sh_frame_pointer_required (void);
@@ -267,7 +268,6 @@ static rtx sh_delegitimize_address (rtx);
 static bool sh_cannot_substitute_mem_equiv_p (rtx);
 static bool sh_legitimize_address_displacement (rtx *, rtx *, machine_mode);
 static int scavenge_reg (HARD_REG_SET *s);
-struct save_schedule_s;
 
 static rtx sh_struct_value_rtx (tree, int);
 static rtx sh_function_value (const_tree, const_tree, bool);
@@ -355,12 +355,6 @@ static const struct attribute_spec sh_attribute_table[] =
 #undef TARGET_ASM_UNALIGNED_SI_OP
 #define TARGET_ASM_UNALIGNED_SI_OP "\t.ualong\t"
 
-/* These are NULLed out on non-SH5 in TARGET_OPTION_OVERRIDE.  */
-#undef TARGET_ASM_UNALIGNED_DI_OP
-#define TARGET_ASM_UNALIGNED_DI_OP "\t.uaquad\t"
-#undef TARGET_ASM_ALIGNED_DI_OP
-#define TARGET_ASM_ALIGNED_DI_OP "\t.quad\t"
-
 #undef TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE sh_option_override
 
@@ -832,10 +826,6 @@ sh_option_override (void)
   sh_cpu = PROCESSOR_SH4A;
 }
 
-  /* Only the sh64-elf assembler fully supports .quad properly.  */
-  targetm.asm_out.aligned_op.di = NULL;
-  targetm.asm_out.unaligned_op.di = NULL;
-
   /* User/priviledged mode is supported only on SH3* and SH4*.
  Disable it for everything else.  */
   if (!TARGET_SH3 && TARGET_USERMODE)
@@ -1662,11 +1652,9 @@ prepare_move_operands (rtx operands[], machine_mode mode)
 
   if (mode == Pmode || mode =

[SH][committed] Convert GET_SH_ARG_CLASS into a function

2016-05-08 Thread Oleg Endo
Hi,

The attached patch converts the GET_SH_ARG_CLASS macro into a function.
No functional change.
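
The shape of the conversion can be illustrated with a reduced model.
Everything below is a simplified stand-in and not the SH backend code:
ArgClass, Mode and is_float_or_complex_float are hypothetical, and the
target flags are passed as parameters instead of being read from
TARGET_* macros.  The sketch only shows that a nested conditional
macro and a function with early returns make the same decisions.

```cpp
#include <cassert>

// Simplified, hypothetical stand-ins for the GCC types involved.
enum ArgClass { ARG_INT, ARG_FLOAT };
enum Mode { MODE_SI, MODE_SF, MODE_DF, MODE_SC, MODE_DC };

// In this model every mode except MODE_SI is float or complex float.
bool is_float_or_complex_float(Mode m) { return m != MODE_SI; }

// Macro form: one nested conditional expression; MODE is expanded
// (and therefore evaluated) more than once.
#define ARG_CLASS_MACRO(FPU_ANY, FPU_DOUBLE, MODE)          \
  (((FPU_ANY) && (MODE) == MODE_SF)                         \
       ? ARG_FLOAT                                          \
       : ((FPU_DOUBLE) && is_float_or_complex_float(MODE))  \
             ? ARG_FLOAT                                    \
             : ARG_INT)

// Function form: same decisions as early returns, argument evaluated once.
ArgClass arg_class(bool fpu_any, bool fpu_double, Mode mode) {
  if (fpu_any && mode == MODE_SF)
    return ARG_FLOAT;
  if (fpu_double && is_float_or_complex_float(mode))
    return ARG_FLOAT;
  return ARG_INT;
}

// Checks that both forms agree for every flag/mode combination.
bool forms_agree() {
  const Mode modes[] = {MODE_SI, MODE_SF, MODE_DF, MODE_SC, MODE_DC};
  for (int any = 0; any < 2; ++any)
    for (int dbl = 0; dbl < 2; ++dbl)
      for (Mode m : modes) {
        bool a = any != 0, d = dbl != 0;
        if (ARG_CLASS_MACRO(a, d, m) != arg_class(a, d, m))
          return false;
      }
  return true;
}
```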

Tested on sh-elf with
make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,
-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

Committed as r236009.

Cheers,
Oleg

gcc/ChangeLog:
* config/sh/sh.h (GET_SH_ARG_CLASS): Convert macro into ...
* config/sh/sh.c (get_sh_arg_class): ... this new function.  Update its
users.

diff --git a/gcc/config/sh/sh.c b/gcc/config/sh/sh.c
index 51f983c..a36b098 100644
--- a/gcc/config/sh/sh.c
+++ b/gcc/config/sh/sh.c
@@ -7898,6 +7898,20 @@ sh_callee_copies (cumulative_args_t cum, machine_mode mode,
 	  % SH_MIN_ALIGN_FOR_CALLEE_COPY == 0));
 }
 
+static sh_arg_class
+get_sh_arg_class (machine_mode mode)
+{
+  if (TARGET_FPU_ANY && mode == SFmode)
+return SH_ARG_FLOAT;
+
+  if (TARGET_FPU_DOUBLE
+  && (GET_MODE_CLASS (mode) == MODE_FLOAT
+	  || GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT))
+return SH_ARG_FLOAT;
+
+  return SH_ARG_INT;
+}
+
 /* Round a register number up to a proper boundary for an arg of mode
MODE.
The SH doesn't care about double alignment, so we only
@@ -7913,9 +7927,9 @@ sh_round_reg (const CUMULATIVE_ARGS& cum, machine_mode mode)
 	  && (mode == DFmode || mode == DCmode)
 	  && cum.arg_count[(int) SH_ARG_FLOAT] < NPARM_REGS (mode)))
  && GET_MODE_UNIT_SIZE (mode) > UNITS_PER_WORD)
-? (cum.arg_count[(int) GET_SH_ARG_CLASS (mode)]
-   + (cum.arg_count[(int) GET_SH_ARG_CLASS (mode)] & 1))
-: cum.arg_count[(int) GET_SH_ARG_CLASS (mode)]);
+? (cum.arg_count[(int) get_sh_arg_class (mode)]
+   + (cum.arg_count[(int) get_sh_arg_class (mode)] & 1))
+: cum.arg_count[(int) get_sh_arg_class (mode)]);
 }
 
 /* Return true if arg of the specified mode should be passed in a register
@@ -8067,7 +8081,7 @@ sh_function_arg_advance (cumulative_args_t ca_v, machine_mode mode,
 
   if (! ((TARGET_SH4 || TARGET_SH2A) || ca->renesas_abi)
   || sh_pass_in_reg_p (*ca, mode, type))
-(ca->arg_count[(int) GET_SH_ARG_CLASS (mode)]
+(ca->arg_count[(int) get_sh_arg_class (mode)]
  = (sh_round_reg (*ca, mode)
 	+ (mode == BLKmode
 	   ? CEIL (int_size_in_bytes (type), UNITS_PER_WORD)
diff --git a/gcc/config/sh/sh.h b/gcc/config/sh/sh.h
index 34dd135..f725535 100644
--- a/gcc/config/sh/sh.h
+++ b/gcc/config/sh/sh.h
@@ -1198,13 +1198,6 @@ extern bool current_function_interrupt;
 
 #endif // __cplusplus
 
-#define GET_SH_ARG_CLASS(MODE) \
-  ((TARGET_FPU_ANY && (MODE) == SFmode) \
-   ? SH_ARG_FLOAT \
-   : TARGET_FPU_DOUBLE && (GET_MODE_CLASS (MODE) == MODE_FLOAT \
-			   || GET_MODE_CLASS (MODE) == MODE_COMPLEX_FLOAT) \
- ? SH_ARG_FLOAT : SH_ARG_INT)
-
 /* Initialize a variable CUM of type CUMULATIVE_ARGS
for a call to a function whose data type is FNTYPE.
For a library call, FNTYPE is 0.


New Swedish PO file for 'gcc' (version 6.1.0)

2016-05-08 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Swedish team of translators.  The file is available at:

http://translationproject.org/latest/gcc/sv.po

(This file, 'gcc-6.1.0.sv.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




[PATCH, GCC] PR middle-end/55299, fold bitnot through ASR and rotates

2016-05-08 Thread Mikhail Maltsev
Hi!

I decided to revive this patch:
https://gcc.gnu.org/ml/gcc-patches/2015-06/msg00999.html.
I addressed review comments about sign conversions. Bootstrapped and regtested
on x86_64-linux-gnu {,-m32}. OK for trunk?
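
The identities behind the fold can be checked with a small standalone
program.  This is only an illustration on 32-bit values and not part
of the patch; the function names are made up for the example, and it
assumes that >> on a negative signed value is an arithmetic shift
(which GCC guarantees, but which is implementation-defined before
C++20).  It also shows why the fold must not be applied to logical
shifts.

```cpp
#include <cassert>
#include <cstdint>

// Arithmetic shift: ~(~a >> b) and a >> b should agree.
int32_t asr_unfolded(int32_t a, int b) { return ~((~a) >> b); }
int32_t asr_folded(int32_t a, int b) { return a >> b; }

// Logical (unsigned) shift: zeros are shifted in, so the fold is invalid.
uint32_t lsr_unfolded(uint32_t a, int b) { return ~((~a) >> b); }
uint32_t lsr_naive(uint32_t a, int b) { return a >> b; }

// 32-bit rotate left, valid for any n (taken modulo 32).
uint32_t rotl32(uint32_t x, unsigned n) {
  n &= 31u;
  return (x << n) | (x >> ((32u - n) & 31u));
}

// Returns true if the shift and rotate identities hold on sample inputs.
bool identities_hold() {
  const int32_t samples[] = {0, 1, -1, 42, -42,
                             INT32_MAX, INT32_MIN, 0x12345678};
  for (int32_t a : samples)
    for (int b = 0; b < 32; ++b) {
      if (asr_unfolded(a, b) != asr_folded(a, b))
        return false;
      uint32_t u = static_cast<uint32_t>(a);
      if (~rotl32(~u, static_cast<unsigned>(b))
          != rotl32(u, static_cast<unsigned>(b)))
        return false;
    }
  return true;
}
```

For the unsigned case, a = 0 and b = 1 already gives different
results: the unfolded form yields 0x80000000 while a plain shift
yields 0.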

-- 
Regards,
Mikhail Maltsev

gcc/testsuite/ChangeLog:

2016-05-08  Mikhail Maltsev  

PR tree-optimization/54579
PR middle-end/55299
* gcc.dg/fold-notrotate-1.c: New test.
* gcc.dg/fold-notshift-1.c: New test.
* gcc.dg/fold-notshift-2.c: New test.


gcc/ChangeLog:

2016-05-08  Mikhail Maltsev  

PR tree-optimization/54579
PR middle-end/55299
* match.pd (~(~X >> Y), ~(~X >>r Y), ~(~X > Y) -> X >> Y (for arithmetic shift).  */
+(simplify
+ (bit_not (convert? (rshift (bit_not @0) @1)))
+  (if (!TYPE_UNSIGNED (TREE_TYPE (@0)))
+   (convert (rshift @0 @1))))
+(simplify
+ (bit_not (convert? (rshift (convert@0 (bit_not @1)) @2)))
+  (if (!TYPE_UNSIGNED (TREE_TYPE (@0)))
+   (with
+{ tree shift_type = TREE_TYPE (@0); }
+ (convert (rshift:shift_type (convert @1) @2)))))
+
+/* Same as above, but for rotates.  */
+(for rotate (lrotate rrotate)
+ (simplify
+  (bit_not (convert1? (rotate (convert2? (bit_not @0)) @1)))
+   (with
+{ tree operand_type = TREE_TYPE (@0); }
+ (convert (rotate:operand_type @0 @1)))))
 
 /* Simplifications of conversions.  */
 
diff --git a/gcc/testsuite/gcc.dg/fold-notrotate-1.c b/gcc/testsuite/gcc.dg/fold-notrotate-1.c
new file mode 100644
index 000..a9b3804
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-notrotate-1.c
@@ -0,0 +1,54 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-optimized" } */
+
+#define INT_BITS  (sizeof (int) * __CHAR_BIT__)
+#define ROL(x, y) ((x) << (y) | (x) >> (INT_BITS - (y)))
+#define ROR(x, y) ((x) >> (y) | (x) << (INT_BITS - (y)))
+
+unsigned
+rol (unsigned a, unsigned b)
+{
+  return ~ROL (~a, b);
+}
+
+unsigned int
+ror (unsigned a, unsigned b)
+{
+  return ~ROR (~a, b);
+}
+
+int
+rol_conv1 (int a, unsigned b)
+{
+  return ~(int)ROL((unsigned)~a, b);
+}
+
+int
+rol_conv2 (int a, unsigned b)
+{
+  return ~ROL((unsigned)~a, b);
+}
+
+int
+rol_conv3 (unsigned a, unsigned b)
+{
+  return ~(int)ROL(~a, b);
+}
+
+#define LONG_BITS  (sizeof (long) * __CHAR_BIT__)
+#define ROLL(x, y) ((x) << (y) | (x) >> (LONG_BITS - (y)))
+#define RORL(x, y) ((x) >> (y) | (x) << (LONG_BITS - (y)))
+
+unsigned long
+roll (unsigned long a, unsigned long b)
+{
+  return ~ROLL (~a, b);
+}
+
+unsigned long
+rorl (unsigned long a, unsigned long b)
+{
+  return ~RORL (~a, b);
+}
+
+/* { dg-final { scan-tree-dump-not "~" "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/fold-notshift-1.c b/gcc/testsuite/gcc.dg/fold-notshift-1.c
new file mode 100644
index 000..674f3c7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-notshift-1.c
@@ -0,0 +1,62 @@
+/* PR tree-optimization/54579
+   PR middle-end/55299 */
+
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-cddce1" } */
+
+int
+asr1 (int a, int b)
+{
+  return ~((~a) >> b);
+}
+
+long
+asr1l (long a, long b)
+{
+  return ~((~a) >> b);
+}
+
+int
+asr_conv (unsigned a, unsigned b)
+{
+  return ~((int)~a >> b);
+}
+
+unsigned
+asr_conv2 (unsigned a, unsigned b)
+{
+  return ~(unsigned)((int)~a >> b);
+}
+
+unsigned
+asr_conv3 (int a, int b)
+{
+  return ~(unsigned)(~a >> b);
+}
+
+int
+asr2 (int a, int b)
+{
+  return -((-a - 1) >> b) - 1;
+}
+
+int
+asr3 (int a, int b)
+{
+  return a < 0 ? ~((~a) >> b) : a >> b;
+}
+
+long
+asr3l (long a, int b)
+{
+  return a < 0 ? ~((~a) >> b) : a >> b;
+}
+
+int
+asr4 (int a, int b)
+{
+  return a < 0 ? -((-a - 1) >> b) - 1 : a >> b;
+}
+
+/* { dg-final { scan-tree-dump-times ">>" 9 "cddce1" } } */
+/* { dg-final { scan-tree-dump-not "~" "cddce1" } } */
diff --git a/gcc/testsuite/gcc.dg/fold-notshift-2.c b/gcc/testsuite/gcc.dg/fold-notshift-2.c
new file mode 100644
index 000..5287610
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-notshift-2.c
@@ -0,0 +1,18 @@
+/* PR middle-end/55299 */
+
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-cddce1" } */
+
+unsigned int
+lsr (unsigned int a, unsigned int b)
+{
+  return ~((~a) >> b);
+}
+
+int
+sl (int a, int b)
+{
+  return ~((~a) << b);
+}
+
+/* { dg-final { scan-tree-dump-times "~" 4 "cddce1" } } */


[PATCH, i386]: Fix PR 70998, CE in pre_and_rev_post_order_compute, at cfganal.c

2016-05-08 Thread Uros Bizjak
Hello!

As exposed by r235906 [1], we should not widen DFmode memory access to
V2DFmode in the splitter.

Attached patch introduces two new patterns that use correct mode of
memory operand. These two patterns are appropriate for the
TARGET_SSE_PARTIAL_REG_DEPENDENCY splitters, as they don't need to
widen memory access.

2016-05-08  Uros Bizjak  

PR target/70998
* config/i386/sse.md (*sse2_vd_cvtsd2ss): New insn pattern.
(*sse2_vd_cvtss2sd): Ditto.
* config/i386/i386.md
(TARGET_SSE_PARTIAL_REG_DEPENDENCY float_truncate df->sf splitter):
Generate *sse2_vd_cvtsd2ss pattern.
(TARGET_SSE_PARTIAL_REG_DEPENDENCY float_extend sf->df splitter):
Generate *sse2_vd_cvtss2sd pattern.

Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

[1] https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=235906

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 236007)
+++ config/i386/i386.md (working copy)
@@ -5192,13 +5192,12 @@
   [(set (match_dup 0)
(vec_merge:V4SF
  (vec_duplicate:V4SF
-   (float_truncate:V2SF
+   (float_truncate:SF
  (match_dup 1)))
  (match_dup 0)
  (const_int 1)))]
 {
   operands[0] = lowpart_subreg (V4SFmode, operands[0], SFmode);
-  operands[1] = lowpart_subreg (V2DFmode, operands[1], DFmode);
   emit_move_insn (operands[0], CONST0_RTX (V4SFmode));
 })
 
@@ -5219,15 +5218,13 @@
|| TARGET_AVX512VL)"
   [(set (match_dup 0)
 (vec_merge:V2DF
-  (float_extend:V2DF
-(vec_select:V2SF
-  (match_dup 1)
-  (parallel [(const_int 0) (const_int 1)])))
-  (match_dup 0)
+ (vec_duplicate:V2DF
+   (float_extend:DF
+ (match_dup 1)))
+ (match_dup 0)
   (const_int 1)))]
 {
   operands[0] = lowpart_subreg (V2DFmode, operands[0], DFmode);
-  operands[1] = lowpart_subreg (V4SFmode, operands[1], SFmode);
   emit_move_insn (operands[0], CONST0_RTX (V2DFmode));
 })
 
Index: config/i386/sse.md
===
--- config/i386/sse.md  (revision 236007)
+++ config/i386/sse.md  (working copy)
@@ -4949,6 +4949,27 @@
(set_attr "prefix" "orig,orig,")
(set_attr "mode" "SF")])
 
+(define_insn "*sse2_vd_cvtsd2ss"
+  [(set (match_operand:V4SF 0 "register_operand" "=x,x,v")
+   (vec_merge:V4SF
+ (vec_duplicate:V4SF
+   (float_truncate:SF (match_operand:DF 2 "nonimmediate_operand" "x,m,vm")))
+ (match_operand:V4SF 1 "register_operand" "0,0,v")
+ (const_int 1)))]
+  "TARGET_SSE2"
+  "@
+   cvtsd2ss\t{%2, %0|%0, %2}
+   cvtsd2ss\t{%2, %0|%0, %2}
+   vcvtsd2ss\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "type" "ssecvt")
+   (set_attr "athlon_decode" "vector,double,*")
+   (set_attr "amdfam10_decode" "vector,double,*")
+   (set_attr "bdver1_decode" "direct,direct,*")
+   (set_attr "btver2_decode" "double,double,double")
+   (set_attr "prefix" "orig,orig,vex")
+   (set_attr "mode" "SF")])
+
 (define_insn "sse2_cvtss2sd"
   [(set (match_operand:V2DF 0 "register_operand" "=x,x,v")
(vec_merge:V2DF
@@ -4972,6 +4993,27 @@
(set_attr "prefix" "orig,orig,")
(set_attr "mode" "DF")])
 
+(define_insn "*sse2_vd_cvtss2sd"
+  [(set (match_operand:V2DF 0 "register_operand" "=x,x,v")
+   (vec_merge:V2DF
+ (vec_duplicate:V2DF
+   (float_extend:DF (match_operand:SF 2 "nonimmediate_operand" "x,m,vm")))
+ (match_operand:V2DF 1 "register_operand" "0,0,v")
+ (const_int 1)))]
+  "TARGET_SSE2"
+  "@
+   cvtss2sd\t{%2, %0|%0, %2}
+   cvtss2sd\t{%2, %0|%0, %2}
+   vcvtss2sd\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "type" "ssecvt")
+   (set_attr "amdfam10_decode" "vector,double,*")
+   (set_attr "athlon_decode" "direct,direct,*")
+   (set_attr "bdver1_decode" "direct,direct,*")
+   (set_attr "btver2_decode" "double,double,double")
+   (set_attr "prefix" "orig,orig,vex")
+   (set_attr "mode" "DF")])
+
 (define_insn "avx512f_cvtpd2ps512"
   [(set (match_operand:V8SF 0 "register_operand" "=v")
(float_truncate:V8SF


Re: [PATCH, GCC] PR middle-end/55299, fold bitnot through ASR and rotates

2016-05-08 Thread Marc Glisse

On Sun, 8 May 2016, Mikhail Maltsev wrote:


Hi!

I decided to revive this patch:
https://gcc.gnu.org/ml/gcc-patches/2015-06/msg00999.html.
I addressed review comments about sign conversions. Bootstrapped and regtested
on x86_64-linux-gnu {,-m32}. OK for trunk?


Hello,

are you sure that your transformations are safe for any kind of 
conversion?
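
Judging by the subject line, the folds in question move a bitwise NOT
through an arithmetic shift right (ASR) or a rotate.  The patch itself is
not quoted here, so the following is only a sketch of the underlying
identities, plus one hypothetical way a conversion could break them.  The
helper names (asr32, rotl32, bitnot_commutes, widening_breaks_it) are made
up for illustration and are not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Arithmetic shift right on a 32-bit pattern, N in 0..31, written on
   unsigned values so it does not depend on how >> treats negative ints.  */
static uint32_t asr32 (uint32_t x, unsigned n)
{
  uint32_t r = x >> n;
  if ((x & 0x80000000u) && n > 0)
    r |= ~(UINT32_MAX >> n);   /* replicate the sign bit */
  return r;
}

/* Rotate left by N bits (N taken modulo 32).  */
static uint32_t rotl32 (uint32_t x, unsigned n)
{
  n &= 31;
  return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Returns 1 if ~(x ASR n) == (~x) ASR n and ~rotl (x, n) == rotl (~x, n)
   hold for a handful of sample values and every shift count.  */
static int bitnot_commutes (void)
{
  static const uint32_t samples[] =
    { 0u, 1u, 0x7fffffffu, 0x80000000u, 0xdeadbeefu, 0xffffffffu };
  for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
    for (unsigned n = 0; n < 32; n++)
      {
        uint32_t x = samples[i];
        if ((uint32_t) ~asr32 (x, n) != asr32 (~x, n))
          return 0;
        if ((uint32_t) ~rotl32 (x, n) != rotl32 (~x, n))
          return 0;
      }
  return 1;
}

/* Hypothetical conversion case: an 8-bit value is zero-extended before
   the rotate.  Applying the NOT in the narrow type gives a different
   result from applying it after the 32-bit rotate.  Returns 1 if the
   two results differ.  */
static int widening_breaks_it (void)
{
  uint8_t x = 0x0f;
  uint32_t after = ~rotl32 ((uint32_t) x, 4);
  uint32_t before = rotl32 ((uint32_t) (uint8_t) ~x, 4);
  return after != before;
}
```

Whether the patch can actually produce the mixed-width case is exactly
the kind of thing the question is about; the sketch only shows why the
width in which the NOT is applied matters.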


--
Marc Glisse


Re: Simple bitop reassoc in match.pd (was: Canonicalize X u< X to UNORDERED_EXPR)

2016-05-08 Thread Marc Glisse

On Fri, 6 May 2016, Marc Glisse wrote:


2016-05-06  Marc Glisse  

gcc/
* fold-const.c (fold_binary_loc) [(X ^ Y) & Y]: Remove and merge with...
* match.pd ((X & Y) ^ Y): ... this.
((X & Y) & Y, (X | Y) | Y, (X ^ Y) ^ Y, (X & Y) & (X & Z), (X | Y)
| (X | Z), (X ^ Y) ^ (X ^ Z)): New transformations.

gcc/testsuite/
* gcc.dg/tree-ssa/bit-assoc.c: New testcase.
* gcc.dg/tree-ssa/pr69270.c: Adjust.
* gcc.dg/tree-ssa/vrp59.c: Disable forwprop.


Oops, I forgot about convert1/convert2 in match.pd, which may be relevant 
especially if @0 is a constant in the last transforms. Please ignore the 
patch for now, I'll resend after checking it.
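
For reference, the bit identities behind the left-hand sides named in the
ChangeLog can be checked exhaustively on 8-bit values.  This is a sketch
only: the right-hand sides below are the plain algebraic simplifications
and are an assumption, since the ChangeLog does not say what each pattern
is rewritten to, and the check ignores the conversion question raised
above.  The function name check_bitop_identities is made up.

```c
/* Returns 1 if every identity holds for all 8-bit X, Y and Z.  */
static int check_bitop_identities (void)
{
  for (unsigned x = 0; x < 256; x++)
    for (unsigned y = 0; y < 256; y++)
      {
        /* The two forms merged between fold-const.c and match.pd.  */
        if (((x ^ y) & y) != (~x & y)) return 0;
        if (((x & y) ^ y) != (~x & y)) return 0;

        /* Repeated operand.  */
        if (((x & y) & y) != (x & y)) return 0;
        if (((x | y) | y) != (x | y)) return 0;
        if (((x ^ y) ^ y) != x) return 0;

        for (unsigned z = 0; z < 256; z++)
          {
            /* Shared operand on both sides.  */
            if (((x & y) & (x & z)) != ((x & y) & z)) return 0;
            if (((x | y) | (x | z)) != ((x | y) | z)) return 0;
            if (((x ^ y) ^ (x ^ z)) != (y ^ z)) return 0;
          }
      }
  return 1;
}
```

All operands here have the same type; once a conversion sits between the
inner and outer operation, the identities need to be rechecked in the
types involved.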


--
Marc Glisse


Re: [wwwdocs] Buildstat update for 5.x

2016-05-08 Thread Gerald Pfeifer
On Sat, 23 Apr 2016, Tom G. Christensen wrote:
> Latest results for 5.x

That's quite a bit.  Thanks for the patch, I applied this.
(And sorry for missing this versus the other which I applied
right away.)

Gerald


[wwwdocs] PATCH for Re: Change in mirror setup

2016-05-08 Thread Gerald Pfeifer
On Fri, 6 May 2016, NOC wrote:
> We have changed our mirror server setup.
> 
> Could you please update the mirror list and replace:
> 
> France, Gravelines: http://mirror0.babylon.network/gcc/ |
> ftp://mirror0.babylon.network/gcc/ |
> rsync://mirror0.babylon.network/gcc/, thanks to Tim Semeijn
> (noc@babylon.network) at Babylon Network.
> France, Roubaix: http://mirror1.babylon.network/gcc/ |
> ftp://mirror1.babylon.network/gcc/ |
> rsync://mirror1.babylon.network/gcc/, thanks to Tim Semeijn
> (noc@babylon.network) at Babylon Network.
> 
> -with-
> 
> The Netherlands, Amsterdam: http://nl.mirror.babylon.network/gcc/ |
> ftp://nl.mirror.babylon.network/gcc/ |
> rsync://nl.mirror.babylon.network/gcc/, thanks to Tim Semeijn
> (noc@babylon.network) at Babylon Network.
> France, Roubaix: http://fr.mirror.babylon.network/gcc/ |
> ftp://fr.mirror.babylon.network/gcc/ |
> rsync://fr.mirror.babylon.network/gcc/, thanks to Tim Semeijn
> (noc@babylon.network) at Babylon Network.

Thanks for the heads up!

I believe my patch below, which I applied, implements these changes.
Please let me know if I missed (or misunderstood) anything.

Gerald

Index: mirrors.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/mirrors.html,v
retrieving revision 1.234
diff -u -r1.234 mirrors.html
--- mirrors.html 28 Feb 2016 20:59:34 -  1.234
+++ mirrors.html 8 May 2016 22:51:23 -
@@ -20,14 +20,9 @@
 France (no snapshots): ftp://ftp.lip6.fr/pub/gcc/";>ftp.lip6.fr, thanks to ftpmaint at lip6.fr
 France, Brittany: ftp://ftp.irisa.fr/pub/mirrors/gcc.gnu.org/gcc/";>ftp.irisa.fr, thanks to ftpmaint at irisa.fr
 France, Gravelines:
-  http://mirror0.babylon.network/gcc/";>http://mirror0.babylon.network/gcc/ |
-  ftp://mirror0.babylon.network/gcc/";>ftp://mirror0.babylon.network/gcc/ |
-  rsync://mirror0.babylon.network/gcc/,
-  thanks to Tim Semeijn (noc@babylon.network) at Babylon Network.
-France, Roubaix:
-  http://mirror1.babylon.network/gcc/";>http://mirror1.babylon.network/gcc/ |
-  ftp://mirror1.babylon.network/gcc/";>ftp://mirror1.babylon.network/gcc/ |
-  rsync://mirror1.babylon.network/gcc/,
+  http://fr.mirror.babylon.network/gcc/";>http://fr.mirror.babylon.network/gcc/ |
+  ftp://fr.mirror.babylon.network/gcc/";>ftp://fr.mirror.babylon.network/gcc/ |
+  rsync://fr.mirror.babylon.network/gcc/,
   thanks to Tim Semeijn (noc@babylon.network) at Babylon Network.
 France, Versailles: ftp://ftp.uvsq.fr/pub/gcc/";>ftp.uvsq.fr, thanks to ftpmaint at uvsq.fr
 Germany, Berlin: ftp://ftp.fu-berlin.de/unix/languages/gcc/";>ftp.fu-berlin.de, thanks to ftp at fu-berlin.de
@@ -38,6 +33,11 @@
 Hungary, Budapest: http://robotlab.itk.ppke.hu/gcc/";>robotlab.itk.ppke.hu, thanks to Adam Rak (neurhlp at gmail.com)
 Japan: ftp://ftp.dti.ad.jp/pub/lang/gcc/";>ftp.dti.ad.jp, thanks to IWAIZAKO Takahiro (ftp-admin at dti.ad.jp)
 Japan: http://ftp.tsukuba.wide.ad.jp/software/gcc/";>ftp.tsukuba.wide.ad.jp, thanks to Kohei Takahashi (tsukuba-ftp-servers at tsukuba.wide.ad.jp)
+The Netherlands, Amsterdam:
+  http://nl.mirror.babylon.network/gcc/";>http://nl.mirror.babylon.network/gcc/ |
+  ftp://nl.mirror.babylon.network/gcc/";>ftp://nl.mirror.babylon.network/gcc/ |
+  rsync://nl.mirror.babylon.network/gcc/,
+  thanks to Tim Semeijn (noc@babylon.network) at Babylon Network.
 The Netherlands, Nijmegen: ftp://ftp.nluug.nl/mirror/languages/gcc";>ftp.nluug.nl, thanks to Jan Cristiaan van Winkel (jc at ATComputing.nl)
 Slovakia, Bratislava: http://gcc.fyxm.net/";>gcc.fyxm.net, thanks to Jan Teluch (admin at 2600.sk)
 UK: ftp://ftp.mirrorservice.org/sites/sourceware.org/pub/gcc/";>ftp://ftp.mirrorservice.org/sites/sourceware.org/pub/gcc/, thanks to mirror at mirrorservice.org


[patch committed SH] Remove extra colon from subtarget specs

2016-05-08 Thread Kaz Kojima
Hi,

I've committed the attached obvious patch to fix some failures

  Assembler messages:
  Error: can't open :-isa=sh4-up for reading: No such file or directory

with testcases which specify -mrelax on sh4-unknown-linux-gnu.
Tested with "make -k check" on sh4-unknown-linux-gnu.
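
To illustrate why the extra colon matters: in a spec of the form
%{!S:X}, the text X after the first colon is substituted when switch S
is not given, so a second colon becomes part of X and reaches the
assembler as ":-isa=sh4-up".  The following is a toy model of that single
rule, not GCC's spec parser; it handles one non-nested %{!S:X} only, and
the function name expand_negated is made up.

```c
#include <stddef.h>
#include <string.h>

/* Toy expansion of a single, non-nested "%{!S:X}" spec.  Copies X into
   OUT when switch S was not given, otherwise leaves OUT empty.  Only
   the first ':' separates S from X; anything after it belongs to X.  */
static const char *expand_negated (const char *spec, int switch_given,
                                   char *out, size_t outsz)
{
  const char *colon = strchr (spec, ':');
  const char *close = strrchr (spec, '}');

  out[0] = '\0';
  if (!colon || !close || close < colon || switch_given)
    return out;

  size_t len = (size_t) (close - colon - 1);
  if (len >= outsz)
    len = outsz - 1;
  memcpy (out, colon + 1, len);
  out[len] = '\0';
  return out;
}
```

With the old string "%{!m3*::-isa=sh4-up}" the model yields
":-isa=sh4-up", which matches the file name in the assembler error above;
with the fixed string it yields "-isa=sh4-up".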

Regards,
kaz
--
2016-05-08  Kaz Kojima  

* config/sh/sh.h (SUBTARGET_ASM_RELAX_SPEC): Remove extra colon.
(SUBTARGET_ASM_ISA_SPEC): Likewise.

diff --git a/config/sh/sh.h b/config/sh/sh.h
index 16b4a8e..548a084 100644
--- a/config/sh/sh.h
+++ b/config/sh/sh.h
@@ -201,7 +201,7 @@ extern int code_for_indirect_jump_scratch;
   SUBTARGET_EXTRA_SPECS
 
 #if TARGET_CPU_DEFAULT & MASK_HARD_SH4
-#define SUBTARGET_ASM_RELAX_SPEC "%{!m1:%{!m2:%{!m3*::-isa=sh4-up}}}"
+#define SUBTARGET_ASM_RELAX_SPEC "%{!m1:%{!m2:%{!m3*:-isa=sh4-up}}}"
 #else
 #define SUBTARGET_ASM_RELAX_SPEC "%{m4*:-isa=sh4-up}"
 #endif
@@ -245,7 +245,7 @@ extern int code_for_indirect_jump_scratch;
 /* Strict nofpu means that the compiler should tell the assembler
to reject FPU instructions. E.g. from ASM inserts.  */
 #if TARGET_CPU_DEFAULT & MASK_HARD_SH4 && !(TARGET_CPU_DEFAULT & MASK_SH_E)
-#define SUBTARGET_ASM_ISA_SPEC "%{!m1:%{!m2:%{!m3*:%{m4-nofpu|!m4*::-isa=sh4-nofpu"
+#define SUBTARGET_ASM_ISA_SPEC "%{!m1:%{!m2:%{!m3*:%{m4-nofpu|!m4*:-isa=sh4-nofpu"
 #else
 
 #define SUBTARGET_ASM_ISA_SPEC \


Re: [PATCH] add -fprolog-pad=N option to c-family

2016-05-08 Thread AKASHI Takahiro
Hi,

Let me make some comments from the kernel side.

On Thu, Apr 28, 2016 at 11:58:25AM +0100, Szabolcs Nagy wrote:
> On 28/04/16 09:47, Maxim Kuvyrkov wrote:
> >> On Apr 27, 2016, at 7:26 PM, Szabolcs Nagy  wrote:
> >>
> >> with -mfentry, by default the user only has to
> >> implement the fentry call (linux wants nops there, but
> >> e.g. glibc could use -pg -mfentry for profiling on
> >> aarch64 and the target specific details are easier to
> >> document for an -m option than for something general).
> > 
> > I don't understand your point here, could you elaborate, please?
> > 
> 
> if we only provide -mfentry then
> 
> - the kernel can use it (they have tools to nop patch the binary),

Do you mean scripts/recordmcount.{c,pl}?
This tool generates the __mcount_loc section, which contains a list of
the locations of the mcount/fentry callsites; it does not make any
changes to the kernel binary.

> - others who don't want to fiddle with nops, just have the call,
> can also use it (e.g. user-space profiling cannot really use
> something that needs binary patching in case the user prefers
> -pg -mfentry over the current -pg behaviour).

Well, -mfentry is simple and works perfectly on x86, but it does not seem
to be the best fit for arm, given that -mfentry means inserting a callsite
at the very beginning of a function. See the thread of discussions about
-mfentry on arm64.

> - it's target specific, so the magic abi of the fentry call can
> be documented by the target according to the specific instruction
> sequence that is used. (with nop-padding there are psabi and
> compiler optimization interactions that may be hard to document
> in a generic way and letting the user figure it out may cause
> problems later in compiler development.. but i'm just speculating
> based on the powerpc toc handling and ipa-ra findings.)
> 
> >> the nop-padding is more general, but the size and
> >> layout of nops and the call abi will be target
> >> specific and the user will most likely need to modify
> >> the binary (to get the right sequence) which needs
> >> additional tooling.  i don't know who might use it
> >> other than linux (which already has tools to deal with
> >> -mfentry).

Please note that code patching (nop padding) is entirely up to the kernel
and its arch-specific code. The kernel will do that either
- at the initialization of kernel ftrace, or
- dynamically at runtime, on the user's instruction (through sysfs)

The tool (recordmcount) will never interact with the kernel at runtime.
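
The split described above (a build-time tool records where the callsites
are, and the kernel patches them later) can be sketched as follows.  This
is a user-space toy, not kernel code: the function name patch_callsites,
the offset table and the byte buffer are made up, and the x86 encodings
(0xe8 for "call rel32" and a 5-byte NOP) are used purely for
illustration; other architectures use different instruction lengths.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { INSN_LEN = 5 };   /* length of an x86 "call rel32" */

/* A 5-byte x86 NOP, used here as the replacement sequence.  */
static const uint8_t nop5[INSN_LEN] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

/* Replace each call listed in LOCS (offsets into TEXT) with a NOP of
   the same size.  Entries that are out of range or do not point at a
   call opcode are skipped.  Returns the number of sites patched.  */
static size_t patch_callsites (uint8_t *text, size_t text_len,
                               const size_t *locs, size_t nlocs)
{
  size_t patched = 0;

  for (size_t i = 0; i < nlocs; i++)
    {
      size_t off = locs[i];
      if (off > text_len || text_len - off < INSN_LEN)
        continue;               /* would run past the buffer */
      if (text[off] != 0xe8)
        continue;               /* not a call instruction */
      memcpy (text + off, nop5, INSN_LEN);
      patched++;
    }
  return patched;
}
```

Here the LOCS table plays the role of the recorded callsite list, and
patch_callsites stands in for the arch-specific code that runs at
initialization or on request at runtime.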

> > 
> > Right, but this tooling will require minimal (if any) changes
> > to be adapted to nop-pad approach.  If I remember correctly,
> > recent versions of GCC and kernel for x86_64 generate NOPs,
> > not the call sequence in the prologs when -mfentry is used.

I think that Maxim mentioned the following x86-specific gcc options:
- -mrecord-mcount
- -mnop-mcount
but as far as I checked, the current kernel does not utilize these
options.

> i'm trying to find where this happens in the kernel, but
> i only see scripts/recordmcount.{c,pl} which are based on
> nop patching the fentry/mcount call sites.
> 
> without such call sites the tools have to be implemented
> differently and the way the kernel records the call site
> positions might not match the prolog-pad recording.

Where the callsite resides in a given nop sequence will depend on arch,
but again, this issue can be handled by arch-specific code.

Thanks,
-Takahiro AKASHI