Re: Patch ping (stage1-ish patches)

2013-11-28 Thread Jakub Jelinek
On Wed, Nov 27, 2013 at 01:06:06PM -0700, Jeff Law wrote:
> +   HOST_WIDE_INT offset, sz;
> +   sz = ASAN_RED_ZONE_SIZE;
> +   sz = data.asan_vec[0] - prev_offset;
> 
> Seems to me like the first assignment to sz is dead.  Clearly
> something isn't right here.

Thanks for catching that; yeah, the above comes from reusing
the sz variable both for the red zone size (what is being computed)
and as a helper temporary for the total size of the asan stack frame
so far, which is needed three times in the computation.

I've used a new redzonesz variable for the former to make it clearer.
Here is what I've committed in the end, after retesting it on x86_64-linux.

2013-11-28  Jakub Jelinek  

* cfgexpand.c (struct stack_vars_data): Add asan_base and asan_alignb
fields.
(expand_stack_vars): For -fsanitize=address, use (and set initially)
data->asan_base as base for vars and update asan_alignb.
(expand_used_vars): Initialize data.asan_base and data.asan_alignb.
Pass them to asan_emit_stack_protection.
* asan.c (asan_detect_stack_use_after_return): New variable.
(asan_emit_stack_protection): Add pbase and alignb arguments.
Implement use after return sanitization.
* asan.h (asan_emit_stack_protection): Adjust prototype.
(ASAN_STACK_MAGIC_USE_AFTER_RET, ASAN_STACK_RETIRED_MAGIC): Define.

--- gcc/asan.c.jj   2013-11-27 18:02:47.984814523 +0100
+++ gcc/asan.c  2013-11-28 08:36:28.740704722 +0100
@@ -237,6 +237,9 @@ alias_set_type asan_shadow_set = -1;
alias set is used for all shadow memory accesses.  */
 static GTY(()) tree shadow_ptr_types[2];
 
+/* Decl for __asan_option_detect_stack_use_after_return.  */
+static GTY(()) tree asan_detect_stack_use_after_return;
+
 /* Hashtable support for memory references used by gimple
statements.  */
 
@@ -950,20 +953,26 @@ asan_function_start (void)
and DECLS is an array of representative decls for each var partition.
LENGTH is the length of the OFFSETS array, DECLS array is LENGTH / 2 - 1
elements long (OFFSETS include gap before the first variable as well
-   as gaps after each stack variable).  */
+   as gaps after each stack variable).  PBASE is, if non-NULL, some pseudo
+   register which stack vars DECL_RTLs are based on.  Either BASE should be
+   assigned to PBASE, when not doing use after return protection, or
+   corresponding address based on __asan_stack_malloc* return value.  */
 
 rtx
-asan_emit_stack_protection (rtx base, HOST_WIDE_INT *offsets, tree *decls,
-   int length)
+asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
+   HOST_WIDE_INT *offsets, tree *decls, int length)
 {
-  rtx shadow_base, shadow_mem, ret, mem;
+  rtx shadow_base, shadow_mem, ret, mem, orig_base, lab;
   char buf[30];
   unsigned char shadow_bytes[4];
-  HOST_WIDE_INT base_offset = offsets[length - 1], offset, prev_offset;
+  HOST_WIDE_INT base_offset = offsets[length - 1];
+  HOST_WIDE_INT base_align_bias = 0, offset, prev_offset;
+  HOST_WIDE_INT asan_frame_size = offsets[0] - base_offset;
   HOST_WIDE_INT last_offset, last_size;
   int l;
   unsigned char cur_shadow_byte = ASAN_STACK_MAGIC_LEFT;
   tree str_cst, decl, id;
+  int use_after_return_class = -1;
 
   if (shadow_ptr_types[0] == NULL_TREE)
 asan_init_shadow_ptr_types ();
@@ -993,10 +1002,67 @@ asan_emit_stack_protection (rtx base, HO
   str_cst = asan_pp_string (&asan_pp);
 
   /* Emit the prologue sequence.  */
+  if (asan_frame_size > 32 && asan_frame_size <= 65536 && pbase)
+{
+  use_after_return_class = floor_log2 (asan_frame_size - 1) - 5;
+  /* __asan_stack_malloc_N guarantees alignment
+ N < 6 ? (64 << N) : 4096 bytes.  */
+  if (alignb > (use_after_return_class < 6
+   ? (64U << use_after_return_class) : 4096U))
+   use_after_return_class = -1;
+  else if (alignb > ASAN_RED_ZONE_SIZE && (asan_frame_size & (alignb - 1)))
+   base_align_bias = ((asan_frame_size + alignb - 1)
+  & ~(alignb - HOST_WIDE_INT_1)) - asan_frame_size;
+}
+  if (use_after_return_class == -1 && pbase)
+emit_move_insn (pbase, base);
   base = expand_binop (Pmode, add_optab, base,
-  gen_int_mode (base_offset, Pmode),
+  gen_int_mode (base_offset - base_align_bias, Pmode),
   NULL_RTX, 1, OPTAB_DIRECT);
+  orig_base = NULL_RTX;
+  if (use_after_return_class != -1)
+{
+  if (asan_detect_stack_use_after_return == NULL_TREE)
+   {
+ id = get_identifier ("__asan_option_detect_stack_use_after_return");
+ decl = build_decl (BUILTINS_LOCATION, VAR_DECL, id,
+integer_type_node);
+ SET_DECL_ASSEMBLER_NAME (decl, id);
+ TREE_ADDRESSABLE (decl) = 1;
+ DECL_ARTIFICIAL (decl) = 1;
+ DECL_IGNORED_P (decl) = 1;
+ DECL_EXTERNAL (decl) = 1;
+ TREE_STATIC 

Re: [PING^2] [PATCH] PR59063

2013-11-28 Thread Jakub Jelinek
On Wed, Nov 27, 2013 at 05:52:30PM +0400, Yury Gribov wrote:
> > Perhaps it is time for a libsanitizer.spec, filled in during
> > configure of libsanitizer,
> > that the spec would source in?
> 
> Draft patch is attached, let's see if I understood your
> recommendation correctly. Some obvious quirks:
> 1) I didn't add link_libubsan/link_liblsan because they seem to be
> happy with default libs from %(link_sanitizer).
> 2) I left LIBASAN_EARLY_SPEC/LIBASAN_SPEC logic in gnu-user.h and gcc.c
> because they rely on LD_STATIC_OPTION and HAVE_LD_STATIC_DYNAMIC
> which are defined in gcc/configure
> and thus not available in libsanitizer/configure.

Looks basically OK; my preference perhaps would be not to put
link_sanitizer into the Makefile*/libsanitizer.spec/gcc.c at all, but
instead use it solely as a configure.ac/configure internal variable
and set all of link_{a,t,l,ub}san to it plus the extra libs needed
by each of those.  The fact that they have some common libs is IMHO just
an internal detail that doesn't need to be exposed outside of libsanitizer.

> +  " %{static-libasan:%:include(libsanitizer.spec)%(link_libasan) 
> %(link_sanitizer)}"

So this would have s/ %(link_sanitizer)//.

> +  " %{static-liblsan:%:include(libsanitizer.spec) %(link_sanitizer)}"

Note that for asan you didn't use a space between %:include and %(link_*,
but here you did.

Jakub


Re: [PATCH] Get rid of useless -fno-rtti for libubsan

2013-11-28 Thread Jakub Jelinek
On Wed, Nov 27, 2013 at 12:01:50PM +0400, Yury Gribov wrote:
> As discussed in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59106
> only a subset of libubsan should be built with RTTI support.
> Attached patch adds custom build rules for relevant files.

We assume GNU make, I think; I wonder if this can't be done by
just adding -fno-rtti to AM_CXXFLAGS and saying
ubsan_handlers_cxx.% ubsan_type_hash.% : AM_CXXFLAGS += -frtti

Jakub


Re: [PATCH] OpenMP #pragma omp declare simd support (take 2)

2013-11-28 Thread Andreas Schwab
Causes an ICE on ia64.

spawn /usr/local/gcc/gcc-20131128/Build/gcc/xgcc 
-B/usr/local/gcc/gcc-20131128/Build/gcc/ 
/usr/local/gcc/gcc-20131128/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c 
-fno-diagnostics-show-caret -fdiagnostics-color=never -flto -ffat-lto-objects 
-ftree-vectorize -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details 
-fopenmp-simd -lm -o ./vect-simd-clone-1.exe
/usr/local/gcc/gcc-20131128/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c:56:1: 
internal compiler error: tree code 'omp_clause' is not supported in LTO streams
0x4081c83f DFS_write_tree
../../gcc/lto-streamer-out.c:1250
0x4081c0bf DFS_write_tree_body
../../gcc/lto-streamer-out.c:588
0x4081c0bf DFS_write_tree
../../gcc/lto-streamer-out.c:1158
0x4081c0bf DFS_write_tree_body
../../gcc/lto-streamer-out.c:588
0x4081c0bf DFS_write_tree
../../gcc/lto-streamer-out.c:1158
0x4081bb5f DFS_write_tree_body
../../gcc/lto-streamer-out.c:502
0x4081bb5f DFS_write_tree
../../gcc/lto-streamer-out.c:1158
0x4081ddef lto_output_tree(output_block*, tree_node*, bool, bool)
../../gcc/lto-streamer-out.c:1340
0x40814a3f write_global_stream
../../gcc/lto-streamer-out.c:2050
0x40822dbf lto_output_decl_state_streams
../../gcc/lto-streamer-out.c:2094
0x40822dbf produce_asm_for_decls()
../../gcc/lto-streamer-out.c:2379
0x408a60bf write_lto
../../gcc/passes.c:2283
0x408ade3f ipa_write_summaries_1
../../gcc/passes.c:2342
0x408ade3f ipa_write_summaries()
../../gcc/passes.c:2399
0x403da03f ipa_passes
../../gcc/cgraphunit.c:2030
0x403da03f compile()
../../gcc/cgraphunit.c:2126
0x403dafdf finalize_compilation_unit()
../../gcc/cgraphunit.c:2280
0x4018f44f c_write_global_declarations()
../../gcc/c/c-decl.c:10389

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [Patch, RTL] Eliminate redundant vec_select moves.

2013-11-28 Thread Richard Sandiford
Tejas Belagod  writes:
> Richard Sandiford wrote:
>> Tejas Belagod  writes:
 The problem is that one reg rtx can span several hard registers.
 E.g. (reg:V4SI 32) might represent one 64-bit register (no. 32),
 but it might instead represent two 32-bit registers (nos. 32 and 33).
 Obviously the latter's not very likely for vectors this small,
 but more likely for larger ones (including on NEON IIRC).

 So if we had 2 32-bit registers being treated as a V4HI, it would be:

<--32--><--33-->
msb  lsb



msb  lsb
<--32-->

 for big endian and:

<--33--><--32-->
msb  lsb



msb  lsb
<--32-->

 for little endian.
>>> Ah, ok, that makes things clearer. Thanks for that.
>>>
>>> I can't find any helper function that figures out if we're writing
>>> partial or
>>> full result regs. Would something like
>>>
>>>  REGNO (src) == REGNO (dst) &&
>>>  HARD_REGNO_NREGS (src) == HARD_REGNO_NREGS (dst) == 1
>>>
>>> be a sane check for partial result regs?
>> 
>> Yeah, that should work.  I think a more general alternative would be:
>> 
>>   simplify_subreg_regno (REGNO (src), GET_MODE (src),
>>  offset, GET_MODE (dst)) == (int) REGNO (dst)
>> 
>> where:
>> 
>>   offset = GET_MODE_UNIT_SIZE (GET_MODE (src)) * INTVAL (XVECEXP (sel, 0))
>> 
>> That offset is the byte offset of the first selected element from the
>> start of a vector in memory, which is also the way that SUBREG_BYTEs
>> are counted.  For little-endian it gives the offset of the lsb of the
>> slice, while for big-endian it gives the offset of the msb (which is
>> also how SUBREG_BYTEs work).
>> 
>> The simplify_subreg_regno should cope with both single-register vectors
>> and multi-register vectors.
>
> Sorry for the delayed response to this.
>
> Thanks for the tip. Here's an improved patch that implements the
> simplify_subreg_regno () method of eliminating redundant moves. Regarding
> the test case, I failed to get the ppc back-end to generate the RTL pattern
> that this patch checks for. I can easily write a test case for aarch64
> (big and little endian) along these lines:
>
> typedef float float32x4_t __attribute__ ((__vector_size__ (16)));
>
> float foo_be (float32x4_t x)
> {
>return x[3];
> }
>
> float foo_le (float32x4_t x)
> {
>return x[0];
> }
>
> where I know that the vector indexing will generate a vec_select on
> the same src and dst regs that could be optimized away and hence test
> it. But I'm struggling to get a test case that the ppc altivec
> back-end will generate such a vec_select for. I see that altivec does
> not define vec_extract, so a simple indexing like this seems to happen
> via memory. Also, I don't know enough about the ppc PCS or
> architecture to write a test that will check for this optimization
> opportunity on same src and dst hard-registers. Any hints?

Me neither, sorry.

FWIW, the MIPS tests:

  typedef float float32x2_t __attribute__ ((__vector_size__ (8)));
  void bar (float);
  void foo_be (float32x2_t x) { bar (x[1]); }
  void foo_le (float32x2_t x) { bar (x[0]); }

also exercise it, but I don't think they add anything over the aarch64
versions.  I can add them to the testsuite anyway if it helps though.

> diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
> index 0cd0c7e..ca25ce5 100644
> --- a/gcc/rtlanal.c
> +++ b/gcc/rtlanal.c
> @@ -1180,6 +1180,22 @@ set_noop_p (const_rtx set)
>dst = SUBREG_REG (dst);
>  }
>  
> +  /* It is a NOOP if destination overlaps with selected src vector
> + elements.  */
> +  if (GET_CODE (src) == VEC_SELECT
> +  && REG_P (XEXP (src, 0)) && REG_P (dst)
> +  && HARD_REGISTER_P (XEXP (src, 0))
> +  && HARD_REGISTER_P (dst))
> +{
> +  rtx par = XEXP (src, 1);
> +  rtx src0 = XEXP (src, 0);
> +  HOST_WIDE_INT offset =
> + GET_MODE_UNIT_SIZE (GET_MODE (src0)) * INTVAL (XVECEXP (par, 0, 0));
> +
> +  return simplify_subreg_regno (REGNO (src0), GET_MODE (src0),
> + offset, GET_MODE (dst)) == (int)REGNO (dst);
> +}
> +

Since this also (correctly) triggers for vector results, we need to keep
the check for consecutive indices that you had originally.  (It's always
the first index that should be used for the simplify_subreg_regno though.)

Looks good to me otherwise, thanks.

Richard.


Re: [Patch, ARM] Fix ICE when high register is used as pic base register for thumb1 target

2013-11-28 Thread Richard Earnshaw
On 28/11/13 05:55, Terry Guo wrote:
> 
> 
>> -Original Message-
>> From: Richard Earnshaw
>> Sent: Tuesday, November 26, 2013 5:44 PM
>> To: Terry Guo
>> Cc: Ramana Radhakrishnan; gcc-patches@gcc.gnu.org
>> Subject: Re: [Patch, ARM] Fix ICE when high register is used as pic base
>> register for thumb1 target
>>
>> On 26/11/13 04:18, Terry Guo wrote:
>>> Hi,
>>>
>>> This patch intends to fix ICE when high register is used for pic base
>>> register for thumb1 target. Tested with gcc regression test, no new
>>> regressions. Is it OK to trunk?
>>>
>>> BR,
>>> Terry
>>>
>>> gcc/ChangeLog:
>>>
>>> 2013-11-26  Terry Guo  
>>>
>>> * config/arm/arm.c (require_pic_register): Handle high pic
>>> base register for
>>> thumb-1.
>>> (arm_load_pic_register): Also initialize high pic base register.
>>> * doc/invoke.texi: Update documentation for option
> -mpic-register.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2013-11-26  Terry Guo  
>>>
>>> * gcc.target/arm/thumb1-pic-high.c: New case.
>>> * gcc.target/arm/thumb1-pic-single-base.c: New case.
>>>
>>>
>>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index
>>> 501d080..f0b46e9 100644
>>> --- a/gcc/doc/invoke.texi
>>> +++ b/gcc/doc/invoke.texi
>>> @@ -12216,8 +12216,11 @@ before execution begins.
>>>
>>>  @item -mpic-register=@var{reg}
>>>  @opindex mpic-register
>>> -Specify the register to be used for PIC addressing.  The default is
>>> R10 -unless stack-checking is enabled, when R9 is used.
>>> +Specify the register to be used for PIC addressing.
>>> +For standard PIC base case, the default will be any suitable register
>>> +determined by compiler.  For single PIC base case, the default is R9
>>> +if target is EABI based or stack-checking is enabled, otherwise the
>>> +default is R10.
>>>
>>
>> Please can you put @samp{} around the uses of R9 and R10.
>> Otherwise, OK.
>> R.
>>
> 
> Thanks Richard. The updated patch is committed to trunk. Is it OK to
> backport to FSF 4.8 branch as a bug fix?
> 

Yes.


R.




[PING ^ 2] [PATCH 1/n] Add conditional compare support

2013-11-28 Thread Zhenqiang Chen
Hi,

The patch is rebased against the latest trunk with these changes:
* simplify_while_replacing (recog.c) should not swap operands of compares
in CCMP.
* make sure no other instructions can clobber CC except the compares in
CCMP when expanding CCMP.

To make it easier to read, I add the following description (in expr.c).

  The following functions expand conditional compare (CCMP) instructions.
  Here is a short description of the overall algorithm:
 * ccmp_candidate_p is used to identify a CCMP candidate.

 * expand_ccmp_expr is the main entry, which calls expand_ccmp_expr_1
   to expand CCMP.

 * expand_ccmp_expr_1 uses a recursive algorithm to expand CCMP.
   It calls the two target hooks gen_ccmp_first and gen_ccmp_next to
   generate CCMP instructions.
 - gen_ccmp_first expands the first compare in CCMP.
 - gen_ccmp_next expands the following compares.

   Another hook, select_ccmp_cmp_order, is called to determine which
   compare is done first, since not all combinations of compares are
   legal on some targets, like ARM.  We might get more opportunities
   by swapping the compares.

   During expansion, we must make sure that no instruction can clobber
   the CC reg except the compares.  So clobber_cc_p and check_clobber_cc
   are introduced to do the check.

 * If the final result is not used in a COND_EXPR (checked by function
   used_in_cond_stmt_p), the cstorecc4 pattern is called to store the CC
   into a general register.
Bootstrapped with no make check regressions on x86-64 and an ARM Chromebook.

ChangeLog:
2013-11-28  Zhenqiang Chen  

* config/arm/arm-protos.h (arm_select_dominance_ccmp_mode,
arm_ccmode_to_code): New prototypes.
* config/arm/arm.c (arm_select_dominance_cc_mode_1): New function
extracted from arm_select_dominance_cc_mode.
(arm_ccmode_to_code, arm_code_to_ccmode, arm_convert_to_SImode,
arm_select_dominance_ccmp_mode): New functions.
(arm_select_ccmp_cmp_order, arm_gen_ccmp_first, arm_gen_ccmp_next):
New hooks.
(arm_select_dominance_cc_mode): Call arm_select_dominance_cc_mode_1.
* config/arm/arm.md (cbranchcc4, cstorecc4, ccmp_and, ccmp_ior): New
instruction patterns.
* doc/md.texi (ccmp): New index.
* doc/tm.texi (TARGET_SELECT_CCMP_CMP_ORDER, TARGET_GEN_CCMP_FIRST,
TARGET_GEN_CCMP_NEXT): New hooks.
* doc/tm.texi.in (TARGET_SELECT_CCMP_CMP_ORDER, TARGET_GEN_CCMP_FIRST,
TARGET_GEN_CCMP_NEXT): New hooks.
* expmed.c (emit_cstore): Make it global.
* expr.c: Include tree-phinodes.h and ssa-iterators.h.
(ccmp_candidate_p, used_in_cond_stmt_p, check_clobber_cc, clobber_cc_p,
gen_ccmp_next, expand_ccmp_expr_1, expand_ccmp_expr): New functions.
(expand_expr_real_1): Handle conditional compare.
* optabs.c (get_rtx_code): Make it global and handle BIT_AND_EXPR and
BIT_IOR_EXPR.
* optabs.h (get_rtx_code, emit_cstore): New prototypes.
* recog.c (ccmp_insn_p): New function.
(simplify_while_replacing): Do not swap conditional compare insn.
* target.def (select_ccmp_cmp_order, gen_ccmp_first, gen_ccmp_next):
Define hooks.
* targhooks.c (default_select_ccmp_cmp_order): New function.
* targhooks.h (default_select_ccmp_cmp_order): New prototypes.

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index c5b16da..e3162c1 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -117,6 +117,9 @@ extern bool gen_movmem_ldrd_strd (rtx *);
 extern enum machine_mode arm_select_cc_mode (RTX_CODE, rtx, rtx);
 extern enum machine_mode arm_select_dominance_cc_mode (rtx, rtx,
   HOST_WIDE_INT);
+extern enum machine_mode arm_select_dominance_ccmp_mode (rtx, enum machine_mode,
+HOST_WIDE_INT);
+enum rtx_code arm_ccmode_to_code (enum machine_mode mode);
 extern rtx arm_gen_compare_reg (RTX_CODE, rtx, rtx, rtx);
 extern rtx arm_gen_return_addr_mask (void);
 extern void arm_reload_in_hi (rtx *);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 129e428..b0fc4f4 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -287,6 +287,12 @@ static unsigned arm_add_stmt_cost (void *data, int count,
 static void arm_canonicalize_comparison (int *code, rtx *op0, rtx *op1,
 bool op0_preserve_value);
 static unsigned HOST_WIDE_INT arm_asan_shadow_offset (void);
+static int arm_select_ccmp_cmp_order (int, int);
+static rtx arm_gen_ccmp_first (int, rtx, rtx);
+static rtx arm_gen_ccmp_next (rtx, int, rtx, rtx, int);
+static enum machine_mode arm_select_dominance_cc_mode_1 (enum rtx_code cond1,
+

[PATCH, nds32] Committed: Adjust MULT performance cost.

2013-11-28 Thread Chung-Ju Wu
Hi, all,

The multiplication operation is low cost on the nds32 target.
COSTS_N_INSNS (5) overstates it and hurts performance, so adjust
the MULT cost to COSTS_N_INSNS (1).

Committed as Rev. 205478: http://gcc.gnu.org/r205478


Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 205477)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2013-11-28  Chung-Ju Wu  
+
+   * config/nds32/nds32.c (nds32_rtx_costs): Adjust MULT cost if it is
+   not optimized for size.
+
 2013-11-28  Jakub Jelinek  

* cfgexpand.c (struct stack_vars_data): Add asan_base and asan_alignb

Index: gcc/config/nds32/nds32.c
===
--- gcc/config/nds32/nds32.c(revision 205477)
+++ gcc/config/nds32/nds32.c(working copy)
@@ -2471,7 +2471,7 @@
   break;

 case MULT:
-  *total = COSTS_N_INSNS (5);
+  *total = COSTS_N_INSNS (1);
   break;

 case DIV:


Best regards,
jasonwucj


Re: [PATCH, testsuite] Fix some testcases for nds32 target and provide new nds32 target specific tests

2013-11-28 Thread Chung-Ju Wu
Hi, Mike,

There is a pending testsuite patch for nds32 target:
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01584.html

Is it OK for trunk? :)


Best regards,
jasonwucj


2013/11/14 Chung-Ju Wu :
>
> I would like to modify some testcases for the nds32 target.
> Also I have some nds32 target specific tests which were
> suggested by Joseph earlier:
>   http://gcc.gnu.org/ml/gcc-patches/2013-07/msg00396.html
>
> The patch is attached and a ChangeLog is as below:
>
> gcc/testsuite/
> 2013-11-14  Chung-Ju Wu  
>
> * g++.dg/other/PR23205.C: Skip for nds32*-*-*.
> * g++.dg/other/pr23205-2.C: Skip for nds32*-*-*.
> * gcc.dg/20020312-2.c: Add __nds32__ case.
> * gcc.dg/builtin-apply2.c: Skip for nds32*-*-*.
> * gcc.dg/lower-subreg-1.c: Skip for nds32*-*-*.
> * gcc.dg/sibcall-3.c: Expected fail for nds32*-*-*.
> * gcc.dg/sibcall-4.c: Expected fail for nds32*-*-*.
> * gcc.dg/stack-usage-1.c (SIZE): Define case for __nds32__.
> * gcc.dg/torture/pr37868.c: Skip for nds32*-*-*.
> * gcc.dg/torture/stackalign/builtin-apply-2.c: Skip for nds32*-*-*.
> * gcc.dg/tree-ssa/20040204-1.c: Expected fail for nds32*-*-*.
> * gcc.dg/tree-ssa/forwprop-28.c: Skip for nds32*-*-*.
> * gcc.dg/tree-ssa/pr42585.c: Skip for nds32*-*-*.
> * gcc.dg/tree-ssa/sra-12.c: Skip for nds32*-*-*.
> * gcc.target/nds32: New nds32 specific directory and testcases.
> * lib/target-supports.exp (check_profiling_available): Check for
> nds32*-*-elf.
>
>
> Is this patch OK for the trunk?
>
>
> Best regards,
> jasonwucj


Re: RFA: patch to fix PR58785 (an ARM LRA crash)

2013-11-28 Thread Richard Earnshaw
On 27/11/13 18:35, Jeff Law wrote:
> On 11/27/13 03:19, Yvan Roux wrote:
>> Ping.
>>
>> On 20 November 2013 10:22, Yvan Roux  wrote:
>>> Hi,
>>>
>>> as Richard said, only a subset of rclass is allowed to be returned by
>>> preferred_reload_class.
> I don't think he was quite that definitive.  "One reading of the manual 
> suggests ...".  However, Richard's interpretation is the same as I've 
> had for eons.  You can return the original class, a narrower class or 
> NO_REGS.

Perhaps the manual could be clarified to make this more explicit.

R.




Re: [PATCH, ARM, LRA] Fixed bootstrap failure in Thumb mode

2013-11-28 Thread Richard Earnshaw
On 27/11/13 18:27, Jeff Law wrote:
> On 11/27/13 10:49, Yvan Roux wrote:
>>> How can that be correct?
>>>
>>> The secondary reload macros/hooks define cases where additional registers
>>> are needed to reload certain forms of rtl.  I doubt the use of LRA
>>> completely eliminates the need for secondary reloads.
>>
>> Vladimir explained to me that in that case on ARM the secondary reload
>> hook confuses LRA, and that returning NO_REGS will let LRA deal with
>> the constraints, but I may have misunderstood what he said.
> So I think with the additional information from Vlad, I think we can go 
> forward with your patch -- conditionally approved for the trunk.  The 
> condition is giving the ARM maintainers 24hrs to object.
> 

This is OK, given the previous discussion.

R.



RE: [PING] [PATCH, PR 57748] Check for out of bounds access, Part 2

2013-11-28 Thread Bernd Edlinger
Hi,

On Wed, 27 Nov 2013 12:07:16, Jeff Law wrote:
>
> On 11/27/13 05:29, Bernd Edlinger wrote:
>> Hi,
>>
>> ping...
>>
>> this patch still open: 
>> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02291.html
>>
>> Note: it does, as it is, _not_ depend on the keep_aligning patch.
>> And it would fix some really nasty wrong code generation issues.
> Is there a testcase for this problem?

Yes,
the patch contains two test cases, one for

struct S { V a; V b[0]; } P __attribute__((aligned (1)))

and another for

struct S { V b[1]; } P __attribute__((aligned (1)))

V can be anything that has a movmisalign_optab or is SLOW_UNALIGNED_ACCESS.

If V::b is used as a flexible array, reading p->b[1] gives garbage.

We tried hard to fix this in stor-layout.c by not using the mode of V
for struct S, but this created ABI fallout.  So currently the only possible
way to fix it seems to be in expansion, by letting expand_expr_real_1 know
that we need a memory reference, even if it may be unaligned.

>
> My recommendation is to start a separate thread which focuses on this
> issue and only this issue.
>

If there are more questions of general interest, please feel free to start
a new thread.

> jeff
>

Thanks
Bernd.

[Patch, ARM] Add v7m specific extra rtx cost table

2013-11-28 Thread Terry Guo
Hello,

This patch intends to add a specific extra rtx cost table for v7-m profile
targets. Tested with the gcc regression test suite; no new regressions.
Is it OK for trunk?

BR,
Terry

2013-11-28  Terry Guo  

   * config/arm/aarch-cost-tables.h (v7m_extra_costs): New table.

diff --git a/gcc/config/arm/aarch-cost-tables.h b/gcc/config/arm/aarch-cost-tables.h
index d3e7dd2..52e18a1 100644
--- a/gcc/config/arm/aarch-cost-tables.h
+++ b/gcc/config/arm/aarch-cost-tables.h
@@ -223,5 +223,105 @@ const struct cpu_cost_table cortexa53_extra_costs =
 };
 
 
+const struct cpu_cost_table v7m_extra_costs =
+{
+  /* ALU */
+  {
+0, /* Arith.  */
+0, /* Logical.  */
+0, /* Shift.  */
+0, /* Shift_reg.  */
+0, /* Arith_shift.  */
+COSTS_N_INSNS (1), /* Arith_shift_reg.  */
+0, /* Log_shift.  */
+COSTS_N_INSNS (1), /* Log_shift_reg.  */
+0, /* Extend.  */
+COSTS_N_INSNS (1), /* Extend_arith.  */
+0, /* Bfi.  */
+0, /* Bfx.  */
+0, /* Clz.  */
+COSTS_N_INSNS (1), /* non_exec.  */
+false  /* non_exec_costs_exec.  */
+  },
+  {
+/* MULT SImode */
+{
+  COSTS_N_INSNS (1),   /* Simple.  */
+  COSTS_N_INSNS (1),   /* Flag_setting.  */
+  COSTS_N_INSNS (2),   /* Extend.  */
+  COSTS_N_INSNS (1),   /* Add.  */
+  COSTS_N_INSNS (3),   /* Extend_add.  */
+  COSTS_N_INSNS (8)/* Idiv.  */
+},
+/* MULT DImode */
+{
+  0,   /* Simple (N/A).  */
+  0,   /* Flag_setting (N/A).  */
+  COSTS_N_INSNS (2),   /* Extend.  */
+  0,   /* Add (N/A).  */
+  COSTS_N_INSNS (3),   /* Extend_add.  */
+  0/* Idiv (N/A).  */
+}
+  },
+  /* LD/ST */
+  {
+COSTS_N_INSNS (2), /* Load.  */
+0, /* Load_sign_extend.  */
+COSTS_N_INSNS (3), /* Ldrd.  */
+COSTS_N_INSNS (2), /* Ldm_1st.  */
+1, /* Ldm_regs_per_insn_1st.  */
+1, /* Ldm_regs_per_insn_subsequent.  */
+COSTS_N_INSNS (2), /* Loadf.  */
+COSTS_N_INSNS (3), /* Loadd.  */
+COSTS_N_INSNS (1),  /* Load_unaligned.  */
+COSTS_N_INSNS (2), /* Store.  */
+COSTS_N_INSNS (3), /* Strd.  */
+COSTS_N_INSNS (2), /* Stm_1st.  */
+1, /* Stm_regs_per_insn_1st.  */
+1, /* Stm_regs_per_insn_subsequent.  */
+COSTS_N_INSNS (2), /* Storef.  */
+COSTS_N_INSNS (3), /* Stored.  */
+COSTS_N_INSNS (1)  /* Store_unaligned.  */
+  },
+  {
+/* FP SFmode */
+{
+  COSTS_N_INSNS (7),   /* Div.  */
+  COSTS_N_INSNS (2),   /* Mult.  */
+  COSTS_N_INSNS (5),   /* Mult_addsub.  */
+  COSTS_N_INSNS (3),   /* Fma.  */
+  COSTS_N_INSNS (1),   /* Addsub.  */
+  0,   /* Fpconst.  */
+  0,   /* Neg.  */
+  0,   /* Compare.  */
+  0,   /* Widen.  */
+  0,   /* Narrow.  */
+  0,   /* Toint.  */
+  0,   /* Fromint.  */
+  0/* Roundint.  */
+},
+/* FP DFmode */
+{
+  COSTS_N_INSNS (15),  /* Div.  */
+  COSTS_N_INSNS (5),   /* Mult.  */
+  COSTS_N_INSNS (7),   /* Mult_addsub.  */
+  COSTS_N_INSNS (7),   /* Fma.  */
+  COSTS_N_INSNS (3),   /* Addsub.  */
+  0,   /* Fpconst.  */
+  0,   /* Neg.  */
+  0,   /* Compare.  */
+  0,   /* Widen.  */
+  0,   /* Narrow.  */
+  0,   /* Toint.  */
+  0,   /* Fromint.  */
+  0/* Roundint.  */
+}
+  },
+  /* Vector */
+  {
+COSTS_N_INSNS (1)  /* Alu.  */
+  }
+};
+
 #endif /* GCC_AARCH_COST_TABLES_H */
 
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 129e428..cbd201e 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1473,7 +1473,7 @@ const struct tune_params arm_cortex_a9_tune =
 const struct tune_params arm_v7m_tune =
 {
   arm_9e_rtx_costs,
-  &generic_extra_costs,
+  &v7m_extra_costs,
   NULL,/* Sched adj cost.  */
   1,   /* Constant limit.  */
   5,   /* Max cond insns.  */


Re: [Patch, ARM] Add v7m specific extra rtx cost table

2013-11-28 Thread Kyrill Tkachov

On 28/11/13 10:34, Terry Guo wrote:

Hello,

This patch intends to add a specific extra rtx cost table for v7-m profile
targets. Tested with the gcc regression test suite; no new regressions.
Is it OK for trunk?

BR,
Terry

2013-11-28  Terry Guo  

* config/arm/aarch-cost-tables.h (v7m_extra_costs): New table.


Hi Terry,

The aarch-cost-tables.h file was created in order to share cost tables
between arm and aarch64 for cores that implement both ISAs (for example
the ARMv8-A ones).

I think the best place for this cost table is in arm.c, together with
the rest of the armv7-only cost tables.


Thanks,
Kyrill




Re: [PATCH ARM]Refine scaled address expression on ARM

2013-11-28 Thread Richard Earnshaw
On 18/09/13 10:15, bin.cheng wrote:
> 
> 
>> -Original Message-
>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
>> ow...@gcc.gnu.org] On Behalf Of bin.cheng
>> Sent: Monday, September 02, 2013 3:09 PM
>> To: Richard Earnshaw
>> Cc: gcc-patches@gcc.gnu.org
>> Subject: RE: [PATCH ARM]Refine scaled address expression on ARM
>>
>>
>>
>>> -Original Message-
>>> From: Richard Earnshaw
>>> Sent: Thursday, August 29, 2013 9:06 PM
>>> To: Bin Cheng
>>> Cc: gcc-patches@gcc.gnu.org
>>> Subject: Re: [PATCH ARM]Refine scaled address expression on ARM
>>>
>>> On 28/08/13 08:00, bin.cheng wrote:
 Hi,

 This patch refines scaled address expression on ARM.  It supports
 "base+index*scale" in arm_legitimate_address_outer_p.  It also tries
 to legitimize "base + index * scale + offset" with "reg <- base +
 offset; reg + index * scale" by introducing thumb2_legitimize_address.
 For now function thumb2_legitimize_address is a kind of placeholder and
 just does the mentioned transformation by calling try_multiplier_address.
 Hoping we can improve it in the future.

 With this patch:
 1) "base+index*scale" is recognized.
>>>
>>> That's because (PLUS (REG) (MULT (REG) (CONST))) is not canonical form.
>>>  So this shouldn't be necessary.  Can you identify where this
>> non-canoncial form is being generated?
>>>
>>
>> Oh, for now ivopt constructs "index*scale" to test whether the backend
>> supports scaled addressing modes, which is not valid on ARM, so I was
>> going to construct "base + index*scale" instead.  Since "base + index *
>> scale" is not canonical form, I will construct the canonical form and
>> drop this part of the patch.
>>
>> Is the rest of this patch OK?
>>
> Hi Richard, I removed the part you were concerned about and created this
> updated patch.
> 
> Is it OK?
> 
> Thanks.
> bin
> 
> 2013-09-18  Bin Cheng  
> 
>   * config/arm/arm.c (try_multiplier_address): New function.
>   (thumb2_legitimize_address): New function.
>   (arm_legitimize_address): Call try_multiplier_address and
>   thumb2_legitimize_address.
> 
> 
> 6-arm-scaled_address-20130918.txt
> 
> 
> Index: gcc/config/arm/arm.c
> ===
> --- gcc/config/arm/arm.c  (revision 200774)
> +++ gcc/config/arm/arm.c  (working copy)
> @@ -6652,6 +6654,106 @@ legitimize_tls_address (rtx x, rtx reg)
>  }
>  }
>  
> +/* Try to find address expression like base + index * scale + offset
> +   in X.  If we find one, force base + offset into register and
> +   construct new expression reg + index * scale; return the new
> +   address expression if it's valid.  Otherwise return X.  */
> +static rtx
> +try_multiplier_address (rtx x, enum machine_mode mode ATTRIBUTE_UNUSED)
> +{
> +  rtx tmp, base_reg, new_rtx;
> +  rtx base = NULL_RTX, index = NULL_RTX, scale = NULL_RTX, offset = NULL_RTX;
> +
> +  gcc_assert (GET_CODE (x) == PLUS);
> +
> +  /* Try to find and record base/index/scale/offset in X. */
> +  if (GET_CODE (XEXP (x, 1)) == MULT)
> +{
> +  tmp = XEXP (x, 0);
> +  index = XEXP (XEXP (x, 1), 0);
> +  scale = XEXP (XEXP (x, 1), 1);
> +  if (GET_CODE (tmp) != PLUS)
> + return x;
> +
> +  base = XEXP (tmp, 0);
> +  offset = XEXP (tmp, 1);
> +}
> +  else
> +{
> +  tmp = XEXP (x, 0);
> +  offset = XEXP (x, 1);
> +  if (GET_CODE (tmp) != PLUS)
> + return x;
> +
> +  base = XEXP (tmp, 0);
> +  scale = XEXP (tmp, 1);
> +  if (GET_CODE (base) == MULT)
> + {
> +   tmp = base;
> +   base = scale;
> +   scale = tmp;
> + }
> +  if (GET_CODE (scale) != MULT)
> + return x;
> +
> +  index = XEXP (scale, 0);
> +  scale = XEXP (scale, 1);
> +}
> +
> +  if (CONST_INT_P (base))
> +{
> +  tmp = base;
> +  base = offset;
> +  offset = tmp;
> +}
> +
> +  if (CONST_INT_P (index))
> +{
> +  tmp = index;
> +  index = scale;
> +  scale = tmp;
> +}
> +
> +  /* ARM only supports constant scale in address.  */
> +  if (!CONST_INT_P (scale))
> +return x;
> +
> +  if (GET_MODE (base) != SImode || GET_MODE (index) != SImode)
> +return x;
> +
> +  /* Only register/constant are allowed in each part.  */
> +  if (!symbol_mentioned_p (base)
> +  && !symbol_mentioned_p (offset)
> +  && !symbol_mentioned_p (index)
> +  && !symbol_mentioned_p (scale))
> +{

It would be easier to do this at the top of the function --
  if (symbol_mentioned_p (x))
return x;


> +  /* Force "base+offset" into register and construct
> +  "register+index*scale".  Return the new expression
> +  only if it's valid.  */
> +  tmp = gen_rtx_PLUS (SImode, base, offset);
> +  base_reg = force_reg (SImode, tmp);
> +  tmp = gen_rtx_fmt_ee (MULT, SImode, index, scale);
> +  new_rtx = gen_rtx_PLUS (SImode, base_reg, tmp);
> +  return new_rtx;

I can't help th

Re: [PATCH v2] libgcc: AArch64: Check for correct signal insns on BE when unwinding

2013-11-28 Thread Richard Earnshaw
On 27/11/13 10:43, Matthew Leach wrote:
> Hi,
> 
> When unwinding the stack, the unwind code checks for two opcodes that
> denote registration of a signal handler. This is broken on BE, as the
> opcodes will be in the wrong byte order since insns are always LE.
> 
> Add the correct checks when compiling for AArch64 big endian.
> 
> This patch fixes all glibc backtrace tests and causes no other
> regressions on glibc.
> 
> Please note that I don't have commit access, if this is OK could
> someone merge it for me?
> 
> Thanks,
> Matt Leach
> 
> libgcc/
> 2013-11-26  Matthew Leach  
> 
>   * config/aarch64/linux-unwind.h (aarch64_fallback_frame_state):
>   Check for correct opcodes on BE.
> 

Thanks, I've put this in.

R.




Re: [Patch, ARM] Add v7m specific extra rtx cost table

2013-11-28 Thread Richard Earnshaw
On 28/11/13 10:39, Kyrill Tkachov wrote:
> On 28/11/13 10:34, Terry Guo wrote:
>> Hello,
>>
>> This patch intends to add a specific extra rtx cost table for v7-m profile
>> targets. Tested with gcc regression test, no new regressions. Is it OK for
>> trunk?
>>
>> BR,
>> Terry
>>
>> 2013-11-28  Terry Guo  
>>
>> * config/arm/aarch-cost-tables.h (v7m_extra_costs): New table.
> 
> Hi Terry,
> 
> The aarch-cost-tables.h file was created in order to share cost tables
> between arm and aarch64 for cores that implement both ISAs (for example
> the ARMv8-A ones).
> 
> I think the best position to put this cost table is in arm.c together
> with the rest of the armv7-only cost tables.
> 
> Thanks,
> Kyrill
> 

Correct.

R.



Re: [Patch, ARM] Add v7m specific extra rtx cost table

2013-11-28 Thread Richard Earnshaw
On 28/11/13 10:34, Terry Guo wrote:
> Hello,
> 
> This patch intends to add a specific extra rtx cost table for v7-m profile
> targets. Tested with gcc regression test, no new regressions. Is it OK for
> trunk?
> 
> BR,
> Terry
> 
> 2013-11-28  Terry Guo  
> 
>* config/arm/aarch-cost-tables.h (v7m_extra_costs): New table.
> 

You haven't mentioned the change in arm.c

> 
> m4-extra-cost-table-upstream-v1.txt
> 
> 
> diff --git a/gcc/config/arm/aarch-cost-tables.h 
> b/gcc/config/arm/aarch-cost-tables.h
> index d3e7dd2..52e18a1 100644
> --- a/gcc/config/arm/aarch-cost-tables.h
> +++ b/gcc/config/arm/aarch-cost-tables.h
> @@ -223,5 +223,105 @@ const struct cpu_cost_table cortexa53_extra_costs =
>  };
>  
>  
> +const struct cpu_cost_table v7m_extra_costs =

As Kyrill says, this should be in arm.c

OK with that change.

R.




RE: [Patch, ARM] Add v7m specific extra rtx cost table

2013-11-28 Thread Terry Guo

> -Original Message-
> From: Richard Earnshaw
> Sent: Thursday, November 28, 2013 7:09 PM
> To: Terry Guo
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [Patch, ARM] Add v7m specific extra rtx cost table
> 
> On 28/11/13 10:34, Terry Guo wrote:
> > Hello,
> >
> > This patch intends to add a specific extra rtx cost table for v7-m
> > profile targets. Tested with gcc regression test, no new regressions.
> > Is it OK for trunk?
> >
> > BR,
> > Terry
> >
> > 2013-11-28  Terry Guo  
> >
> >* config/arm/aarch-cost-tables.h (v7m_extra_costs): New table.
> >
> 
> You haven't mentioned the change in arm.c
> 
> >
> > m4-extra-cost-table-upstream-v1.txt
> >
> >
> > diff --git a/gcc/config/arm/aarch-cost-tables.h
> > b/gcc/config/arm/aarch-cost-tables.h
> > index d3e7dd2..52e18a1 100644
> > --- a/gcc/config/arm/aarch-cost-tables.h
> > +++ b/gcc/config/arm/aarch-cost-tables.h
> > @@ -223,5 +223,105 @@ const struct cpu_cost_table
> > cortexa53_extra_costs =  };
> >
> >
> > +const struct cpu_cost_table v7m_extra_costs =
> 
> As Kyrill says, this should be in arm.c
> 
> OK with that change.
> 
> R.

Thank you all. I will update the patch and commit it.

BR,
Terry




RE: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4

2013-11-28 Thread Gopalasubramanian, Ganesh
This patch makes TARGET_LOOP_UNROLL_ADJUST also influence the unroll factor
for constant iterations (decide_unroll_constant_iterations).
The macro is already checked for runtime iterations
(decide_unroll_runtime_iterations) and for unroll stupid
(decide_unroll_stupid).

Bootstrap and tests pass.

Would like to know your comments before committing.

Regards
Ganesh

2013-11-28  Ganesh Gopalasubramanian  
 
* loop-unroll.c (decide_unroll_constant_iterations): Check macro 
TARGET_LOOP_UNROLL_ADJUST while deciding unroll factor.


diff --git a/gcc/loop-unroll.c b/gcc/loop-unroll.c
index 9c87167..557915f 100644
--- a/gcc/loop-unroll.c
+++ b/gcc/loop-unroll.c
@@ -664,6 +664,9 @@ decide_unroll_constant_iterations (struct loop *loop, int flags)
   if (nunroll > (unsigned) PARAM_VALUE (PARAM_MAX_UNROLL_TIMES))
     nunroll = PARAM_VALUE (PARAM_MAX_UNROLL_TIMES);
 
+  if (targetm.loop_unroll_adjust)
+    nunroll = targetm.loop_unroll_adjust (nunroll, loop);
+
   /* Skip big loops.  */
   if (nunroll <= 1)
     {

-Original Message-
From: Uros Bizjak [mailto:ubiz...@gmail.com] 
Sent: Friday, November 22, 2013 1:46 PM
To: Gopalasubramanian, Ganesh
Cc: gcc-patches@gcc.gnu.org; Richard Guenther  
(richard.guent...@gmail.com); borntrae...@de.ibm.com; H.J. Lu 
(hjl.to...@gmail.com); Jakub Jelinek (ja...@redhat.com)
Subject: Re: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4

On Wed, Nov 20, 2013 at 7:26 PM, Gopalasubramanian, Ganesh 
 wrote:

> Steamroller processors contain a loop predictor and a loop buffer, which may 
> make unrolling small loops less important.
> When unrolling small loops for steamroller, making the unrolled loop fit in 
> the loop buffer should be a priority.
>
> This patch uses a heuristic approach (number of memory references) to decide 
> the unrolling factor for small loops.
> This patch has some noise in SPEC 2006 results.
>
> Bootstrapping passes.
>
> I would like to know your comments before committing.

Please split the patch into a target-dependent and a target-independent part,
and get the target-independent part reviewed first.



Store the SSA name range type in the tree structure

2013-11-28 Thread Richard Sandiford
At the moment, an anti range ~[A,B] is stored as [B+1,A-1].  This makes
it harder to store the range in the natural precision of A and B, since
B+1 and A-1 might not be representable in that precision.

This patch instead stores the original minimum and maximum values and
uses a spare tree bit to represent the range type.  The version below
is for trunk; I've also tested a wide-int version.

Tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
* tree-core.h (tree_base): Document use of static_flag for SSA_NAME.
* tree.h (SSA_NAME_ANTI_RANGE_P, SSA_NAME_RANGE_TYPE): New macros.
* tree-ssanames.h (set_range_info): Add range_type argument.
(duplicate_ssa_name_range_info): Likewise.
* tree-ssanames.c (set_range_info): Take the range type as argument
and store it in SSA_NAME_ANTI_RANGE_P.
(duplicate_ssa_name_range_info): Likewise.
(get_range_info): Use SSA_NAME_ANTI_RANGE_P.
(set_nonzero_bits): Update call to set_range_info.
(duplicate_ssa_name_fn): Update call to duplicate_ssa_name_range_info.
* tree-ssa-copy.c (fini_copy_prop): Likewise.
* tree-vrp.c (remove_range_assertions): Update call to set_range_info.
(vrp_finalize): Likewise, passing anti-ranges directly.

Index: gcc/tree-core.h
===
--- gcc/tree-core.h 2013-11-15 18:23:21.113488640 +
+++ gcc/tree-core.h 2013-11-28 11:12:32.956977322 +
@@ -822,6 +822,9 @@ struct GTY(()) tree_base {
TRANSACTION_EXPR_OUTER in
   TRANSACTION_EXPR
 
+   SSA_NAME_ANTI_RANGE_P in
+  SSA_NAME
+
public_flag:
 
TREE_OVERFLOW in
Index: gcc/tree.h
===
--- gcc/tree.h  2013-11-20 10:58:57.275831561 +
+++ gcc/tree.h  2013-11-28 11:12:32.969977280 +
@@ -1434,6 +1434,14 @@ #define SSA_NAME_IS_DEFAULT_DEF(NODE) \
 #define SSA_NAME_PTR_INFO(N) \
SSA_NAME_CHECK (N)->ssa_name.info.ptr_info
 
+/* True if SSA_NAME_RANGE_INFO describes an anti-range.  */
+#define SSA_NAME_ANTI_RANGE_P(N) \
+  SSA_NAME_CHECK (N)->base.static_flag
+
+/* The type of range described by SSA_NAME_RANGE_INFO.  */
+#define SSA_NAME_RANGE_TYPE(N) \
+  (SSA_NAME_ANTI_RANGE_P (N) ? VR_ANTI_RANGE : VR_RANGE)
+
 /* Value range info attributes for SSA_NAMEs of non pointer-type variables.  */
 #define SSA_NAME_RANGE_INFO(N) \
 SSA_NAME_CHECK (N)->ssa_name.info.range_info
Index: gcc/tree-ssanames.h
===
--- gcc/tree-ssanames.h 2013-11-15 18:23:22.050485010 +
+++ gcc/tree-ssanames.h 2013-11-28 11:12:32.964977296 +
@@ -70,7 +70,8 @@ #define ssa_name(i) ((*cfun->gimple_df->
 enum value_range_type { VR_UNDEFINED, VR_RANGE, VR_ANTI_RANGE, VR_VARYING };
 
 /* Sets the value range to SSA.  */
-extern void set_range_info (tree, double_int, double_int);
+extern void set_range_info (tree, enum value_range_type, double_int,
+   double_int);
 /* Gets the value range from SSA.  */
 extern enum value_range_type get_range_info (const_tree, double_int *,
 double_int *);
@@ -93,7 +94,8 @@ extern struct ptr_info_def *get_ptr_info
 extern tree copy_ssa_name_fn (struct function *, tree, gimple);
 extern void duplicate_ssa_name_ptr_info (tree, struct ptr_info_def *);
 extern tree duplicate_ssa_name_fn (struct function *, tree, gimple);
-extern void duplicate_ssa_name_range_info (tree, struct range_info_def *);
+extern void duplicate_ssa_name_range_info (tree, enum value_range_type,
+  struct range_info_def *);
 extern void release_defs (gimple);
 extern void replace_ssa_name_symbol (tree, tree);
 
Index: gcc/tree-ssanames.c
===
--- gcc/tree-ssanames.c 2013-11-20 10:59:18.330782865 +
+++ gcc/tree-ssanames.c 2013-11-28 11:12:32.963977300 +
@@ -178,12 +178,14 @@ make_ssa_name_fn (struct function *fn, t
   return t;
 }
 
-/* Store range information MIN, and MAX to tree ssa_name NAME.  */
+/* Store range information RANGE_TYPE, MIN, and MAX to tree ssa_name NAME.  */
 
 void
-set_range_info (tree name, double_int min, double_int max)
+set_range_info (tree name, enum value_range_type range_type, double_int min,
+   double_int max)
 {
   gcc_assert (!POINTER_TYPE_P (TREE_TYPE (name)));
+  gcc_assert (range_type == VR_RANGE || range_type == VR_ANTI_RANGE);
   range_info_def *ri = SSA_NAME_RANGE_INFO (name);
 
   /* Allocate if not available.  */
@@ -194,12 +196,16 @@ set_range_info (tree name, double_int mi
   ri->nonzero_bits = double_int::mask (TYPE_PRECISION (TREE_TYPE (name)));
 }
 
+  /* Record the range type.  */
+  if (SSA_NAME_RANGE_TYPE (name) != range_type)
+    SSA_NAME_ANTI_RANGE_P (name) = (range_type == VR_ANTI_RANGE);
+
   /* Set the values.  */
   ri->min

[Patch, AArch64] Relax CANNOT_CHANGE_MODE_CLASS.

2013-11-28 Thread Tejas Belagod


Hi,

Currently, CANNOT_CHANGE_MODE_CLASS is too restrictive wrt the mode-changes it 
allows on FPREGs - it allows none at the moment. In fact, there are many mode 
changes that are safe and can be allowed. For example, in a pattern like:


(subreg:SF (reg:V4SF v0) 0)

it is legal to reduce this to

 (reg:SF v0)

The attached patch helps parts of rtlanal.c make such decisions
(e.g. simplify_subreg_regno).


Tested on aarch64-none-elf and aarch64_be-none-elf. OK for trunk?

Thanks,
Tejas Belagod
ARM.

Changelog:

2013-11-28  Tejas Belagod  

gcc/
* config/aarch64/aarch64-protos.h (aarch64_cannot_change_mode_class):
Declare.
* config/aarch64/aarch64.c (aarch64_cannot_change_mode_class): New.
* config/aarch64/aarch64.h (CANNOT_CHANGE_MODE_CLASS): Change to call
backend function aarch64_cannot_change_mode_class.

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 489fd1c..3dcc7c3 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -160,6 +160,9 @@ struct tune_params
 
 HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
 bool aarch64_bitmask_imm (HOST_WIDE_INT val, enum machine_mode);
+bool aarch64_cannot_change_mode_class (enum machine_mode,
+  enum machine_mode,
+  enum reg_class);
 enum aarch64_symbol_type
 aarch64_classify_symbolic_expression (rtx, enum aarch64_symbol_context);
 bool aarch64_constant_address_p (rtx);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b1b4eef..6567a1b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8209,6 +8209,42 @@ aarch64_vectorize_vec_perm_const_ok (enum machine_mode 
vmode,
   return ret;
 }
 
+/* Implement target hook CANNOT_CHANGE_MODE_CLASS.  */
+bool
+aarch64_cannot_change_mode_class (enum machine_mode from,
+ enum machine_mode to,
+ enum reg_class rclass)
+{
+  /* Full-reg subregs are allowed on general regs or any class if they are
+ the same size.  */
+  if (GET_MODE_SIZE (from) == GET_MODE_SIZE (to)
+  || !reg_classes_intersect_p (FP_REGS, rclass))
+return false;
+
+  /* Limited combinations of subregs are safe on FPREGs.  Particularly,
+ 1. Vector Mode to Scalar mode where 1 unit of the vector is accessed.
+ 2. Scalar to Scalar for integer modes or same size float modes.
+ 3. Vector to Vector modes.  */
+  if (GET_MODE_SIZE (from) > GET_MODE_SIZE (to))
+{
+  if (aarch64_vector_mode_supported_p (from)
+ && GET_MODE_INNER (from) == to)
+   return false;
+
+  if (GET_MODE_NUNITS (from) == 1
+ && GET_MODE_NUNITS (to) == 1
+ && (GET_MODE_CLASS (from) == MODE_INT
+ || from == to))
+   return false;
+
+  if (aarch64_vector_mode_supported_p (from)
+ && aarch64_vector_mode_supported_p (to))
+   return false;
+}
+
+  return true;
+}
+
 #undef TARGET_ADDRESS_COST
 #define TARGET_ADDRESS_COST aarch64_address_cost
 
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index cead022..b58fe04 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -818,13 +818,8 @@ do {   
 \
   extern void  __aarch64_sync_cache_range (void *, void *);\
   __aarch64_sync_cache_range (beg, end)
 
-/*  VFP registers may only be accessed in the mode they
-   were set.  */
 #define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)  \
-  (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)  \
-   ? reg_classes_intersect_p (FP_REGS, (CLASS))\
-   : 0)
-
+  aarch64_cannot_change_mode_class (FROM, TO, CLASS)
 
 #define SHIFT_COUNT_TRUNCATED !TARGET_SIMD
 

Re: [PATCH] Get rid of useless -fno-rtti for libubsan

2013-11-28 Thread Yury Gribov

> I wonder if this can't be done through
> just adding -fno-rtti to AM_CXXFLAGS and say
> ubsan_handlers_cxx.% ubsan_type_hash.% : AM_CXXFLAGS += -frtti

Yup, that did the trick. Attaching modified patch.

diff --git a/libsanitizer/ubsan/Makefile.am b/libsanitizer/ubsan/Makefile.am
index e98984a..86fcaca 100644
--- a/libsanitizer/ubsan/Makefile.am
+++ b/libsanitizer/ubsan/Makefile.am
@@ -4,7 +4,7 @@ AM_CPPFLAGS = -I $(top_srcdir) -I $(top_srcdir)/include
 gcc_version := $(shell cat $(top_srcdir)/../gcc/BASE-VER)
 
 DEFS = -D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS 
-AM_CXXFLAGS = -Wall -W -Wno-unused-parameter -Wwrite-strings -pedantic -Wno-long-long  -fPIC -fno-builtin -fno-exceptions -fomit-frame-pointer -funwind-tables -fvisibility=hidden -Wno-variadic-macros
+AM_CXXFLAGS = -Wall -W -Wno-unused-parameter -Wwrite-strings -pedantic -Wno-long-long  -fPIC -fno-builtin -fno-exceptions -fno-rtti -fomit-frame-pointer -funwind-tables -fvisibility=hidden -Wno-variadic-macros
 AM_CXXFLAGS += $(LIBSTDCXX_RAW_CXX_CXXFLAGS)
 ACLOCAL_AMFLAGS = -I m4
 
@@ -13,11 +13,13 @@ toolexeclib_LTLIBRARIES = libubsan.la
 ubsan_files = \
 	ubsan_diag.cc \
 	ubsan_handlers.cc \
-	ubsan_handlers_cxx.cc \
-	ubsan_type_hash.cc \
 	ubsan_value.cc
 
-libubsan_la_SOURCES = $(ubsan_files) 
+ubsan_cxx_files = \
+	ubsan_handlers_cxx.cc \
+	ubsan_type_hash.cc
+
+libubsan_la_SOURCES = $(ubsan_files) $(ubsan_cxx_files)
 libubsan_la_LIBADD = $(top_builddir)/sanitizer_common/libsanitizer_common.la 
 if !USING_MAC_INTERPOSE
 libubsan_la_LIBADD += $(top_builddir)/interception/libinterception.la
@@ -25,6 +27,9 @@ endif
 libubsan_la_LIBADD += $(LIBSTDCXX_RAW_CXX_LDFLAGS)
 libubsan_la_LDFLAGS = -version-info `grep -v '^\#' $(srcdir)/libtool-version` -lpthread -ldl
 
+# Use special rules for files that require RTTI support.
+ubsan_handlers_cxx.% ubsan_type_hash.% : AM_CXXFLAGS += -frtti
+
 # Work around what appears to be a GNU make bug handling MAKEFLAGS
 # values defined in terms of make variables, as is the case for CC and
 # friends when we are called from the top level Makefile.
diff --git a/libsanitizer/ubsan/Makefile.in b/libsanitizer/ubsan/Makefile.in
index 6812538..2e85384 100644
--- a/libsanitizer/ubsan/Makefile.in
+++ b/libsanitizer/ubsan/Makefile.in
@@ -81,9 +81,9 @@ am__DEPENDENCIES_1 =
 libubsan_la_DEPENDENCIES =  \
 	$(top_builddir)/sanitizer_common/libsanitizer_common.la \
 	$(am__append_1) $(am__DEPENDENCIES_1)
-am__objects_1 = ubsan_diag.lo ubsan_handlers.lo ubsan_handlers_cxx.lo \
-	ubsan_type_hash.lo ubsan_value.lo
-am_libubsan_la_OBJECTS = $(am__objects_1)
+am__objects_1 = ubsan_diag.lo ubsan_handlers.lo ubsan_value.lo
+am__objects_2 = ubsan_handlers_cxx.lo ubsan_type_hash.lo
+am_libubsan_la_OBJECTS = $(am__objects_1) $(am__objects_2)
 libubsan_la_OBJECTS = $(am_libubsan_la_OBJECTS)
 libubsan_la_LINK = $(LIBTOOL) --tag=CXX $(AM_LIBTOOLFLAGS) \
 	$(LIBTOOLFLAGS) --mode=link $(CXXLD) $(AM_CXXFLAGS) \
@@ -240,7 +240,7 @@ AM_CPPFLAGS = -I $(top_srcdir) -I $(top_srcdir)/include
 # May be used by toolexeclibdir.
 gcc_version := $(shell cat $(top_srcdir)/../gcc/BASE-VER)
 AM_CXXFLAGS = -Wall -W -Wno-unused-parameter -Wwrite-strings -pedantic \
-	-Wno-long-long -fPIC -fno-builtin -fno-exceptions \
+	-Wno-long-long -fPIC -fno-builtin -fno-exceptions -fno-rtti \
 	-fomit-frame-pointer -funwind-tables -fvisibility=hidden \
 	-Wno-variadic-macros $(LIBSTDCXX_RAW_CXX_CXXFLAGS)
 ACLOCAL_AMFLAGS = -I m4
@@ -248,11 +248,13 @@ toolexeclib_LTLIBRARIES = libubsan.la
 ubsan_files = \
 	ubsan_diag.cc \
 	ubsan_handlers.cc \
-	ubsan_handlers_cxx.cc \
-	ubsan_type_hash.cc \
 	ubsan_value.cc
 
-libubsan_la_SOURCES = $(ubsan_files) 
+ubsan_cxx_files = \
+	ubsan_handlers_cxx.cc \
+	ubsan_type_hash.cc
+
+libubsan_la_SOURCES = $(ubsan_files) $(ubsan_cxx_files)
 libubsan_la_LIBADD =  \
 	$(top_builddir)/sanitizer_common/libsanitizer_common.la \
 	$(am__append_1) $(LIBSTDCXX_RAW_CXX_LDFLAGS)
@@ -575,6 +577,9 @@ uninstall-am: uninstall-toolexeclibLTLIBRARIES
 	tags uninstall uninstall-am uninstall-toolexeclibLTLIBRARIES
 
 
+# Use special rules for files that require RTTI support.
+ubsan_handlers_cxx.% ubsan_type_hash.% : AM_CXXFLAGS += -frtti
+
 # Tell versions [3.59,3.63) of GNU make to not export all variables.
 # Otherwise a system limit (for SysV at least) may be exceeded.
 .NOEXPORT:
2013-11-28  Jakub Jelinek  
Yury Gribov  

PR sanitizer/59106
* ubsan/Makefile.am (AM_CXXFLAGS): Disable -frtti for files that
don't need it.
* ubsan/Makefile.in: Regenerated.



Re: [PATCH] Get rid of useless -fno-rtti for libubsan

2013-11-28 Thread Jakub Jelinek
On Thu, Nov 28, 2013 at 03:29:02PM +0400, Yury Gribov wrote:
> > I wonder if this can't be done through
> > just adding -fno-rtti to AM_CXXFLAGS and say
> > ubsan_handlers_cxx.% ubsan_type_hash.% : AM_CXXFLAGS += -frtti
> 
> Yup, that did the trick. Attaching modified patch.

> @@ -13,11 +13,13 @@ toolexeclib_LTLIBRARIES = libubsan.la
>  ubsan_files = \
>   ubsan_diag.cc \
>   ubsan_handlers.cc \
> - ubsan_handlers_cxx.cc \
> - ubsan_type_hash.cc \
>   ubsan_value.cc
>  
> -libubsan_la_SOURCES = $(ubsan_files) 
> +ubsan_cxx_files = \
> + ubsan_handlers_cxx.cc \
> + ubsan_type_hash.cc
> +
> +libubsan_la_SOURCES = $(ubsan_files) $(ubsan_cxx_files)
>  libubsan_la_LIBADD = $(top_builddir)/sanitizer_common/libsanitizer_common.la 
>  if !USING_MAC_INTERPOSE
>  libubsan_la_LIBADD += $(top_builddir)/interception/libinterception.la

The above hunk is not needed anymore, is it?

Ok for trunk without that hunk (and with Makefile.in regenerated again).

> 2013-11-28  Jakub Jelinek  
>   Yury Gribov  
> 
>   PR sanitizer/59106
>   * ubsan/Makefile.am (AM_CXXFLAGS): Disable -frtti for files that
>   don't need it.
>   * ubsan/Makefile.in: Regenerated.

Jakub


Re: wide-int, gimple

2013-11-28 Thread Richard Sandiford
Jakub Jelinek  writes:
> On Mon, Nov 25, 2013 at 12:24:30PM +0100, Richard Biener wrote:
>> On Sat, Nov 23, 2013 at 8:21 PM, Mike Stump  wrote:
>> > Richi has asked the we break the wide-int patch so that the
>> > individual port and front end maintainers can review their parts
>> > without have to go through the entire patch.  This patch covers the
>> > gimple code.
>> 
>> @@ -1754,7 +1754,7 @@ dump_ssaname_info (pretty_printer *buffer, tree
>> node, int spc)
>>if (!POINTER_TYPE_P (TREE_TYPE (node))
>>&& SSA_NAME_RANGE_INFO (node))
>>  {
>> -  double_int min, max, nonzero_bits;
>> +  widest_int min, max, nonzero_bits;
>>value_range_type range_type = get_range_info (node, &min, &max);
>> 
>>if (range_type == VR_VARYING)
>> 
>> this makes me suspect you are changing SSA_NAME_RANGE_INFO
>> to embed two max wide_ints.  That's a no-no.
>
> Well, the range_info_def struct right now contains 3 double_ints, which is
> unnecessary overhead for the most of the cases where the SSA_NAME's type
> has just at most HOST_BITS_PER_WIDE_INT bits and thus we could fit all 3 of
> them into 3 HOST_WIDE_INTs rather than 3 double_ints.  So supposedly struct
> range_info_def could be a template on the type's precision rounded up to HWI
> bits, or say have 3 alternatives there, use
> FIXED_WIDE_INT (HOST_BITS_PER_WIDE_INT) for the smallest types,
> FIXED_WIDE_INT (2 * HOST_BITS_PER_WIDE_INT) aka double_int for the larger
> but still common ones, and widest_int for the rest, then the API to set/get
> it could use widest_int everywhere, and just what storage we'd use would
> depend on the precision of the type.

This patch adds a trailing_wide_ints <N> that can be used at the end of
a variable-length structure to store N wide_ints.  There's also a macro
to declare get/set methods for each of the N elements.

At the moment I've only defined non-const operator[].  It'd be possible
to add a const version later if necessary.

The size of range_info_def for precisions that fit in M HWIs is then
1 + 3 * M, so 4 for the common case (down from 6 on trunk).  The maximum
is 7 for current x86_64 types (up from 6 on trunk).

I wondered whether to keep the interface using widest_int, but I think
wide_int works out more naturally.  The only caller that wants to extend
beyond the precision is CCP, but that's already special because the upper
bits are supposed to be set (i.e. it's not a normal sign or zero extension).

This relies on the SSA_NAME_ANTI_RANGE_P patch I just posted.

If this is OK I'll look at using the same structure elsewhere.

Thanks,
Richard


Index: gcc/ChangeLog.wide-int
===
--- gcc/ChangeLog.wide-int  2013-11-27 18:45:17.448816304 +
+++ gcc/ChangeLog.wide-int  2013-11-28 11:37:15.320020047 +
@@ -677,6 +677,7 @@
* tree-ssa-ccp.c: Update comment at top of file.  Include
wide-int-print.h.
(struct prop_value_d): Change type of mask to widest_int.
+   (extend_mask): New function.
(dump_lattice_value): Use wide-int interfaces.
(get_default_value): Likewise.
(set_constant_value): Likewise.
@@ -768,16 +769,20 @@
* tree-ssa-math-opts.c
(gimple_expand_builtin_pow): Update calls to real_to_integer.
* tree-ssanames.c
-   (set_range_info): Use widest_ints rather than double_ints.
-   (get_range_info): Likewise.
+   (set_range_info): Use wide_int_refs rather than double_ints.
+   Adjust for trailing_wide_ints <3> representation.
(set_nonzero_bits): Likewise.
+   (get_range_info): Return wide_ints rather than double_ints.
+   Adjust for trailing_wide_ints <3> representation.
(get_nonzero_bits): Likewise.
+   (duplicate_ssa_name_range_info): Adjust for trailing_wide_ints <3>
+   representation.
* tree-ssanames.h
-   (struct range_info_def): Change type of min, max and nonzero_bits
-   to widest_int.
-   (set_range_info): Use widest_ints rather than double_ints.
-   (get_range_info): Likewise.
+   (struct range_info_def): Replace min, max and nonzero_bits with
+   a trailing_wide_ints <3>.
+   (set_range_info): Use wide_int_refs rather than double_ints.
(set_nonzero_bits): Likewise.
+   (get_range_info): Return wide_ints rather than double_ints.
(get_nonzero_bits): Likewise.
* tree-ssa-phiopt.c
(jump_function_from_stmt): Use wide-int interfaces.
Index: gcc/builtins.c
===
--- gcc/builtins.c  2013-11-27 18:45:17.448816304 +
+++ gcc/builtins.c  2013-11-27 18:45:46.710684576 +
@@ -3125,7 +3125,7 @@ determine_block_size (tree len, rtx len_
 }
   else
 {
-  widest_int min, max;
+  wide_int min, max;
   enum value_range_type range_type = VR_UNDEFINED;
 
   /* Determine bounds from the type.  */
@@ -3152,9 +3152,8 @@ determine_block_size (tre

[PATCH] Fix PR59323

2013-11-28 Thread Richard Biener

This fixes PR59323 - we were unifying TYPE_DECLs used in different
BLOCK_VARS - ultimately because they were erroneously put into
the global indexed decls.  Fixed by treating some more kinds of decls
that have function context as not indexable.

This is the minimal set of kinds to fix the testcase (intentionally
I refrained from adding other stuff that makes "sense" at this point).

LTO bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2013-11-28  Richard Biener  

PR lto/59323
* lto-streamer-out.c (tree_is_indexable): TYPE_DECLs and
CONST_DECLs in function context are not indexable.

* gcc.dg/lto/pr59323_0.c: New testcase.

Index: gcc/lto-streamer-out.c
===
--- gcc/lto-streamer-out.c  (revision 205447)
+++ gcc/lto-streamer-out.c  (working copy)
@@ -135,8 +135,10 @@ tree_is_indexable (tree t)
  definition.  */
   if (TREE_CODE (t) == PARM_DECL || TREE_CODE (t) == RESULT_DECL)
 return variably_modified_type_p (TREE_TYPE (DECL_CONTEXT (t)), NULL_TREE);
-  else if (TREE_CODE (t) == VAR_DECL && decl_function_context (t)
-  && !TREE_STATIC (t))
+  else if (((TREE_CODE (t) == VAR_DECL && !TREE_STATIC (t))
+   || TREE_CODE (t) == TYPE_DECL
+   || TREE_CODE (t) == CONST_DECL)
+  && decl_function_context (t))
 return false;
   else if (TREE_CODE (t) == DEBUG_EXPR_DECL)
 return false;
Index: gcc/testsuite/gcc.dg/lto/pr59323_0.c
===
--- gcc/testsuite/gcc.dg/lto/pr59323_0.c(revision 0)
+++ gcc/testsuite/gcc.dg/lto/pr59323_0.c(revision 0)
@@ -0,0 +1,37 @@
+/* { dg-lto-do link } */
+/* { dg-lto-options { { -O2 -g -flto } } } */
+/* { dg-extra-ld-options { -r -nostdlib } } */
+
+extern void bar(void);
+
+int main(int argc, char **argv)
+{
+  int i;
+
+  if (argc == 1) {
+    enum { X };
+
+    bar();
+
+    {
+      enum { X };
+
+      asm goto ("" : : : : lab);
+    lab:
+      ;
+    }
+  }
+
+  {
+    enum { X };
+
+    int foo(void)
+    {
+      return argv[0][0];
+    }
+
+    i = foo();
+  }
+
+  return i;
+}


Re: [PATCH ARM]Refine scaled address expression on ARM

2013-11-28 Thread Bin.Cheng
On Thu, Nov 28, 2013 at 6:48 PM, Richard Earnshaw  wrote:
> On 18/09/13 10:15, bin.cheng wrote:
>>
>>
>>> -Original Message-
>>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
>>> ow...@gcc.gnu.org] On Behalf Of bin.cheng
>>> Sent: Monday, September 02, 2013 3:09 PM
>>> To: Richard Earnshaw
>>> Cc: gcc-patches@gcc.gnu.org
>>> Subject: RE: [PATCH ARM]Refine scaled address expression on ARM
>>>
>>>
>>>
 -Original Message-
 From: Richard Earnshaw
 Sent: Thursday, August 29, 2013 9:06 PM
 To: Bin Cheng
 Cc: gcc-patches@gcc.gnu.org
 Subject: Re: [PATCH ARM]Refine scaled address expression on ARM

 On 28/08/13 08:00, bin.cheng wrote:
> Hi,
>
> This patch refines scaled address expression on ARM.  It supports
> "base+index*scale" in arm_legitimate_address_outer_p.  It also tries
> to legitimize "base + index * scale + offset" with "reg <- base +
> offset;  reg
> + index * scale" by introducing thumb2_legitimize_address.  For now
> + function
> thumb2_legitimize_address is a kind of placeholder and just does the
> mentioned transformation by calling to try_multiplier_address.
> Hoping we can improve it in the future.
>
> With this patch:
> 1) "base+index*scale" is recognized.

 That's because (PLUS (REG) (MULT (REG) (CONST))) is not canonical form.
  So this shouldn't be necessary.  Can you identify where this
>>> non-canonical form is being generated?

>>>
>>> Oh, for now ivopt constructs "index*scale" to test whether backend
>>> supports scaled addressing mode, which is not valid on ARM, so I was going
>>> to construct "base + index*scale" instead.  Since "base + index * scale"
>> is not
>>> canonical form, I will construct the canonical form and drop this part of
>> the
>>> patch.
>>>
>>> Is rest of this patch OK?
>>>
>> Hi Richard, I removed the part over which you concerned and created this
>> updated patch.
>>
>> Is it OK?
>>
>> Thanks.
>> bin
>>
>> 2013-09-18  Bin Cheng  
>>
>>   * config/arm/arm.c (try_multiplier_address): New function.
>>   (thumb2_legitimize_address): New function.
>>   (arm_legitimize_address): Call try_multiplier_address and
>>   thumb2_legitimize_address.
>>
>>
>> 6-arm-scaled_address-20130918.txt
>>
>>
>> Index: gcc/config/arm/arm.c
>> ===
>> --- gcc/config/arm/arm.c  (revision 200774)
>> +++ gcc/config/arm/arm.c  (working copy)
>> @@ -6652,6 +6654,106 @@ legitimize_tls_address (rtx x, rtx reg)
>>  }
>>  }
>>
>> +/* Try to find address expression like base + index * scale + offset
>> +   in X.  If we find one, force base + offset into register and
>> +   construct new expression reg + index * scale; return the new
>> +   address expression if it's valid.  Otherwise return X.  */
>> +static rtx
>> +try_multiplier_address (rtx x, enum machine_mode mode ATTRIBUTE_UNUSED)
>> +{
>> +  rtx tmp, base_reg, new_rtx;
>> +  rtx base = NULL_RTX, index = NULL_RTX, scale = NULL_RTX, offset = 
>> NULL_RTX;
>> +
>> +  gcc_assert (GET_CODE (x) == PLUS);
>> +
>> +  /* Try to find and record base/index/scale/offset in X. */
>> +  if (GET_CODE (XEXP (x, 1)) == MULT)
>> +{
>> +  tmp = XEXP (x, 0);
>> +  index = XEXP (XEXP (x, 1), 0);
>> +  scale = XEXP (XEXP (x, 1), 1);
>> +  if (GET_CODE (tmp) != PLUS)
>> + return x;
>> +
>> +  base = XEXP (tmp, 0);
>> +  offset = XEXP (tmp, 1);
>> +}
>> +  else
>> +{
>> +  tmp = XEXP (x, 0);
>> +  offset = XEXP (x, 1);
>> +  if (GET_CODE (tmp) != PLUS)
>> + return x;
>> +
>> +  base = XEXP (tmp, 0);
>> +  scale = XEXP (tmp, 1);
>> +  if (GET_CODE (base) == MULT)
>> + {
>> +   tmp = base;
>> +   base = scale;
>> +   scale = tmp;
>> + }
>> +  if (GET_CODE (scale) != MULT)
>> + return x;
>> +
>> +  index = XEXP (scale, 0);
>> +  scale = XEXP (scale, 1);
>> +}
>> +
>> +  if (CONST_INT_P (base))
>> +{
>> +  tmp = base;
>> +  base = offset;
>> +  offset = tmp;
>> +}
>> +
>> +  if (CONST_INT_P (index))
>> +{
>> +  tmp = index;
>> +  index = scale;
>> +  scale = tmp;
>> +}
>> +
>> +  /* ARM only supports constant scale in address.  */
>> +  if (!CONST_INT_P (scale))
>> +return x;
>> +
>> +  if (GET_MODE (base) != SImode || GET_MODE (index) != SImode)
>> +return x;
>> +
>> +  /* Only register/constant are allowed in each part.  */
>> +  if (!symbol_mentioned_p (base)
>> +  && !symbol_mentioned_p (offset)
>> +  && !symbol_mentioned_p (index)
>> +  && !symbol_mentioned_p (scale))
>> +{
>
> It would be easier to do this at the top of the function --
>   if (symbol_mentioned_p (x))
> return x;
>
>
>> +  /* Force "base+offset" into register and construct
>> +  "register+index*scale".  Return the new expression
>> +  only if it's valid.  */
>> +  tmp = gen_rtx_PLUS (SImode, bas

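The strategy described in the patch comment above — force "base + offset" into a register so that the remaining "reg + index * scale" is a legal scaled-index address — can be illustrated numerically with a toy model. The struct and function names here are invented for illustration; they are not GCC internals:

```c
#include <assert.h>

/* Illustrative model of an address of the form base + index * scale + offset.  */
struct addr_parts {
    long base;
    long index;
    long scale;
    long offset;
};

/* Compute the address the hardware would form once base + offset has been
   folded into a single register: reg + index * scale.  */
long legitimized_address (struct addr_parts p)
{
    long reg = p.base + p.offset;    /* forced into a register first */
    return reg + p.index * p.scale;  /* legal scaled-index form */
}
```

The point of the transformation is that both resulting pieces ("base + offset" as a register load, and "reg + index * scale" as the memory address) are individually representable on ARM, even though the four-part sum is not.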
[wide-int] Small wide_int_to_tree optimisation

2013-11-28 Thread Richard Sandiford
This patch converts some gcc_asserts to gcc_checking_asserts.
I think the first two in particular should be checking-only,
since we ignore the bits above the target precision anyway.

Also we have:

  /* This is a little hokie, but if the prec is smaller than
 what is necessary to hold INTEGER_SHARE_LIMIT, then the
 obvious test will not get the correct answer.  */
  if (prec < HOST_BITS_PER_WIDE_INT)
{
  if (cst.to_uhwi () < (unsigned HOST_WIDE_INT) INTEGER_SHARE_LIMIT)
ix = cst.to_uhwi ();
}
  else if (wi::ltu_p (cst, INTEGER_SHARE_LIMIT))
ix = cst.to_uhwi ();

But this case only occurs for single-HWI integers.  We later check
for that and extract the HWI value, so it seems simpler to postpone
the index check until then.
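The caching scheme being simplified here — unsigned types share the constants [0, N), signed types share -1 plus [0, N) shifted up a slot — can be sketched on plain host integers, which is exactly what the single-HWI path works with. The constant and function names below are invented; the real logic lives in wide_int_to_tree:

```c
#include <assert.h>

#define SHARE_LIMIT 251  /* stand-in for INTEGER_SHARE_LIMIT */

/* Cache slot for a small unsigned value, or -1 if the value is not
   cached.  Mirrors the "[0, N)" caching scheme.  */
int cache_index_unsigned (unsigned long long hwi)
{
    return hwi < SHARE_LIMIT ? (int) hwi : -1;
}

/* Signed types cache -1 in slot 0 and [0, N) in slots 1..N.  */
int cache_index_signed (long long hwi)
{
    if (hwi == -1)
        return 0;
    if (hwi >= 0 && hwi < SHARE_LIMIT)
        return (int) hwi + 1;
    return -1;
}
```

Once the value has been extracted as a single HWI, the "prec smaller than HOST_BITS_PER_WIDE_INT" subtlety disappears, which is why postponing the check is simpler.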

Tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard


Index: gcc/tree.c
===
*** gcc/tree.c  2013-11-28 11:27:32.043124135 +
--- gcc/tree.c  2013-11-28 11:45:39.957427563 +
*** wide_int_to_tree (tree type, const wide_
*** 1205,1295 
if (l > 1)
  {
if (pcst.elt (l - 1) == 0)
!   gcc_assert (pcst.elt (l - 2) < 0);
if (pcst.elt (l - 1) == (HOST_WIDE_INT) -1)
!   gcc_assert (pcst.elt (l - 2) >= 0);
  }
  
wide_int cst = wide_int::from (pcst, prec, sgn);
unsigned int ext_len = get_int_cst_ext_nunits (type, cst);
  
!   switch (TREE_CODE (type))
  {
! case NULLPTR_TYPE:
!   gcc_assert (cst == 0);
!   /* Fallthru.  */
! 
! case POINTER_TYPE:
! case REFERENCE_TYPE:
! case POINTER_BOUNDS_TYPE:
!   /* Cache NULL pointer and zero bounds.  */
!   if (cst == 0)
!   {
! limit = 1;
! ix = 0;
!   }
!   break;
  
! case BOOLEAN_TYPE:
!   /* Cache false or true.  */
!   limit = 2;
!   if (wi::leu_p (cst, 1))
!   ix = cst.to_uhwi ();
!   break;
! 
! case INTEGER_TYPE:
! case OFFSET_TYPE:
!   if (TYPE_SIGN (type) == UNSIGNED)
{
! /* Cache 0..N */
! limit = INTEGER_SHARE_LIMIT;
! 
! /* This is a little hokie, but if the prec is smaller than
!what is necessary to hold INTEGER_SHARE_LIMIT, then the
!obvious test will not get the correct answer.  */
! if (prec < HOST_BITS_PER_WIDE_INT)
{
! if (cst.to_uhwi () < (unsigned HOST_WIDE_INT) INTEGER_SHARE_LIMIT)
!   ix = cst.to_uhwi ();
}
! else if (wi::ltu_p (cst, INTEGER_SHARE_LIMIT))
!   ix = cst.to_uhwi ();
!   }
!   else
!   {
! /* Cache -1..N */
! limit = INTEGER_SHARE_LIMIT + 1;
  
! if (cst == -1)
!   ix = 0;
! else if (!wi::neg_p (cst))
{
! if (prec < HOST_BITS_PER_WIDE_INT)
!   {
! if (cst.to_shwi () < INTEGER_SHARE_LIMIT)
!   ix = cst.to_shwi () + 1;
!   }
! else if (wi::lts_p (cst, INTEGER_SHARE_LIMIT))
!   ix = cst.to_shwi () + 1;
}
!   }
!   break;
  
! case ENUMERAL_TYPE:
!   break;
  
! default:
!   gcc_unreachable ();
! }
  
-   if (ext_len == 1)
- {
-   /* We just need to store a single HOST_WIDE_INT.  */
-   HOST_WIDE_INT hwi;
-   if (TYPE_UNSIGNED (type))
-   hwi = cst.to_uhwi ();
-   else
-   hwi = cst.to_shwi ();
if (ix >= 0)
{
  /* Look for it in the type's vector of small shared ints.  */
--- 1205,1276 
if (l > 1)
  {
if (pcst.elt (l - 1) == 0)
!   gcc_checking_assert (pcst.elt (l - 2) < 0);
if (pcst.elt (l - 1) == (HOST_WIDE_INT) -1)
!   gcc_checking_assert (pcst.elt (l - 2) >= 0);
  }
  
wide_int cst = wide_int::from (pcst, prec, sgn);
unsigned int ext_len = get_int_cst_ext_nunits (type, cst);
  
!   if (ext_len == 1)
  {
!   /* We just need to store a single HOST_WIDE_INT.  */
!   HOST_WIDE_INT hwi;
!   if (TYPE_UNSIGNED (type))
!   hwi = cst.to_uhwi ();
!   else
!   hwi = cst.to_shwi ();
  
!   switch (TREE_CODE (type))
{
!   case NULLPTR_TYPE:
! gcc_assert (hwi == 0);
! /* Fallthru.  */
! 
!   case POINTER_TYPE:
!   case REFERENCE_TYPE:
!   case POINTER_BOUNDS_TYPE:
! /* Cache NULL pointer and zero bounds.  */
! if (hwi == 0)
{
! limit = 1;
! ix = 0;
}
! break;
  
!   case BOOLEAN_TYPE:
! /* Cache false or true.  */
! limit = 2;
! if (hwi < 2)
!   ix = hwi;
! break;
! 
!   case INTEGER_TYPE:
!   case OFFSET_TYPE:
! if (TYPE_SIGN (type) == UNSIGNED)
{
! /* Cache [0, N).  */
! limit = INTEGER_SHARE_LIMIT;
! if (IN_RANGE (hwi, 0, INTE

[wwwdocs] Update obvious fix commit policy

2013-11-28 Thread Diego Novillo
[ sent first version in html. apologies for the dup. ]

Based on the recent discussion on the obvious fix policy.

OK to commit?

Index: htdocs/svnwrite.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/svnwrite.html,v
retrieving revision 1.29
diff -u -d -u -p -r1.29 svnwrite.html
--- htdocs/svnwrite.html24 Sep 2013 18:26:29 -  1.29
+++ htdocs/svnwrite.html28 Nov 2013 12:04:40 -
@@ -147,10 +147,13 @@ list.

 The following changes can be made by everyone with SVN write access:

-Fixes for obvious typos in ChangeLog files, docs, web pages, comments
-and similar stuff.  Just check in the fix and copy it to
-gcc-patches.  We don't want to get overly anal-retentive
-about checkin policies.
+Obvious fixes to documentation, code and test cases can be
+committed without prior approval.  Just check in the fix and copy it
+to gcc-patches.  A good test to determine whether a fix
+is obvious: "will the person who objects to my work the most be able
+to find a fault with my fix?".  If the fix is later found to be
+faulty, it can always be rolled back.  We don't want to get overly
+anal-retentive about checkin policies.

 Similarly, no outside approval is needed to revert a patch that you
 checked in.


Re: [PATCH] Get rid of useless -fno-rtti for libubsan

2013-11-28 Thread Yury Gribov

> The above hunk is not needed anymore, is it?

Right.

> Ok for trunk without that hunk (and with Makefile.in regenerated again).

Done, r205482.

-Y


[PATCH, doc] Document -fsanitize=signed-integer-overflow

2013-11-28 Thread Marek Polacek
As promised, this patch, on top of this one by Tobias:
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03082.html
adds the documentation for -fsanitize=signed-integer-overflow.

Ok to install after the actual implementation is in?

2013-11-28  Marek Polacek  

* doc/invoke.texi: Document -fsanitize=signed-integer-overflow.

--- gcc/doc/invoke.texi.mp3 2013-11-28 13:07:09.011575348 +0100
+++ gcc/doc/invoke.texi 2013-11-28 13:24:45.109798224 +0100
@@ -5341,6 +5341,19 @@ built with this option turned on will is
 tries to dereference a NULL pointer, or if a reference (possibly an
 rvalue reference) is bound to a NULL pointer.
 
+@item -fsanitize=signed-integer-overflow
+@opindex fsanitize=signed-integer-overflow
+
+This option enables signed integer overflow checking.  We check that
+the result of @code{+}, @code{*}, and both unary and binary @code{-}
+does not overflow in signed arithmetic.  Note that integer promotion
+rules must be taken into account.  That is, the following is not an
+overflow:
+@smallexample
+signed char a = SCHAR_MAX;
+a++;
+@end smallexample
+
 @end table
 
 While @option{-ftrapv} causes traps for signed overflows to be emitted,
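The promotion rule in the new documentation can be checked directly in standalone code (this sketch is not part of the patch; the function name is invented): the arithmetic on a promoted signed char operand happens in int, so the addition itself never overflows, and the result is well defined.

```c
#include <assert.h>
#include <limits.h>

/* The signed char operand is promoted to int before the addition, so
   the arithmetic is done in int and cannot overflow here; only a later
   narrowing back to signed char would lose the value.  */
int promoted_increment (signed char a)
{
    return a + 1;  /* computed in int: no signed overflow */
}
```

By contrast, `int a = INT_MAX; a = a + 1;` really is a signed overflow, since no wider type rescues the addition, and it is the kind of operation this sanitizer is meant to flag at run time.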

Marek


Re: [testsuite] Properly set ld_library_path in cilk-plus tests

2013-11-28 Thread Rainer Orth
Hi Balaji,

>> 2013-11-26  Rainer Orth  
>> 
>>  * gcc.dg/cilk-plus/cilk-plus.exp: Append to ld_library_path.
>>  Call set_ld_library_path_env_vars.
>>  * g++.dg/cilk-plus/cilk-plus.exp: Likewise.
>
> Thanks for catching this! Sorry I didn't catch it sooner. I am just getting
> myself familiar with the DejaGNU framework.

No worries.  DejaGnu is a complex beast and it's hard enough to wrap
your head around it.  Fortunately Mike already identified and fixed a
couple of other issues.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [testsuite] Properly set ld_library_path in cilk-plus tests

2013-11-28 Thread Rainer Orth
Hi Jeff,

>> 2013-11-26  Rainer Orth  
>>
>>  * gcc.dg/cilk-plus/cilk-plus.exp: Append to ld_library_path.
>>  Call set_ld_library_path_env_vars.
>>  * g++.dg/cilk-plus/cilk-plus.exp: Likewise.
> Thanks for taking care of this.  You're probably getting more cilk+ fallout
> than most because you do a lot of Solaris work.   Sorry about that.

no need to be sorry: that's what you get for working on a niche platform ;-)

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: Patch ping (stage1-ish patches)

2013-11-28 Thread Rainer Orth
Hi Jeff,

>> On my side, there's
>>
>> [c++, driver] Add -lrt on Solaris
>> http://gcc.gnu.org/ml/gcc-patches/2013-05/msg01488.html
>>
>> resubmitted as
>>
>> http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00412.html
>>
>> It's unclear if the more intrusive solution outlined in the second
>> message (introduce libstdc++.spec) were acceptable in stage3, and I'm
>> uncertain if I can get it ready in time.
> Well, the short-term hack to g++spec.c along with the corresponding change
> to sol2.h is, OK for the trunk.

thanks, I've just installed it as a stopgap measure.

> As for the more invasive change, I'd let the C++ runtime guys decide if its
> too invasive for stage3.  If you go that route, worst case is it's
> considered too invasive and it goes in during stage1 and you can remove the
> hack-ish solution from this patch.

Right.  I just remembered that something along this line will be needed
for Solaris 10, too, which unlike Solaris 9 won't be removed in GCC
4.10.  I'll see how far I get with a libstdc++.spec patch and then let
the C++ maintainers decide what to do for 4.9.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


GCC -march=native on OS X 10.8

2013-11-28 Thread Vera Loeser
Dear all,

I use GCC 4.8 from MacPorts and I'd like to optimize my code using the 
flag -march=native. For that I have to use clang's assembler, because the 
GNU assembler is too old and cannot handle the AVX extensions. With the 
flags "-Wa,-q" GCC should work, but when optimizing loops there are some 
problems:

/var/folders/zz/q_423dms0vqcgp5khshwwrwh000142/T//cchZfZOX.s:232771:2:  
error: ambiguous instructions require an explicit suffix (could be  
'fisttps', or 'fisttpl') 
 fisttp 62(%rsp) 

I think GCC generates assembly code which is too ambiguous for the clang 
assembler. 
Is there a way to get more precise code so I can use the -march=native flag?

Best,
Vera


Re: GCC -march=native on OS X 10.8

2013-11-28 Thread Iain Sandoe
Hello Vera,

On 28 Nov 2013, at 13:15, Vera Loeser wrote:

> I use the GCC 4.8 from Macports and I'd like to optimize my code by using the 
> flag -march=native. For that I have to use the assembler of clang, because 
> the GNU assembler is too old and cannot use the AVX extensions. By setting 
> the flags "-Wa,-q" the GCC should work. But with optimizing loops there are 
> some problems:
> 
> /var/folders/zz/q_423dms0vqcgp5khshwwrwh000142/T//cchZfZOX.s:232771:2:  
> error: ambiguous instructions require an explicit suffix (could be  
> 'fisttps', or 'fisttpl') 
> fisttp 62(%rsp) 
> 
> I think the GCC generates as code, which is too imprecise for the clang 
> compiler. 
> Is there a way to get more precise code to use the -march=native flag?

At this time, using "llvm-mc" (or clang -cc1as)  directly from GCC requires 
non-trivial changes to the configuration.

I have a patch-in-progress to address this, but it is not yet ready - and, in 
the short-term, I don't think there is any simple work-around.

Please file a bug requesting the enhancement, and I will copy the list when the 
patch is ready for testing.

thanks.
Iain

Re: [PATCH] Don't create out-of-bounds BIT_FIELD_REFs

2013-11-28 Thread Richard Biener
On Thu, Nov 28, 2013 at 12:23 AM, Tom de Vries  wrote:
> On 27-11-13 07:20, Jeff Law wrote:
>>
>> On 11/26/13 14:10, Tom de Vries wrote:
>>>
>>> On 26-11-13 11:12, Richard Biener wrote:

 On Tue, Nov 26, 2013 at 8:57 AM, Tom de Vries 
 wrote:
>
> Jason,
>
> This patch prevents creating out-of-bounds BIT_FIELD_REFs in 3
> locations.
>
> It fixes a SIGSEGV (triggered by gimple_fold_indirect_ref_1) in
> simplify_bitfield_ref. I've added an assert to detect the problematic
> BIT_FIELD_REF there.
>
> Bootstrapped and reg-tested on x86_64.
>
> OK for trunk?


 Looks obvious to me - btw, instead of asserting in tree-ssa-forwprop.c
 can you adjust the verify_expr BIT_FIELD_REF code so it checks for
 this?

>>>
>>> Done.
>>>
>>> And I've move the test-case to c-c++-common.
>>>
>>> Build and reg-tested on x86_64 (ada inclusive). Now redoing build and
>>> test, but with bootstrap build.
>>>
>>> OK for trunk?
>>
>> Yes, OK for the trunk.
>>
>
> Committed to trunk.
>
> Also ok for 4.8 branch? It's a 4.8/4.9 regression.

Ok if testing succeeds there but please leave out the checking bits.

Thanks,
Richard.

> Thanks,
> - Tom
>
>> jeff
>>
>


Re: [wwwdocs] Update obvious fix commit policy

2013-11-28 Thread Diego Novillo
New version with a slightly cleaned up wording:

Index: htdocs/svnwrite.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/svnwrite.html,v
retrieving revision 1.29
diff -u -d -u -p -r1.29 svnwrite.html
--- htdocs/svnwrite.html24 Sep 2013 18:26:29 -  1.29
+++ htdocs/svnwrite.html28 Nov 2013 13:56:55 -
@@ -147,10 +147,13 @@ list.

 The following changes can be made by everyone with SVN write access:

-Fixes for obvious typos in ChangeLog files, docs, web pages, comments
-and similar stuff.  Just check in the fix and copy it to
-gcc-patches.  We don't want to get overly anal-retentive
-about checkin policies.
+Obvious fixes to documentation, code and test cases can be
+committed without prior approval.  Just check in the fix and copy it
+to gcc-patches.  A good test to determine whether a fix
+is obvious: "will the person who objects to my work the most be able
+to find a fault with my fix?".  If the fix is later found to be
+faulty, it can always be rolled back.  We don't want to get overly
+restrictive about checkin policies.

 Similarly, no outside approval is needed to revert a patch that you
 checked in.


Re: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4

2013-11-28 Thread Richard Biener
On Thu, Nov 28, 2013 at 12:23 PM, Gopalasubramanian, Ganesh
 wrote:
> This patch makes the macro TARGET_LOOP_UNROLL_ADJUST take effect when unrolling
> loops with a constant number of iterations (decide_unroll_constant_iterations).
> The macro is already checked for runtime iterations
> (decide_unroll_runtime_iterations) and for unroll stupid
> (decide_unroll_stupid).
>
> Bootstrap and tests pass.
>
> Would like to know your comments before committing.

Quickly checked the only port using the hook (s390) and the patch
looks ok.

Thus, ok.

Thanks,
Richard.

> Regards
> Ganesh
>
> 2013-11-28  Ganesh Gopalasubramanian  
>
> * loop-unroll.c (decide_unroll_constant_iterations): Check macro
> TARGET_LOOP_UNROLL_ADJUST while deciding unroll factor.
>
>
> diff --git a/gcc/loop-unroll.c b/gcc/loop-unroll.c
> index 9c87167..557915f 100644
> --- a/gcc/loop-unroll.c
> +++ b/gcc/loop-unroll.c
> @@ -664,6 +664,9 @@ decide_unroll_constant_iterations (struct loop *loop, int 
> flags)
>if (nunroll > (unsigned) PARAM_VALUE (PARAM_MAX_UNROLL_TIMES))
>  nunroll = PARAM_VALUE (PARAM_MAX_UNROLL_TIMES);
>
> +  if (targetm.loop_unroll_adjust)
> +nunroll = targetm.loop_unroll_adjust (nunroll, loop);
> +
>/* Skip big loops.  */
>if (nunroll <= 1)
>  {
>
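The hook call added in the hunk above delegates the unroll decision to the target. A hypothetical loop_unroll_adjust-style heuristic in the spirit of the loop-buffer reasoning from the earlier discussion might look like the following sketch — BUFFER_SLOTS and the whole cap rule are invented for illustration, not the real bdver3 or s390 logic:

```c
/* Hypothetical loop_unroll_adjust-style heuristic: cap the unroll
   factor so that the unrolled body's memory references still fit in a
   fixed-size loop buffer.  BUFFER_SLOTS is an invented constant.  */
#define BUFFER_SLOTS 32

unsigned adjust_unroll (unsigned nunroll, unsigned mem_refs_per_iter)
{
    if (mem_refs_per_iter == 0)
        return nunroll;                     /* nothing to constrain */
    unsigned cap = BUFFER_SLOTS / mem_refs_per_iter;
    if (cap == 0)
        cap = 1;                            /* never unroll below 1 */
    return nunroll < cap ? nunroll : cap;
}
```

The patch itself only ensures such a hook is consulted on the constant-iterations path, where it previously was not.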
> -Original Message-
> From: Uros Bizjak [mailto:ubiz...@gmail.com]
> Sent: Friday, November 22, 2013 1:46 PM
> To: Gopalasubramanian, Ganesh
> Cc: gcc-patches@gcc.gnu.org; Richard Guenther  
> (richard.guent...@gmail.com); borntrae...@de.ibm.com; H.J. Lu 
> (hjl.to...@gmail.com); Jakub Jelinek (ja...@redhat.com)
> Subject: Re: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4
>
> On Wed, Nov 20, 2013 at 7:26 PM, Gopalasubramanian, Ganesh 
>  wrote:
>
>> Steamroller processors contain a loop predictor and a loop buffer, which may 
>> make unrolling small loops less important.
>> When unrolling small loops for steamroller, making the unrolled loop fit in 
>> the loop buffer should be a priority.
>>
>> This patch uses a heuristic approach (number of memory references) to decide 
>> the unrolling factor for small loops.
>> This patch has some noise in SPEC 2006 results.
>>
>> Bootstrapping passes.
>>
>> I would like to know your comments before committing.
>
> Please split the patch into target-dependent and target-independent parts, and 
> get the target-independent part reviewed first.
>


Re: [PATCH i386] Enable -freorder-blocks-and-partition

2013-11-28 Thread Jan Hubicka
> Dear Teresa and Jan,
>I tried to test Teresa's patch, but I've encountered two bugs
> during usage of -fprofile-generate/use (one in SPEC CPU 2006 and
> Inkscape).

Thanks, this is a non-LTO run. Is there a chance to get an -flto version, too,
so we can see how things combine with -freorder-functions?
> 
> This will be probably for Jan:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59266
> 
> second one:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59265
> 
> There are numbers I recorded for GIMP with and without block reordering.
> 
> GIMP (-freorder-blocks-and-partition)
> pages read (no readahead): 597 pages (4K)
> 
> GIMP (-no-freorder-blocks-and-partition)
> pages read (no readahead): 596 pages (4K)

The graphs themselves seem a bit odd, however: why do we have so many accesses
to the cold section with -fno-reorder-blocks-and-partition again?

Honza
> 
> Martin
> 
> On 19 November 2013 23:18, Teresa Johnson  wrote:
> > On Tue, Nov 19, 2013 at 9:40 AM, Jeff Law  wrote:
> >> On 11/19/13 10:24, Teresa Johnson wrote:
> >>>
> >>> On Tue, Nov 19, 2013 at 7:44 AM, Jan Hubicka  wrote:
> 
>  Martin,
>  can you, please, generate the updated systemtap with
>  -freorder-blocks-and-partition enabled?
> 
>  I am in favour of enabling this - it is usefull pass and it is pointless
>  ot
>  have passes that are not enabled by default.
>  Is there reason why this would not work on other ELF target? Is it
>  working
>  with Darwin and Windows?
> >>>
> >>>
> >>> I don't know how to test these (I don't see any machines listed in the
> >>> gcc compile farm of those types). For Windows, I assume you mean
> >>> MinGW, which should be enabled as it is under i386. Should I disable
> >>> it there and for Darwin?
> >>>
> 
> > This patch enables -freorder-blocks-and-partition by default for x86
> > at -O2 and up. It is showing some modest gains in cpu2006 performance
> > with profile feedback and -O2 on an Intel Westmere system. Specifically,
> > I am seeing consistent improvements in 401.bzip2 (1.5-3%), 483.xalancbmk
> > (1.5-3%), and 453.povray (2.5-3%), and no apparent regressions.
> 
> 
>  This actually sounds very good ;)
> 
>  Lets see how the systemtap graphs goes.  If we will end up with problem
>  of too many accesses to cold section, I would suggest making cold section
>  subdivided into .unlikely and .unlikely.part (we could have better name)
>  with the second consisting only of unlikely parts of hot&normal
>  functions.
> 
>  This should reduce the problems we are seeing with mistakely identifying
>  code to be cold because of roundoff errors (and it probably makes sense
>  in general, too).
>  We will however need to update gold and ld for that.
> >>>
> >>>
> >>> Note that I don't think this would help much unless the linker is
> >>> changed to move the cold split section close to the hot section. There
> >>> is probably some fine-tuning we could do eventually in the linker
> >>> under -ffunction-sections without putting the split portions in a
> >>> separate section. I.e. clump the split parts together within unlikely.
> >>> But hopefully this can all be done later on as follow-on work to boost
> >>> the performance further.
> >>>
> >
> > Bootstrapped and tested on x86-64-unknown-linux-gnu with a normal
> > bootstrap, a profiledbootstrap and an LTO profiledbootstrap. All were
> > configured with --enable-languages=all,obj-c++ and tested for both
> > 32 and 64-bit with RUNTESTFLAGS="--target_board=unix\{-m32,-m64\}".
> >
> > It would be good to enable this for additional targets as a follow on,
> > but it needs more testing for both correctness and performance on those
> > other targets (i.e for correctness because I see a number of places
> > in other config/*/*.c files that do some special handling under this
> > option for different targets or simply disable it, so I am not sure
> > how well-tested it is under different architectural constraints).
> >
> > Ok for trunk?
> >
> > Thanks,
> > Teresa
> >
> > 2013-11-19  Teresa Johnson  
> >
> >  * common/config/i386/i386-common.c: Enable
> >  -freorder-blocks-and-partition at -O2 and up for x86.
> >  * opts.c (finish_options): Only warn if -freorder-blocks-and-
> >  partition was set on command line.
> 
> 
>  You are probably missing the doc/invoke.texi update.
>  Thank you for working on this!
> >>>
> >>>
> >>> Yes, thanks. Here is the patch with the invoke.texi update.
> >>>
> >>> Teresa
> >>>
> >>>
> >>> 2013-11-19  Teresa Johnson  
> >>>
> >>>  * common/config/i386/i386-common.c: Enable
> >>>  -freorder-blocks-and-partition at -O2 and up for x86.
> >>>  * doc/invoke.texi: Update -freorder-blocks-and-partition default.
> >>>  * opts.c (finish_options): Only warn if -freorder-blocks-and-
> >>>

Re: Store the SSA name range type in the tree structure

2013-11-28 Thread Richard Biener
On Thu, Nov 28, 2013 at 12:26 PM, Richard Sandiford
 wrote:
> At the moment, an anti range ~[A,B] is stored as [B+1,A-1].  This makes
> it harder to store the range in the natural precision of A and B, since
> B+1 and A-1 might not be representable in that precision.
>
> This patch instead stores the original minimum and maximum values and
> uses a spare tree bit to represent the range type.  The version below
> is for trunk; I've also tested a wide-int version.
>
> Tested on x86_64-linux-gnu.  OK to install?
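The representation change can be sketched independently of trees: instead of encoding ~[A,B] as [B+1,A-1] — which may not be representable in the value's precision when A or B sits at a type bound — keep [A,B] plus a flag. The names below are illustrative, not the GCC structures:

```c
#include <assert.h>

struct value_range {
    int min, max;
    int anti;  /* nonzero: the range is ~[min, max] */
};

/* Membership test that works for both kinds without ever computing
   min - 1 or max + 1, so bounds at INT_MIN/INT_MAX stay safe.  */
int in_range (struct value_range r, int v)
{
    int inside = (v >= r.min && v <= r.max);
    return r.anti ? !inside : inside;
}
```

With the inverted encoding, an anti-range like ~[INT_MIN, B] would need B+1 and INT_MIN-1, which is exactly the representability problem the patch avoids by storing the original bounds and the range type bit.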

Ok.

Thanks,
Richard.

> Thanks,
> Richard
>
>
> gcc/
> * tree-core.h (tree_base): Document use of static_flag for SSA_NAME.
> * tree.h (SSA_NAME_ANTI_RANGE_P, SSA_NAME_RANGE_TYPE): New macros.
> * tree-ssanames.h (set_range_info): Add range_type argument.
> (duplicate_ssa_name_range_info): Likewise.
> * tree-ssanames.c (set_range_info): Take the range type as argument
> and store it in SSA_NAME_ANTI_RANGE_P.
> (duplicate_ssa_name_range_info): Likewise.
> (get_range_info): Use SSA_NAME_ANTI_RANGE_P.
> (set_nonzero_bits): Update call to set_range_info.
> (duplicate_ssa_name_fn): Update call to duplicate_ssa_name_range_info.
> * tree-ssa-copy.c (fini_copy_prop): Likewise.
> * tree-vrp.c (remove_range_assertions): Update call to set_range_info.
> (vrp_finalize): Likewise, passing anti-ranges directly.
>
> Index: gcc/tree-core.h
> ===
> --- gcc/tree-core.h 2013-11-15 18:23:21.113488640 +
> +++ gcc/tree-core.h 2013-11-28 11:12:32.956977322 +
> @@ -822,6 +822,9 @@ struct GTY(()) tree_base {
> TRANSACTION_EXPR_OUTER in
>TRANSACTION_EXPR
>
> +   SSA_NAME_ANTI_RANGE_P in
> +  SSA_NAME
> +
> public_flag:
>
> TREE_OVERFLOW in
> Index: gcc/tree.h
> ===
> --- gcc/tree.h  2013-11-20 10:58:57.275831561 +
> +++ gcc/tree.h  2013-11-28 11:12:32.969977280 +
> @@ -1434,6 +1434,14 @@ #define SSA_NAME_IS_DEFAULT_DEF(NODE) \
>  #define SSA_NAME_PTR_INFO(N) \
> SSA_NAME_CHECK (N)->ssa_name.info.ptr_info
>
> +/* True if SSA_NAME_RANGE_INFO describes an anti-range.  */
> +#define SSA_NAME_ANTI_RANGE_P(N) \
> +SSA_NAME_CHECK (N)->base.static_flag
> +
> +/* The type of range described by SSA_NAME_RANGE_INFO.  */
> +#define SSA_NAME_RANGE_TYPE(N) \
> +(SSA_NAME_ANTI_RANGE_P (N) ? VR_ANTI_RANGE : VR_RANGE)
> +
>  /* Value range info attributes for SSA_NAMEs of non pointer-type variables.  
> */
>  #define SSA_NAME_RANGE_INFO(N) \
>  SSA_NAME_CHECK (N)->ssa_name.info.range_info
> Index: gcc/tree-ssanames.h
> ===
> --- gcc/tree-ssanames.h 2013-11-15 18:23:22.050485010 +
> +++ gcc/tree-ssanames.h 2013-11-28 11:12:32.964977296 +
> @@ -70,7 +70,8 @@ #define ssa_name(i) ((*cfun->gimple_df->
>  enum value_range_type { VR_UNDEFINED, VR_RANGE, VR_ANTI_RANGE, VR_VARYING };
>
>  /* Sets the value range to SSA.  */
> -extern void set_range_info (tree, double_int, double_int);
> +extern void set_range_info (tree, enum value_range_type, double_int,
> +   double_int);
>  /* Gets the value range from SSA.  */
>  extern enum value_range_type get_range_info (const_tree, double_int *,
>  double_int *);
> @@ -93,7 +94,8 @@ extern struct ptr_info_def *get_ptr_info
>  extern tree copy_ssa_name_fn (struct function *, tree, gimple);
>  extern void duplicate_ssa_name_ptr_info (tree, struct ptr_info_def *);
>  extern tree duplicate_ssa_name_fn (struct function *, tree, gimple);
> -extern void duplicate_ssa_name_range_info (tree, struct range_info_def *);
> +extern void duplicate_ssa_name_range_info (tree, enum value_range_type,
> +  struct range_info_def *);
>  extern void release_defs (gimple);
>  extern void replace_ssa_name_symbol (tree, tree);
>
> Index: gcc/tree-ssanames.c
> ===
> --- gcc/tree-ssanames.c 2013-11-20 10:59:18.330782865 +
> +++ gcc/tree-ssanames.c 2013-11-28 11:12:32.963977300 +
> @@ -178,12 +178,14 @@ make_ssa_name_fn (struct function *fn, t
>return t;
>  }
>
> -/* Store range information MIN, and MAX to tree ssa_name NAME.  */
> +/* Store range information RANGE_TYPE, MIN, and MAX to tree ssa_name NAME.  
> */
>
>  void
> -set_range_info (tree name, double_int min, double_int max)
> +set_range_info (tree name, enum value_range_type range_type, double_int min,
> +   double_int max)
>  {
>gcc_assert (!POINTER_TYPE_P (TREE_TYPE (name)));
> +  gcc_assert (range_type == VR_RANGE || range_type == VR_ANTI_RANGE);
>range_info_def *ri = SSA_NAME_RANGE_INFO (name);
>
>/* Allocate if not available.  */
> @@ -194,12 +196,16 @@ set_range_info (tree name, double_i

Re: _Cilk_spawn and _Cilk_sync for C++

2013-11-28 Thread Jason Merrill

On 11/27/2013 11:05 PM, Iyer, Balaji V wrote:

Found the bug. I was not using stabilize_expr's output correctly.


Unfortunately, I think I was misleading you with talk of stabilize; like 
you said, you want to evaluate the whole expression in the spawned 
function rather than in the caller, so that any temporaries (including 
the lambda closure) live until the _Cilk_sync.  Using stabilize_expr 
this way (the way I was suggesting) forces the lambda closure to be 
evaluated in the caller, and then destroyed at the end of the enclosing 
statement, which is likely to erase any data that the spawned function 
needs to do its work, if anything captured by copy has a destructor.


As I said in my last mail, I think the right fix is to make sure that A 
gets remapped properly during copy_body so that its use in the 
initializer for the closure doesn't confuse later passes.


Jason



[patch] committed: Re: [resend] - Probable buglet in ipa-prop.c

2013-11-28 Thread Andrew MacLeod

On 11/27/2013 05:51 PM, Jeff Law wrote:

On 11/27/13 15:30, Andrew MacLeod wrote:

On 11/27/2013 05:16 PM, Jeff Law wrote:

On 11/27/13 14:30, Andrew MacLeod wrote:

mailer added html again...


When trying some of my updated prototype changes on trunk, the code
tripped over this segment in ipa-prop.c :

lhs = gimple_assign_lhs (stmt);
rhs = gimple_assign_rhs1 (stmt);
if (!is_gimple_reg_type (rhs)
|| TREE_CODE (lhs) == BIT_FIELD_REF
|| contains_bitfld_component_ref_p (lhs))
  break;

I had converted "gimple_reg_type(tree)" to instead be "gimple_reg_type
(gimple_type)",  and during bootstrap it conked out because it 
received

an SSA_NAME instead of a type.
Which probably caused everything after that conditional to be dead 
code.



I think it should probably be passing TREE_TYPE (rhs) like so?

Yup.  Agreed.  Feel free to submit the fix.  It'll be interesting to
see how many of these we find as this work progresses.

It'll also be interesting to see if there's any fallout from the
previously dead code now getting a chance to do something useful.


Just tripped over another one in tree-ssa-propagate.c:

I'll bootstrap the 2 of them together and run regressions overnight, and
then check them in tomorrow, assuming thats OK.

Works for me.

jeff

Bootstrapped on x86_64-unknown-linux-gnu, no regressions.  Attached 
patch checked in as revision 205485


Andrew

	* tree-ssa-propagate.c (valid_gimple_call_p): Pass TREE_TYPE to
	is_gimple_reg_type.
	* ipa-prop.c (determine_known_aggregate_parts): Likewise.

Index: tree-ssa-propagate.c
===
*** tree-ssa-propagate.c	(revision 205457)
--- tree-ssa-propagate.c	(working copy)
*** valid_gimple_call_p (tree expr)
*** 667,673 
for (i = 0; i < nargs; i++)
  {
tree arg = CALL_EXPR_ARG (expr, i);
!   if (is_gimple_reg_type (arg))
  	{
  	  if (!is_gimple_val (arg))
  	return false;
--- 667,673 
for (i = 0; i < nargs; i++)
  {
tree arg = CALL_EXPR_ARG (expr, i);
!   if (is_gimple_reg_type (TREE_TYPE (arg)))
  	{
  	  if (!is_gimple_val (arg))
  	return false;
Index: ipa-prop.c
===
*** ipa-prop.c	(revision 205457)
--- ipa-prop.c	(working copy)
*** determine_known_aggregate_parts (gimple 
*** 1424,1430 
  
lhs = gimple_assign_lhs (stmt);
rhs = gimple_assign_rhs1 (stmt);
!   if (!is_gimple_reg_type (rhs)
  	  || TREE_CODE (lhs) == BIT_FIELD_REF
  	  || contains_bitfld_component_ref_p (lhs))
  	break;
--- 1424,1430 
  
lhs = gimple_assign_lhs (stmt);
rhs = gimple_assign_rhs1 (stmt);
!   if (!is_gimple_reg_type (TREE_TYPE (rhs))
  	  || TREE_CODE (lhs) == BIT_FIELD_REF
  	  || contains_bitfld_component_ref_p (lhs))
  	break;


Re: [wwwdocs] Update obvious fix commit policy

2013-11-28 Thread Diego Novillo
Fixed quotation as per IRC feedback.

Index: htdocs/svnwrite.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/svnwrite.html,v
retrieving revision 1.29
diff -u -d -u -p -r1.29 svnwrite.html
--- htdocs/svnwrite.html24 Sep 2013 18:26:29 -  1.29
+++ htdocs/svnwrite.html28 Nov 2013 14:12:18 -
@@ -147,10 +147,13 @@ list.

 The following changes can be made by everyone with SVN write access:

-Fixes for obvious typos in ChangeLog files, docs, web pages, comments
-and similar stuff.  Just check in the fix and copy it to
-gcc-patches.  We don't want to get overly anal-retentive
-about checkin policies.
+Obvious fixes to documentation, code and test cases can be
+committed without prior approval.  Just check in the fix and copy it
+to gcc-patches.  A good test to determine whether a fix
+is obvious: will the person who objects to my work the most be able
+to find a fault with my fix?  If the fix is later found to be
+faulty, it can always be rolled back.  We don't want to get overly
+restrictive about checkin policies.

 Similarly, no outside approval is needed to revert a patch that you
 checked in.


Re: wide-int, gimple

2013-11-28 Thread Richard Biener
On Thu, Nov 28, 2013 at 12:58 PM, Richard Sandiford
 wrote:
> Jakub Jelinek  writes:
>> On Mon, Nov 25, 2013 at 12:24:30PM +0100, Richard Biener wrote:
>>> On Sat, Nov 23, 2013 at 8:21 PM, Mike Stump  wrote:
>>> > Richi has asked the we break the wide-int patch so that the
>>> > individual port and front end maintainers can review their parts
>>> > without have to go through the entire patch.  This patch covers the
>>> > gimple code.
>>>
>>> @@ -1754,7 +1754,7 @@ dump_ssaname_info (pretty_printer *buffer, tree
>>> node, int spc)
>>>if (!POINTER_TYPE_P (TREE_TYPE (node))
>>>&& SSA_NAME_RANGE_INFO (node))
>>>  {
>>> -  double_int min, max, nonzero_bits;
>>> +  widest_int min, max, nonzero_bits;
>>>value_range_type range_type = get_range_info (node, &min, &max);
>>>
>>>if (range_type == VR_VARYING)
>>>
>>> this makes me suspect you are changing SSA_NAME_RANGE_INFO
>>> to embed two max wide_ints.  That's a no-no.
>>
>> Well, the range_info_def struct right now contains 3 double_ints, which is
>> unnecessary overhead for the most of the cases where the SSA_NAME's type
>> has just at most HOST_BITS_PER_WIDE_INT bits and thus we could fit all 3 of
>> them into 3 HOST_WIDE_INTs rather than 3 double_ints.  So supposedly struct
>> range_info_def could be a template on the type's precision rounded up to HWI
>> bits, or say have 3 alternatives there, use
>> FIXED_WIDE_INT (HOST_BITS_PER_WIDE_INT) for the smallest types,
>> FIXED_WIDE_INT (2 * HOST_BITS_PER_WIDE_INT) aka double_int for the larger
>> but still common ones, and widest_int for the rest, then the API to set/get
>> it could use widest_int everywhere, and just what storage we'd use would
>> depend on the precision of the type.
>
> This patch adds a trailing_wide_ints  that can be used at the end of
> a variable-length structure to store N wide_ints.  There's also a macro
> to declare get/set methods for each of the N elements.
>
> At the moment I've only defined non-const operator[].  It'd be possible
> to add a const version later if necessary.
>
> The size of range_info_def for precisions that fit in M HWIs is then
> 1 + 3 * M, so 4 for the common case (down from 6 on trunk).  The maximum
> is 7 for current x86_64 types (up from 6 on trunk).
>
> I wondered whether to keep the interface using widest_int, but I think
> wide_int works out more naturally.  The only caller that wants to extend
> beyond the precision is CCP, but that's already special because the upper
> bits are supposed to be set (i.e. it's not a normal sign or zero extension).
>
> This relies on the SSA_NAME_ANTI_RANGE_P patch I just posted.
>
> If this is OK I'll look at using the same structure elsewhere.
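
The trailing-storage layout described in the quoted message, a fixed header followed by three variable-length integers in a single allocation, can be sketched in plain C (hypothetical names only; this is not GCC's actual trailing_wide_ints API):

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative sketch: a header recording how many host words each of
   its 3 trailing integers occupies, with all storage in one block.  */
struct range_storage_sketch
{
  unsigned len;     /* host words per integer (M) */
  long words[];     /* trailing storage for 3 * len words */
};

static struct range_storage_sketch *
range_storage_create (unsigned m)
{
  struct range_storage_sketch *p
    = calloc (1, sizeof *p + 3 * m * sizeof (long));
  p->len = m;
  return p;
}

static long *
range_storage_elt (struct range_storage_sketch *p, unsigned i)
{
  return &p->words[i * p->len];   /* i in [0, 3) */
}
```

For M host words per integer the record is 1 + 3 * M words plus padding, matching the sizes quoted above (4 words in the common M = 1 case).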

Looks good to me.

Thanks,
Richard.

> Thanks,
> Richard
>
>
> Index: gcc/ChangeLog.wide-int
> ===
> --- gcc/ChangeLog.wide-int  2013-11-27 18:45:17.448816304 +
> +++ gcc/ChangeLog.wide-int  2013-11-28 11:37:15.320020047 +
> @@ -677,6 +677,7 @@
> * tree-ssa-ccp.c: Update comment at top of file.  Include
> wide-int-print.h.
> (struct prop_value_d): Change type of mask to widest_int.
> +   (extend_mask): New function.
> (dump_lattice_value): Use wide-int interfaces.
> (get_default_value): Likewise.
> (set_constant_value): Likewise.
> @@ -768,16 +769,20 @@
> * tree-ssa-math-opts.c
> (gimple_expand_builtin_pow): Update calls to real_to_integer.
> * tree-ssanames.c
> -   (set_range_info): Use widest_ints rather than double_ints.
> -   (get_range_info): Likewise.
> +   (set_range_info): Use wide_int_refs rather than double_ints.
> +   Adjust for trailing_wide_ints <3> representation.
> (set_nonzero_bits): Likewise.
> +   (get_range_info): Return wide_ints rather than double_ints.
> +   Adjust for trailing_wide_ints <3> representation.
> (get_nonzero_bits): Likewise.
> +   (duplicate_ssa_name_range_info): Adjust for trailing_wide_ints <3>
> +   representation.
> * tree-ssanames.h
> -   (struct range_info_def): Change type of min, max and nonzero_bits
> -   to widest_int.
> -   (set_range_info): Use widest_ints rather than double_ints.
> -   (get_range_info): Likewise.
> +   (struct range_info_def): Replace min, max and nonzero_bits with
> +   a trailing_wide_ints <3>.
> +   (set_range_info): Use wide_int_refs rather than double_ints.
> (set_nonzero_bits): Likewise.
> +   (get_range_info): Return wide_ints rather than double_ints.
> (get_nonzero_bits): Likewise.
> * tree-ssa-phiopt.c
> (jump_function_from_stmt): Use wide-int interfaces.
> Index: gcc/builtins.c
> ===
> --- gcc/builtins.c  2013-11-27 18:45:17.448816304 +
> +++ gcc/builtins.c  2013-11-27 18:45:46.710684576 +
> @@ -3125,7 +3125,7 @@ determine_bl

Re: [wide-int] Small wide_int_to_tree optimisation

2013-11-28 Thread Richard Biener
On Thu, Nov 28, 2013 at 1:05 PM, Richard Sandiford
 wrote:
> This patch converts some gcc_asserts to gcc_checking_asserts.
> I think the first two in particular should be checking-only,
> since we ignore the bits above the target precision anyway.
>
> Also we have:
>
>   /* This is a little hokie, but if the prec is smaller than
>  what is necessary to hold INTEGER_SHARE_LIMIT, then the
>  obvious test will not get the correct answer.  */
>   if (prec < HOST_BITS_PER_WIDE_INT)
> {
>   if (cst.to_uhwi () < (unsigned HOST_WIDE_INT) 
> INTEGER_SHARE_LIMIT)
> ix = cst.to_uhwi ();
> }
>   else if (wi::ltu_p (cst, INTEGER_SHARE_LIMIT))
> ix = cst.to_uhwi ();
>
> But this case only occurs for single-HWI integers.  We later check
> for that and extract the HWI value, so it seems simpler to postpone
> the index check until then.
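
The precision pitfall in the quoted comment can be reproduced with a small host-side sketch (illustrative only, not the wide-int API): when a comparison is carried out in the value's own precision, a limit wider than that precision truncates, so the "obvious test" misfires.

```c
#include <assert.h>
#include <stdint.h>

#define SHARE_LIMIT 256   /* stand-in for INTEGER_SHARE_LIMIT */

/* Compare a < b with both operands first truncated to PREC bits,
   mimicking an unsigned comparison done at a fixed small precision.  */
static int
ltu_in_prec (uint64_t a, uint64_t b, unsigned prec)
{
  uint64_t mask = prec >= 64 ? ~UINT64_C (0)
                             : (UINT64_C (1) << prec) - 1;
  return (a & mask) < (b & mask);
}
```

With prec == 4 the limit truncates to 0 and every value compares as not less than it, which is why the code compares the extracted host value instead for small precisions.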
>
> Tested on x86_64-linux-gnu.  OK to install?

Ok.

Thanks,
Richard.

> Thanks,
> Richard
>
>
> Index: gcc/tree.c
> ===
> *** gcc/tree.c  2013-11-28 11:27:32.043124135 +
> --- gcc/tree.c  2013-11-28 11:45:39.957427563 +
> *** wide_int_to_tree (tree type, const wide_
> *** 1205,1295 
> if (l > 1)
>   {
> if (pcst.elt (l - 1) == 0)
> !   gcc_assert (pcst.elt (l - 2) < 0);
> if (pcst.elt (l - 1) == (HOST_WIDE_INT) -1)
> !   gcc_assert (pcst.elt (l - 2) >= 0);
>   }
>
> wide_int cst = wide_int::from (pcst, prec, sgn);
> unsigned int ext_len = get_int_cst_ext_nunits (type, cst);
>
> !   switch (TREE_CODE (type))
>   {
> ! case NULLPTR_TYPE:
> !   gcc_assert (cst == 0);
> !   /* Fallthru.  */
> !
> ! case POINTER_TYPE:
> ! case REFERENCE_TYPE:
> ! case POINTER_BOUNDS_TYPE:
> !   /* Cache NULL pointer and zero bounds.  */
> !   if (cst == 0)
> !   {
> ! limit = 1;
> ! ix = 0;
> !   }
> !   break;
>
> ! case BOOLEAN_TYPE:
> !   /* Cache false or true.  */
> !   limit = 2;
> !   if (wi::leu_p (cst, 1))
> !   ix = cst.to_uhwi ();
> !   break;
> !
> ! case INTEGER_TYPE:
> ! case OFFSET_TYPE:
> !   if (TYPE_SIGN (type) == UNSIGNED)
> {
> ! /* Cache 0..N */
> ! limit = INTEGER_SHARE_LIMIT;
> !
> ! /* This is a little hokie, but if the prec is smaller than
> !what is necessary to hold INTEGER_SHARE_LIMIT, then the
> !obvious test will not get the correct answer.  */
> ! if (prec < HOST_BITS_PER_WIDE_INT)
> {
> ! if (cst.to_uhwi () < (unsigned HOST_WIDE_INT) 
> INTEGER_SHARE_LIMIT)
> !   ix = cst.to_uhwi ();
> }
> ! else if (wi::ltu_p (cst, INTEGER_SHARE_LIMIT))
> !   ix = cst.to_uhwi ();
> !   }
> !   else
> !   {
> ! /* Cache -1..N */
> ! limit = INTEGER_SHARE_LIMIT + 1;
>
> ! if (cst == -1)
> !   ix = 0;
> ! else if (!wi::neg_p (cst))
> {
> ! if (prec < HOST_BITS_PER_WIDE_INT)
> !   {
> ! if (cst.to_shwi () < INTEGER_SHARE_LIMIT)
> !   ix = cst.to_shwi () + 1;
> !   }
> ! else if (wi::lts_p (cst, INTEGER_SHARE_LIMIT))
> !   ix = cst.to_shwi () + 1;
> }
> !   }
> !   break;
>
> ! case ENUMERAL_TYPE:
> !   break;
>
> ! default:
> !   gcc_unreachable ();
> ! }
>
> -   if (ext_len == 1)
> - {
> -   /* We just need to store a single HOST_WIDE_INT.  */
> -   HOST_WIDE_INT hwi;
> -   if (TYPE_UNSIGNED (type))
> -   hwi = cst.to_uhwi ();
> -   else
> -   hwi = cst.to_shwi ();
> if (ix >= 0)
> {
>   /* Look for it in the type's vector of small shared ints.  */
> --- 1205,1276 
> if (l > 1)
>   {
> if (pcst.elt (l - 1) == 0)
> !   gcc_checking_assert (pcst.elt (l - 2) < 0);
> if (pcst.elt (l - 1) == (HOST_WIDE_INT) -1)
> !   gcc_checking_assert (pcst.elt (l - 2) >= 0);
>   }
>
> wide_int cst = wide_int::from (pcst, prec, sgn);
> unsigned int ext_len = get_int_cst_ext_nunits (type, cst);
>
> !   if (ext_len == 1)
>   {
> !   /* We just need to store a single HOST_WIDE_INT.  */
> !   HOST_WIDE_INT hwi;
> !   if (TYPE_UNSIGNED (type))
> !   hwi = cst.to_uhwi ();
> !   else
> !   hwi = cst.to_shwi ();
>
> !   switch (TREE_CODE (type))
> {
> !   case NULLPTR_TYPE:
> ! gcc_assert (hwi == 0);
> ! /* Fallthru.  */
> !
> !   case POINTER_TYPE:
> !   case REFERENCE_TYPE:
> !   case POINTER_BOUNDS_TYPE:
> ! /* Cache NULL pointer and zero bounds.  */
> ! if (hwi == 0)
> {
> ! limit = 1;
> ! ix = 0;
> }
> ! break;
>
> !   cas

Re: [wwwdocs] Update obvious fix commit policy

2013-11-28 Thread Richard Biener
On Thu, Nov 28, 2013 at 3:14 PM, Diego Novillo  wrote:
> Fixed quotation as per IRC feedback.
>
> Index: htdocs/svnwrite.html
> ===
> RCS file: /cvs/gcc/wwwdocs/htdocs/svnwrite.html,v
> retrieving revision 1.29
> diff -u -d -u -p -r1.29 svnwrite.html
> --- htdocs/svnwrite.html    24 Sep 2013 18:26:29 -0000      1.29
> +++ htdocs/svnwrite.html    28 Nov 2013 14:12:18 -0000
> @@ -147,10 +147,13 @@ list.
>
>  The following changes can be made by everyone with SVN write access:
>
> -Fixes for obvious typos in ChangeLog files, docs, web pages, comments
> -and similar stuff.  Just check in the fix and copy it to
> -gcc-patches.  We don't want to get overly anal-retentive
> -about checkin policies.
> +Obvious fixes to documentation, code and test cases can be
> +committed without prior approval.

Why remove ChangeLog files, web pages and comments?  Either
enumerate everything or just enumerate nothing and simply say
"Obvious fixes can be committed without prior approval."

Richard.

  Just check in the fix and copy it
> +to gcc-patches.  A good test to determine whether a fix
> +is obvious: will the person who objects to my work the most be able
> +to find a fault with my fix?  If the fix is later found to be
> +faulty, it can always be rolled back.  We don't want to get overly
> +restrictive about checkin policies.
>
>  Similarly, no outside approval is needed to revert a patch that you
>  checked in.


Re: [wwwdocs] Update obvious fix commit policy

2013-11-28 Thread Diego Novillo
On Thu, Nov 28, 2013 at 9:25 AM, Richard Biener
 wrote:

> Why remove ChangeLog files, web pages and comments?  Either
> enumerate everything or just enumerate nothing and simply say
> "Obvious fixes can be committed without prior approval."

Thanks, that's much better.  I was trying to be more inclusive.


Index: htdocs/svnwrite.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/svnwrite.html,v
retrieving revision 1.29
diff -u -d -u -p -r1.29 svnwrite.html
--- htdocs/svnwrite.html    24 Sep 2013 18:26:29 -0000      1.29
+++ htdocs/svnwrite.html    28 Nov 2013 14:26:54 -0000
@@ -147,10 +147,12 @@ list.

 The following changes can be made by everyone with SVN write access:

-Fixes for obvious typos in ChangeLog files, docs, web pages, comments
-and similar stuff.  Just check in the fix and copy it to
-gcc-patches.  We don't want to get overly anal-retentive
-about checkin policies.
+Obvious fixes can be committed without prior approval.  Just check
+in the fix and copy it to gcc-patches.  A good test to
+determine whether a fix is obvious: will the person who objects to
+my work the most be able to find a fault with my fix?  If the fix
+is later found to be faulty, it can always be rolled back.  We don't
+want to get overly restrictive about checkin policies.

 Similarly, no outside approval is needed to revert a patch that you
 checked in.


[PATCH] Fix PR59330

2013-11-28 Thread Richard Biener

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2013-11-28  Richard Biener  

PR tree-optimization/59330
* tree-ssa-dce.c (eliminate_unnecessary_stmts): Simplify
and fix delayed marking of free calls not necessary.

* gcc.dg/torture/pr59330.c: New testcase.

Index: gcc/tree-ssa-dce.c
===
*** gcc/tree-ssa-dce.c  (revision 205484)
--- gcc/tree-ssa-dce.c  (working copy)
*** eliminate_unnecessary_stmts (void)
*** 1191,1216 
  stats.total++;
  
  /* We can mark a call to free as not necessary if the
!defining statement of its argument is an allocation
!function and that is not necessary itself.  */
! if (gimple_call_builtin_p (stmt, BUILT_IN_FREE))
{
  tree ptr = gimple_call_arg (stmt, 0);
! tree callee2;
! gimple def_stmt;
! if (TREE_CODE (ptr) != SSA_NAME)
!   continue;
! def_stmt = SSA_NAME_DEF_STMT (ptr);
! if (!is_gimple_call (def_stmt)
! || gimple_plf (def_stmt, STMT_NECESSARY))
!   continue;
! callee2 = gimple_call_fndecl (def_stmt);
! if (callee2 == NULL_TREE
! || DECL_BUILT_IN_CLASS (callee2) != BUILT_IN_NORMAL
! || (DECL_FUNCTION_CODE (callee2) != BUILT_IN_MALLOC
! && DECL_FUNCTION_CODE (callee2) != BUILT_IN_CALLOC))
!   continue;
! gimple_set_plf (stmt, STMT_NECESSARY, false);
}
  
  /* If GSI is not necessary then remove it.  */
--- 1191,1208 
  stats.total++;
  
  /* We can mark a call to free as not necessary if the
!defining statement of its argument is not necessary
!(and thus is getting removed).  */
! if (gimple_plf (stmt, STMT_NECESSARY)
! && gimple_call_builtin_p (stmt, BUILT_IN_FREE))
{
  tree ptr = gimple_call_arg (stmt, 0);
! if (TREE_CODE (ptr) == SSA_NAME)
!   {
! gimple def_stmt = SSA_NAME_DEF_STMT (ptr);
! if (!gimple_plf (def_stmt, STMT_NECESSARY))
!   gimple_set_plf (stmt, STMT_NECESSARY, false);
!   }
}
  
  /* If GSI is not necessary then remove it.  */
Index: gcc/testsuite/gcc.dg/torture/pr59330.c
===
*** gcc/testsuite/gcc.dg/torture/pr59330.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr59330.c  (working copy)
***
*** 0 
--- 1,17 
+ /* { dg-do run } */
+ 
+ void free(void *ptr)
+ {
+ }
+ 
+ void *foo(void)
+ {
+   return 0;
+ }
+ 
+ int main(void)
+ {
+   void *p = foo();
+   free(p);
+   return 0;
+ }


Re: [ARM] Fix register r3 wrongly used to save ip in nested APCS frame

2013-11-28 Thread Richard Earnshaw
Eric,

My apologies for taking so long to look at this.

> 2013-09-05  Eric Botcazou  
>
>   * config/arm/arm.c (arm_expand_prologue): In a nested APCS frame with
>   arguments to push onto the stack and no varargs, save ip into a stack
>   slot if r3 isn't available on entry.

Sorry, but this is not quite right either, as shown by the attached
testcase (in C this time, so we can commit it to gcc.target/arm :-)

The problem is that if we have some alignment padding we end up storing
ip in one location but restoring it from another.

str ip, [sp, #4]
add ip, sp, #8
stmfd   sp!, {fp, ip, lr, pc}
sub fp, ip, #12
ldr ip, [fp, #4]    // < Should be fp + 8
@ ip needed
str r3, [fp, #8]
ldr ip, [ip]

R.

On 14/10/13 09:46, Eric Botcazou wrote:
> p__f$1593:
>   @ Nested: function declared inside another function.
>   @ args = 16, pretend = 4, frame = 140
>   @ frame_needed = 1, uses_anonymous_args = 0
>   sub sp, sp, #4
>   str ip, [sp]
>   add ip, sp, #4
>   stmfd   sp!, {r4, r5, r6, r7, r8, r9, r10, fp, ip, lr, pc}


>   sub fp, ip, #8
>   ldr ip, [fp, #4]
>   @ ip needed
>   sub sp, sp, #140
>   str r0, [fp, #-64]
>   str r1, [fp, #-72]
>   str r2, [fp, #-68]
>   str r3, [fp, #4]
> 
> which looks correct.  FWIW we have had the patch in our tree for 4 months now.
> 
> 
> p.adb
> 
> 
> procedure P (I : Integer) is
> 
>   SUBTYPE S IS INTEGER RANGE 1..100;
>   TYPE ARR IS ARRAY (S RANGE <>) OF INTEGER;
> 
>   A : ARR (2..9);
> 
>   FUNCTION F (AR_VAR1, AR_VAR2, AR_VAR3 : ARR) RETURN ARR IS
>   BEGIN
> if I = 0 then
>   RETURN AR_VAR1 & AR_VAR2 & AR_VAR3;
> else
>   RETURN AR_VAR1;
> end if;
>   END;
> 
> begin
>   A := (8,7,6,5,4,3,2,1);
>   if F(A(2..3), A(2..4), A(2..4)) /= (8,7,8,7,6,8,7,6) then
> raise Program_Error;
>   end if;
> end;
> 
/* { dg-do run } */
/* { dg-options "-fno-omit-frame-pointer -mapcs-frame -O" } */
struct x
{
  int y;
  int z;
};

int __attribute__((noinline)) f (int c, int d, int e, int h, int i)
{
  int a;
  struct x b;

  int __attribute__((noinline)) g (int p, int q, int r, struct x s)
  {
return a + p + q + r + s.y + s.z;
  }

  a = 5;
  b.y = h;
  b.z = i;

  return g(c, d, e, b);
}

int main()
{
  if (f (1, 2, 3, 4, 5) != 20)
abort();
  exit (0);
}

[PATCH] One more testcase for PR59323

2013-11-28 Thread Richard Biener

Committed.

Richard.

2013-11-28  Richard Biener  

PR lto/59323
* gcc.dg/lto/pr59323-2_0.c: New testcase.

Index: gcc/testsuite/gcc.dg/lto/pr59323-2_0.c
===
--- gcc/testsuite/gcc.dg/lto/pr59323-2_0.c  (revision 0)
+++ gcc/testsuite/gcc.dg/lto/pr59323-2_0.c  (working copy)
@@ -0,0 +1,37 @@
+/* { dg-lto-do link } */
+/* { dg-lto-options { { -O2 -g -flto } } } */
+/* { dg-extra-ld-options { -r -nostdlib } } */
+
+extern void bar(void);
+
+int main(int argc, char **argv)
+{
+  int i;
+
+  if (argc == 1) {
+extern void bar ();
+
+bar();
+
+{
+  extern void bar ();
+
+  asm goto ("" : : : : lab);
+lab:
+  ;
+}
+  }
+
+  {
+extern void bar ();
+
+int foo(void)
+{
+  return argv[0][0];
+}
+
+i = foo();
+  }
+
+  return i;
+}


[PATCH, i386 libgcc]: Define __FP_FRAC_ADDI_4 in 32bit sfp-machine.h

2013-11-28 Thread Uros Bizjak
Hello!

The attached patch introduces __FP_FRAC_ADDI_4 to improve the 32-bit soft-fp
code a bit.  The effect can be seen in a couple of places; for example,
in divtf3 part of the code changes from:

 70f:   83 c0 04                add    $0x4,%eax
 712:   89 84 24 90 00 00 00    mov    %eax,0x90(%esp)
 719:   83 f8 03                cmp    $0x3,%eax
 71c:   8b 84 24 94 00 00 00    mov    0x94(%esp),%eax
 723:   0f 96 c2                setbe  %dl
 726:   0f b6 d2                movzbl %dl,%edx
 729:   01 d0                   add    %edx,%eax
 72b:   39 c2                   cmp    %eax,%edx
 72d:   89 84 24 94 00 00 00    mov    %eax,0x94(%esp)
 734:   8b 84 24 98 00 00 00    mov    0x98(%esp),%eax
 73b:   0f 97 c2                seta   %dl
 73e:   0f b6 d2                movzbl %dl,%edx
 741:   01 d0                   add    %edx,%eax
 743:   39 c2                   cmp    %eax,%edx
 745:   89 84 24 98 00 00 00    mov    %eax,0x98(%esp)
 74c:   0f 97 c0                seta   %al
 74f:   0f b6 c0                movzbl %al,%eax
 752:   03 84 24 9c 00 00 00    add    0x9c(%esp),%eax
 759:   89 84 24 9c 00 00 00    mov    %eax,0x9c(%esp)

to:

 71b:   8b b4 24 9c 00 00 00    mov    0x9c(%esp),%esi
 722:   8b 8c 24 98 00 00 00    mov    0x98(%esp),%ecx
 729:   8b 94 24 94 00 00 00    mov    0x94(%esp),%edx
 730:   83 c0 04                add    $0x4,%eax
 733:   83 d2 00                adc    $0x0,%edx
 736:   83 d1 00                adc    $0x0,%ecx
 739:   83 d6 00                adc    $0x0,%esi
 73c:   89 b4 24 9c 00 00 00    mov    %esi,0x9c(%esp)
 743:   89 8c 24 98 00 00 00    mov    %ecx,0x98(%esp)
 74a:   89 94 24 94 00 00 00    mov    %edx,0x94(%esp)
 751:   89 84 24 90 00 00 00    mov    %eax,0x90(%esp)
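
The add/adc chain above is a 4-word addition with carry propagation. A portable sketch of what the new macro computes, with x0 the least significant word (an illustration, not the soft-fp implementation itself):

```c
#include <assert.h>
#include <stdint.h>

/* Portable sketch of __FP_FRAC_ADDI_4: add the small constant I to
   the 4-word value (X3,X2,X1,X0), least significant word first,
   rippling the carry upward like the add/adc sequence.  */
static void
frac_addi_4 (uint32_t *x3, uint32_t *x2, uint32_t *x1, uint32_t *x0,
             uint32_t i)
{
  uint32_t carry, old;

  old = *x0; *x0 += i;     carry = *x0 < old;   /* carry out of word 0 */
  old = *x1; *x1 += carry; carry = *x1 < old;
  old = *x2; *x2 += carry; carry = *x2 < old;
  *x3 += carry;
}
```

The setbe/seta/movzbl sequence in the old code materializes each carry as a 0/1 value in a register; the adc form lets the flags carry it directly.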

2013-11-28  Uros Bizjak  

* config/i386/32/sfp-machine.h (__FP_FRAC_ADDI_4): New macro.

Tested on x86_64-pc-linux-gnu {,-m32} and committed to mainline SVN.

Uros.
Index: config/i386/32/sfp-machine.h
===
--- config/i386/32/sfp-machine.h(revision 205474)
+++ config/i386/32/sfp-machine.h(working copy)
@@ -63,6 +63,16 @@
 "g" ((USItype) (y1)),  \
 "2" ((USItype) (x0)),  \
 "g" ((USItype) (y0)))
+#define __FP_FRAC_ADDI_4(x3,x2,x1,x0,i)\
+  __asm__ ("add{l} {%4,%3|%3,%4}\n\t"  \
+  "adc{l} {$0,%2|%2,0}\n\t"\
+  "adc{l} {$0,%1|%1,0}\n\t"\
+  "adc{l} {$0,%0|%0,0}"\
+  : "+r" ((USItype) (x3)), \
+"+&r" ((USItype) (x2)),\
+"+&r" ((USItype) (x1)),\
+"+&r" ((USItype) (x0)) \
+  : "g" ((USItype) (i)))
 
 
 #define _FP_MUL_MEAT_S(R,X,Y)  \


Re: [wwwdocs] Update obvious fix commit policy

2013-11-28 Thread Richard Earnshaw
On 28/11/13 14:28, Diego Novillo wrote:
> On Thu, Nov 28, 2013 at 9:25 AM, Richard Biener
>  wrote:
> 
>> Why remove ChangeLog files, web pages and comments?  Either
>> enumerate everything or just enumerate nothing and simply say
>> "Obvious fixes can be committed without prior approval."
> 
> Thanks, that's much better.  I was trying to be more inclusive.
> 
> 
> Index: htdocs/svnwrite.html
> ===
> RCS file: /cvs/gcc/wwwdocs/htdocs/svnwrite.html,v
> retrieving revision 1.29
> diff -u -d -u -p -r1.29 svnwrite.html
> --- htdocs/svnwrite.html    24 Sep 2013 18:26:29 -0000      1.29
> +++ htdocs/svnwrite.html    28 Nov 2013 14:26:54 -0000
> @@ -147,10 +147,12 @@ list.
> 
>  The following changes can be made by everyone with SVN write access:
> 
> -Fixes for obvious typos in ChangeLog files, docs, web pages, comments
> -and similar stuff.  Just check in the fix and copy it to
> -gcc-patches.  We don't want to get overly anal-retentive
> -about checkin policies.
> +Obvious fixes can be committed without prior approval.  Just check
> +in the fix and copy it to gcc-patches.  A good test to
> +determine whether a fix is obvious: will the person who objects to
> +my work the most be able to find a fault with my fix?  If the fix
> +is later found to be faulty, it can always be rolled back.  We don't
> +want to get overly restrictive about checkin policies.
> 
>  Similarly, no outside approval is needed to revert a patch that you
>  checked in.
> 

I think it might be worth saying that one class of 'obvious' fix that we
don't want to go in without prior clearance is bulk white space
clean-ups.  These can be a right royal pain to deal with if you're in
the middle of a big re-write of a hunk of code.

R.



Re: [wwwdocs] Update obvious fix commit policy

2013-11-28 Thread Diego Novillo
On Thu, Nov 28, 2013 at 10:32 AM, Richard Earnshaw  wrote:

> I think it might be worth saying that one class of 'obvious' fix that we
> don't want to go in without prior clearance are bulk white space
> clean-ups.  These can be a right-royal pain to deal with if you're in
> the middle of a big re-write of a hunk of code.

Hm, not sure I agree.  Those are the most obvious to me.  Particularly
after I get my clang format pony.  I've asked for GNU style support.
It will be a lot easier to keep files properly formatted to the GNU
guidelines.

Making exceptions to the obvious rule seems illogical to me.


Diego.


Re: [PATCH] Don't create out-of-bounds BIT_FIELD_REFs

2013-11-28 Thread Eric Botcazou
> Ok if testing succeeds there but please leave out the checking bits.

Yes, they (perhaps unsurprisingly) trigger in Ada; I'll investigate.

-- 
Eric Botcazou


Re: [wwwdocs] Update obvious fix commit policy

2013-11-28 Thread H.J. Lu
On Thu, Nov 28, 2013 at 7:38 AM, Diego Novillo  wrote:
> On Thu, Nov 28, 2013 at 10:32 AM, Richard Earnshaw  wrote:
>
>> I think it might be worth saying that one class of 'obvious' fix that we
>> don't want to go in without prior clearance are bulk white space
>> clean-ups.  These can be a right-royal pain to deal with if you're in
>> the middle of a big re-write of a hunk of code.
>
> Hm, not sure I agree.  Those are the most obvious to me.  Particularly
> after I get my clang format pony.  I've asked for GNU style support.
> It will be a lot easier to keep files properly formatted to the GNU
> guidelines.
>
> Making exceptions to the obvious rule seems illogical to me.

I have found that using git helps to mitigate the merging
pain when the places I am working on have changed
at the same time.

-- 
H.J.


PATCH: PR c/59309: FAIL: c-c++-common/cilk-plus/CK/spawnee_inline.c -g -fcilkplus (test for excess errors)

2013-11-28 Thread H.J. Lu
Hi,

For a function without arguments, gimplify_cilk_spawn checks

*arg_array == NULL_TREE

But arg_array is a TREE vector of zero elements.  This patch updates
gimplify_cilk_spawn to properly handle a function without arguments.
Tested on Linux/x86-64 with GCC bootstrapped using -fsanitize=address.
OK to install?
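
The shape of the guarded pattern in the fix can be shown standalone (a sketch, not the GCC code itself): allocate the argument vector only when there are arguments, and test the pointer before its first element.

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of the fixed logic: with zero arguments nothing is
   allocated, and the NULL check alone identifies the zero-argument
   form.  The pre-fix code read *arg_array unconditionally, which for
   zero arguments dereferences a zero-length allocation.  */
static int
is_zero_arg_call (size_t total_args)
{
  void **arg_array = NULL;
  if (total_args)
    arg_array = calloc (total_args, sizeof *arg_array);

  int zero_args = (arg_array == NULL);

  free (arg_array);
  return zero_args;
}
```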

Thanks.


H.J.

2013-11-28   H.J. Lu  

PR c/59309
* cilk.c (gimplify_cilk_spawn): Properly handle function without
arguments.

diff --git a/gcc/c-family/cilk.c b/gcc/c-family/cilk.c
index c85b5f2..99d9c7e 100644
--- a/gcc/c-family/cilk.c
+++ b/gcc/c-family/cilk.c
@@ -757,7 +757,10 @@ gimplify_cilk_spawn (tree *spawn_p, gimple_seq *before 
ATTRIBUTE_UNUSED,
 
   /* This should give the number of parameters.  */
   total_args = list_length (new_args);
-  arg_array = XNEWVEC (tree, total_args);
+  if (total_args)
+arg_array = XNEWVEC (tree, total_args);
+  else
+arg_array = NULL;
 
   ii_args = new_args;
   for (ii = 0; ii < total_args; ii++)
@@ -771,7 +774,7 @@ gimplify_cilk_spawn (tree *spawn_p, gimple_seq *before 
ATTRIBUTE_UNUSED,
 
   call1 = cilk_call_setjmp (cfun->cilk_frame_decl);
 
-  if (*arg_array == NULL_TREE)
+  if (arg_array == NULL || *arg_array == NULL_TREE)
 call2 = build_call_expr (function, 0);
   else 
 call2 = build_call_expr_loc_array (EXPR_LOCATION (*spawn_p), function, 


[PATCH][ARM] Set "conds" attribute for non-predicable ARMv8-A instructions

2013-11-28 Thread Kyrill Tkachov

Hi all,

Some ARMv8-A instructions in the vrint* family, as well as the vmaxnm and
vminnm ones, do not have a conditional variant and therefore have their
"predicable" attribute set to "no".
However, we've discovered that they can still end up conditionalised in some
cases because of the arm_cond_branch pattern, which can remove a conditional
and conditionalise the next instruction unless the "conds" attribute forbids
it.  To prevent this happening with the vrint and vmaxnm, vminnm
instructions, this patch sets the "conds" attribute to "unconditional".


This was caught in a testcase where the vrinta instruction ended up being
conditionalised (producing vrintagt), which thankfully the assembler caught
and complained about.  If this had happened on the smin/smax patterns, which
don't have a '%?' in their output template, we would have ended up silently
miscompiling!


This should go into trunk and the 4.8 branch.

Tested arm-none-eabi on a model.

Ok?

Thanks,
Kyrill

2013-11-28  Kyrylo Tkachov  

* config/arm/iterators.md (vrint_conds): New int attribute.
* config/arm/vfp.md (2): Set conds attribute.
(smax3): Likewise.
(smin3): Likewise.

2013-11-28  Kyrylo Tkachov  

* gcc.target/arm/vrinta-ce.c: New testcase.
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index db1634b..dc4cf0d 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -525,6 +525,10 @@
   (UNSPEC_VRINTA "no") (UNSPEC_VRINTM "no")
   (UNSPEC_VRINTR "yes") (UNSPEC_VRINTX "yes")])
 
+(define_int_attr vrint_conds [(UNSPEC_VRINTZ "nocond") (UNSPEC_VRINTP "unconditional")
+  (UNSPEC_VRINTA "unconditional") (UNSPEC_VRINTM "unconditional")
+  (UNSPEC_VRINTR "nocond") (UNSPEC_VRINTX "nocond")])
+
 (define_int_attr nvrint_variant [(UNSPEC_NVRINTZ "z") (UNSPEC_NVRINTP "p")
 (UNSPEC_NVRINTA "a") (UNSPEC_NVRINTM "m")
 (UNSPEC_NVRINTX "x") (UNSPEC_NVRINTN "n")])
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 22b6325..6d0515a 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -1277,7 +1277,8 @@
   "vrint%?.\\t%0, %1"
   [(set_attr "predicable" "")
(set_attr "predicable_short_it" "no")
-   (set_attr "type" "f_rint")]
+   (set_attr "type" "f_rint")
+   (set_attr "conds" "")]
 )
 
 ;; MIN_EXPR and MAX_EXPR eventually map to 'smin' and 'smax' in RTL.
@@ -1293,7 +1294,8 @@
 		  (match_operand:SDF 2 "register_operand" "")))]
   "TARGET_HARD_FLOAT && TARGET_FPU_ARMV8 "
   "vmaxnm.\\t%0, %1, %2"
-  [(set_attr "type" "f_minmax")]
+  [(set_attr "type" "f_minmax")
+   (set_attr "conds" "unconditional")]
 )
 
 (define_insn "smin3"
@@ -1302,7 +1304,8 @@
 		  (match_operand:SDF 2 "register_operand" "")))]
   "TARGET_HARD_FLOAT && TARGET_FPU_ARMV8 "
   "vminnm.\\t%0, %1, %2"
-  [(set_attr "type" "f_minmax")]
+  [(set_attr "type" "f_minmax")
+   (set_attr "conds" "unconditional")]
 )
 
 ;; Unimplemented insns:
diff --git a/gcc/testsuite/gcc.target/arm/vrinta-ce.c b/gcc/testsuite/gcc.target/arm/vrinta-ce.c
new file mode 100644
index 000..71c5b3b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/vrinta-ce.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_vfp_ok } */
+/* { dg-options "-O2 -marm -march=armv8-a" } */
+/* { dg-add-options arm_v8_vfp } */
+
+double foo (double a)
+{
+  if (a > 3.0)
+return  __builtin_round (a);
+
+  return 0.0;
+}
+
+/* { dg-final { scan-assembler-times "vrinta.f64\td\[0-9\]+" 1 } } */
+

Re: [PATCH][ARM] Set "conds" attribute for non-predicable ARMv8-A instructions

2013-11-28 Thread Richard Earnshaw
On 28/11/13 16:10, Kyrill Tkachov wrote:
> Hi all,
> 
> Some ARMv8-A instructions in the vrint* family, as well as the vmaxnm and
> vminnm ones, do not have a conditional variant and therefore have their
> "predicable" attribute set to "no".
> However, we've discovered that they can still end up conditionalised in some
> cases because of the arm_cond_branch pattern, which can remove a conditional
> and conditionalise the next instruction unless the "conds" attribute forbids
> it.  To prevent this happening with the vrint and vmaxnm, vminnm
> instructions, this patch sets the "conds" attribute to "unconditional".
> 
> This was caught in a testcase where the vrinta instruction ended up being
> conditionalised (producing vrintagt), which thankfully the assembler caught
> and complained about.  If this had happened on the smin/smax patterns, which
> don't have a '%?' in their output template, we would have ended up silently
> miscompiling!
> 
> This should go into trunk and the 4.8 branch.
> 

OK both.

R.

> Tested arm-none-eabi on a model.
> 
> Ok?
> 
> Thanks,
> Kyrill
> 
> 2013-11-28  Kyrylo Tkachov  
> 
>  * config/arm/iterators.md (vrint_conds): New int attribute.
>  * config/arm/vfp.md (2): Set conds attribute.
>  (smax3): Likewise.
>  (smin3): Likewise.
> 
> 2013-11-28  Kyrylo Tkachov  
> 
>  * gcc.target/arm/vrinta-ce.c: New testcase.
> 



Re: [PATCH] Postpone __LINE__ evaluation to the end of #line directives

2013-11-28 Thread Joseph S. Myers
On Wed, 27 Nov 2013, Max Woodbury wrote:

> There should be a way to change the __FILE__ value without changing the
> line number sequencing.  Whatever that mechanism is, it should NOT
> introduce maintenance problems that involve counting lines of code.

I think that #line is mainly intended for use by code generators that 
generate C code, rather than directly by people writing C programs.  Such 
a code generator can easily manage counting lines of code.
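
For that intended use the behaviour is well defined: a #line directive gives the following source line the specified number and presumed file name. A minimal sketch (the file name here is made up):

```c
#include <assert.h>
#include <string.h>

/* Sketch: a code generator renumbers its output so that diagnostics
   point at the original template.  "generated.c" is a made-up name.
   #line gives the *following* source line the specified number.  */
#line 500 "generated.c"
static int renumbered_line (void) { return __LINE__; }   /* line 500 */

static const char *renumbered_file (void) { return __FILE__; }
```

A generator that emits one such directive per chunk of translated input keeps its line counting trivial, which is the point made above.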

> A little Googeling quickly turns up examples that make it clear that:
> 
> #line __LINE__ "new__FILE__value"
> 
> is that expected mechanism,

You'll find any number of examples online based on misconceptions about 
the C language, possibly together with what one particular implementation 
does.  Any recommendation to do things based on an area where the editor 
of the standard has said the ambiguity in the standard is deliberate is 
clearly a bad recommendation.  Recommendations on use of C should be based 
on areas where the standard is clear and implementations agree.

> In other words, if you processed the text in multiple phases the way
> the standard requires, you would not substitute the value for the
> __LINE__ token until after the end of the directive has been seen.
> Thus the problem only arises because this implementation folds the
> translation phases into a single pass over the text and takes an
> improper short-cut as it does so.  The standard explicitly warns
> against this kind of mistake.

The standard itself mixes up the phases.  Recall that the definition of 
line number is "one greater than the number of new-line characters read or 
introduced in translation phase 1 (5.1.1.2) while processing the source 
file to the current token" (where "current token" is never defined).  If 
the phases were completely separate, by your reasoning every newline has 
been processed in phase 1 before any of phases 2, 3 or 4 do anything, and 
so all line numbers relate to the end of the file.  There is absolutely 
nothing to say that the newline at the end of the #line directive has been 
read "while processing the source file to the current token" (if __LINE__ 
in the #line directive is the current token) but that the newline after it 
hasn't been read; if anything, the phases imply that all newlines have 
been read.

This case is just as ambiguous as the case of a multi-line macro call, 
where __LINE__ gets expanded somewhere in the macro arguments, and the 
line number can be that of the macro name, or of the closing parenthesis 
of the call, or somewhere in between, and the standard does not make a 
conformance distinction between those choices.

So, I don't think we should make complicated changes to implement one 
particular choice in an area of deliberate ambiguity without direction 
from WG14 to eliminate the ambiguity in the standard.  Instead, we can let 
the choices be whatever is most natural in the implementation.  If you 
believe the standard is defective in not defining certain things, I advise 
filing a DR (or, when next open for revisions, proposing a paper at a 
meeting to change the definition as you think appropriate).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Store the SSA name range type in the tree structure

2013-11-28 Thread Richard Sandiford
Richard Biener  writes:
> On Thu, Nov 28, 2013 at 12:26 PM, Richard Sandiford
>  wrote:
>> At the moment, an anti range ~[A,B] is stored as [B+1,A-1].  This makes
>> it harder to store the range in the natural precision of A and B, since
>> B+1 and A-1 might not be representable in that precision.
>>
>> This patch instead stores the original minimum and maximum values and
>> uses a spare tree bit to represent the range type.  The version below
>> is for trunk; I've also tested a wide-int version.
>>
>> Tested on x86_64-linux-gnu.  OK to install?
>
> Ok.

Applied, thanks.  For the record, here's the version I put on wide-int,
which is just a mechanical change from the trunk version.

Richard


Index: gcc/tree-core.h
===
--- gcc/tree-core.h 2013-11-27 11:31:27.043671967 +
+++ gcc/tree-core.h 2013-11-27 11:31:59.596530816 +
@@ -836,6 +836,9 @@ struct GTY(()) tree_base {
TRANSACTION_EXPR_OUTER in
   TRANSACTION_EXPR
 
+   SSA_NAME_ANTI_RANGE_P in
+  SSA_NAME
+
public_flag:
 
TREE_OVERFLOW in
Index: gcc/tree-ssa-copy.c
===
--- gcc/tree-ssa-copy.c 2013-11-27 11:31:27.043671967 +
+++ gcc/tree-ssa-copy.c 2013-11-27 11:31:59.603530785 +
@@ -572,6 +572,7 @@ fini_copy_prop (void)
   && SSA_NAME_RANGE_INFO (var)
   && !SSA_NAME_RANGE_INFO (copy_of[i].value))
duplicate_ssa_name_range_info (copy_of[i].value,
+  SSA_NAME_RANGE_TYPE (var),
   SSA_NAME_RANGE_INFO (var));
}
 }
Index: gcc/tree-ssanames.c
===
--- gcc/tree-ssanames.c 2013-11-27 11:31:29.165662775 +
+++ gcc/tree-ssanames.c 2013-11-27 17:51:40.346250076 +
@@ -178,12 +178,14 @@ make_ssa_name_fn (struct function *fn, t
   return t;
 }
 
-/* Store range information MIN, and MAX to tree ssa_name NAME.  */
+/* Store range information RANGE_TYPE, MIN, and MAX to tree ssa_name NAME.  */
 
 void
-set_range_info (tree name, const widest_int &min, const widest_int &max)
+set_range_info (tree name, enum value_range_type range_type,
+   const widest_int &min, const widest_int &max)
 {
   gcc_assert (!POINTER_TYPE_P (TREE_TYPE (name)));
+  gcc_assert (range_type == VR_RANGE || range_type == VR_ANTI_RANGE);
   range_info_def *ri = SSA_NAME_RANGE_INFO (name);
 
   /* Allocate if not available.  */
@@ -195,12 +197,16 @@ set_range_info (tree name, const widest_
false);
 }
 
+  /* Record the range type.  */
+  if (SSA_NAME_RANGE_TYPE (name) != range_type)
+SSA_NAME_ANTI_RANGE_P (name) = (range_type == VR_ANTI_RANGE);
+
   /* Set the values.  */
   ri->min = min;
   ri->max = max;
 
   /* If it is a range, try to improve nonzero_bits from the min/max.  */
-  if (wi::cmp (min, max, TYPE_SIGN (TREE_TYPE (name))) != 1)
+  if (range_type == VR_RANGE)
 {
   int prec = TYPE_PRECISION (TREE_TYPE (name));
 
@@ -223,7 +229,6 @@ set_range_info (tree name, const widest_
 enum value_range_type
 get_range_info (const_tree name, widest_int *min, widest_int *max)
 {
-  enum value_range_type range_type;
   gcc_assert (!POINTER_TYPE_P (TREE_TYPE (name)));
   gcc_assert (min && max);
   range_info_def *ri = SSA_NAME_RANGE_INFO (name);
@@ -234,22 +239,9 @@ get_range_info (const_tree name, widest_
  > 2 * HOST_BITS_PER_WIDE_INT))
 return VR_VARYING;
 
-  /* If min > max, it is VR_ANTI_RANGE.  */
-  if (wi::cmp (ri->min, ri->max, TYPE_SIGN (TREE_TYPE (name))) == 1)
-{
-  /* VR_ANTI_RANGE ~[min, max] is encoded as [max + 1, min - 1].  */
-  range_type = VR_ANTI_RANGE;
-  *min = ri->max + 1;
-  *max = ri->min - 1;
-}
-  else
-  {
-/* Otherwise (when min <= max), it is VR_RANGE.  */
-range_type = VR_RANGE;
-*min = ri->min;
-*max = ri->max;
-  }
-  return range_type;
+  *min = ri->min;
+  *max = ri->max;
+  return SSA_NAME_RANGE_TYPE (name);
 }
 
 /* Change non-zero bits bitmask of NAME.  */
@@ -259,7 +251,7 @@ set_nonzero_bits (tree name, const wides
 {
   gcc_assert (!POINTER_TYPE_P (TREE_TYPE (name)));
   if (SSA_NAME_RANGE_INFO (name) == NULL)
-set_range_info (name,
+set_range_info (name, VR_RANGE,
wi::to_widest (TYPE_MIN_VALUE (TREE_TYPE (name))),
wi::to_widest (TYPE_MAX_VALUE (TREE_TYPE (name;
   range_info_def *ri = SSA_NAME_RANGE_INFO (name);
@@ -490,15 +482,17 @@ duplicate_ssa_name_ptr_info (tree name,
   SSA_NAME_PTR_INFO (name) = new_ptr_info;
 }
 
-/* Creates a duplicate of the range_info_def at RANGE_INFO for use by
-   the SSA name NAME.  */
+/* Creates a duplicate of the range_info_def at RANGE_INFO of type
+   RANGE_TYPE for use by the SSA name NAME.  */
 void
-duplicate_ssa_name_range_info (tree name, struct range_i

RE: _Cilk_spawn and _Cilk_sync for C++

2013-11-28 Thread Iyer, Balaji V


> -Original Message-
> From: Jason Merrill [mailto:ja...@redhat.com]
> Sent: Thursday, November 28, 2013 9:11 AM
> To: Iyer, Balaji V; gcc-patches@gcc.gnu.org
> Cc: Jeff Law
> Subject: Re: _Cilk_spawn and _Cilk_sync for C++
> 
> On 11/27/2013 11:05 PM, Iyer, Balaji V wrote:
> > Found the bug. I was not utilizing the stabilize_expr's output correctly.
> 
> Unfortunately, I think I was misleading you with talk of stabilize; like you 
> said,
> you want to evaluate the whole expression in the spawned function rather
> than in the caller, so that any temporaries (including the lambda closure) 
> live
> until the _Cilk_sync.  Using stabilize_expr this way (the way I was 
> suggesting)
> forces the lambda closure to be evaluated in the caller, and then destroyed
> at the end of the enclosing statement, which is likely to erase any data that
> the spawned function needs to do its work, if anything captured by copy has
> a destructor.
> 

> As I said in my last mail, I think the right fix is to make sure that A gets
> remapped properly during copy_body so that its use in the initializer for the
> closure doesn't confuse later passes.

Consider the following test case. I took this from the lambda_spawns.cc line 
#203.


  global_var = 0;
  _Cilk_spawn [=](int *Aa, int size){ foo1_c(A, size); }(B, 2);
  foo1 (A, 2);
  _Cilk_sync;
  if (global_var != 2)
return (++q);


... and here is its gimple output:

{
  struct  * D.2349;
  unsigned long D.2350;
  struct  * D.2351;
  struct  * D.2352;
  struct  * D.2353;
  struct  * D.2354;
  unsigned long D.2355;
  struct  * D.2356;
  struct  * D.2357;
  struct  * D.2358;
  struct  * D.2359;
  struct  * D.2360;
  struct __lambda0 D.2219;
  unsigned int D.2361;
  unsigned int D.2362;
  void * D.2363;
  void * D.2364;
  struct  * D.2365;
  struct  * D.2366;
  unsigned int D.2367;

  try
{
  try
{
  __cilkrts_enter_frame_fast_1 (&D.2258);
  D.2349 = D.2258.worker;
  D.2350 = D.2349->pedigree.rank;
  D.2258.pedigree.rank = D.2350;
  D.2351 = D.2258.worker;
  D.2352 = D.2351->pedigree.parent;
  D.2258.pedigree.parent = D.2352;
  D.2353 = D.2258.call_parent;
  D.2354 = D.2258.worker;
  D.2355 = D.2354->pedigree.rank;
  D.2353->pedigree.rank = D.2355;
  D.2356 = D.2258.call_parent;
  D.2357 = D.2258.worker;
  D.2358 = D.2357->pedigree.parent;
  D.2356->pedigree.parent = D.2358;
  D.2359 = D.2258.worker;
  D.2359->pedigree.rank = 0;
  D.2360 = D.2258.worker;
  D.2360->pedigree.parent = &D.2258.pedigree;
  __cilkrts_detach (&D.2258);
  D.2219.__A = CHAIN.6->A;
  try
{
  main2(int)operator() (&D.2219, D.2255, 2);
}
 finally
{
              D.2219 = {CLOBBER};   <===
}
}
  catch
{
  catch (NULL)
{
  try
{
  D.2361 = D.2258.flags;
  D.2362 = D.2361 | 16;
  D.2258.flags = D.2362;
  D.2363 = __builtin_eh_pointer (0);
  D.2258.except_data = D.2363;
  D.2364 = __builtin_eh_pointer (0);
  __cxa_begin_catch (D.2364);
  __cxa_rethrow ();
}
  finally
{
  __cxa_end_catch ();
}
}
}
  finally

As you can tell, it is clobbering the lambda closure right after the
lambda call (in the finally expr, which I marked with <=), and then it
is catching the value of A from main2 as it is supposed to.

What am I misunderstanding?


> 
> Jason




RE: PATCH: PR c/59309: FAIL: c-c++-common/cilk-plus/CK/spawnee_inline.c -g -fcilkplus (test for excess errors)

2013-11-28 Thread Iyer, Balaji V


> -Original Message-
> From: Lu, Hongjiu
> Sent: Thursday, November 28, 2013 11:06 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Iyer, Balaji V
> Subject: PATCH: PR c/59309: FAIL: c-c++-common/cilk-
> plus/CK/spawnee_inline.c -g -fcilkplus (test for excess errors)
> 
> Hi,
> 
> For function without arguments, gimplify_cilk_spawn checks
> 
> *arg_array == NULL_TREE
> 
> But arg_array is a TREE vector of zero elements.  This patch updates
> gimplify_cilk_spawn to properly handle function without arguments.
> Tested on Linux/x86-64 with GCC bootstraped using -fsanitize=address.
> OK to install?
> 
> Thanks.
> 
> 
> H.J.
> 
> 2013-11-28   H.J. Lu  
> 
>   PR c/59309
>   * cilk.c (gimplify_cilk_spawn): Properly handle function without
>   arguments.
> 
> diff --git a/gcc/c-family/cilk.c b/gcc/c-family/cilk.c index c85b5f2..99d9c7e
> 100644
> --- a/gcc/c-family/cilk.c
> +++ b/gcc/c-family/cilk.c
> @@ -757,7 +757,10 @@ gimplify_cilk_spawn (tree *spawn_p, gimple_seq
> *before ATTRIBUTE_UNUSED,
> 
>/* This should give the number of parameters.  */
>total_args = list_length (new_args);
> -  arg_array = XNEWVEC (tree, total_args);
> +  if (total_args)
> +arg_array = XNEWVEC (tree, total_args);
> +  else
> +arg_array = NULL;
> 
>ii_args = new_args;
>for (ii = 0; ii < total_args; ii++)
> @@ -771,7 +774,7 @@ gimplify_cilk_spawn (tree *spawn_p, gimple_seq
> *before ATTRIBUTE_UNUSED,
> 
>call1 = cilk_call_setjmp (cfun->cilk_frame_decl);
> 
> -  if (*arg_array == NULL_TREE)
> +  if (arg_array == NULL || *arg_array == NULL_TREE)
>  call2 = build_call_expr (function, 0);
>else
>  call2 = build_call_expr_loc_array (EXPR_LOCATION (*spawn_p), function,

Looks good to me.

-Balaji V. Iyer


Re: [WWWDOCS] Document IPA/LTO/FDO/i386 changes in GCC-4.9

2013-11-28 Thread Jan Hubicka
> > +  Functions are no longer pointlessly renamed.
> 
> Readers may struggle a bit with this.  What does it refer to?

We previously renamed every static function foo to foo.1234
(just as a precaution, because another compilation unit may also have
a function foo).
This confuses many things, so now we only rename when we see a conflict.

I am attaching the changes I committed.

I dropped this from the news changes.  In the meantime we merged in the change
enabling slim LTO files by default; what about:

Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.9/changes.html,v
retrieving revision 1.41
diff -c -p -r1.41 changes.html
*** changes.html28 Nov 2013 15:05:51 -  1.41
--- changes.html28 Nov 2013 16:53:37 -
***
*** 15,20 
--- 15,25 
  Caveats
  

+ Because -fno-fat-lto-objects is now the default,
+   the gcc-ar and gcc-nm wrappers need
+   to be used to handle objects compiled with -flto.
+   Additionally the resulting binary needs to be linked with
+   -flto (and appropriate optimization flags).
  Support for a number of older systems and recently
  unmaintained or untested target ports of GCC has been declared
  obsolete in GCC 4.9.  Unless there is activity to revive them, the
***
*** 45,50 
--- 50,61 
  
  Link-time optimization (LTO) improvements:
  
+   Slim LTO objects are now used by default.  This means that with
+ -flto GCC will no longer produce non-LTO optimized binary
+ in addition to storing object representation in the intermediate
+ language. Consequently -flto no longer causes everything
+ to be optimized twice (once at compile time and again during link 
time).
+ This feature can be controlled by -ffat-lto-objects.
Type merging was rewritten. The new implementation is significantly 
faster
  and uses less memory. 
Better partitioning algorithm resulting in less streaming during
Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.9/changes.html,v
retrieving revision 1.40
diff -r1.40 changes.html
40,41d39
<   
<   
47a46,85
> Link-time optimization (LTO) improvements:
> 
>   Type merging was rewritten. The new implementation is significantly 
> faster
> and uses less memory. 
>   Better partitioning algorithm resulting in less streaming during
> link time.
>   Early removal of virtual methods reduces the size of object files 
> and
> improves link-time memory usage and compile time.
>   Function bodies are now loaded on-demand and released early 
> improving
> overall memory usage at link time.
>   C++ hidden keyed methods can now be optimized out.
> 
> Memory usage building Firefox with debug enabled was reduced from 15GB to
> 3.5GB; link time from 1700 seconds to 350 seconds.
> 
> Inter-procedural optimization improvements:
> 
>   New type inheritance analysis module improving devirtualization.
> Devirtualization now takes into account anonymous name-spaces and the
> C++11 final keyword.
>   New speculative devirtualization pass (controlled by
> -fdevirtualize-speculatively.
>   Calls that were speculatively made direct are turned back to 
> indirect
> where direct call is not cheaper.
>   Local aliases are introduced for symbols that are known to be
> semantically equivalent across shared libraries improving dynamic
> linking times.
> 
> Feedback directed optimization improvements:
> 
>   Profiling of programs using C++ inline functions is now more 
> reliable.
>   New time profiling determines typical order in which functions are
> executed.
>   A new function reordering pass (controlled by
> -freorder-functions) significantly reduces
> startup time of large applications.  Until binutils support is
> completed, it is effective only with link-time optimization.
>   Feedback driven indirect call removal and devirtualization now 
> handle
> cross-module calls when link-time optimization is enabled.
> 
337c375
<  GCC now supports the new Intel microarchitecture named Silvermont
---
> GCC now supports the new Intel microarchitecture named Silvermont
339a378,388
> -march=generic has been retuned for better support of
>   Intel core and AMD Bulldozer architectures.  Performance of AMD K7, K8,
>   Intel Pentium-M, and Pentium4 based CPUs is no longer considered 
> important
>   for generic.
> 
> Better inlining of memcpy and memset 
>   that is aware of value ranges and produces shorter alignment prologues.
> 
> -mno-accumulate-outgoing-args is now honored when unwind
>   information is output.  Argument ac

Re: [PING] [PATCH, ARM, testcase] Skip target arm-neon for lp1243022.c

2013-11-28 Thread Richard Earnshaw
On 28/11/13 06:17, Zhenqiang Chen wrote:
> 
> 
>> -Original Message-
>> From: Jeff Law [mailto:l...@redhat.com]
>> Sent: Thursday, November 28, 2013 12:43 AM
>> To: Zhenqiang Chen; gcc-patches@gcc.gnu.org
>> Cc: Ramana Radhakrishnan; Richard Earnshaw
>> Subject: Re: [PING] [PATCH, ARM, testcase] Skip target arm-neon for
>> lp1243022.c
>>
>> On 11/27/13 02:05, Zhenqiang Chen wrote:
>>> Ping?
>> Thanks for including the actual patch you're pinging, it helps :-)
>>
> Hi,

 lp1243022.c will fail with options: -mfpu=neon -mfloat-abi=hard.

 Logs show it does not generate an auto-increment instruction in the
 auto_inc_dec pass. In that case, the check for the REG_INC note in
 subreg2 is invalid. So skip the check for target arm-neon.

 All PASS with the following options:

 -mthumb/-mcpu=cortex-a9/-mfloat-abi=hard
 -mthumb/-mcpu=cortex-a9/-mfloat-abi=soft
 -mthumb/-mcpu=cortex-a9/-mfloat-abi=softfp
 -mthumb/-mcpu=cortex-a9/-mfloat-abi=soft/-mfpu=vfpv3
 -mthumb/-mcpu=cortex-a9/-mfloat-abi=softfp/-mfpu=vfpv3
 -mthumb/-mcpu=cortex-a9/-mfloat-abi=hard/-mfpu=vfpv3
 -mthumb/-mcpu=cortex-a9/-mfloat-abi=soft/-mfpu=neon
 -mthumb/-mcpu=cortex-a9/-mfloat-abi=softfp/-mfpu=neon
 -mthumb/-mcpu=cortex-a9/-mfloat-abi=hard/-mfpu=neon
 -mthumb/-mcpu=cortex-a15/-mfloat-abi=hard
 -mthumb/-mcpu=cortex-a15/-mfloat-abi=soft
 -mthumb/-mcpu=cortex-a15/-mfloat-abi=softfp
 -mthumb/-mcpu=cortex-a15/-mfloat-abi=soft/-mfpu=vfpv4
 -mthumb/-mcpu=cortex-a15/-mfloat-abi=softfp/-mfpu=vfpv4
 -mthumb/-mcpu=cortex-a15/-mfloat-abi=hard/-mfpu=vfpv4
 -mthumb/-mcpu=cortex-a15/-mfloat-abi=soft/-mfpu=neon
 -mthumb/-mcpu=cortex-a15/-mfloat-abi=softfp/-mfpu=neon
 -mthumb/-mcpu=cortex-a15/-mfloat-abi=hard/-mfpu=neon

 Is it OK?

 Thanks!
 -Zhenqiang

 testsuite/ChangeLog:
 2013-11-08  Zhenqiang Chen  

  * gcc.target/arm/lp1243022.c: Skip target arm-neon.
>> It seems to me you should be xfailing arm-neon, not skipping the test.
>> Unless there is some fundamental reason why we can not generate auto-inc
>> instructions on the neon.
> 
> Thanks for the comments. Update the test case as xfail.
> 
> diff --git a/gcc/testsuite/gcc.target/arm/lp1243022.c
> b/gcc/testsuite/gcc.target/arm/lp1243022.c
> index 91a544d..b2ebe7e 100644
> --- a/gcc/testsuite/gcc.target/arm/lp1243022.c
> +++ b/gcc/testsuite/gcc.target/arm/lp1243022.c
> @@ -1,7 +1,7 @@
>  /* { dg-do compile { target arm_thumb2 } } */
>  /* { dg-options "-O2 -fdump-rtl-subreg2" } */
> 
> -/* { dg-final { scan-rtl-dump "REG_INC" "subreg2" } } */
> +/* { dg-final { scan-rtl-dump "REG_INC" "subreg2" { xfail arm_neon } } } */
>  /* { dg-final { cleanup-rtl-dump "subreg2" } } */
>  struct device;
>  typedef unsigned int __u32;
> 
> 

This test looks horribly fragile, since it's taking a large chunk of
code and expecting a specific optimization to have occurred in exactly
one place.  The particular instruction was a large pre-modify offset,
which isn't supported.

Looking back through the original bug report, the problem was that the
subreg2 pass was losing a REG_INC note that had previously been created.
 Of course it's not a bug if it was never created before, but there's no
easy way to tell that.

On that basis, I think the original patch is the correct one, please
install that.

I must say that I do wonder what the value of some of these tests are in
the absence of a proper unit test environment.

R.



RE: _Cilk_spawn and _Cilk_sync for C++

2013-11-28 Thread Iyer, Balaji V


> -Original Message-
> From: Jason Merrill [mailto:ja...@redhat.com]
> Sent: Thursday, November 28, 2013 9:11 AM
> To: Iyer, Balaji V; gcc-patches@gcc.gnu.org
> Cc: Jeff Law
> Subject: Re: _Cilk_spawn and _Cilk_sync for C++
> 
> On 11/27/2013 11:05 PM, Iyer, Balaji V wrote:
> > Found the bug. I was not utilizing the stabilize_expr's output correctly.
> 
> Unfortunately, I think I was misleading you with talk of stabilize; like you 
> said,
> you want to evaluate the whole expression in the spawned function rather
> than in the caller, so that any temporaries (including the lambda closure) 
> live
> until the _Cilk_sync.  Using stabilize_expr this way (the way I was 
> suggesting)
> forces the lambda closure to be evaluated in the caller, and then destroyed
> at the end of the enclosing statement, which is likely to erase any data that
> the spawned function needs to do its work, if anything captured by copy has
> a destructor.
> 
> As I said in my last mail, I think the right fix is to make sure that A gets
> remapped properly during copy_body so that its use in the initializer for the
> closure doesn't confuse later passes.

Ok. I think I cut and pasted the wrong part. I am very sorry about it.

Here is the original code:

  global_var = 0;
  _Cilk_spawn [=](int *Aa, int size){ foo1_c(A, size); }(B, 2);
  foo1 (A, 2);
  _Cilk_sync;
  if (global_var != 2)
return (++q);

Here is the gimple output with _Cilk_spawn and _Cilk_sync *DISABLED*

try
{
  A[0] = 5;
  A[1] = 3;
  B[0] = 5;
  B[1] = 3;
  main_size = argc + 1;
  q = 0;
  global_var = 0;
  D.2219.__A = A;
  try
{
  main2(int)operator() (&D.2219, &B, 2);
}
  finally
{
  D.2219 = {CLOBBER};
}
  foo1 (&A, 2);
  global_var.4 = global_var;
  if (global_var.4 != 2) goto ; else goto ;
  :
  q = q + 1;
  D.2257 = q;
  return D.2257;
  :
  D.2257 = q;
  return D.2257;
}
  finally
{
  A = {CLOBBER};
  B = {CLOBBER};
}

Here is the gimple output with _Cilk_spawn and _Cilk_synd enabled
try
{
  try
{
  __cilkrts_enter_frame_fast_1 (&D.2258);
  D.2351 = D.2258.worker;
  D.2352 = D.2351->pedigree.rank;
  D.2258.pedigree.rank = D.2352;
  D.2353 = D.2258.worker;
  D.2354 = D.2353->pedigree.parent;
  D.2258.pedigree.parent = D.2354;
  D.2355 = D.2258.call_parent;
  D.2356 = D.2258.worker;
  D.2357 = D.2356->pedigree.rank;
  D.2355->pedigree.rank = D.2357;
  D.2358 = D.2258.call_parent;
  D.2359 = D.2258.worker;
  D.2360 = D.2359->pedigree.parent;
  D.2358->pedigree.parent = D.2360;
  D.2361 = D.2258.worker;
  D.2361->pedigree.rank = 0;
  D.2362 = D.2258.worker;
  D.2362->pedigree.parent = &D.2258.pedigree;
  __cilkrts_detach (&D.2258);
  B.5 = &CHAIN.7->B;
  D.2219.__A = CHAIN.7->A;
  try
{
  main2(int)operator() (&D.2219, B.5, 2);
}
  finally
{
              D.2219 = {CLOBBER};   < CLOBBERING LINE
}
}
  catch
{
  catch (NULL)
{
  try
{
  D.2364 = D.2258.flags;
  D.2365 = D.2364 | 16;
  D.2258.flags = D.2365;
  D.2366 = __builtin_eh_pointer (0);
  D.2258.except_data = D.2366;
  D.2367 = __builtin_eh_pointer (0);
  __cxa_begin_catch (D.2367);
  __cxa_rethrow ();
}
  finally
{
  __cxa_end_catch ();
}
}
}
}
  finally
{
  D.2368 = D.2258.worker;
  D.2369 = D.2258.call_parent;
  D.2368->current_stack_frame = D.2369;
  __cilkrts_pop_frame (&D.2258);
  D.2370 = D.2258.flags;
  if (D.2370 != 16777216) goto ; else goto ;
  :
  __cilkrts_leave_frame (&D.2258);
  goto ;
  :
  :
}
}

In this line (<=) it is clobbering the lambda closure. 

Sorry again about the mistake.

By the way, I have only pasted the part from the top-level try
expression, to show the relevant code. If you want, I can show you the rest.

> 
> Jason



[wide-int] small cleanup in wide-int.*

2013-11-28 Thread Kenneth Zadeck

This patch does three things in wide-int:

1) It cleans up some comments.
2) It removes a small amount of dead debug code.
3) It changes the max size of the wide int from 4x
MAX_BITSIZE_MODE_ANY_INT to 2x + 1.  This should improve large
multiplies and divides, and may also help with some cache behavior.


OK to commit?
Index: gcc/wide-int.h
===
--- gcc/wide-int.h	(revision 205488)
+++ gcc/wide-int.h	(working copy)
@@ -55,10 +55,12 @@ along with GCC; see the file COPYING3.
 
  2) offset_int.  This is a fixed size representation that is
  guaranteed to be large enough to compute any bit or byte sized
- address calculation on the target.  Currently the value is 64 + 4
- bits rounded up to the next number even multiple of
+ address calculation on the target.  Currently the value is 64 + 3
+ + 1 bits rounded up to the next number even multiple of
  HOST_BITS_PER_WIDE_INT (but this can be changed when the first
- port needs more than 64 bits for the size of a pointer).
+ port needs more than 64 bits for the size of a pointer).  The 3
+ bits allow the bits of a byte to be accessed, the 1 allows any
+ unsigned value to be converted to signed without overflowing.
 
  This flavor can be used for all address math on the target.  In
  this representation, the values are sign or zero extended based
@@ -112,11 +114,11 @@ along with GCC; see the file COPYING3.
two, the default is the prefered representation.
 
All three flavors of wide_int are represented as a vector of
-   HOST_WIDE_INTs.  The default and widest_int vectors contain enough elements
-   to hold a value of MAX_BITSIZE_MODE_ANY_INT bits.  offset_int contains only
-   enough elements to hold ADDR_MAX_PRECISION bits.  The values are stored
-   in the vector with the least significant HOST_BITS_PER_WIDE_INT bits
-   in element 0.
+   HOST_WIDE_INTs.  The default and widest_int vectors contain enough
+   elements to hold a value of MAX_BITSIZE_MODE_ANY_INT bits.
+   offset_int contains only enough elements to hold ADDR_MAX_PRECISION
+   bits.  The values are stored in the vector with the least
+   significant HOST_BITS_PER_WIDE_INT bits in element 0.
 
The default wide_int contains three fields: the vector (VAL),
the precision and a length (LEN).  The length is the number of HWIs
@@ -223,10 +225,6 @@ along with GCC; see the file COPYING3.
 #include "signop.h"
 #include "insn-modes.h"
 
-#if 0
-#define DEBUG_WIDE_INT
-#endif
-
 /* The MAX_BITSIZE_MODE_ANY_INT is automatically generated by a very
early examination of the target's mode file.  Thus it is safe that
some small multiple of this number is easily larger than any number
@@ -235,8 +233,8 @@ along with GCC; see the file COPYING3.
range of a multiply.  This code needs 2n + 2 bits.  */
 
 #define WIDE_INT_MAX_ELTS \
-  ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \
-   / HOST_BITS_PER_WIDE_INT)
+  (((2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \
+/ HOST_BITS_PER_WIDE_INT) + 1)
 
 /* This is the max size of any pointer on any machine.  It does not
seem to be as easy to sniff this out of the machine description as
Index: gcc/wide-int.cc
===
--- gcc/wide-int.cc	(revision 205488)
+++ gcc/wide-int.cc	(working copy)
@@ -2090,59 +2090,5 @@ void gt_ggc_mx (widest_int *) { }
 void gt_pch_nx (widest_int *, void (*) (void *, void *), void *) { }
 void gt_pch_nx (widest_int *) { }
 
-/*
- * Private debug printing routines.
- */
-#ifdef DEBUG_WIDE_INT
-/* The debugging routines print results of wide operations into the
-   dump files of the respective passes in which they were called.  */
-static char *
-dumpa (const HOST_WIDE_INT *val, unsigned int len, unsigned int prec, char *buf)
-{
-  int i;
-  unsigned int l;
-  const char * sep = "";
 
-  l = sprintf (buf, "[%d (", prec);
-  for (i = len - 1; i >= 0; i--)
-{
-  l += sprintf (&buf[l], "%s" HOST_WIDE_INT_PRINT_HEX, sep, val[i]);
-  sep = " ";
-}
 
-  gcc_assert (len != 0);
-
-  l += sprintf (&buf[l], ")]");
-
-  gcc_assert (l < MAX_SIZE);
-  return buf;
-
-
-}
-#endif
-
-#if 0
-/* The debugging routines print results of wide operations into the
-   dump files of the respective passes in which they were called.  */
-char *
-wide_int_ro::dump (char* buf) const
-{
-  int i;
-  unsigned int l;
-  const char * sep = "";
-
-  l = sprintf (buf, "[%d (", precision);
-  for (i = len - 1; i >= 0; i--)
-{
-  l += sprintf (&buf[l], "%s" HOST_WIDE_INT_PRINT_HEX, sep, val[i]);
-  sep = " ";
-}
-
-  gcc_assert (len != 0);
-
-  l += sprintf (&buf[l], ")]");
-
-  gcc_assert (l < MAX_SIZE);
-  return buf;
-}
-#endif


[PATCH] Fix up bogus warning (PR sanitizer/59331)

2013-11-28 Thread Marek Polacek
We wrongly warned on instrumented VLAs that the size expression's
value is not used (with cc1plus only).  Unfortunately, this hasn't been
detected before due to disabled warnings in the VLA tests.  This patch
adds a (void) cast to suppress the warning as well as enables the
warnings in the VLA tests to detect unwanted warnings next time.

Tested x86_64-linux, ok for trunk?

2013-11-28  Marek Polacek  

PR sanitizer/59331
cp/
* decl.c (compute_array_index_type): Cast the expression to void.
testsuite/
* g++.dg/ubsan/pr59331.C: New test.
* g++.dg/ubsan/cxx1y-vla.C: Enable -Wall -Wno-unused-variable.
Disable the -w option.
* c-c++-common/ubsan/vla-1.c: Likewise.
* c-c++-common/ubsan/vla-2.c: Likewise.
* c-c++-common/ubsan/vla-3.c: Don't use the -w option.

--- gcc/cp/decl.c.mp5   2013-11-28 16:15:42.606690956 +0100
+++ gcc/cp/decl.c   2013-11-28 17:49:44.120202587 +0100
@@ -8435,7 +8435,9 @@ compute_array_index_type (tree name, tre
  tree t = fold_build2 (PLUS_EXPR, TREE_TYPE (itype), itype,
build_one_cst (TREE_TYPE (itype)));
  t = fold_build2 (COMPOUND_EXPR, TREE_TYPE (t),
-  ubsan_instrument_vla (input_location, t), t);
+  ubsan_instrument_vla (input_location, t),
+  /* Cast to void to prevent bogus warning.  */
+  build1 (CONVERT_EXPR, void_type_node, t));
  finish_expr_stmt (t);
}
}
--- gcc/testsuite/g++.dg/ubsan/pr59331.C.mp52013-11-28 16:29:13.967882392 
+0100
+++ gcc/testsuite/g++.dg/ubsan/pr59331.C2013-11-28 17:54:24.125451857 
+0100
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-fsanitize=vla-bound -Wall -Wno-unused-variable" } */
+
+void foo(int i)
+{
+  /* Don't warn here with "value computed is not used".  */
+  char a[i];
+}
--- gcc/testsuite/g++.dg/ubsan/cxx1y-vla.C.mp5  2013-11-28 17:51:51.066755487 
+0100
+++ gcc/testsuite/g++.dg/ubsan/cxx1y-vla.C  2013-11-28 17:59:49.578744756 
+0100
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-fsanitize=vla-bound -w -std=c++1y" } */
+/* { dg-options "-fsanitize=vla-bound -Wall -Wno-unused-variable -std=c++1y" } 
*/
 /* { dg-shouldfail "ubsan" } */
 
 int
--- gcc/testsuite/c-c++-common/ubsan/vla-1.c.mp52013-11-28 
18:03:32.318664603 +0100
+++ gcc/testsuite/c-c++-common/ubsan/vla-1.c2013-11-28 18:03:45.627715609 
+0100
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-fsanitize=vla-bound -w" } */
+/* { dg-options "-fsanitize=vla-bound -Wall -Wno-unused-variable" } */
 
 static int
 bar (void)
--- gcc/testsuite/c-c++-common/ubsan/vla-3.c.mp52013-11-28 
18:04:25.737865780 +0100
+++ gcc/testsuite/c-c++-common/ubsan/vla-3.c2013-11-28 18:04:34.796900021 
+0100
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-fsanitize=vla-bound -w" } */
+/* { dg-options "-fsanitize=vla-bound" } */
 
 /* Don't instrument the arrays here.  */
 int
--- gcc/testsuite/c-c++-common/ubsan/vla-2.c.mp52013-11-28 
18:03:54.249748290 +0100
+++ gcc/testsuite/c-c++-common/ubsan/vla-2.c2013-11-28 18:04:07.666798731 
+0100
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-fsanitize=vla-bound -w" } */
+/* { dg-options "-fsanitize=vla-bound -Wall -Wno-unused-variable" } */
 
 int
 main (void)

Marek


RFC: PR bootstrap/59199: [4.9 Regression] r205032 caused LTO bootstrap to fail with bootstrap-profile

2013-11-28 Thread H.J. Lu
There is a bad interaction between inlined C++ member functions and
LTO + profiledbootstrap, which causes the LTO bootstrap to fail with
bootstrap-profile:

Existing SSA name for symbol marked for renaming: aloop_37
In member function \u2018__base_ctor \u2019:
lto1: internal compiler error: SSA corruption
0xcd84eb update_ssa(unsigned int)
/export/project/git/gcc-regression/gcc/gcc/tree-into-ssa.c:3246
0xa5814c input_function
/export/project/git/gcc-regression/gcc/gcc/lto-streamer-in.c:1006
0xa5814c lto_read_body
/export/project/git/gcc-regression/gcc/gcc/lto-streamer-in.c:1070
0xa5814c lto_input_function_body(lto_file_decl_data*, cgraph_node*, char
const*)
/export/project/git/gcc-regression/gcc/gcc/lto-streamer-in.c:1112
0x66d2bc cgraph_get_body(cgraph_node*)
/export/project/git/gcc-regression/gcc/gcc/cgraph.c:2981
0x99aa58 ipa_merge_profiles(cgraph_node*, cgraph_node*)
/export/project/git/gcc-regression/gcc/gcc/ipa-utils.c:699
0x595a86 lto_cgraph_replace_node
/export/project/git/gcc-regression/gcc/gcc/lto/lto-symtab.c:82
0x596159 lto_symtab_merge_symbols_1
/export/project/git/gcc-regression/gcc/gcc/lto/lto-symtab.c:561
0x596159 lto_symtab_merge_symbols()
/export/project/git/gcc-regression/gcc/gcc/lto/lto-symtab.c:589
0x5850dd read_cgraph_and_symbols
/export/project/git/gcc-regression/gcc/gcc/lto/lto.c:2946
0x5850dd lto_main()
/export/project/git/gcc-regression/gcc/gcc/lto/lto.c:3255
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.

There are only 2 files which don't inline all loop_iterator
member functions and may be miscompiled:

File: ipa-inline-analysis.o

Symbol table '.symtab' contains 454 entries:
   Num:Value  Size TypeBind   Vis  Ndx Name
...
   262:  0 NOTYPE  LOCAL  DEFAULT5
loop_iterator::loop_iterator(loop**, unsigned int)
...
   352: 89 FUNCWEAK   DEFAULT   27
loop_iterator::next()
   353:    748 FUNCWEAK   DEFAULT   30
loop_iterator::loop_iterator(loop**, unsigned int)
   354:    748 FUNCWEAK   DEFAULT   30
loop_iterator::loop_iterator(loop**, unsigned int)
...

File: tree-cfg.o

Symbol table '.symtab' contains 783 entries:
   Num:Value  Size TypeBind   Vis  Ndx Name
...
   385:  0 NOTYPE  LOCAL  DEFAULT5
loop_iterator::loop_iterator(loop**, unsigned int)
...
   536:    748 FUNCWEAK   DEFAULT   34
loop_iterator::loop_iterator(loop**, unsigned int)
...
   538:    748 FUNCWEAK   DEFAULT   34
loop_iterator::loop_iterator(loop**, unsigned int)
...

When either loop_iterator::next or loop_iterator::loop_iterator is
inlined, bootstrap fails with a similar error.  This patch works
around the problem by not inlining those 2 functions.
On a Nehalem machine using "make -j8", without the patch, I got

17836.13user 638.12system 55:49.72elapsed

for bootstrap and

32362.67user 4313.11system 1:29:59elapsed

for running testsuite.  With the patch, I got

7900.41user 640.39system 55:03.14elapsed

for bootstrap and

31891.96user 4251.23system 1:31:41elapsed

for running testsuite.  There is very little performance
difference and the binaries are also a little bit smaller:

    text    data     bss      dec     hex filename
16787252   34920 1098648 17920820 1117334 build-x86_64-linux/gcc/cc1
16809748   34920 1098648 17943316 111cb14 build-x86_64-linux.old/gcc/cc1
19188340   35008 1126552 20349900 13683cc build-x86_64-linux/gcc/cc1objplus
18865150   35008 1121848 20022006 13182f6 build-x86_64-linux/gcc/cc1plus
19210836   35008 1126552 20372396 136dbac build-x86_64-linux.old/gcc/cc1objplus
18887646   35008 1121848 20044502 131dad6 build-x86_64-linux.old/gcc/cc1plus
17274027   44056 1104024 18422107 119195b build-x86_64-linux/gcc/f951
17296523   44056 1104024 18444603 119713b build-x86_64-linux.old/gcc/f951
17354837   51424 1105752 18512013 11a788d build-x86_64-linux/gcc/go1
17377333   51424 1105752 18534509 11ad06d build-x86_64-linux.old/gcc/go1
20815529   43928 6289304 27148761 19e41d9 build-x86_64-linux/gcc/gnat1
20838025   43928 6289304 27171257 19e99b9 build-x86_64-linux.old/gcc/gnat1
15944305   35688 1095064 17075057 1048b71 build-x86_64-linux/gcc/jc1
15966801   35688 1095064 17097553 104e351 build-x86_64-linux.old/gcc/jc1

Should this patch be applied to restore LTO bootstrap with
bootstrap-profile?

Thanks.

H.J.
---
2013-11-28   H.J. Lu  

PR bootstrap/59199
* cfgloop.h (loop_iterator::next, loop_iterator::loop_iterator):
Moved to ...
* cfgloop.c (loop_iterator::next, loop_iterator::loop_iterator):
Here.

[wide-int] Handle more ltu_p cases inline

2013-11-28 Thread Richard Sandiford
The existing ltu_p fast path can handle any pairs of single-HWI inputs,
even for precision > HOST_BITS_PER_WIDE_INT.  In that case both xl and
yl are implicitly sign-extended to the larger precision, but with the
extended values still being compared as unsigned.  The extension doesn't
change the result in that case.

When compiling a recent fold-const.ii, this reduces the number of
ltu_p_large calls from 23849 to 697.

Tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard


Index: gcc/alias.c
===
--- gcc/alias.c 2013-11-20 12:12:49.393055063 +
+++ gcc/alias.c 2013-11-28 12:24:23.307549245 +
@@ -342,7 +342,7 @@ ao_ref_from_mem (ao_ref *ref, const_rtx
  || (DECL_P (ref->base)
  && (DECL_SIZE (ref->base) == NULL_TREE
  || TREE_CODE (DECL_SIZE (ref->base)) != INTEGER_CST
- || wi::ltu_p (DECL_SIZE (ref->base),
+ || wi::ltu_p (wi::to_offset (DECL_SIZE (ref->base)),
ref->offset + ref->size)
 return false;
 
Index: gcc/wide-int.h
===
--- gcc/wide-int.h  2013-11-28 11:44:39.041731636 +
+++ gcc/wide-int.h  2013-11-28 12:48:36.200764215 +
@@ -1740,13 +1740,15 @@ wi::ltu_p (const T1 &x, const T2 &y)
   unsigned int precision = get_binary_precision (x, y);
   WIDE_INT_REF_FOR (T1) xi (x, precision);
   WIDE_INT_REF_FOR (T2) yi (y, precision);
-  /* Optimize comparisons with constants and with sub-HWI unsigned
- integers.  */
+  /* Optimize comparisons with constants.  */
   if (STATIC_CONSTANT_P (yi.len == 1 && yi.val[0] >= 0))
 return xi.len == 1 && xi.to_uhwi () < (unsigned HOST_WIDE_INT) yi.val[0];
   if (STATIC_CONSTANT_P (xi.len == 1 && xi.val[0] >= 0))
 return yi.len != 1 || yi.to_uhwi () > (unsigned HOST_WIDE_INT) xi.val[0];
-  if (precision <= HOST_BITS_PER_WIDE_INT)
+  /* Optimize the case of two HWIs.  The HWIs are implicitly sign-extended
+ for precisions greater than HOST_BITS_PER_WIDE_INT, but sign-extending both
+ values does not change the result.  */
+  if (xi.len + yi.len == 2)
 {
   unsigned HOST_WIDE_INT xl = xi.to_uhwi ();
   unsigned HOST_WIDE_INT yl = yi.to_uhwi ();



[wide-int] Handle more cmps and cmpu cases inline

2013-11-28 Thread Richard Sandiford
As Richi asked, this patch makes cmps use the same shortcuts as lts_p.
It also makes cmpu use the shortcut that I just added to ltu_p.

On that same fold-const.ii testcase, this reduces the number of cmps_large
calls from 66924 to 916.  It reduces the number of cmpu_large calls from
3462 to 4.

Tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard


Index: gcc/wide-int.h
===
--- gcc/wide-int.h  2001-01-01 00:00:00.0 +
+++ gcc/wide-int.h  2013-11-28 16:08:22.527681077 +
@@ -1858,17 +1858,31 @@ wi::cmps (const T1 &x, const T2 &y)
   unsigned int precision = get_binary_precision (x, y);
   WIDE_INT_REF_FOR (T1) xi (x, precision);
   WIDE_INT_REF_FOR (T2) yi (y, precision);
-  if (precision <= HOST_BITS_PER_WIDE_INT)
+  if (wi::fits_shwi_p (yi))
 {
-  HOST_WIDE_INT xl = xi.to_shwi ();
-  HOST_WIDE_INT yl = yi.to_shwi ();
-  if (xl < yl)
+  /* Special case for comparisons with 0.  */
+  if (STATIC_CONSTANT_P (yi.val[0] == 0))
+   return neg_p (xi) ? -1 : !(xi.len == 1 && xi.val[0] == 0);
+  /* If x fits into a signed HWI, we can compare directly.  */
+  if (wi::fits_shwi_p (xi))
+   {
+ HOST_WIDE_INT xl = xi.to_shwi ();
+ HOST_WIDE_INT yl = yi.to_shwi ();
+ return xl < yl ? -1 : xl > yl;
+   }
+  /* If x doesn't fit and is negative, then it must be more
+negative than any signed HWI, and hence smaller than y.  */
+  if (neg_p (xi))
return -1;
-  else if (xl > yl)
-   return 1;
-  else
-   return 0;
+  /* If x is positive, then it must be larger than any signed HWI,
+and hence greater than y.  */
+  return 1;
 }
+  /* Optimize the opposite case, if it can be detected at compile time.  */
+  if (STATIC_CONSTANT_P (xi.len == 1))
+/* If YI is negative it is lower than the least HWI.
+   If YI is positive it is greater than the greatest HWI.  */
+return neg_p (yi) ? 1 : -1;
   return cmps_large (xi.val, xi.len, precision, yi.val, yi.len);
 }
 
@@ -1881,16 +1895,35 @@ wi::cmpu (const T1 &x, const T2 &y)
   unsigned int precision = get_binary_precision (x, y);
   WIDE_INT_REF_FOR (T1) xi (x, precision);
   WIDE_INT_REF_FOR (T2) yi (y, precision);
-  if (precision <= HOST_BITS_PER_WIDE_INT)
+  /* Optimize comparisons with constants.  */
+  if (STATIC_CONSTANT_P (yi.len == 1 && yi.val[0] >= 0))
 {
+  /* If XI doesn't fit in a HWI then it must be larger than YI.  */
+  if (xi.len != 1)
+   return 1;
+  /* Otherwise compare directly.  */
   unsigned HOST_WIDE_INT xl = xi.to_uhwi ();
-  unsigned HOST_WIDE_INT yl = yi.to_uhwi ();
-  if (xl < yl)
+  unsigned HOST_WIDE_INT yl = yi.val[0];
+  return xl < yl ? -1 : xl > yl;
+}
+  if (STATIC_CONSTANT_P (xi.len == 1 && xi.val[0] >= 0))
+{
+  /* If YI doesn't fit in a HWI then it must be larger than XI.  */
+  if (yi.len != 1)
return -1;
-  else if (xl == yl)
-   return 0;
-  else
-   return 1;
+  /* Otherwise compare directly.  */
+  unsigned HOST_WIDE_INT xl = xi.val[0];
+  unsigned HOST_WIDE_INT yl = yi.to_uhwi ();
+  return xl < yl ? -1 : xl > yl;
+}
+  /* Optimize the case of two HWIs.  The HWIs are implicitly sign-extended
+ for precisions greater than HOST_BITS_PER_WIDE_INT, but sign-extending both
+ values does not change the result.  */
+  if (xi.len + yi.len == 2)
+{
+  unsigned HOST_WIDE_INT xl = xi.to_uhwi ();
+  unsigned HOST_WIDE_INT yl = yi.to_uhwi ();
+  return xl < yl ? -1 : xl > yl;
 }
   return cmpu_large (xi.val, xi.len, precision, yi.val, yi.len);
 }



[wide-int] Handle more add and sub cases inline

2013-11-28 Thread Richard Sandiford
Currently add and sub have no fast path for offset_int and widest_int,
they just call the out-of-line version.  This patch handles the
single-HWI cases inline.  At least on x86_64, this only adds one branch
per call; the fast path itself is straight-line code.

On the same fold-const.ii testcase, this reduces the number of
add_large calls from 877507 to 42459.  It reduces the number of
sub_large calls from 25707 to 148.

Tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard


Index: gcc/wide-int.h
===
--- gcc/wide-int.h  2013-11-28 13:34:19.596839877 +
+++ gcc/wide-int.h  2013-11-28 16:08:11.387731775 +
@@ -2234,6 +2234,17 @@ wi::add (const T1 &x, const T2 &y)
   val[0] = xi.ulow () + yi.ulow ();
   result.set_len (1);
 }
+  else if (STATIC_CONSTANT_P (precision > HOST_BITS_PER_WIDE_INT)
+  && xi.len + yi.len == 2)
+{
+  unsigned HOST_WIDE_INT xl = xi.ulow ();
+  unsigned HOST_WIDE_INT yl = yi.ulow ();
+  unsigned HOST_WIDE_INT resultl = xl + yl;
+  val[0] = resultl;
+  val[1] = (HOST_WIDE_INT) resultl < 0 ? 0 : -1;
+  result.set_len (1 + (((resultl ^ xl) & (resultl ^ yl))
+  >> (HOST_BITS_PER_WIDE_INT - 1)));
+}
   else
 result.set_len (add_large (val, xi.val, xi.len,
   yi.val, yi.len, precision,
@@ -2288,6 +2299,17 @@ wi::sub (const T1 &x, const T2 &y)
   val[0] = xi.ulow () - yi.ulow ();
   result.set_len (1);
 }
+  else if (STATIC_CONSTANT_P (precision > HOST_BITS_PER_WIDE_INT)
+  && xi.len + yi.len == 2)
+{
+  unsigned HOST_WIDE_INT xl = xi.ulow ();
+  unsigned HOST_WIDE_INT yl = yi.ulow ();
+  unsigned HOST_WIDE_INT resultl = xl - yl;
+  val[0] = resultl;
+  val[1] = (HOST_WIDE_INT) resultl < 0 ? 0 : -1;
+  result.set_len (1 + (((resultl ^ xl) & (xl ^ yl))
+  >> (HOST_BITS_PER_WIDE_INT - 1)));
+}
   else
 result.set_len (sub_large (val, xi.val, xi.len,
   yi.val, yi.len, precision,



Re: [PATCH] Convert more passes to new dump framework

2013-11-28 Thread Martin Jambor
Hi,

On Tue, Aug 06, 2013 at 10:18:05AM -0700, Sharad Singhai wrote:
> On Tue, Aug 6, 2013 at 10:10 AM, Martin Jambor  wrote:
> > On Tue, Aug 06, 2013 at 09:22:02AM -0700, Sharad Singhai wrote:
> >> On Tue, Aug 6, 2013 at 8:57 AM, Xinliang David Li  
> >> wrote:
> >> > On Tue, Aug 6, 2013 at 5:37 AM, Martin Jambor  wrote:
> >> >> Hi,
> >> >>
> >> >> On Mon, Aug 05, 2013 at 10:37:00PM -0700, Teresa Johnson wrote:
> >> >>> This patch ports messages to the new dump framework,
> >> >>
> >> >> It would be great this new framework was documented somewhere.  I lost
> >> >> track of what was agreed it would be and from the uses in the
> >> >> vectorizer I was never quite sure how to utilize it in other passes.
> >> >
> >> > Sharad, can you put the documentation in GCC wiki.
> >>
> >> Sure. I had user documentation in form of gcc info. But I will add
> >> more developer details to a GCC wiki.
> >>
> >
> > I have built trunk gccint.info yesterday but could not find any string
> > dump_enabled_p there, for example.  And when I quickly searched just
> > for the string "dump," I did not find anything that looked like
> > dumping infrastructure either.  OTOH, I agree that file would be the
> > best place for the documentation.
> >
> > Or did I just miss it?  What section is it in then?
> 
> Actually, the user-facing documentation is in doc/invoke.texi.
> However, that doesn't describe dump_enabled_p. Do you think
> gccint.info would be a good place? I can add documentation there
> instead of creating a GCC wiki.
> 

please do not forget about this, otherwise few people will use your
framework.

Thanks,

Martin


Re: [wide-int] Handle more ltu_p cases inline

2013-11-28 Thread Richard Earnshaw
On 28/11/13 17:29, Richard Sandiford wrote:
> The existing ltu_p fast path can handle any pairs of single-HWI inputs,
> even for precision > HOST_BITS_PER_WIDE_INT.  In that case both xl and
> yl are implicitly sign-extended to the larger precision, but with the
> extended values still being compared as unsigned.  The extension doesn't
> change the result in that case.
> 
> When compiling a recent fold-const.ii, this reduces the number of
> ltu_p_large calls from 23849 to 697.
> 

Are these sorts of nuggets of information going to be recorded anywhere?

R.




Re: wide-int, vax

2013-11-28 Thread Jan-Benedict Glaw
On Sat, 2013-11-23 11:23:08 -0800, Mike Stump  wrote:
>   * config/vax/vax.c: Include wide-int.h.
>   (vax_float_literal): Use real_from_integer.

Looks good to me, but Matt must tell for sure.

MfG, JBG

-- 
  Jan-Benedict Glaw  jbg...@lug-owl.de  +49-172-7608481
Signature of: "really soon now":  an unspecified period of time, likely to
   the second  :                  be greater than any reasonable definition
                                  of "soon".


signature.asc
Description: Digital signature


*ping* Re: PR37132 – RFC patch for generation of DWARF symbol for Fortran's namelists (DW_TAG_namelist)

2013-11-28 Thread Tobias Burnus

A slightly early *ping*

Tobias Burnus wrote:

attached is an updated version of the patch.

Change:

Tobias Burnus wrote:
But for "USE mod_name, only: nml", one is supposed to generate a 
DW_TAG_imported_declaration. And there I am stuck. For normal 
variables, the DW_TAG_imported_declaration refers to a 
DW_TAG_variable die. Analogously, for a namelist one would have to 
refer to a DW_TAG_namelist die. But such DW_TAG_namelist comes with a 
DW_TAG_namelist_item list. And for the latter, one needs to have the 
die of all variables in the namelist. But with use-only the symbols 
aren't use associated and no decl or die exists. (Failing call tree 
with the patch: gfc_trans_use_stmts -> 
dwarf2out_imported_module_or_decl_1 -> force_decl_die.)


With the attached patch, one now generates DW_TAG_namelist with no 
DW_TAG_namelist_item and sets DW_AT_declaration.


Thus, for (first file)

  module mm
integer :: ii
real :: rr
namelist /nml/ ii, rr
  end module mm


and (second file):

  subroutine test
use mm, only: nml
write(*,nml)
  end subroutine test


One now generates (first file):

 <1><1e>: Abbrev Number: 2 (DW_TAG_module)
<1f>   DW_AT_name: mm
<22>   DW_AT_decl_file   : 1
<23>   DW_AT_decl_line   : 1
<24>   DW_AT_sibling : <0x59>
 <2><28>: Abbrev Number: 3 (DW_TAG_variable)
<29>   DW_AT_name: ii
<2c>   DW_AT_decl_file   : 1
<2d>   DW_AT_decl_line   : 2
<2e>   DW_AT_linkage_name: (indirect string, offset: 0x15): 
__mm_MOD_ii

<32>   DW_AT_type: <0x59>
<36>   DW_AT_external: 1
<36>   DW_AT_location: 9 byte block: 3 0 0 0 0 0 0 0 0  
(DW_OP_addr: 0)

 <2><40>: Abbrev Number: 3 (DW_TAG_variable)
<41>   DW_AT_name: rr
<44>   DW_AT_decl_file   : 1
<45>   DW_AT_decl_line   : 2
<46>   DW_AT_linkage_name: (indirect string, offset: 0x9): 
__mm_MOD_rr

<4a>   DW_AT_type: <0x60>
<4e>   DW_AT_external: 1
<4e>   DW_AT_location: 9 byte block: 3 4 0 0 0 0 0 0 0  
(DW_OP_addr: 4)

 <2><58>: Abbrev Number: 0
 <1><59>: Abbrev Number: 4 (DW_TAG_base_type)
<5a>   DW_AT_byte_size   : 4
<5b>   DW_AT_encoding: 5(signed)
<5c>   DW_AT_name: (indirect string, offset: 0x29): 
integer(kind=4)

 <1><60>: Abbrev Number: 4 (DW_TAG_base_type)
<61>   DW_AT_byte_size   : 4
<62>   DW_AT_encoding: 4(float)
<63>   DW_AT_name: (indirect string, offset: 0x12c): 
real(kind=4)

 <1><67>: Abbrev Number: 5 (DW_TAG_namelist)
<68>   DW_AT_name: nml
 <2><6c>: Abbrev Number: 6 (DW_TAG_namelist_item)
<6d>   DW_AT_namelist_items: <0x28>
 <2><71>: Abbrev Number: 6 (DW_TAG_namelist_item)
<72>   DW_AT_namelist_items: <0x40>

Second file:

  <2><4f>: Abbrev Number: 3 (DW_TAG_imported_declaration)
<50>   DW_AT_decl_file   : 1
<51>   DW_AT_decl_line   : 2
<52>   DW_AT_import  : <0x70>   [Abbrev Number: 6 
(DW_TAG_namelist)]

 <2><56>: Abbrev Number: 4 (DW_TAG_lexical_block)
<57>   DW_AT_low_pc  : 0xb
<5f>   DW_AT_high_pc : 0xb0
 <2><67>: Abbrev Number: 0
 <1><68>: Abbrev Number: 5 (DW_TAG_module)
<69>   DW_AT_name: mm
<6c>   DW_AT_declaration : 1
<6c>   DW_AT_sibling : <0x76>
 <2><70>: Abbrev Number: 6 (DW_TAG_namelist)
<71>   DW_AT_name: nml
<75>   DW_AT_declaration : 1
 <2><75>: Abbrev Number: 0


Do the dumps look okay? For the first file, DW_TAG_namelist doesn't 
come directly after DW_TAG_module but after its sibling 0x59; does one 
still see that "nml" belongs to that module? (On dwarf2out level, 
context die should point to the module tag, but I don't understand the 
readelf/eu-readelf output well enough to see whether that's also the 
case for the generated dwarf.)


I assume that the compiler can see from the DWARF of the second file 
that "nml" comes from module "mm" and doesn't search the value 
elsewhere. (It is possible to have multiple namelist with the same 
name in different modules.)



For the previous version, I did an all-language bootstrap + regtesting; 
for this one, I only built and tested Fortran. I will now do a full 
all-language bootstrap and regtest. Assuming that it is successful:

OK for the trunk?

Tobias




Re: [PATCH] Fix PR59330

2013-11-28 Thread H.J. Lu
On Thu, Nov 28, 2013 at 6:49 AM, Richard Biener  wrote:
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>
> Richard.
>
> 2013-11-28  Richard Biener  
>
> PR tree-optimization/59330
> * tree-ssa-dce.c (eliminate_unnecessary_stmts): Simplify
> and fix delayed marking of free calls not necessary.
>
> * gcc.dg/torture/pr59330.c: New testcase.
>

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59334

-- 
H.J.


Re: [PATCH] Fix PR59330

2013-11-28 Thread Jakub Jelinek
On Thu, Nov 28, 2013 at 11:14:45AM -0800, H.J. Lu wrote:
> On Thu, Nov 28, 2013 at 6:49 AM, Richard Biener  wrote:
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
> >
> > Richard.
> >
> > 2013-11-28  Richard Biener  
> >
> > PR tree-optimization/59330
> > * tree-ssa-dce.c (eliminate_unnecessary_stmts): Simplify
> > and fix delayed marking of free calls not necessary.
> >
> > * gcc.dg/torture/pr59330.c: New testcase.
> >
> 
> This caused:
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59334

It even breaks bootstrap on i686-linux --enable-checking=yes,rtl ,
because insn-recog.c compilation during stage3 eats more memory than can fit
into 32-bit address space.

Jakub


Re: [patch] Fix PR middle-end/59138

2013-11-28 Thread Bernd Edlinger
Hi Eric,

I think I see a small flaw in that patch:
+   /* Make sure not to write past the end of the struct.  */
+   store_bit_field (dest,
+  adj_bytelen * BITS_PER_UNIT, bytepos * BITS_PER_UNIT,
+  bytepos * BITS_PER_UNIT, ssize * BITS_PER_UNIT,
+  VOIDmode, tmps[i]);
the parameter BITREGION_END is wrong.
it should be:
 ssize * BITS_PER_UNIT - 1

Bernd.

patch for elimination to SP when it is changed in RTL (PR57293)

2013-11-28 Thread Vladimir Makarov

  The following patch fixes PR57293

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57293

  It is actually an implementation of missed LRA functionality in reg
elimination.  Before the patch any explicit change of stack pointer in
RTL resulted in necessity to use the frame pointer.

  The patch has practically no effect on generic tuning of x86/x86-64.
But it has a dramatic effect on code performance for other tunings
like corei7 which don't use incoming args accumulation.  The maximum
SPEC2000 improvement 2.5% is achieved on x86 SPECInt2000.  But
SPECFP2000 rate also improves by about 1% on x86 and x86-64.  Too
bad that I did not implement it in the first place.  The results reported
at the 2012 GNU Cauldron would have been even better, as I
also used -mtune=corei7 at that time.

The patch was bootstrapped and tested on x86-64/x86 and ppc.

Committed as rev. 205498.

 2013-11-28  Vladimir Makarov

PR target/57293
* ira.h (ira_setup_eliminable_regset): Remove parameter.
* ira.c (ira_setup_eliminable_regset): Ditto.  Add
SUPPORTS_STACK_ALIGNMENT for crtl->stack_realign_needed.
Don't call lra_init_elimination.
(ira): Call ira_setup_eliminable_regset without arguments.
* loop-invariant.c (calculate_loop_reg_pressure): Remove argument
from ira_setup_eliminable_regset call.
* gcse.c (calculate_bb_reg_pressure): Ditto.
* haifa-sched.c (sched_init): Ditto.
* lra.h (lra_init_elimination): Remove the prototype.
* lra-int.h (lra_insn_recog_data): New member sp_offset.  Move
used_insn_alternative upper.
(lra_eliminate_regs_1): Add one more parameter.
(lra_eliminate): Ditto.
* lra.c (lra_invalidate_insn_data): Set sp_offset.
(setup_sp_offset): New.
(lra_process_new_insns): Call setup_sp_offset.
(lra): Add argument to lra_eliminate calls.
* lra-constraints.c (get_equiv_substitution): Rename to get_equiv.
(get_equiv_with_elimination): New.
(process_addr_reg): Call get_equiv_with_elimination instead of
get_equiv_substitution.
(equiv_address_substitution): Ditto.
(loc_equivalence_change_p): Ditto.
(loc_equivalence_callback, lra_constraints): Ditto.
(curr_insn_transform): Ditto.  Print the sp offset
(process_alt_operands): Prevent stack pointer reloads.
(lra_constraints): Remove one argument from lra_eliminate call.
Move it up.  Mark used hard regs before it.  Use
get_equiv_with_elimination instead of get_equiv_substitution.
* lra-eliminations.c (lra_eliminate_regs_1): Add parameter and
assert for param values combination.  Use sp offset.  Add argument
to lra_eliminate_regs_1 calls.
(lra_eliminate_regs): Add argument to lra_eliminate_regs_1 call.
(curr_sp_change): New static var.
(mark_not_eliminable): Add parameter.  Update curr_sp_change.
Don't prevent elimination to sp if we can calculate its change.
Pass the argument to mark_not_eliminable calls.
(eliminate_regs_in_insn): Add a parameter.  Use sp offset.  Add
argument to lra_eliminate_regs_1 call.
(update_reg_eliminate): Move calculation of hard regs for spill
lower.  Switch off lra_in_progress temporarily to generate regs
involved into elimination.
(lra_init_elimination): Rename to init_elimination.  Make it
static.  Set up insn sp offset, check the offsets at the end of
BBs.
(process_insn_for_elimination): Add parameter.  Pass its value to
eliminate_regs_in_insn.
(lra_eliminate): Add parameter.  Pass its value to
process_insn_for_elimination.  Add assert for param values
combination.  Call init_elimination.  Don't update offsets in
equivalence substitutions.
* lra-spills.c (assign_mem_slot): Don't call lra_eliminate_regs_1
for created stack slot.
(remove_pseudos): Call lra_eliminate_regs_1 before changing memory
onto stack slot.

2013-11-28  Vladimir Makarov

PR target/57293
* gcc.target/i386/pr57293.c: New.

Index: gcse.c
===
--- gcse.c  (revision 205233)
+++ gcse.c  (working copy)
@@ -3509,7 +3509,7 @@ calculate_bb_reg_pressure (void)
   bitmap_iterator bi;
 
 
-  ira_setup_eliminable_regset (false);
+  ira_setup_eliminable_regset ();
   curr_regs_live = BITMAP_ALLOC (&reg_obstack);
   FOR_EACH_BB (bb)
 {
Index: haifa-sched.c
===
--- haifa-sched.c   (revision 205233)
+++ haifa-sched.c   (working copy)
@@ -6624,7 +6624,7 @@ sched_init (void)
 sched_pressure = SCHED_PRESSURE_NONE;
 
   if (sched_pressure != SCHED_PRESSURE_NONE)
-ira_setup_eliminable_regset (false);
+ira_setup_eliminable_regset ();
 
   /* Initialize SPEC_INFO.  */
   if (target

Re: [wide-int] Handle more add and sub cases inline

2013-11-28 Thread Kenneth Zadeck
I would like to see some comment to the effect that this is to allow 
inlining for the common case for widest int and offset int without 
inlining the uncommon case for regular wide-int.





On 11/28/2013 12:38 PM, Richard Sandiford wrote:

Currently add and sub have no fast path for offset_int and widest_int,
they just call the out-of-line version.  This patch handles the
single-HWI cases inline.  At least on x86_64, this only adds one branch
per call; the fast path itself is straight-line code.

On the same fold-const.ii testcase, this reduces the number of
add_large calls from 877507 to 42459.  It reduces the number of
sub_large calls from 25707 to 148.

Tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard


Index: gcc/wide-int.h
===
--- gcc/wide-int.h  2013-11-28 13:34:19.596839877 +
+++ gcc/wide-int.h  2013-11-28 16:08:11.387731775 +
@@ -2234,6 +2234,17 @@ wi::add (const T1 &x, const T2 &y)
val[0] = xi.ulow () + yi.ulow ();
result.set_len (1);
  }
+  else if (STATIC_CONSTANT_P (precision > HOST_BITS_PER_WIDE_INT)
+  && xi.len + yi.len == 2)
+{
+  unsigned HOST_WIDE_INT xl = xi.ulow ();
+  unsigned HOST_WIDE_INT yl = yi.ulow ();
+  unsigned HOST_WIDE_INT resultl = xl + yl;
+  val[0] = resultl;
+  val[1] = (HOST_WIDE_INT) resultl < 0 ? 0 : -1;
+  result.set_len (1 + (((resultl ^ xl) & (resultl ^ yl))
+  >> (HOST_BITS_PER_WIDE_INT - 1)));
+}
else
  result.set_len (add_large (val, xi.val, xi.len,
   yi.val, yi.len, precision,
@@ -2288,6 +2299,17 @@ wi::sub (const T1 &x, const T2 &y)
val[0] = xi.ulow () - yi.ulow ();
result.set_len (1);
  }
+  else if (STATIC_CONSTANT_P (precision > HOST_BITS_PER_WIDE_INT)
+  && xi.len + yi.len == 2)
+{
+  unsigned HOST_WIDE_INT xl = xi.ulow ();
+  unsigned HOST_WIDE_INT yl = yi.ulow ();
+  unsigned HOST_WIDE_INT resultl = xl - yl;
+  val[0] = resultl;
+  val[1] = (HOST_WIDE_INT) resultl < 0 ? 0 : -1;
+  result.set_len (1 + (((resultl ^ xl) & (xl ^ yl))
+  >> (HOST_BITS_PER_WIDE_INT - 1)));
+}
else
  result.set_len (sub_large (val, xi.val, xi.len,
   yi.val, yi.len, precision,





Fwd: Re: [PATCH] Postpone __LINE__ evaluation to the end of #line directives

2013-11-28 Thread Max Woodbury




 Original Message 
Subject: Re: [PATCH] Postpone __LINE__ evaluation to the end of #line 
directives

Date: Thu, 28 Nov 2013 17:32:41 -0500
From: Max Woodbury 
To: Joseph S. Myers 

On 11/28/2013 11:34 AM, Joseph S. Myers wrote:

On Wed, 27 Nov 2013, Max Woodbury wrote:


There should be a way to change the __FILE__ value without changing the
line number sequencing.  Whatever that mechanism is, it should NOT
introduce maintenance problems that involve counting lines of code.


I think that #line is mainly intended for use by code generators that
generate C code, rather than directly by people writing C programs.  Such
a code generator can easily manage counting lines of code.


A little Googling quickly turns up examples that make it clear that:

 #line __LINE__ "new__FILE__value"

is that expected mechanism,


You'll find any number of examples online based on misconceptions about
the C languages, possibly together with what one particular implementation
does.  Any recommendation to do things based on an area where the editor
of the standard has said the ambiguity in the standard is deliberate is
clearly a bad recommendation.  Recommendations on use of C should be based
on areas where the standard is clear and implementations agree.


Please try not to be deliberately obstructive.  While #line is indeed
used extensively by code generators to map generated code back to the
source code used by the generator, other uses are possible, and the
expectations associated with those uses are worthy of serious
consideration.  '#line __LINE__' is indeed a common idiom and it is
expected to leave the line numbering sequence unchanged.

As for the sequence of comments you point to, they are discussing the
use of __LINE__ in macros, not directives.  The standard is quite a bit
more explicit about token substitution in directives, making it fairly
clear that substitution is not to occur in directives until
specifically called for.  The elaboration of three distinct forms for
the '#line' directive with substitution only being called for in the
third and last form, indicates that something special is intended.

The standard was not created in a vacuum.  The ideas did not
materialize out of thin air.  The elaborate specification was intended
to codify actual usage.  That usage included the '#line __LINE__' idiom
with its intent to NOT break line sequencing.


In other words, if you processed the text in multiple phases the way
the standard requires, you would not substitute the value for the
__LINE__ token until after the end of the directive has been seen.
Thus the problem only arises because this implementation folds the
translation phases into a single pass over the text and takes an
improper short-cut as it does so.  The standard explicitly warns
against this kind of mistake.


The standard itself mixes up the phases.  Recall that the definition of
line number is "one greater than the number of new-line characters read or
introduced in translation phase 1 (5.1.1.2) while processing the source
file to the current token" (where "current token" is never defined).  If
the phases were completely separate, by your reasoning every newline has
been processed in phase 1 before any of phases 2, 3 or 4 do anything, and
so all line numbers relate to the end of the file.  There is absolutely
nothing to say that the newline at the end of the #line directive has been
read "while processing the source file to the current token" (if __LINE__
in the #line directive is the current token) but that the newline after it
hasn't been read; if anything, the phases imply that all newlines have
been read.


The standard also includes a mechanism for encoding newlines seen
in tokens, so that argument falls apart fairly easily.


This case is just as ambiguous as the case of a multi-line macro call,
where __LINE__ gets expanded somewhere in the macro arguments, and the
line number can be that of the macro name, or of the closing parenthesis
of the call, or somewhere in between, and the standard does not make a
conformance distinction between those choices.

So, I don't think we should make complicated changes to implement one
particular choice in an area of deliberate ambiguity without direction
from WG14 to eliminate the ambiguity in the standard.  Instead, we can let
the choices be whatever is most natural in the implementation.  If you
believe the standard is defective in not defining certain things, I advise
filing a DR (or, when next open for revisions, proposing a paper at a
meeting to change the definition as you think appropriate).


As pointed out above, this case is distinct from the macro CALL case.
The rules are much more explicitly spelled out for directives and is
only ambiguous if you start with the preconceived notion that it is.
The standard is explicit enough as it stands.

Further, the changes are not all that complicated.  One check in the
__LINE__ macro expansion code.  A flag set and reset, and two special
ca

Re: Fwd: Re: [PATCH] Postpone __LINE__ evaluation to the end of #line directives

2013-11-28 Thread Joseph S. Myers
On Thu, 28 Nov 2013, Max Woodbury wrote:

> As for the sequence of comments you point to, they are discussing the
> use of __LINE__ in macros, not directives.  The standard is quite a bit
> more explicit about token substitution in directives, making it fairly
> clear that substitution is not to occur in directives until
> specifically called for.  The elaboration of three distinct forms for
> the '#line' directive with substitution only being called for in the
> third and last form, indicates that something special is intended.

I think the natural reading is that the current token is __LINE__ on the 
#line line, because that's what's being macro-expanded, and that the 
relevant number of newlines is those strictly before the #line line, so 
this directive is expected to make the next line's number that of the 
current line (i.e. one less than it would otherwise have been).  I think 
interpreting it otherwise is what strains the language of the standard.  
I think it's completely irrelevant what later parts of the source file are 
involved in identifying the form of the directive - the relevant thing is 
what is being expanded rather than anything later that was involved in 
causing it to be expanded.  So, I think this patch is a bad idea, absent 
direction otherwise from WG14, and you should raise a DR with WG14 if you 
disagree.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] SIMD clones LTO fixes part 1 (PR lto/59326)

2013-11-28 Thread Jakub Jelinek
Hi!

Here is the first part of LTO fixes for #pragma omp declare simd,
in particular support for streaming OMP_CLAUSEs.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2013-11-28  Jakub Jelinek  

PR lto/59326
* tree-core.h (enum omp_clause_schedule_kind): Add
OMP_CLAUSE_SCHEDULE_LAST.
(enum omp_clause_default_kind): Add OMP_CLAUSE_DEFAULT_LAST.
(enum omp_clause_depend_kind): Add OMP_CLAUSE_DEPEND_LAST.
(enum omp_clause_map_kind): Add OMP_CLAUSE_MAP_LAST.
(enum omp_clause_proc_bind_kind): Add OMP_CLAUSE_PROC_BIND_LAST.
* lto-streamer-out.c (lto_is_streamable): Allow streaming
OMP_CLAUSE.
(DFS_write_tree_body): Handle OMP_CLAUSE.
* tree-streamer-out.c (pack_ts_omp_clause_value_fields): New
function.
(streamer_pack_tree_bitfields): Call it for OMP_CLAUSE.
(write_ts_omp_clause_tree_pointers): New function.
(streamer_write_tree_body): Call it for OMP_CLAUSE.
(streamer_write_tree_header): For OMP_CLAUSE stream OMP_CLAUSE_CODE.
* tree-streamer-in.c (unpack_ts_omp_clause_value_fields): New
function.
(unpack_value_fields): Call it for OMP_CLAUSE.
(streamer_alloc_tree): Handle OMP_CLAUSE.
(lto_input_ts_omp_clause_tree_pointers): New function.
(streamer_read_tree_body): Call it for OMP_CLAUSE.
lto/
* lto.c (mentions_vars_p_omp_clause): New function.
(mentions_vars_p): Call it for OMP_CLAUSE.  Remove break;
after return stmts.

--- gcc/tree-core.h.jj  2013-11-27 12:15:14.0 +0100
+++ gcc/tree-core.h 2013-11-28 10:55:46.691490627 +0100
@@ -350,7 +350,8 @@ enum omp_clause_schedule_kind {
   OMP_CLAUSE_SCHEDULE_DYNAMIC,
   OMP_CLAUSE_SCHEDULE_GUIDED,
   OMP_CLAUSE_SCHEDULE_AUTO,
-  OMP_CLAUSE_SCHEDULE_RUNTIME
+  OMP_CLAUSE_SCHEDULE_RUNTIME,
+  OMP_CLAUSE_SCHEDULE_LAST
 };
 
 enum omp_clause_default_kind {
@@ -358,7 +359,8 @@ enum omp_clause_default_kind {
   OMP_CLAUSE_DEFAULT_SHARED,
   OMP_CLAUSE_DEFAULT_NONE,
   OMP_CLAUSE_DEFAULT_PRIVATE,
-  OMP_CLAUSE_DEFAULT_FIRSTPRIVATE
+  OMP_CLAUSE_DEFAULT_FIRSTPRIVATE,
+  OMP_CLAUSE_DEFAULT_LAST
 };
 
 /* There is a TYPE_QUAL value for each type qualifier.  They can be
@@ -1107,7 +1109,8 @@ enum omp_clause_depend_kind
 {
   OMP_CLAUSE_DEPEND_IN,
   OMP_CLAUSE_DEPEND_OUT,
-  OMP_CLAUSE_DEPEND_INOUT
+  OMP_CLAUSE_DEPEND_INOUT,
+  OMP_CLAUSE_DEPEND_LAST
 };
 
 enum omp_clause_map_kind
@@ -1119,7 +1122,8 @@ enum omp_clause_map_kind
   /* The following kind is an internal only map kind, used for pointer based
  array sections.  OMP_CLAUSE_SIZE for these is not the pointer size,
  which is implicitly POINTER_SIZE / BITS_PER_UNIT, but the bias.  */
-  OMP_CLAUSE_MAP_POINTER
+  OMP_CLAUSE_MAP_POINTER,
+  OMP_CLAUSE_MAP_LAST
 };
 
 enum omp_clause_proc_bind_kind
@@ -1129,7 +1133,8 @@ enum omp_clause_proc_bind_kind
   OMP_CLAUSE_PROC_BIND_TRUE = 1,
   OMP_CLAUSE_PROC_BIND_MASTER = 2,
   OMP_CLAUSE_PROC_BIND_CLOSE = 3,
-  OMP_CLAUSE_PROC_BIND_SPREAD = 4
+  OMP_CLAUSE_PROC_BIND_SPREAD = 4,
+  OMP_CLAUSE_PROC_BIND_LAST
 };
 
 struct GTY(()) tree_exp {
--- gcc/lto-streamer-out.c.jj   2013-11-27 18:02:46.0 +0100
+++ gcc/lto-streamer-out.c  2013-11-28 11:51:03.085288356 +0100
@@ -297,7 +297,6 @@ lto_is_streamable (tree expr)
 && code != BIND_EXPR
 && code != WITH_CLEANUP_EXPR
 && code != STATEMENT_LIST
-&& code != OMP_CLAUSE
 && (code == CASE_LABEL_EXPR
 || code == DECL_EXPR
 || TREE_CODE_CLASS (code) != tcc_statement);
@@ -667,6 +666,14 @@ DFS_write_tree_body (struct output_block
}
 }
 
+  if (code == OMP_CLAUSE)
+{
+  int i;
+  for (i = 0; i < omp_clause_num_ops[OMP_CLAUSE_CODE (expr)]; i++)
+   DFS_follow_tree_edge (OMP_CLAUSE_OPERAND (expr, i));
+  DFS_follow_tree_edge (OMP_CLAUSE_CHAIN (expr));
+}
+
 #undef DFS_follow_tree_edge
 }
 
--- gcc/tree-streamer-out.c.jj  2013-11-22 21:03:16.0 +0100
+++ gcc/tree-streamer-out.c 2013-11-28 11:49:49.327672855 +0100
@@ -390,6 +390,46 @@ pack_ts_optimization (struct bitpack_d *
 }
 
 
+/* Pack all the non-pointer fields of the TS_OMP_CLAUSE structure
+   of expression EXPR into bitpack BP.  */
+
+static void
+pack_ts_omp_clause_value_fields (struct output_block *ob,
+struct bitpack_d *bp, tree expr)
+{
+  stream_output_location (ob, bp, OMP_CLAUSE_LOCATION (expr));
+  switch (OMP_CLAUSE_CODE (expr))
+{
+case OMP_CLAUSE_DEFAULT:
+  bp_pack_enum (bp, omp_clause_default_kind, OMP_CLAUSE_DEFAULT_LAST,
+   OMP_CLAUSE_DEFAULT_KIND (expr));
+  break;
+case OMP_CLAUSE_SCHEDULE:
+  bp_pack_enum (bp, omp_clause_schedule_kind, OMP_CLAUSE_SCHEDULE_LAST,
+   OMP_CLAUSE_SCHEDULE_KIND (expr));
+  break;
+case OMP_CLAUSE_DEPEND:
+  bp_pack_enum (bp, omp_clause_depend_kind, OMP_CLAUSE_DEPEND_LAST,
+

[PATCH] SIMD clones LTO fixes part 2 (PR lto/59326)

2013-11-28 Thread Jakub Jelinek
Hi!

And here is the second part of the fixes.  Still, the vect-simd-clone-12.c
testcase fails with -flto -flto-partition=1to1, so there is further work to
do, but at least all current tests succeed and actually use SIMD elementals
when they should.  Bootstrapped/regtested on x86_64-linux and i686-linux,
ok for trunk?
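For reference, the kind of code whose clones are being streamed here is an elemental function such as the following minimal example (SIMD clones are only generated under -fopenmp/-fopenmp-simd; without those flags the pragma is simply ignored):

```c
/* The pragma asks the compiler to emit SIMD clones of add_one in
   addition to the ordinary scalar version.  */
#pragma omp declare simd
int
add_one (int x)
{
  return x + 1;
}

int
sum_plus_one (const int *a, int n)
{
  int s = 0;
  for (int i = 0; i < n; i++)
    s += add_one (a[i]);   /* a vectorizer may call the SIMD clone here */
  return s;
}
```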

2013-11-28  Jakub Jelinek  
Richard Biener  

PR lto/59326
* omp-low.c (simd_clone_create): Return NULL if for definition
!cgraph_function_with_gimple_body_p (old_node).  Call cgraph_get_body
before calling cgraph_function_versioning.
(expand_simd_clones): Look for "omp declare simd" attribute first.
Don't check targetm.simd_clone.compute_vecsize_and_simdlen here.
Punt if node->global.inlined_to.
(pass_omp_simd_clone::gate): Also enable if flag_ltrans.  Disable
pass if targetm.simd_clone.compute_vecsize_and_simdlen is NULL.
* lto-streamer-out.c (hash_tree): Handle OMP_CLAUSE.
lto/
* lto.c (compare_tree_sccs_1): Handle OMP_CLAUSE.
testsuite/
* gcc.dg/vect/vect-simd-clone-12.c: New test.
* gcc.dg/vect/vect-simd-clone-12a.c: New test.
* gcc.dg/vect/vect-simd-clone-10a.c: Remove extern keywords.

--- gcc/omp-low.c.jj2013-11-27 12:15:13.0 +0100
+++ gcc/omp-low.c   2013-11-28 16:53:49.388242468 +0100
@@ -10912,8 +10912,13 @@ simd_clone_create (struct cgraph_node *o
 {
   struct cgraph_node *new_node;
   if (old_node->definition)
-new_node = cgraph_function_versioning (old_node, vNULL, NULL, NULL, false,
-  NULL, NULL, "simdclone");
+{
+  if (!cgraph_function_with_gimple_body_p (old_node))
+   return NULL;
+  cgraph_get_body (old_node);
+  new_node = cgraph_function_versioning (old_node, vNULL, NULL, NULL,
+false, NULL, NULL, "simdclone");
+}
   else
 {
   tree old_decl = old_node->decl;
@@ -11622,13 +11627,13 @@ simd_clone_adjust (struct cgraph_node *n
 static void
 expand_simd_clones (struct cgraph_node *node)
 {
-  if (lookup_attribute ("noclone", DECL_ATTRIBUTES (node->decl)))
-return;
-
   tree attr = lookup_attribute ("omp declare simd",
DECL_ATTRIBUTES (node->decl));
-  if (!attr || targetm.simd_clone.compute_vecsize_and_simdlen == NULL)
+  if (attr == NULL_TREE
+  || node->global.inlined_to
+  || lookup_attribute ("noclone", DECL_ATTRIBUTES (node->decl)))
 return;
+
   /* Ignore
  #pragma omp declare simd
  extern int foo ();
@@ -11764,8 +11769,10 @@ public:
   {}
 
   /* opt_pass methods: */
-  bool gate () { return flag_openmp || flag_openmp_simd
-   || flag_enable_cilkplus; }
+  bool gate () { return ((flag_openmp || flag_openmp_simd
+ || flag_enable_cilkplus || flag_ltrans)
+&& (targetm.simd_clone.compute_vecsize_and_simdlen
+!= NULL)); }
   unsigned int execute () { return ipa_omp_simd_clone (); }
 };
 
--- gcc/lto/lto.c.jj2013-11-28 16:02:36.0 +0100
+++ gcc/lto/lto.c   2013-11-28 16:27:04.164663085 +0100
@@ -1410,6 +1410,36 @@ compare_tree_sccs_1 (tree t1, tree t2, t
   TREE_STRING_LENGTH (t1)) != 0)
   return false;
 
+  if (code == OMP_CLAUSE)
+{
+  compare_values (OMP_CLAUSE_CODE);
+  switch (OMP_CLAUSE_CODE (t1))
+   {
+   case OMP_CLAUSE_DEFAULT:
+ compare_values (OMP_CLAUSE_DEFAULT_KIND);
+ break;
+   case OMP_CLAUSE_SCHEDULE:
+ compare_values (OMP_CLAUSE_SCHEDULE_KIND);
+ break;
+   case OMP_CLAUSE_DEPEND:
+ compare_values (OMP_CLAUSE_DEPEND_KIND);
+ break;
+   case OMP_CLAUSE_MAP:
+ compare_values (OMP_CLAUSE_MAP_KIND);
+ break;
+   case OMP_CLAUSE_PROC_BIND:
+ compare_values (OMP_CLAUSE_PROC_BIND_KIND);
+ break;
+   case OMP_CLAUSE_REDUCTION:
+ compare_values (OMP_CLAUSE_REDUCTION_CODE);
+ compare_values (OMP_CLAUSE_REDUCTION_GIMPLE_INIT);
+ compare_values (OMP_CLAUSE_REDUCTION_GIMPLE_MERGE);
+ break;
+   default:
+ break;
+   }
+}
+
 #undef compare_values
 
 
@@ -1633,6 +1663,16 @@ compare_tree_sccs_1 (tree t1, tree t2, t
}
 }
 
+  if (code == OMP_CLAUSE)
+{
+  int i;
+
+  for (i = 0; i < omp_clause_num_ops[OMP_CLAUSE_CODE (t1)]; i++)
+   compare_tree_edges (OMP_CLAUSE_OPERAND (t1, i),
+   OMP_CLAUSE_OPERAND (t2, i));
+  compare_tree_edges (OMP_CLAUSE_CHAIN (t1), OMP_CLAUSE_CHAIN (t2));
+}
+
 #undef compare_tree_edges
 
   return true;
--- gcc/lto-streamer-out.c.jj   2013-11-28 16:02:36.0 +0100
+++ gcc/lto-streamer-out.c  2013-11-28 16:26:42.059776312 +0100
@@ -1060,6 +1060,39 @@ hash_tree (struct streamer_tree_cache_d
}
 }
 
+  if (code == OMP_CLAUSE)
+{
+  int i;
+
+  v = ite

[PATCH] Avoid SIMD clone dg-do run tests if assembler doesn't support AVX2 (PR lto/59326)

2013-11-28 Thread Jakub Jelinek
Hi!

As we create SIMD clones for all of SSE2, AVX and AVX2 ISAs right now,
the assembler needs to support SSE2, AVX and AVX2.  Apparently some folks
are still using binutils versions that don't handle those, so this patch
conditionalizes the tests on assembler support.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2013-11-28  Jakub Jelinek  

PR lto/59326
* gcc.target/i386/i386.exp (check_effective_target_avx2): Move to...
* lib/target-supports.exp (check_effective_target_avx2): ... here.
(check_effective_target_vect_simd_clones): New.
* gcc.dg/vect/vect-simd-clone-1.c: Add dg-require-effective-target
vect_simd_clones.
* gcc.dg/vect/vect-simd-clone-2.c: Likewise.
* gcc.dg/vect/vect-simd-clone-3.c: Likewise.
* gcc.dg/vect/vect-simd-clone-4.c: Likewise.
* gcc.dg/vect/vect-simd-clone-5.c: Likewise.
* gcc.dg/vect/vect-simd-clone-6.c: Likewise.
* gcc.dg/vect/vect-simd-clone-7.c: Likewise.
* gcc.dg/vect/vect-simd-clone-8.c: Likewise.
* gcc.dg/vect/vect-simd-clone-9.c: Likewise.
* gcc.dg/vect/vect-simd-clone-10.c: Likewise.
* gcc.dg/vect/vect-simd-clone-11.c: Likewise.

--- gcc/testsuite/gcc.target/i386/i386.exp.jj   2013-01-11 09:02:38.0 +0100
+++ gcc/testsuite/gcc.target/i386/i386.exp  2013-11-28 13:36:40.464167773 +0100
@@ -209,18 +209,6 @@ proc check_effective_target_lzcnt { } {
 } "-mlzcnt" ]
 }
 
-# Return 1 if avx2 instructions can be compiled.
-proc check_effective_target_avx2 { } {
-return [check_no_compiler_messages avx2 object {
-   typedef long long __v4di __attribute__ ((__vector_size__ (32)));
-   __v4di
-   mm256_is32_andnotsi256  (__v4di __X, __v4di __Y)
-{
-  return __builtin_ia32_andnotsi256 (__X, __Y);
-   }
-} "-O0 -mavx2" ]
-}
-
 # Return 1 if bmi instructions can be compiled.
 proc check_effective_target_bmi { } {
 return [check_no_compiler_messages bmi object {
--- gcc/testsuite/lib/target-supports.exp.jj2013-11-15 09:39:37.0 +0100
+++ gcc/testsuite/lib/target-supports.exp   2013-11-28 13:35:54.408422777 +0100
@@ -2146,6 +2146,32 @@ proc check_effective_target_vect_floatui
 return $et_vect_floatuint_cvt_saved
 }
 
+# Return 1 if the target supports #pragma omp declare simd, 0 otherwise.
+#
+# This won't change for different subtargets so cache the result.
+
+proc check_effective_target_vect_simd_clones { } {
+global et_vect_simd_clones_saved
+
+if [info exists et_vect_simd_clones_saved] {
+   verbose "check_effective_target_vect_simd_clones: using cached result" 2
+} else {
+   set et_vect_simd_clones_saved 0
+   if { [istarget i?86-*-*] || [istarget x86_64-*-*] } {
+   # On i?86/x86_64 #pragma omp declare simd builds a sse2, avx and
+   # avx2 clone.  Only the right clone for the specified arch will be
+   # chosen, but still we need to at least be able to assemble
+   # avx2.
+   if { [check_effective_target_avx2] } {
+   set et_vect_simd_clones_saved 1
+   }
+   }
+}
+
verbose "check_effective_target_vect_simd_clones: returning $et_vect_simd_clones_saved" 2
+return $et_vect_simd_clones_saved
+}
+
 # Return 1 if this is a AArch64 target supporting big endian
 proc check_effective_target_aarch64_big_endian { } {
 return [check_no_compiler_messages aarch64_big_endian assembly {
@@ -5106,6 +5132,18 @@ proc check_effective_target_avx { } {
 } "-O2 -mavx" ]
 }
 
+# Return 1 if avx2 instructions can be compiled.
+proc check_effective_target_avx2 { } {
+return [check_no_compiler_messages avx2 object {
+   typedef long long __v4di __attribute__ ((__vector_size__ (32)));
+   __v4di
+   mm256_is32_andnotsi256  (__v4di __X, __v4di __Y)
+{
+  return __builtin_ia32_andnotsi256 (__X, __Y);
+   }
+} "-O0 -mavx2" ]
+}
+
 # Return 1 if sse instructions can be compiled.
 proc check_effective_target_sse { } {
 return [check_no_compiler_messages sse object {
--- gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c.jj2013-11-27 12:15:14.0 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c   2013-11-28 13:24:55.345839723 +0100
@@ -1,3 +1,4 @@
+/* { dg-require-effective-target vect_simd_clones } */
 /* { dg-additional-options "-fopenmp-simd" } */
 /* { dg-additional-options "-mavx" { target avx_runtime } } */
 
--- gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c.jj2013-11-27 12:15:14.0 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c   2013-11-28 13:25:43.158572535 +0100
@@ -1,3 +1,4 @@
+/* { dg-require-effective-target vect_simd_clones } */
 /* { dg-additional-options "-fopenmp-simd" } */
 /* { dg-additional-options "-mavx" { target avx_runtime } } */
 
--- gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c.jj2013-11-27 12:15:14.0 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c   2013-11-28 13

[committed] Fix bootstrap with 32-bit HWI (PR middle-end/59327)

2013-11-28 Thread Jakub Jelinek
Hi!

With 32-bit HWI, we would get a -Wsign-compare warning here; this patch
fixes it.  Bootstrapped/regtested on x86_64-linux and i686-linux,
committed as obvious.
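The underlying issue is C's usual arithmetic conversions: with 32-bit HOST_WIDE_INT the signed sz has the same width as the unsigned asan_alignb, so the signed operand is converted to unsigned before comparing. A small illustration (hypothetical helper names, standing in for the patched expression):

```c
/* In a same-width signed/unsigned comparison the signed operand is
   converted to unsigned, which flips the result for negative values.  */
static int
cmp_unsigned (int sz, unsigned int alignb)
{
  return sz >= alignb;        /* sz converts to unsigned; -Wsign-compare warns */
}

static int
cmp_signed (int sz, unsigned int alignb)
{
  return sz >= (int) alignb;  /* the patch's approach: cast, compare signed */
}
```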

2013-11-28  Jakub Jelinek  

PR middle-end/59327
* cfgexpand.c (expand_used_vars): Avoid warning on 32-bit
HWI hosts.

--- gcc/cfgexpand.c.jj  2013-11-28 08:34:36.0 +0100
+++ gcc/cfgexpand.c 2013-11-28 12:40:44.969758239 +0100
@@ -1833,7 +1833,7 @@ expand_used_vars (void)
  sz = data.asan_vec[0] - prev_offset;
  if (data.asan_alignb > ASAN_RED_ZONE_SIZE
  && data.asan_alignb <= 4096
- && sz + ASAN_RED_ZONE_SIZE >= data.asan_alignb)
+ && sz + ASAN_RED_ZONE_SIZE >= (int) data.asan_alignb)
redzonesz = ((sz + ASAN_RED_ZONE_SIZE + data.asan_alignb - 1)
 & ~(data.asan_alignb - HOST_WIDE_INT_1)) - sz;
  offset

Jakub


[committed] Fix #pragma omp atomic cleanup handling (PR c++/59297)

2013-11-28 Thread Jakub Jelinek
Hi!

This patch fixes a missing CLEANUP_POINT_EXPR around OMP_ATOMIC,
which resulted in ICEs.  Bootstrapped/regtested on x86_64-linux and
i686-linux, committed to trunk and 4.8 branch.

2013-11-28  Jakub Jelinek  

PR c++/59297
* semantics.c (finish_omp_atomic): Call finish_expr_stmt
rather than add_stmt.

* g++.dg/gomp/pr59297.C: New test.

--- gcc/cp/semantics.c.jj   2013-11-27 18:02:43.0 +0100
+++ gcc/cp/semantics.c  2013-11-28 17:37:13.563664150 +0100
@@ -6548,7 +6548,7 @@ finish_omp_atomic (enum tree_code code,
   stmt = build2 (OMP_ATOMIC, void_type_node, integer_zero_node, stmt);
   OMP_ATOMIC_SEQ_CST (stmt) = seq_cst;
 }
-  add_stmt (stmt);
+  finish_expr_stmt (stmt);
 }
 
 void
--- gcc/testsuite/g++.dg/gomp/pr59297.C.jj  2013-11-28 17:39:05.449075129 +0100
+++ gcc/testsuite/g++.dg/gomp/pr59297.C 2013-11-28 17:38:51.0 +0100
@@ -0,0 +1,25 @@
+// PR c++/59297
+// { dg-do compile }
+// { dg-options "-fopenmp" }
+
+template <typename T>
+struct A
+{
+  ~A ();
+  const T &operator[] (int) const;
+};
+
+struct B
+{
+  int &operator () (A<int>);
+};
+
+void
+foo (B &x, int &z)
+{
+  A<A<int> > y;
+  #pragma omp atomic
+  x (y[0]) += 1;
+  #pragma omp atomic
+  z += x(y[1]);
+}

Jakub


[PATCH] Fix handling of invalid {con,de}structor attribute arg (PR c/59280)

2013-11-28 Thread Jakub Jelinek
Hi!

Calling default_conversion on IDENTIFIER_NODE which doesn't even have a type
results in ICEs.  On the other hand, if arg is already error_mark_node, we
have already reported an error and there is no point issuing another error.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.8?
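For reference, a valid use of the attribute takes an integer constant priority, with lower numbers running earlier; a minimal sketch (hypothetical names):

```c
static int initialized;

/* Constructor priorities up to 100 are reserved for the implementation;
   101 is the lowest priority generally available to user code.  */
__attribute__((constructor (101)))
static void
init_early (void)
{
  initialized = 1;   /* runs before main */
}

int
get_initialized (void)
{
  return initialized;
}
```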

2013-11-28  Jakub Jelinek  

PR c/59280
* c-common.c (get_priority): If TREE_VALUE (args) is IDENTIFIER_NODE,
goto invalid.  If it is error_mark_node, don't issue further
diagnostics.
testsuite/
* c-c++-common/pr59280.c: New test.

--- gcc/c-family/c-common.c.jj  2013-11-22 21:03:05.0 +0100
+++ gcc/c-family/c-common.c 2013-11-28 18:06:44.796404710 +0100
@@ -7014,6 +7014,10 @@ get_priority (tree args, bool is_destruc
 }
 
   arg = TREE_VALUE (args);
+  if (TREE_CODE (arg) == IDENTIFIER_NODE)
+goto invalid;
+  if (arg == error_mark_node)
+return DEFAULT_INIT_PRIORITY;
   arg = default_conversion (arg);
   if (!tree_fits_shwi_p (arg)
   || !INTEGRAL_TYPE_P (TREE_TYPE (arg)))
--- gcc/testsuite/c-c++-common/pr59280.c.jj 2013-11-28 18:09:09.843654172 +0100
+++ gcc/testsuite/c-c++-common/pr59280.c2013-11-28 18:10:54.910108073 +0100
@@ -0,0 +1,4 @@
+/* PR c/59280 */
+/* { dg-do compile } */
+
+void bar (char *) __attribute__((constructor(foo))); /* { dg-error "constructor priorities must be integers|was not declared|constructor priorities are not supported" } */

Jakub


Re: [PATCH] Fix handling of invalid {con,de}structor attribute arg (PR c/59280)

2013-11-28 Thread Joseph S. Myers
On Fri, 29 Nov 2013, Jakub Jelinek wrote:

> Hi!
> 
> Calling default_conversion on IDENTIFIER_NODE which doesn't even have a type
> results in ICEs.  On the other side, if arg is already error_mark_node, we
> have already reported an error and there is no point issuing another error.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.8?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PING][PATCH] LRA: check_rtl modifies RTL instruction stream

2013-11-28 Thread Alan Modra
On Wed, Nov 20, 2013 at 11:18:49AM -0700, Jeff Law wrote:
> >2013-11-13  Robert Suchanek  
> >
> > * lra.c (lra): Set lra_in_progress before check_rtl call.
> > * recog.c (insn_invalid_p): Add !lra_in_progress to prevent
> > adding clobber regs when LRA is running

Trying to run the testsuite with -mlra and the default -mcmodel=medium
on powerpc64 now results in enormous numbers of failures like the
following.

/home/alanm/src/gcc-virgin/libatomic/testsuite/libatomic.c/atomic-exchange-1.c:67:1: error: insn does not satisfy its constraints:
 }
 ^
(insn 5 2 6 2 (set (reg/f:DI 212)
(mem/u/c:DI (unspec:DI [
(symbol_ref/u:DI ("*.LC0") [flags 0x2])
(reg:DI 2 2)
] UNSPEC_TOCREL) [0 S8 A8])) /home/alanm/src/gcc-virgin/libatomic/testsuite/libatomic.c/atomic-exchange-1.c:14 505 {*movdi_internal64}
 (expr_list:REG_EQUAL (symbol_ref:DI ("v") )
(nil)))

This is due to that innocuous-seeming change of setting
lra_in_progress before calling check_rtl(), in combination with
previous changes Vlad made to the rs6000 backend here:
http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02208.html
In particular the "Call legitimate_constant_pool_address_p in strict
mode for LRA" change, which sets "strict" when lra_in_progress.

I'm not at all familiar with LRA, so why Vlad made those changes to
rs6000.c is totally opaque to me.  If this were a reload problem I
could dive in and fix it, but not LRA, sorry.

What I can say is that the rtl shown above is a toc reference of the
form that is valid for -mcmodel=small both before and after reload,
and generates "ld offset(r2)" machine instructions.  The form is valid
for -mcmodel=medium/large only before reload.  After reload it is
supposed to be split into high/lo_sum variants that generate
"addis rtmp,offset@ha(r2); ld offset@l(rtmp)".

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] Fix up bogus warning (PR sanitizer/59331)

2013-11-28 Thread Jason Merrill

On 11/28/2013 12:14 PM, Marek Polacek wrote:

  t = fold_build2 (COMPOUND_EXPR, TREE_TYPE (t),
-  ubsan_instrument_vla (input_location, t), t);
+  ubsan_instrument_vla (input_location, t),
+  /* Cast to void to prevent bogus warning.  */
+  build1 (CONVERT_EXPR, void_type_node, t));
  finish_expr_stmt (t);


Why do you need the COMPOUND_EXPR at all?  Why can't you just do

t = ubsan_instrument_vla (input_location, t);

?

Jason




Update docs for C99 and C11 support

2013-11-28 Thread Joseph S. Myers
This patch updates various documentation in GCC regarding the state of
C99 and C11 support, in particular updating the documentation of
implementation-defined behavior for C11.

Tested with "make info html pdf" and committed.

2013-11-29  Joseph Myers  

* doc/implement-c.texi: Document C11 implementation-defined
behavior.  Refer to -ffp-contract=fast for contraction behavior.
* doc/invoke.texi (-std=c99, std=c11): Update description of
completeness.
(-std=gnu99): Don't mention as future default.
(-std=gnu11): Mention as intended future default.
* doc/standards.texi: Update descriptions of C99 and C11 support.
Limit statement about C99 facilities for freestanding
implementations to some platforms only.

Index: doc/implement-c.texi
===
--- doc/implement-c.texi(revision 205504)
+++ doc/implement-c.texi(working copy)
@@ -9,9 +9,9 @@
 A conforming implementation of ISO C is required to document its
 choice of behavior in each of the areas that are designated
 ``implementation defined''.  The following lists all such areas,
-along with the section numbers from the ISO/IEC 9899:1990 and ISO/IEC
-9899:1999 standards.  Some areas are only implementation-defined in
-one version of the standard.
+along with the section numbers from the ISO/IEC 9899:1990, ISO/IEC
+9899:1999 and ISO/IEC 9899:2011 standards.  Some areas are only
+implementation-defined in one version of the standard.
 
 Some choices depend on the externally determined ABI for the platform
 (including standard character encodings) which GCC follows; these are
@@ -47,14 +47,15 @@ a freestanding environment); refer to their docume
 
 @itemize @bullet
 @item
-@cite{How a diagnostic is identified (C90 3.7, C99 3.10, C90 and C99 5.1.1.3).}
+@cite{How a diagnostic is identified (C90 3.7, C99 and C11 3.10, C90,
+C99 and C11 5.1.1.3).}
 
 Diagnostics consist of all the output sent to stderr by GCC@.
 
 @item
 @cite{Whether each nonempty sequence of white-space characters other than
 new-line is retained or replaced by one space character in translation
-phase 3 (C90 and C99 5.1.1.2).}
+phase 3 (C90, C99 and C11 5.1.1.2).}
 
 @xref{Implementation-defined behavior, , Implementation-defined
 behavior, cpp, The C Preprocessor}.
@@ -70,7 +71,8 @@ of the C library, and are not defined by GCC itsel
 @itemize @bullet
 @item
 @cite{The mapping between physical source file multibyte characters
-and the source character set in translation phase 1 (C90 and C99 5.1.1.2).}
+and the source character set in translation phase 1 (C90, C99 and C11
+5.1.1.2).}
 
 @xref{Implementation-defined behavior, , Implementation-defined
 behavior, cpp, The C Preprocessor}.
@@ -83,14 +85,14 @@ behavior, cpp, The C Preprocessor}.
 @itemize @bullet
 @item
 @cite{Which additional multibyte characters may appear in identifiers
-and their correspondence to universal character names (C99 6.4.2).}
+and their correspondence to universal character names (C99 and C11 6.4.2).}
 
 @xref{Implementation-defined behavior, , Implementation-defined
 behavior, cpp, The C Preprocessor}.
 
 @item
 @cite{The number of significant initial characters in an identifier
-(C90 6.1.2, C90 and C99 5.2.4.1, C99 6.4.2).}
+(C90 6.1.2, C90, C99 and C11 5.2.4.1, C99 and C11 6.4.2).}
 
 For internal names, all characters are significant.  For external names,
 the number of significant characters are defined by the linker; for
@@ -100,7 +102,7 @@ almost all targets, all characters are significant
 @cite{Whether case distinctions are significant in an identifier with
 external linkage (C90 6.1.2).}
 
-This is a property of the linker.  C99 requires that case distinctions
+This is a property of the linker.  C99 and C11 require that case distinctions
 are always significant in identifiers with external linkage and
 systems without this property are not supported by GCC@.
 
@@ -111,33 +113,34 @@ systems without this property are not supported by
 
 @itemize @bullet
 @item
-@cite{The number of bits in a byte (C90 3.4, C99 3.6).}
+@cite{The number of bits in a byte (C90 3.4, C99 and C11 3.6).}
 
 Determined by ABI@.
 
 @item
-@cite{The values of the members of the execution character set (C90
-and C99 5.2.1).}
+@cite{The values of the members of the execution character set (C90,
+C99 and C11 5.2.1).}
 
 Determined by ABI@.
 
 @item
 @cite{The unique value of the member of the execution character set produced
-for each of the standard alphabetic escape sequences (C90 and C99 5.2.2).}
+for each of the standard alphabetic escape sequences (C90, C99 and C11
+5.2.2).}
 
 Determined by ABI@.
 
 @item
 @cite{The value of a @code{char} object into which has been stored any
 character other than a member of the basic execution character set
-(C90 6.1.2.5, C99 6.2.5).}
+(C90 6.1.2.5, C99 and C11 6.2.5).}
 
 Determined by ABI@.
 
 @item
 @cite{Which of @code{signed char} or @code{unsigned char} 

List C11 features in 4.9 release notes

2013-11-28 Thread Joseph S. Myers
I've applied this patch to mention new C11 features in the 4.9 release 
notes.  (I expect there are plenty more features all over GCC that still 
need adding to the release notes - someone will need to review the changes 
that have gone in since 4.8, looking for significant new features.)

Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.9/changes.html,v
retrieving revision 1.41
diff -u -r1.41 changes.html
--- changes.html28 Nov 2013 15:05:51 -  1.41
+++ changes.html29 Nov 2013 00:45:46 -
@@ -143,10 +143,26 @@
 instructions.
  
 
-
-
+
+  ISO C11 atomics (the _Atomic type specifier and
+  qualifier and the <stdatomic.h> header) are now
+  supported.
+
+  ISO C11 generic selections (_Generic keyword) are
+  now supported.
+
+  ISO C11 thread-local storage (_Thread_local,
+  similar to GNU C __thread) is now supported.
+
+  ISO C11 support is now at a similar level of completeness to ISO
+  C99 support: substantially complete modulo bugs, extended
+  identifiers (supported except for corner cases
+  when -fextended-identifiers is used), floating-point
+  issues (mainly but not entirely relating to optional C99 features
+  from Annexes F and G) and the optional Annexes K (Bounds-checking
+  interfaces) and L (Analyzability).
+
 
 C++
 

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: patch for elimination to SP when it is changed in RTL (PR57293)

2013-11-28 Thread H.J. Lu
On Thu, Nov 28, 2013 at 2:11 PM, Vladimir Makarov  wrote:
>   The following patch fixes PR57293
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57293
>
>   It is actually an implementation of missed LRA functionality in reg
> elimination.  Before the patch any explicit change of stack pointer in
> RTL resulted in necessity to use the frame pointer.
>
>   The patch has practically no effect on generic tuning of x86/x86-64.
> But it has a dramatic effect on code performance for other tunings
> like corei7 which don't use incoming args accumulation.  The maximum
> SPEC2000 improvement 2.5% is achieved on x86 SPECInt2000.  But
> SPECFP2000 rate also has improvement about 1% on x86 and x86-64.  Too
> bad that I did not implement it at the first place.  The results would
> have been even much better ones reported on 2012 GNU Cauldron as I
> also used -mtune=corei7 that time.
>
> The patch was bootstrapped and tested on x86-64/x86 and ppc.
>
> Committed as rev. 205498.
>
>  2013-11-28  Vladimir Makarov
>
> PR target/57293
> * ira.h (ira_setup_eliminable_regset): Remove parameter.
> * ira.c (ira_setup_eliminable_regset): Ditto.  Add
> SUPPORTS_STACK_ALIGNMENT for crtl->stack_realign_needed.
> Don't call lra_init_elimination.
> (ira): Call ira_setup_eliminable_regset without arguments.
> * loop-invariant.c (calculate_loop_reg_pressure): Remove argument
> from ira_setup_eliminable_regset call.
> * gcse.c (calculate_bb_reg_pressure): Ditto.
> * haifa-sched.c (sched_init): Ditto.
> * lra.h (lra_init_elimination): Remove the prototype.
> * lra-int.h (lra_insn_recog_data): New member sp_offset.  Move
> used_insn_alternative upper.
> (lra_eliminate_regs_1): Add one more parameter.
> (lra-eliminate): Ditto.
> * lra.c (lra_invalidate_insn_data): Set sp_offset.
> (setup_sp_offset): New.
> (lra_process_new_insns): Call setup_sp_offset.
> (lra): Add argument to lra_eliminate calls.
> * lra-constraints.c (get_equiv_substitution): Rename to get_equiv.
> (get_equiv_with_elimination): New.
> (process_addr_reg): Call get_equiv_with_elimination instead of
> get_equiv_substitution.
> (equiv_address_substitution): Ditto.
> (loc_equivalence_change_p): Ditto.
> (loc_equivalence_callback, lra_constraints): Ditto.
> (curr_insn_transform): Ditto.  Print the sp offset
> (process_alt_operands): Prevent stack pointer reloads.
> (lra_constraints): Remove one argument from lra_eliminate call.
> Move it up.  Mark used hard regs before it.  Use
> get_equiv_with_elimination instead of get_equiv_substitution.
> * lra-eliminations.c (lra_eliminate_regs_1): Add parameter and
> assert for param values combination.  Use sp offset.  Add argument
> to lra_eliminate_regs_1 calls.
> (lra_eliminate_regs): Add argument to lra_eliminate_regs_1 call.
> (curr_sp_change): New static var.
> (mark_not_eliminable): Add parameter.  Update curr_sp_change.
> Don't prevent elimination to sp if we can calculate its change.
> Pass the argument to mark_not_eliminable calls.
> (eliminate_regs_in_insn): Add a parameter.  Use sp offset.  Add
> argument to lra_eliminate_regs_1 call.
> (update_reg_eliminate): Move calculation of hard regs for spill
> lower.  Switch off lra_in_progress temporarily to generate regs
> involved into elimination.
> (lra_init_elimination): Rename to init_elimination.  Make it
> static.  Set up insn sp offset, check the offsets at the end of
> BBs.
> (process_insn_for_elimination): Add parameter.  Pass its value to
> eliminate_regs_in_insn.
> (lra_eliminate): Add parameter.  Pass its value to
> process_insn_for_elimination.  Add assert for param values
> combination.  Call init_elimination.  Don't update offsets in
> equivalence substitutions.
> * lra-spills.c (assign_mem_slot): Don't call lra_eliminate_regs_1
> for created stack slot.
> (remove_pseudos): Call lra_eliminate_regs_1 before changing memory
> onto stack slot.
>

Hi Vladimir,

Thanks for your hard work.   I noticed a few regressions
on x86-64:

FAIL: gcc.dg/guality/pr54519-1.c  -O2  line 20 y == 25
FAIL: gcc.dg/guality/pr54519-1.c  -O2  line 20 z == 6
FAIL: gcc.dg/guality/pr54519-1.c  -O2  line 23 y == 117
FAIL: gcc.dg/guality/pr54519-1.c  -O2  line 23 z == 8
FAIL: gcc.dg/guality/pr54519-1.c  -O3 -fomit-frame-pointer  line 20 x == 36
FAIL: gcc.dg/guality/pr54519-1.c  -O3 -fomit-frame-pointer  line 20 y == 25
FAIL: gcc.dg/guality/pr54519-1.c  -O3 -fomit-frame-pointer  line 20 z == 6
FAIL: gcc.dg/guality/pr54519-1.c  -O3 -g  line 20 x == 36
FAIL: gcc.dg/guality/pr54519-1.c  -O3 -g  line 20 y == 25
FAIL: gcc.dg/guality/pr54519-1.c  -O3 -g  li

Fix C99 inline function definition following static inline declaration (PR c/57574)

2013-11-28 Thread Joseph S. Myers
This patch fixes PR 57574, where a function defined (in C99 mode) as
"inline" following a "static inline" declaration was wrongly
considered an external reference, meaning (a) spurious diagnostics
about references to static variables and (b) undefined references to
the function could be generated as if it were non-static inline and an
external definition had to be provided elsewhere.  The fix is to
detect this case in merge_decls and set DECL_EXTERNAL appropriately.

Bootstrapped with no regressions on x86_64-unknown-linux-gnu.  Applied
to mainline.
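The distinction matters because a file-scope C99 `inline` definition without `static` or `extern` is an "inline definition" that provides no external definition; the fix recognizes that a prior static declaration gives the function internal linkage, so the later `inline` definition is an ordinary static one. A minimal sketch mirroring the f1 case from the new test:

```c
static int n = 42;

/* The earlier static declaration gives f1 internal linkage, so the
   "inline" definition below is an ordinary internal definition, not a
   C99 inline definition requiring an external definition elsewhere.  */
static inline int f1 (void);
inline int f1 (void) { return n; }

int
call_f1 (void)
{
  return f1 ();   /* may legitimately reference the static variable n */
}
```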

c:
2013-11-29  Joseph Myers  

PR c/57574
* c-decl.c (merge_decls): Clear DECL_EXTERNAL for a definition of
an inline function following a static declaration.

testsuite:
2013-11-29  Joseph Myers  

PR c/57574
* gcc.dg/inline-35.c: New test.

Index: gcc/testsuite/gcc.dg/inline-35.c
===
--- gcc/testsuite/gcc.dg/inline-35.c(revision 0)
+++ gcc/testsuite/gcc.dg/inline-35.c(revision 0)
@@ -0,0 +1,19 @@
+/* A function definition of an inline function following a static
+   declaration does not make an inline definition in C99/C11 terms.
+   PR 57574.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c99 -pedantic-errors" } */
+
+static int n;
+
+static inline int f1 (void);
+inline int f1 (void) { return n; }
+
+static int f2 (void);
+inline int f2 (void) { return n; }
+
+static inline int f3 (void);
+int f3 (void) { return n; }
+
+static int f4 (void);
+int f4 (void) { return n; }
Index: gcc/c/c-decl.c
===
--- gcc/c/c-decl.c  (revision 205497)
+++ gcc/c/c-decl.c  (working copy)
@@ -2343,6 +2343,14 @@ merge_decls (tree newdecl, tree olddecl, tree newt
   && !current_function_decl)
 DECL_EXTERNAL (newdecl) = 0;
 
+  /* An inline definition following a static declaration is not
+ DECL_EXTERNAL.  */
+  if (new_is_definition
+  && (DECL_DECLARED_INLINE_P (newdecl)
+ || DECL_DECLARED_INLINE_P (olddecl))
+  && !TREE_PUBLIC (olddecl))
+DECL_EXTERNAL (newdecl) = 0;
+
   if (DECL_EXTERNAL (newdecl))
 {
   TREE_STATIC (newdecl) = TREE_STATIC (olddecl);

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [wide-int] Handle more ltu_p cases inline

2013-11-28 Thread Kenneth Zadeck

this is fine.

kenny
On 11/28/2013 12:29 PM, Richard Sandiford wrote:

The existing ltu_p fast path can handle any pairs of single-HWI inputs,
even for precision > HOST_BITS_PER_WIDE_INT.  In that case both xl and
yl are implicitly sign-extended to the larger precision, but with the
extended values still being compared as unsigned.  The extension doesn't
change the result in that case.

When compiling a recent fold-const.ii, this reduces the number of
ltu_p_large calls from 23849 to 697.

Tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard


Index: gcc/alias.c
===
--- gcc/alias.c 2013-11-20 12:12:49.393055063 +
+++ gcc/alias.c 2013-11-28 12:24:23.307549245 +
@@ -342,7 +342,7 @@ ao_ref_from_mem (ao_ref *ref, const_rtx
  || (DECL_P (ref->base)
  && (DECL_SIZE (ref->base) == NULL_TREE
  || TREE_CODE (DECL_SIZE (ref->base)) != INTEGER_CST
- || wi::ltu_p (DECL_SIZE (ref->base),
+ || wi::ltu_p (wi::to_offset (DECL_SIZE (ref->base)),
ref->offset + ref->size)
  return false;
  
Index: gcc/wide-int.h

===
--- gcc/wide-int.h  2013-11-28 11:44:39.041731636 +
+++ gcc/wide-int.h  2013-11-28 12:48:36.200764215 +
@@ -1740,13 +1740,15 @@ wi::ltu_p (const T1 &x, const T2 &y)
unsigned int precision = get_binary_precision (x, y);
WIDE_INT_REF_FOR (T1) xi (x, precision);
WIDE_INT_REF_FOR (T2) yi (y, precision);
-  /* Optimize comparisons with constants and with sub-HWI unsigned
- integers.  */
+  /* Optimize comparisons with constants.  */
if (STATIC_CONSTANT_P (yi.len == 1 && yi.val[0] >= 0))
  return xi.len == 1 && xi.to_uhwi () < (unsigned HOST_WIDE_INT) yi.val[0];
if (STATIC_CONSTANT_P (xi.len == 1 && xi.val[0] >= 0))
  return yi.len != 1 || yi.to_uhwi () > (unsigned HOST_WIDE_INT) xi.val[0];
-  if (precision <= HOST_BITS_PER_WIDE_INT)
+  /* Optimize the case of two HWIs.  The HWIs are implicitly sign-extended
+ for precisions greater than HOST_BITS_PER_WIDE_INT, but sign-extending
+ both values does not change the result.  */
+  if (xi.len + yi.len == 2)
  {
unsigned HOST_WIDE_INT xl = xi.to_uhwi ();
unsigned HOST_WIDE_INT yl = yi.to_uhwi ();





Re: [wide-int] Handle more cmps and cmpu cases inline

2013-11-28 Thread Kenneth Zadeck
Like the add/sub patch, enhance the comment so that it says that it is
designed to hit the widest_int and offset_int common cases.


kenny
On 11/28/2013 12:34 PM, Richard Sandiford wrote:

As Richi asked, this patch makes cmps use the same shortcuts as lts_p.
It also makes cmpu use the shortcut that I just added to ltu_p.

On that same fold-const.ii testcase, this reduces the number of cmps_large
calls from 66924 to 916.  It reduces the number of cmpu_large calls from
3462 to 4.

Tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard


Index: gcc/wide-int.h
===
--- gcc/wide-int.h  2001-01-01 00:00:00.0 +
+++ gcc/wide-int.h  2013-11-28 16:08:22.527681077 +
@@ -1858,17 +1858,31 @@ wi::cmps (const T1 &x, const T2 &y)
unsigned int precision = get_binary_precision (x, y);
WIDE_INT_REF_FOR (T1) xi (x, precision);
WIDE_INT_REF_FOR (T2) yi (y, precision);
-  if (precision <= HOST_BITS_PER_WIDE_INT)
+  if (wi::fits_shwi_p (yi))
  {
-  HOST_WIDE_INT xl = xi.to_shwi ();
-  HOST_WIDE_INT yl = yi.to_shwi ();
-  if (xl < yl)
+  /* Special case for comparisons with 0.  */
+  if (STATIC_CONSTANT_P (yi.val[0] == 0))
+   return neg_p (xi) ? -1 : !(xi.len == 1 && xi.val[0] == 0);
+  /* If x fits into a signed HWI, we can compare directly.  */
+  if (wi::fits_shwi_p (xi))
+   {
+ HOST_WIDE_INT xl = xi.to_shwi ();
+ HOST_WIDE_INT yl = yi.to_shwi ();
+ return xl < yl ? -1 : xl > yl;
+   }
+  /* If x doesn't fit and is negative, then it must be more
+negative than any signed HWI, and hence smaller than y.  */
+  if (neg_p (xi))
return -1;
-  else if (xl > yl)
-   return 1;
-  else
-   return 0;
+  /* If x is positive, then it must be larger than any signed HWI,
+and hence greater than y.  */
+  return 1;
  }
+  /* Optimize the opposite case, if it can be detected at compile time.  */
+  if (STATIC_CONSTANT_P (xi.len == 1))
+/* If YI is negative it is lower than the least HWI.
+   If YI is positive it is greater than the greatest HWI.  */
+return neg_p (yi) ? 1 : -1;
return cmps_large (xi.val, xi.len, precision, yi.val, yi.len);
  }
  
@@ -1881,16 +1895,35 @@ wi::cmpu (const T1 &x, const T2 &y)

unsigned int precision = get_binary_precision (x, y);
WIDE_INT_REF_FOR (T1) xi (x, precision);
WIDE_INT_REF_FOR (T2) yi (y, precision);
-  if (precision <= HOST_BITS_PER_WIDE_INT)
+  /* Optimize comparisons with constants.  */
+  if (STATIC_CONSTANT_P (yi.len == 1 && yi.val[0] >= 0))
  {
+  /* If XI doesn't fit in a HWI then it must be larger than YI.  */
+  if (xi.len != 1)
+   return 1;
+  /* Otherwise compare directly.  */
unsigned HOST_WIDE_INT xl = xi.to_uhwi ();
-  unsigned HOST_WIDE_INT yl = yi.to_uhwi ();
-  if (xl < yl)
+  unsigned HOST_WIDE_INT yl = yi.val[0];
+  return xl < yl ? -1 : xl > yl;
+}
+  if (STATIC_CONSTANT_P (xi.len == 1 && xi.val[0] >= 0))
+{
+  /* If YI doesn't fit in a HWI then it must be larger than XI.  */
+  if (yi.len != 1)
return -1;
-  else if (xl == yl)
-   return 0;
-  else
-   return 1;
+  /* Otherwise compare directly.  */
+  unsigned HOST_WIDE_INT xl = xi.val[0];
+  unsigned HOST_WIDE_INT yl = yi.to_uhwi ();
+  return xl < yl ? -1 : xl > yl;
+}
+  /* Optimize the case of two HWIs.  The HWIs are implicitly sign-extended
+ for precisions greater than HOST_BITS_PER_WIDE_INT, but sign-extending
+ both values does not change the result.  */
+  if (xi.len + yi.len == 2)
+{
+  unsigned HOST_WIDE_INT xl = xi.to_uhwi ();
+  unsigned HOST_WIDE_INT yl = yi.to_uhwi ();
+  return xl < yl ? -1 : xl > yl;
  }
return cmpu_large (xi.val, xi.len, precision, yi.val, yi.len);
  }





Re: patch for elimination to SP when it is changed in RTL (PR57293)

2013-11-28 Thread Vladimir Makarov

On 11/28/2013, 7:51 PM, H.J. Lu wrote:

On Thu, Nov 28, 2013 at 2:11 PM, Vladimir Makarov  wrote:

   The following patch fixes PR57293

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57293

   It is actually an implementation of missed LRA functionality in reg
elimination.  Before the patch any explicit change of stack pointer in
RTL resulted in necessity to use the frame pointer.

   The patch has practically no effect on generic tuning of x86/x86-64.
But it has a dramatic effect on code performance for other tunings
like corei7 which don't use incoming args accumulation.  The maximum
SPEC2000 improvement, 2.5%, is achieved on x86 SPECInt2000, but the
SPECFP2000 rate also improves by about 1% on x86 and x86-64.  Too
bad that I did not implement it in the first place.  The results
reported at the 2012 GNU Cauldron would have been even better, as I
also used -mtune=corei7 at that time.

The patch was bootstrapped and tested on x86-64/x86 and ppc.

Committed as rev. 205498.

  2013-11-28  Vladimir Makarov

 PR target/57293
 * ira.h (ira_setup_eliminable_regset): Remove parameter.
 * ira.c (ira_setup_eliminable_regset): Ditto.  Add
 SUPPORTS_STACK_ALIGNMENT for crtl->stack_realign_needed.
 Don't call lra_init_elimination.
 (ira): Call ira_setup_eliminable_regset without arguments.
 * loop-invariant.c (calculate_loop_reg_pressure): Remove argument
 from ira_setup_eliminable_regset call.
 * gcse.c (calculate_bb_reg_pressure): Ditto.
 * haifa-sched.c (sched_init): Ditto.
 * lra.h (lra_init_elimination): Remove the prototype.
 * lra-int.h (lra_insn_recog_data): New member sp_offset.  Move
 used_insn_alternative upper.
 (lra_eliminate_regs_1): Add one more parameter.
 (lra-eliminate): Ditto.
 * lra.c (lra_invalidate_insn_data): Set sp_offset.
 (setup_sp_offset): New.
 (lra_process_new_insns): Call setup_sp_offset.
 (lra): Add argument to lra_eliminate calls.
 * lra-constraints.c (get_equiv_substitution): Rename to get_equiv.
 (get_equiv_with_elimination): New.
 (process_addr_reg): Call get_equiv_with_elimination instead of
 get_equiv_substitution.
 (equiv_address_substitution): Ditto.
 (loc_equivalence_change_p): Ditto.
 (loc_equivalence_callback, lra_constraints): Ditto.
 (curr_insn_transform): Ditto.  Print the sp offset
 (process_alt_operands): Prevent stack pointer reloads.
 (lra_constraints): Remove one argument from lra_eliminate call.
 Move it up.  Mark used hard regs before it.  Use
 get_equiv_with_elimination instead of get_equiv_substitution.
 * lra-eliminations.c (lra_eliminate_regs_1): Add parameter and
 assert for param values combination.  Use sp offset.  Add argument
 to lra_eliminate_regs_1 calls.
 (lra_eliminate_regs): Add argument to lra_eliminate_regs_1 call.
 (curr_sp_change): New static var.
 (mark_not_eliminable): Add parameter.  Update curr_sp_change.
 Don't prevent elimination to sp if we can calculate its change.
 Pass the argument to mark_not_eliminable calls.
 (eliminate_regs_in_insn): Add a parameter.  Use sp offset.  Add
 argument to lra_eliminate_regs_1 call.
 (update_reg_eliminate): Move calculation of hard regs for spill
 lower.  Switch off lra_in_progress temporarily to generate regs
 involved into elimination.
 (lra_init_elimination): Rename to init_elimination.  Make it
 static.  Set up insn sp offset, check the offsets at the end of
 BBs.
 (process_insn_for_elimination): Add parameter.  Pass its value to
 eliminate_regs_in_insn.
 (lra_eliminate): Add parameter.  Pass its value to
 process_insn_for_elimination.  Add assert for param values
 combination.  Call init_elimination.  Don't update offsets in
 equivalence substitutions.
 * lra-spills.c (assign_mem_slot): Don't call lra_eliminate_regs_1
 for created stack slot.
 (remove_pseudos): Call lra_eliminate_regs_1 before changing memory
 onto stack slot.



Hi Vladimir,

Thanks for your hard work.   I noticed a few regressions
on x86-64:

FAIL: gcc.dg/guality/pr54519-1.c  -O2  line 20 y == 25
FAIL: gcc.dg/guality/pr54519-1.c  -O2  line 20 z == 6
FAIL: gcc.dg/guality/pr54519-1.c  -O2  line 23 y == 117
FAIL: gcc.dg/guality/pr54519-1.c  -O2  line 23 z == 8
FAIL: gcc.dg/guality/pr54519-1.c  -O3 -fomit-frame-pointer  line 20 x == 36
FAIL: gcc.dg/guality/pr54519-1.c  -O3 -fomit-frame-pointer  line 20 y == 25
FAIL: gcc.dg/guality/pr54519-1.c  -O3 -fomit-frame-pointer  line 20 z == 6
FAIL: gcc.dg/guality/pr54519-1.c  -O3 -g  line 20 x == 36
FAIL: gcc.dg/guality/pr54519-1.c  -O3 -g  line 20 y == 25
FAIL: gcc.dg/guality/pr54519-1.c  -O3 -g  line 20 z == 6
FAIL: gcc.dg/guality/pr54519-3.c  -O2

RE: [PATCH] Fix PR58944

2013-11-28 Thread Bernd Edlinger
Hi,

On Wed, 27 Nov 2013 19:49:39, Uros Bizjak wrote:
>
> On Mon, Nov 25, 2013 at 10:08 PM, Sriraman Tallam  wrote:
>
>> I have attached a patch to fix this bug :
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58944
>>
>> A similar problem was also reported here:
>> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01050.html
>>
>>
>> Recently, ix86_valid_target_attribute_tree in config/i386/i386.c was
>> refactored to not depend on global_options structure and to be able to
>> use any gcc_options structure. One clean way to fix this is by having
>> target_option_default_node save all the default target options which
>> can be restored to any gcc_options structure. The root cause of the
>> above bugs was that ix86_arch_string and ix86_tune_string was not
>> saved in target_option_default_node in PR58944 and
>> ix86_preferred_stack_boundary_arg was not saved in the latter case.
>>
>> This patch saves all the target options used in i386.opt which are
>> either obtained from the command-line or set to some default. Is this
>> patch alright?
>
> Things look rather complicated, but I see no other solution than to save
> and restore the way you propose.
>
> Please wait 24h if somebody has a different idea, otherwise please go
> ahead and commit the patch to mainline.
>

Maybe you should also look at the handling of preferred_stack_boundary_arg
versus incoming_stack_boundary_arg in ix86_option_override_internal:

Remember ix86_incoming_stack_boundary_arg is defined to
global_options.x_ix86_incoming_stack_boundary_arg.

like this?

  if (opts_set->x_ix86_incoming_stack_boundary_arg)
    {
-  if (ix86_incoming_stack_boundary_arg
+  if (opts->x_ix86_incoming_stack_boundary_arg
  < (TARGET_64BIT_P (opts->x_ix86_isa_flags) ? 4 : 2)
-  || ix86_incoming_stack_boundary_arg> 12)
+ || opts->x_ix86_incoming_stack_boundary_arg> 12)
    error ("-mincoming-stack-boundary=%d is not between %d and 12",
-   ix86_incoming_stack_boundary_arg,
+  opts->x_ix86_incoming_stack_boundary_arg,
   TARGET_64BIT_P (opts->x_ix86_isa_flags) ? 4 : 2);
  else
    {
  ix86_user_incoming_stack_boundary
-    = (1 << ix86_incoming_stack_boundary_arg) * BITS_PER_UNIT;
+   = (1 << opts->x_ix86_incoming_stack_boundary_arg) * BITS_PER_UNIT;
  ix86_incoming_stack_boundary
    = ix86_user_incoming_stack_boundary;
    }

Note, however, that opts_set always points to global_options_set,
so this logic combines the state of global_options_set and the
target_option_default_node.


Bernd.


> Thanks,
> Uros
