Re: [PATCH] PR 53975

2012-10-16 Thread Andrey Belevantsev

Hello,

On 31.07.2012 15:08, Andrey Belevantsev wrote:

Hello,

This PR is about wrong speculation, by the selective scheduler, of an insn
that doesn't support storing NaT bits (more details in the PR audit trail).
The reason is the wrong one-liner patch committed last year; the fix is to
revert that patch and to clarify the comment before the patched code.

Bootstrapped and tested on ia64, approved by Alexander offline.  No test, as
I don't know how to check whether an insn got moved through another insn.


Finally I've gotten around to porting this to 4.7.  The patch was successfully
tested on ia64; I'll commit it after double-checking on x86-64.  The patch
should be committed together with the fix for PR 53701, which I will do right
after this one.


Andrey

2012-10-16  Andrey Belevantsev  

Backport from mainline
2012-07-31  Andrey Belevantsev  
PR target/53975

* sel-sched-ir.c (has_dependence_note_reg_use): Clarify comment.
Revert
2011-08-04  Sergey Grechanik  
* sel-sched-ir.c (has_dependence_note_reg_use): Call ds_full_merge
only if producer writes to the register given by regno.

Index: gcc/sel-sched-ir.c
===
*** gcc/sel-sched-ir.c  (revision 192488)
--- gcc/sel-sched-ir.c  (working copy)
*** has_dependence_note_reg_use (int regno)
*** 3224,3230 
if (reg_last->clobbers)
*dsp = (*dsp & ~SPECULATIVE) | DEP_ANTI;

!   /* Handle BE_IN_SPEC.  */
if (reg_last->uses)
{
  ds_t pro_spec_checked_ds;
--- 3224,3234 
if (reg_last->clobbers)
*dsp = (*dsp & ~SPECULATIVE) | DEP_ANTI;

!   /* Merge BE_IN_SPEC bits into *DSP when the dependency producer
!is actually a check insn.  We need to do this for any register
!read-read dependency with the check unless we track properly
!all registers written by BE_IN_SPEC-speculated insns, as
!we don't have explicit dependence lists.  See PR 53975.  */
if (reg_last->uses)
{
  ds_t pro_spec_checked_ds;
*** has_dependence_note_reg_use (int regno)
*** 3232,3240 
  pro_spec_checked_ds = INSN_SPEC_CHECKED_DS 
(has_dependence_data.pro);

  pro_spec_checked_ds = ds_get_max_dep_weak (pro_spec_checked_ds);

! if (pro_spec_checked_ds != 0
! && bitmap_bit_p (INSN_REG_SETS (has_dependence_data.pro), regno))
!   /* Merge BE_IN_SPEC bits into *DSP.  */
*dsp = ds_full_merge (*dsp, pro_spec_checked_ds,
  NULL_RTX, NULL_RTX);
}
--- 3236,3242 
  pro_spec_checked_ds = INSN_SPEC_CHECKED_DS 
(has_dependence_data.pro);

  pro_spec_checked_ds = ds_get_max_dep_weak (pro_spec_checked_ds);

! if (pro_spec_checked_ds != 0)
*dsp = ds_full_merge (*dsp, pro_spec_checked_ds,
  NULL_RTX, NULL_RTX);
}



RE: [PATCH RFA] Implement register pressure directed hoist pass

2012-10-16 Thread Bin Cheng
Hi Steven, Jeff,
I found a flaw in the original patch, which results in inaccurate register
pressure.
Here are the updated patches, which improve code size a little bit on
Thumb2/powerpc compared to the original patches.
Please review.

Thanks

2012-10-16  Bin Cheng  

* gcse.c: Update copyright dates.
(hoist_expr_reaches_here_p): Change parameter type from char *
to sbitmap.

2012-10-16  Bin Cheng  

* common.opt (flag_ira_hoist_pressure): New.
* doc/invoke.texi (-fira-hoist-pressure): Describe.
* ira-costs.c (ira_set_pseudo_classes): New parameter.
* ira.h: Update copyright dates.
(ira_set_pseudo_classes): Update prototype.
* haifa-sched.c (sched_init): Update call.
* ira.c (ira): Update call.
* regmove.c: Update copyright dates.
(regmove_optimize): Update call.
* loop-invariant.c: Update copyright dates.
(move_loop_invariants): Update call.
* gcse.c (struct bb_data): New structure.
(BB_DATA): New macro.
(curr_bb, curr_reg_pressure): New static variables.
(should_hoist_expr_to_dom): Rename from hoist_expr_reaches_here_p.
Change parameter expr_index to expr.
New parameters pressure_class, nregs and hoisted_bbs.
Use reg pressure to determine the distance expr can be hoisted.
(hoist_code): Use reg pressure to direct the hoist process.
(get_regno_pressure_class, get_pressure_class_and_nregs)
(change_pressure, calculate_bb_reg_pressure): New.
(one_code_hoisting_pass): Calculate register pressure. Allocate
and free data.

gcc/testsuite/ChangeLog
2012-10-16  Bin Cheng  

* testsuite/gcc.dg/hoist-register-pressure.c: New test.


> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On
> Behalf Of Bin Cheng
> Sent: Friday, October 12, 2012 4:09 PM
> To: 'Steven Bosscher'
> Cc: Jeff Law; gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH RFA] Implement register pressure directed hoist pass
> 
> Hi,
> This is the updated patches split from original one according to Steven's
> suggestion. Also fixed spelling errors.
> 
> Apart from this, I also implemented a draft patch simulating register
> pressure accurately during hoisting; unfortunately the size data isn't
> better than this patch. If it's right, this can prove my previous
> observation that a decrease of register pressure during the hoisting
> process is rare. I will continue investigating the correctness of the
> patch and see what I can get.
> 
> Please review. Thanks
> 
> And the ChangeLog:
> 
> 2012-10-12  Bin Cheng  
> 
>   * gcse.c: Update copyright dates.
>   (hoist_expr_reaches_here_p): Change parameter type from char *
>   to sbitmap.
> 
> 2012-10-12  Bin Cheng  
> 
>   * common.opt (flag_ira_hoist_pressure): New.
>   * doc/invoke.texi (-fira-hoist-pressure): Describe.
>   * ira-costs.c (ira_set_pseudo_classes): New parameter.
>   * ira.h: Update copyright dates.
>   (ira_set_pseudo_classes): Update prototype.
>   * haifa-sched.c (sched_init): Update call.
>   * ira.c (ira): Update call.
>   * regmove.c: Update copyright dates.
>   (regmove_optimize): Update call.
>   * loop-invariant.c: Update copyright dates.
>   (move_loop_invariants): Update call.
>   * gcse.c (struct bb_data): New structure.
>   (BB_DATA): New macro.
>   (curr_bb, curr_reg_pressure): New static variables.
>   (should_hoist_expr_to_dom): Rename from hoist_expr_reaches_here_p.
>   Change parameter expr_index to expr.
>   New parameters pressure_class, nregs and hoisted_bbs.
>   Use reg pressure to determine the distance expr can be hoisted.
>   (hoist_code): Use reg pressure to direct the hoist process.
>   (get_regno_pressure_class, get_pressure_class_and_nregs)
>   (change_pressure, calculate_bb_reg_pressure): New.
>   (one_code_hoisting_pass): Calculate register pressure. Allocate
>   and free data.
> 
> gcc/testsuite/ChangeLog
> 2012-10-12  Bin Cheng  
> 
>   * testsuite/gcc.dg/hoist-register-pressure.c: New test.
diff -x .svn -rpN wc1/gcc/common.opt wc2/gcc/common.opt
*** wc1/gcc/common.opt  2012-10-12 15:13:41.679617846 +0800
--- wc2/gcc/common.opt  2012-10-16 15:19:35.231674135 +0800
*** Enum(ira_region) String(all) Value(IRA_R
*** 1392,1397 
--- 1392,1402 
  EnumValue
  Enum(ira_region) String(mixed) Value(IRA_REGION_MIXED)
  
+ fira-hoist-pressure
+ Common Report Var(flag_ira_hoist_pressure) Init(1) Optimization
+ Use IRA based register pressure calculation
+ in RTL hoist optimizations.
+ 
  fira-loop-pressure
  Common Report Var(flag_ira_loop_pressure)
  Use IRA based register pressure calculation
diff -x .svn -rpN wc1/gcc/doc/invoke.texi wc2/gcc/doc/invoke.texi
*** wc1/gcc/doc/invoke.texi 2012-10-12 15:13:40.282479595 +0800
--- wc2/gcc/doc/invoke.texi 2012-10-16 15:19:35.239111464 +0800

Re: Fix PR 53701

2012-10-16 Thread Andrey Belevantsev

Hello,

On 09.08.2012 17:19, Alexander Monakov wrote:



On Thu, 9 Aug 2012, Andrey Belevantsev wrote:


Hello,

The problem in question is uncovered by the recent speculation patch; it is in
the handling of expressions blocked by bookkeeping.  Those are expressions
that become unavailable due to the newly created bookkeeping copies.  In the
original algorithm the supported insns and transformations cannot lead to this
result, but when handling non-separable insns or creating speculative checks
that unpredictably block certain insns, the situation can arise.  We just
filter out all such expressions from the final availability set for
correctness.

The PR happens because the expression being filtered out can be transformed
while being moved up, so we need to look up not only its exact pattern but
also all its previous forms saved in its history of changes.  The patch does
exactly that; I also clarified the comments w.r.t. this situation.

Bootstrapped and tested on ia64 and x86-64, the PR testcase is minimized, too.
OK for trunk?  Also need to backport this to 4.7 with PR 53975, say on the
next week.


This is OK.


Below is the port of this patch to 4.7; it took longer than expected, but
still.  I will commit it after retesting on x86-64 (testing on ia64 is already
fine) and together with the fix for PR 53975.


Andrey

2012-10-16  Andrey Belevantsev  

Backport from mainline
2012-08-09  Andrey Belevantsev  

PR rtl-optimization/53701
* sel-sched.c (vinsn_vec_has_expr_p): Clarify function comment.
Process not only expr's vinsns but all old vinsns from expr's
history of changes.
(update_and_record_unavailable_insns): Clarify comment. 

testsuite:
2012-10-16  Andrey Belevantsev  

Backport from mainline
2012-08-09  Andrey Belevantsev  

PR rtl-optimization/53701
* gcc.dg/pr53701.c: New test.


Index: gcc/testsuite/gcc.dg/pr53701.c
===
*** gcc/testsuite/gcc.dg/pr53701.c  (revision 0)
--- gcc/testsuite/gcc.dg/pr53701.c  (revision 0)
***
*** 0 
--- 1,59 
+ /* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */
+ /* { dg-options "-O3 -fselective-scheduling2 -fsel-sched-pipelining" } */
+ typedef unsigned short int uint16_t;
+ typedef unsigned long int uintptr_t;
+ typedef struct GFX_VTABLE
+ {
+   int color_depth;
+   unsigned char *line[];
+ }
+ BITMAP;
+ extern int _drawing_mode;
+ extern BITMAP *_drawing_pattern;
+ extern int _drawing_y_anchor;
+ extern unsigned int _drawing_x_mask;
+ extern unsigned int _drawing_y_mask;
+ extern uintptr_t bmp_write_line (BITMAP *, int);
+   void
+ _linear_hline15 (BITMAP * dst, int dx1, int dy, int dx2, int color)
+ {
+   int w;
+   if (_drawing_mode == 0)
+   {
+ int x, curw;
+ unsigned short *sline =
+   (unsigned short *) (_drawing_pattern->
+   line[((dy) -
+ _drawing_y_anchor) & _drawing_y_mask]);
+ unsigned short *s;
+ unsigned short *d =
+   ((unsigned short *) (bmp_write_line (dst, dy)) + (dx1));
+ s = ((unsigned short *) (sline) + (x));
+ if (_drawing_mode == 2)
+ {
+ }
+ else if (_drawing_mode == 3)
+ {
+   do
+   {
+ w -= curw;
+ do
+ {
+   unsigned long c = (*(s));
+   if (!((unsigned long) (c) == 0x7C1F))
+   {
+ (*((uint16_t *) ((uintptr_t) (d))) = ((color)));
+   }
+   ((s)++);
+ }
+ while (--curw > 0);
+ s = sline;
+ curw =
+   (((w) <
+ ((int) _drawing_x_mask +
+  1)) ? (w) : ((int) _drawing_x_mask + 1));
+   }
+   while (curw > 0);
+ }
+   }
+ }
Index: gcc/sel-sched.c
===
*** gcc/sel-sched.c (revision 192488)
--- gcc/sel-sched.c (working copy)
*** process_use_exprs (av_set_t *av_ptr)
*** 3567,3595 
return NULL;
  }

! /* Lookup EXPR in VINSN_VEC and return TRUE if found.  */
  static bool
  vinsn_vec_has_expr_p (vinsn_vec_t vinsn_vec, expr_t expr)
  {
!   vinsn_t vinsn;
int n;

!   FOR_EACH_VEC_ELT (vinsn_t, vinsn_vec, n, vinsn)
! if (VINSN_SEPARABLE_P (vinsn))
!   {
! if (vinsn_equal_p (vinsn, EXPR_VINSN (expr)))
!   return true;
!   }
! else
!   {
! /* For non-separable instructions, the blocking insn can have
!another pattern due to substitution, and we can't choose
!different register as in the above case.  Check all registers
!being written instead.  */
! if (bitmap_intersect_p (VINSN_REG_SETS (vinsn),
! VINSN_REG_SETS (EXPR_VINSN (expr
!   return true;
!   }

return false;
  }
--- 3567,3607 
return NULL;
  }

! /* Lookup EXPR in VINSN_VEC and return TRUE if found.  Also check 
patterns from

!EXP

Fourth ping: Re: Add a configure option to disable system header canonicalizations (issue6495088)

2012-10-16 Thread Simon Baldwin
Ping, again

http://gcc.gnu.org/ml/gcc-patches/2012-09/msg00459.html

--
Google UK Limited | Registered Office: Belgrave House, 76 Buckingham
Palace Road, London SW1W 9TQ | Registered in England Number: 3977902


[AARCH64-4.7] Merge from upstream gcc-4_7-branch r192444

2012-10-16 Thread Sofiane Naci
Hi,

I have just merged upstream gcc-4_7-branch on the aarch64-4.7-branch up to
r192444.

Thanks
Sofiane





Re: Fourth ping: Re: Add a configure option to disable system header canonicalizations (issue6495088)

2012-10-16 Thread Steven Bosscher
On Tue, Oct 16, 2012 at 10:50 AM, Simon Baldwin wrote:
> Ping, again
>
> http://gcc.gnu.org/ml/gcc-patches/2012-09/msg00459.html

Probably you mean the revised patch here:

http://gcc.gnu.org/ml/gcc-patches/2012-09/msg00459.html

The patch looks OK to me, but I can't approve it.
Technically you're fixing a regression.  It'd be helpful if you could
file a PR about it and add the PR number to the ChangeLog entry.  That
helps traceability, and might, eh, motivate release managers to
review your patch :-)

Ciao!
Steven


[AARCH64] Merge from upstream trunk r192445

2012-10-16 Thread Sofiane Naci
Hi,

I have just merged upstream trunk on the aarch64-branch up to r192445.

Thanks
Sofiane





Re: Ping: RFA: Improve doloop_begin support

2012-10-16 Thread Zdenek Dvorak
Hi,

> 2012-09-26  J"orn Rennecke  
> 
> * loop-doloop.c (doloop_modify): Pass doloop_end pattern to
> gen_doloop_begin.
> * loop-doloop.c (doloop_optimize): Pass flag to indicate if loop is
> entered at top to gen_doloop_end.
>   * config/arm/thumb2.md (doloop_end): Accept extra operand.
>   * config/bfin/bfin.md (doloop_end): Likewise.
>   * config/c6x/c6x.md (doloop_end): Likewise.
>   * config/ia64/ia64.md (doloop_end): Likewise.
>   * config/mep/mep.md (doloop_begin, doloop_end): Likewise.
>   * config/rs6000/rs6000.md (doloop_end): Likewise.
>   * config/s390/s390.md (doloop_end): Likewise.
>   * config/sh/sh.md (doloop_end): Likewise.
>   * config/spu/spu.md (doloop_end): Likewise.
>   * config/tilegx/tilegx.md (doloop_end): Likewise.
>   * config/tilepro/tilepro.md (doloop_end): Likewise.
>   * doc/md.texi (doloop_end): Document new operand.
> 
> http://gcc.gnu.org/ml/gcc-patches/2012-09/msg01807.html

+  entered_at_top = loop_preheader_edge (loop)->dest == desc->in_edge->dest;

is equivalent to

+  entered_at_top = loop->header == desc->in_edge->dest;

But, I don't think it will do what you want.  Loops are canonicalized so that
their latch blocks have single successors.  So, desc->in_edge->dest will be
the latch block, not the header, for the loop entered at the top.  I think

+  entered_at_top = (loop->latch == desc->in_edge->dest 
+&& forwarder_block_p (loop->latch));

is what you want.  Other than that, the patch seems ok to me,

Zdenek


Re: [PATCH, ARM] Fix PR44557 (Thumb-1 ICE)

2012-10-16 Thread Chung-Lin Tang
On 12/9/27 6:25 AM, Janis Johnson wrote:
> On 09/26/2012 01:58 AM, Chung-Lin Tang wrote:
> 
> +/* { dg-do compile } */
> +/* { dg-options "-mthumb -O1 -march=armv5te -fno-omit-frame-pointer 
> -fno-forward-propagate" }  */
> +/* { dg-require-effective-target arm_thumb1_ok } */
> 
> This test will fail to compile for test flags that conflict with
> the -march option, and the specified -march option might be
> overridden with similar options from other test flags.  The problem
> might have also been seen for other -march options.  I recommend
> leaving it off and omitting the dg-require so the test can be run
> for more multilibs.

I'm not sure, as the intent is to test a Thumb-1 case here. If the
maintainers think we should adjust the testcase, I'm of course fine with it.

And ping for the patch.

Thanks,
Chung-Lin



[patch] Clean up code looking for call rtx

2012-10-16 Thread Steven Bosscher
Hello,

In RTL land it's not trivial to find a CALL rtx inside a CALL_INSN,
and code to find the CALL is duplicated in a few places. The attached
patch cleans this up.

Bootstrapped&tested on powerpc64-unknown-linux-gnu. OK for trunk?

Ciao!
Steven


P.S. question: when is XEXP(call,0) _not_ a MEM?


get_call_rtx_from.diff
Description: Binary data


Re: [patch] Fix PR rtl-optimization/54870

2012-10-16 Thread Eric Botcazou
> As I'm not sure how to best do that I suggest we do a more proper RTL
> DSE hack by adding a 'libcall-call-escape'-set which we can add to
> instead of calling mark_addressable this late.  We need to add all
> partitions of a decl here, of course, and we need to query it from
> can_escape.

That doesn't pertain only to libcalls though; mark_addressable can also be
called for regular calls if arguments are callee-copied.  And mark_addressable
has also been called from expand_asm_operands for ages.

> Well, it just means that the enhanced DSE is fragile :/

Maybe, but the current implementation is the outcome of a discussion between 
Easwaran and you IIRC.  In any case, before attempting to rewrite everything, 
here is another approach that patches things up by extending the usage of the 
TREE_ADDRESSABLE bit from the vars in a partition to the artificial pointer to 
that partition, i.e. the bit is set on the artificial pointer if it is set on 
at least one variable in the partition.

Tested on x86_64-suse-linux.


PR rtl-optimization/54870
* tree.h (TREE_ADDRESSABLE): Document special usage of SSA_NAME.
* cfgexpand.c (update_alias_info_with_stack_vars): Set it on the
SSA_NAME pointer that points to a partition if there is at least
one variable with it set in the partition.
* dse.c (local_variable_can_escape): New predicate.
(can_escape): Call it.
* gimplify.c (mark_addressable): If this is a partitioned decl, mark
the SSA_NAME pointer that points to a partition as well.


-- 
Eric Botcazou
Index: tree.h
===
--- tree.h	(revision 192447)
+++ tree.h	(working copy)
@@ -484,9 +484,10 @@ struct GTY(()) tree_base {
 
TREE_ADDRESSABLE in
VAR_DECL, PARM_DECL, RESULT_DECL, FUNCTION_DECL, LABEL_DECL
+   SSA_NAME
all types
CONSTRUCTOR, IDENTIFIER_NODE
-   STMT_EXPR, it means we want the result of the enclosed expression
+   STMT_EXPR
 
CALL_EXPR_TAILCALL in
CALL_EXPR
@@ -1085,15 +1086,18 @@ extern void omp_clause_range_check_faile
 /* In VAR_DECL, PARM_DECL and RESULT_DECL nodes, nonzero means address
of this is needed.  So it cannot be in a register.
In a FUNCTION_DECL it has no meaning.
-   In CONSTRUCTOR nodes, it means object constructed must be in memory.
In LABEL_DECL nodes, it means a goto for this label has been seen
from a place outside all binding contours that restore stack levels.
+   In an artificial SSA_NAME that points to a stack partition with at least
+   two variables, it means that at least one variable has TREE_ADDRESSABLE.
In ..._TYPE nodes, it means that objects of this type must be fully
addressable.  This means that pieces of this object cannot go into
register parameters, for example.  If this a function type, this
means that the value must be returned in memory.
+   In CONSTRUCTOR nodes, it means object constructed must be in memory.
In IDENTIFIER_NODEs, this means that some extern decl for this name
-   had its address taken.  That matters for inline functions.  */
+   had its address taken.  That matters for inline functions.
+   In a STMT_EXPR, it means we want the result of the enclosed expression.  */
 #define TREE_ADDRESSABLE(NODE) ((NODE)->base.addressable_flag)
 
 /* Set on a CALL_EXPR if the call is in a tail position, ie. just before the
Index: cfgexpand.c
===
--- cfgexpand.c	(revision 192447)
+++ cfgexpand.c	(working copy)
@@ -635,6 +635,8 @@ update_alias_info_with_stack_vars (void)
 	   (void *)(size_t) uid)) = part;
 	  *((tree *) pointer_map_insert (cfun->gimple_df->decls_to_pointers,
 	 decl)) = name;
+	  if (TREE_ADDRESSABLE (decl))
+	TREE_ADDRESSABLE (name) = 1;
 	}
 
   /* Make the SSA name point to all partition members.  */
Index: dse.c
===
--- dse.c	(revision 192447)
+++ dse.c	(working copy)
@@ -989,7 +989,32 @@ delete_dead_store_insn (insn_info_t insn
   insn_info->wild_read = false;
 }
 
-/* Check if EXPR can possibly escape the current function scope.  */
+/* Return whether DECL, a local variable, can possibly escape the current
+   function scope.  */
+
+static bool
+local_variable_can_escape (tree decl)
+{
+  if (TREE_ADDRESSABLE (decl))
+return true;
+
+  /* If this is a partitioned variable, we need to consider all the variables
+ in the partition.  This is necessary because a store into one of them can
+ be replaced with a store into another and this may not change the outcome
+ of the escape analysis.  */
+  if (cfun->gimple_df->decls_to_pointers != NULL)
+{
+  void *namep
+	= pointer_map_contains (cfun->gimple_df->decls_to_pointers, decl);
+  if (namep)
+	return TREE_ADDRESSABLE (*(tree *)namep);
+}
+
+  return false;
+}
+
+/* Retu

Re: [patch] Clean up code looking for call rtx

2012-10-16 Thread Eric Botcazou
> In RTL land it's not trivial to find a CALL rtx inside a CALL_INSN,
> and code to find the CALL is duplicated in a few places. The attached
> patch cleans this up.
> 
> Bootstrapped&tested on powerpc64-unknown-linux-gnu. OK for trunk?

Sure, thanks.

> P.S. question: when is XEXP(call,0) _not_ a MEM?

Good question.  The manual seems to answer "never":

 A `call' expression has two operands, as follows:

 (call (mem:FM ADDR) NBYTES)

-- 
Eric Botcazou


Re: [RFC PATCH] Add support for sparc compare-and-branch.

2012-10-16 Thread Eric Botcazou
> I've scanned the documentation and there is no indication of any
> preprocessor predefines or anything like that.
> 
> And keep in mind that __VIS__ is our very own invention.
> 
> Sun's compilers never predefined this.
> 
> Their makefiles do for various targets in the MediaLib sources, but that's
> a source tree and header file localized convention.
> 
> Sun also never provided intrinsics other than via assembler inlines in
> their VIS header.  They were never compiler builtins like ours.  The
> user had to define __VIS__ on the command line to get visibility of
> the routines they wanted from Sun's VIS inline assembler header file.
> 
> Sun also does not provide, and is almost certainly not going to ever
> provide crypto intrinsics.
> 
> Therefore there is no convention to follow and we can do whatever we want
> here.

OK, thanks.  I keep thinking that we should use -mcpu=sparc4/-D__sparc4__ for 
new instructions in the SPARC-T4 architecture that aren't related to VIS.

And given Rainer's insight, I agree that switching to the -m{32,64} -xarch= 
scheme with recent assemblers is the way to go.  I'll try and get my hands on 
one of them...

-- 
Eric Botcazou


Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)

2012-10-16 Thread Steven Bosscher
On Mon, Oct 15, 2012 at 3:26 PM, Steven Bosscher wrote:
> On Mon, Oct 15, 2012 at 3:21 PM, Paolo Bonzini wrote:
>> On 15/10/2012 14:53, Steven Bosscher wrote:
>>> I think I've shown above that we're all looking at the wrong pass...
>>
>> I think you have... so we want a patch like this?
>
> I don't think so. df_kill_notes is already supposed to take care of this.

But it doesn't, because if the SET_DEST of an insn is the same as the
register dying in the insn's REG_EQUAL note, the reg is live at the
end of the insn and so the note stays:

Breakpoint 2, df_kill_notes (insn=0x75e3e7e0, live=0x7fffda90)
at ../../trunk/gcc/df-problems.c:2833
2833  rtx *pprev = &REG_NOTES (insn);
1: debug_rtx(insn) = (insn 79 50 52 8 (set (reg:DI 72 [ ivtmp.17D.1758 ])
(reg:DI 103 [ ivtmp.17D.1758 ])) -1
 (expr_list:REG_DEAD (reg:DI 103 [ ivtmp.17D.1758 ])
(expr_list:REG_EQUAL (plus:DI (reg:DI 72 [ ivtmp.17D.1758 ])
(const_int 24 [0x18]))
(nil
void
(gdb) undisp 1
(gdb) p debug_bitmap(live)

first = 0x1627200 current = 0x1627200 indx = 0
0x1627200 next = (nil) prev = (nil) indx = 0
bits = { 6 7 16 20 72 82 85 87 }
$2 = void
(gdb)


So GCC should be looking at whether the reg in the REG_EQUAL note is
dead *before* the insn.

Bottom line: This is a bug in df_kill_notes.

Ciao!
Steven


Re: [Patch, Fortran] Fix OPTIONAL, esp. with polymorphism

2012-10-16 Thread Tobias Burnus

Hi Dominique,

Dominique Dhumieres wrote:

the test gfortran.dg/class_optional_1.f90 does not compile

...

but the code seems weird.


I concur – and believe that it is already covered by the other test 
cases. Thus, I removed it.




The code gfortran.dg/class_optional_2.f90 compiles, but
the runtime does not exit (at least after more than 30s).
Finally I have applied the following changes in order
to make it work:

-  call suba2(xa2, alloc=.false., prsnt=.true.)
-  if (.not. allocated (xa2)) call abort ()
-  if (size (xa2) /= 1) call abort ()
-  if (.not. allocated (xa2(1)%i)) call abort ()
-  if (xa2(1)%i /= 5) call abort ()
-  xa2(1)%i = -3
-  call suba2(xa2, alloc=.true., prsnt=.true.)
-  if (allocated (xa2)) call abort ()


This change and the next one, I do not understand; it works here with 
-m32 and -m64 and shows no issues in valgrind. (Contrary to the 
elemental test cases, which show up in valgrind; there it makes sense 
that they fail for you, given that similar test cases also fail for me.)


I have split the sub* test cases into a new file. I think it makes sense 
to understand why they fail - and how.


How do those test cases fail for you? Does this depend on the flags
used? And can you create a minimal failing test case?



* * *

On October 11, 2012 23:07, Janus Weil wrote:

In the comment, 'alloc_ptr' should be 'optional_alloc_ptr'.


Fixed (twice).


+class_scalar_coarray_to_class (gfc_se *parmse, gfc_expr *e,
+  gfc_typespec class_ts, bool optional)

How about a small comment preceding this function, to shortly describe
its functionality and arguments? And then inside ...


Done.


+  for (ref = e->ref; ref; ref = ref->next)
+{
+  if (ref->type == REF_COMPONENT
+   && ref->u.c.component->ts.type == BT_CLASS)
+   class_ref = ref;
+
+  if (ref->next == NULL)
+   break;
+}

... I guess the last if statement is not needed, since this condition
is already checked by the for loop.


No, it's not the same: as written, "ref" might be non-NULL after the
loop; without it, "ref" will always be NULL.



Again: 'alloc_ptr' -> 'optional_alloc_ptr' in the comment. And how
about a short comment on the 'copyback' argument?


Done.


Built and regtested on x86-64-Linux.

Tobias
2012-10-16  Tobias Burnus  

	PR fortran/50981
	PR fortran/54618
	* trans.h (gfc_conv_derived_to_class, gfc_conv_class_to_class):
	Update prototype.
	* trans-stmt.c (trans_associate_var,gfc_trans_allocate): Update
	calls to those functions.
	* trans-expr.c (gfc_conv_derived_to_class, gfc_conv_class_to_class,
	gfc_conv_expr_present): Handle absent polymorphic arguments.
	(class_scalar_coarray_to_class): New function.
	(gfc_conv_procedure_call): Update calls.

2012-10-16  Tobias Burnus  

	PR fortran/50981
	PR fortran/54618
	* gfortran.dg/class_optional_1.f90: New.
	* gfortran.dg/class_optional_2.f90: New.

diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index 1178e3d..7532ec7 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -231,12 +231,16 @@ class_array_data_assign (stmtblock_t *block, tree lhs_desc, tree rhs_desc,
 
 /* Takes a derived type expression and returns the address of a temporary
class object of the 'declared' type.  If vptr is not NULL, this is
-   used for the temporary class object.  */ 
+   used for the temporary class object.
+   optional_alloc_ptr is false when the dummy is neither allocatable
+   nor a pointer; that's only relevant for the optional handling.  */
 void
 gfc_conv_derived_to_class (gfc_se *parmse, gfc_expr *e,
-			   gfc_typespec class_ts, tree vptr)
+			   gfc_typespec class_ts, tree vptr, bool optional,
+			   bool optional_alloc_ptr)
 {
   gfc_symbol *vtab;
+  tree cond_optional = NULL_TREE;
   gfc_ss *ss;
   tree ctree;
   tree var;
@@ -269,13 +273,21 @@ gfc_conv_derived_to_class (gfc_se *parmse, gfc_expr *e,
   /* Now set the data field.  */
   ctree =  gfc_class_data_get (var);
 
+  if (optional)
+cond_optional = gfc_conv_expr_present (e->symtree->n.sym);
+
   if (parmse->ss && parmse->ss->info->useflags)
 {
   /* For an array reference in an elemental procedure call we need
 	 to retain the ss to provide the scalarized array reference.  */
   gfc_conv_expr_reference (parmse, e);
   tmp = fold_convert (TREE_TYPE (ctree), parmse->expr);
+  if (optional)
+	tmp = build3_loc (input_location, COND_EXPR, TREE_TYPE (tmp),
+			  cond_optional, tmp,
+			  fold_convert (TREE_TYPE (tmp), null_pointer_node));
   gfc_add_modify (&parmse->pre, ctree, tmp);
+
 }
   else
 {
@@ -293,28 +305,148 @@ gfc_conv_derived_to_class (gfc_se *parmse, gfc_expr *e,
 		gfc_expr_attr (e));
 	  gfc_add_modify (&parmse->pre, gfc_conv_descriptor_dtype (ctree),
 			  gfc_get_dtype (type));
+	  if (optional)
+		parmse->expr = build3_loc (input_location, COND_EXPR,
+	   TREE_TYPE (parmse->expr),
+	   cond_optional, parmse->expr,
+	   fold_convert (TREE_TYPE (pa

Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)

2012-10-16 Thread Steven Bosscher
On Tue, Oct 16, 2012 at 12:15 PM, Steven Bosscher wrote:
> On Mon, Oct 15, 2012 at 3:26 PM, Steven Bosscher wrote:
>> On Mon, Oct 15, 2012 at 3:21 PM, Paolo Bonzini wrote:
>>> On 15/10/2012 14:53, Steven Bosscher wrote:
 I think I've shown above that we're all looking at the wrong pass...
>>>
>>> I think you have... so we want a patch like this?
>>
>> I don't think so. df_kill_notes is already supposed to take care of this.
>
> But it doesn't, because if the SET_DEST of an insn is the same as the
> register dying in the insn's REG_EQUAL note, the reg is live at the
> end of the insn and so the note stays:
>
> Breakpoint 2, df_kill_notes (insn=0x75e3e7e0, live=0x7fffda90)
> at ../../trunk/gcc/df-problems.c:2833
> 2833  rtx *pprev = &REG_NOTES (insn);
> 1: debug_rtx(insn) = (insn 79 50 52 8 (set (reg:DI 72 [ ivtmp.17D.1758 ])
> (reg:DI 103 [ ivtmp.17D.1758 ])) -1
>  (expr_list:REG_DEAD (reg:DI 103 [ ivtmp.17D.1758 ])
> (expr_list:REG_EQUAL (plus:DI (reg:DI 72 [ ivtmp.17D.1758 ])
> (const_int 24 [0x18]))
> (nil
> void
> (gdb) undisp 1
> (gdb) p debug_bitmap(live)
>
> first = 0x1627200 current = 0x1627200 indx = 0
> 0x1627200 next = (nil) prev = (nil) indx = 0
> bits = { 6 7 16 20 72 82 85 87 }
> $2 = void
> (gdb)
>
>
> So GCC should be looking at whether the reg in the REG_EQUAL note is
> dead *before* the insn.
>
> Bottom line: This is a bug in df_kill_notes.

I think this should fix it. Can't test it right now, so help
appreciated (Honza: hint hint! ;-)

Ciao!
Steven


remove_dead_eq_notes.diff
Description: Binary data


Re: [Patch, Fortran] Fix OPTIONAL, esp. with polymorphism

2012-10-16 Thread Janus Weil
Hi Tobias,

>> +  for (ref = e->ref; ref; ref = ref->next)
>> +{
>> +  if (ref->type == REF_COMPONENT
>> +   && ref->u.c.component->ts.type == BT_CLASS)
>> +   class_ref = ref;
>> +
>> +  if (ref->next == NULL)
>> +   break;
>> +}
>>
>> ... I guess the last if statement is not needed, since this condition
>> is already checked by the for loop.
>
>
> No, it's not the same: As written, "ref" might be non-NULL after the loop,
> without, it will be always NULL.

That's true. However, it seems you don't use the value of 'ref' after
the loop exits ...

Thanks for the updated patch,
Janus


Re: [asan] Emit GIMPLE directly, small cleanups

2012-10-16 Thread Rainer Orth
Diego Novillo  writes:

> On Mon, Oct 15, 2012 at 11:55 AM, Ian Lance Taylor  wrote:
>
>> In my opinion, supporting the full range of GCC testsuite annotations
>> means imposing a lot of mechanism that the Go testsuite does not
>> require.  It would complicate the Go testsuite for no benefit.
>> Anybody who can understand the GCC testsuite annotations can
>> understand the much simpler Go testsuite annotations.
>
> Agreed.  The fact that we have to suffer DejaGNU does not give us the
> right to make other projects miserable.

But importing different upstream testsuites with different annotation
systems allows them to make GCC maintainers' lives miserable?  If this
trend continues, maintainers need to cope with several different
annotation systems with different capabilities instead of a single one,
some of them lacking necessary features that are already present in
DejaGnu (which leads to handling stuff that's just a simple annotation
in DejaGnu in the testsuite drivers instead).  I'm not asking upstreams
to deal with DejaGnu themselves, just to accept that the dg annotations
live in their repos.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: Constant-fold vector comparisons

2012-10-16 Thread Richard Biener
On Mon, Oct 15, 2012 at 3:51 PM, Marc Glisse  wrote:
> On Mon, 15 Oct 2012, Richard Biener wrote:
>
>>>else if ((code == BIT_NOT_EXPR
>>> && TYPE_PRECISION (TREE_TYPE (cond)) == 1)
>>>|| (code == BIT_XOR_EXPR
>>> -  && integer_onep (gimple_assign_rhs2 (def_stmt
>>> +  && ((gimple_assign_rhs_code (stmt) == VEC_COND_EXPR)
>>> +  ? integer_all_onesp (gimple_assign_rhs2
>>> (def_stmt))
>>> +  : integer_onep (gimple_assign_rhs2 (def_stmt)
>>
>>
>> I don't think that we can do anything for vectors here.  The non-vector
>> path assumes that the type is a boolean type (thus two-valued), but
>> for vectors we can have arbitrary integer value input.
>
>
> Actually, we just defined VEC_COND_EXPR as taking only vectors of -1 and 0
> as its first argument. So if it takes X^-1 or ~X as first argument (looks
> like I forgot the ~ case), then X is also a vector of -1 and 0.

Ok, indeed.

> I liked your idea of the signed boolean vector, as a way to express that we
> know some vector can only have values 0 and -1, but I am not sure how to use
> it.

Ah no, I didn't mean to suggest that ;)

>
>> Thus, as we defined true to -1 and false to 0 we cannot, unless relaxing
>> what VEC_COND_EXRP treats as true or false, optimize any of ~ or ^ -1 away.
>
>
> It seems to me that what prevents us from optimizing is that we want to keep the
> door open for a future relaxation of what VEC_COND_EXPR accepts as its first
> argument. Which means: produce only -1 and 0, but don't assume we are only
> reading -1 and 0 (unless we have a reason to know it, for instance because
> it is the result of a comparison), and don't assume any specific
> interpretation on those other values. Not sure how much that limits possible
> optimizations.

I'm not sure either - I'd rather leave the possibility open until we see
a compelling reason to go either way (read: a testcase where it matters
in practice).

>> Which means I'd prefer if you simply condition the existing ~ and ^
>> handling on COND_EXPR.
>
>
> Ok, that situation probably won't happen soon anyway, I just wanted to do
> something while I was looking at this region of code.

Yes, guarding against VEC_COND_EXPR is definitely needed.

Richard.

> Thanks,
>
> --
> Marc Glisse


Re: [asan] Emit GIMPLE directly, small cleanups

2012-10-16 Thread Diego Novillo

On 2012-10-16 07:05, Rainer Orth wrote:

Diego Novillo  writes:


On Mon, Oct 15, 2012 at 11:55 AM, Ian Lance Taylor  wrote:


In my opinion, supporting the full range of GCC testsuite annotations
means imposing a lot of mechanism that the Go testsuite does not
require.  It would complicate the Go testsuite for no benefit.
Anybody who can understand the GCC testsuite annotations can
understand the much simpler Go testsuite annotations.


Agreed.  The fact that we have to suffer DejaGNU does not give us the
right to make other projects miserable.


But importing different upstream testsuites with different annotation
systems allows them to make GCC maintainers' lives miserable?


Yes, absolutely.  We have to adapt to upstream's conventions.  If that 
means putting translation layers, the onus is on us.


Alternately, we could import libasan the same way we import things like 
zlib or boehm-gc and then re-write the testsuite.  That makes it harder 
to import newer versions, however.


Finally, we could simply duplicate libasan's testsuite inside 
gcc/testsuite/asan and add our own tests as well.  This may be the more 
practical choice.



Diego.


Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)

2012-10-16 Thread Paolo Bonzini
Il 16/10/2012 12:35, Steven Bosscher ha scritto:
> On Tue, Oct 16, 2012 at 12:15 PM, Steven Bosscher wrote:
>> On Mon, Oct 15, 2012 at 3:26 PM, Steven Bosscher wrote:
>>> On Mon, Oct 15, 2012 at 3:21 PM, Paolo Bonzini wrote:
 Il 15/10/2012 14:53, Steven Bosscher ha scritto:
> I think I've shown above that we're all looking at the wrong pass...

 I think you have... so we want a patch like this?
>>>
>>> I don't think so. df_kill_notes is already supposed to take care of this.
>>
>> But it doesn't because if the SET_DEST of an insn is the same as the
>> register dieing in the insn's REG_EQUAL note, the reg is live at the
>> end of the insn and so the note stays:
>>
>> Breakpoint 2, df_kill_notes (insn=0x75e3e7e0, live=0x7fffda90)
>> at ../../trunk/gcc/df-problems.c:2833
>> 2833  rtx *pprev = &REG_NOTES (insn);
>> 1: debug_rtx(insn) = (insn 79 50 52 8 (set (reg:DI 72 [ ivtmp.17D.1758 ])
>> (reg:DI 103 [ ivtmp.17D.1758 ])) -1
>>  (expr_list:REG_DEAD (reg:DI 103 [ ivtmp.17D.1758 ])
>> (expr_list:REG_EQUAL (plus:DI (reg:DI 72 [ ivtmp.17D.1758 ])
>> (const_int 24 [0x18]))
>> (nil
>> void
>> (gdb) undisp 1
>> (gdb) p debug_bitmap(live)
>>
>> first = 0x1627200 current = 0x1627200 indx = 0
>> 0x1627200 next = (nil) prev = (nil) indx = 0
>> bits = { 6 7 16 20 72 82 85 87 }
>> $2 = void
>> (gdb)
>>
>>
>> So GCC should be looking at whether the reg in the REG_EQUAL note is
>> dead *before* the insn.
>>
>> Bottom line: This is a bug in df_kill_notes.

Yep, and it could cause wrong code generation, couldn't it?  Because the
new (reg:DI 72) is obviously not equal to (plus:DI (reg:DI 72)
(const_int 24)), but it could be propagated to subsequent insns.

So I think this patch should be backported to all release branches after
regstrapping.

> I think this should fix it. Can't test it right now, so help
> appreciated (Honza: hint hint! ;-)

Ok after bootstrap, regtest and checking that it actually fixes the bug.

Paolo


Re: [Patch, Fortran] Fix OPTIONAL, esp. with polymorphism

2012-10-16 Thread Dominique Dhumieres
Hi Tobias,

I have not yet applied your latest patch to gfortran, but I ran the new tests
with gfortran patched with the previous patch and both pass (manual
testing with no options other than -fcoarray=single). Note that for
class_optional_1.f90, valgrind --leak-check=full gives

==45665== 4 bytes in 1 blocks are definitely lost in loss record 1 of 5
==45665==at 0x100014679: malloc (vg_replace_malloc.c:266)
==45665==by 0x123A7: suba2.1920 (class_optional_1_orig_2.f90:120)
==45665==by 0x10F49: MAIN__ (class_optional_1_orig_2.f90:32)
==45665==by 0x12A3A: main (class_optional_1_orig_2.f90:77)
==45665== 
==45665== 4 bytes in 1 blocks are definitely lost in loss record 2 of 5
==45665==at 0x100014679: malloc (vg_replace_malloc.c:266)
==45665==by 0x11E79: subp2.1897 (class_optional_1_orig_2.f90:152)
==45665==by 0x1132D: MAIN__ (class_optional_1_orig_2.f90:51)
==45665==by 0x12A3A: main (class_optional_1_orig_2.f90:77)
==45665== 
==45665== 4 bytes in 1 blocks are definitely lost in loss record 3 of 5
==45665==at 0x100014679: malloc (vg_replace_malloc.c:266)
==45665==by 0x125D8: subac.1929 (class_optional_1_orig_2.f90:104)
==45665==by 0x115F6: MAIN__ (class_optional_1_orig_2.f90:61)
==45665==by 0x12A3A: main (class_optional_1_orig_2.f90:77)
==45665== 
==45665== 4 bytes in 1 blocks are definitely lost in loss record 4 of 5
==45665==at 0x100014679: malloc (vg_replace_malloc.c:266)
==45665==by 0x128A8: suba2c.1949 (class_optional_1_orig_2.f90:88)
==45665==by 0x11878: MAIN__ (class_optional_1_orig_2.f90:70)
==45665==by 0x12A3A: main (class_optional_1_orig_2.f90:77)

I don't really understand why this test was failing when included in
class_optional_2.f90. The only explanation I have is that it was run
after the calls s2elem(x) and s2elem(y), which are no longer run:

635,636c794,795
< !   call s2elem(x) ! FIXME: Conditional jump or move depends on uninitialised 
value
< !   call s2elem(y) ! FIXME: Conditional jump or move depends on uninitialised 
value
---
>call s2elem(x)
>call s2elem(y)

These calls may have corrupted the memory(?).

Thanks for your work on the problem,

Dominique



Re: [patch] Fix PR rtl-optimization/54870

2012-10-16 Thread Richard Biener
On Tue, Oct 16, 2012 at 11:39 AM, Eric Botcazou  wrote:
>> As I'm not sure how to best do that I suggest we do a more proper RTL
>> DSE hack by adding a 'libcall-call-escape'-set which we can add to
>> instead of calling mark_addressable this late.  We need to add all
>> partitions of a decl here, of course, and we need to query it from
>> can_escape.
>
> That doesn't pertain only to libcalls though, mark_addressable can also be
> called for regular calls if arguments are callee-copied.  And mark_addressable
> has also been called from expand_asm_operands for ages.

Hm, I see ...

>> Well, it just means that the enhanced DSE is fragile :/
>
> Maybe, but the current implementation is the outcome of a discussion between
> Easwaran and you IIRC.  In any case, before attempting to rewrite everything,
> here is another approach that patches things up by extending the usage of the
> TREE_ADDRESSABLE bit from the vars in a partition to the artificial pointer to
> that partition, i.e. the bit is set on the artificial pointer if it is set on
> at least one variable in the partition.

Ok, I like that a lot more - can_escape now properly checks all partition vars
and the use of TREE_ADDRESSABLE on the SSA-NAME is a nice idea.

Less ideal is the mark_addressable change but I can't see any better
way to deal with late addressable markings ...

> Tested on x86_64-suse-linux.

... thus, ok!

Thanks,
Richard.

>
> PR rtl-optimization/54870
> * tree.h (TREE_ADDRESSABLE): Document special usage of SSA_NAME.
> * cfgexpand.c (update_alias_info_with_stack_vars ): Set it on the
> SSA_NAME pointer that points to a partition if there is at least
> one variable with it set in the partition.
> * dse.c (local_variable_can_escape): New predicate.
> (can_escape): Call it.
> * gimplify.c (mark_addressable): If this is a partitioned decl, mark
> the SSA_NAME pointer that points to a partition as well.
>
>
> --
> Eric Botcazou


Re: Ping: RFA: Improve doloop_begin support

2012-10-16 Thread Joern Rennecke

Quoting Zdenek Dvorak :


+  entered_at_top = loop_preheader_edge (loop)->dest == desc->in_edge->dest;

is equivalent to

+  entered_at_top = loop->header == desc->in_edge->dest;

But, I don't think it will do what you want.  Loops are canonicalized so that
their latch blocks have single successors.  So, desc->in_edge->dest will be
the latch block, not the header, for the loop entered at the top.  I think

+  entered_at_top = (loop->latch == desc->in_edge->dest
+&& forwarder_block_p (loop->latch));

is what you want.  Other than that, the patch seems ok to me,


I'm pretty sure that I checked that this did what I wanted it to do when
I wrote the code originally.  Alas, that was a few years ago; the
canonicalization was probably introduced in the meantime.


So I made up another test and tried it:

int f(int a, int b)
{
  while (--b >= 0)
{
  a <<= 1;
  a += 42;
}
  return a;
}

The loop appears to be entered at the top, yet both my original and your test
fail.

Looking what happens with your test in more detail, I find that

 loop->latch == desc->in_edge->dest

is true, but forwarder_block_p (loop->latch) fails:

562   if (dest->loop_father->header == dest)
(gdb)
563 return false;
(gdb) p dest
$7 = (basic_block) 0xb7be8198
(gdb) p dest->loop_father->header
$8 = (basic_block) 0xb7be8198

The comment in front of this code says:
  /* Protect loop latches, headers and preheaders.  */

So, presumably, the loop latch will remain a forwarder block precisely because
forwarder_block_p denies that it is one.

So, may I just write:
entered_at_top = (loop->latch == desc->in_edge->dest);


Re: [asan] Emit GIMPLE directly, small cleanups

2012-10-16 Thread Ian Lance Taylor
On Tue, Oct 16, 2012 at 4:05 AM, Rainer Orth
 wrote:
>
> But importing different upstream testsuites with different annotation
> systems allows them to make GCC maintainers' lives miserable?  If this
> trend continues, maintainers need to cope with several different
> annotation systems with different capabilities instead of a single one,
> some of them lacking necessary features which are already present in
> DejaGnu (which leads to handling stuff that's just a simple annotation
> in DejaGnu in the testsuite drivers instead).

Yes, that is true.  But when you say "if this trend continues" you are
making a slippery slope argument that I don't think applies.  To date,
the trend consists of a single example.  We are discussing adding a
second example, and we may decide against it.  There are no current
prospects of a third example.

> I'm not asking upstreams
> to deal with DejaGnu themselves, just to accept that the dg annotations
> live in their repos.

Where it would be untested and unmaintained.  Do you think we would be
happy adding additional annotations to our testsuite for the benefit
of some other project?

Ian


[PATCH v2, ARM] 64-bit shifts in NEON

2012-10-16 Thread Ulrich Weigand
Richard Earnshaw wrote:
> On 20/09/12 16:49, Ulrich Weigand wrote:
> > Richard Earnshaw wrote:
> >
> >> Hmm, this is going to cause bottlenecks on Cortex-A15: writing a Neon
> >> single-precision register and then reading it back as a double-precision
> >> value will cause scheduling problems.
[snip]
> >> A solution to this is to have the set of the shifter register done as a
> >> lane-set operation rather than as a set of the lower register, but it
> >> probably needs some thought as to how to achieve this without creating
> >> other overheads.
> >
> > What instruction are you refering to here?  Loads from memory?
> 
> Yes, if that's the source, or if from another register, something like
> 
>   vmov.32 Dd[0], Rt
> 
> (it doesn't matter that the other lane remains unintialized).  This has
> the advantage that it doesn't clobber the other half of the register.
> 
> If the operand is already known to be in an S-register, then
> vdup(scalar) can be used, but of course that needs a full 64-bit target
> register.

Here's an updated version of the patch that allocates double registers
and uses lane-set or load from memory instructions to fill in the
shift count operand.

Re-tested on arm-linux-gnueabi with no regression.  Re-benchmarked
(the Linaro backport) with results comparable to the original patch.

Would this version be OK?

Bye,
Ulrich


2012-10-16  Andrew Stubbs  
Ulrich Weigand  

* config/arm/arm.c (arm_emit_coreregs_64bit_shift): Fix comment.
* config/arm/arm.md (opt, opt_enabled): New attributes.
(enabled): Use opt_enabled.
(ashldi3, ashrdi3, lshrdi3): Add TARGET_NEON case.
(ashldi3): Allow general operands for TARGET_NEON case.
* config/arm/iterators.md (rshifts): New code iterator.
(shift, shifttype): New code attributes.
* config/arm/neon.md (UNSPEC_LOAD_COUNT): New unspec type.
(neon_load_count, ashldi3_neon_noclobber, ashldi3_neon,
signed_shift_di3_neon, unsigned_shift_di3_neon,
ashrdi3_neon_imm_noclobber, lshrdi3_neon_imm_noclobber,
di3_neon): New patterns.


Index: gcc/config/arm/arm.c
===
*** gcc/config/arm/arm.c	(revision 191400)
--- gcc/config/arm/arm.c	(working copy)
*** arm_autoinc_modes_ok_p (enum machine_mod
*** 26293,26300 
 Input requirements:
  - It is safe for the input and output to be the same register, but
early-clobber rules apply for the shift amount and scratch registers.
! - Shift by register requires both scratch registers.  Shift by a constant
!   less than 32 in Thumb2 mode requires SCRATCH1 only.  In all other cases
the scratch registers may be NULL.
  - Ashiftrt by a register also clobbers the CC register.  */
  void
--- 26293,26299 
 Input requirements:
  - It is safe for the input and output to be the same register, but
early-clobber rules apply for the shift amount and scratch registers.
! - Shift by register requires both scratch registers.  In all other cases
the scratch registers may be NULL.
  - Ashiftrt by a register also clobbers the CC register.  */
  void
Index: gcc/config/arm/neon.md
===
*** gcc/config/arm/neon.md  (revision 191400)
--- gcc/config/arm/neon.md  (working copy)
***
*** 23,28 
--- 23,29 
  (define_c_enum "unspec" [
UNSPEC_ASHIFT_SIGNED
UNSPEC_ASHIFT_UNSIGNED
+   UNSPEC_LOAD_COUNT
UNSPEC_VABD
UNSPEC_VABDL
UNSPEC_VADD
***
*** 1170,1175 
--- 1171,1359 
DONE;
  })
  
+ ;; 64-bit shifts
+ 
+ ;; This pattern loads a 32-bit shift count into a 64-bit NEON register,
+ ;; leaving the upper half uninitialized.  This is OK since the shift
+ ;; instruction only looks at the low 8 bits anyway.  To avoid confusing
+ ;; data flow analysis however, we pretend the full register is set
+ ;; using an unspec.
+ (define_insn "neon_load_count"
+   [(set (match_operand:DI 0 "s_register_operand" "=w,w")
+ (unspec:DI [(match_operand:SI 1 "nonimmediate_operand" "Um,r")]
+UNSPEC_LOAD_COUNT))]
+   "TARGET_NEON"
+   "@
+vld1.32\t{%P0[0]}, %A1
+vmov.32\t%P0[0], %1"
+   [(set_attr "neon_type" "neon_vld1_vld2_lane,neon_mcr")]
+ )
+ 
+ (define_insn "ashldi3_neon_noclobber"
+   [(set (match_operand:DI 0 "s_register_operand"  "=w,w")
+   (ashift:DI (match_operand:DI 1 "s_register_operand" " w,w")
+  (match_operand:DI 2 "reg_or_int_operand" " i,w")))]
+   "TARGET_NEON && reload_completed
+&& (!CONST_INT_P (operands[2])
+|| (INTVAL (operands[2]) >= 0 && INTVAL (operands[2]) < 64))"
+   "@
+vshl.u64\t%P0, %P1, %2
+vshl.u64\t%P0, %P1, %P2"
+   [(set_attr "neon_type" "neon_vshl_ddd,neon_vshl_ddd")]
+ )
+ 
+ (define_insn_and_split "ashldi3_neon"
+   [(set (match_operand:DI 0 "s_register_operand"   

[commit] Re-work find_reloads_subreg_address

2012-10-16 Thread Ulrich Weigand
Tejas Belagod wrote:
> > Ulrich Weigand wrote:
> >> The following patch implements this idea; it passes a basic regression
> >> test on arm-linux-gnueabi.  (Obviously this would need a lot more
> >> testing on various platforms before getting into mainline ...)
> >>
> >> Can you have a look whether this fixes the problem you're seeing?
[snip]

> Sorry for the delay in replying. These new issues that I was seeing were
> bugs specific to my code that I've fixed. Your patch seems to work fine.
> Thanks a lot for the patch.

I've now checked the patch in to mainline.   In addition to your test on
aarch64, I've completed tests without regression on ppc(64)-linux,
s390(x)-linux, and i386-linux (with Uros' patch).

Note that Uros' patch here:
http://gcc.gnu.org/ml/gcc-patches/2012-10/msg01463.html
is a pre-requisite for the reload patch to avoid regressions on i386.

Current version (nearly unchanged) of the patch appended below.

Bye,
Ulrich

 
ChangeLog:

* reload.c (find_reloads_subreg_address): Remove FORCE_REPLACE
parameter.  Always replace normal subreg with memory reference
whenever possible.  Return NULL otherwise.
(find_reloads_toplev): Always call find_reloads_subreg_address
for subregs of registers equivalent to a memory location.
Only recurse further if find_reloads_subreg_address fails.
(find_reloads_address_1): Only call find_reloads_subreg_address
for subregs of registers equivalent to a memory location.
Properly handle failure of find_reloads_subreg_address.

Index: gcc/reload.c
===
*** gcc/reload.c	(revision 191665)
--- gcc/reload.c	(working copy)
*** static int find_reloads_address_1 (enum 
*** 282,288 
  static void find_reloads_address_part (rtx, rtx *, enum reg_class,
   enum machine_mode, int,
   enum reload_type, int);
! static rtx find_reloads_subreg_address (rtx, int, int, enum reload_type,
int, rtx, int *);
  static void copy_replacements_1 (rtx *, rtx *, int);
  static int find_inc_amount (rtx, rtx);
--- 282,288 
  static void find_reloads_address_part (rtx, rtx *, enum reg_class,
   enum machine_mode, int,
   enum reload_type, int);
! static rtx find_reloads_subreg_address (rtx, int, enum reload_type,
int, rtx, int *);
  static void copy_replacements_1 (rtx *, rtx *, int);
  static int find_inc_amount (rtx, rtx);
*** find_reloads_toplev (rtx x, int opnum, e
*** 4810,4840 
}
  
/* If the subreg contains a reg that will be converted to a mem,
!convert the subreg to a narrower memref now.
!Otherwise, we would get (subreg (mem ...) ...),
!which would force reload of the mem.
! 
!We also need to do this if there is an equivalent MEM that is
!not offsettable.  In that case, alter_subreg would produce an
!invalid address on big-endian machines.
! 
!For machines that extend byte loads, we must not reload using
!a wider mode if we have a paradoxical SUBREG.  find_reloads will
!force a reload in that case.  So we should not do anything here.  */
  
if (regno >= FIRST_PSEUDO_REGISTER
! #ifdef LOAD_EXTEND_OP
! && !paradoxical_subreg_p (x)
! #endif
! && (reg_equiv_address (regno) != 0
! || (reg_equiv_mem (regno) != 0
! && (! strict_memory_address_addr_space_p
! (GET_MODE (x), XEXP (reg_equiv_mem (regno), 0),
!  MEM_ADDR_SPACE (reg_equiv_mem (regno)))
! || ! offsettable_memref_p (reg_equiv_mem (regno))
! || num_not_at_initial_offset
!   x = find_reloads_subreg_address (x, 1, opnum, type, ind_levels,
!  insn, address_reloaded);
  }
  
for (copied = 0, i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
--- 4810,4828 
}
  
/* If the subreg contains a reg that will be converted to a mem,
!attempt to convert the whole subreg to a (narrower or wider)
!memory reference instead.  If this succeeds, we're done --
!otherwise fall through to check whether the inner reg still
!needs address reloads anyway.  */
  
if (regno >= FIRST_PSEUDO_REGISTER
! && reg_equiv_memory_loc (regno) != 0)
!   {
! tem = find_reloads_subreg_address (x, opnum, type, ind_levels,
!insn, address_reloaded);
! if (tem)
!   return tem;
!   }
  }
  
for (copied = 0, i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
*** find_reloads_address_1 (enum machine_mod
*** 6070,6081 
  if (ira_reg_class_max_nregs 

Re: Ping: RFA: Improve doloop_begin support

2012-10-16 Thread Zdenek Dvorak
Hi,

> The loop appears to be entered at the top, yet both my original and your test
> fail.
> 
> Looking what happens with your test in more detail, I find that
> 
>  loop->latch == desc->in_edge->dest
> 
> is true, but forwarder_block_p (loop->latch) fails:
> 
> 562   if (dest->loop_father->header == dest)
> (gdb)
> 563 return false;
> (gdb) p dest
> $7 = (basic_block) 0xb7be8198
> (gdb) p dest->loop_father->header
> $8 = (basic_block) 0xb7be8198
> 
> The comment in front of this code says:
>   /* Protect loop latches, headers and preheaders.  */
> 
> So, presumably, the loop latch will remain a forwarder block precisely because
> forwarder_block_p denies that it is one.
> 
> So, may I just write:
> entered_at_top = (loop->latch == desc->in_edge->dest);

no -- you should also test that latch contains no active insns.  I.e., factorize
out whatever forwarder_block_p does except for the
"(dest->loop_father->header == dest)" test,

Zdenek


Re: [commit] Re-work find_reloads_subreg_address

2012-10-16 Thread Tejas Belagod

Ulrich Weigand wrote:

Tejas Belagod wrote:

Ulrich Weigand wrote:

The following patch implements this idea; it passes a basic regression
test on arm-linux-gnueabi.  (Obviously this would need a lot more
testing on various platforms before getting into mainline ...)

Can you have a look whether this fixes the problem you're seeing?

[snip]


Sorry for the delay in replying. These new issues that I was seeing were
bugs specific to my code that I've fixed. Your patch seems to work fine.
Thanks a lot for the patch.


I've now checked the patch in to mainline.   In addition to your test on
aarch64, I've completed tests without regression on ppc(64)-linux,
s390(x)-linux, and i386-linux (with Uros' patch).

Note that Uros' patch here:
http://gcc.gnu.org/ml/gcc-patches/2012-10/msg01463.html
is a pre-requisite for the reload patch to avoid regressions on i386.

Current version (nearly unchanged) of the patch appended below.



Thank you.

Tejas Belagod,
ARM.



Re: Ping: RFA: Improve doloop_begin support

2012-10-16 Thread Joern Rennecke

Quoting Zdenek Dvorak :

no -- you should also test that latch contains no active insns.
I.e., factorize
out whatever forwarder_block_p does except for the
"(dest->loop_father->header == dest)" test,


Like this?

* basic-block.h (forwarder_block_p_1): Declare.
* cfgrtl.c (forwarder_block_p_1): New function, factored out of ...
(forwarder_block_p): ... here.





--- cfgrtl.c-old	2012-10-16 15:21:06.025532573 +0100
+++ cfgrtl.c	2012-10-16 15:23:23.135529867 +0100
@@ -541,10 +541,9 @@ flow_active_insn_p (const_rtx insn)
 
 /* Return true if the block has no effect and only forwards control flow to
its single destination.  */
-/* FIXME: Make this a cfg hook.  */
 
 bool
-forwarder_block_p (const_basic_block bb)
+forwarder_block_p_1 (const_basic_block bb)
 {
   rtx insn;
 
@@ -552,6 +551,24 @@ forwarder_block_p (const_basic_block bb)
   || !single_succ_p (bb))
 return false;
 
+  for (insn = BB_HEAD (bb); insn != BB_END (bb); insn = NEXT_INSN (insn))
+if (INSN_P (insn) && flow_active_insn_p (insn))
+  return false;
+
+  return (!INSN_P (insn)
+ || (JUMP_P (insn) && simplejump_p (insn))
+ || !flow_active_insn_p (insn));
+}
+
+/* Likewise, but protect loop latches, headers and preheaders.  */
+/* FIXME: Make this a cfg hook.  */
+
+bool
+forwarder_block_p (const_basic_block bb)
+{
+  if (!forwarder_block_p_1 (bb))
+return false;
+
   /* Protect loop latches, headers and preheaders.  */
   if (current_loops)
 {
@@ -563,13 +580,7 @@ forwarder_block_p (const_basic_block bb)
return false;
 }
 
-  for (insn = BB_HEAD (bb); insn != BB_END (bb); insn = NEXT_INSN (insn))
-if (INSN_P (insn) && flow_active_insn_p (insn))
-  return false;
-
-  return (!INSN_P (insn)
- || (JUMP_P (insn) && simplejump_p (insn))
- || !flow_active_insn_p (insn));
+  return true;
 }
 
 /* Return nonzero if we can reach target from src by falling through.  */


Re: Ping: RFA: Improve doloop_begin support

2012-10-16 Thread Zdenek Dvorak
> Quoting Zdenek Dvorak :
> 
> >no -- you should also test that latch contains no active insns.
> >I.e., factorize
> >out whatever forwarder_block_p does except for the
> >"(dest->loop_father->header == dest)" test,
> 
> Like this?
> 
> * basic-block.h (forwarder_block_p_1): Declare.
> * cfgrtl.c (forwarder_block_p_1): New function, factored out of ...
> (forwarder_block_p): ... here.

yes (except maybe giving it some more descriptive name than forwarder_block_p_1,
say "contains_no_active_insn_p" or something),

Zdenek


Re: Ping: RFA: Improve doloop_begin support

2012-10-16 Thread Richard Biener
On Tue, Oct 16, 2012 at 4:30 PM, Joern Rennecke
 wrote:
> Quoting Zdenek Dvorak :
>
>> no -- you should also test that latch contains no active insns.   I.e.,
>> factorize
>> out whatever forwarder_block_p does except for the
>> "(dest->loop_father->header == dest)" test,
>
>
> Like this?

ISTR the tree level has a predicate like can_remove_forwarder_block
and forwarder_block_p.  I suppose on the RTL level they are merged
for cfgcleanup.

> * basic-block.h (forwarder_block_p_1): Declare.
> * cfgrtl.c (forwarder_block_p_1): New function, factored out of ...
> (forwarder_block_p): ... here.
>
>
>
>
>
>
> --- cfgrtl.c-old	2012-10-16 15:21:06.025532573 +0100
> +++ cfgrtl.c	2012-10-16 15:23:23.135529867 +0100
> @@ -541,10 +541,9 @@ flow_active_insn_p (const_rtx insn)
>
>  /* Return true if the block has no effect and only forwards control flow to
> its single destination.  */
> -/* FIXME: Make this a cfg hook.  */
>
>  bool
> -forwarder_block_p (const_basic_block bb)
> +forwarder_block_p_1 (const_basic_block bb)
>  {
>rtx insn;
>
> @@ -552,6 +551,24 @@ forwarder_block_p (const_basic_block bb)
>|| !single_succ_p (bb))
>  return false;
>
> +  for (insn = BB_HEAD (bb); insn != BB_END (bb); insn = NEXT_INSN (insn))
> +if (INSN_P (insn) && flow_active_insn_p (insn))
> +  return false;
> +
> +  return (!INSN_P (insn)
> + || (JUMP_P (insn) && simplejump_p (insn))
> + || !flow_active_insn_p (insn));
> +}
> +
> +/* Likewise, but protect loop latches, headers and preheaders.  */
> +/* FIXME: Make this a cfg hook.  */
> +
> +bool
> +forwarder_block_p (const_basic_block bb)
> +{
> +  if (!forwarder_block_p_1 (bb))
> +return false;
> +
>/* Protect loop latches, headers and preheaders.  */
>if (current_loops)
>  {
> @@ -563,13 +580,7 @@ forwarder_block_p (const_basic_block bb)
> return false;
>  }
>
> -  for (insn = BB_HEAD (bb); insn != BB_END (bb); insn = NEXT_INSN (insn))
> -if (INSN_P (insn) && flow_active_insn_p (insn))
> -  return false;
> -
> -  return (!INSN_P (insn)
> - || (JUMP_P (insn) && simplejump_p (insn))
> - || !flow_active_insn_p (insn));
> +  return true;
>  }
>
>  /* Return nonzero if we can reach target from src by falling through.  */
>


Re: LangEnabledBy with arguments

2012-10-16 Thread Joseph S. Myers
On Sun, 14 Oct 2012, Manuel López-Ibáñez wrote:

> 2012-10-14  Manuel López-Ibáñez  
> 
>   PR c/53063
>   PR c/40989
> gcc/
>   * optc-gen.awk: Handle new form of LangEnabledBy.
>   * opts.c (set_Wstrict_aliasing): Declare here. Make static.
>   * common.opt (Wstrict-aliasing=,Wstrict-overflow=): Do not use Init.
>   * doc/options.texi (LangEnabledBy): Document new form.
>   * flags.h (set_Wstrict_aliasing): Do not declare.
> c-family/
>   * c.opt (Wstrict-aliasing=,Wstrict-overflow=): Use LangEnabledBy.
>   * c-opts.c (c_common_handle_option): Do not set them here. Add
>   comment.
>   (c_common_post_options): Likewise.
> testsuite/
>   * gcc.dg/Wstrict-overflow-24.c: New.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH][RFC] Re-organize how we stream trees in LTO

2012-10-16 Thread Richard Biener

This patch shows work-in-progress (read: implementation ugliness
hopefully to vanish ...) regarding moving LTO type merging
work from WPA to compile stage.

The patch re-organizes lto_output_tree (the write_tree streamer
hook for LTO) in a way so that we output all tree fields
in easy to discover places.  This means that we have forward
references to trees not yet (fully) written, something the
current state avoids by "inline-expanding" forward referenced
trees at the point of reference (which results in interweaved
trees on disk).  To be able to achieve this lto_output_tree
now performs a DFS walk along tree edges to collect trees
that need to be output and outputs them in SCC chunks
in a new LTO stream container, LTO_tree_scc.

This will allow the reader side at WPA stage to call uniquify
nodes on each tree SCC, avoiding completely the DFS walks done
there (hopefully, we do DFS walks for both hashing and comparing
there - note that WPA gathers type SCCs, not tree SCCs - a subtle
difference that remains to be seen on whether it is important ...)

One complication is how we handle streaming INTEGER_CSTs.
When an INTEGER_CST refers to an already input type we can
allocate it using the build_int_cst_wide routine which takes
care of putting it into the appropriate hashtables for caching.
If an INTEGER_CST is part of a SCC (TYPE_DOMAIN values are
the most obvious cases) we now stream them as regular trees
(they cannot already exist in any cache as the type is being
read in) - but the patch does not yet populate the caches
in case the type ends up prevailing (thus it will produce
some unshared INTEGER_CSTs for now).

One ugliness of the patch is how I misuse the streamer hooks
to switch the tree streamer routines from actually writing
tree references to performing the DFS walk.  For the DFS walk
I need information on the edge, thus a 'from' tree argument
is added to the write_tree hook.  Eventually my plan was
to transform this to a walk_tree-like interface.

Diego - is PTH still live?  Thus, do I need to bother about
inventing things in a way that can be hook-ized?  Do you
foresee any complication for PTH or C++ modules the way
I re-arranged things?  Any good (C++) ideas on how to
design this lto_walk_tree?

The patch below survives LTO bootstrap (ok, we are only
in stage2 yet ...) so it serves as suitable vehicle for
gathering statistics about tree SCC sizes.  It's probably
the state I'll build on re-organizing uniquify_nodes to
avoid the various DFS walks done there.

Any comments welcome.

Thanks,
Richard.

Index: trunk/gcc/streamer-hooks.h
===
*** trunk.orig/gcc/streamer-hooks.h 2012-10-15 16:17:56.0 +0200
--- trunk/gcc/streamer-hooks.h  2012-10-16 14:42:15.571464478 +0200
*** struct streamer_hooks {
*** 44,50 
   tree itself.  The second boolean parameter specifies this for
   the tree itself, the first for all siblings that are streamed.
   The referencing mechanism is up to each streamer to implement.  */
!   void (*write_tree) (struct output_block *, tree, bool, bool);
  
/* [REQ] Called by every tree streaming routine that needs to read
   a tree node.  It takes two arguments: an lto_input_block pointing
--- 44,50 
   tree itself.  The second boolean parameter specifies this for
   the tree itself, the first for all siblings that are streamed.
   The referencing mechanism is up to each streamer to implement.  */
!   void (*write_tree) (struct output_block *, tree, tree, bool, bool);
  
/* [REQ] Called by every tree streaming routine that needs to read
   a tree node.  It takes two arguments: an lto_input_block pointing
*** struct streamer_hooks {
*** 61,70 
  };
  
  #define stream_write_tree(OB, EXPR, REF_P) \
! streamer_hooks.write_tree(OB, EXPR, REF_P, REF_P)
  
  #define stream_write_tree_shallow_non_ref(OB, EXPR, REF_P) \
! streamer_hooks.write_tree(OB, EXPR, REF_P, false)
  
  #define stream_read_tree(IB, DATA_IN) \
  streamer_hooks.read_tree(IB, DATA_IN)
--- 61,76 
  };
  
  #define stream_write_tree(OB, EXPR, REF_P) \
! streamer_hooks.write_tree(OB, NULL_TREE, EXPR, REF_P, REF_P)
  
  #define stream_write_tree_shallow_non_ref(OB, EXPR, REF_P) \
! streamer_hooks.write_tree(OB, NULL_TREE, EXPR, REF_P, false)
! 
! #define stream_write_tree_edge(OB, FROM, TO, REF_P) \
! streamer_hooks.write_tree(OB, FROM, TO, REF_P, REF_P)
! 
! #define stream_write_tree_edge_shallow_non_ref(OB, FROM, TO, REF_P) \
! streamer_hooks.write_tree(OB, FROM, TO, REF_P, false)
  
  #define stream_read_tree(IB, DATA_IN) \
  streamer_hooks.read_tree(IB, DATA_IN)
Index: trunk/gcc/tree-streamer-out.c
===
*** trunk.orig/gcc/tree-streamer-out.c  2012-10-15 16:17:56.0 +0200
--- trunk/gcc/tree-streamer-out.c   2012-10-16 15:44:51.688240278 +0200
*** pack_t

Re: EnabledBy(Wa && Wb)

2012-10-16 Thread Joseph S. Myers
On Mon, 15 Oct 2012, Manuel López-Ibáñez wrote:

> 2012-10-15  Manuel López-Ibáñez  
> 
>   PR c/53063
>   PR c/40989
>   * doc/options.texi (EnabledBy): Document new form.
>   * optc-gen.awk: Handle new form of EnabledBy.
>   * common.opt (Wunused-but-set-parameter): Use EnabledBy.
>   (Wunused-parameter): Likewise.
>   * opts.c (finish_options): Do not handle them explicitly.
>   * opt-functions.awk (search_var_name): New.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] Rs6000 infrastructure cleanup (switches), revised patch #4

2012-10-16 Thread Joseph S. Myers
On Mon, 15 Oct 2012, Michael Meissner wrote:

> > I think changing the names is safer - it's immediately obvious as a build 
> > failure if you missed anything.  If you have MASK_* names for bits in more 
> > than one flags variable, there's a risk of accidentally testing a bit in 
> > the wrong variable, or ORing together bits that belong in different 
> > variables in a way that can't possibly work, without this causing 
> > immediately visible problems.  Maybe you're actually only using the names 
> > for a single variable, but it still seems error-prone for future changes.
> 
> Well to be safest, we should have a prefix for each word if you define more
> than one flag word.  Preferably a name that the user can specify in the .opt
> file.

Yes, for MASK_*.

> > I guess TARGET_* names should be safe in a way that MASK_* ones aren't for 
> > multiple variables - but then I wouldn't have options to do things two 
> > different ways, but instead use TARGET_* instead of OPTION_* and fix 
> > existing uses of OPTION_* for such bits.
> 
> I can see the MASK/OPTION_MASK thing, but not having TARGET_* means there are
> lots and lots of code changes.
> 
> Unfortunately in order to bring the number of changes down to a point where 
> the
> patches can be reviewed, my previous patches did:
> 
> #define TARGET_FOO OPTION_FOO
> #define MASK_FOO OPTION_MASK_FOO

The first of those #defines might be an intermediate step towards actually 
replacing OPTION_FOO by TARGET_FOO everywhere (since there seems to be no 
actual need for the different naming convention there, only for the 
masks).  But I don't really think we should delay the mechanical 
replacement much (changing all OPTION_* that aren't OPTION_MASK_* to be 
TARGET_* should be a straightforward change to make automatically, 
although a pain to review the results of that substitution so maybe best 
kept in a separate patch from one doing anything more substantive).

That is:

1. Patch adding TARGET_FOO aliases for OPTION_FOO (small change to the awk 
scripts and associated documentation, I expect).

2. Large, mechanical, automatically generated patch to change existing 
OPTION_FOO users (or maybe one such patch per target).

3. Patch removing the OPTION_FOO name (small change to awk scripts and 
documentation).

Then you've eliminated one unnecessary cause of changes when moving bits 
out of target_flags.

> If TargetName were defined, it would use TARGET_ instead of OPTION_,
> but the OPTION_MASK_ would not be changed.

Not needed, given the above sequence of changes.

> If SetFunction was defined, the opt*.awk files would generate:
> 
> #define SET_FOO(VALUE)                \
> do {                                  \
>   if (VALUE)                          \
>     target_flags |= MASK_FOO;         \
>   else                                \
>     target_flags &= ~MASK_FOO;        \
> } while (0)
> 
> If ExplicitFunction was defined, the opt*.awk files would generate:
> 
> #define EXPLICIT_FOO(VALUE)   \
>   ((global_options_set.x_target_flags & MASK_FOO) != 0)

I'd like any such new macros to take an argument that's the pointer to the 
relevant options structure (global_options, global_options_set).  If the 
place where the macro is called has a pointer available, then it can be 
passed in, otherwise pass in &global_options or &global_options_set unless 
and until such a pointer becomes available in the relevant place.

> How would you feel about SetFunction, ExplicitFunction, and the reduced
> TargetName?

The principle of having macros for setting flags or testing if they are 
explicitly set is fine, though it's not clear to me that they need any 
such special settings as SetFunction and ExplicitFunction (rather than 
being generated unconditionally).

I'd quite like the macros such as target_flags that refer to global 
options to end up not being lvalues at all.  That helps ensure that option 
settings are only modified in limited places that have options pointers.  
It would be nice eventually for such things as "optimize" and "target" 
attributes to be able to swap options structures, and to work closer to 
how options on the command line are processed - for that, you want careful 
control on what places actually modify options at all.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][RFC] Re-organize how we stream trees in LTO

2012-10-16 Thread Richard Biener
On Tue, 16 Oct 2012, Richard Biener wrote:

> 
> This patch shows work-in-progress (read: implementation uglyness
> hopefully to vanish ...) regarding to moving LTO type merging
> work from WPA to compile stage.
> 
> The patch re-organizes lto_output_tree (the write_tree streamer
> hook for LTO) in a way so that we output all tree fields
> in easy to discover places.  This means that we have forward
> references to trees not yet (fully) written, something the
> current state avoids by "inline-expanding" forward referenced
> trees at the point of reference (which results in interweaved
> trees on disk).  To be able to achieve this lto_output_tree
> now performs a DFS walk along tree edges to collect trees
> that need to be output and outputs them in SCC chunks
> in a new LTO stream container, LTO_tree_scc.
> 
> This will allow the reader side at WPA stage to call uniquify
> nodes on each tree SCC, avoiding completely the DFS walks done
> there (hopefully, we do DFS walks for both hashing and comparing
> there - note that WPA gathers type SCCs, not tree SCCs - a subtle
> difference that remains to be seen on whether it is important ...)
> 
> One complication is how we handle streaming INTEGER_CSTs.
> When an INTEGER_CST refers to an already input type we can
> allocate it using the build_int_cst_wide routine which takes
> care of putting it into the appropriate hashtables for caching.
> If an INTEGER_CST is part of an SCC (TYPE_DOMAIN values are
> the most obvious cases) we now stream them as regular trees
> (they cannot already exist in any cache as the type is being
> read in) - but the patch does not yet populate the caches
> in case the type ends up prevailing (thus it will produce
> some unshared INTEGER_CSTs for now).
> 
> One ugliness of the patch is how I mis-use the streamer hooks
> to switch the tree streamer routines from actually writing
> tree references to performing the DFS walk.  For the DFS walk
> I need information on the edge, thus a 'from' tree argument
> is added to the write_tree hook.  Eventually my plan was
> to transform this to a walk_tree-like interface.
> 
> Diego - is PTH still live?  Thus, do I need to bother about
> inventing things in a way that can be hook-ized?  Do you
> foresee any complication for PTH or C++ modules the way
> I re-arranged things?  Any good (C++) ideas on how to
> design this lto_walk_tree?
> 
> The patch below survives LTO bootstrap (ok, we are only
> in stage2 yet ...) so it serves as suitable vehicle for
> gathering statistics about tree SCC sizes.  It's probably
> the state I'll build on re-organizing uniquify_nodes to
> avoid the various DFS walks done there.

To followup myself, here are statistics for WPAing cc1:

There are a total of 2530667 SCCs, 2405450 of them are
singletons (that's 95%), 98% have less than 5 trees.
The biggest SCCs have 4178 trees.

2405450 1
  31929 2
  35922 3
  15554 4
   7648 5
   6288 6
   5865 7
   4677 8
   1819 9
   3529 10
    659 11
   1467 12
    916 13
    320 14
    305 15
    940 16
    138 17
    815 18
    434 19
    341 20
    109 21
    375 22
...
      1 1452
      1 1466
      1 1470
      1 1544
    260 1892
      1 2042
      5 2090
     24 4178

LTO file-sizes grow with the current patch and it's certainly
worth optimizing the singleton case some more.  It will definitely
be worth optimizing singletons on the merging side as well.

The above info was collected by adding

  if (flag_wpa)
fprintf (stderr, "%lu\n", size);

to the LTO_tree_scc path in lto_input_tree.

Richard.


[AARCH64] Fix ICE in aarch64_split_doubleword_move

2012-10-16 Thread Marcus Shawcroft
I've just committed this patch to aarch64-trunk to resolve an ICE in 
aarch64_split_doubleword_move when attempting to split v->v moves.


/Marcus

2012-10-16  Marcus Shawcroft 



   * config/aarch64/aarch64-protos.h (aarch64_split_doubleword_move):
   Rename to aarch64_split_128bit_move.
   (aarch64_split_128bit_move_p): New.
   * config/aarch64/aarch64.c (aarch64_split_doubleword_move):
   Rename to aarch64_split_128bit_move.
   (aarch64_split_128bit_move_p): New.
   * config/aarch64/aarch64.md: Adjust TImode move split.

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index e6d35e4..712d2f6 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -245,7 +245,9 @@ void aarch64_simd_lane_bounds (rtx, HOST_WIDE_INT, HOST_WIDE_INT);
 /* Emit code for reinterprets.  */
 void aarch64_simd_reinterpret (rtx, rtx);
 
-void aarch64_split_doubleword_move (rtx, rtx);
+void aarch64_split_128bit_move (rtx, rtx);
+
+bool aarch64_split_128bit_move_p (rtx, rtx);
 
 #if defined (RTX_CODE)
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a766b7e..b36be90 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -537,7 +537,7 @@ aarch64_emit_move (rtx dest, rtx src)
 }
 
 void
-aarch64_split_doubleword_move (rtx dst, rtx src)
+aarch64_split_128bit_move (rtx dst, rtx src)
 {
   rtx low_dst;
 
@@ -569,7 +569,7 @@ aarch64_split_doubleword_move (rtx dst, rtx src)
 	}
   /* Fall through to r -> r cases.  */
 }
-
+
   low_dst = gen_lowpart (word_mode, dst);
   if (REG_P (low_dst)
   && reg_overlap_mentioned_p (low_dst, src))
@@ -586,6 +586,13 @@ aarch64_split_doubleword_move (rtx dst, rtx src)
 }
 }
 
+bool
+aarch64_split_128bit_move_p (rtx dst, rtx src)
+{
+  return (! REG_P (src)
+	  || ! (FP_REGNUM_P (REGNO (dst)) && FP_REGNUM_P (REGNO (src))));
+}
+
 static rtx
 aarch64_force_temporary (rtx x, rtx value)
 {
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 5c92a5b..1669726 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -947,10 +947,10 @@
 (define_split
[(set (match_operand:TI 0 "register_operand" "")
 	 (match_operand:TI 1 "aarch64_reg_or_imm" ""))]
-  "reload_completed"
+  "reload_completed && aarch64_split_128bit_move_p (operands[0], operands[1])"
   [(const_int 0)]
 {
-  aarch64_split_doubleword_move (operands[0], operands[1]);
+  aarch64_split_128bit_move (operands[0], operands[1]);
   DONE;
 })
 
-- 
1.7.12.rc0.22.gcdd159b


Re: [lra] patch to revert a code from previous patch.

2012-10-16 Thread Vladimir Makarov

On 10/16/2012 02:44 AM, Richard Sandiford wrote:

Vladimir Makarov  writes:

On 12-10-15 4:25 PM, Richard Sandiford wrote:

Richard Sandiford  writes:

Vladimir Makarov  writes:

 After committing a patch yesterday to implement proposals from a
review, I found that GCC crashes on SPEC2000 gap.  LRA is trying to find
a mode of operand (const_int 1) in *lea_general_1 insn and can not find
it as the operand and insn template operand has VOIDmode.

There are still cases when context lookup is necessary to find a mode of
the operand.  So I am reversing the change I did yesterday.

The patch is committed as rev. 192462.

2012-10-15  Vladimir Makarov  

   * lra-int.h (lra_get_mode): Remove.
   * lra-constraints.c (find_mode, get_op_mode): New functions.
   (match_reload): Use get_op_mode instead of lra_get_mode.
   (process_alt_operands, curr_insn_transform): Ditto.

But my objection to this code still stands.  It's wrong to assume
that an operand to an rtx has the same mode as the containing rtx.

Please add a testcase that shows the problem.

(...because I was hoping to have a look myself).  But if that's too
difficult to reduce, then which operand to *lea_general_1 was the problem?
The pattern looks like:

(define_insn_and_split "*lea_general_1"
[(set (match_operand 0 "register_operand" "=r")
(plus (plus (match_operand 1 "index_register_operand" "l")
(match_operand 2 "register_operand" "r"))
  (match_operand 3 "immediate_operand" "i")))]

So operands 0, 1 and 2 should have been registers.  Operand 3 never
needs reloading, so its mode shouldn't matter.


In this case the const needs a reload as it was a pseudo substituted by
equiv constant.

But in that case the operand modes we needed were there in the original
instruction operands.  If equiv substitution is causing us to lose track
of those operand modes, then that needs to be fixed.

If the pattern had instead been:

   (set (match_operand:SI 0 "register_operand" "=r")
(unspec:SI [(match_operand 1 "register_operand" "r")] UNSPEC_FOO))

and equiv substitution replaced operand 1 with a const_int without
the original mode being recorded anywhere, then we'd have no way
of recovering that mode from the pattern itself, because the modes
of unspec parameters are entirely target-specific.

I remember I thought about this.  The case is rare, and it is better to 
compute the mode than to keep a mode for each operand of each insn.  LRA's 
speed is achieved by minimizing the data kept per operand; we keep only 
locations.  I believe this is the better approach.




[go] builtins housekeeping; add __builtin_unreachable

2012-10-16 Thread Jan Hubicka
Hi,
this patch updates the Go frontend to define the unreachable builtin I need
for loop and LTO optimizations.  I also noticed that Go ignores the existence
of all flags except CONST, and thus I synchronized the flags with the C FE
variants.

(I plan to use set_call_expr_flags in other FEs too)

Regtested x86_64-linux, OK?

Honza

* gofrontend/gogo-tree.cc (define_builtin): Accept ECF flags and
use set_call_expr_flags.
(define_builtin_function_trees): Update; add BUILT_IN_UNREACHABLE.

* calls.c (set_call_expr_flags): New.
* tree.h (set_call_expr_flags): Declare.
Index: go/gofrontend/gogo-tree.cc
===
--- go/gofrontend/gogo-tree.cc  (revision 192483)
+++ go/gofrontend/gogo-tree.cc  (working copy)
@@ -50,24 +50,22 @@ static std::map built
 // defined by builtins.def.  NAME is the name of the builtin function.
 // LIBNAME is the name of the corresponding library function, and is
 // NULL if there isn't one.  FNTYPE is the type of the function.
-// CONST_P is true if the function has the const attribute.
+// FLAGS specify function attributes.
 
 static void
 define_builtin(built_in_function bcode, const char* name, const char* libname,
-  tree fntype, bool const_p)
+  tree fntype, int flags)
 {
   tree decl = add_builtin_function(name, fntype, bcode, BUILT_IN_NORMAL,
   libname, NULL_TREE);
-  if (const_p)
-TREE_READONLY(decl) = 1;
   set_builtin_decl(bcode, decl, true);
+  set_call_expr_flags (decl, flags);
   builtin_functions[name] = decl;
   if (libname != NULL)
 {
   decl = add_builtin_function(libname, fntype, bcode, BUILT_IN_NORMAL,
  NULL, NULL_TREE);
-  if (const_p)
-   TREE_READONLY(decl) = 1;
+  set_call_expr_flags (decl, flags);
   builtin_functions[libname] = decl;
 }
 }
@@ -82,22 +80,22 @@ Gogo::define_builtin_function_trees()
   tree t = go_type_for_size(BITS_PER_UNIT, 1);
   tree p = build_pointer_type(build_qualified_type(t, TYPE_QUAL_VOLATILE));
   define_builtin(BUILT_IN_SYNC_ADD_AND_FETCH_1, "__sync_fetch_and_add_1", NULL,
-build_function_type_list(t, p, t, NULL_TREE), false);
+build_function_type_list(t, p, t, NULL_TREE), ECF_LEAF | 
ECF_NOTHROW);
 
   t = go_type_for_size(BITS_PER_UNIT * 2, 1);
   p = build_pointer_type(build_qualified_type(t, TYPE_QUAL_VOLATILE));
   define_builtin (BUILT_IN_SYNC_ADD_AND_FETCH_2, "__sync_fetch_and_add_2", 
NULL,
- build_function_type_list(t, p, t, NULL_TREE), false);
+ build_function_type_list(t, p, t, NULL_TREE), ECF_LEAF | 
ECF_NOTHROW);
 
   t = go_type_for_size(BITS_PER_UNIT * 4, 1);
   p = build_pointer_type(build_qualified_type(t, TYPE_QUAL_VOLATILE));
   define_builtin(BUILT_IN_SYNC_ADD_AND_FETCH_4, "__sync_fetch_and_add_4", NULL,
-build_function_type_list(t, p, t, NULL_TREE), false);
+build_function_type_list(t, p, t, NULL_TREE), ECF_LEAF | 
ECF_NOTHROW);
 
   t = go_type_for_size(BITS_PER_UNIT * 8, 1);
   p = build_pointer_type(build_qualified_type(t, TYPE_QUAL_VOLATILE));
   define_builtin(BUILT_IN_SYNC_ADD_AND_FETCH_8, "__sync_fetch_and_add_8", NULL,
-build_function_type_list(t, p, t, NULL_TREE), false);
+build_function_type_list(t, p, t, NULL_TREE), ECF_LEAF | 
ECF_NOTHROW);
 
   // We use __builtin_expect for magic import functions.
   define_builtin(BUILT_IN_EXPECT, "__builtin_expect", NULL,
@@ -105,7 +103,13 @@ Gogo::define_builtin_function_trees()
  long_integer_type_node,
  long_integer_type_node,
  NULL_TREE),
-true);
+ECF_CONST | ECF_NOTHROW | ECF_LEAF);
+
+  // The middle-end uses __builtin_unreachable.
+  define_builtin(BUILT_IN_UNREACHABLE, "__builtin_unreachable", NULL,
+build_function_type(void_type_node,
+void_list_node),
+ECF_NOTHROW | ECF_LEAF | ECF_NORETURN);
 
   // We use __builtin_memcmp for struct comparisons.
   define_builtin(BUILT_IN_MEMCMP, "__builtin_memcmp", "memcmp",
@@ -114,7 +118,8 @@ Gogo::define_builtin_function_trees()
  const_ptr_type_node,
  size_type_node,
  NULL_TREE),
-false);
+ECF_LEAF | ECF_NOTHROW | ECF_PURE);
+  /* TODO: add non-null attribute.  */
 
   // We provide some functions for the math library.
   tree math_function_type = build_function_type_list(double_type_node,
@@ -131,93 +136,93 @@ Gogo::define_builtin_function_trees()
 build_function_type_list(long_double_type_node, long_double_type_node,
 long_double_type_node, NULL_TREE);
   define_builtin(BUILT_IN_ACOS, "__builtin

[Java] Tidy builtins and add __builtin_unreachable

2012-10-16 Thread Jan Hubicka

Hi,
this patch updates the Java frontend to declare __builtin_unreachable and also
fixes the flags of the synchronization builtins to match the ones from the C FE.

Regtested x86_64-linux, OK?

Honza
* builtins.c (define_builtin): Accept ECF flags and
use set_call_expr_flags.
(initialize_builtins): Update; add BUILT_IN_UNREACHABLE.

* calls.c (set_call_expr_flags): New.
* tree.h (set_call_expr_flags): Declare.
Index: java/builtins.c
===
--- java/builtins.c (revision 192483)
+++ java/builtins.c (working copy)
@@ -451,8 +451,6 @@ VMSupportsCS8_builtin (tree method_retur
 
 
 
-#define BUILTIN_NOTHROW 1
-#define BUILTIN_CONST 2
 /* Define a single builtin.  */
 static void
 define_builtin (enum built_in_function val,
@@ -471,10 +469,7 @@ define_builtin (enum built_in_function v
   pushdecl (decl);
   DECL_BUILT_IN_CLASS (decl) = BUILT_IN_NORMAL;
   DECL_FUNCTION_CODE (decl) = val;
-  if (flags & BUILTIN_NOTHROW)
-TREE_NOTHROW (decl) = 1;
-  if (flags & BUILTIN_CONST)
-TREE_READONLY (decl) = 1;
+  set_call_expr_flags (decl, flags);
 
   set_builtin_decl (val, decl, true);
 }
@@ -488,6 +483,7 @@ initialize_builtins (void)
   tree double_ftype_double, double_ftype_double_double;
   tree float_ftype_float_float;
   tree boolean_ftype_boolean_boolean;
+  tree void_ftype;
   int i;
 
   for (i = 0; java_builtins[i].builtin_code != END_BUILTINS; ++i)
@@ -512,49 +508,49 @@ initialize_builtins (void)
double_type_node, double_type_node, NULL_TREE);
 
   define_builtin (BUILT_IN_FMOD, "__builtin_fmod",
- double_ftype_double_double, "fmod", BUILTIN_CONST);
+ double_ftype_double_double, "fmod", ECF_CONST);
   define_builtin (BUILT_IN_FMODF, "__builtin_fmodf",
- float_ftype_float_float, "fmodf", BUILTIN_CONST);
+ float_ftype_float_float, "fmodf", ECF_CONST);
 
   define_builtin (BUILT_IN_ACOS, "__builtin_acos",
  double_ftype_double, "_ZN4java4lang4Math4acosEJdd",
- BUILTIN_CONST);
+ ECF_CONST);
   define_builtin (BUILT_IN_ASIN, "__builtin_asin",
  double_ftype_double, "_ZN4java4lang4Math4asinEJdd",
- BUILTIN_CONST);
+ ECF_CONST);
   define_builtin (BUILT_IN_ATAN, "__builtin_atan",
  double_ftype_double, "_ZN4java4lang4Math4atanEJdd",
- BUILTIN_CONST);
+ ECF_CONST);
   define_builtin (BUILT_IN_ATAN2, "__builtin_atan2",
  double_ftype_double_double, "_ZN4java4lang4Math5atan2EJddd",
- BUILTIN_CONST);
+ ECF_CONST);
   define_builtin (BUILT_IN_CEIL, "__builtin_ceil",
  double_ftype_double, "_ZN4java4lang4Math4ceilEJdd",
- BUILTIN_CONST);
+ ECF_CONST);
   define_builtin (BUILT_IN_COS, "__builtin_cos",
  double_ftype_double, "_ZN4java4lang4Math3cosEJdd",
- BUILTIN_CONST);
+ ECF_CONST);
   define_builtin (BUILT_IN_EXP, "__builtin_exp",
  double_ftype_double, "_ZN4java4lang4Math3expEJdd",
- BUILTIN_CONST);
+ ECF_CONST);
   define_builtin (BUILT_IN_FLOOR, "__builtin_floor",
  double_ftype_double, "_ZN4java4lang4Math5floorEJdd",
- BUILTIN_CONST);
+ ECF_CONST);
   define_builtin (BUILT_IN_LOG, "__builtin_log",
  double_ftype_double, "_ZN4java4lang4Math3logEJdd",
- BUILTIN_CONST);
+ ECF_CONST);
   define_builtin (BUILT_IN_POW, "__builtin_pow",
  double_ftype_double_double, "_ZN4java4lang4Math3powEJddd",
- BUILTIN_CONST);
+ ECF_CONST);
   define_builtin (BUILT_IN_SIN, "__builtin_sin",
  double_ftype_double, "_ZN4java4lang4Math3sinEJdd",
- BUILTIN_CONST);
+ ECF_CONST);
   define_builtin (BUILT_IN_SQRT, "__builtin_sqrt",
  double_ftype_double, "_ZN4java4lang4Math4sqrtEJdd",
- BUILTIN_CONST);
+ ECF_CONST);
   define_builtin (BUILT_IN_TAN, "__builtin_tan",
  double_ftype_double, "_ZN4java4lang4Math3tanEJdd",
- BUILTIN_CONST);
+ ECF_CONST);
   
   boolean_ftype_boolean_boolean
 = build_function_type_list (boolean_type_node,
@@ -563,28 +559,34 @@ initialize_builtins (void)
   define_builtin (BUILT_IN_EXPECT, "__builtin_expect", 
  boolean_ftype_boolean_boolean,
  "__builtin_expect",
- BUILTIN_CONST | BUILTIN_NOTHROW);
+ ECF_CONST | ECF_NOTHROW);
+  void_ftype
+= build_function_type (void_type_node, void_list_node);
+  define_builtin (BUILT_IN_UNREACHABLE, "__builtin_unreachable", 
+ void_ftype,
+ "__builtin_unreachable",
+

Re: [PATCH] Fix up vector CONSTRUCTOR verification ICE (PR tree-optimization/54889)

2012-10-16 Thread Jakub Jelinek
On Mon, Oct 15, 2012 at 10:38:02AM +0200, Richard Biener wrote:
> On Fri, Oct 12, 2012 at 8:16 PM, Jakub Jelinek  wrote:
> > Apparently vectorizable_load is another spot that could create vector
> > CONSTRUCTORs that wouldn't pass the new CONSTRUCTOR verification.
> >
> > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
> > ok for trunk?
> 
> You should only need this on the ARRAY_REF path (I wonder what are
> the types that have a mismatch?), for MEM_REF simply use
> build2 (MEM_REF, TREE_TYPE (vectype), ...).
> 
> Ok with that change.

Thanks, here is what I'll commit later today, bootstrapped/regtested on
x86_64-linux and i686-linux:

2012-10-15  Jakub Jelinek  

PR tree-optimization/54889
* tree-vect-stmts.c (vectorizable_load): Add VIEW_CONVERT_EXPR if
ARRAY_REF newref doesn't have compatible type with vectype element
type, use vectype element type for MEM_REF.

* gfortran.dg/pr54889.f90: New test.

--- gcc/tree-vect-stmts.c.jj2012-10-12 20:18:47.568041703 +0200
+++ gcc/tree-vect-stmts.c   2012-10-15 11:47:57.874019752 +0200
@@ -4743,12 +4743,18 @@ vectorizable_load (gimple stmt, gimple_s
  tree newref, newoff;
  gimple incr;
  if (TREE_CODE (ref) == ARRAY_REF)
-   newref = build4 (ARRAY_REF, TREE_TYPE (ref),
-unshare_expr (TREE_OPERAND (ref, 0)),
-running_off,
-NULL_TREE, NULL_TREE);
+   {
+ newref = build4 (ARRAY_REF, TREE_TYPE (ref),
+  unshare_expr (TREE_OPERAND (ref, 0)),
+  running_off,
+  NULL_TREE, NULL_TREE);
+ if (!useless_type_conversion_p (TREE_TYPE (vectype),
+ TREE_TYPE (newref)))
+   newref = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (vectype),
+newref);
+   }
  else
-   newref = build2 (MEM_REF, TREE_TYPE (ref),
+   newref = build2 (MEM_REF, TREE_TYPE (vectype),
 running_off,
 TREE_OPERAND (ref, 1));
 
--- gcc/testsuite/gfortran.dg/pr54889.f90.jj2012-10-15 11:46:13.252593971 
+0200
+++ gcc/testsuite/gfortran.dg/pr54889.f90   2012-10-15 11:46:13.252593971 
+0200
@@ -0,0 +1,10 @@
+! PR tree-optimization/54889
+! { dg-do compile }
+! { dg-options "-O3" }
+! { dg-additional-options "-mavx" { target { i?86-*-* x86_64-*-* } } }
+
+subroutine foo(x,y,z)
+  logical, pointer :: x(:,:)
+  integer :: y, z
+  x=x(1:y,1:z)
+end subroutine

Jakub


[PATCH] Fix sizeof related pt.c ICE (PR c++/54844)

2012-10-16 Thread Jakub Jelinek
Hi!

Now that SIZEOF_EXPR isn't folded immediately, e.g. REAL_TYPE can
be an argument of it, and tsubst_copy* doesn't handle that, but tsubst does.
Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
ok for trunk?

2012-10-15  Jakub Jelinek  

PR c++/54844
* pt.c (tsubst_copy, tsubst_copy_and_build) : Use
tsubst instead of tsubst_copy* on types.

* g++.dg/template/sizeof14.C: New test.

--- gcc/cp/pt.c.jj  2012-10-15 11:45:30.0 +0200
+++ gcc/cp/pt.c 2012-10-15 12:08:10.232364822 +0200
@@ -12104,8 +12104,8 @@ tsubst_copy (tree t, tree args, tsubst_f
 }
   if (SIZEOF_EXPR_TYPE_P (t))
{
- r = tsubst_copy (TREE_TYPE (TREE_OPERAND (t, 0)),
-  args, complain, in_decl);
+ r = tsubst (TREE_TYPE (TREE_OPERAND (t, 0)),
+ args, complain, in_decl);
  r = build1 (NOP_EXPR, r, error_mark_node);
  r = build1 (SIZEOF_EXPR,
  tsubst (TREE_TYPE (t), args, complain, in_decl), r);
@@ -13533,10 +13533,13 @@ tsubst_copy_and_build (tree t,
  {
++cp_unevaluated_operand;
++c_inhibit_evaluation_warnings;
-   op1 = tsubst_copy_and_build (op1, args, complain, in_decl,
-/*function_p=*/false,
-/*integral_constant_expression_p=*/
-false);
+   if (TYPE_P (op1))
+ op1 = tsubst (op1, args, complain, in_decl);
+   else
+ op1 = tsubst_copy_and_build (op1, args, complain, in_decl,
+  /*function_p=*/false,
+  /*integral_constant_expression_p=*/
+  false);
--cp_unevaluated_operand;
--c_inhibit_evaluation_warnings;
  }
--- gcc/testsuite/g++.dg/template/sizeof14.C.jj 2012-10-15 12:01:41.596492966 
+0200
+++ gcc/testsuite/g++.dg/template/sizeof14.C2012-10-15 12:01:31.0 
+0200
@@ -0,0 +1,4 @@
+// PR c++/54844
+// { dg-do compile }
+template  int fn () { return sizeof (double); }
+int var = fn <0> ();

Jakub


[fortran] add __builtin_unreachable

2012-10-16 Thread Jan Hubicka
Hi,
this patch adds __builtin_unreachable to the Fortran FE and also cleans up the
code a bit.

Bootstrapped/regtested x86_64-linux, OK?
Honza

* f95-lang.c (gfc_define_builtin): Use set_call_expr_flags.
(ATTR_NOTHROW_LEAF_MALLOC_LIST,
ATTR_CONST_NORETURN_NOTHROW_LEAF_LIST): New lists.
(gfc_init_builtin_functions): Add __builtin_unreachable;
tidy malloc/calloc.

* calls.c (set_call_expr_flags): New.
* tree.h (set_call_expr_flags): Declare.
Index: fortran/f95-lang.c
===
--- fortran/f95-lang.c  (revision 192483)
+++ fortran/f95-lang.c  (working copy)
@@ -535,9 +535,11 @@ gfc_builtin_function (tree decl)
 
 /* So far we need just these 4 attribute types.  */
 #define ATTR_NOTHROW_LEAF_LIST (ECF_NOTHROW | ECF_LEAF)
+#define ATTR_NOTHROW_LEAF_MALLOC_LIST  (ECF_NOTHROW | ECF_LEAF | ECF_MALLOC)
 #define ATTR_CONST_NOTHROW_LEAF_LIST   (ECF_NOTHROW | ECF_LEAF | ECF_CONST)
 #define ATTR_NOTHROW_LIST  (ECF_NOTHROW)
 #define ATTR_CONST_NOTHROW_LIST(ECF_NOTHROW | ECF_CONST)
+#define ATTR_CONST_NORETURN_NOTHROW_LEAF_LIST (ECF_CONST | ECF_NOTHROW | 
ECF_LEAF | ECF_NORETURN)
 
 static void
 gfc_define_builtin (const char *name, tree type, enum built_in_function code,
@@ -547,13 +550,7 @@ gfc_define_builtin (const char *name, tr
 
   decl = add_builtin_function (name, type, code, BUILT_IN_NORMAL,
   library_name, NULL_TREE);
-  if (attr & ECF_CONST)
-TREE_READONLY (decl) = 1;
-  if (attr & ECF_NOTHROW)
-TREE_NOTHROW (decl) = 1;
-  if (attr & ECF_LEAF)
-DECL_ATTRIBUTES (decl) = tree_cons (get_identifier ("leaf"),
-   NULL, DECL_ATTRIBUTES (decl));
+  set_call_expr_flags (decl, attr);
 
   set_builtin_decl (code, decl, true);
 }
@@ -908,6 +905,11 @@ gfc_init_builtin_functions (void)
   gfc_define_builtin ("__builtin_expect", ftype, BUILT_IN_EXPECT,
  "__builtin_expect", ATTR_CONST_NOTHROW_LEAF_LIST);
 
+  ftype = build_function_type_list (void_type_node, NULL_TREE);
+
+  gfc_define_builtin ("__builtin_unreachable", ftype, BUILT_IN_UNREACHABLE,
+ "__builtin_unreachable", 
ATTR_CONST_NORETURN_NOTHROW_LEAF_LIST);
+
   ftype = build_function_type_list (void_type_node,
 pvoid_type_node, NULL_TREE);
   gfc_define_builtin ("__builtin_free", ftype, BUILT_IN_FREE,
@@ -916,14 +918,12 @@ gfc_init_builtin_functions (void)
   ftype = build_function_type_list (pvoid_type_node,
 size_type_node, NULL_TREE);
   gfc_define_builtin ("__builtin_malloc", ftype, BUILT_IN_MALLOC,
- "malloc", ATTR_NOTHROW_LEAF_LIST);
-  DECL_IS_MALLOC (builtin_decl_explicit (BUILT_IN_MALLOC)) = 1;
+ "malloc", ATTR_NOTHROW_LEAF_MALLOC_LIST);
 
   ftype = build_function_type_list (pvoid_type_node, size_type_node,
size_type_node, NULL_TREE);
   gfc_define_builtin ("__builtin_calloc", ftype, BUILT_IN_CALLOC,
- "calloc", ATTR_NOTHROW_LEAF_LIST);
-  DECL_IS_MALLOC (builtin_decl_explicit (BUILT_IN_CALLOC)) = 1;
+ "calloc", ATTR_NOTHROW_LEAF_MALLOC_LIST);
 
   ftype = build_function_type_list (pvoid_type_node,
 size_type_node, pvoid_type_node,
Index: calls.c
===
--- calls.c (revision 192483)
+++ calls.c (working copy)
@@ -802,6 +802,36 @@ call_expr_flags (const_tree t)
   return flags;
 }
 
+/* Modify DECL for given flags.  */
+void
+set_call_expr_flags (tree decl, int flags)
+{
+  if (flags & ECF_NOTHROW)
+TREE_NOTHROW (decl) = 1;
+  if (flags & ECF_CONST)
+TREE_READONLY (decl) = 1;
+  if (flags & ECF_PURE)
+DECL_PURE_P (decl) = 1;
+  if (flags & ECF_NOVOPS)
+DECL_IS_NOVOPS (decl) = 1;
+  if (flags & ECF_NORETURN)
+TREE_THIS_VOLATILE (decl) = 1;
+  if (flags & ECF_MALLOC)
+DECL_IS_MALLOC (decl) = 1;
+  if (flags & ECF_RETURNS_TWICE)
+DECL_IS_RETURNS_TWICE (decl) = 1;
+  if (flags & ECF_LEAF)
+DECL_ATTRIBUTES (decl) = tree_cons (get_identifier ("leaf"),
+   NULL, DECL_ATTRIBUTES (decl));
+  if (flags & ECF_TM_PURE)
+DECL_ATTRIBUTES (decl) = tree_cons (get_identifier ("transaction_pure"),
+   NULL, DECL_ATTRIBUTES (decl));
+  /* Looping const or pure is imlpies by noreturn.
+ There is currently no way to declare looping const or looping pure alone. 
 */
+  gcc_assert (!(flags & ECF_LOOPING_CONST_OR_PURE)
+ || ((flags & ECF_NORETURN) && (flags & (ECF_CONST | ECF_PURE))));
+}
+
 /* Precompute all register parameters as described by ARGS, storing values
into fields within the ARGS array.
 
Index: tree.h
===
--- tree.h  (revision 19

Re: [asan] Emit GIMPLE directly, small cleanups

2012-10-16 Thread Jakub Jelinek
On Mon, Oct 15, 2012 at 10:48:13PM -0700, Xinliang David Li wrote:
> Another error checking feature is to poison stack vars on entry and
> exit of the lexical scope to catch uninit variable reference and out
> of scope references:
> 
> S* sp;
>   {
> S s;
> sp = &s;
>   }
>   .. *sp ...
> 
> This is relatively easy to do in gcc thanks to the clobber statement.
> In Clang/LLVM, it is in the wishlist:
> http://code.google.com/p/address-sanitizer/issues/detail?id=83

That is not easy at all, unfortunately; CLOBBER isn't sufficient for that.
You have the points where the variable loses its value, but there are no
similar markup statements where it comes into scope again.
See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54770#c3

Jakub


Re: [asan] migrate runtime from llvm

2012-10-16 Thread Jakub Jelinek
On Mon, Oct 15, 2012 at 11:39:52PM -0700, Wei Mi wrote:
> --- gcc/gcc.c (revision 192487)
> +++ gcc/gcc.c (working copy)
> @@ -679,6 +679,7 @@ proper position among the other output f
>  %{fgnu-tm:%:include(libitm.spec)%(link_itm)}\
>  %(mflib) " STACK_SPLIT_SPEC "\
>  %{fprofile-arcs|fprofile-generate*|coverage:-lgcov}\
> +%{fasan|coverage:-lasan -lpthread -ldl -lstdc++}\
>  %{!nostdlib:%{!nodefaultlibs:%(link_ssp) %(link_gcc_c_sequence)}}\
>  %{!nostdlib:%{!nostartfiles:%E}} %{T*} }}"

Why the |coverage there?  It isn't related to asan in any way.
Also, why -lstdc++ in there?  I could understand %{static:-lstdc++}, but
given that libasan doesn't support static linking, I find it hardly useful.

> --- Makefile.def  (revision 192487)
> +++ Makefile.def  (working copy)
> @@ -119,6 +119,7 @@ target_modules = { module= libstdc++-v3;
>  lib_path=src/.libs;
>  raw_cxx=true; };
>  target_modules = { module= libmudflap; lib_path=.libs; };
> +target_modules = { module= libasan; lib_path=.libs; };
>  target_modules = { module= libssp; lib_path=.libs; };
>  target_modules = { module= newlib; };
>  target_modules = { module= libgcc; bootstrap=true; no_check=true; };

Shouldn't libasan, given it is a C++ shared library, depend on libstdc++-v3?

Jakub


Re: [PATCH] Stop looping in move_by_pieces loops when there's no more data to process

2012-10-16 Thread Jakub Jelinek
On Tue, Oct 16, 2012 at 08:40:21AM +0200, Tom de Vries wrote:
> 2012-10-16  Tom de Vries  
> 
>   * expr.c (move_by_pieces, move_by_pieces_ninsns, can_store_by_pieces)
>   (store_by_pieces_1): Don't enter loop when no more data is left.

Okay.

Jakub


[asan] Protection of stack vars (take 2)

2012-10-16 Thread Jakub Jelinek
Hi!

Here is an updated patch, which also emits the shadow-clearing sequence
at the end of the function in the right spot.

2012-10-16  Jakub Jelinek  

* Makefile.in (asan.o): Depend on $(EXPR_H) $(OPTABS_H).
(cfgexpand.o): Depend on asan.h.
* asan.c: Include expr.h and optabs.h.
(asan_shadow_set): New variable.
(asan_shadow_cst, asan_emit_stack_protection): New functions.
(asan_init_shadow_ptr_types): Initialize also asan_shadow_set.
* cfgexpand.c: Include asan.h.  Define HOST_WIDE_INT heap vector.
(struct stack_vars_data): New type.
(expand_stack_vars): Add DATA argument.  Fill it in and add needed
padding in between variables if -fasan.
(defer_stack_allocation): Defer everything for flag_asan.
(expand_used_vars): Return var destruction sequence.  Call
asan_emit_stack_protection if expand_stack_vars added anything
to the vectors.
(expand_gimple_basic_block): Add disable_tail_calls argument.
(gimple_expand_cfg): Pass true to it if expand_used_vars returned
non-NULL.  Emit the sequence returned by expand_used_vars after
return_label.
* asan.h (asan_emit_stack_protection): New prototype.
(asan_shadow_set): New decl.
(ASAN_RED_ZONE_SIZE, ASAN_STACK_MAGIC_LEFT, ASAN_STACK_MAGIC_MIDDLE,
ASAN_STACK_MAGIC_RIGHT, ASAN_STACK_FRAME_MAGIC): Define.
* toplev.c (process_options): Also disable -fasan on
!FRAME_GROWS_DOWNWARDS targets.

--- gcc/Makefile.in.jj  2012-10-12 23:30:31.635325772 +0200
+++ gcc/Makefile.in 2012-10-15 09:40:40.287706610 +0200
@@ -2204,7 +2204,7 @@ stor-layout.o : stor-layout.c $(CONFIG_H
 asan.o : asan.c asan.h $(CONFIG_H) pointer-set.h \
$(SYSTEM_H) $(TREE_H) $(GIMPLE_H) \
output.h $(DIAGNOSTIC_H) coretypes.h $(TREE_DUMP_H) $(FLAGS_H) \
-   tree-pretty-print.h $(TARGET_H)
+   tree-pretty-print.h $(TARGET_H) $(EXPR_H) $(OPTABS_H)
 tree-ssa-tail-merge.o: tree-ssa-tail-merge.c \
$(SYSTEM_H) $(CONFIG_H) coretypes.h $(TM_H) $(BITMAP_H) \
$(FLAGS_H) $(TM_P_H) $(BASIC_BLOCK_H) \
@@ -3074,7 +3074,7 @@ cfgexpand.o : cfgexpand.c $(TREE_FLOW_H)
$(DIAGNOSTIC_H) toplev.h $(DIAGNOSTIC_CORE_H) $(BASIC_BLOCK_H) $(FLAGS_H) 
debug.h $(PARAMS_H) \
value-prof.h $(TREE_INLINE_H) $(TARGET_H) $(SSAEXPAND_H) $(REGS_H) \
$(GIMPLE_PRETTY_PRINT_H) $(BITMAP_H) sbitmap.h \
-   $(INSN_ATTR_H) $(CFGLOOP_H)
+   $(INSN_ATTR_H) $(CFGLOOP_H) asan.h
 cfgrtl.o : cfgrtl.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_ERROR_H) 
\
$(FLAGS_H) insn-config.h $(BASIC_BLOCK_H) $(REGS_H) hard-reg-set.h \
$(FUNCTION_H) $(EXCEPT_H) $(TM_P_H) $(INSN_ATTR_H) \
--- gcc/toplev.c.jj 2012-10-12 23:30:31.717325313 +0200
+++ gcc/toplev.c2012-10-15 09:40:03.540923935 +0200
@@ -1544,7 +1544,9 @@ process_options (void)
 }
 
   /* Address Sanitizer needs porting to each target architecture.  */
-  if (flag_asan && targetm.asan_shadow_offset == NULL)
+  if (flag_asan
+  && (targetm.asan_shadow_offset == NULL
+ || !FRAME_GROWS_DOWNWARD))
 {
   warning (0, "-fasan not supported for this target");
   flag_asan = 0;
--- gcc/asan.c.jj   2012-10-12 23:31:55.423856689 +0200
+++ gcc/asan.c  2012-10-16 12:18:41.414029844 +0200
@@ -43,6 +43,8 @@ along with GCC; see the file COPYING3.
 #include "asan.h"
 #include "gimple-pretty-print.h"
 #include "target.h"
+#include "expr.h"
+#include "optabs.h"
 
 /*
  AddressSanitizer finds out-of-bounds and use-after-free bugs 
@@ -79,10 +81,195 @@ along with GCC; see the file COPYING3.
  to create redzones for stack and global object and poison them.
 */
 
+alias_set_type asan_shadow_set = -1;
+
 /* Pointer types to 1 resp. 2 byte integers in shadow memory.  A separate
alias set is used for all shadow memory accesses.  */
 static GTY(()) tree shadow_ptr_types[2];
 
+/* Return a CONST_INT representing 4 subsequent shadow memory bytes.  */
+
+static rtx
+asan_shadow_cst (unsigned char shadow_bytes[4])
+{
+  int i;
+  unsigned HOST_WIDE_INT val = 0;
+  gcc_assert (WORDS_BIG_ENDIAN == BYTES_BIG_ENDIAN);
+  for (i = 0; i < 4; i++)
+val |= (unsigned HOST_WIDE_INT) shadow_bytes[BYTES_BIG_ENDIAN ? 3 - i : i]
+  << (BITS_PER_UNIT * i);
+  return GEN_INT (trunc_int_for_mode (val, SImode));
+}
+
+/* Insert code to protect stack vars.  The prologue sequence should be emitted
+   directly, epilogue sequence returned.  BASE is the register holding the
+   stack base, against which OFFSETS array offsets are relative to, OFFSETS
+   array contains pairs of offsets in reverse order, always the end offset
+   of some gap that needs protection followed by starting offset,
+   and DECLS is an array of representative decls for each var partition.
+   LENGTH is the length of the OFFSETS array, DECLS array is LENGTH / 2 - 1
+   elements long (OFFSETS include gap before the first variable as well
+   as gaps after each stack variable).  */
+
+rtx
+asan_emit_stack

[asan] WIP protection of globals

2012-10-16 Thread Jakub Jelinek
Hi!

This is a WIP patch for globals protection.
I'm not filling in names yet, and has_dynamic_init is always
false (I wonder how to figure out has_dynamic_init, especially
with LTO; TYPE_ADDRESSABLE (TREE_TYPE (decl)) probably isn't it,
and for anything more I'm afraid we need a langhook).

--- gcc/varasm.c.jj 2012-10-11 19:10:39.0 +0200
+++ gcc/varasm.c2012-10-16 15:40:37.075662625 +0200
@@ -51,6 +51,7 @@ along with GCC; see the file COPYING3.
 #include "tree-mudflap.h"
 #include "cgraph.h"
 #include "pointer-set.h"
+#include "asan.h"
 
 #ifdef XCOFF_DEBUGGING_INFO
 #include "xcoffout.h"  /* Needed for external data
@@ -1831,6 +1832,9 @@ assemble_noswitch_variable (tree decl, c
   size = tree_low_cst (DECL_SIZE_UNIT (decl), 1);
   rounded = size;
 
+  if (flag_asan && asan_protect_global (decl))
+size += asan_red_zone_size (size);
+
   /* Don't allocate zero bytes of common,
  since that means "undefined external" in the linker.  */
   if (size == 0)
@@ -1897,6 +1901,7 @@ assemble_variable (tree decl, int top_le
   const char *name;
   rtx decl_rtl, symbol;
   section *sect;
+  bool asan_protected = false;
 
   /* This function is supposed to handle VARIABLES.  Ensure we have one.  */
   gcc_assert (TREE_CODE (decl) == VAR_DECL);
@@ -1984,6 +1989,15 @@ assemble_variable (tree decl, int top_le
   /* Compute the alignment of this data.  */
 
   align_variable (decl, dont_output_data);
+
+  if (flag_asan
+  && asan_protect_global (decl)
+  && DECL_ALIGN (decl) < ASAN_RED_ZONE_SIZE * BITS_PER_UNIT)
+{
+  asan_protected = true;
+  DECL_ALIGN (decl) = ASAN_RED_ZONE_SIZE * BITS_PER_UNIT;
+}
+
   set_mem_align (decl_rtl, DECL_ALIGN (decl));
 
   if (TREE_PUBLIC (decl))
@@ -2022,6 +2036,12 @@ assemble_variable (tree decl, int top_le
   if (DECL_ALIGN (decl) > BITS_PER_UNIT)
ASM_OUTPUT_ALIGN (asm_out_file, floor_log2 (DECL_ALIGN_UNIT (decl)));
   assemble_variable_contents (decl, name, dont_output_data);
+  if (asan_protected)
+   {
+ unsigned HOST_WIDE_INT size
+   = tree_low_cst (DECL_SIZE_UNIT (decl), 1);
+ assemble_zeros (asan_red_zone_size (size));
+   }
 }
 }
 
@@ -6926,6 +6946,8 @@ place_block_symbol (rtx symbol)
   decl = SYMBOL_REF_DECL (symbol);
   alignment = DECL_ALIGN (decl);
   size = tree_low_cst (DECL_SIZE_UNIT (decl), 1);
+  if (flag_asan && asan_protect_global (decl))
+   size += asan_red_zone_size (size);
 }
 
   /* Calculate the object's offset from the start of the block.  */
--- gcc/Makefile.in.jj  2012-10-15 09:40:40.0 +0200
+++ gcc/Makefile.in 2012-10-16 16:54:12.463712014 +0200
@@ -2712,7 +2712,7 @@ varasm.o : varasm.c $(CONFIG_H) $(SYSTEM
output.h $(DIAGNOSTIC_CORE_H) xcoffout.h debug.h $(GGC_H) $(TM_P_H) \
$(HASHTAB_H) $(TARGET_H) langhooks.h gt-varasm.h $(BASIC_BLOCK_H) \
$(CGRAPH_H) $(TARGET_DEF_H) tree-mudflap.h \
-   pointer-set.h $(COMMON_TARGET_H)
+   pointer-set.h $(COMMON_TARGET_H) asan.h
 function.o : function.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) 
$(RTL_ERROR_H) \
$(TREE_H) $(GIMPLE_H) $(FLAGS_H) $(FUNCTION_H) $(EXPR_H) \
$(OPTABS_H) $(LIBFUNCS_H) $(REGS_H) hard-reg-set.h insn-config.h $(RECOG_H) 
\
--- gcc/asan.c.jj   2012-10-16 12:18:41.0 +0200
+++ gcc/asan.c  2012-10-16 16:52:24.266434151 +0200
@@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.
 #include "target.h"
 #include "expr.h"
 #include "optabs.h"
+#include "output.h"
 
 /*
  AddressSanitizer finds out-of-bounds and use-after-free bugs 
@@ -270,6 +271,48 @@ asan_emit_stack_protection (rtx base, HO
   return ret;
 }
 
+/* Return true if DECL is a VAR_DECL that should be protected
+   by Address Sanitizer, by appending a red zone with protected
+   shadow memory after it and aligning it to at least
+   ASAN_RED_ZONE_SIZE bytes.  */
+
+bool
+asan_protect_global (tree decl)
+{
+  rtx rtl, symbol;
+  section *sect;
+
+  if (TREE_CODE (decl) != VAR_DECL
+  || DECL_THREAD_LOCAL_P (decl)
+  || DECL_EXTERNAL (decl)
+  || !TREE_ASM_WRITTEN (decl)
+  || !DECL_RTL_SET_P (decl)
+  || DECL_ONE_ONLY (decl)
+  || DECL_COMMON (decl)
+  || (DECL_SECTION_NAME (decl) != NULL_TREE
+ && !DECL_HAS_IMPLICIT_SECTION_NAME_P (decl))
+  || DECL_SIZE (decl) == 0
+  || ASAN_RED_ZONE_SIZE * BITS_PER_UNIT > MAX_OFILE_ALIGNMENT
+  || !valid_constant_size_p (DECL_SIZE_UNIT (decl))
+  || DECL_ALIGN_UNIT (decl) > 2 * ASAN_RED_ZONE_SIZE)
+return false;
+
+  rtl = DECL_RTL (decl);
+  if (!MEM_P (rtl) || GET_CODE (XEXP (rtl, 0)) != SYMBOL_REF)
+return false;
+  symbol = XEXP (rtl, 0);
+
+  if (CONSTANT_POOL_ADDRESS_P (symbol)
+  || TREE_CONSTANT_POOL_ADDRESS_P (symbol))
+return false;
+
+  sect = get_variable_section (decl, false);
+  if (sect->common.flags & SECTION_COMMON)
+return false;
+
+  return true;
+}
+
 /* Construct a function tree for __asan_report_{load,store}{1,2,4,8,16}.
IS_STO

Do not drop the loop bounds when copying it

2012-10-16 Thread Jan Hubicka
Hi,
while looking into cases where loop-iv.c still deduces useful bounds that are
not recorded at the tree level, I noticed that we do not duplicate the bounds
when copying a loop.
Fixed thus.
Bootstrapped/regtested x86_64-linux, OK?

Honza

* cfgloopmanip.c (copy_loop_info): New function.
(duplicate_loop): Use it.
(loop_version): Use it.
* loop-unswitch.c (unswitch_loop): Use it.
* cfgloop.h (copy_loop_info): Declare.
Index: cfgloopmanip.c
===
--- cfgloopmanip.c  (revision 192483)
+++ cfgloopmanip.c  (working copy)
@@ -971,6 +970,20 @@ fix_loop_placements (struct loop *loop, 
 }
 }
 
+/* Duplicate loop bounds and other information we store about
+   the loop into its duplicate.  */
+
+void
+copy_loop_info (struct loop *loop, struct loop *target)
+{
+  gcc_checking_assert (!target->any_upper_bound && !target->any_estimate);
+  target->any_upper_bound = loop->any_upper_bound;
+  target->nb_iterations_upper_bound = loop->nb_iterations_upper_bound;
+  target->any_estimate = loop->any_estimate;
+  target->nb_iterations_estimate = loop->nb_iterations_estimate;
+  target->estimate_state = loop->estimate_state;
+}
+
 /* Copies copy of LOOP as subloop of TARGET loop, placing newly
created loop into loops structure.  */
 struct loop *
@@ -979,6 +992,8 @@ duplicate_loop (struct loop *loop, struc
   struct loop *cloop;
   cloop = alloc_loop ();
   place_new_loop (cloop);
+ 
+  copy_loop_info (loop, cloop);
 
   /* Mark the new loop as copy of LOOP.  */
   set_loop_copy (loop, cloop);
@@ -1687,6 +1702,8 @@ loop_version (struct loop *loop,
   false /* Do not redirect all edges.  */,
   then_scale, else_scale);
 
+  copy_loop_info (loop, nloop);
+
   /* loopify redirected latch_edge. Update its PENDING_STMTS.  */
   lv_flush_pending_stmts (latch_edge);
 
Index: loop-unswitch.c
===
--- loop-unswitch.c (revision 192483)
+++ loop-unswitch.c (working copy)
@@ -454,6 +454,7 @@ unswitch_loop (struct loop *loop, basic_
   BRANCH_EDGE (switch_bb), FALLTHRU_EDGE (switch_bb), true,
   prob, REG_BR_PROB_BASE - prob);
 
+  copy_loop_info (loop, nloop);
   /* Remove branches that are now unreachable in new loops.  */
   remove_path (true_edge);
   remove_path (false_edge);
Index: cfgloop.h
===
--- cfgloop.h   (revision 192483)
+++ cfgloop.h   (working copy)
@@ -310,6 +310,7 @@ extern edge create_empty_if_region_on_ed
 extern struct loop *create_empty_loop_on_edge (edge, tree, tree, tree, tree,
   tree *, tree *, struct loop *);
 extern struct loop * duplicate_loop (struct loop *, struct loop *);
+extern void copy_loop_info (struct loop *loop, struct loop *target);
 extern void duplicate_subloops (struct loop *, struct loop *);
 extern bool duplicate_loop_to_header_edge (struct loop *, edge,
   unsigned, sbitmap, edge,


Re: [fortran] add __builtin_unreachable

2012-10-16 Thread Bernhard Reutner-Fischer

Hi,

s/imlpies/implied/


Sent with AquaMail for Android
http://www.aqua-mail.com




Re: [PATCH, AARCH64] Added predefines for AArch64 code models

2012-10-16 Thread Marcus Shawcroft

On 15/10/12 11:03, Marcus Shawcroft wrote:

On 11/09/12 15:02, Chris Schlumberger-Socha wrote:

This patch adds predefines for AArch64 code models. These code models are
added as an effective target for the AArch64 platform.



I've committed this patch to aarch64-trunk.

/Marcus




.. and back ported to aarch64-4.7-branch, the patch applies cleanly and 
requires no modification.


/Marcus



Re: [AARCH64] Fix ICE in aarch64_split_doubleword_move

2012-10-16 Thread Marcus Shawcroft

On 16/10/12 16:10, Marcus Shawcroft wrote:

I've just committed this patch to aarch64-trunk to resolve an ICE in
aarch64_split_doubleword_move when attempting to split v->v moves.

/Marcus

2012-10-16  Marcus Shawcroft



 * config/aarch64/aarch64-protos.h (aarch64_split_doubleword_move):
 Rename to aarch64_split_128bit_move.
 (aarch64_split_128bit_move_p): New.
 * config/aarch64/aarch64.c (aarch64_split_doubleword_move):
 Rename to aarch64_split_128bit_move.
 (aarch64_split_128bit_move_p): New.
 * config/aarch64/aarch64.md: Adjust TImode move split.


and committed to aarch64-4.7-branch.

/Marcus



Re: [asan] Emit GIMPLE directly, small cleanups

2012-10-16 Thread Xinliang David Li
On Mon, Oct 15, 2012 at 11:12 PM, Jakub Jelinek  wrote:
> On Mon, Oct 15, 2012 at 10:48:13PM -0700, Xinliang David Li wrote:
>> Another error checking feature is to poison stack vars on entry and
>> exit of the lexical scope to catch uninit variable reference and out
>> of scope references:
>>
>> S* sp;
>>   {
>> S s;
>> sp = &s;
>>   }
>>   .. *sp ...
>>
>> This is relatively easy to do in gcc thanks to the clobber statement.
>> In Clang/LLVM, it is in the wishlist:
>> http://code.google.com/p/address-sanitizer/issues/detail?id=83
>
> That is not easy at all, unfortunately; CLOBBER isn't sufficient for that.
> You have the points where the variable loses its value, but there are no
> similar markup statements where it comes into scope again.
> See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54770#c3

I don't get it.  Clobber marks the end of lifetime of a variable so it
is safe to emit code to really clobber its value -- otherwise how
would clobber based slot sharing work?

David

>
> Jakub


Re: [asan] Emit GIMPLE directly, small cleanups

2012-10-16 Thread Jakub Jelinek
On Tue, Oct 16, 2012 at 09:06:22AM -0700, Xinliang David Li wrote:
> I don't get it.  Clobber marks the end of lifetime of a variable so it
> is safe to emit code to really clobber its value -- otherwise how
> would clobber based slot sharing work?

If at the CLOBBER you mark the var in shadow memory as inaccessible, the
question is where you make it accessible again.
Consider:

int i;
for (i = 0; i < 2; i++)
  {
int tmp;
bar ();
baz (&tmp);
bar ();
  }

where the first bar () can't possibly access tmp, but the second bar () can.
Now we unroll it and have:
  bar ();
  baz (&tmp);
  bar ();
  tmp ={v} {CLOBBER};
  bar ();
  baz (&tmp);
  bar ();
  tmp ={v} {CLOBBER};
So, if you want to *mem_to_shadow (&tmp) = 0xff; at {CLOBBER} time, where
and how you would insert *mem_to_shadow (&tmp) = 0; again?  In the above
code it would need to go before the second baz (&tmp);, the question is
how you'd find that out...

Jakub


Re: [RFC PATCH] Add support for sparc compare-and-branch.

2012-10-16 Thread David Miller
From: Eric Botcazou 
Date: Tue, 16 Oct 2012 12:10:51 +0200

>> I've scanned the documentation and there is no indication of any
>> preprocessor predefines or anything like that.
>> 
>> And keep in mind that __VIS__ is our very own invention.
>> 
>> Sun's compilers never predefined this.
>> 
>> Their makefiles do for various targets in the MediaLib sources, but that's
>> a source tree and header file localized convention.
>> 
>> Sun also never provided intrinsics other than via assembler inlines in
>> their VIS header.  They were never compiler builtins like our's.  The
>> user had to define __VIS__ on the command line to get visibility of
>> the routines they wanted from Sun's VIS inline assembler header file.
>> 
>> Sun also does not provide, and is almost certainly not going to ever
>> provide crypto intrinsics.
>> 
>> Therefore there is no convention to follow and we can do whatever we want
>> here.
> 
> OK, thanks.  I keep thinking that we should use -mcpu=sparc4/-D__sparc4__ for 
> new instructions in the SPARC-T4 architecture that aren't related to VIS.
> 
> And given Rainer's insight, I agree that switching to the -m{32,64} -xarch= 
> scheme with recent assemblers is the way to go.  I'll try and get my hands on 
> one of them...

Ok.

Meanwhile I asked some folks, and there don't appear to be any special
CPP symbols set by the Solaris compiler when -xarch=sparc4.

So what I'm going to do is get rid of the __VIS__==0x400 thing for now
and we can add that, or something similar, at a later date.


[RFC] Bug handling SUBREG (MEM) - MEM having side-effects?

2012-10-16 Thread Tejas Belagod

Hi,

While testing aarch64-4.7, I seem to have uncovered a bug in the handling of
SUBREG (MEM) where the MEM has side-effects. This bug seems to be latent
on trunk.

Here is a test case reduced from gcc.c-torture/execute/scal-to-vec1.c.

#define vector(elcount, type)  \
__attribute__((vector_size((elcount)*sizeof(type type

#define vidx(type, vec, idx) (*((type *) &(vec) + idx))

#define operl(a, b, op) (a op b)
#define operr(a, b, op) (b op a)

#define check(type, count, vec0, vec1, num, op, lr) \
do {\
   int __i; \
   for (__i = 0; __i < count; __i++) {\
   if (vidx (type, vec1, __i) != oper##lr (num, vidx (type, vec0, __i),
op)) \
   __builtin_abort (); \
   }\
} while (0)

#define veccompare(type, count, v0, v1) \
do {\
   int __i; \
   for (__i = 0; __i < count; __i++) { \
   if (vidx (type, v0, __i) != vidx (type, v1, __i)) \
   __builtin_abort (); \
   } \
} while (0)

volatile int one = 1;

int main (int argc, char *argv[]) {

   vector(8, short) v0 = {one, 1, 2, 3, 4, 5, 6, 7};
   vector(8, short) v1;
   v1 = 2 << v0;   check (short, 8, v0, v1, 2, <<, l);
   return 0;
}

When compiled with -O1, the combiner tries to combine these two instructions:

(insn 87 86 89 4 (set (reg:SI 154 [ MEM[base: D.2294_40, offset: 0B] ])
   (sign_extend:SI (mem:HI (pre_inc:DI (reg:DI 113 [ ivtmp.23 ])) [0
MEM[base: D.2294_40, offset: 0B]+0 S2 A16]))) scal-to-vec1.c:35 54
{*extendhisi2_aarch64}
(expr_list:REG_INC (reg:DI 113 [ ivtmp.23 ])
   (nil)))

(insn 89 87 90 4 (set (reg:SI 155)
   (ashift:SI (reg:SI 158)
   (subreg:QI (reg:SI 154 [ MEM[base: D.2294_40, offset: 0B] ]) 0)))
scal-to-vec1.c:35 331 {*ashlsi3_insn}
(expr_list:REG_DEAD (reg:SI 154 [ MEM[base: D.2294_40, offset: 0B] ])
   (nil)))

It tries to forward-propagate reg 154 to insn 89 and the intermediate RTL looks
like this:

(insn 89 87 90 4 (set (reg:SI 155)
   (ashift:SI (reg:SI 158)
   (subreg:QI ((sign_extend:SI (mem:HI (pre_inc:DI (reg:DI 113
[ivtmp.23 ])) [0 MEM[base:
D.2294_40, offset: 0B]+0 S2 A16])))

The combiner then recursively tries to simplify this RTL. During simplifying the
subreg part of the RTL i.e.

(subreg:QI ((sign_extend:SI (mem:HI (pre_inc:DI (reg:DI 113 ivtmp.23 ])) [0
MEM[base: D.2294_40, offset:
0B]+0 S2 A16])))

combine_simplify_rtx () calls simplify_subreg () and this returns

(subreg:QI (mem:HI (pre_inc:DI (reg:DI 113 ivtmp.23 ])) [0  
MEM[base: D.2294_40, offset: 0B]+0 S2 A16])))

The code that does this is in combine.c:combine_simplify_rtx ()

...
rtx temp;
temp = simplify_subreg (mode, SUBREG_REG (x), op0_mode,
SUBREG_BYTE (x));
if (temp)
  return temp;
 }
...

So, the above tree is returned as is and insn 87 gets combined into insn 89 to
become

(insn 89 87 90 4 (set (reg:SI 155)
   (ashift:SI (reg:SI 158)
   (subreg:QI (mem:HI (pre_inc:DI (reg:DI 113 ivtmp.23 ])) [0   
MEM[base: D.2294_40, offset: 0B]+0 S2 A16])))

This ends up in reload and in cleanup_subreg_operands () the subreg:QI (mem:HI)
gets narrowed to mem:QI, but the interesting bit is that the pre_inc stays the
same, so becomes a pre_inc for a QI mode address rather than the original HI
mode address and therefore instead of addressing a byte and incrementing the
address by 2 to get the next half word, the address gets incremented by 1!

Also, I feel this is a latent bug on the trunk. This is because while the
combiner takes the same path on the trunk, the address expression is not a
pre_inc there - the trunk wants to keep the addressing to a mem (plus (reg)
(const)) and therefore combine_simplify_rtx () simplifies the subreg expression
right down to the narrowed memref i.e. mem:QI (plus:DI (reg:DI) (const)). This
gets combined to insn 89 to become

(insn 89 87 90 4 (set (reg:SI 155)
   (ashift:SI (reg:SI 158) (mem:QI (plus:DI (reg:DI) (const

This then, gets later checked with recog () and since the predicate for operand
2 does not allow memory operands for shifts in aarch64.md, does not get combined
and all is well.

After the discussions Richard Earnshaw had on IRC with Andrew Pinski, it was
felt that it was best to fix the standard predicate
'general_operand' to not allow SUBREG (MEM) with side-effects as it has known
issues associated with it - particularly reload not being able to deal with such
cases. 'general_operand' currently outlaws cases like paradoxical SUBREG (MEM)
on targets with insn scheduling and SUBREG (MEM) with non-zero SUBREG_BYTE. This
is another case it should not allow. Here is a patch that tig

RE: [PATCH][AArch64] Restrict usage of SBFIZ to valid range only

2012-10-16 Thread Ian Bolton
> Subject: [PATCH][AArch64] Restrict usage of SBFIZ to valid range only
> 
> This fixes an issue where we were generating an SBFIZ with
> operand 3 outside of the valid range (as determined by the
> size of the destination register and the amount of shift).
> 
> My patch checks that the range is valid before allowing
> the pattern to be used.
> 
> This has now had full regression testing and all is OK.
> 
> OK for aarch64-trunk and aarch64-4_7-branch?
> 
> Cheers,
> Ian
> 
> 
> 2012-10-15  Ian Bolton  
> 
>   * gcc/config/aarch64/aarch64.md
>   (_shft_): Restrict based on op2.
> 
> 
> -
> 
> 
> diff --git a/gcc/config/aarch64/aarch64.md
> b/gcc/config/aarch64/aarch64.md
> index e6086a9..3bfe6e6 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -2311,7 +2311,7 @@
> (ashift:GPI (ANY_EXTEND:GPI
>  (match_operand:ALLX 1 "register_operand" "r"))
> (match_operand 2 "const_int_operand" "n")))]
> -  ""
> +  " <= ( - UINTVAL (operands[2]))"
>"bfiz\\t%0, %1, %2, #"
>[(set_attr "v8type" "bfm")
> (set_attr "mode" "")]
> 
> 
> 

New and improved version is at the end of this email.

This has had full regression testing and all is OK.

OK for aarch64-trunk and aarch64-4_7-branch?

Cheers,
Ian



2012-10-16  Ian Bolton  

  * gcc/config/aarch64/aarch64.md
  (_shft_): Restrict operands.




diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index e6086a9..e77496f 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2311,8 +2311,13 @@
(ashift:GPI (ANY_EXTEND:GPI
 (match_operand:ALLX 1 "register_operand" "r"))
(match_operand 2 "const_int_operand" "n")))]
-  ""
-  "bfiz\\t%0, %1, %2, #"
+  "UINTVAL (operands[2]) < "
+{
+  operands[3] = ( <= ( - UINTVAL (operands[2])))
+ ? GEN_INT ()
+ : GEN_INT ( - UINTVAL (operands[2]));
+  return "bfiz\t%0, %1, %2, %3";
+}
   [(set_attr "v8type" "bfm")
(set_attr "mode" "")]
 )





Re: [go] builtins housekeeping; add __builtin_unreachable

2012-10-16 Thread Ian Lance Taylor
On Tue, Oct 16, 2012 at 8:14 AM, Jan Hubicka  wrote:

> this patch updates the Go frontend to define the unreachable builtin I need for loop
> and LTO optimizations.  I also noticed that Go ignores the existence of all flags
> except for CONST, and thus I synchronized the flags with the C FE variants.

I can't see how it makes sense to have to change the frontends in
order to be able to generate calls to a builtin function in the
middle-end.  Builtin functions used by the middle-end should be
defined by the middle-end, without any effort required on the part of
the frontend.  The frontend should only need to define functions that
might appear in the language that the frontend is compiling.

In other words, why not just have the middle-end call
add_builtin_function as required?  I see that some builtin functions
are already added by the middle-end in build_common_builtin_nodes.

Ian


Re: [PATCH][AArch64] Restrict usage of SBFIZ to valid range only

2012-10-16 Thread Richard Earnshaw

On 16/10/12 17:34, Ian Bolton wrote:

Subject: [PATCH][AArch64] Restrict usage of SBFIZ to valid range only

This fixes an issue where we were generating an SBFIZ with
operand 3 outside of the valid range (as determined by the
size of the destination register and the amount of shift).

My patch checks that the range is valid before allowing
the pattern to be used.

This has now had full regression testing and all is OK.

OK for aarch64-trunk and aarch64-4_7-branch?

Cheers,
Ian


2012-10-15  Ian Bolton  

* gcc/config/aarch64/aarch64.md
(_shft_): Restrict based on op2.


-


diff --git a/gcc/config/aarch64/aarch64.md
b/gcc/config/aarch64/aarch64.md
index e6086a9..3bfe6e6 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2311,7 +2311,7 @@
 (ashift:GPI (ANY_EXTEND:GPI
  (match_operand:ALLX 1 "register_operand" "r"))
 (match_operand 2 "const_int_operand" "n")))]
-  ""
+  " <= ( - UINTVAL (operands[2]))"
"bfiz\\t%0, %1, %2, #"
[(set_attr "v8type" "bfm")
 (set_attr "mode" "")]





New and improved version is at the end of this email.

This has had full regression testing and all is OK.

OK for aarch64-trunk and aarch64-4_7-branch?

Cheers,
Ian



2012-10-16  Ian Bolton  

   * gcc/config/aarch64/aarch64.md
   (_shft_): Restrict operands.



OK both.

R.





diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index e6086a9..e77496f 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2311,8 +2311,13 @@
 (ashift:GPI (ANY_EXTEND:GPI
  (match_operand:ALLX 1 "register_operand" "r"))
 (match_operand 2 "const_int_operand" "n")))]
-  ""
-  "bfiz\\t%0, %1, %2, #"
+  "UINTVAL (operands[2]) < "
+{
+  operands[3] = ( <= ( - UINTVAL (operands[2])))
+ ? GEN_INT ()
+ : GEN_INT ( - UINTVAL (operands[2]));
+  return "bfiz\t%0, %1, %2, %3";
+}
[(set_attr "v8type" "bfm")
 (set_attr "mode" "")]
  )









Re: [Java] Tidy builtins and add __builtin_unreachable

2012-10-16 Thread Tom Tromey
> "Jan" == Jan Hubicka  writes:

Jan> this patch updates the Java frontend to declare __builtin_unreachable
Jan> and also fixes the flags of the synchronize builtins to match the ones
Jan> from the C FE.

Jan> Regtested x86_64-linux, OK?

The java bits are ok.

Jan> +  /* Looping const or pure is imlpies by noreturn.

Typo, should be "implied".

Tom


Re: [go] builtins housekeeping; add __builtin_unreachable

2012-10-16 Thread Jan Hubicka
> On Tue, Oct 16, 2012 at 8:14 AM, Jan Hubicka  wrote:
> 
> > this patch updates the Go frontend to define the unreachable builtin I need for loop
> > and LTO optimizations.  I also noticed that Go ignores the existence of all
> > flags
> > except for CONST, and thus I synchronized the flags with the C FE variants.
> 
> I can't see how it makes sense to have to change the frontends in
> order to be able to generate calls to a builtin function in the
> middle-end.  Builtin functions used by the middle-end should be
> defined by the middle-end, without any effort required on the part of
> the frontend.  The frontend should only need to define functions that
> might appear in the language that the frontend is compiling.
> 
> In other words, why not just have the middle-end call
> add_builtin_function as required?  I see that some builtin functions
> are already added by the middle-end in build_common_builtin_nodes.

Hmm, I was not aware of these bits (and yes, I agree it is nonsense for this
to be duplicated everywhere).  I will add UNREACHABLE there.  What about the
rest of the change (i.e. adding the proper bits)?

honza
> 
> Ian


Re: [asan] Emit GIMPLE directly, small cleanups

2012-10-16 Thread Xinliang David Li
The way to do this is not to use the shadow memory, but to clobber
directly the variable itself with specified byte pattern -- though
error diagnostics won't be as nice.

To use shadow memory, the scope start marker is also needed.

David

On Tue, Oct 16, 2012 at 9:12 AM, Jakub Jelinek  wrote:
> On Tue, Oct 16, 2012 at 09:06:22AM -0700, Xinliang David Li wrote:
>> I don't get it.  Clobber marks the end of lifetime of a variable so it
>> is safe to emit code to really clobber its value -- otherwise how
>> would clobber based slot sharing work?
>
> If you at CLOBBER protect the var in the shadow mem as unaccessible, the
> thing is where you make it accessible again.
> Consider:
>
> int i;
> for (i = 0; i < 2; i++)
>   {
> int tmp;
> bar ();
> baz (&tmp);
> bar ();
>   }
>
> where the first bar () can't possibly access tmp, but the second bar () can.
> Now we unroll it and have:
>   bar ();
>   baz (&tmp);
>   bar ();
>   tmp ={v} {CLOBBER};
>   bar ();
>   baz (&tmp);
>   bar ();
>   tmp ={v} {CLOBBER};
> So, if you want to *mem_to_shadow (&tmp) = 0xff; at {CLOBBER} time, where
> and how you would insert *mem_to_shadow (&tmp) = 0; again?  In the above
> code it would need to go before the second baz (&tmp);, the question is
> how you'd find that out...
>
> Jakub


Re: [Java] Tidy bultins and add __bultin_unreachable

2012-10-16 Thread Jan Hubicka
> > "Jan" == Jan Hubicka  writes:
> 
> Jan> this patch udpates Java frontend to declare __bulitin_unreachable
> Jan> and also fixes flags of synchronize bulitins to match ones from C
> Jan> FE.
> 
> Jan> Regtested x86_64-linux, OK?
> 
> The java bits are ok.
> 
> Jan> +  /* Looping const or pure is imlpies by noreturn.
> 
> Typo, should be "implied".

Thanks,
as per Ian's comment I will remove the bits adding __builtin_unreachable
and put it into build_common_builtin_nodes.

Honza


[RFC] VEC interface overhaul

2012-10-16 Thread Diego Novillo
I have overhauled the interface to VEC over the last few
weeks.  I implemented most of the plan I outlined in
http://gcc.gnu.org/ml/gcc/2012-08/msg00268.html.

I have implemented the embedded and space-efficient vectors.  The
diff for vec.[ch] is sufficiently confusing that I'm just
attaching both files for review.

In this message, I want to outline the major changes I've done.
The patch still does not work (gengtype is giving me a LOT of
problems because vectors are not pointers anymore).

There are two new types:

1- Embedded vectors: vec_e

   These are fixed-size vectors suitable for embedding in
   structures.  For instance, tree_binfo::base_binfos.  They use
   the trailing-array idiom.  Once allocated, they cannot be
   resized.


2- Space efficient vectors: vec_s

   These are the traditional VECs we have always used.  They are
   implemented as a pointer to a vec_e.  The
   difference with traditional VECs is that they no longer need
   to be pointers themselves, so their address does not change if
   the internal vector needs to be re-allocated.  They still use
   just a single word of storage before allocation, so they can
   be embedded in data structures without altering their size.

I have not yet implemented the "fast" vectors from my proposal.
Those would be useful for free variables or structures that can
tolerate an extra word of storage.

Needless to say the patch is gigantic: 2.4 Mb, 334 files
changed, 10718 insertions(+), 11536 deletions(-).  But it is
largely mechanical.

We no longer access the vectors via VEC_* macros.  The pattern is
"VEC_operation (T, A, V, args)" becomes "V.operation (args)".
That is mostly why you see ~800 fewer lines in the patch.

The only thing I could not do is create proper ctors and dtors
for the vec_s/vec_e classes.  Since these vectors are stored in
unions, we have to keep them as PODs (C++03 does not allow
non-PODs in unions).

This means that creation and destruction must be explicit.  There
is a new method vec_s::create() and another
vec_s::destroy() to allocate the internal
vector.

My next steps are:

- Fix gengtype issues.  Since the GTY(()) vectors are not
  pointers, gengtype is ignoring them, not adding roots and
  marking functions.  I think I've almost fixed it, but it will
  take me another day or two.

- Test on all affected architectures and compilation modes.

- Make sure memory consumption and performance are still on par
  with the current implementation.

Things I want to change (that I may do in subsequent patches):

- The allocation strategy ought to be an allocator type not an
  enum.

- vec_s::iterate should disappear.  Instead, we want to use
  standard iterator idioms.


Please take a look at vec.[ch].  I have also pushed my changes to
the git branch git://gcc.gnu.org/git/gcc.git/dnovillo/vec-rewrite
(it's based on today's trunk).

Comments?


Thanks.  Diego.
/* Vector API for GNU compiler.
   Copyright (C) 2004, 2005, 2006, 2007, 2008, 2010, 2011, 2012
   Free Software Foundation, Inc.
   Contributed by Nathan Sidwell 
   Re-implemented in C++ by Diego Novillo 

This file is part of GCC.

GCC is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 3, or (at your option) any later
version.

GCC is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3.  If not see
.  */

/* This file is compiled twice: once for the generator programs,
   once for the compiler.  */
#ifdef GENERATOR_FILE
#include "bconfig.h"
#else
#include "config.h"
#endif

#include "system.h"
#include "coretypes.h"
#include "ggc.h"
#include "vec.h"
#include "diagnostic-core.h"
#include "hashtab.h"

/* Store information about each particular vector.  */
struct vec_descriptor
{
  const char *function;
  const char *file;
  int line;
  size_t allocated;
  size_t times;
  size_t peak;
};


/* Hashtable mapping vec addresses to descriptors.  */
static htab_t vec_desc_hash;

/* Hashtable helpers.  */
static hashval_t
hash_descriptor (const void *p)
{
  const struct vec_descriptor *const d =
(const struct vec_descriptor *) p;
  return htab_hash_pointer (d->file) + d->line;
}
static int
eq_descriptor (const void *p1, const void *p2)
{
  const struct vec_descriptor *const d = (const struct vec_descriptor *) p1;
  const struct vec_descriptor *const l = (const struct vec_descriptor *) p2;
  return d->file == l->file && d->function == l->function && d->line == l->line;
}

/* Hashtable converting address of allocated field to loc descriptor.  */
static htab_t ptr_hash;
struct ptr_hash_entry
{
  void *ptr;
  struct vec_descriptor *loc;
  size_t allocated;
};

/* Hash table

Re: [PATCH RFA] Implement register pressure directed hoist pass

2012-10-16 Thread Jeff Law

On 10/16/2012 01:44 AM, Bin Cheng wrote:

Hi Steven, Jeff,
I found a flaw in original patch, which results in inaccurate register
pressure.
Here comes the updated patches, improving code size a little bit on
Thumb2/powerpc comparing to original patches.
Please review.

Thanks

2012-10-16  Bin Cheng

* gcse.c: Update copyright dates.
(hoist_expr_reaches_here_p): Change parameter type from char *
to sbitmap.

2012-10-16  Bin Cheng

* common.opt (flag_ira_hoist_pressure): New.
* doc/invoke.texi (-fira-hoist-pressure): Describe.
* ira-costs.c (ira_set_pseudo_classes): New parameter.
* ira.h: Update copyright dates.
(ira_set_pseudo_classes): Update prototype.
* haifa-sched.c (sched_init): Update call.
* ira.c (ira): Update call.
* regmove.c: Update copyright dates.
(regmove_optimize): Update call.
* loop-invariant.c: Update copyright dates.
(move_loop_invariants): Update call.
* gcse.c (struct bb_data): New structure.
(BB_DATA): New macro.
(curr_bb, curr_reg_pressure): New static variables.
(should_hoist_expr_to_dom): Rename from hoist_expr_reaches_here_p.
Change parameter expr_index to expr.
New parameters pressure_class, nregs and hoisted_bbs.
Use reg pressure to determine the distance expr can be hoisted.
(hoist_code): Use reg pressure to direct the hoist process.
(get_regno_pressure_class, get_pressure_class_and_nregs)
(change_pressure, calculate_bb_reg_pressure): New.
(one_code_hoisting_pass): Calculate register pressure. Allocate
and free data.





+
+ /* Currently register pressure for each pressure class.  */
+ static int curr_reg_pressure[N_REG_CLASSES];

"Currently" -> "Current"

OK for the mainline sources.

Thanks for your patience,
Jeff


Re: [go] bulitins housekeeping; add __bultin_unreachable

2012-10-16 Thread Ian Lance Taylor
On Tue, Oct 16, 2012 at 9:44 AM, Jan Hubicka  wrote:
>
> Hmm, I was not aware of these bits (and yes, I agree it is non-sence this 
> being
> duplicated everywhere).  I will add UNREACHABLE there.  What about rest of the
> change (i.e. adding the proper bits)?

I suppose it's basically fine, but add set_call_expr_flags first.
Please don't commit any changes to gcc/go/gofrontend/gogo-tree.cc, as
that file is copied from an upstream repository.

Ian


Re: [go] bulitins housekeeping; add __bultin_unreachable

2012-10-16 Thread Jan Hubicka
> On Tue, Oct 16, 2012 at 9:44 AM, Jan Hubicka  wrote:
> >
> > Hmm, I was not aware of these bits (and yes, I agree it is non-sence this 
> > being
> > duplicated everywhere).  I will add UNREACHABLE there.  What about rest of 
> > the
> > change (i.e. adding the proper bits)?
> 
> I suppose it's basically fine, but add set_call_expr_flags first.
> Please don't commit any changes to gcc/go/gofrontend/gogo-tree.cc, as
> that file is copied from an upstream repository.

OK, I noticed that similar code already exists in the tree.c builtin creation
so I will get the generic bits done first.
Thanks!
Honza


Re: [path] PR 54900: store data race in if-conversion pass

2012-10-16 Thread Ian Lance Taylor
On Mon, Oct 15, 2012 at 5:28 AM, Aldy Hernandez  wrote:
>
> Here we have a store data race because noce_can_store_speculate_p()
> incorrectly returns true.  The problem is that memory_modified_in_insn_p()
> returns true if an instruction *MAY* modify a memory location, but the store
> can only be speculated if we are absolutely sure the store will happen on
> all subsequent paths.
>
> My approach is to implement a memory_SURELY_modified_in_insn_p(), which will
> trigger this optimization less often, but at least it will be correct.

I don't see a reason to add this new function.  I would just inline it
into noce_can_store_speculate_p, replacing the call to
memory_modified_in_p.  And I'm not sure I see a reason to change the
comment for memory_modified_in_p, it seems to already be accurate.

Clearly we could consider the possibility of a PARALLEL of SET insns,
but of course most the compiler won't handle that anyhow.  I suppose
that would be a reason to use memory_surely_modified_in_insn_p, but in
that case you might as well handle the PARALLEL case now.

Ian


Re: [path] PR 54900: store data race in if-conversion pass

2012-10-16 Thread Jakub Jelinek
On Tue, Oct 16, 2012 at 10:19:54AM -0700, Ian Lance Taylor wrote:
> On Mon, Oct 15, 2012 at 5:28 AM, Aldy Hernandez  wrote:
> I don't see a reason to add this new function.  I would just inline it
> into noce_can_store_speculate_p, replacing the call to
> memory_modified_in_p.  And I'm not sure I see a reason to change the
> comment for memory_modified_in_p, it seems to already be accurate.
> 
> Clearly we could consider the possibility of a PARALLEL of SET insns,
> but of course most the compiler won't handle that anyhow.  I suppose
> that would be a reason to use memory_surely_modified_in_insn_p, but in
> that case you might as well handle the PARALLEL case now.

I see another problem with noce_can_store_speculate_p, the return false;
checks.  The search for a store to exactly that location can be done
just on post dominator bbs, but I wonder how you can avoid looking for
volatile insns or non-const/pure calls (or insns that may modify
the address) also in other bbs.
What if there is:
  location of considered if-conversion on mem
  if (some_condition)
call ();
  mem = something;
?  We only walk immediate post-dominators, so look at the bb containing
mem = something;, but I think a call/volatile insn/modification of mem's
address is problematic in any bb on any path in between test_bb and
the post-dominator on which the write has been found.

Jakub


Re: [Patch, Fortran] Fix OPTIONAL, esp. with polymorphism

2012-10-16 Thread Dominique Dhumieres
Tobias,

The previous class_optional_2.f90 runs after your commit, but it takes
~168s (so it may have run with the previous patch also, but I was not
patient enough to see it). The culprits are given by the following
profile:

+ 100.0%, start, a.out
| + 100.0%, main, a.out
| | + 100.0%, MAIN__, a.out
| | | + 25.5%, a3ac1.2085, a.out
| | | |   7.2%, s2elem_t.2178, a.out
| | | |   7.0%, s2elem.2184, a.out
| | | + 25.5%, a1a.2151, a.out
| | | |   15.3%, s2elem_t2.2175, a.out
| | | + 24.5%, a1a1.2168, a.out
| | | |   14.2%, s2elem_t.2178, a.out
| | | + 12.2%, a3a.2097, a.out
| | | |   7.1%, s2elem_t2.2175, a.out
| | | + 12.2%, a3a1.2110, a.out
| | | |   7.1%, s2elem_t.2178, a.out

Dominique


Re: Make try_unroll_loop_completely to use loop bounds recorded

2012-10-16 Thread Jan Hubicka
Hi,
here is the third revised version of the complete unrolling changes.  While working
on the RTL variant I noticed PR54937 and the fact that I was overly aggressive
in forcing the single exit of the last iteration to be taken, because the loop
may terminate otherwise (by EH or by exiting the program).  The same thinko is
in loop-niter.

This patch adds loop_edge_to_cancel, which is more conservative: it looks for the
exit conditional where the non-exiting edge leads to the latch and verifies that
the latch contains no statement with side effects that may terminate the loop.
This still matches quite a few non-single-exit loops and works well in
practice.

Unlike the previous revision, it also enables complete unrolling when code size
does not grow, even for non-innermost loops (with an update in
tree_unroll_loops_completely to walk them).  This is something we did on the RTL
land but missed on trees.  This actually enables quite a few optimizations when
things can be propagated into the tiny inner loop body.

I also fixed accounting in tree_estimate_loop_size for the cases where last
iteration is not going to be updated.

Finally I added code constructing __builtin_unreachable as suggested by
Ian.

Bootstrapped/regtested x86_64-linux, also bootstrapped with -O3 and -Werror
disabled, and benchmarked.  Among the best benefits is about a 7% improvement on
Applu, and it causes up to 15% improvements on vectorized loops with small
iteration counts (by completely peeling the precondition code).  There are no
real performance regressions, but there is some code size bloat.

I plan to follow up with strengthening the heuristics to disable unrolling when
the benefits are abysmal.  An easy step is to limit unrolling on loops with CFG
and/or calls in them.  We already have quite informed analysis in place.  I also
plan to move simple FDO-guided loop peeling from the RTL level to trees to
enable more propagation into peeled sequences.

The patch also triggers a bug in niter analysis and requires xfailing the
do_1.f90 testcase.  I filed PR 54932 to track this.

There are also confused array bound warnings that I hope to tackle
incrementally, too, by recording statements that are known to become
unreachable in the last iteration and adding __builtin_unreachable in front of
them.  This is also important to avoid duplication leading to dead code: no
other optimizer forces paths leading to out-of-bound accesses to not happen.

Honza


* tree-ssa-loop-ivcanon.c (tree_estimate_loop_size): Add edge_to_cancel
parameter and use it to estimate code optimized out in the final 
iteration.
(loop_edge_to_cancel): New function.
(try_unroll_loop_completely): New IRRED_INVALIDATED parameter;
handle unrolling loops with bounds given via max_loop_iterations;
handle unrolling non-inner loops when code size shrinks;
tidy dump output; when the last iteration loop still stays
as a loop in the CFG, forcibly redirect the latch to
__builtin_unreachable.
(canonicalize_loop_induction_variables): Add irred_invalidated
parameter; record the niter bound derived; dump
max_loop_iterations bounds; call try_unroll_loop_completely
even if no niter bound is given.
(canonicalize_induction_variables): Handle irred_invalidated.
(tree_unroll_loops_completely): Handle non-innermost loops;
handle irred_invalidated.
* cfgloop.h (unloop): Declare.
* cfgloopmanip.c (unloop): Export.
* tree.c (build_common_builtin_nodes): Build BUILT_IN_UNREACHABLE.

* gcc.target/i386/l_fma_float_?.c: Update.
* gcc.target/i386/l_fma_double_?.c: Update.
* gfortran.dg/do_1.f90: XFAIL
* gcc.dg/tree-ssa/cunroll-1.c: New testcase.
* gcc.dg/tree-ssa/cunroll-2.c: New testcase.
* gcc.dg/tree-ssa/cunroll-3.c: New testcase.
* gcc.dg/tree-ssa/cunroll-4.c: New testcase.
* gcc.dg/tree-ssa/cunroll-5.c: New testcase.
* gcc.dg/tree-ssa/ldist-17.c: Block cunroll to make testcase still
valid.
Index: tree-ssa-loop-ivcanon.c
===
--- tree-ssa-loop-ivcanon.c (revision 192483)
+++ tree-ssa-loop-ivcanon.c (working copy)
@@ -192,7 +192,7 @@ constant_after_peeling (tree op, gimple 
Return results in SIZE, estimate benefits for complete unrolling exiting by 
EXIT.  */
 
 static void
-tree_estimate_loop_size (struct loop *loop, edge exit, struct loop_size *size)
+tree_estimate_loop_size (struct loop *loop, edge exit, edge edge_to_cancel, 
struct loop_size *size)
 {
   basic_block *body = get_loop_body (loop);
   gimple_stmt_iterator gsi;
@@ -208,8 +208,8 @@ tree_estimate_loop_size (struct loop *lo
 fprintf (dump_file, "Estimating sizes for loop %i\n", loop->num);
   for (i = 0; i < loop->num_nodes; i++)
 {
-  if (exit && body[i] != exit->src
- && dominated_by_p (CDI_DOMINATORS, body[i], exit->src))
+  if (edge_to_cancel && body[i] != edge_to_cancel->src
+ && dom

Re: [asan] Emit GIMPLE directly, small cleanups

2012-10-16 Thread Jakub Jelinek
On Tue, Oct 16, 2012 at 09:48:28AM -0700, Xinliang David Li wrote:
> The way to do this is not to use the shadow memory, but to clobber
> directly the variable itself with specified byte pattern -- though
> error diagnostics won't be as nice.

Clobbering the memory directly is definitely easy, but doesn't have anything
to do with how shadow memory poisoning is done for stack vars.
I guess it should be done depending on some special option, not unconditionally
implied by -fasan though.

Jakub


Re: [asan] Emit GIMPLE directly, small cleanups

2012-10-16 Thread Xinliang David Li
On Tue, Oct 16, 2012 at 10:40 AM, Jakub Jelinek  wrote:
> On Tue, Oct 16, 2012 at 09:48:28AM -0700, Xinliang David Li wrote:
>> The way to do this is not to use the shadow memory, but to clobber
>> directly the variable itself with specified byte pattern -- though
>> error diagnostics won't be as nice.
>
> Clobbering the memory directly is definitely easy, but doesn't have anything
> to do how shadow memory poisioning is done for stack vars.
> Guess should be done depending on some special option, not unconditionally
> implied by -fasan though.

yes -- agree.  Longer term this functionality should be using the
shadow memory scheme though.

David

>
> Jakub


Re: [PATCH][RFC] Re-organize how we stream trees in LTO

2012-10-16 Thread Diego Novillo

On 2012-10-16 10:43 , Richard Biener wrote:


This patch shows work-in-progress (read: implementation uglyness
hopefully to vanish ...) regarding to moving LTO type merging
work from WPA to compile stage.


You mean to LTRANS, the stage after WPA, right?


The patch re-organizes lto_output_tree (the write_tree streamer
hook for LTO) in a way so that we output all tree fields
in easy to discover places.  This means that we have forward
references to trees not yet (fully) written, something the
current state avoids by "inline-expanding" forward referenced
trees at the point of reference (which results in interweaved
trees on disk).  To be able to achieve this lto_output_tree
now performs a DFS walk along tree edges to collect trees
that need to be output and outputs them in SCC chunks
in a new LTO stream container, LTO_tree_scc.


Nice.  The interleaved output is ugly.


This will allow the reader side at WPA stage to call uniquify
nodes on each tree SCC, avoiding completely the DFS walks done
there (hopefully, we do DFS walks for both hashing and comparing
there - note that WPA gathers type SCCs, not tree SCCs - a subtle
difference that remains to be seen on whether it is important ...)

One complication is how we handle streaming INTEGER_CSTs.
When an INTEGER_CST refers to an already input type we can
allocate it using the build_int_cst_wide routine which takes
care of putting it into the appropriate hashtables for caching.
If an INTEGER_CST is part of a SCC (TYPE_DOMAIN values are
the most obvious cases) we now stream them as regular trees
(they cannot already exist in any cache as the type is being
read in) - but the patch does not yet populate the caches
in case the type ends up prevailing (thus it will produce
some unshared INTEGER_CSTs for now).

One uglyness of the patch is how I mis-use the streamer hooks
to switch the tree streamer routines from actually writing
tree references to performing the DFS walk.  For the DFS walk
I need information on the edge, thus a 'from' tree argument
is added to the write_tree hook.  Eventually my plan was
to transform this to a walk_tree-like interface.

Diego - is PTH still live?  Thus, do I need to bother about
inventing things in a way that can be hook-ized?


We will eventually revive PPH.  But not in the short term.  I think it 
will come back when/if we start implementing C++ modules.  Jason, 
Lawrence, is that something that you see coming for the next standard?


I suspect that the front end will need to distance itself from 'tree' 
and have its own streamable IL.  So, the hooks may not be something we 
need to keep long term.


Emitting the trees in SCC groups should not affect the C++ streamer too 
much.  It already is doing its own strategy of emitting tree headers so 
it can do declaration and type merging.  As long as the trees can be 
fully materialized from the SCC groups, it should be fine.



foresee any complication for PTH or C++ modules the way
I re-arranged things?  Any good (C++) ideas on how to
design this lto_walk_tree?


Sorry.  What lto_walk_tree?


--- 44,50 
tree itself.  The second boolean parameter specifies this for
the tree itself, the first for all siblings that are streamed.
The referencing mechanism is up to each streamer to implement.  */
!   void (*write_tree) (struct output_block *, tree, tree, bool, bool);


You want to document what the new 'tree' argument does here.


! #define stream_write_tree_edge_shallow_non_ref(OB, FROM, TO, REF_P) \
! streamer_hooks.write_tree(OB, FROM, TO, REF_P, false)


Why the stream_write_tree_edge() naming?  Not sure what you mean by this.


Diego.


Re: [Java] Tidy bultins and add __bultin_unreachable

2012-10-16 Thread Andrew Haley
On 10/16/2012 08:17 AM, Jan Hubicka wrote:
>   * builtins.c (define_builtin): Accept ECF flags and
>   use set_call_expr_flags.
>   (initialize_builtins): Update; add BULIT_IN_UNREACHALE.
> 
>   * calls.c (set_call_expr_flags): New.
>   * tree.h (set_call_expr_flags): Declare.

OK, thanks.

Andrew.



Re: [Patch, Fortran] Fix OPTIONAL, esp. with polymorphism

2012-10-16 Thread Tobias Burnus

Dominique,

Dominique Dhumieres wrote:

The previous class_optional_2.f90 runs after your commit, but it takes
~168s (so it may have run with the previous patch also, but I was not
patient enough to see it).


Here, it takes (current version) < 2s to compile and 0.150s to run the 
program. If I look at your profile output, the new class_optional_1.f90 
should be okay as it doesn't get listed with your profile.


The problems seem to occur with elemental calls, which means that 
probably some additional elemental bugs exist, even if they do not show 
up in valgrind.


Can you confirm that the class_optional_1.f90 of the trunk works correctly?

For class_optional_2.f90, we might have to comment some additional 
elemental calls. I wonder what goes wrong there. (But given that other 
elemental calls don't work, I do not wonder that much.)


Tobias


The culprits are given by the following
profile:

+ 100.0%, start, a.out
| + 100.0%, main, a.out
| | + 100.0%, MAIN__, a.out
| | | + 25.5%, a3ac1.2085, a.out
| | | |   7.2%, s2elem_t.2178, a.out
| | | |   7.0%, s2elem.2184, a.out
| | | + 25.5%, a1a.2151, a.out
| | | |   15.3%, s2elem_t2.2175, a.out
| | | + 24.5%, a1a1.2168, a.out
| | | |   14.2%, s2elem_t.2178, a.out
| | | + 12.2%, a3a.2097, a.out
| | | |   7.1%, s2elem_t2.2175, a.out
| | | + 12.2%, a3a1.2110, a.out
| | | |   7.1%, s2elem_t.2178, a.out





Re: [PATCH] Rs6000 infrastructure cleanup (switches), revised patch #4

2012-10-16 Thread Michael Meissner
On Tue, Oct 16, 2012 at 03:02:47PM +, Joseph S. Myers wrote:
> That is:
> 
> 1. Patch adding TARGET_FOO aliases for OPTION_FOO (small change to the awk 
> scripts and associated documentation, I expect).
>
> 2. Large, mechanical, automatically generated patch to change existing 
> OPTION_FOO users (or maybe one such patch per target).

I just grep'ed for OPTION_, filtering out OPTION_MASK_, TARGET_OPTION_OVERRIDE,
OPTION_DEFAULT_SPECS_*, OPTION_GLIBC, OPTION_UCLIBC, OPTION_BIONIC,
TARGET_OPTION_SAVE, TARGET_OPTION_RESTORE, TARGET_OPTION_PRINT,
OPTION_TARGET_CPU_DEFAULT, TARGET_OPTION_VALID_ATTRIBUTE_P, _SPEC[\" ],
MIPS_ARCH_*, TARGET_OPTION_*, RS6000_CPU_OPTION_NATIVE, and there is only one
place where OPTION_* is used as a test (config/linux-android.h).  The only
other port to do OPTION_* is x86, and there they have a bunch of #defines that
map OPTION_ into TARGET_.  So it looks fairly straightforward to do
the conversion in one jump.

> 3. Patch removing the OPTION_FOO name (small change to awk scripts and 
> documentation).
> 
> Then you've eliminated one unnecessary cause of changes when moving bits 
> out of target_flags.
> 
> > If TargetName were defined, it would use TARGET_ instead of 
> > OPTION_,
> > but the OPTION_MASK_ would not be changed.
> 
> Not needed, given the above sequence of changes.

Yep, I would prefer not to have to add TargetName, though it is simple enough.

> > If SetFunction was defined, the opt*.awk files would generate:
> > 
> > #define SET_FOO(VALUE)  \
> > do {\
> >   if (VALUE)\
> > target_flags &= ~MASK_FOO;  \
> >   else  \
> > target_flags |= MASK_FOO;   \
> > } while (0)
> > 
> > If ExplicitFunction was defined, the opt*.awk files would generate:
> > 
> > #define EXPLICIT_FOO(VALUE) \
> >   ((global_options_set.x_target_flags & MASK_FOO) != 0)
> 
> I'd like any such new macros to take an argument that's the pointer to the 
> relevant options structure (global_options, global_options_set).  If the 
> place where the macro is called has a pointer available, then it can be 
> passed in, otherwise pass in &global_options or &global_options_set unless 
> and until such a pointer becomes available in the relevant place.

It occurs to me that now that we've committed to GCC being done in C++, we
could just make global_options{,_set} be a class instead of a structure.  So
you could say:

global_options.set_FOO (value)

Or:

global_options.set_FOO ();
global_options.clear_FOO ();

I could generate the macros (or inline functions) if you would prefer to stick
to the C style of doing things.  However, as an old C dinosaur, I'm not sure of
all of the ramifications of doing this.  It just seems it would be cleaner to
use the class structure, instead of passing pointers.

> > How would you feel about SetFunction, ExplicitFunction, and the reduced
> > TargetName?
> 
> The principle of having macros for setting flags or testing if they are 
> explicitly set is fine, though it's not clear to me that they need any 
> such special settings as SetFunction and ExplicitFunction (rather than 
> being generated unconditionally).

Yes, it is simpler not to have to add another flag.  I was just trying to be
conservative in generating things other ports might not reference.

> I'd quite like the macros such as target_flags that refer to global 
> options to end up not being lvalues at all.  That helps ensure that option 
> settings are only modified in limited places that have options pointers.  
> It would be nice eventually for such things as "optimize" and "target" 
> attributes to be able to swap options structures, and to work closer to 
> how options on the command line are processed - for that, you want careful 
> control on what places actually modify options at all.

Yep, though unfortunately there are various ports that do want to change
optimization options if not set.

-- 
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899



Re: [Patch, Fortran] Fix OPTIONAL, esp. with polymorphism

2012-10-16 Thread Dominique Dhumieres
> Can you confirm that the class_optional_1.f90 of the trunk works correctly?

Yes: I have regtested with

make -k check-gfortran RUNTESTFLAGS="dg.exp=class_optional* 
--target_board=unix'{-m32,-m64}'"

without failures.

The profile I have posted was for the former version of class_optional_2.f90.
The profiler I have to use on darwin monitors the codes running, it looks
like it missed all the routines which took only a tiny time.

Dominique


[PATCH][google] Add warning flags for clang compatibility

2012-10-16 Thread Delesley Hutchins
This patch adds the clang thread safety warning flags as recognized
warning flags that do nothing, for command-line compatibility with
clang.  Enclosed patch is for branches/google/gcc-4_6, I have
identical patches for google/gcc-4_7 and google/main.

Tested on linux x86.

-- 
DeLesley Hutchins | Software Engineer | deles...@google.com | 505-206-0315


gcc-warning-flags.patch
Description: Binary data


Re: Ping: RFA: add lock_length attribute to break branch-shortening cycles

2012-10-16 Thread Joern Rennecke

Quoting Richard Sandiford :


Joern Rennecke  writes:

2012-10-04  Joern Rennecke  

 * final.c (get_attr_length_1): Use direct recursion rather than
 calling get_attr_length.
 (get_attr_lock_length): New function.
 (INSN_VARIABLE_LENGTH_P): Define.
 (shorten_branches): Take HAVE_ATTR_lock_length into account.
 Don't overwrite non-delay slot insn lengths with the lengths of
 delay slot insns with same uid.
 * genattrtab.c (lock_length_str): New variable.
 (make_length_attrs): New parameter base.
 (main): Initialize lock_length_str.
 Generate lock_lengths attributes.
 * genattr.c (gen_attr): Emit declarations for lock_length attribute
related functions.
* doc/md.texi (node Insn Lengths): Document lock_length attribute.

http://gcc.gnu.org/ml/gcc-patches/2012-10/msg00383.html


Sorry, this is really just repeating Richard B's comments, but I still
don't understand why we need two attributes.  Why can't shorten_branches
work with the existing length and simply make sure that the length doesn't
decrease from one iteration to the next?  That seems to be how you implement
CASE_VECTOR_SHORTEN_MODE.  It also means that we can continue to use the
pessimistic algorithm for -O0.


That does not really work for ARCompact: non-branch instructions are
lengthened to align or un-align branch instructions.  Failure to
correct the size of such an instruction downwards when indicated not
only messes up the directly affected branch, but also gets the alignment
wrong downstream, and can even cause branches to go out of range
(blindsiding branch shortening) when there are interactions with
explicit alignments for things like loops.
Unless you think it's OK to poke the contents of the insn_length array from
ADJUST_INSN_LENGTH.  That should work, but it'd require an extra parameter
when you want to hookize this macro.


You said in your reply that one of the reasons was to avoid
"interesting" interactions with ADJUST_INSN_LENGTH.  But the
previous minimum length would be applied after ADJUST_INSN_LENGTH,
so I'm not sure why it's a factor.


Well, for starters, the ARCompact ADJUST_INSN_LENGTH uses
get_attr_lock_length when processing delayed-branch SEQUENCEs.
And then there's the short/long instruction opcode distinction (function /
attribute verify_short), which depends on alignment, and influences
instruction length and conditional execution calculation.
Without exact alignment information, branch scheduling becomes a crapshoot.

You might think I could subsume the length effect of the upsized instructions
in the affected branch, but that doesn't work because the exact length is
needed both for ADJUST_INSN_LENGTH, and in order to output the correct
branch / jump output template.



If lock_length is just an optimisation on top of that, then maybe
it would help to split it out.


Well, to the extent that branch shortening and scheduling are just
optimizations, having the lock_length attribute is just an optimization.

Well, we could split it anyway, and give ports without the need for
multiple length attributes the benefit of the optimistic algorithm.

I have attached a patch that implements this.

I first meant to test it for sh-elf, alas, PR54938 currently makes this
impossible.
So I used arm-eabi instead.  Regression tests look OK, but libgcc.a and
libc.a (newlib) are unchanged in size, while libstdc++.a increases by twelve
bytes; more specifically, the size increase happens for the object file
debug.o, which goes from 11028 to 11040 bytes.
Likewise, cris-elf and mipsel-elf show a small increase in size of debug.o,
with the rest of libstdc++.a unchanged in size.

Generating and comparing the arm intermediate files debug.s shows a suspicious
incidence of pathnames.  Using a longer build directory pathname makes no
difference, though.
OTOH, using a longer source directory pathname does.  So that leaves...
makes no difference for building these libraries.
It should stop endless looping in shorten_branches, though.  Albeit that
is only relevant for ports that do have some negative shorten_branch feedback.
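The optimistic algorithm described above — start every insn at its minimum length and only ever grow it — can be sketched as a toy fixed-point loop.  This is illustrative only: the three-insn layout and the short/long branch rule are invented, not final.c's.

```c
#include <assert.h>

/* Toy model: insn 0 is a branch to the end of insn 2; insns 1 and 2
   have fixed lengths.  A branch takes its 2-byte short form when the
   target is within 4 bytes, else the 4-byte long form (made-up rule).  */
static int branch_len (int addr, int target)
{
  int dist = target - addr;
  return (dist >= -4 && dist < 4) ? 2 : 4;
}

/* Iterate lengths to a fixed point, never shrinking an insn once it
   has grown -- the monotonicity that guarantees termination.  */
void shorten (int len[3])
{
  int changed = 1;
  while (changed)
    {
      int addr[4] = { 0 };
      changed = 0;
      for (int i = 0; i < 3; i++)
        addr[i + 1] = addr[i] + len[i];
      int want = branch_len (addr[0], addr[3]);
      if (want > len[0])        /* only grow, never shrink */
        {
          len[0] = want;
          changed = 1;
        }
    }
}
```

Starting from the minimum lengths {2, 2, 2}, the branch grows once to 4 bytes and the loop settles.  If shrinking were allowed, a length could oscillate between iterations and the loop might never terminate — the negative-feedback situation the ARCompact discussion above is about.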
2012-10-16  J"orn Rennecke  

* final.c (shorten_branches): When optimizing, start with small
length and increase from there, and don't decrease lengths.
Don't overwrite non-delay slot insn lengths with the lengths of
delay slot insns with same uid.

Index: final.c
===
--- final.c (revision 192491)
+++ final.c (working copy)
@@ -1067,6 +1067,8 @@ shorten_branches (rtx first ATTRIBUTE_UN
 #endif /* CASE_VECTOR_SHORTEN_MODE */
 
   /* Compute initial lengths, addresses, and varying flags for each insn.  */
+  int (*length_fun) (rtx) = optimize ? insn_min_length : insn_default_length;
+
   for (insn_current_address = 0, insn = first;
insn != 0;
insn_current_address += insn_lengths[uid], insn = NEXT_INSN (in

Re: [PATCH][google] Add warning flags for clang compatibility

2012-10-16 Thread Diego Novillo
On Tue, Oct 16, 2012 at 3:19 PM, Delesley Hutchins  wrote:
> This patch adds the clang thread safety warning flags as recognized
> warning flags that do nothing, for command-line compatibility with
> clang.  Enclosed patch is for branches/google/gcc-4_6, I have
> identical patches for google/gcc-4_7 and google/main.

OK for all google/ branches.  You still need to remove the old thread
safety code, right?


Diego.


Re: [PATCH][google] Add warning flags for clang compatibility

2012-10-16 Thread Delesley Hutchins
Yes, but that won't happen for a while yet.

On Tue, Oct 16, 2012 at 12:43 PM, Diego Novillo  wrote:
> On Tue, Oct 16, 2012 at 3:19 PM, Delesley Hutchins  
> wrote:
>> This patch adds the clang thread safety warning flags as recognized
>> warning flags that do nothing, for command-line compatibility with
>> clang.  Enclosed patch is for branches/google/gcc-4_6, I have
>> identical patches for google/gcc-4_7 and google/main.
>
> OK for all google/ branches.  You still need to remove the old thread
> safety code, right?
>
>
> Diego.



-- 
DeLesley Hutchins | Software Engineer | deles...@google.com | 505-206-0315


Re: New unordered containers debug check

2012-10-16 Thread François Dumont

Attached patch applied.

2012-10-16  François Dumont  

* include/debug/formatter.h (_Debug_msg_id): Add
__msg_bucket_index_oob.
* include/debug/macros.h (__glibcxx_check_bucket_index): New.
* include/debug/unordered_set (unordered_set<>::begin(size_type)):
Add check on bucket index.
(unordered_set<>::begin(size_type) const): Likewise.
(unordered_set<>::cbegin(size_type) const): Likewise.
(unordered_set<>::end(size_type)): Likewise.
(unordered_set<>::end(size_type) const): Likewise.
(unordered_set<>::cend(size_type) const): Likewise.
(unordered_set<>::bucket_size(size_type)): Likewise.
(unordered_multiset<>::begin(size_type)): Likewise.
(unordered_multiset<>::begin(size_type) const): Likewise.
(unordered_multiset<>::cbegin(size_type) const): Likewise.
(unordered_multiset<>::end(size_type)): Likewise.
(unordered_multiset<>::end(size_type) const): Likewise.
(unordered_multiset<>::cend(size_type) const): Likewise.
(unordered_multiset<>::bucket_size(size_type)): Likewise.
* include/debug/unordered_map (unordered_map<>::begin(size_type)):
Likewise.
(unordered_map<>::begin(size_type) const): Likewise.
(unordered_map<>::cbegin(size_type) const): Likewise.
(unordered_map<>::end(size_type)): Likewise.
(unordered_map<>::end(size_type) const): Likewise.
(unordered_map<>::cend(size_type) const): Likewise.
(unordered_map<>::bucket_size(size_type)): Likewise.
(unordered_multimap<>::begin(size_type)): Likewise.
(unordered_multimap<>::begin(size_type) const): Likewise.
(unordered_multimap<>::cbegin(size_type) const): Likewise.
(unordered_multimap<>::end(size_type)): Likewise.
(unordered_multimap<>::end(size_type) const): Likewise.
(unordered_multimap<>::cend(size_type) const): Likewise.
(unordered_multimap<>::bucket_size(size_type)): Likewise.
* testsuite/23_containers/unordered_map/debug/bucket_size_neg.cc:
New.
* testsuite/23_containers/unordered_map/debug/begin1_neg.cc: New.
* testsuite/23_containers/unordered_map/debug/begin2_neg.cc: New.
* testsuite/23_containers/unordered_map/debug/cbegin_neg.cc: New.
* testsuite/23_containers/unordered_map/debug/end1_neg.cc: New.
* testsuite/23_containers/unordered_map/debug/end2_neg.cc: New.
* testsuite/23_containers/unordered_map/debug/cend_neg.cc: New.
* testsuite/23_containers/unordered_multimap/debug/bucket_size_neg.cc:
New.
* testsuite/23_containers/unordered_multimap/debug/begin1_neg.cc: New.
* testsuite/23_containers/unordered_multimap/debug/begin2_neg.cc: New.
* testsuite/23_containers/unordered_multimap/debug/cbegin_neg.cc: New.
* testsuite/23_containers/unordered_multimap/debug/end1_neg.cc: New.
* testsuite/23_containers/unordered_multimap/debug/end2_neg.cc: New.
* testsuite/23_containers/unordered_multimap/debug/cend_neg.cc: New.
* testsuite/23_containers/unordered_set/debug/bucket_size_neg.cc:
New.
* testsuite/23_containers/unordered_set/debug/begin1_neg.cc: New.
* testsuite/23_containers/unordered_set/debug/begin2_neg.cc: New.
* testsuite/23_containers/unordered_set/debug/cbegin_neg.cc: New.
* testsuite/23_containers/unordered_set/debug/end1_neg.cc: New.
* testsuite/23_containers/unordered_set/debug/end2_neg.cc: New.
* testsuite/23_containers/unordered_set/debug/cend_neg.cc: New.
* testsuite/23_containers/unordered_multiset/debug/bucket_size_neg.cc:
New.
* testsuite/23_containers/unordered_multiset/debug/begin1_neg.cc: New.
* testsuite/23_containers/unordered_multiset/debug/begin2_neg.cc: New.
* testsuite/23_containers/unordered_multiset/debug/cbegin_neg.cc: New.
* testsuite/23_containers/unordered_multiset/debug/end1_neg.cc: New.
* testsuite/23_containers/unordered_multiset/debug/end2_neg.cc: New.
* testsuite/23_containers/unordered_multiset/debug/cend_neg.cc: New.

François
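For readers skimming the ChangeLog above: the invariant the new __glibcxx_check_bucket_index macro enforces is simply that a bucket index passed to begin(n), end(n), cbegin(n), cend(n) or bucket_size(n) is smaller than bucket_count().  A standalone sketch of that precondition (the helper name is illustrative, not libstdc++'s):

```cpp
#include <cstddef>
#include <unordered_set>

// Sketch of the debug-mode precondition: bucket-local accessors are
// only valid for bucket indexes strictly below bucket_count().
template <typename Container>
bool bucket_index_ok (const Container &c, std::size_t n)
{
  return n < c.bucket_count ();
}
```

Under -D_GLIBCXX_DEBUG, violating this precondition now fails with the new __msg_bucket_index_oob diagnostic instead of walking off the end of the bucket array.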

On 10/16/2012 12:26 AM, Paolo Carlini wrote:

Hi,

On 10/15/2012 11:11 PM, François Dumont wrote:

OK to commit ?

Looks good. In the debug.cc change I see a stray comma, though.

Thanks!
Paolo.



Index: src/c++11/debug.cc
===
--- src/c++11/debug.cc	(revision 192510)
+++ src/c++11/debug.cc	(working copy)
@@ -178,7 +178,9 @@
 "attempt to compare local iterators from different unordered container"
 " buckets",
 "function requires a non-empty iterator range [%1.name;, %2.name;)",
-"attempt to self move assign"
+"attempt to self move assign",
+"attempt to access container with out-of-bounds bucket index %2;,"
+" container only holds %3; buckets"
   };
 
   void
Index: include/debug/unordered_map
===
--- include/debug/unordered_map	(revision 192510)
+++ include/debug/unordered_map	(working copy)
@@ -183,28 +183,53 @@
   // local versions
   local_iterator
   begin(size_type __b)

Re: [PATCH][google] Add warning flags for clang compatibility

2012-10-16 Thread Delesley Hutchins
Committed.
google/gcc-4_6: r192510.
google/gcc-4_7: r192511.
google/main: r192513.

On Tue, Oct 16, 2012 at 12:44 PM, Delesley Hutchins  wrote:
> Yes, but that won't happen for a while yet.
>
> On Tue, Oct 16, 2012 at 12:43 PM, Diego Novillo  wrote:
>> On Tue, Oct 16, 2012 at 3:19 PM, Delesley Hutchins  
>> wrote:
>>> This patch adds the clang thread safety warning flags as recognized
>>> warning flags that do nothing, for command-line compatibility with
>>> clang.  Enclosed patch is for branches/google/gcc-4_6, I have
>>> identical patches for google/gcc-4_7 and google/main.
>>
>> OK for all google/ branches.  You still need to remove the old thread
>> safety code, right?
>>
>>
>> Diego.
>
>
>
> --
> DeLesley Hutchins | Software Engineer | deles...@google.com | 505-206-0315



-- 
DeLesley Hutchins | Software Engineer | deles...@google.com | 505-206-0315


Re: [asan] WIP protection of globals

2012-10-16 Thread Marek Polacek
On Tue, Oct 16, 2012 at 04:58:48PM +0200, Jakub Jelinek wrote:
> @@ -2022,6 +2036,12 @@ assemble_variable (tree decl, int top_le
>if (DECL_ALIGN (decl) > BITS_PER_UNIT)
>   ASM_OUTPUT_ALIGN (asm_out_file, floor_log2 (DECL_ALIGN_UNIT (decl)));
>assemble_variable_contents (decl, name, dont_output_data);
> +  if (asan_protected)
> + {
> +   unsigned HOST_WIDE_INT int size
> + = tree_low_cst (DECL_SIZE_UNIT (decl), 1);

Shouldn't this be just HOST_WIDE_INT, without the following int?

Marek


Re: [Patch, Fortran, OOP] PR 54881: [4.8 Regression] [OOP] ICE in fold_convert_loc, at fold-const.c:2016

2012-10-16 Thread Janus Weil
ping!


2012/10/11 Janus Weil :
> Hi all,
>
> here is an OOP patch for the above PR, which has two disconnected parts:
>
> 1) It fixes a problem with ASSOCIATED, when it is fed a CLASS-valued
> function as argument (i.e. the ICE in the bug title). This is the
> trans-intrinsic part of the patch. Instead of adding the _data
> component to the expr first and then translating, we now translate
> first and then add the _data component.
>
> 2) It fixes an error with SELECT TYPE (which is a 4.8 regression), by
> respecting the POINTER argument of the selector when building the
> temporaries for the select type branches. This is the match.c part of
> the patch, which looks much more complicated than it is, because I
> merged two functions into one, which do essentially the same. I think
> they were at some point split up by Paul, but I see no advantage in this,
> to be honest.
>
> Regtested on x86_64-unknown-linux-gnu. Ok for trunk?
>
> Cheers,
> Janus
>
>
> 2012-10-11  Janus Weil  
>
> PR fortran/54881
> * match.c (select_derived_set_tmp,select_class_set_tmp): Removed and
> unified into ...
> (select_type_set_tmp): ... this one. Set POINTER argument according to
> selector.
> * trans-intrinsic.c (gfc_conv_associated): Use 'gfc_class_data_get'
> instead of 'gfc_add_data_component'.
>
> 2012-10-11  Janus Weil  
>
> PR fortran/54881
> * gfortran.dg/associated_6.f90: New.
> * gfortran.dg/select_type_30.f03: New.


Re: [Patch, Fortran, OOP] PR 54881: [4.8 Regression] [OOP] ICE in fold_convert_loc, at fold-const.c:2016

2012-10-16 Thread Dominique Dhumieres
Janus,

Your patch works as advertised without disturbing my pet bugs.
Just a nit pick: the double parentheses in

+  if ((CLASS_DATA (select_type_stack->selector)->attr.dimension
  || CLASS_DATA (select_type_stack->selector)->attr.codimension))

do not seem necessary.

Note for Paul: I had to adjust the patch in order to make it compatible
with the "unlimited polymorphism" patch at
http://gcc.gnu.org/ml/fortran/2012-07/msg00130.html

Dominique


Re: [asan] Protection of stack vars (take 2)

2012-10-16 Thread Xinliang David Li
Looks good except for the following:

1) I am not sure if the stack slot sharing is handled correctly. If I
read the code correctly, the redzone var will be only created for the
representative variable in a partition -- will this lead to false
negatives? As I asked before, should stack slot sharing even be turned
on?
2) Performance tuning -- it is probably better to skip those variables
that are compiler generated -- is there any point guarding them?

thanks,

David

On Tue, Oct 16, 2012 at 3:31 AM, Jakub Jelinek  wrote:
> Hi!
>
> Here is an updated patch, which emits also the shadow clearing sequence
> at the end of function in the right spot.
>
> 2012-10-16  Jakub Jelinek  
>
> * Makefile.in (asan.o): Depend on $(EXPR_H) $(OPTABS_H).
> (cfgexpand.o): Depend on asan.h.
> * asan.c: Include expr.h and optabs.h.
> (asan_shadow_set): New variable.
> (asan_shadow_cst, asan_emit_stack_protection): New functions.
> (asan_init_shadow_ptr_types): Initialize also asan_shadow_set.
> * cfgexpand.c: Include asan.h.  Define HOST_WIDE_INT heap vector.
> (struct stack_vars_data): New type.
> (expand_stack_vars): Add DATA argument.  Fill it in and add needed
> padding in between variables if -fasan.
> (defer_stack_allocation): Defer everything for flag_asan.
> (expand_used_vars): Return var destruction sequence.  Call
> asan_emit_stack_protection if expand_stack_vars added anything
> to the vectors.
> (expand_gimple_basic_block): Add disable_tail_calls argument.
> (gimple_expand_cfg): Pass true to it if expand_used_vars returned
> non-NULL.  Emit the sequence returned by expand_used_vars after
> return_label.
> * asan.h (asan_emit_stack_protection): New prototype.
> (asan_shadow_set): New decl.
> (ASAN_RED_ZONE_SIZE, ASAN_STACK_MAGIC_LEFT, ASAN_STACK_MAGIC_MIDDLE,
> ASAN_STACK_MAGIC_RIGHT, ASAN_STACK_FRAME_MAGIC): Define.
> * toplev.c (process_options): Also disable -fasan on
> !FRAME_GROWS_DOWNWARDS targets.
>
> --- gcc/Makefile.in.jj  2012-10-12 23:30:31.635325772 +0200
> +++ gcc/Makefile.in 2012-10-15 09:40:40.287706610 +0200
> @@ -2204,7 +2204,7 @@ stor-layout.o : stor-layout.c $(CONFIG_H
>  asan.o : asan.c asan.h $(CONFIG_H) pointer-set.h \
> $(SYSTEM_H) $(TREE_H) $(GIMPLE_H) \
> output.h $(DIAGNOSTIC_H) coretypes.h $(TREE_DUMP_H) $(FLAGS_H) \
> -   tree-pretty-print.h $(TARGET_H)
> +   tree-pretty-print.h $(TARGET_H) $(EXPR_H) $(OPTABS_H)
>  tree-ssa-tail-merge.o: tree-ssa-tail-merge.c \
> $(SYSTEM_H) $(CONFIG_H) coretypes.h $(TM_H) $(BITMAP_H) \
> $(FLAGS_H) $(TM_P_H) $(BASIC_BLOCK_H) \
> @@ -3074,7 +3074,7 @@ cfgexpand.o : cfgexpand.c $(TREE_FLOW_H)
> $(DIAGNOSTIC_H) toplev.h $(DIAGNOSTIC_CORE_H) $(BASIC_BLOCK_H) $(FLAGS_H) 
> debug.h $(PARAMS_H) \
> value-prof.h $(TREE_INLINE_H) $(TARGET_H) $(SSAEXPAND_H) $(REGS_H) \
> $(GIMPLE_PRETTY_PRINT_H) $(BITMAP_H) sbitmap.h \
> -   $(INSN_ATTR_H) $(CFGLOOP_H)
> +   $(INSN_ATTR_H) $(CFGLOOP_H) asan.h
>  cfgrtl.o : cfgrtl.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) 
> $(RTL_ERROR_H) \
> $(FLAGS_H) insn-config.h $(BASIC_BLOCK_H) $(REGS_H) hard-reg-set.h \
> $(FUNCTION_H) $(EXCEPT_H) $(TM_P_H) $(INSN_ATTR_H) \
> --- gcc/toplev.c.jj 2012-10-12 23:30:31.717325313 +0200
> +++ gcc/toplev.c2012-10-15 09:40:03.540923935 +0200
> @@ -1544,7 +1544,9 @@ process_options (void)
>  }
>
>/* Address Sanitizer needs porting to each target architecture.  */
> -  if (flag_asan && targetm.asan_shadow_offset == NULL)
> +  if (flag_asan
> +  && (targetm.asan_shadow_offset == NULL
> + || !FRAME_GROWS_DOWNWARD))
>  {
>warning (0, "-fasan not supported for this target");
>flag_asan = 0;
> --- gcc/asan.c.jj   2012-10-12 23:31:55.423856689 +0200
> +++ gcc/asan.c  2012-10-16 12:18:41.414029844 +0200
> @@ -43,6 +43,8 @@ along with GCC; see the file COPYING3.
>  #include "asan.h"
>  #include "gimple-pretty-print.h"
>  #include "target.h"
> +#include "expr.h"
> +#include "optabs.h"
>
>  /*
>   AddressSanitizer finds out-of-bounds and use-after-free bugs
> @@ -79,10 +81,195 @@ along with GCC; see the file COPYING3.
>   to create redzones for stack and global object and poison them.
>  */
>
> +alias_set_type asan_shadow_set = -1;
> +
>  /* Pointer types to 1 resp. 2 byte integers in shadow memory.  A separate
> alias set is used for all shadow memory accesses.  */
>  static GTY(()) tree shadow_ptr_types[2];
>
> +/* Return a CONST_INT representing 4 subsequent shadow memory bytes.  */
> +
> +static rtx
> +asan_shadow_cst (unsigned char shadow_bytes[4])
> +{
> +  int i;
> +  unsigned HOST_WIDE_INT val = 0;
> +  gcc_assert (WORDS_BIG_ENDIAN == BYTES_BIG_ENDIAN);
> +  for (i = 0; i < 4; i++)
> +val |= (unsigned HOST_WIDE_INT) shadow_bytes[BYTES_BIG_ENDIAN ? 3 - i : 
> i]
> +  << (BITS_PER_UNIT * i);
> +  return GE
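The byte-packing done by the quoted asan_shadow_cst can be reproduced standalone for the little-endian case (the function name here is illustrative, not GCC's):

```c
#include <stdint.h>

/* Pack four shadow-memory bytes into one 32-bit word, least
   significant byte first -- the BYTES_BIG_ENDIAN == 0 arm of the
   quoted loop.  */
uint32_t pack_shadow_bytes (const unsigned char b[4])
{
  uint32_t val = 0;
  for (int i = 0; i < 4; i++)
    val |= (uint32_t) b[i] << (8 * i);
  return val;
}
```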

Re: [lra] patch to revert a code from previous patch.

2012-10-16 Thread Richard Sandiford
Vladimir Makarov  writes:
> On 10/16/2012 02:44 AM, Richard Sandiford wrote:
>> Vladimir Makarov  writes:
>>> On 12-10-15 4:25 PM, Richard Sandiford wrote:
 Richard Sandiford  writes:
> Vladimir Makarov  writes:
>>  After committing a patch yesterday to implement proposals from a
>> review, I found that GCC crashes on SPEC2000 gap.  LRA is trying to find
>> a mode of operand (const_int 1) in *lea_general_1 insn and can not find
>> it as the operand and insn template operand has VOIDmode.
>>
>> There are still cases when context lookup is necessary to find a mode of
>> the operand.  So I am reversing the change I did yesterday.
>>
>> The patch is committed as rev. 192462.
>>
>> 2012-10-15  Vladimir Makarov  
>>
>>* lra-int.h (lra_get_mode): Remove.
>>* lra-constraints.c (find_mode, get_op_mode): New functions.
>>(match_reload): Use get_op_mode instead of lra_get_mode.
>>(process_alt_operands, curr_insn_transform): Ditto.
> But my objection to this code still stands.  It's wrong to assume
> that an operand to an rtx has the same mode as the containing rtx.
>
> Please add a testcase that shows the problem.
 (...because I was hoping to have a look myself).  But if that's too
 difficult to reduce, then which operand to *lea_general_1 was the problem?
 The pattern looks like:

 (define_insn_and_split "*lea_general_1"
 [(set (match_operand 0 "register_operand" "=r")
(plus (plus (match_operand 1 "index_register_operand" "l")
(match_operand 2 "register_operand" "r"))
  (match_operand 3 "immediate_operand" "i")))]

 So operands 0, 1 and 2 should have been registers.  Operand 3 never
 needs reloading, so its mode shouldn't matter.

>>> In this case the const needs a reload as it was a pseudo substituted by
>>> equiv constant.
>> But in that case the operand modes we needed were there in the original
>> instruction operands.  If equiv substitution is causing us to lose track
>> of those operand modes, then that needs to be fixed.
>>
>> If the pattern had instead been:
>>
>>(set (match_operand:SI 0 "register_operand" "=r")
>> (unspec:SI [(match_operand 1 "register_operand" "r")] UNSPEC_FOO))
>>
>> and equiv substitution replaced operand 1 with a const_int without
>> the original mode being recorded anywhere, then we'd have no way
>> of recovering that mode from the pattern itself, because the modes
>> of unspec parameters are entirely target-specific.
>>
> I remember I thought about this.  The case is rare, it is better to 
> calculate than keep mode for each operand of each insn.  LRA speed is 
> achieved by minimizations of keeping data for each operand.  We keep 
> only locations. I believe it is a better approach.

I don't think we need to keep a _global_ tab on the operand modes.  The nice
thing about representing intermediate steps in rtl is that the substituted,
unreloaded operands only appear temporarily during curr_insn_transform.
Once that function has finished reloading the instruction, the instruction's
operands have the right form again.

This is the kind of thing I had in mind.  Tested on x86_64-linux-gnu
({,-m32}), no regressions.  Does it look OK?  And does it work with
the SPEC testcase?

I realise you probably have patches pending as well, so I'm happy to
wait until those have gone in and update.

Richard


gcc/
* lra-int.h (lra_operand_data): Add is_address field.
* lra.c (debug_operand_data): Initialize is_address field.
(get_static_insn_data): Likewise.
(setup_operand_alternative): Record is_address in operand data.
(lra_set_insn_recog_data): Initialize is_address field.
* lra-constraints.c (curr_operand_mode): New array.
(init_curr_operand_mode): New function.
(find_mode, get_op_mode): Delete.
(match_reload): Use curr_operand_mode rather than get_op_mode.
(process_alt_operands): Likewise.  Remove VOIDmode check from
CONST_OK_POOL_P check.
(curr_insn_transform): Likewise.  Swap the operand modes when
swapping the operands.
(lra_constraints): Call init_curr_operand_mode.

Index: gcc/lra-int.h
===
--- gcc/lra-int.h   2012-10-16 19:32:48.0 +0100
+++ gcc/lra-int.h   2012-10-16 20:35:53.272728031 +0100
@@ -157,6 +157,8 @@ struct lra_operand_data
  This field is set up every time when corresponding
  operand_alternative in lra_static_insn_data is set up.  */
   unsigned int early_clobber : 1;
+  /* True if the operand is an address.  */
+  unsigned int is_address : 1;
 };
 
 /* Info about register in an insn.  */
Index: gcc/lra.c
===
--- gcc/lra.c   2012-10-16 19:32:48.0 +0100
+++ gcc/lra.c   2012-10-16 20

Re: [asan] Emit GIMPLE directly, small cleanups

2012-10-16 Thread Eric Botcazou
> Yes, that is true.  But when you say "if this trend continues" you are
> making a slippery slope argument that I don't think applies.  To date,
> the trend consists of a single example.  We are discussing adding a
> second example, and we may decide against it.  There are no current
> prospects of a third example.

You can hardly counter a slippery slope argument by saying "we're adding one, 
but, for the time being, we won't add more" since that's precisely what is 
denounced: you will say it for every new addition and this will never end.

IMHO we need to decide what to do in this case once for all.

-- 
Eric Botcazou


Re: [asan] WIP protection of globals

2012-10-16 Thread Xinliang David Li
On Tue, Oct 16, 2012 at 7:58 AM, Jakub Jelinek  wrote:
> Hi!
>
> This is a WIP patch for globals protection.
> I'm not filling names yet and has_dynamic_init is always
> false (wonder how to figure it has_dynamic_init out, especially
> with LTO, TYPE_ADDRESSABLE (TREE_TYPE (decl)) probably isn't it,
> and for more I'm afraid we need a langhook).
>
> --- gcc/varasm.c.jj 2012-10-11 19:10:39.0 +0200
> +++ gcc/varasm.c2012-10-16 15:40:37.075662625 +0200
> @@ -51,6 +51,7 @@ along with GCC; see the file COPYING3.
>  #include "tree-mudflap.h"
>  #include "cgraph.h"
>  #include "pointer-set.h"
> +#include "asan.h"
>
>  #ifdef XCOFF_DEBUGGING_INFO
>  #include "xcoffout.h"  /* Needed for external data
> @@ -1831,6 +1832,9 @@ assemble_noswitch_variable (tree decl, c
>size = tree_low_cst (DECL_SIZE_UNIT (decl), 1);
>rounded = size;
>
> +  if (flag_asan && asan_protect_global (decl))
> +size += asan_red_zone_size (size);
> +
>/* Don't allocate zero bytes of common,
>   since that means "undefined external" in the linker.  */
>if (size == 0)
> @@ -1897,6 +1901,7 @@ assemble_variable (tree decl, int top_le
>const char *name;
>rtx decl_rtl, symbol;
>section *sect;
> +  bool asan_protected = false;
>
>/* This function is supposed to handle VARIABLES.  Ensure we have one.  */
>gcc_assert (TREE_CODE (decl) == VAR_DECL);
> @@ -1984,6 +1989,15 @@ assemble_variable (tree decl, int top_le
>/* Compute the alignment of this data.  */
>
>align_variable (decl, dont_output_data);
> +
> +  if (flag_asan
> +  && asan_protect_global (decl)
> +  && DECL_ALIGN (decl) < ASAN_RED_ZONE_SIZE * BITS_PER_UNIT)
> +{
> +  asan_protected = true;
> +  DECL_ALIGN (decl) = ASAN_RED_ZONE_SIZE * BITS_PER_UNIT;
> +}
> +
>set_mem_align (decl_rtl, DECL_ALIGN (decl));
>
>if (TREE_PUBLIC (decl))
> @@ -2022,6 +2036,12 @@ assemble_variable (tree decl, int top_le
>if (DECL_ALIGN (decl) > BITS_PER_UNIT)
> ASM_OUTPUT_ALIGN (asm_out_file, floor_log2 (DECL_ALIGN_UNIT (decl)));
>assemble_variable_contents (decl, name, dont_output_data);
> +  if (asan_protected)
> +   {
> + unsigned HOST_WIDE_INT int size
> +   = tree_low_cst (DECL_SIZE_UNIT (decl), 1);
> + assemble_zeros (asan_red_zone_size (size));
> +   }
>  }
>  }
>
> @@ -6926,6 +6946,8 @@ place_block_symbol (rtx symbol)
>decl = SYMBOL_REF_DECL (symbol);
>alignment = DECL_ALIGN (decl);
>size = tree_low_cst (DECL_SIZE_UNIT (decl), 1);
> +  if (flag_asan && asan_protect_global (decl))
> +   size += asan_red_zone_size (size);
>  }
>
>/* Calculate the object's offset from the start of the block.  */
> --- gcc/Makefile.in.jj  2012-10-15 09:40:40.0 +0200
> +++ gcc/Makefile.in 2012-10-16 16:54:12.463712014 +0200
> @@ -2712,7 +2712,7 @@ varasm.o : varasm.c $(CONFIG_H) $(SYSTEM
> output.h $(DIAGNOSTIC_CORE_H) xcoffout.h debug.h $(GGC_H) $(TM_P_H) \
> $(HASHTAB_H) $(TARGET_H) langhooks.h gt-varasm.h $(BASIC_BLOCK_H) \
> $(CGRAPH_H) $(TARGET_DEF_H) tree-mudflap.h \
> -   pointer-set.h $(COMMON_TARGET_H)
> +   pointer-set.h $(COMMON_TARGET_H) asan.h
>  function.o : function.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) 
> $(RTL_ERROR_H) \
> $(TREE_H) $(GIMPLE_H) $(FLAGS_H) $(FUNCTION_H) $(EXPR_H) \
> $(OPTABS_H) $(LIBFUNCS_H) $(REGS_H) hard-reg-set.h insn-config.h 
> $(RECOG_H) \
> --- gcc/asan.c.jj   2012-10-16 12:18:41.0 +0200
> +++ gcc/asan.c  2012-10-16 16:52:24.266434151 +0200
> @@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.
>  #include "target.h"
>  #include "expr.h"
>  #include "optabs.h"
> +#include "output.h"
>
>  /*
>   AddressSanitizer finds out-of-bounds and use-after-free bugs
> @@ -270,6 +271,48 @@ asan_emit_stack_protection (rtx base, HO
>return ret;
>  }
>
> +/* Return true if DECL is a VAR_DECL that should be protected
> +   by Address Sanitizer, by appending a red zone with protected
> +   shadow memory after it and aligning it to at least
> +   ASAN_RED_ZONE_SIZE bytes.  */
> +
> +bool
> +asan_protect_global (tree decl)
> +{
> +  rtx rtl, symbol;
> +  section *sect;
> +
> +  if (TREE_CODE (decl) != VAR_DECL
> +  || DECL_THREAD_LOCAL_P (decl)
> +  || DECL_EXTERNAL (decl)
> +  || !TREE_ASM_WRITTEN (decl)
> +  || !DECL_RTL_SET_P (decl)
> +  || DECL_ONE_ONLY (decl)
> +  || DECL_COMMON (decl)

Why the above two condition? If the linker picks the larger size one,
it is ok to do the instrumentation.


> +  || (DECL_SECTION_NAME (decl) != NULL_TREE
> + && !DECL_HAS_IMPLICIT_SECTION_NAME_P (decl))

Why is this condition? Is it related to -fdata-sections ?

> +  || DECL_SIZE (decl) == 0
> +  || ASAN_RED_ZONE_SIZE * BITS_PER_UNIT > MAX_OFILE_ALIGNMENT
> +  || !valid_constant_size_p (DECL_SIZE_UNIT (decl))
> +  || DECL_ALIGN_UNIT (decl) > 2 * ASAN_RED_ZONE_SIZE)
> +return false;
> +
> +  rtl =

Re: [asan] Protection of stack vars (take 2)

2012-10-16 Thread Jakub Jelinek
On Tue, Oct 16, 2012 at 01:56:46PM -0700, Xinliang David Li wrote:
> Looks good except for the following:
> 
> 1) I am not sure if the stack slot sharing is handled correctly. If I
> read the code correctly, the redzone var will be only created for the
> representative variable in a partition -- will this lead to false
> negatives? As I asked before, should stack slot sharing even be turned
> on?

I thought we don't merge stack slots if the vars have different size,
apparently I've been wrong about that.  So I'll make sure those aren't
shared with flag_asan.  When the size is the same, there should be no false
negatives.

> 2) Performance tuning -- it is probably better to skip those variables
> that are compiler generated -- is there any point guarding them?

In my (admittedly very limited) testing only at -O0 some compiler generated
vars (SSA_NAMEs) got guards.  The trouble here is -fasan -fstack-protector
combination.  I wonder if there can be any DECL_ARTIFICIAL (or non-VAR_DECL/
RESULT_DECL) decls for which stack_protect_decl_phase returns 1 or 2.
If it is unlikely, the best might be to protect all phase 1 and 2 vars and
if flag_asan call expand_stack_vars once more for partitions where any of
the protected vars are user vars, and finally the current expand_stack_vars
(NULL) which would not do any asan protection.

Jakub


Re: [asan] WIP protection of globals

2012-10-16 Thread Jakub Jelinek
On Tue, Oct 16, 2012 at 02:41:42PM -0700, Xinliang David Li wrote:
> > +bool
> > +asan_protect_global (tree decl)
> > +{
> > +  rtx rtl, symbol;
> > +  section *sect;
> > +
> > +  if (TREE_CODE (decl) != VAR_DECL
> > +  || DECL_THREAD_LOCAL_P (decl)
> > +  || DECL_EXTERNAL (decl)
> > +  || !TREE_ASM_WRITTEN (decl)
> > +  || !DECL_RTL_SET_P (decl)
> > +  || DECL_ONE_ONLY (decl)
> > +  || DECL_COMMON (decl)
> 
> Why the above two condition? If the linker picks the larger size one,
> it is ok to do the instrumentation.

For DECL_ONE_ONLY, what LLVM does is plain wrong: it puts the addresses
even of vars exported from shared libraries (which can be overridden) into
the array passed to __asan_*register_globals.  That can't work, you don't
know if the var that is found first was compiled with -fasan or not.
We need to use a local alias in that case (yeah, my WIP patch doesn't do
that yet, but I wanted to post at least something), and I believe local
aliases might be an issue with DECL_ONE_ONLY (and anyway, if a -fno-asan
DECL_ONE_ONLY var wins over -fasan one, there is no padding after it).

LLVM does other things wrong, it increases the size of the vars which is
IMHO a big no no, say for copy relocs etc., and last I'd say the relocations
in the description variable are unnecessary, would be better if it e.g. used
PC relative relocations and could made the array passed to
__asan_*register_globals read-only.

And for DECL_COMMON, you can't put any padding after a common variable
without making the size of the common var larger (and increasing its
alignment), both are undesirable for -fasan/-fno-asan mixing.

> > +  || (DECL_SECTION_NAME (decl) != NULL_TREE
> > + && !DECL_HAS_IMPLICIT_SECTION_NAME_P (decl))
> 
> Why is this condition? Is it related to -fdata-sections ?

-fdata-sections will have non-NULL DECL_SECTION_NAME, but still
DECL_HAS_IMPLICIT_SECTION_NAME_P.  The above condition is not to break
various packages that put say a struct into some user section and expect
the section then to contain an array of those structs.
E.g. Linux kernel does this, systemtap probes, prelink, ...
If padding is inserted, all those would break.

Jakub
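The size/alignment concern behind the DECL_COMMON point above can be made concrete with a toy trailing-red-zone computation.  The 32-byte granularity mirrors ASAN_RED_ZONE_SIZE, but the formula is an illustration under that assumption, not GCC's actual asan_red_zone_size:

```c
/* Pad an object so that object + trailing red zone is a multiple of
   the red-zone granularity, with at least one full red zone after it.
   Toy formula for illustration only.  */
enum { RED_ZONE = 32 };

unsigned red_zone_size (unsigned size)
{
  unsigned rounded = (size + RED_ZONE - 1) & ~(unsigned) (RED_ZONE - 1);
  return rounded - size + RED_ZONE;
}
```

For a common variable this trailing padding is exactly what cannot be added without growing the variable (and its alignment) itself — undesirable when -fasan and -fno-asan objects are mixed, which is why asan_protect_global rejects DECL_COMMON decls.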


[PATCH] TARGET_ support, was [PATCH] Rs6000 infrastructure cleanup

2012-10-16 Thread Michael Meissner
The powerpc port is running out of switches that set bits in target_flags, and
I am in the process of changing all of the switches to use a secondary flag
word that is HOST_WIDE_INT (at least 64-bits on powerpc).  One of the changes
is that the options machinery changes all of the TARGET_ names to
OPTION_.  Joseph and I have been talking about going back to using
TARGET_ all of the time.

I did some digging, and there are only two ports that use this:
  1) The x86 port, which had a similar problem some time ago; and
  2) All Android linux ports, which use it for setting a single bit to avoid
 adding any bits to the target_flags in the target machine.

I made a set of changes to make this change.  I have bootstrapped the x86
compiler with no problems, and before I go home tonight, I will kick off a
build/make compare of an unmodified tree.  Assuming there are no regressions
found, are these patches ok to check in?

Now, the x86 actually maps the OPTION_ISA_xxx switches to TARGET_xxx switches,
so it is easy to change all of the defines from OPTION_ISA_ to
TARGET_ISA_.  For Android, it was simple to change the one line of
reference.

2012-10-16  Michael Meissner  

* opth-gen.awk (TARGET_* generation): Always generate TARGET_
for Mask options, whether they use Var(...) or not.

* config/linux-android.h (ANDROID_TARGET_OS_CPP_BUILTINS): Use
TARGET_ instead of OPTION_.
* config/i386/i386.h (TARGET_64BIT): Likewise.
(TARGET_MMX): Likewise.
(TARGET_3DNOW): Likewise.
(TARGET_3DNOW_A): Likewise.
(TARGET_SSE): Likewise.
(TARGET_SSE2): Likewise.
(TARGET_SSE3): Likewise.
(TARGET_SSSE3): Likewise.
(TARGET_SSE4_1): Likewise.
(TARGET_SSE4_2): Likewise.
(TARGET_AVX): Likewise.
(TARGET_AVX2): Likewise.
(TARGET_FMA): Likewise.
(TARGET_SSE4A): Likewise.
(TARGET_FMA4): Likewise.
(TARGET_XOP): Likewise.
(TARGET_LWP): Likewise.
(TARGET_ROUND): Likewise.
(TARGET_ABM): Likewise.
(TARGET_BMI): Likewise.
(TARGET_BMI2): Likewise.
(TARGET_LZCNT): Likewise.
(TARGET_TBM): Likewise.
(TARGET_POPCNT): Likewise.
(TARGET_SAHF): Likewise.
(TARGET_MOVBE): Likewise.
(TARGET_CRC32): Likewise.
(TARGET_AES): Likewise.
(TARGET_PCLMUL): Likewise.
(TARGET_CMPXCHG16B): Likewise.
(TARGET_FSGSBASE): Likewise.
(TARGET_RDRND): Likewise.
(TARGET_F16C): Likewise.
(TARGET_RTM): Likewise.
(TARGET_HLE): Likewise.
(TARGET_RDSEED): Likewise.
(TARGET_PRFCHW): Likewise.
(TARGET_ADX): Likewise.
(TARGET_64BIT): Likewise.
(TARGET_MMX): Likewise.
(TARGET_3DNOW): Likewise.
(TARGET_3DNOW_A): Likewise.
(TARGET_SSE): Likewise.
(TARGET_SSE2): Likewise.
(TARGET_SSE3): Likewise.
(TARGET_SSSE3): Likewise.
(TARGET_SSE4_1): Likewise.
(TARGET_SSE4_2): Likewise.
(TARGET_AVX): Likewise.
(TARGET_AVX2): Likewise.
(TARGET_FMA): Likewise.
(TARGET_SSE4A): Likewise.
(TARGET_FMA4): Likewise.
(TARGET_XOP): Likewise.
(TARGET_LWP): Likewise.
(TARGET_ROUND): Likewise.
(TARGET_ABM): Likewise.
(TARGET_BMI): Likewise.
(TARGET_BMI2): Likewise.
(TARGET_LZCNT): Likewise.
(TARGET_TBM): Likewise.
(TARGET_POPCNT): Likewise.
(TARGET_SAHF): Likewise.
(TARGET_MOVBE): Likewise.
(TARGET_CRC32): Likewise.
(TARGET_AES): Likewise.
(TARGET_PCLMUL): Likewise.
(TARGET_CMPXCHG16B): Likewise.
(TARGET_FSGSBASE): Likewise.
(TARGET_RDRND): Likewise.
(TARGET_F16C): Likewise.
(TARGET_RTM): Likewise.
(TARGET_HLE): Likewise.
(TARGET_RDSEED): Likewise.
(TARGET_PRFCHW): Likewise.
(TARGET_ADX): Likewise.
(TARGET_LP64): Likewise.
(TARGET_X32): Likewise.
(TARGET_ISA_ROUND): Likewise.
* config/i386/darwin.h (TARGET_64BIT): Likewise.

-- 
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899
Index: gcc/opth-gen.awk
===================================================================
--- gcc/opth-gen.awk	(revision 192509)
+++ gcc/opth-gen.awk	(working copy)
@@ -375,15 +375,13 @@ for (i = 0; i < n_opts; i++) {
if (name != "" && mask_macros[name] == 0) {
mask_macros[name] = 1
vname = var_name(flags[i])
-   macro = "OPTION_"
mask = "OPTION_MASK_"
if (vname == "") {
vname = "target_flags"
-   macro = "TARGET_"
mask = "MASK_"
extra_mask_macros[name] = 1
}
-   print "#define " macro name \

Re: [wwwdocs] SH 4.8 changes - document thread pointer built-ins

2012-10-16 Thread Gerald Pfeifer
On Tue, 9 Oct 2012, Oleg Endo wrote:
> This documents the new thread pointer built-ins in the SH www changes
> for 4.8.

Thanks, Oleg.

I've got one change and one question:

+Added support for the built-in functions
+__builtin_thread_pointer and
+__builtin_set_thread_pointer.  This assumes that
+GBR is used to hold the thread pointer of the current thread,
+which has been the case since a while already. 

"since a while" -> "for a while", and I made that change.

That said, why is this important, and is there a fixed date or version?

+Memory loads and stores
+relative to the address returned by __builtin_thread_pointer
+will now also utilize GBR based displacement address modes.

Why do these _now_ utilize these address modes, when per the above
__builtin_thread_pointer was just added?  This last sentence implies
a change when there does not seem to be one?

Gerald


Re: [asan] Emit GIMPLE directly, small cleanups

2012-10-16 Thread Ian Lance Taylor
On Tue, Oct 16, 2012 at 2:33 PM, Eric Botcazou  wrote:
>> Yes, that is true.  But when you say "if this trend continues" you are
>> making a slippery slope argument that I don't think applies.  To date,
>> the trend consists of a single example.  We are discussing adding a
>> second example, and we may decide against it.  There are no current
>> prospects of a third example.
>
> You can hardly counter a slippery slope argument by saying "we're adding one,
> but, for the time being, we won't add more" since that's precisely what is
> denounced: you will say it for every new addition and this will never end.

The slippery slope argument is that this will become easier and easier
over time, and that the earlier examples will serve to pave the way
for the later examples.  I don't see any reason to assume that is
true.  There is no slope here.

> IMHO we need to decide what to do in this case once and for all.

That is fine with me as long as we acknowledge that the upstream
sources don't care about GCC and will think it is absurd that they
should modify their code to carry untested and unmaintained
GCC-specific annotations.  It would be one thing if the GCC-specific
annotations were clearly better, but in fact I would say that they are
clearly worse.

Ian


Re: [asan] WIP protection of globals

2012-10-16 Thread Xinliang David Li
On Tue, Oct 16, 2012 at 3:03 PM, Jakub Jelinek  wrote:
> On Tue, Oct 16, 2012 at 02:41:42PM -0700, Xinliang David Li wrote:
>> > +bool
>> > +asan_protect_global (tree decl)
>> > +{
>> > +  rtx rtl, symbol;
>> > +  section *sect;
>> > +
>> > +  if (TREE_CODE (decl) != VAR_DECL
>> > +  || DECL_THREAD_LOCAL_P (decl)
>> > +  || DECL_EXTERNAL (decl)
>> > +  || !TREE_ASM_WRITTEN (decl)
>> > +  || !DECL_RTL_SET_P (decl)
>> > +  || DECL_ONE_ONLY (decl)
>> > +  || DECL_COMMON (decl)
>>
>> Why the above two condition? If the linker picks the larger size one,
>> it is ok to do the instrumentation.
>
> For DECL_ONE_ONLY, what LLVM does is plain wrong, it puts address of
> even vars exported from shared libraries (which can be overridden) into
> the array passed to __asan_*register_globals.  That can't work, you don't
> know if the var that is found first was compiled with -fasan or not.
> We need to use a local alias in that case (yeah, my WIP patch doesn't do
> that yet, but I wanted to post at least something), and I believe local

Does that mean that globals exported from shared libraries cannot be
instrumented unless they have protected or hidden visibility? This
sounds like a big limitation.  We need to answer two questions:

1) How often do exported variables get preempted?
2) Is it a common use case to mix -fasan and -fno-asan ?

If the answer to either question is no, we should allow the above at
the risk of some possible false positives -- the goal is to find as
many bugs as possible (without too much noise). In short, false
negatives are worse.


> aliases might be an issue with DECL_ONE_ONLY (and anyway, if a -fno-asan
> DECL_ONE_ONLY var wins over -fasan one, there is no padding after it).
>
> LLVM does other things wrong too: it increases the size of the variables,
> which is IMHO a big no-no (say for copy relocs etc.), and lastly I'd say
> the relocations in the description variable are unnecessary; it would be
> better if it e.g. used PC-relative relocations, which would let the array
> passed to __asan_*register_globals be read-only.
>

I like the GCC way better too.


> And for DECL_COMMON, you can't put any padding after a common variable
> without making the size of the common var larger (and increasing its
> alignment), both are undesirable for -fasan/-fno-asan mixing.

If the linker picks the larger one (which I believe it does), is that
still a problem?

>
>> > +  || (DECL_SECTION_NAME (decl) != NULL_TREE
>> > + && !DECL_HAS_IMPLICIT_SECTION_NAME_P (decl))
>>
>> Why is this condition needed?  Is it related to -fdata-sections?
>
> -fdata-sections will have non-NULL DECL_SECTION_NAME, but still
> DECL_HAS_IMPLICIT_SECTION_NAME_P.  The above condition is not to break
> various packages that put say a struct into some user section and expect
> the section then to contain an array of those structs.
> E.g. Linux kernel does this, systemtap probes, prelink, ...
> If padding is inserted, all those would break.
>

Ok -- not common scenarios to be of a concern.

thanks,

David


> Jakub


PR c++/54928 infinite ICE when reporting ICE on macro expansion

2012-10-16 Thread Manuel López-Ibáñez
The problem is that the macro unwinder code changes the original
diagnostic type and does not restore it, so the code that detects we
are ICEing fails to abort, which triggers another ICE, and so on. But
there is no point in modifying the original diagnostic; we can simply
create a temporary copy and use that for macro unwinding.

Bootstrapped and regression tested. I have verified that it fixes the
infinite ICE in the original example, but I am not sure how to add a
testcase for this, because I expect the original ICE to be fixed soon,
which would make the testcase useless.

Dodji, does it look ok? I am not sure how much testsuite coverage we
have for the macro unwinder, so I hope I didn't mess it up in some
non-trivial testcase.

OK?

2012-10-17  Manuel López-Ibáñez  

gcc/
PR c++/54928
* tree-diagnostic.c (maybe_unwind_expanded_macro_loc):
Do not modify diagnostic passed as argument but a temporary copy.


pr54928.diff
Description: Binary data


Re: [asan] Emit GIMPLE directly, small cleanups

2012-10-16 Thread Diego Novillo

On 2012-10-16 18:50, Ian Lance Taylor wrote:


That is fine with me as long as we acknowledge that the upstream
sources don't care about GCC and will think it is absurd that they
should modify their code to carry untested and unmaintained
GCC-specific annotations.  It would be one thing if the GCC-specific
annotations were clearly better, but in fact I would say that they are
clearly worse.


Precisely.

Eric, Rainer, what do you think of the other two options I outlined in 
my earlier message?


1- Copy the upstream testsuite into gcc/testsuite/asan.  This gives us 
the flexibility of adding new tests as the GCC implementation matures.


2- Deal with libasan as we deal with zlib/boehm-gc.

I prefer option #1, personally.


Diego.

