RFA: Use "m_foo" rather than "foo_" for member variables

2013-09-29 Thread Richard Sandiford
Michael Matz  writes:
>Trever Saunders  writes:
>> Richard Biener  writes:
>> > Btw, I've come around multiple coding-styles in the past and I 
>> > definitely would prefer m_mode / m_count to mark members vs. mode_ and 
>> > count_. (and s_XXX for static members IIRC).
>>
>> I'd prefer m_/s_foo for members / static things too fwiw.
>
> Me as well.  It's still ugly, but not so unsymmetric as the trailing 
> underscore.

Well, I'm not sure how I came to be the one writing these patches,
but I suppose I prefer m_foo too.  So how about the attached?

The first patch has changes to the coding conventions.  I added
some missing spaces while there.

The second patch has the mechanical code changes.  The reason for
yesterday's mass addition of spaces was that the second patch would
otherwise have been rather inconsistent.

Tested on x86_64-linux-gnu.

Thanks,
Richard


Index: htdocs/codingconventions.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/codingconventions.html,v
retrieving revision 1.67
diff -u -p -r1.67 codingconventions.html
--- htdocs/codingconventions.html	16 Jul 2012 19:51:44 -	1.67
+++ htdocs/codingconventions.html	29 Sep 2013 08:55:10 -
@@ -685,10 +685,10 @@ The compiler must build cleanly with Assertions
 
-Code should use gcc_assert(EXPR) to check invariants.
-Use gcc_unreachable() to mark places that should never be
+Code should use gcc_assert (EXPR) to check invariants.
+Use gcc_unreachable () to mark places that should never be
 reachable (such as an unreachable default case of a
-switch).  Do not use gcc_assert(0) for such purposes, as
+switch).  Do not use gcc_assert (0) for such purposes, as
 gcc_unreachable gives the compiler more information.  The
 assertions are enabled unless explicitly configured off with
 --enable-checking=none.  Do not use abort.
@@ -1087,7 +1087,8 @@ rather than 
 When structs and/or classes have member functions,
-prefer to name data members with a trailing underscore.
+prefer to name data members with a leading m_
+and static data members with a leading s_.
 
 
 
@@ -1201,7 +1202,7 @@ Prefer to put the entire member head on 
 
 
 
-gnuclass::gnuclass() : base_class()
+gnuclass::gnuclass () : base_class ()
 { 
   ...
 };
@@ -1213,8 +1214,8 @@ place the colon of the initializer claus
 
 
 
-gnuclass::gnuclass()
-: base1(), base2(), member1(), member2(), member3(), member4()
+gnuclass::gnuclass ()
+: base1 (), base2 (), member1 (), member2 (), member3 (), member4 ()
 { 
   ...
 };
@@ -1226,9 +1227,9 @@ move overflowing initializers to the nex
 
 
 
-gnuclass::gnuclass()
-: base1(some_expression), base2(another_expression),
-  member1(my_expressions_everywhere)
+gnuclass::gnuclass ()
+: base1 (some_expression), base2 (another_expression),
+  member1 (my_expressions_everywhere)
 { 
   ...
 };
@@ -1242,7 +1243,7 @@ it should appear on the next line indent
 
 
 void
-very_long_class_name::very_long_function_name(
+very_long_class_name::very_long_function_name (
 very_long_type_name arg)
 {
 
@@ -1256,7 +1257,7 @@ We may wish to do so pre-emptively for a
 
 void
 very_long_template_class_name 
-::very_long_function_name(
+::very_long_function_name (
 very_long_type_name arg)
 {
 
Index: htdocs/codingrationale.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/codingrationale.html,v
retrieving revision 1.2
diff -u -p -r1.2 codingrationale.html
--- htdocs/codingrationale.html	6 Sep 2012 02:54:40 -	1.2
+++ htdocs/codingrationale.html	29 Sep 2013 08:55:10 -
@@ -351,10 +351,10 @@ but the clarity in layout persists.
 Names
 
 
-Naming data members with a trailing underscore
+Prefixing data members with m_
 highlights the extra overhead of access to fields over local variables.
-Think of the trailing underscore
-like you would Pascal's postfix ^ dereference operator.
+Think of the leading m_ as being similar to the
+* dereference operator.
 
 
 
gcc/
	* basic-block.h (edge_list): Prefix member names with "m_".
	* context.h (context): Likewise.
	* domwalk.h (dom_walker): Likewise.
	* gengtype-state.c (s_expr_writer, state_writer): Likewise.
	* graphite-sese-to-poly.c (sese_dom_walker): Likewise.
	* hash-table.h (hash_table): Likewise.
	* machmode.h (bit_field_mode_iterator): Likewise.
	* pass_manager.h (pass_list): Likewise.
	* tree-into-ssa.c (mark_def_dom_walker): Likewise.
	* tree-pass.h (pass_data): Likewise.
	* tree-ssa-dom.c (dom_opt_dom_walker): Likewise.
	* tree-ssa-phiopt.c (nontrapping_dom_walker): Likewise.
	* tree-ssa-uncprop.c (uncprop_dom_walker): Likewise.
	* asan.c (pass_data_asan): Update accordingly.
	* cfganal.c (control_dependences::find_control_dependence): Likewise.
	(control_dependences::control_dependences): Likewise.
	(control_dependences::~control_dependences): Likewise.
	(control_dependences::get_edges_dependent_on):

Re: Add value range support into memcpy/memset expansion

2013-09-29 Thread Jan Hubicka
> > We now produce:
> > movq    b(%rip), %rsi
> > movq    a(%rip), %rcx
> > movq    (%rsi), %rax     <- first 8 bytes are moved
> > leaq    8(%rcx), %rdi
> > andq    $-8, %rdi        <- dest is aligned
> > movq    %rax, (%rcx)
> > movq    132(%rsi), %rax  <- last 8 bytes are moved
> > movq    %rax, 132(%rcx)
> > subq    %rdi, %rcx       <- alignment is subtracted from count
> 
> > subq    %rcx, %rsi       <- source is aligned
> 
> This (source aligned) is not always true, but nevertheless the
> sequence is very tight.

Yep, sure, it is aligned only if source-dest is aligned, but that is the best
we can ask for.
> > Unfortunately the following testcase:
> > char *p,*q;
> > t(int a)
> > {
> >   if (a<100)
> > memcpy(q,p,a);
> >
> > }
> > Won't get inlined.  This is because A is known to be smaller than 100, which
> > results in an anti-range after conversion to size_t.  This anti-range allows
> > very large values (above INT_MAX) and thus we do not know the block size.
> > I am not sure if the sane range can be recovered somehow.  If not, maybe
> > this is common enough to add support for a "probable" upper bound parameter
> > to the template.
> 
> Do we know if there is real code that intentionally does that other
> than security flaws as result of improperly done range check?

I do not think so.
> 
> I think by default GCC should assume the memcpy size range is (0, 100)
> here with perhaps an option to override it.

Indeed, this is what I was suggesting.  The problem is what to pass down to the
expanders as a value range.

We either need to update the documentation of the expanders to say that the
ranges are just highly probable - and I do not want to do that, since I want to
use the ranges for move_by_pieces, too.  So if there is no other solution, I
think we will have to introduce two upper bound parameters - one sure and the
other very likely.

We play similar tricks in niter code.

Honza


Re: Add value range support into memcpy/memset expansion

2013-09-29 Thread Michael V. Zolotukhin
Hi Jan,

> I also think the misaligned move trick can/should be performed by
> move_by_pieces and we ought to consider sane use of SSE - current vector_loop
> with unrolling factor of 4 seems bit extreme.  At least buldozer is happy with
> 2 and I would expect SSE moves to be especially useful for moving blocks with
> known size where they are not used at all.
> 
> Currently I disabled misaligned move prologues/epilogues for Michael's vector
> loop path since they end up longer than the traditional code (that uses a
> loop for the epilogue)
Prologues could use this technique even with vector_loop, as they actually
don't have a loop.
As for epilogues - have you tried using misaligned vector moves (movdqu)?  It
looks to me that we need approximately the same number of instructions in the
vector-loop and usual-loop epilogues, if we use vector instructions in the
vector-loop epilogue.

> Comments are welcome.
BTW, maybe we could generalize expand_small_movmem a bit and make a separate
expanding strategy out of it?  It would expand a memmove with no loop (and
probably no alignment prologue) - just with the biggest available moves.  Also,
a cost model could be added here to make decisions on when we actually want to
align the moves.  Here are a couple of examples of that:

memcpy (a, b, 32); // alignment is unknown
will expand to
  movdqu a, %xmm0
  movdqu a+16, %xmm1
  movdqu %xmm0, b
  movdqu %xmm1, b+16
memcpy (a, b, 32); // alignment is known and equals 64bit
will expand to
a)
  movdqu a, %xmm0
  movdqu a+16, %xmm1
  movdqu %xmm0, b
  movdqu %xmm1, b+16
or b)
  movq    a,     %xmm0
  movdqa  a+8,   %xmm1
  movq    a+24,  %xmm2
  movq    %xmm0, b
  movdqa  %xmm1, b+8
  movq    %xmm2, b+24

We would compute the total cost of both variants and choose the best - for
the computation we just need the costs of aligned and misaligned moves.

This strategy is actually pretty similar to move_by_pieces, but as it has much
more target-specific information, it would be able to produce much more
effective code.

And one more question - while working on vector_loop for memset I tried to
merge many of the movmem and setmem routines (e.g. instead of
expand_movmem_prologue and expand_setmem_prologue I made a single routine
expand_movmem_or_setmem_prologue).  What do you think, is it a good idea?  It
reduces code size in i386.c, but slightly complicates the code.  I'll send a
patch shortly, as soon as the testing completes.

Thanks, Michael


Re: [PATCH 2/6] Andes nds32: machine description of nds32 porting (1).

2013-09-29 Thread Richard Sandiford
Chung-Ju Wu  writes:
>>> +/* Permitting tail calls.  */
>>> +
>>> +static bool
>>> +nds32_warn_func_return (tree decl)
>>> +{
>>> +  /* Naked functions are implemented entirely in assembly, including the
>>> + return sequence, so suppress warnings about this.  */
>>> +  return !nds32_naked_function_p (decl);
>>> +}
>> 
>> The comment above the function doesn't seem to match the function.
>> 
>
> The TARGET_WARN_FUNC_RETURN hook is located at "17.10.13 Permitting tail
> calls" in the GCC Internals 4.9.0 pre-release manual.

Hmm, good point.  That seems kind of misplaced.

> I should have added a comment to describe
> nds32_warn_func_return () to avoid such confusion.  Like this:
>
>   /* Permitting tail calls.  */
>
>   /* Determine whether we need to enable warning for function return
>      check.  */
>   static bool
>   nds32_warn_func_return (tree decl)

Looks good, thanks.

>>> +  /* If the function is 'naked', we do not have to generate
>>> + epilogue code fragment BUT 'ret' instruction.  */
>>> +  if (cfun->machine->naked_p)
>>> +{
>>> +  /* Generate return instruction by using unspec_func_return pattern.
>>> + Make sure this instruction is after gen_blockage().
>>> + NOTE that $lp will become 'live'
>>> + after this instruction has been emitted.  */
>>> +  emit_insn (gen_unspec_func_return ());
>>> +  return;
>>> +}
>> 
>> FWIW, this is different from AVR (for one), which explicitly doesn't emit a
>> return instruction.  One of the uses for AVR is to chain code together
>> into an .init-like section.
>> 
>
> Yes.
> Personally I prefer not to generate 'ret' instruction for a naked function.
> Programmers have responsibility to take all the jobs and do everything
> they need with c code, attribute, and inline asm.
>
> But some of our existing customers also expect that a naked function
> is supposed to imply "no push/pop" only.  Eventually we took their
> requirement into account, having 'ret' produced by the compiler
> automatically.

That's certainly OK with me FWIW.  I don't think there's any realistic
hope of having the attribute behave the same way everywhere, because
it's already been used in incompatible ways on different targets.

>> Redundant brackets around the ==.  There are a few other cases in the
>> patch too, although most of it seems OK.
>> 
>
> Sorry I didn't see such requirement in GNU coding standards or
> GCC coding convention.  If it is not defined in coding convention,
> may I keep these redundant brackets for readability? :)

Now you mention it, I can't find it written down anywhere either.
Maybe it's one of those unwritten things that I've been taking
for granted, or maybe I just imagined it :-)

So I can't push back on this.

> +  /* We need to provide a customized rtx which contains
> + necessary information for data analysis,
> + so we create a parallel rtx like this:
> + (parallel [(unspec [(reg: Rb)
> + (reg: Re)
> + (const_int En4)] UNSPEC_STACK_PUSH_MULTIPLE)
> +(use (reg:SI Rb))
> +(use (reg:SI Re))
> +(set (mem (plus (reg:SI SP_REGNUM) (const_int -4)))
> + (reg:SI LP_REGNUM))
> +(set (mem (plus (reg:SI SP_REGNUM) (const_int -8)))
> + (reg:SI GP_REGNUM))
> +(set (mem (plus (reg:SI SP_REGNUM) (const_int -12)))
> + (reg:SI FP_REGNUM))
> +(set (mem (plus (reg:SI SP_REGNUM) (const_int -16)))
> + (reg:SI Re))
> +...
> +(set (mem (plus (reg:SI SP_REGNUM) (const_int -28)))
> + (reg:SI Rb+1))
> +(set (mem (plus (reg:SI SP_REGNUM) (const_int -32)))
> + (reg:SI Rb))
> +(set (reg:SI SP_REGNUM)
> + (plus (reg:SI SP_REGNUM) (const_int -32)))]) */

The "use"s here should be redundant, since the "set"s say that the
registers are used.  Same comment for:

> +  /* We need to provide a customized rtx which contains
> + necessary information for data analysis,
> + so we create a parallel rtx like this:
> + (NOTE: We have to clearly claim that we are using/clobbering Rb and Re,
> +otherwise it may be renamed by optimization.)
> + (parallel [(unspec [(reg: Rb)
> + (reg: Re)
> + (const_int En4)] UNSPEC_STACK_POP_MULTIPLE)
> +(use (reg:SI Rb))
> +(use (reg:SI Re))
> +(clobber (reg:SI Rb))
> +(clobber (reg:SI Re))
> +(set (reg:SI Rb)
> + (mem (plus (reg:SI SP_REGNUM) (const_int 0))))
> +(set (reg:SI Rb+1)
> + (mem (plus (reg:SI SP_REGNUM) (const_int 4))))
> +...
> +(set (reg:SI Re)
> + (mem (plus (reg:SI SP_REGNUM) (const_int 16))))
> +(set (r

Document x86-tune options

2013-09-29 Thread Jan Hubicka
Hi,
this patch adds documentation to individual options and also adds vertical
whitespace.

Committed as obvious,
Honza
* config/i386/x86-tune.def: Add documentation for each of the options;
add whitespace.
Index: config/i386/x86-tune.def
===
--- config/i386/x86-tune.def(revision 203012)
+++ config/i386/x86-tune.def(working copy)
@@ -24,207 +24,378 @@ see the files COPYING3 and COPYING.RUNTI
work well with PPro base chips.  */
 DEF_TUNE (X86_TUNE_USE_LEAVE, "use_leave", 
  m_386 | m_CORE_ALL | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC)
+
+/* X86_TUNE_PUSH_MEMORY: Enable generation of "push mem" instructions.
+   Some chips, like 486 and Pentium have problems with these sequences.  */
 DEF_TUNE (X86_TUNE_PUSH_MEMORY, "push_memory", 
   m_386 | m_P4_NOCONA | m_CORE_ALL | m_K6_GEODE | m_AMD_MULTIPLE 
   | m_GENERIC)
+
+/* X86_TUNE_ZERO_EXTEND_WITH_AND: Use AND instruction instead
+   of movzbl/movzwl.  */
 DEF_TUNE (X86_TUNE_ZERO_EXTEND_WITH_AND, "zero_extend_with_and", m_486 | m_PENT)
+
+/* X86_TUNE_UNROLL_STRLEN: Produce (quite lame) unrolled sequence for
+   inline strlen.  This affects only -minline-all-stringops mode. By
+   default we always dispatch to a library since our internal strlen
+   is bad.  */
 DEF_TUNE (X86_TUNE_UNROLL_STRLEN, "unroll_strlen",
 m_486 | m_PENT | m_PPRO | m_ATOM | m_SLM | m_CORE_ALL | m_K6
 | m_AMD_MULTIPLE | m_GENERIC)
+
 /* X86_TUNE_BRANCH_PREDICTION_HINTS: Branch hints were put in P4 based
on simulation result. But after P4 was made, no performance benefit
was observed with branch hints.  It also increases the code size.
As a result, icc never generates branch hints.  */
 DEF_TUNE (X86_TUNE_BRANCH_PREDICTION_HINTS, "branch_prediction_hints", 0)
+
+/* X86_TUNE_DOUBLE_WITH_ADD: Use add instead of sal to double value in
+   an integer register.  */
 DEF_TUNE (X86_TUNE_DOUBLE_WITH_ADD, "double_with_add", ~m_386)
+
+/* X86_TUNE_USE_SAHF: Controls use of SAHF.  */
 DEF_TUNE (X86_TUNE_USE_SAHF, "use_sahf",
   m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_SLM | m_K6_GEODE
   | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC)
+
 /* X86_TUNE_MOVX: Enable to zero extend integer registers to avoid
partial dependencies.  */
 DEF_TUNE (X86_TUNE_MOVX, "movx",
   m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_SLM | m_GEODE 
   | m_AMD_MULTIPLE  | m_GENERIC)
-/* X86_TUNE_PARTIAL_REG_STALL: We probably ought to watch for partial
-   register stalls on Generic32 compilation setting as well.  However
-   in current implementation the partial register stalls are not eliminated
+
+/* X86_TUNE_PARTIAL_REG_STALL: Pentium pro, unlike later chips, handled
+   use of partial registers by renaming.  This improved performance of 16bit
+   code where upper halves of registers are not used.  It also leads to
+   a penalty whenever a 16bit store is followed by a 32bit use.  This flag
+   disables production of such sequences in common cases.
+   See also X86_TUNE_HIMODE_MATH.
+
+   In current implementation the partial register stalls are not eliminated
very well - they can be introduced via subregs synthesized by combine
and can happen in caller/callee saving sequences.  */
 DEF_TUNE (X86_TUNE_PARTIAL_REG_STALL, "partial_reg_stall", m_PPRO)
+
+/* X86_TUNE_PARTIAL_FLAG_REG_STALL: this flag disables use of of flags
+   set by instructions affecting just some flags (in particular shifts).
+   This is because Core2 resolves dependencies on whole flags register
+   and such sequences introduce false dependency on previous instruction
+   setting full flags.
+
+   The flag does not affect generation of INC and DEC, which is controlled
+   by X86_TUNE_USE_INCDEC.
+
+   This flag may be dropped from generic once core2-corei5 machines are
+   rare enough.  */
 DEF_TUNE (X86_TUNE_PARTIAL_FLAG_REG_STALL, "partial_flag_reg_stall",
   m_CORE2 | m_GENERIC)
+
 /* X86_TUNE_LCP_STALL: Avoid an expensive length-changing prefix stall
- * on 16-bit immediate moves into memory on Core2 and Corei7.  */
+   on 16-bit immediate moves into memory on Core2 and Corei7.  */
 DEF_TUNE (X86_TUNE_LCP_STALL, "lcp_stall", m_CORE_ALL | m_GENERIC)
+
+/* X86_TUNE_USE_HIMODE_FIOP: Enables use of x87 instructions with 16bit
+   integer operand.
+   FIXME: Why is this disabled for modern chips?  */
 DEF_TUNE (X86_TUNE_USE_HIMODE_FIOP, "use_himode_fiop", 
   m_386 | m_486 | m_K6_GEODE)
+
+/* X86_TUNE_USE_SIMODE_FIOP: Enables use of x87 instructions with 32bit
+   integer operand.  */
 DEF_TUNE (X86_TUNE_USE_SIMODE_FIOP, "use_simode_fiop",
   ~(m_PENT | m_PPRO | m_CORE_ALL | m_ATOM 
 | m_SLM | m_AMD_MULTIPLE | m_GENERIC))
+
+/* X86_TUNE_USE_MOV0: Use "mov $0, reg" instead of "xor reg, reg" to clear
+   integer register.  */
 DEF_TUNE (X86_TUNE_USE_MOV0, "use_mov0", m_K6)
+
+/* X86_TUNE_USE_CLTD: Controls use of CLTD and CTQO 

Re: Add value range support into memcpy/memset expansion

2013-09-29 Thread Jan Hubicka
> Hi Jan,
> 
> > I also think the misaligned move trick can/should be performed by
> > move_by_pieces and we ought to consider sane use of SSE - current 
> > vector_loop
> > with unrolling factor of 4 seems bit extreme.  At least buldozer is happy 
> > with
> > 2 and I would expect SSE moves to be especially useful for moving blocks 
> > with
> > known size where they are not used at all.
> > 
> > Currently I disabled misaligned move prologues/epilogues for Michael's 
> > vector
> > loop path since they ends up longer than the traditional code (that use loop
> > for epilogue)
> Prologues could use this techniques even with vector_loop, as they actually
> don't have a loop.

Where the new prologues lose is the fact that we need to handle all sizes
smaller than SIZE_NEEDED.  This is 64 bytes, which leads to a variant for
32..64, 16..32, 8..16, 4..8 and the tail.  It is quite a lot of code.

When the block is known to be greater than 64, this is also a win, but my
current patch does not fine-tune this yet.
Similarly, misaligned moves are a win when the size is known, alignment is not
performed, and the normal fixed-size epilogue needs more than one move, or when
the alignment is known but the offset is non-zero.
It will need a bit of tweaking to handle all the paths well - it is the usual
problem with the stringops: they get way too complex as the number of factors
increases.

That is why I think we may consider a vector loop with fewer than 4 unrollings.
The AMD optimization manual recommends two for Bulldozer... Is there a
difference between four and two for Atom?

To be honest I am not quite sure where the constant of 4 comes from.  I think I
introduced it a long time ago for K8, where it apparently got some extra % of
performance.

It is used for larger blocks only for PPro.  AMD chips prefer it for small
blocks, apparently because they prefer a loop-less sequence.

> As for epilogues - have you tried using misaligned vector moves (movdqu)?  It
> looks to me that we need approximately the same number of instructions in the
> vector-loop and usual-loop epilogues, if we use vector instructions in the
> vector-loop epilogue.

Yes, code is in place for this.  You can just remove the check for size_needed
being smaller than 32 and it will produce the movdqu sequence for you (I tested
it on the vector loop testcase in the testsuite).

The code will also use SSE for unrolled_loop prologue expansion at least for
memcpy (for memset it does not have broadcast value so it should skip it).

> 
> > Comments are welcome.
> BTW, maybe we could generalize expand_small_movmem a bit and make a separate
> expanding strategy out of it?  It would expand a memmove with no loop (and
> probably no alignment prologue) - just with the biggest available moves.
> Also, a cost model could be added here to make decisions on when we actually
> want to align the moves.  Here are a couple of examples of that:
> 
> memcpy (a, b, 32); // alignment is unknown
> will expand to
>   movdqu a, %xmm0
>   movdqu a+16, %xmm1
>   movdqu %xmm0, b
>   movdqu %xmm1, b+16
> memcpy (a, b, 32); // alignment is known and equals 64bit
> will expand to
> a)
>   movdqu a, %xmm0
>   movdqu a+16, %xmm1
>   movdqu %xmm0, b
>   movdqu %xmm1, b+16
> or b)
>   movq    a,     %xmm0
>   movdqa  a+8,   %xmm1
>   movq    a+24,  %xmm2
>   movq    %xmm0, b
>   movdqa  %xmm1, b+8
>   movq    %xmm2, b+24
> 
> We would compute the total cost of both variants and choose the best - for
> the computation we just need the costs of aligned and misaligned moves.
> 
> This strategy is actually pretty similar to move_by_pieces, but as it has
> much more target-specific information, it would be able to produce much more
> effective code.

I was actually thinking more along the lines of teaching move_by_pieces to do
the tricks.  It seems there is not that much x86-specific knowledge in here,
and other architectures will also benefit from it.  We can also enable it when
the value range is close enough.

I plan to look into it today or tomorrow - revisit your old patch to
move_by_pieces and see how much extra API I need to get move_by_pieces to do
what expand_small_movmem does.
> 
> And one more question - while working on vector_loop for memset I tried to
> merge many of the movmem and setmem routines (e.g. instead of
> expand_movmem_prologue and expand_setmem_prologue I made a single routine
> expand_movmem_or_setmem_prologue).  What do you think, is it a good idea?  It
> reduces code size in i386.c, but slightly complicates the code.  I'll send a
> patch shortly, as soon as the testing completes.

I would like to see it.  I am not too thrilled by the duplication.  My original
motivation for that was to keep the number of code paths through the expanders
under control.  We already have many of them (and it is easy to get wrong code)
as different variants of prologues/epilogues and main loops are not exactly the
same, and thus the code is not as modular as I would like.  I am not sure if
adding differences in between memset and memmove is not going to

Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition

2013-09-29 Thread Teresa Johnson
On Fri, Sep 27, 2013 at 7:15 AM, Teresa Johnson  wrote:
> On Thu, Sep 26, 2013 at 3:02 PM, Jan Hubicka  wrote:
>>>
>>> Why not just have probably_never_executed_bb_p simply return
>>> false if bb->frequency is non-zero (right now it does the opposite -
>>
>> We want to have frequencies guessed for functions that were not trained
>> in the profiling run (that was the patch I posted earlier that I think
>> has not gone in yet).
>
> Right, but for splitting and bb layout purposes, for these statically
> guessed unprofiled functions we in fact don't want to do any splitting
> or treat the bbs as never executed (which shouldn't be a change from
> the status quo since all the bbs in these functions are currently 0
> weight, it's only when we inline in the case of comdats that they
> appear colder than the surrounding code, but in fact we don't want
> this).
>
> The only other caller to probably_never_executed_bb_p is
> compute_function_frequency, but in the case of statically guessed
> functions they will have profile_status != PROFILE_READ and won't
> invoke probably_never_executed_bb_p. But re-reading our most recent
> exchange on the comdat profile issue, it sounds like you were
> suggesting guessing profiles for all 0-weight functions early, then
> dropping them from PROFILE_READ to PROFILE_GUESSED only once we
> determine in ipa-inline that there is a potentially non-zero call path
> to them. In that case with the change I describe above to
> probably_never_executed_bb_p, the 0-weight functions with 0 calls to
> them will incorrectly be marked as NODE_FREQUENCY_NORMAL, which would
> be bad as they would not be size optimized or moved into the cold
> section.
>
> So it seems like we want different handling of these guessed
> frequencies in compute_function_frequency and bb-reorder.c. Actually I
> think we can handle this by checking if the function entry block has a
> 0 count. If so, then we just look at the bb counts and not the
> frequencies for determining bb hotness as the frequencies would
> presumably have been statically-guessed. This will ensure that the
> cgraph node continues to be marked unlikely and size-optimized. If the
> function entry block has a non-zero count, then we look at both the bb
> count and the bb frequency - if they are both zero then the bb is
> probably never executed, but if either is non-zero then we should
> treat the block as possibly executed (which will come into play for
> splitting and bb layout).

Here is a patch to handle the profile insanities conservatively during
splitting. It also simplifies the probably_never_executed* code to
treat missing counts within a profiled function differently
(conservatively, based on frequency) from the case where the whole
function has a guessed profile. That way, once a patch to guess
profiles for non-executed functions is added, they will continue to
have their nodes marked as unlikely. I also pulled the guts of the
probably_never_executed_bb_p code out to a helper that is then invoked
by both the bb and edge versions of this function, so they stay in
sync.

This gets rid of a number of the failures with splitting + the linker
script to make the unlikely section non-executable. I have a patch to
fix some jump threading insanities that I will send separately.

Bootstrapped and regression tested on x86_64. Also tested with an lto
profiledbootstrap. Ok for trunk?

Thanks,
Teresa

2013-09-29  Teresa Johnson  

* bb-reorder.c (find_rarely_executed_basic_blocks_and_crossing_edges):
Treat profile insanities conservatively.
* predict.c (probably_never_executed): New function. Treat profile
insanities conservatively.
(probably_never_executed_bb_p): Invoke probably_never_executed.
(probably_never_executed_edge_p): Invoke probably_never_executed.

Index: bb-reorder.c
===
--- bb-reorder.c(revision 202947)
+++ bb-reorder.c(working copy)
@@ -1564,8 +1564,25 @@ find_rarely_executed_basic_blocks_and_crossing_edg
   /* Mark which partition (hot/cold) each basic block belongs in.  */
   FOR_EACH_BB (bb)
 {
+  bool cold_bb = false;
   if (probably_never_executed_bb_p (cfun, bb))
 {
+  /* Handle profile insanities created by upstream optimizations
+ by also checking the incoming edge weights. If there is a non-cold
+ incoming edge, conservatively prevent this block from being split
+ into the cold section.  */
+  cold_bb = true;
+  FOR_EACH_EDGE (e, ei, bb->preds)
+{
+  if (!probably_never_executed_edge_p (cfun, e))
+{
+  cold_bb = false;
+  break;
+}
+}
+}
+  if (cold_bb)
+{
   BB_SET_PARTITION (bb, BB_COLD_PARTITION);
   cold_bb_count++;
 }
Index: predict.c
===

[PATCH, doc]: Fix "@anchor should not appear in @heading" warning

2013-09-29 Thread Uros Bizjak
Hello!

Rather trivial fix - put @anchor before @heading, as texi manual suggests.

2013-09-29  Uros Bizjak  

* doc/install.texi (Host/target specific installation notes for GCC):
Put @anchor before @heading.

Tested by "make doc" with texinfo 5.1 on Fedora 19.

OK for mainline?

Uros.
Index: doc/install.texi
===
--- doc/install.texi(revision 203011)
+++ doc/install.texi(working copy)
@@ -3173,8 +3173,8 @@
 
 
 @end html
-@heading @anchor{alpha-x-x}alpha*-*-*
-
+@anchor{alpha-x-x}
+@heading alpha*-*-*
 This section contains general configuration information for all
 alpha-based platforms using ELF (in particular, ignore this section for
 DEC OSF/1, Digital UNIX and Tru64 UNIX)@.  In addition to reading this
@@ -3188,7 +3188,8 @@
 @html
 
 @end html
-@heading @anchor{alpha-dec-osf51}alpha*-dec-osf5.1
+@anchor{alpha-dec-osf51}
+@heading alpha*-dec-osf5.1
 Systems using processors that implement the DEC Alpha architecture and
 are running the DEC/Compaq/HP Unix (DEC OSF/1, Digital UNIX, or Compaq/HP
 Tru64 UNIX) operating system, for example the DEC Alpha AXP systems.
@@ -3201,14 +3202,15 @@
 @html
 
 @end html
-@heading @anchor{amd64-x-solaris210}amd64-*-solaris2.1[0-9]*
-
+@anchor{amd64-x-solaris210}
+@heading amd64-*-solaris2.1[0-9]*
 This is a synonym for @samp{x86_64-*-solaris2.1[0-9]*}.
 
 @html
 
 @end html
-@heading @anchor{arm-x-eabi}arm-*-eabi
+@anchor{arm-x-eabi}
+@heading arm-*-eabi
 ARM-family processors.  Subtargets that use the ELF object format
 require GNU binutils 2.13 or newer.  Such subtargets include:
 @code{arm-*-netbsdelf}, @code{arm-*-*linux-*}
@@ -3217,8 +3219,8 @@
 @html
 
 @end html
-@heading @anchor{avr}avr
-
+@anchor{avr}
+@heading avr
 ATMEL AVR-family micro controllers.  These are used in embedded
 applications.  There are no standard Unix configurations.
 @ifnothtml
@@ -3254,8 +3256,8 @@
 @html
 
 @end html
-@heading @anchor{bfin}Blackfin
-
+@anchor{bfin}
+@heading Blackfin
 The Blackfin processor, an Analog Devices DSP.
 @ifnothtml
 @xref{Blackfin Options,, Blackfin Options, gcc, Using the GNU Compiler
@@ -3271,11 +3273,11 @@
 @html
 
 @end html
-@heading @anchor{cr16}CR16
+@anchor{cr16}
+@heading CR16
+The CR16 CompactRISC architecture is a 16-bit architecture. This
+architecture is used in embedded applications.
 
-The CR16 CompactRISC architecture is a 16-bit architecture. This architecture is 
-used in embedded applications.
-
 @ifnothtml
 @xref{CR16 Options,, CR16 Options, gcc, Using and Porting the GNU Compiler
 Collection (GCC)},
@@ -3288,14 +3290,14 @@
 Use @samp{configure --target=cr16-elf --enable-languages=c,c++} to configure
 GCC@ for building a CR16 elf cross-compiler.
 
-Use @samp{configure --target=cr16-uclinux --enable-languages=c,c++} to configure
-GCC@ for building a CR16 uclinux cross-compiler.
+Use @samp{configure --target=cr16-uclinux --enable-languages=c,c++} to
+configure GCC@ for building a CR16 uclinux cross-compiler.
 
 @html
 
 @end html
-@heading @anchor{cris}CRIS
-
+@anchor{cris}
+@heading CRIS
 CRIS is the CPU architecture in Axis Communications ETRAX system-on-a-chip
 series.  These are used in embedded applications.
 
@@ -3329,8 +3331,8 @@
 @html
 
 @end html
-@heading @anchor{dos}DOS
-
+@anchor{dos}
+@heading DOS
 Please have a look at the @uref{binaries.html,,binaries page}.
 
 You cannot install GCC by itself on MSDOS; it will not compile under
@@ -3341,15 +3343,16 @@
 @html
 
 @end html
-@heading @anchor{epiphany-x-elf}epiphany-*-elf
+@anchor{epiphany-x-elf}
+@heading epiphany-*-elf
 Adapteva Epiphany.
 This configuration is intended for embedded systems.
 
 @html
 
 @end html
-@heading @anchor{x-x-freebsd}*-*-freebsd*
-
+@anchor{x-x-freebsd}
+@heading *-*-freebsd*
 Support for FreeBSD 1 was discontinued in GCC 3.2.  Support for
 FreeBSD 2 (and any mutant a.out variants of FreeBSD 3) was
 discontinued in GCC 4.0.
@@ -3386,7 +3389,8 @@
 @html
 
 @end html
-@heading @anchor{h8300-hms}h8300-hms
+@anchor{h8300-hms}
+@heading h8300-hms
 Renesas H8/300 series of processors.
 
 Please have a look at the @uref{binaries.html,,binaries page}.
@@ -3399,7 +3403,8 @@
 @html
 
 @end html
-@heading @anchor{hppa-hp-hpux}hppa*-hp-hpux*
+@anchor{hppa-hp-hpux}
+@heading hppa*-hp-hpux*
 Support for HP-UX version 9 and older was discontinued in GCC 3.4.
 
 We require using gas/binutils on all hppa platforms.  Version 2.19 or
@@ -3451,8 +3456,8 @@
 @html
 
 @end html
-@heading @anchor{hppa-hp-hpux10}hppa*-hp-hpux10
-
+@anchor{hppa-hp-hpux10}
+@heading hppa*-hp-hpux10
 For hpux10.20, we @emph{highly} recommend you pick up the latest sed patch
 @code{PHCO_19798} from HP@.
 
@@ -3464,8 +3469,8 @@
 @html
 
 @end html
-@heading @anchor{hppa-hp-hpux11}hppa*-hp-hpux11
-
+@anchor{hppa-hp-hpux11}
+@heading hppa*-hp-hpux11
 GCC 3.0 and up support HP-UX 11.  GCC 2.95.x is not supported and cannot
 be used to compile GCC 3.0 and up.
 
@@ -3575,8 +3580,8 @@
 @html
 
 @end html
-@heading @anchor

Re: [Patch, Darwin] update t-* and x-* fragments after switch to auto-deps.

2013-09-29 Thread Mike Stump
On Sep 28, 2013, at 2:21 AM, Iain Sandoe  wrote:
> This updates the Darwin port {t,x}-* fragments after the switch to auto-deps 
> (thanks Tom!).

> OK?

Ok.

Re: [Patch, Darwin] improve cross-compiles.

2013-09-29 Thread Mike Stump
On Sep 28, 2013, at 2:35 AM, Iain Sandoe  wrote:
> I've been experimenting with the idea of building native crosses on my most 
> capable machine, for the many variants of darwin we now have, and then using 
> the older/slower hardware for test only.
> 
> This has uncovered a few issues with cross/native cross flags etc.

> (i)   that PIE is disabled for gcc exes on Darwin hosts since it is 
> incompatible with the current PCH implementation.

Yup.

> (ii)  that -mdynamic-no-pic is used for m32 hosts.

Right.

> OK for trunk?

Ok.

Re: [Patch, Darwin/ppc] Fix altivec dwarf reg sizes.

2013-09-29 Thread Mike Stump
On Sep 28, 2013, at 3:03 AM, Iain Sandoe  wrote:
> Anyway, after considerable debate about this and several approaches, here is 
> a patch that just ensures we set the altivec register size to its correct 
> value.

> OK for trunk and open branches?

Ok.

Re: [Patch, Darwin/PPC] fix PR10901

2013-09-29 Thread Mike Stump
On Sep 28, 2013, at 3:26 AM, Iain Sandoe  wrote:
> We currently generate wrong code for non-local gotos which breaks, amongst 
> other things, nested functions.
> I fixed this a while ago for x86 Darwin and here is a version to fix it on 
> PPC.
> 
> OK for trunk? (and open branches?)

Ok.


[SPARC] Add peephole for memory barriers

2013-09-29 Thread Eric Botcazou
This is a bit gross, but this prevents consecutive memory barriers from being 
generated, for example on accesses to atomic objects in Ada.

Tested on SPARC/Solaris, applied on the mainline.


2013-09-29  Eric Botcazou  

* config/sparc/sync.md: Add peephole for consecutive memory barriers.


-- 
Eric Botcazou

Index: config/sparc/sync.md
===
--- config/sparc/sync.md	(revision 202912)
+++ config/sparc/sync.md	(working copy)
@@ -93,6 +93,18 @@ (define_insn "*membar"
   "membar\t%1"
   [(set_attr "type" "multi")])
 
+(define_peephole2
+  [(set (match_operand:BLK 0 "" "")
+	(unspec:BLK [(match_dup 0) (match_operand:SI 1 "const_int_operand")]
+		UNSPEC_MEMBAR))
+   (set (match_operand:BLK 2 "" "")
+	(unspec:BLK [(match_dup 2) (match_operand:SI 3 "const_int_operand")]
+		UNSPEC_MEMBAR))]
+  ""
+  [(set (match_operand:BLK 0 "" "")
+	(unspec:BLK [(match_dup 0) (match_dup 1)] UNSPEC_MEMBAR))]
+{ operands[1] = GEN_INT (UINTVAL (operands[1]) | UINTVAL (operands[3])); })
+
 (define_expand "atomic_load"
   [(match_operand:I 0 "register_operand" "")
(match_operand:I 1 "memory_operand" "")

Re: RFA [testsuite]: New ARC target specific tests

2013-09-29 Thread Mike Stump
On Sep 28, 2013, at 5:40 AM, Joern Rennecke  wrote:
> This patch adds a number of tests for ARC target specific options.
> 
> I'm a bit uncertain here if I still need approval for this patch.

No, it is not required.  Target maintainers can approve the usual test suite 
patches.  Everything in gcc.target/ is certainly in this set.  The 
typical changes to .exp files for target support, for example, are fine, as are 
the typical patches to existing test cases to declare things like small stacks, 
low memory, no common support, 16-bit porting, 32-bit porting and so on…  If in 
doubt, ask; we endeavor to be responsive to patches.

> On the one hand the changes are all in an area that is normally within
> the remit of a target maintainer, and patch to add the gcc.target/arc
> directory has already been accepted.

Though, technically, it should go in as the port goes in.  :-)

I did review the patch anyway, Ok.

[PATCH, doc]: Fix usage of @tie{} command

2013-09-29 Thread Uros Bizjak
Hello!

2013-09-29  Uros Bizjak  

* doc/invoke.texi: Fix usage of @tie{} command.

Tested with texinfo-5.1, committed to mainline SVN.

Uros.

Index: invoke.texi
===
--- invoke.texi (revision 203011)
+++ invoke.texi (working copy)
@@ -12047,7 +12047,7 @@
 as needed before the operation.

 @item
-If the device supports RAM larger than 64@tie{KiB} and the compiler
+If the device supports RAM larger than 64@tie{}KiB and the compiler
 needs to change @code{RAMPZ} to accomplish an operation, @code{RAMPZ}
 is reset to zero after the operation.

@@ -12057,7 +12057,7 @@
 zero in case the ISR code might (implicitly) use it.

 @item
-RAM larger than 64@tie{KiB} is not supported by GCC for AVR targets.
+RAM larger than 64@tie{}KiB is not supported by GCC for AVR targets.
 If you use inline assembler to read from locations outside the
 16-bit address range and change one of the @code{RAMP} registers,
 you must reset it to zero after the access.


Re: Remove algo logic duplication Round 3

2013-09-29 Thread Paolo Carlini


Hi,

"François Dumont"  ha scritto:
>I also get your remark about the open round bracket, I didn't know that
>round bracket was the other name for parentheses! I also fixed the one
>you pointed me to, I will be more careful next time.

No problem ;) For your linguistic curiosity, I often find myself using round 
bracket: I find it more precise, less vague. Not 100% sure about French, but in 
Italian we of course have the equivalent of parentheses, but it's generic; kids 
soon learn about the round, square and curly variants. English is probably more 
precise, as it usefully provides three different words: parentheses, brackets and 
braces, which often, outside a technical context, should be clear enough 
without further qualification.

Thanks!
Paolo


[PATCH, doc]: Fix "@tex should only appear at a line beginning'" warnings

2013-09-29 Thread Uros Bizjak
Hello!

The attached patch decorates email addresses with @email{} to avoid "@tex
should only appear at a line beginning" warnings. The patch also adds
@uref and an http:// prefix for a webpage.

2013-09-29  Uros Bizjak  

* doc/gcc.texi (titlepage): Use @uref and http:// prefix for website.
Use @email for email addresses.

Tested with make doc on texinfo 5.1 Fedora 19.

Uros.

Index: gcc.texi
===
--- gcc.texi	(revision 203015)
+++ gcc.texi	(working copy)
@@ -83,11 +83,11 @@
 Published by:
 @multitable @columnfractions 0.5 0.5
 @item GNU Press
-@tab Website: www.gnupress.org
+@tab Website: @uref{http://www.gnupress.org}
 @item a division of the
-@tab General: @tex press@@gnu.org @end tex
+@tab General: @email{press@@gnu.org}
 @item Free Software Foundation
-@tab Orders:  @tex sales@@gnu.org @end tex
+@tab Orders:  @email{sales@@gnu.org}
 @item 51 Franklin Street, Fifth Floor
 @tab Tel 617-542-5942
 @item Boston, MA 02110-1301 USA


Re: [PATCH] Invalid unpoisoning of stack redzones on ARM

2013-09-29 Thread Yury Gribov

> Can you please be more verbose

Right, I should have been.

So as you can see from the asm log in the bug description, the prologue 
writes shadow bytes corresponding to words at frame_shadow_base + { 0, 
4, 8, 12, 16, 24, 28}. The epilogue should clear those but instead zeros 
out frame_shadow_base + { 0, 4, 8, 12, 16, 40, 44}, thus leaving the words 
at frame_shadow_base + {24, 28} poisoned and causing false 
Asan errors later.


The reason, as I see it, is that we change the address of shadow_mem in 
asan_emit_stack_protection twice: once in asan_clear_shadow:
  tmp = expand_simple_binop (Pmode, PLUS, addr, gen_int_mode (4, Pmode),
			     addr, true, OPTAB_LIB_WIDEN);
  if (tmp != addr)
    emit_move_insn (addr, tmp);
and once in asan_emit_stack_protection itself:
  if (last_size)
    {
      shadow_mem = adjust_address (shadow_mem, VOIDmode,
				   (last_offset - prev_offset)
				   >> ASAN_SHADOW_SHIFT);
This translates into
  add  r4, r4, #4
and
  add  r3, r4, #24
in the asm. The shadow_mem will thus have the next block's offset added to
it _twice_ and will point to an invalid position.
My simple fix uses a temp register in asan_clear_shadow to avoid 
spoiling the shadow_mem inside the loop.


I'm not yet a gcc guru so I wanted some experienced person to say 
whether I'm doing something completely wrong here.


BTW I forgot to mention that Asan tests pass both on ARM and on x86_64.

> Also, you are missing a ChangeLog entry.

Attached.

-Y
2013-09-30  Yury Gribov  

PR sanitizer/58543
* asan.c (asan_clear_shadow): Use a new temporary for iteration.



RE: [PATCH]Fix computation of offset in ivopt

2013-09-29 Thread bin.cheng


> -Original Message-
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> Sent: Friday, September 27, 2013 4:30 PM
> To: Bin Cheng
> Cc: GCC Patches
> Subject: Re: [PATCH]Fix computation of offset in ivopt
> 
> On Fri, Sep 27, 2013 at 7:07 AM, bin.cheng  wrote:
> >
> >
> >   case INTEGER_CST:
> >   //...
> >   *offset = int_cst_value (expr);
> > change to
> >   case INTEGER_CST:
> >   //...
> >   *offset = sext_hwi (int_cst_value (expr), type);
> >
> > and
> >   case MULT_EXPR:
> >   //...
> >   *offset = sext_hwi (int_cst_value (expr), type);
> > to
> >   case MULT_EXPR:
> >   //...
> >  HOST_WIDE_INT xxx = (HOST_WIDE_INT)off0 * int_cst_value (op1);
> >   *offset = sext_hwi (xxx, type);
> >
> > Any comments?
> 
> The issue is of course that we end up converting offsets to sizetype at
> some point which makes them all appear unsigned.  The fix for this is to
> simply interpret them as signed ... but it's really a mess ;)
> 

Hi, this is the updated patch, which calculates a signed offset in
strip_offset_1 and then sign-extends it in strip_offset.

Bootstrapped and tested on x86_64/x86/arm. Is it OK?

Thanks.
bin

2013-09-30  Bin Cheng  

* tree-ssa-loop-ivopts.c (strip_offset_1): Change parameter type.
Count DECL_FIELD_BIT_OFFSET when computing offset for COMPONENT_REF.
(strip_offset): Sign extend before return.

Index: gcc/tree-ssa-loop-ivopts.c
===
--- gcc/tree-ssa-loop-ivopts.c  (revision 202599)
+++ gcc/tree-ssa-loop-ivopts.c  (working copy)
@@ -2037,12 +2037,12 @@ find_interesting_uses (struct ivopts_data *data)
 
 static tree
 strip_offset_1 (tree expr, bool inside_addr, bool top_compref,
-   unsigned HOST_WIDE_INT *offset)
+   HOST_WIDE_INT *offset)
 {
   tree op0 = NULL_TREE, op1 = NULL_TREE, tmp, step;
   enum tree_code code;
   tree type, orig_type = TREE_TYPE (expr);
-  unsigned HOST_WIDE_INT off0, off1, st;
+  HOST_WIDE_INT off0, off1, st;
   tree orig_expr = expr;
 
   STRIP_NOPS (expr);
@@ -2133,19 +2133,26 @@ strip_offset_1 (tree expr, bool inside_addr, bool
   break;
 
 case COMPONENT_REF:
-  if (!inside_addr)
-   return orig_expr;
+  {
+   tree field;
+   HOST_WIDE_INT boffset = 0;
+   if (!inside_addr)
+ return orig_expr;
 
-  tmp = component_ref_field_offset (expr);
-  if (top_compref
- && cst_and_fits_in_hwi (tmp))
-   {
- /* Strip the component reference completely.  */
- op0 = TREE_OPERAND (expr, 0);
- op0 = strip_offset_1 (op0, inside_addr, top_compref, &off0);
- *offset = off0 + int_cst_value (tmp);
- return op0;
-   }
+   tmp = component_ref_field_offset (expr);
+   field = TREE_OPERAND (expr, 1);
+   if (top_compref
+   && cst_and_fits_in_hwi (tmp)
+   && cst_and_fits_in_hwi (DECL_FIELD_BIT_OFFSET (field)))
+ {
+   /* Strip the component reference completely.  */
+   op0 = TREE_OPERAND (expr, 0);
+   op0 = strip_offset_1 (op0, inside_addr, top_compref, &off0);
+   boffset = int_cst_value (DECL_FIELD_BIT_OFFSET (field));
+   *offset = off0 + int_cst_value (tmp) + boffset / BITS_PER_UNIT;
+   return op0;
+ }
+  }
   break;
 
 case ADDR_EXPR:
@@ -2196,7 +2203,16 @@ strip_offset_1 (tree expr, bool inside_addr, bool
 static tree
 strip_offset (tree expr, unsigned HOST_WIDE_INT *offset)
 {
-  return strip_offset_1 (expr, false, false, offset);
+  HOST_WIDE_INT off;
+  tree core = strip_offset_1 (expr, false, false, &off);
+
+  /* Sign extend off if expr is in type which has lower precision
+ than HOST_WIDE_INT.  */
+  if (TYPE_PRECISION (TREE_TYPE (expr)) <= HOST_BITS_PER_WIDE_INT)
+off = sext_hwi (off, TYPE_PRECISION (TREE_TYPE (expr)));
+
+  *offset = off;
+  return core;
 }
 
 /* Returns variant of TYPE that can be used as base for different uses.


RE: [PATCH]Fix computation of offset in ivopt

2013-09-29 Thread bin.cheng


> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Oleg Endo
> Sent: Wednesday, September 25, 2013 1:41 AM
> To: Richard Biener
> Cc: Bin Cheng; GCC Patches
> Subject: Re: [PATCH]Fix computation of offset in ivopt
> 
> >
> 
> After reading "overflow" and "ivopt", I was wondering whether
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55190 is somehow related.

No, this patch is unrelated. But I do have some thoughts on PR 55190 and 
will follow up in the bug entry.

Thanks.
bin