Re: [Patch] Whole regex refactoring and current status

2013-08-09 Thread Paolo Carlini

Hi,

On 08/09/2013 07:08 AM, Tim Shen wrote:

On Fri, Aug 9, 2013 at 12:51 AM, Tim Shen  wrote:

So here's the change. It's under testing now, but could took several
hours. If someone has a faster machine, please tell the result :)

Unfortuantely using `unsigned int => unsigned short` cannot work.
Instead, I create a enum and do some operator overloadings.

This patch passed bootstrap and all tests under the configuration under x86_64:

--with-arch=corei7 --with-cpu=corei7 --prefix=/usr/local
--enable-clocale=gnu --with-system-zlib --enable-shared
--with-demangler-in-ld --with-fpmath=sse --enable-languages=c++
Yes, if, as it should, it works on -m32 too, let's go with this. By the 
way, you didn't say how exactly you are testing?!? Because just make 
check doesn't test -m32. I use, something like:


make -k check-target-libstdc++-v3 
RUNTESTFLAGS="--target_board='unix/\{-m32,-m64\}'"


for sure there are other ways to obtain the same result.

Paolo.


Re: [PATCH, vtv update] Fix /tmp directory issues in libvtv

2013-08-09 Thread Florian Weimer

On 08/09/2013 12:09 AM, Caroline Tice wrote:

+  logs_dir = getenv ("VTV_LOGS_DIR");


This needs to use __secure_getenv or secure_getenv, depending on the 
glibc version, so that it doesn't wreak havoc in SUID/SGID binaries (or 
after other kinds of privilege transitions).


Relevant autoconf checks are described here:



--
Florian Weimer / Red Hat Product Security Team


Re: [C++ RFC / Patch] PR 54080, PR 52875 and more (aka SFINAE vs template recursion depth)

2013-08-09 Thread Paolo Carlini
.. another thought I had, less esoteric ;) is the following: we use 
tf_none for two rather different reasons: for SFINAE and to avoid 
recursive Error routines calls, when we call tsubst (... tf_none, ...) 
from dump_template_bindings.


I understand, given your reply, that in general in the first case, thus 
SFINAE, we should avoid all hard errors *but* the one about too deep 
recursions (barring some sort of powerful infinite recursion detector). 
What about the second case, however? Should it be different? An error 
message is being produced in any case, for a reason or another, it 
shouldn't be prevented or made more difficult only because there is deep 
recursion somewhere. I think that in that second case we should suppress 
the error message about too deep recursion too. But how to tell it 
apart? Looks like we want some sort of separate tf_*, a 
tf_in_diagnostic, or something, very similar to tf_none, but truly with 
no exceptions. Actually this vague idea occured to me a number of times, 
I think having that would help in a number of situations.


What do you think?

Thanks,
Paolo.

(*) Maybe there is third one, like in some recent tweaks Jakub did, but 
let's leave it alone here.


Re: [C++ RFC / Patch] PR 54080, PR 52875 and more (aka SFINAE vs template recursion depth)

2013-08-09 Thread Florian Weimer

On 08/09/2013 08:43 AM, Paolo Carlini wrote:


Yes, that is intended.  Changing that could mean that the meaning of
code depends on what max depth the user selected.


Indeed. Yesterday I wondered what would happen if the front-end had a way to detect, in 
some very specific and special cases only of course, really infinite recursions, in the 
sense that no increase in the depth would "cure" them. Would it be ok in that 
case to sfinae away?


Eh, hopefully not.  Otherwise, program behavior would depend on how well 
the compiler solves the halting problem.


It's interesting question, but hopefully we can make all errors due to 
exceeded implementation limits hard errors, not subject to SFINAE.


--
Florian Weimer / Red Hat Product Security Team


Re: [C++ RFC / Patch] PR 54080, PR 52875 and more (aka SFINAE vs template recursion depth)

2013-08-09 Thread Paolo Carlini


Hi,

Florian Weimer  ha scritto:
>On 08/09/2013 08:43 AM, Paolo Carlini wrote:
>
>>> Yes, that is intended.  Changing that could mean that the meaning of
>>> code depends on what max depth the user selected.
>>
>> Indeed. Yesterday I wondered what would happen if the front-end had a
>way to detect, in some very specific and special cases only of course,
>really infinite recursions, in the sense that no increase in the depth
>would "cure" them. Would it be ok in that case to sfinae away?
>
>Eh, hopefully not.  Otherwise, program behavior would depend on how
>well
>the compiler solves the halting problem.

I see. You know, I was trying to figure out the logic other compilers - two of 
them, actually - are following, because the really appear to sfinae away 
infinite recursions. Was trying to imagine cases in which it would be safe.

Paolo


Re: [FIXED] Generic lambda symbol table bug

2013-08-09 Thread Adam Butcher

On 09.08.2013 03:01, Jason Merrill wrote:

On 08/08/2013 06:28 PM, Adam Butcher wrote:

So all seems to be okay with both versions.  Any ideas why?


Hmm, it sounds like processing_template_decl is being set after all,
even without your change.

Yup.  Although the lambda template code I originally added scoped it to 
within the the 'cp_parser_lambda_declarator_opt' function, since the 
call op being declared is an in-class inline member, 
'start_preparsed_function' is entering 
'maybe_begin_member_template_processing'.   'finish_function' matches 
that with 'maybe_end_member_template_processing' prior to the function 
being passed to 'expand_or_defer_fn' (which then causes the erroneous 
symtab entry).


So I think my original fix might be okay here; omitting 
'expand_or_defer_fn' if 'generic_lambda_p'.


What do you think?  Shall I just include it as part of the first patch? 
If so that'd just be two patches, "teach GCC to deal with lambda 
templates" and "implicit function templates with auto".


Cheers,
Adam




Re: [C++ RFC / Patch] PR 54080, PR 52875 and more (aka SFINAE vs template recursion depth)

2013-08-09 Thread Florian Weimer

On 08/09/2013 09:28 AM, Paolo Carlini wrote:


I see. You know, I was trying to figure out the logic other compilers - two of 
them, actually - are following, because the really appear to sfinae away 
infinite recursions. Was trying to imagine cases in which it would be safe.


Could their behavior just be bugs?  Depending on their error recovery 
implementation, not flagging infinite recursion as a hard error in 
SFINAE context could be an easy mistake to make.


--
Florian Weimer / Red Hat Product Security Team


Re: [Patch] Whole regex refactoring and current status

2013-08-09 Thread Tim Shen
On Fri, Aug 9, 2013 at 2:59 PM, Paolo Carlini  wrote:
> Yes, if, as it should, it works on -m32 too, let's go with this. By the way,
> you didn't say how exactly you are testing?!? Because just make check
> doesn't test -m32. I use, something like:
>
> make -k check-target-libstdc++-v3
> RUNTESTFLAGS="--target_board='unix/\{-m32,-m64\}'"
>
> for sure there are other ways to obtain the same result.

Committed.

I entered `gcc-build/x86_64-unknown-linux-gnu/32/libstdc++-v3` then
run `make check`. I see -m32 in test log.


-- 
Tim Shen


Re: [C++ RFC / Patch] PR 54080, PR 52875 and more (aka SFINAE vs template recursion depth)

2013-08-09 Thread Paolo Carlini


Hi,

Florian Weimer  ha scritto:
>Could their behavior just be bugs?  Depending on their error recovery
>implementation, not flagging infinite recursion as a hard error in
>SFINAE context could be an easy mistake to make.

Sure can be. In a sense, as I tried to explain in another sub-thread, we do 
have the complementary one.

Paolo



Re: [PATCH, AArch64] Skip gcc.dg/lower-subreg-1.c

2013-08-09 Thread Marcus Shawcroft
On 26 July 2013 12:06, Yufeng Zhang  wrote:
> Hi,
>
> This patch changes to skip gcc.dg/lower-subreg-1.c for aarch64*-*-*. The
> word mode in aarch64 is 64-bit so the lower-subreg pass won't happen in this
> test case.  The test is currently skipped on aarch64 with lp64 due to the
> directive of "dg-require-effective-target ilp32", but fails when -mabi=ilp32
> is in use.

OK
/Marcus


Re: [C++ RFC / Patch] PR 54080, PR 52875 and more (aka SFINAE vs template recursion depth)

2013-08-09 Thread Gabriel Dos Reis
On Thu, Aug 8, 2013 at 6:40 PM, Jason Merrill  wrote:
> On 08/08/2013 03:54 PM, Paolo Carlini wrote:
>>
>> the really interesting one is decltype28.C, which we don't reject
>> anymore, we simply accept it. What is happening is that the overload
>> which leads to excessive template instantiation depth is SFINAE-ed away
>> and the other one "wins"! Thus, this is the core of my message: it seems
>> that we behave wrt this issue - SFINAE vs template instantiation depth -
>> in a different way vs current clang++ and icc: we produce hard error
>> messages in SFINAE contexts. Is that intended?
>
>
> Yes, that is intended.  Changing that could mean that the meaning of code
> depends on what max depth the user selected.

that would be disturbing…

-- gaby


Re: [C++ RFC / Patch] PR 54080, PR 52875 and more (aka SFINAE vs template recursion depth)

2013-08-09 Thread Gabriel Dos Reis
On Fri, Aug 9, 2013 at 2:13 AM, Florian Weimer  wrote:
> On 08/09/2013 08:43 AM, Paolo Carlini wrote:
>
>>> Yes, that is intended.  Changing that could mean that the meaning of
>>> code depends on what max depth the user selected.
>>
>>
>> Indeed. Yesterday I wondered what would happen if the front-end had a way
>> to detect, in some very specific and special cases only of course, really
>> infinite recursions, in the sense that no increase in the depth would "cure"
>> them. Would it be ok in that case to sfinae away?
>
>
> Eh, hopefully not.  Otherwise, program behavior would depend on how well the
> compiler solves the halting problem.
>
> It's interesting question, but hopefully we can make all errors due to
> exceeded implementation limits hard errors, not subject to SFINAE.

agreed.

-- Gaby


Re: [C++ RFC / Patch] PR 54080, PR 52875 and more (aka SFINAE vs template recursion depth)

2013-08-09 Thread Gabriel Dos Reis
On Fri, Aug 9, 2013 at 2:33 AM, Florian Weimer  wrote:
> On 08/09/2013 09:28 AM, Paolo Carlini wrote:
>
>> I see. You know, I was trying to figure out the logic other compilers -
>> two of them, actually - are following, because the really appear to sfinae
>> away infinite recursions. Was trying to imagine cases in which it would be
>> safe.
>
>
> Could their behavior just be bugs?  Depending on their error recovery
> implementation, not flagging infinite recursion as a hard error in SFINAE
> context could be an easy mistake to make.

Indeed.  The fact we try to recreate template instantiations,
as opposed to using what is already known at the point where
we get the diagnostics is a worrisome aspect of our current
infrastructure.  That obviously manifests itself with the "error re-entered…"
stuff.

-- Gaby


Re: [C++ RFC / Patch] PR 54080, PR 52875 and more (aka SFINAE vs template recursion depth)

2013-08-09 Thread Gabriel Dos Reis
On Fri, Aug 9, 2013 at 2:13 AM, Paolo Carlini  wrote:
> .. another thought I had, less esoteric ;) is the following: we use tf_none
> for two rather different reasons: for SFINAE and to avoid recursive Error
> routines calls, when we call tsubst (... tf_none, ...) from
> dump_template_bindings.
>
> I understand, given your reply, that in general in the first case, thus
> SFINAE, we should avoid all hard errors *but* the one about too deep
> recursions (barring some sort of powerful infinite recursion detector). What
> about the second case, however? Should it be different? An error message is
> being produced in any case, for a reason or another, it shouldn't be
> prevented or made more difficult only because there is deep recursion
> somewhere. I think that in that second case we should suppress the error
> message about too deep recursion too. But how to tell it apart? Looks like
> we want some sort of separate tf_*, a tf_in_diagnostic, or something, very
> similar to tf_none, but truly with no exceptions. Actually this vague idea
> occured to me a number of times, I think having that would help in a number
> of situations.
>
> What do you think?
>
> Thanks,
> Paolo.
>
> (*) Maybe there is third one, like in some recent tweaks Jakub did, but
> let's leave it alone here.

I think we should find ways to have the pretty printer in the
diagnostic framework stop trying to redo most of the work
done by the type checker.  In its current form, that is fragile.

-- Gaby


Re: RFA: implement C11 _Generic

2013-08-09 Thread Marek Polacek
On Tue, Jul 23, 2013 at 01:55:19AM +, Joseph S. Myers wrote:
> I have now revised this patch from a year ago in line with my
> understanding of how _Generic ought to handle the various special
> cases (selector undergoes lvalue-to-rvalue conversion, and decay of
> functions and arrays to pointers, because nothing says it doesn't -
> "The controlling expression of a generic selection was very carefully
> not added to the list of contexts in which lvalue conversion is not
> done and type qualification is discarded", the minutes say - and no
> rvalues can have qualified type), which seems to accord with the
> committee discussion in the Delft minutes, added corresponding
> testcases, and committed this patch.  Bootstrapped with no regressions
> on x86_64-unknown-linux-gnu.

Thanks for doing this.  I see that http://gcc.gnu.org/wiki/C11Status
isn't updated wrt _Generic, shall I do that?

Marek


Re: [AArch64] Fixup the vget_lane RTL patterns and intrinsics

2013-08-09 Thread Marcus Shawcroft

On 05/08/13 21:57, James Greenhalgh wrote:


This patch fixes up the vget_lane RTL patterns to better
exploit the behaviour of their target instructions, and
to allow variants keeping the result in the SIMD register file.





---
gcc/

2013-08-05  James Greenhalgh  

* config/aarch64/aarch64-simd-builtins.def (get_lane_signed): Remove.
(get_lane_unsigned): Likewise.
(dup_lane_scalar): Likewise.
(get_lane): enable for VALL.
* config/aarch64/aarch64-simd.md
(aarch64_dup_lane_scalar): Remove.
(aarch64_get_lane_signed): Likewise.
(aarch64_get_lane_unsigned): Likewise.
(aarch64_get_lane_extend): New.
(aarch64_get_lane_zero_extendsi): Likewise.
(aarch64_get_lane): Enable for all vector modes.
(aarch64_get_lanedi): Remove misleading constraints.
* config/aarch64/arm_neon.h
(__aarch64_vget_lane_any): Define.
(__aarch64_vget_lane_<8,16,32,64>): Likewise.
(vget_lane_<8,16,32,64>): Use __aarch64_vget_lane macros.
(vdup_lane_<8,16,32,64>): Likewise.
* config/aarch64/iterators.md (VDQQH): New.
(VDQQHS): Likewise.
(vwcore): Likewise.


OK
/Marcus




[SPARC] access to atomic compare-and-swap on LEON3

2013-08-09 Thread Eric Botcazou
This enables access to the atomic compare-and-swap on LEON3 in conjunction with 
the binutils patch at:
  http://sourceware.org/ml/binutils/2013-08/msg00038.html

Tested on SPARC/Solaris and sparc-elf, applied on the mainline.


2013-08-09  Eric Botcazou  

* configure.ac: Add GAS check for LEON instructions on SPARC.
* configure: Regenerate.
* config.in: Likewise.
* config.gcc (with_cpu): Remove sparc-leon*-* and deal with LEON in the
sparc*-*-* block.
* config/sparc/sparc.opt (LEON, LEON3): New masks.
* config/sparc/sparc.h (ASM_CPU32_DEFAULT_SPEC): Set to AS_LEON_FLAG
for LEON or LEON3.
(ASM_CPU_SPEC): Pass AS_LEON_FLAG if -mcpu=leon or -mcpu=leon3.
(AS_LEON_FLAG): New macro.
* config/sparc/sparc.c (sparc_option_override): Set MASK_LEON for leon
and MASK_LEON3 for leon3 and unset them if HAVE_AS_LEON is not defined.
Deal with LEON and LEON3 for the memory model.
* config/sparc/sync.m (atomic_compare_and_swap): Enable for LEON3
(atomic_compare_and_swap_1): Likewise.
(*atomic_compare_and_swap_1): Likewise.


-- 
Eric Botcazou
Index: configure.ac
===
--- configure.ac	(revision 201177)
+++ configure.ac	(working copy)
@@ -3613,6 +3613,19 @@ foo:
kasumi_fi_xor %f46, %f48, %f50, %f52],,
   [AC_DEFINE(HAVE_AS_SPARC4, 1,
 [Define if your assembler supports SPARC4 instructions.])])
+
+gcc_GAS_CHECK_FEATURE([LEON instructions],
+  gcc_cv_as_sparc_leon,,
+  [-Aleon],
+  [.text
+   .register %g2, #scratch
+   .register %g3, #scratch
+   .align 4
+   smac %g2, %g3, %g1
+   umac %g2, %g3, %g1
+   cas [[%g2]], %g3, %g1],,
+  [AC_DEFINE(HAVE_AS_LEON, 1,
+[Define if your assembler supports LEON instructions.])])
 ;;
 
 changequote(,)dnl
Index: config.gcc
===
--- config.gcc	(revision 201177)
+++ config.gcc	(working copy)
@@ -3041,11 +3041,18 @@ if test x$with_cpu = x ; then
  with_cpu=8540
   fi   
   ;;
-sparc-leon*-*)
-  with_cpu=v8;
-  ;;
 sparc*-*-*)
-  with_cpu="`echo ${target} | sed 's/-.*$//'`"
+  case ${target} in
+	*-leon-*)
+	  with_cpu=leon
+	  ;;
+	*-leon[3-9]*)
+	  with_cpu=leon3
+	  ;;
+	*)
+	  with_cpu="`echo ${target} | sed 's/-.*$//'`"
+	  ;;
+  esac
   ;;
   esac
 
Index: config/sparc/sparc.opt
===
--- config/sparc/sparc.opt	(revision 201177)
+++ config/sparc/sparc.opt	(working copy)
@@ -211,6 +211,12 @@ Enable workarounds for the errata of the
 Mask(LONG_DOUBLE_128)
 ;; Use 128-bit long double
 
+Mask(LEON)
+;; Generate code for LEON
+
+Mask(LEON3)
+;; Generate code for LEON3
+
 Mask(SPARCLITE)
 ;; Generate code for SPARClite
 
Index: config/sparc/sync.md
===
--- config/sparc/sync.md	(revision 201177)
+++ config/sparc/sync.md	(working copy)
@@ -161,7 +161,8 @@ (define_expand "atomic_compare_and_swap<
(match_operand:SI 5 "const_int_operand" "")		;; is_weak
(match_operand:SI 6 "const_int_operand" "")		;; mod_s
(match_operand:SI 7 "const_int_operand" "")]		;; mod_f
-  "TARGET_V9 && (mode != DImode || TARGET_ARCH64 || TARGET_V8PLUS)"
+  "(TARGET_V9 || TARGET_LEON3)
+   && (mode != DImode || TARGET_ARCH64 || TARGET_V8PLUS)"
 {
   sparc_expand_compare_and_swap (operands);
   DONE;
@@ -176,7 +177,7 @@ (define_expand "atomic_compare_and_swap<
 	 [(match_operand:I48MODE 2 "register_operand" "")
 	  (match_operand:I48MODE 3 "register_operand" "")]
 	 UNSPECV_CAS))])]
-  "TARGET_V9"
+  "TARGET_V9 || TARGET_LEON3"
   "")
 
 (define_insn "*atomic_compare_and_swap_1"
@@ -187,7 +188,7 @@ (define_insn "*atomic_compare_and_swapmode == SImode || TARGET_ARCH64)"
+  "(TARGET_V9 || TARGET_LEON3) && (mode != DImode || TARGET_ARCH64)"
   "cas\t%1, %2, %0"
   [(set_attr "type" "multi")])
 
Index: config/sparc/sparc.c
===
--- config/sparc/sparc.c	(revision 201466)
+++ config/sparc/sparc.c	(working copy)
@@ -1140,9 +1140,8 @@ sparc_option_override (void)
 /* TI TMS390Z55 supersparc */
 { "supersparc",	MASK_ISA, MASK_V8 },
 { "hypersparc",	MASK_ISA, MASK_V8|MASK_FPU },
-/* LEON */
-{ "leon",		MASK_ISA, MASK_V8|MASK_FPU },
-{ "leon3",		MASK_ISA, MASK_V8|MASK_FPU },
+{ "leon",		MASK_ISA, MASK_V8|MASK_LEON|MASK_FPU },
+{ "leon3",		MASK_ISA, MASK_V8|MASK_LEON3|MASK_FPU },
 { "sparclite",	MASK_ISA, MASK_SPARCLITE },
 /* The Fujitsu MB86930 is the original sparclite chip, with no FPU.  */
 { "f930",		MASK_ISA|MASK_FPU, MASK_SPARCLITE },
@@ -1302,6 +1301,9 @@ sparc_option_override (void)
 #ifndef HAVE_AS_SPARC4
 		   & ~MASK_CBCOND
 #endif
+#ifndef HAVE_AS_LEON
+		   & ~(MASK_LEON | MASK_L

Re: [C++ RFC / Patch] PR 54080, PR 52875 and more (aka SFINAE vs template recursion depth)

2013-08-09 Thread Paolo Carlini

Hi,

On 08/09/2013 10:46 AM, Gabriel Dos Reis wrote:
I think we should find ways to have the pretty printer in the 
diagnostic framework stop trying to redo most of the work done by the 
type checker. In its current form, that is fragile. -- Gaby 
Yeah. That tsubst (..., tf_none, ...) from dump_template_bindings, 
always seemed a little weird to me. Fact is, we have got quite a few 
serious diagnostic bugs where we print *very little* sensible before the 
Error reporting routines re-entered message. Like 54080 and 52875, but 
I'm sure there are more.


Do you have any tips about a reasonable way to handle the latter in the 
short term?


Thanks!
Paolo.


Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition

2013-08-09 Thread Jan Hubicka
> On Thu, Aug 8, 2013 at 3:23 PM, Jan Hubicka  wrote:
> > Hi,
> > Martin Liska was kind enough to generate disk seeking graph of gimp statrup 
> > with his function reordering.
> > His code simply measures time of firest execution of a function and orders 
> > functions in the given order.
> > The functions stay in the subsections (unlikely/startup/exit/hot/normal) 
> > that are then glued together
> > in this order.
> >
> > I am attaching disk seeking with and without -freorder-blocks-and-partition 
> > (with your patch).
> >
> > In 2.pdf you can see two increasing sequences in the text segment.  If I am 
> > not mistaken the bottom
> > one comes for hot and the top one for normal section.  The big unused part 
> > on bottom is unlikely
> > section since most of gimp is not trained.
> 
> 2.pdf is reordered with Martin's technique?
> 
> >
> > Now 1.pdf is with -freorder-blocks-and-partition and your patch.  You can 
> > see there is third sequence
> > near bottom of the text seciton. that is beggining of unlikely section, so 
> > it tracks jumps where we
> > fall into cold section of function.
> 
> 1.pdf is generated using the usual FDO +
> -freorder-blocks-and-partition (i.e. not using Martin's technique)?

2.pdf is Martin's reordering (that works ortoghonally to what we already have -
it just orders the functions inside idividual subsections. This make the
subsections more visible than without his patch).
1.pdf is Marting's rerodering + yours patch (I asked him to double check it is
the latest verision) + -freorder-blocks-and-partition.

He simply trains and measures the gimp startup, nothing else, so there should 
not
be problem with representativity of the data.
> 
> >
> > It still seems rather bad (i.e. good part of unlikely section is actually 
> > used).  I think the dominator
> > based approach is not going to work too reliably (I can "fix" my testcase 
> > to contain multiple nested
> > conditionals and then the heuristic about predecestors won't help).
> 
> Yes, this doesn't look good. Did you use the latest version of my
> patch that doesn't walk the dominators?
> 
> Do you know how many training runs are done for this benchmark? I
> think a lot of the issues that you pointed out with the hot loop
> preceded by non-looping conditional code as in your earlier example,
> or multiple nested conditionals, comes from the fact that the cold
> cutoff is not 0, but some number less than the number of training
> runs. Perhaps the cutoff for splitting should be 0. Then the main
> issue that needs to be corrected is profile insanities, not code that
> is executed once (since that would not be marked cold).

Hmm, compute_function_frequency uses probably_never_executed_bb_p that requires
the count of basic block to be less than number of train runs.  In Martin's
setup that will be 0.

This is the same as what -frerorder-block-of-partition does?
> 
> The only other issue that I can think of here is that the training
> data was not representative and didn't execute these blocks.
> 
> >
> > What about simply walking the CFG from entry through all edges with non-0 
> > counts and making all reachable
> > blocks hot + forcingly make any hot blocks not reachable this way reachable?
> 
> Is this different than what I currently have + changing the cold
> cutoff to 0? In that case any blocks reachable through non-0 edges
> should be non-0 and marked hot, and the current patch forces the most
> frequent paths to all hot blocks to be hot.

Do we sanity check that the cold partition does not contain any blocks of
count 0?  It may be that the profile is broken enough to make partitioning
not work.
I can think of inlining where the count gets scaled all way down to 0.  Perhaps
count scaling code can be modified to never round towards 0 for block executing
non-0 times...

Honza
> 
> Thanks,
> Teresa
> 
> > I think we are really looking primarily for dead parts of the functions 
> > (sanity checks/error handling)
> > that should not be visited by train run.  We can then see how to make the 
> > heuristic more aggressive?
> >
> > Honza
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH][ARM] FAIL: gcc.target/arm/pr58041.c scan-assembler ldrb

2013-08-09 Thread Julian Brown
On Thu, 8 Aug 2013 15:44:17 +0100
Kyrylo Tkachov  wrote:

> Hi all,
> 
> The recently added gcc.target/arm/pr58041.c test exposed a bug in the
> backend. When compiling for NEON and with -mno-unaligned-access we
> end up generating the vld1.64 and vst1.64 instructions instead of
> doing the accesses one byte at a time like -mno-unaligned-access
> expects. This patch fixes that by enabling the NEON expander and
> insns that produce these instructions only when unaligned accesses
> are allowed.
> 
> Bootstrapped on arm-linux-gnueabihf. Tested arm-none-eabi on qemu.
> 
> Ok for trunk and 4.8?

I'm not sure if this is right, FWIW -- do the instructions in question
trap if the CPU is set to disallow unaligned accesses? I thought that
control bit only affected ARM core loads & stores, not NEON ones.

Not to say the test case you mention isn't broken anyway, for some
other reason -- but I don't think disabling NEON movmisalign
for !unaligned_access is the right way to fix it.

HTH,

Julian


RE: [PATCH][ARM] FAIL: gcc.target/arm/pr58041.c scan-assembler ldrb

2013-08-09 Thread Kyrylo Tkachov
Hi Julian,

> > The recently added gcc.target/arm/pr58041.c test exposed a bug in the
> > backend. When compiling for NEON and with -mno-unaligned-access we
> > end up generating the vld1.64 and vst1.64 instructions instead of
> > doing the accesses one byte at a time like -mno-unaligned-access
> > expects. This patch fixes that by enabling the NEON expander and
> > insns that produce these instructions only when unaligned accesses
> > are allowed.
> >
> > Bootstrapped on arm-linux-gnueabihf. Tested arm-none-eabi on qemu.
> >
> > Ok for trunk and 4.8?
> 
> I'm not sure if this is right, FWIW -- do the instructions in question
> trap if the CPU is set to disallow unaligned accesses? I thought that
> control bit only affected ARM core loads & stores, not NEON ones.

Looking at the architecture reference, the SCTLR.A bit also affects the NEON
instructions. When it's set to 1, they produce an alignment fault, the same as
all the other load/store instructions.

Also, reading the gcc documentation for the -mno-unaligned-access option, it
says:

"If unaligned access is not enabled then words in packed
data structures will be accessed a byte at a time"

So using vld1.64 and vst1.64 is definitely against that, since they access 64
bits at a time.
 

> Not to say the test case you mention isn't broken anyway, for some
> other reason -- but I don't think disabling NEON movmisalign
> for !unaligned_access is the right way to fix it.
> 
> HTH,

Thanks,
Kyrill

> 
> Julian






Re: [C++ RFC / Patch] PR 54080, PR 52875 and more (aka SFINAE vs template recursion depth)

2013-08-09 Thread Gabriel Dos Reis
On Fri, Aug 9, 2013 at 4:28 AM, Paolo Carlini  wrote:
> Hi,
>
>
> On 08/09/2013 10:46 AM, Gabriel Dos Reis wrote:
>>
>> I think we should find ways to have the pretty printer in the diagnostic
>> framework stop trying to redo most of the work done by the type checker. In
>> its current form, that is fragile. -- Gaby
>
> Yeah. That tsubst (..., tf_none, ...) from dump_template_bindings, always
> seemed a little weird to me. Fact is, we have got quite a few serious
> diagnostic bugs where we print *very little* sensible before the Error
> reporting routines re-entered message. Like 54080 and 52875, but I'm sure
> there are more.
>
> Do you have any tips about a reasonable way to handle the latter in the
> short term?

I arrived on the west coast with little sleep, and I can't find sleep just right
now (and you may argue reading gcc list isn't a good way to find sleep
either); let
me think about it more.

-- Gaby


Re: [PATCH] Cilk Keywords (_Cilk_spawn and _Cilk_sync) for C

2013-08-09 Thread Aldy Hernandez




--- gcc/expr.c
+++ gcc/expr.c
@@ -9569,6 +9569,21 @@ expand_expr_real_1 (tree exp, rtx target, enum

machine_mode tmode,

}

return expand_constructor (exp, target, modifier, false);
+case INDIRECT_REF:
+  {
+   tree exp1 = TREE_OPERAND (exp, 0);
+   if (modifier != EXPAND_WRITE)
+ {
+   tree t = fold_read_from_constant_string (exp);
+   if (t)
+ return expand_expr (t, target, tmode, modifier);
+ }
+   op0 = expand_expr (exp1, NULL_RTX, VOIDmode, EXPAND_SUM);
+   op0 = memory_address (mode, op0);
+   temp = gen_rtx_MEM (mode, op0);
+   set_mem_attributes (temp, exp, 0);
+   return temp;
+  }


Ughhh, what's the rationale for this?  Are generic changes to
expand_expr really needed?


Yes, I am expanding some variables of type "ptr->value" and those are emitted 
as INDIRECT_REF.


The fact that you are getting an INDIRECT_REF this late in the game is 
suspect.


Are you building with ENABLE_CHECKING, because it seems this should have 
been caught.  See the places in tree-cfg.c with this:


case INDIRECT_REF:
  error ("INDIRECT_REF in gimple IL");
  return t;





+  /* During LTO, the is_cilk_function flag gets cleared.
+ If __cilkrts_pop_frame is called, then this definitely must be a
+ cilk function.  */
+  if (cfun)
+cfun->is_cilk_function = 1;


I don't know much about our LTO implementation, but perhaps you need to
teach LTO to stream this bit in/out?  And of course, an accompanying LTO
test to check for this problem you're encountering would be appropriate.



I also have a limited knowledge of LTO. This seem to be the most 
straightforward way of doing it (atleast for me).


See how other bits in `struct function' are streamed in/out in LTO, for 
example in output_struct_function_base()


  bp_pack_value (&bp, fn->calls_alloca, 1);
  bp_pack_value (&bp, fn->calls_setjmp, 1);
  bp_pack_value (&bp, fn->va_list_fpr_size, 8);
  bp_pack_value (&bp, fn->va_list_gpr_size, 8);

and the corresponding in input_struct_function_base():

  fn->calls_alloca = bp_unpack_value (&bp, 1);
  fn->calls_setjmp = bp_unpack_value (&bp, 1);
  fn->va_list_fpr_size = bp_unpack_value (&bp, 8);
  fn->va_list_gpr_size = bp_unpack_value (&bp, 8);

Also, you will need a testcase to make sure later changes to the 
compiler do not break LTO wrt Cilk features you have added.





+   /* We need frame pointer for all Cilk Plus functions that uses
+ Cilk Keywords.  */
+   || (flag_enable_cilkplus && cfun->is_cilk_function)


"need a frame pointer"

"that use"

s/Keywords/keywords/



It should be keywords, because you need frame-pointer for "_Cilk_spawn and _Cilk_sync" 
and "_Cilk_for"


I meant that you should lowercase the "K".




+  /* This variable will tell whether we are on a spawn helper or not */
+  unsigned int is_cilk_helper_function : 1;


Where is this used?



Well, it is not used now but later on when I add Tools support it will be. I 
will remove it for now.


Yes, please.


--- gcc/ipa-inline-analysis.c
+++ gcc/ipa-inline-analysis.c
@@ -1433,6 +1433,9 @@ initialize_inline_failed (struct cgraph_edge *e)
  e->inline_failed = CIF_REDEFINED_EXTERN_INLINE;
else if (e->call_stmt_cannot_inline_p)
  e->inline_failed = CIF_MISMATCHED_ARGUMENTS;
+  else if (flag_enable_cilkplus && cfun && cfun->calls_spawn)
+/* We can't inline if the function is spawing a function.  */
+e->inline_failed = CIF_BODY_NOT_AVAILABLE;


Hmmm, if we don't have a cfun, perhaps we should be sticking this
calls_spawn bit in the cgraph node.

Richard?  Anyone?



When I am first setting this, I don't think cgraph is available.


See Richard's comment with regards to struct function and its 
availability via the callee edge.  Also see his comment regarding the 
inappropriate error message.



@@ -3496,7 +3510,7 @@ struct GTY(()) tree_function_decl {
   ???  The bitfield needs to be able to hold all target function
  codes as well.  */
ENUM_BITFIELD(built_in_function) function_code : 11;
-  ENUM_BITFIELD(built_in_class) built_in_class : 2;
+  ENUM_BITFIELD(built_in_class) built_in_class  ;


What's this for?



Added a new enum field called BUILT_IN_CILK so we need 3 bits instead of 2, 
since there are 5 fields instead of 4.


Hmm, yeah.  I see you added another field here:


diff --git a/gcc/tree.h b/gcc/tree.h
index 0058a4b..952362f 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -262,6 +262,7 @@ enum built_in_class
   NOT_BUILT_IN = 0,
   BUILT_IN_FRONTEND,
   BUILT_IN_MD,
+  BUILT_IN_CILK,
   BUILT_IN_NORMAL
 };


If you look at the comment above enum built_in_class, you will see that 
these classes specify which part of the compiler created the built-in 
(the frontend, the backend (MD), or a normal builtin).  I don't see how 
Cilk should be treated specially.  And even so, I don't see how you use 
this BUILT_IN_CILK class anywhere.



+  if (DECL_BUILT_IN_CLASS (exp) == BUILT_IN_NO

Re: [RFC Patch, Aarch64] : Macros for profile code generation to enable gprof support

2013-08-09 Thread Venkataramanan Kumar
ping!

On 3 August 2013 23:31, Venkataramanan Kumar
 wrote:
> Hi Maintainers,
>
> This patch adds macros to support gprof in Aarch64. The difference
> from the previous patch is that the compiler, while generating
> "mcount" routine for an instrumented function, also passes the return
> address as argument.
>
> The "mcount" routine in glibc will be modified as follows.
>
> (-Snip-)
>  #define MCOUNT \
> -void __mcount (void) 
> \
> +void __mcount (void* frompc)
>\
>  {
> \
> -  mcount_internal ((u_long) RETURN_ADDRESS (1), (u_long) RETURN_ADDRESS 
> (0)); \
> +  mcount_internal ((u_long) frompc, (u_long) RETURN_ADDRESS (0)); \
>  }
> (-Snip-)
>
> Also in Aarch64 cases__builtin_return_adderess(n) where n>0, still be
> returning 0 as it was doing before.
>
> If this is Ok I will send the patch to glibc as well.
>
> 2013-08-02  Venkataramanan Kumar  
>
>  * config/aarch64/aarch64.h (MCOUNT_NAME): Define.
>(NO_PROFILE_COUNTERS): Likewise.
>(PROFILE_HOOK): Likewise.
>(FUNCTION_PROFILER): Likewise.
> *  config/aarch64/aarch64.c (aarch64_function_profiler): Remove.
>.
>
> regards,
> Venkat.


Re: [C++ Patch / RFC] PR 46206

2013-08-09 Thread Dominique Dhumieres
On x86_64-apple-darwin10, g++.dg/lookup/typedef2.C fails with

FAIL: g++.dg/lookup/typedef2.C -std=c++11 (test for excess errors)
Excess errors:
/opt/gcc/work/gcc/testsuite/g++.dg/lookup/typedef2.C:8:12: error: using 
typedef-name 'Foo1::Bar' after 'struct'
/opt/gcc/work/gcc/testsuite/g++.dg/lookup/typedef2.C:8:19: error: invalid type 
in declaration before ';' token

Dominique


[c-family] Minor tweak to -fdump-ada-spec

2013-08-09 Thread Eric Botcazou
This fixes a segfault on specific C++ code using  when -fdump-ada-spec 
is passed to the compiler.

Tested on x86_64-suse-linux, applied on the mainline and 4.8 branch as obvious.


2013-08-09  Arnaud Charlet  

* c-ada-spec.c (print_ada_declaration): Prevent accessing null asm name


-- 
Eric Botcazou
Index: c-ada-spec.c
===
--- c-ada-spec.c	(revision 201592)
+++ c-ada-spec.c	(working copy)
@@ -2900,7 +2900,7 @@ print_ada_declaration (pretty_printer *b
   pp_string (buffer, "  -- ");
   dump_sloc (buffer, t);
 
-  if (is_abstract)
+  if (is_abstract || !DECL_ASSEMBLER_NAME (t))
 	return 1;
 
   newline_and_indent (buffer, spc);


Re: [PATCH][ARM] FAIL: gcc.target/arm/pr58041.c scan-assembler ldrb

2013-08-09 Thread Ramana Radhakrishnan

On 08/09/13 11:01, Julian Brown wrote:

On Thu, 8 Aug 2013 15:44:17 +0100
Kyrylo Tkachov  wrote:


Hi all,

The recently added gcc.target/arm/pr58041.c test exposed a bug in the
backend. When compiling for NEON and with -mno-unaligned-access we
end up generating the vld1.64 and vst1.64 instructions instead of
doing the accesses one byte at a time like -mno-unaligned-access
expects. This patch fixes that by enabling the NEON expander and
insns that produce these instructions only when unaligned accesses
are allowed.

Bootstrapped on arm-linux-gnueabihf. Tested arm-none-eabi on qemu.

Ok for trunk and 4.8?


I'm not sure if this is right, FWIW -- do the instructions in question
trap if the CPU is set to disallow unaligned accesses? I thought that
control bit only affected ARM core loads & stores, not NEON ones.


Thinking again - the ARM-ARM says - the alignment check is for element 
size, so an alternative might be to use vld1.8 instead to allow for this 
at which point we might as well do something else with the test. I note 
that these patterns are not allowed for BYTES_BIG_ENDIAN so that might 
be a better alternative than completely disabling it.



While conservative the current fix is not incorrect.

regards
Ramana



HTH,

Julian






Re: [C++ Patch / RFC] PR 46206

2013-08-09 Thread Paolo Carlini

On 08/09/2013 12:52 PM, domi...@lps.ens.fr wrote:

On x86_64-apple-darwin10, g++.dg/lookup/typedef2.C fails with

FAIL: g++.dg/lookup/typedef2.C -std=c++11 (test for excess errors)
Excess errors:
/opt/gcc/work/gcc/testsuite/g++.dg/lookup/typedef2.C:8:12: error: using 
typedef-name 'Foo1::Bar' after 'struct'
/opt/gcc/work/gcc/testsuite/g++.dg/lookup/typedef2.C:8:19: error: invalid type 
in declaration before ';' token
As I said, the patch solves the issue on -linux, not just x86_64-linux, 
but Linux more generally. Honestly, I don't know how much it will take 
me to figure out a way to extend it to other targets, unless somebody 
concretely helps.


If you want me to revert it, just say it, done.

Paolo.


Re: [C++ Patch / RFC] PR 46206

2013-08-09 Thread Iain Sandoe

On 9 Aug 2013, at 12:12, Paolo Carlini wrote:

> On 08/09/2013 12:52 PM, domi...@lps.ens.fr wrote:
>> On x86_64-apple-darwin10, g++.dg/lookup/typedef2.C fails with
>> 
>> FAIL: g++.dg/lookup/typedef2.C -std=c++11 (test for excess errors)
>> Excess errors:
>> /opt/gcc/work/gcc/testsuite/g++.dg/lookup/typedef2.C:8:12: error: using 
>> typedef-name 'Foo1::Bar' after 'struct'
>> /opt/gcc/work/gcc/testsuite/g++.dg/lookup/typedef2.C:8:19: error: invalid 
>> type in declaration before ';' token
> As I said, the patch solves the issue on -linux, not just x86_64-linux, but 
> Linux more generally. Honestly, I don't know how much it will take me to 
> figure out a way to extend it to other targets, unless somebody concretely 
> helps.

what output/analysis would help you in the short term?
(maybe looking at the common factors between more than one failing target will 
help to isolate the issue).
Iain



Re: [Patch] Whole regex refactoring and current status

2013-08-09 Thread Rainer Orth
Tim Shen  writes:

> On Fri, Aug 9, 2013 at 2:59 PM, Paolo Carlini  
> wrote:
>> Yes, if, as it should, it works on -m32 too, let's go with this. By the way,
>> you didn't say how exactly you are testing?!? Because just make check
>> doesn't test -m32. I use, something like:
>>
>> make -k check-target-libstdc++-v3
>> RUNTESTFLAGS="--target_board='unix/\{-m32,-m64\}'"
>>
>> for sure there are other ways to obtain the same result.
>
> Committed.
>
> I entered `gcc-build/x86_64-unknown-linux-gnu/32/libstdc++-v3` then
> run `make check`. I see -m32 in test log.

As expected, i386-pc-solaris2.10 testsuite results are back to normal
now.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [C++ Patch / RFC] PR 46206

2013-08-09 Thread Rainer Orth
Iain Sandoe  writes:

> On 9 Aug 2013, at 12:12, Paolo Carlini wrote:
>
>> On 08/09/2013 12:52 PM, domi...@lps.ens.fr wrote:
>>> On x86_64-apple-darwin10, g++.dg/lookup/typedef2.C fails with
>>> 
>>> FAIL: g++.dg/lookup/typedef2.C -std=c++11 (test for excess errors)
>>> Excess errors:
>>> /opt/gcc/work/gcc/testsuite/g++.dg/lookup/typedef2.C:8:12: error: using
>>> typedef-name 'Foo1::Bar' after 'struct'
>>> /opt/gcc/work/gcc/testsuite/g++.dg/lookup/typedef2.C:8:19: error:
>>> invalid type in declaration before ';' token
>> As I said, the patch solves the issue on -linux, not just x86_64-linux,
>> but Linux more generally. Honestly, I don't know how much it will take me
>> to figure out a way to extend it to other targets, unless somebody
>> concretely helps.
>
> what output/analysis would help you in the short term?
> (maybe looking at the common factors between more than one failing target
> will help to isolate the issue).

FWIW, I see a similar failure on i386-pc-solaris2.10:

FAIL: g++.dg/lookup/typedef2.C -std=c++98 (test for excess errors)
Excess errors:
/vol/gcc/src/hg/trunk/local/gcc/testsuite/g++.dg/lookup/typedef2.C:18:12: 
error: using typedef-name 'Foo2::Bar' after 'struct'
/vol/gcc/src/hg/trunk/local/gcc/testsuite/g++.dg/lookup/typedef2.C:18:19: 
error: invalid type in declaration before ';' token

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] Fix for PR c/57490

2013-08-09 Thread Rainer Orth
Rainer Orth  writes:

> Rainer Orth  writes:
>
>> "Iyer, Balaji V"  writes:
>>
 -Original Message-
 From: Jakub Jelinek [mailto:ja...@redhat.com]
 Sent: Monday, July 01, 2013 1:09 PM
 To: Iyer, Balaji V
 Cc: gcc-patches@gcc.gnu.org; Rainer Orth
 Subject: Re: [PATCH] Fix for PR c/57490
 
 On Mon, Jul 01, 2013 at 05:02:57PM +, Iyer, Balaji V wrote:
 > OK. The fixed patch is attached. Here are the ChangeLog entries:
 >
 > gcc/cp/ChangeLog
 > 2013-07-01  Balaji V. Iyer  
 >
 
 Still
PR c/57490
 hasn't been added to cp/ChangeLog and c/ChangeLog entries.
 > --- /dev/null
 > +++ b/gcc/testsuite/c-c++-common/cilk-plus/AN/pr57490.c
 > @@ -0,0 +1,25 @@
 
>>>
>>> Fixed as you suggested. Here is the fixed Changelogs and patch is attached.
>>>
>>> gcc/cp/ChangeLog
>>> 2013-07-01  Balaji V. Iyer  
>>>
>>> PR c/57490
>>> * cp-array-notation.c (cp_expand_cond_array_notations): Added a
>>> check for truth values.
>>> (expand_array_notation_exprs): Added truth values case.  Removed an
>>> unwanted else.  Added for-loop to walk through subtrees in default
>>> case.
>>>
>>> gcc/c/ChangeLog
>>> 2013-07-01  Balaji V. Iyer  
>>>
>>> PR c/57490
>>> * c-array-notation.c (fix_conditional_array_notations_1): Added a
>>> check for truth values.
>>> (expand_array_notation_exprs): Added truth values case.  Removed an
>>> unwanted else.  Added for-loop to walk through subtrees in default
>>> case.
>>>
>>>
>>> gcc/testsuite/ChangeLog
>>> 2013-07-01  Balaji V. Iyer  
>>>
>>> PR c/57490
>>> * c-c++-common/cilk-plus/AN/pr57490.c: New test.
>>
>> I've just tested this patch on i386-pc-solaris2.10:
>>
>> The c-c++-common/cilk-plus/AN/an-if.c test still FAILs for C++:
>>
>> FAIL: c-c++-common/cilk-plus/AN/an-if.c  -fcilkplus (internal compiler error)
>> FAIL: c-c++-common/cilk-plus/AN/an-if.c  -fcilkplus (test for excess errors)
[...]
> This is still unfixed almost three weeks later.  Balaji, could you
> please have a look?

This bug is now unfixed for two months, and no reaction whatsoever on
the report.  This is getting annoying since it generates large amount of
testsuite noise.

Please fix ASAP!

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [C++ Patch / RFC] PR 46206

2013-08-09 Thread Paolo Carlini

.. reverted.

Paolo.


Speculative call support in the callgraph

2013-08-09 Thread Jan Hubicka
Hi,
this patch adds support for speculative calls into callgraph.  The idea is that
any IPA optimization that believes it knows likely target of an indirect call
(currently I use it for cross-module indirect call profiling, but I expect
Martin J. can easily add support for ipa-cp and I hope to add speculative
devirtualization in foreseeable future since it should make difference for
Firefox). Speculative call replaces indirect call

call_target ()

by:

if (call_target == expected_fn)
  expected_fn ();
else
  call_target ();

We already know how to produce these by gimple_ic, however the idea is to not
modify the body immediately (IPA passes can not do that), but do that only when
we are applying ipa transformations.

Every speculative call is represented by three components attached
to the same call statement (the original indirect call):
 1) a direct call (to expected_fn)
 2) an indirect call (to call_target)
 3) a IPA_REF_ADDR refrence to expected_fn.

Each of those can be found by having "speculative" flag set.

This representation allows IPA passes to be smart about speculative calls.
1) the direct calls can be inlined or redirected to cprop clones just as any
   ordinary calls.  The reference still links to the original call target making
   it possible to later expand the test.
2) When indirect call target is found by ipa propagation, the call can be 
turned back
   to a direct call. Either by using the exisitng speculative direct call (if
   speculation was right) and keeping optimizations already attached to it
   or by turning the indirect call into new target if speculation was wrong.
   (this is done by resolve_speuclative_call)
3) at the end of IPA optimization we may get rid of speuclative calls that
   don't seem to buy anything.  I think that on most modern cores the
   speculative sequence is actually worse than the indirect call when
   the expected_fn call stays uninlined or unpropagated.
   (not implemented yet)
4) inliner can adjust its metric knowing that inlining speuclative call
   is costier (disabling 3) (also not implemented yet).

This patch implements the infrastructure only. I will send the corss-module
indirect call profiling code in followup patch.

I hoped this all to go quite smoothly into exisitng WHOPR infrastructure that
juding from the size of patch is not true.  I had to deal with some issues:

1) I am not linking the three components of calls explicitely together. Instead
   I rely on statement pointers in both edges&refs to allow later lookup of
   the components (by cgraph_speculative_call_info).

   ipa-reference call statements are however not streamed by LTO, so I had to
   implement the streaming. (hitting some surprises handled by preparation
   patches)
2) IPA-refs are not really maintained through versioning and inlining that is
   needed now for speculative expansion to work.  Currently we update only 
   callgraph edges and we rebuild ipa-refs (and wrongly so not updating
   clones that is harmless since they are used only for unreachable function
   removal).

   I solved this partly by updating speculative calls only (since I can use
   the existing callgraph edge path).
   tree-inline got fat&ugly, I need to push higher the plans to clean it up.
3) cgraph_edge statement hash expect only one call per statement.  I fixed
   this by always recording the direct speculative call only.
4) ipa-prop and ipa-cp was not expecting the function to turn indirect call
   to direct call to return different edge that is passed to it (that happens
   when speculation was right).  It also accesses indirect_call_info in
   edges that are already direct.
   I fixed these and added code to explicitely free the indirect call info
   when they are turned to direct edges. Until now it stayed dormant
   in memory.
5) Indirect call expansion breaks basic blocks. Currently it happens during 
   call redirection performed by inliner.  It needs to be moved later in
   tree-inline after the body is fully duplicated or we ICE on trying to fixup
   not yet finished loop structure in split_bb.
6) ipa-inline was not ready for callgraph edges to be removed as result
   of inlining.  This was longer time on my TODO, so I fixed it by adding
   the missing edge removal hook.

Bootstrapped/regtested x86_64-linux, will commit it later today. Comments 
welcome.

Honza
* cgraphbuild.c (cgraph_rebuild_references): Rebuild only 
non-speculative
refs.
* cgraph.c (cgraph_update_edge_in_call_site_hash): New function.
(cgraph_add_edge_to_call_site_hash): Deal with speculative calls.
(cgraph_set_call_stmt): Likewise.
(cgraph_create_edge_1): Fix release checking compilatoin;
clear lto_stmt_uid.
(cgraph_free_edge): Free indirect info.
(cgraph_turn_edge_to_speculative): New function.
(cgraph_speculative_call_info): New function.
(cgraph_make_edge_direct): Return direct edge; handle speculation.
   

Re: [PATCH, i386, MPX 1/X] Support of Intel MPX ISA

2013-08-09 Thread Ilya Enkovich
2013/8/8 Joseph S. Myers :
> On Thu, 8 Aug 2013, Ilya Enkovich wrote:
>
>> > That is not a big issue to rename generic names. But I'm just still
>> > trying to choose proper names. I looked into -fbounds-check but its
>> > description already mention C/C++ and its semantics differs from what
>> > new instrumentation does. I consider using -fcheck=pointer (currently
>> > valid for Fortran) and 'chkp' instead of 'mpx' for generic things.
>> > Does it look OK?
>> I just realized that usage of option which is already defined for
>> other languages may be problematic when this option is passed to
>> MULTILIB_OPTIONS. So, probably, new common option -fcheck-pointers?
>
> Seems reasonable to me.
>
>> > I made an attempt to use multilibs instead. I tried to add mpx variant
>> > to target libraries build but got fail for libgfortran build. Does
>> > multilib support partial library rebuild? Actually I do not need
>> > libgfortran library (an many other libraries) to be in mpx version. Is
>> > it possible to get some libs from one place and some libs from another
>> > place?
>
> I'm not sure why the libgfortran build would have failed ... maybe some
> libraries don't in fact do anything with pointers for which the checks
> would help, but if so then I'd expect the option simply not to have any
> effect on the code generated for those libraries.  Multilibs are expected
> to be the same for all libraries (but packagers could no doubt optimize
> things in their packages, if in fact some libraries are identical when
> built both with and without MPX).

The reason for libgfortran build was trivial - -fmpx usage combined
with -Werror. This is easily fixed and seems the only problem now is
to allow legacy library usage with MPX binary (see
http://gcc.gnu.org/ml/gcc/2013-08/msg00114.html).

Thanks,
Ilya
>
> --
> Joseph S. Myers
> jos...@codesourcery.com


Re: [PATCH] Fix PR48493

2013-08-09 Thread Richard Biener
On 8/9/13 1:29 AM, Mike Stump wrote:
> In the below, the test case tries to write to the stack outside the bounds of 
> the s variable?  I can't imagine any good coming from this, and indeed, would 
> be nice for the compiler to complain about such code.  If S had a few more 
> bytes at the end, at least the code would not be wildly bad.
> 
> Thoughts?

It's not that easy - you need to make sure the variable is promoted to a
register, too.  Feel free to experiment but make sure the testcase
fails in the same way before the patch went in.

Richard.

> On May 31, 2012, at 3:59 AM, Richard Guenther  wrote:
>> This fixes PR48493 by backporting a one-liner - we should not go
>> the movmisalign path for destinations that are not memory.
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu and on
>> mips by Andrew, installed.
>>
>> Richard.
>>
>> 2012-05-31  Richard Guenther  
>>
>>  PR middle-end/48493
>>  * expr.c (expand_assignment): Do not use movmisalign on
>>  non-memory.
>>
>>  * gcc.dg/torture/pr48493.c: New testcase.
> 
>> Index: gcc/testsuite/gcc.dg/torture/pr48493.c
>> ===
>> *** gcc/testsuite/gcc.dg/torture/pr48493.c   (revision 0)
>> --- gcc/testsuite/gcc.dg/torture/pr48493.c   (revision 0)
>> ***
>> *** 0 
>> --- 1,18 
>> + /* { dg-do compile } */
>> + 
>> + typedef long long T __attribute__((may_alias, aligned (1)));
>> + 
>> + struct S
>> + {
>> +   _Complex float d __attribute__((aligned (8)));
>> + };
>> + 
>> + void bar (struct S);
>> + 
>> + void
>> + f1 (T x)
>> + {
>> +   struct S s;
>> +   *(T *) ((char *) &s.d + 1) = x;
>> +   bar (s);
>> + }
> 



[PATCH] Fix PR57980

2013-08-09 Thread Marek Polacek
In this PR the problem was that when dealing with the gimple assign in
the tailcall optimization, we, when the rhs operand is of a vector
type, need to create -1 also of a vector type, but build_int_cst
doesn't create vectors (ICEs).  Instead, we should use build_minus_one_cst
because that can create even the VECTOR_TYPE constant (and, it can
create even REAL_TYPE/COMPLEX_TYPE), as suggested by Marc.

Regtested/bootstrapped on x86_64-linux, ok for trunk and 4.8?

2013-08-09  Marek Polacek  
Marc Glisse  

PR tree-optimization/57980
* tree-tailcall.c (process_assignment): Call build_minus_one_cst
when creating -1 constant.

* gcc.dg/pr57980.c: New test.

--- gcc/tree-tailcall.c.mp  2013-08-09 16:07:55.778616996 +0200
+++ gcc/tree-tailcall.c 2013-08-09 16:08:00.068635823 +0200
@@ -326,11 +326,7 @@ process_assignment (gimple stmt, gimple_
   return true;
 
 case NEGATE_EXPR:
-  if (FLOAT_TYPE_P (TREE_TYPE (op0)))
-*m = build_real (TREE_TYPE (op0), dconstm1);
-  else
-*m = build_int_cst (TREE_TYPE (op0), -1);
-
+  *m = build_minus_one_cst (TREE_TYPE (op0));
   *ass_var = dest;
   return true;
 
@@ -339,11 +335,7 @@ process_assignment (gimple stmt, gimple_
 *a = fold_build1 (NEGATE_EXPR, TREE_TYPE (non_ass_var), non_ass_var);
   else
 {
-  if (FLOAT_TYPE_P (TREE_TYPE (non_ass_var)))
-*m = build_real (TREE_TYPE (non_ass_var), dconstm1);
-  else
-*m = build_int_cst (TREE_TYPE (non_ass_var), -1);
-
+ *m = build_minus_one_cst (TREE_TYPE (non_ass_var));
   *a = fold_build1 (NEGATE_EXPR, TREE_TYPE (non_ass_var), non_ass_var);
 }
 
--- gcc/testsuite/gcc.dg/pr57980.c.mp   2013-08-09 16:07:25.967485356 +0200
+++ gcc/testsuite/gcc.dg/pr57980.c  2013-08-09 16:07:17.967450526 +0200
@@ -0,0 +1,19 @@
+/* PR tree-optimization/57980 */
+/* { dg-do compile } */
+/* { dg-options "-O -foptimize-sibling-calls" } */
+
+typedef int V __attribute__ ((vector_size (sizeof (int;
+extern V f (void);
+
+V
+bar (void)
+{
+  return -f ();
+}
+
+V
+foo (void)
+{
+  V v = { };
+  return v - f ();
+}

Marek


RE: [PATCH] Fix for PR c/57490

2013-08-09 Thread Iyer, Balaji V


> -Original Message-
> From: Rainer Orth [mailto:r...@cebitec.uni-bielefeld.de]
> Sent: Friday, August 09, 2013 7:54 AM
> To: Iyer, Balaji V
> Cc: Jakub Jelinek; gcc-patches@gcc.gnu.org; Marek Polacek
> (pola...@redhat.com)
> Subject: Re: [PATCH] Fix for PR c/57490
> 
> Rainer Orth  writes:
> 
> > Rainer Orth  writes:
> >
> >> "Iyer, Balaji V"  writes:
> >>
>  -Original Message-
>  From: Jakub Jelinek [mailto:ja...@redhat.com]
>  Sent: Monday, July 01, 2013 1:09 PM
>  To: Iyer, Balaji V
>  Cc: gcc-patches@gcc.gnu.org; Rainer Orth
>  Subject: Re: [PATCH] Fix for PR c/57490
> 
>  On Mon, Jul 01, 2013 at 05:02:57PM +, Iyer, Balaji V wrote:
>  > OK. The fixed patch is attached. Here are the ChangeLog entries:
>  >
>  > gcc/cp/ChangeLog
>  > 2013-07-01  Balaji V. Iyer  
>  >
> 
>  Still
>   PR c/57490
>  hasn't been added to cp/ChangeLog and c/ChangeLog entries.
>  > --- /dev/null
>  > +++ b/gcc/testsuite/c-c++-common/cilk-plus/AN/pr57490.c
>  > @@ -0,0 +1,25 @@
> 
> >>>
> >>> Fixed as you suggested. Here is the fixed Changelogs and patch is 
> >>> attached.
> >>>
> >>> gcc/cp/ChangeLog
> >>> 2013-07-01  Balaji V. Iyer  
> >>>
> >>> PR c/57490
> >>> * cp-array-notation.c (cp_expand_cond_array_notations): Added a
> >>> check for truth values.
> >>> (expand_array_notation_exprs): Added truth values case.  Removed 
> >>> an
> >>> unwanted else.  Added for-loop to walk through subtrees in default
> >>> case.
> >>>
> >>> gcc/c/ChangeLog
> >>> 2013-07-01  Balaji V. Iyer  
> >>>
> >>> PR c/57490
> >>> * c-array-notation.c (fix_conditional_array_notations_1): Added a
> >>> check for truth values.
> >>> (expand_array_notation_exprs): Added truth values case.  Removed 
> >>> an
> >>> unwanted else.  Added for-loop to walk through subtrees in default
> >>> case.
> >>>
> >>>
> >>> gcc/testsuite/ChangeLog
> >>> 2013-07-01  Balaji V. Iyer  
> >>>
> >>> PR c/57490
> >>> * c-c++-common/cilk-plus/AN/pr57490.c: New test.
> >>
> >> I've just tested this patch on i386-pc-solaris2.10:
> >>
> >> The c-c++-common/cilk-plus/AN/an-if.c test still FAILs for C++:
> >>
> >> FAIL: c-c++-common/cilk-plus/AN/an-if.c  -fcilkplus (internal
> >> compiler error)
> >> FAIL: c-c++-common/cilk-plus/AN/an-if.c  -fcilkplus (test for excess
> >> errors)
> [...]
> > This is still unfixed almost three weeks later.  Balaji, could you
> > please have a look?
> 
> This bug is now unfixed for two months, and no reaction whatsoever on the
> report.  This is getting annoying since it generates large amount of testsuite
> noise.
> 
> Please fix ASAP!
> 

Ok. I will get on this this Monday. I am away from office today and this 
weekend.

-Balaji V. Iyer.

>   Rainer
> 
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition

2013-08-09 Thread Teresa Johnson
On Fri, Aug 9, 2013 at 2:58 AM, Jan Hubicka  wrote:
>> On Thu, Aug 8, 2013 at 3:23 PM, Jan Hubicka  wrote:
>> > Hi,
>> > Martin Liska was kind enough to generate disk seeking graph of gimp 
>> > statrup with his function reordering.
>> > His code simply measures time of firest execution of a function and orders 
>> > functions in the given order.
>> > The functions stay in the subsections (unlikely/startup/exit/hot/normal) 
>> > that are then glued together
>> > in this order.
>> >
>> > I am attaching disk seeking with and without 
>> > -freorder-blocks-and-partition (with your patch).
>> >
>> > In 2.pdf you can see two increasing sequences in the text segment.  If I 
>> > am not mistaken the bottom
>> > one comes for hot and the top one for normal section.  The big unused part 
>> > on bottom is unlikely
>> > section since most of gimp is not trained.
>>
>> 2.pdf is reordered with Martin's technique?
>>
>> >
>> > Now 1.pdf is with -freorder-blocks-and-partition and your patch.  You can 
>> > see there is third sequence
>> > near bottom of the text seciton. that is beggining of unlikely section, so 
>> > it tracks jumps where we
>> > fall into cold section of function.
>>
>> 1.pdf is generated using the usual FDO +
>> -freorder-blocks-and-partition (i.e. not using Martin's technique)?
>
> 2.pdf is Martin's reordering (that works ortoghonally to what we already have 
> -
> it just orders the functions inside idividual subsections. This make the
> subsections more visible than without his patch).
> 1.pdf is Marting's rerodering + yours patch (I asked him to double check it is
> the latest verision) + -freorder-blocks-and-partition.
>
> He simply trains and measures the gimp startup, nothing else, so there should 
> not
> be problem with representativity of the data.

Ok, so a single training run, and it is essentially the same as what
is being used to create the graph after optimization.

>>
>> >
>> > It still seems rather bad (i.e. good part of unlikely section is actually 
>> > used).  I think the dominator
>> > based approach is not going to work too reliably (I can "fix" my testcase 
>> > to contain multiple nested
>> > conditionals and then the heuristic about predecestors won't help).
>>
>> Yes, this doesn't look good. Did you use the latest version of my
>> patch that doesn't walk the dominators?
>>
>> Do you know how many training runs are done for this benchmark? I
>> think a lot of the issues that you pointed out with the hot loop
>> preceded by non-looping conditional code as in your earlier example,
>> or multiple nested conditionals, comes from the fact that the cold
>> cutoff is not 0, but some number less than the number of training
>> runs. Perhaps the cutoff for splitting should be 0. Then the main
>> issue that needs to be corrected is profile insanities, not code that
>> is executed once (since that would not be marked cold).
>
> Hmm, compute_function_frequency uses probably_never_executed_bb_p that 
> requires
> the count of basic block to be less than number of train runs.  In Martin's
> setup that will be 0.
>
> This is the same as what -frerorder-block-of-partition does?

Right, it simply puts blocks that are probably_never_executed_bb_p
into the cold section. But it sounds like that is not an issue here
since Martin is doing a single training run so the cutoff is
essentially 0.

>>
>> The only other issue that I can think of here is that the training
>> data was not representative and didn't execute these blocks.
>>
>> >
>> > What about simply walking the CFG from entry through all edges with non-0 
>> > counts and making all reachable
>> > blocks hot + forcingly make any hot blocks not reachable this way 
>> > reachable?
>>
>> Is this different than what I currently have + changing the cold
>> cutoff to 0? In that case any blocks reachable through non-0 edges
>> should be non-0 and marked hot, and the current patch forces the most
>> frequent paths to all hot blocks to be hot.
>
> Do we sanity check that the cold partition does not contain any blocks of
> count 0?  It may be that the profile is broken enough to make partitioning
> not work.

Do you mean sanity check that the cold partition does not contain any
blocks of count > 0? (they should all be zero) I don't think that
sanity check is there, but I can try adding that.

The issue with such a sanity check may be due to the later fixup I
have in this patch (fixup_partitions). It is invoked after certain
optimizations on the cfg that may make hot blocks previously reached
by both hot and cold edges only reachable by cold blocks. These blocks
are remarked cold. If the profile data hasn't been updated correctly
it is possible that they would still have a non-0 count, although they
are essentially cold after the cfg transformation.

But certainly such a sanity check should always succeed after the
original partitioning.

> I can think of inlining where the count gets scaled all way down to 0.  
> Perhaps
> count scaling code can be mo

Re: [C++ Patch / RFC] PR 46206

2013-08-09 Thread David Edelsohn
On Fri, Aug 9, 2013 at 7:35 AM, Iain Sandoe  wrote:
>
> On 9 Aug 2013, at 12:12, Paolo Carlini wrote:
>
>> On 08/09/2013 12:52 PM, domi...@lps.ens.fr wrote:
>>> On x86_64-apple-darwin10, g++.dg/lookup/typedef2.C fails with
>>>
>>> FAIL: g++.dg/lookup/typedef2.C -std=c++11 (test for excess errors)
>>> Excess errors:
>>> /opt/gcc/work/gcc/testsuite/g++.dg/lookup/typedef2.C:8:12: error: using 
>>> typedef-name 'Foo1::Bar' after 'struct'
>>> /opt/gcc/work/gcc/testsuite/g++.dg/lookup/typedef2.C:8:19: error: invalid 
>>> type in declaration before ';' token
>> As I said, the patch solves the issue on -linux, not just x86_64-linux, but 
>> Linux more generally. Honestly, I don't know how much it will take me to 
>> figure out a way to extend it to other targets, unless somebody concretely 
>> helps.
>
> what output/analysis would help you in the short term?
> (maybe looking at the common factors between more than one failing target 
> will help to isolate the issue).

Exactly. What is the common factor on AIX, Darwin and Solaris that is
different from Linux? A difference in system types?

How can we help?

Thanks, David


Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition

2013-08-09 Thread Jan Hubicka
> > Do we sanity check that the cold partition does not contain any blocks of
> > count 0?  It may be that the profile is broken enough to make partitioning
> > not work.
> 
> Do you mean sanity check that the cold partition does not contain any
> blocks of count > 0? (they should all be zero) I don't think that
> sanity check is there, but I can try adding that.

Thanks, lets start with this - I suppose we need to figure out if
 1) the reachable blocks goes to cold section because partitioning decides
so even if they have non-0 count.
 2) the reachable blocks goes to cold section because they have incorrectly
updated count to 0 by someone
 3) profiling gets some blocks wrong.
> 
> The issue with such a sanity check may be due to the later fixup I
> have in this patch (fixup_partitions). It is invoked after certain
> optimizations on the cfg that may make hot blocks previously reached
> by both hot and cold edges only reachable by cold blocks. These blocks
> are remarked cold. If the profile data hasn't been updated correctly
> it is possible that they would still have a non-0 count, although they
> are essentially cold after the cfg transformation.

Well, or the other posibility is that the edges was updated wrong
and the blocks are really cold.  We need to figure out if that happens
commonly enough.

I will try to think of some artificial testcases.

> 
> But certainly such a sanity check should always succeed after the
> original partitioning.
> 
> > I can think of inlining where the count gets scaled all way down to 0.  
> > Perhaps
> > count scaling code can be modified to never round towards 0 for block 
> > executing
> > non-0 times...
> 
> This reminds me of why this situation could happen. When I have been
> testing this on the google branch I found situations where COMDAT
> routines have 0 profile counts (if I remember correctly, this happens
> when profile-gen binary has call to out-of-line copy of COMDAT in
> module A, linker chooses the out-of-line copy from module B, therefore
> the profile data for COMDAT in module A is 0). When the COMDAT gets
> inlined, the 0 counts on its bbs are scaled to 0, even though the
> callsite is non-zero. I have a patch that I was planning to send as a
> follow-up that handles this case by propagating the callsite bb's
> count to the inlined code when it has 0 counts, scaling by the edge
> frequencies. I can either include that patch in this one, or send it
> for review separately right now. Do you want to give it a try with
> this one to see if it addresses the issue?

This scenario should not happen with LTO setup: the LTO symbol tables contains
code before early optimization and should be identical with profiling or
without (modulo the new references and call from profile code).

But this patch seems useful as a backup solution for non-LTO, so yes, please
send it separately and I can try to double check that it really do not happen
with LTO.
(acutally LTO symtab may just chose COMDAT from module that has counts with it.
It has all the info for it.  I was thinkin about it few weeks back.  It is
bit hard to do - you need to verify that all references from the function are
the same or linking might fail if you overwrite linker's decisiosns).
> 
> Also, can you send me reproduction instructions for gimp? I don't
> think I need Martin's patch, but which version of gimp and what is the
> equivalent way for me to train it? I have some scripts to generate a
> similar type of instruction heat map graph that I have been using to
> tune partitioning and function reordering. Essentially it uses linux
> perf to sample on instructions_retired and then munge the data in
> several ways to produce various stats and graphs. One thing that has
> been useful has been to combine the perf data with nm output to
> determine which cold functions are being executed at runtime.

Martin?

> 
> However, for this to tell me which split cold bbs are being executed I
> need to use a patch that Sri sent for review several months back that
> gives the split cold section its own name:
>   http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01571.html
> Steven had some follow up comments that Sri hasn't had a chance to address 
> yet:
>   http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00798.html
> (cc'ing Sri as we should probably revive this patch soon to address
> gdb and other issues with detecting split functions properly)

Intresting, I used linker script for this purposes, but that his GNU ld only...

Honza
> 
> Thanks!
> Teresa
> 
> >
> > Honza
> >>
> >> Thanks,
> >> Teresa
> >>
> >> > I think we are really looking primarily for dead parts of the functions 
> >> > (sanity checks/error handling)
> >> > that should not be visited by train run.  We can then see how to make 
> >> > the heuristic more aggressive?
> >> >
> >> > Honza
> >>
> >>
> >>
> >> --
> >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohn.

Re: [Patch, Fortran] gfc_get_code cleanup

2013-08-09 Thread Mikael Morin
Le 06/08/2013 17:12, Janus Weil a écrit :
> Hi all,
> 
> attached is a cleanup patch which concerns the gfc_code structure and
> gfc_get_code function (in st.c). It basically does two things:
> 
> 1) It replaces the many occurrences of "XCNEW (gfc_code)" in class.c
> by "gfc_get_code ()", which internally sets the locus and saves us
> from doing it manually afterward.
> 
> 2) It adds an argument "op" to gfc_get_code to directly set the .op
> component of gfc_code. Every time we set up a new gfc_code structure,
> we certainly want to set its op.
> 
> 2b) There are a few instances where we do not set the op after calling
> gfc_get_code, but instead copy the whole structure from someplace
> else. For those cases I'm using "XCNEW (gfc_code)" now (which also
> avoids setting the locus twice).
> 
> 2c) In one place I'm using "gfc_get_code (EXEC_NOP)" for technical
> reasons, see 'new_level' in parse.c.
> 
> Both items (1) and (2) result in more compact code and save a few
> extra lines (see diffstat below). Regtested on
> x86_64-unknown-linux-gnu. Ok for trunk with a suitable ChangeLog?
> 
Yes, thanks

Mikael


Re: [Patch, Fortran] PR 58058: [4.7/4.8/4.9 Regression] Memory leak with transfer function

2013-08-09 Thread Mikael Morin
Le 08/08/2013 16:54, Janus Weil a écrit :
> ping!
> 
> 2013/8/3 Janus Weil :
>> Hi all,
>>
>> the attached patch plugs a memory leak of the TRANSFER intrinsic,
>> which can occur when transferring to CHARACTER strings. For details
>> see the PR.
>>
>> Regtested on x86_64-unknown-linux-gnu. Ok for trunk/4.8/4.7?
>>
Looks good, yes.
Thanks.

Mikael


Re: [C++ Patch / RFC] PR 46206

2013-08-09 Thread Paolo Carlini

Hi,

On 08/09/2013 05:22 PM, David Edelsohn wrote:
Exactly. What is the common factor on AIX, Darwin and Solaris that is 
different from Linux? A difference in system types? How can we help? 

Thanks David, all, for your kind offers.

As I said the issue is weird, I think the only way in practice to make 
progress is very serious debugging on AIX and either Darwin or Solaris. 
Note that AIX and Darwin are already different: on the latter only the 
second new subtest fails, that for Foo2, on AIX both.


For the time being I decided to revert the patch, otherwise the issue 
only makes me nervous. If somebody has insights, basing on my original 
analysis here maybe:


http://gcc.gnu.org/ml/gcc-patches/2013-08/msg00239.html

I would be glad to work again on the issue at a later time, in say a 
week or two. At the moment my TODO list is already full.


Thanks again,
Paolo.



Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition

2013-08-09 Thread Martin Liška
Hi

On 9 August 2013 17:28, Jan Hubicka  wrote:
>> > Do we sanity check that the cold partition does not contain any blocks of
>> > count 0?  It may be that the profile is broken enough to make partitioning
>> > not work.
>>
>> Do you mean sanity check that the cold partition does not contain any
>> blocks of count > 0? (they should all be zero) I don't think that
>> sanity check is there, but I can try adding that.
>
> Thanks, lets start with this - I suppose we need to figure out if
>  1) the reachable blocks goes to cold section because partitioning decides
> so even if they have non-0 count.
>  2) the reachable blocks goes to cold section because they have incorrectly
> updated count to 0 by someone
>  3) profiling gets some blocks wrong.
>>
>> The issue with such a sanity check may be due to the later fixup I
>> have in this patch (fixup_partitions). It is invoked after certain
>> optimizations on the cfg that may make hot blocks previously reached
>> by both hot and cold edges only reachable by cold blocks. These blocks
>> are remarked cold. If the profile data hasn't been updated correctly
>> it is possible that they would still have a non-0 count, although they
>> are essentially cold after the cfg transformation.
>
> Well, or the other posibility is that the edges was updated wrong
> and the blocks are really cold.  We need to figure out if that happens
> commonly enough.
>
> I will try to think of some artificial testcases.
>
>>
>> But certainly such a sanity check should always succeed after the
>> original partitioning.
>>
>> > I can think of inlining where the count gets scaled all way down to 0.  
>> > Perhaps
>> > count scaling code can be modified to never round towards 0 for block 
>> > executing
>> > non-0 times...
>>
>> This reminds me of why this situation could happen. When I have been
>> testing this on the google branch I found situations where COMDAT
>> routines have 0 profile counts (if I remember correctly, this happens
>> when profile-gen binary has call to out-of-line copy of COMDAT in
>> module A, linker chooses the out-of-line copy from module B, therefore
>> the profile data for COMDAT in module A is 0). When the COMDAT gets
>> inlined, the 0 counts on its bbs are scaled to 0, even though the
>> callsite is non-zero. I have a patch that I was planning to send as a
>> follow-up that handles this case by propagating the callsite bb's
>> count to the inlined code when it has 0 counts, scaling by the edge
>> frequencies. I can either include that patch in this one, or send it
>> for review separately right now. Do you want to give it a try with
>> this one to see if it addresses the issue?
>
> This scenario should not happen with LTO setup: the LTO symbol tables contains
> code before early optimization and should be identical with profiling or
> without (modulo the new references and call from profile code).
>
> But this patch seems useful as a backup solution for non-LTO, so yes, please
> send it separately and I can try to double check that it really do not happen
> with LTO.
> (acutally LTO symtab may just chose COMDAT from module that has counts with 
> it.
> It has all the info for it.  I was thinkin about it few weeks back.  It is
> bit hard to do - you need to verify that all references from the function are
> the same or linking might fail if you overwrite linker's decisiosns).
>>
>> Also, can you send me reproduction instructions for gimp? I don't
>> think I need Martin's patch, but which version of gimp and what is the
>> equivalent way for me to train it? I have some scripts to generate a
>> similar type of instruction heat map graph that I have been using to
>> tune partitioning and function reordering. Essentially it uses linux
>> perf to sample on instructions_retired and then munge the data in
>> several ways to produce various stats and graphs. One thing that has
>> been useful has been to combine the perf data with nm output to
>> determine which cold functions are being executed at runtime.
>
> Martin?

I use gimp from git repository, commit:
88ecd59c3436d302b644a5d25c1938c0e7b60ae0 (from Fet 5 2013)
Link: http://www.gimp.org/source/#gimp_from_git

Martin

>>
>> However, for this to tell me which split cold bbs are being executed I
>> need to use a patch that Sri sent for review several months back that
>> gives the split cold section its own name:
>>   http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01571.html
>> Steven had some follow up comments that Sri hasn't had a chance to address 
>> yet:
>>   http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00798.html
>> (cc'ing Sri as we should probably revive this patch soon to address
>> gdb and other issues with detecting split functions properly)
>
> Intresting, I used linker script for this purposes, but that his GNU ld 
> only...
>
> Honza
>>
>> Thanks!
>> Teresa
>>
>> >
>> > Honza
>> >>
>> >> Thanks,
>> >> Teresa
>> >>
>> >> > I think we are really looking primarily for dead parts of the functions 
>> >> > (sa

Re: [SPARC] access to atomic compare-and-swap on LEON3

2013-08-09 Thread Deng Hengyi
Hi Eric,

Thank you very much for your work. I will test it ASAP

WeiY
Best Regards
在 2013-8-9,下午5:06,Eric Botcazou  写道:

> This enables access to the atomic compare-and-swap on LEON3 in conjunction 
> with 
> the binutils patch at:
>  http://sourceware.org/ml/binutils/2013-08/msg00038.html
> 
> Tested on SPARC/Solaris and sparc-elf, applied on the mainline.
> 
> 
> 2013-08-09  Eric Botcazou  
> 
>   * configure.ac: Add GAS check for LEON instructions on SPARC.
>   * configure: Regenerate.
>   * config.in: Likewise.
>   * config.gcc (with_cpu): Remove sparc-leon*-* and deal with LEON in the
>   sparc*-*-* block.
>   * config/sparc/sparc.opt (LEON, LEON3): New masks.
>   * config/sparc/sparc.h (ASM_CPU32_DEFAULT_SPEC): Set to AS_LEON_FLAG
>   for LEON or LEON3.
>   (ASM_CPU_SPEC): Pass AS_LEON_FLAG if -mcpu=leon or -mcpu=leon3.
>   (AS_LEON_FLAG): New macro.
>   * config/sparc/sparc.c (sparc_option_override): Set MASK_LEON for leon
>   and MASK_LEON3 for leon3 and unset them if HAVE_AS_LEON is not defined.
>   Deal with LEON and LEON3 for the memory model.
>   * config/sparc/sync.m (atomic_compare_and_swap): Enable for LEON3
>   (atomic_compare_and_swap_1): Likewise.
>   (*atomic_compare_and_swap_1): Likewise.
> 
> 
> -- 
> Eric Botcazou
> 



[C++ Draft] PR 54080 / PR 52875, others

2013-08-09 Thread Paolo Carlini

Hi,

for reference I decided to start a new thread with a preliminary 
implementation of this idea:


http://gcc.gnu.org/ml/gcc-patches/2013-08/msg00519.html

I know Gaby doesn't like it much, would like to see something more 
ambitious in this area.


IMHO, within its limits, the idea works as I expected it would, both the 
mentioned bugs are fixed and we produce a sensible diagnostic. No 
regressions, testsuite changes, nothing changes in terms of rejecting 
testcases which exceed in SFINAE context the recursive instantiation / 
substitution limit.


I'm thus attaching the draft below, let me known if you can imagine ways 
to incrementally improve it in the near future.


Thanks!
Paolo.

/
Index: cp/call.c
===
--- cp/call.c   (revision 201631)
+++ cp/call.c   (working copy)
@@ -2919,7 +2919,8 @@ add_template_candidate_real (struct z_candidate **
args_without_in_chrg,
nargs_without_in_chrg,
return_type, strict, flags, false,
-   complain & tf_decltype);
+   complain & tf_decltype,
+   complain & tf_diagnostic);
 
   if (fn == error_mark_node)
 {
@@ -3235,7 +3236,7 @@ print_z_candidate (location_t loc, const char *msg
   r->u.template_unification.return_type,
   r->u.template_unification.strict,
   r->u.template_unification.flags,
-  true, false);
+  true, false, false);
  break;
case rr_invalid_copy:
  inform (cloc,
Index: cp/class.c
===
--- cp/class.c  (revision 201631)
+++ cp/class.c  (working copy)
@@ -7363,7 +7363,8 @@ resolve_address_of_overloaded_function (tree targe
  instantiation = fn_type_unification (fn, explicit_targs, targs, args,
   nargs, ret,
  DEDUCE_EXACT, LOOKUP_NORMAL,
-  false, false);
+  false, false,
+  flags & tf_diagnostic);
  if (instantiation == error_mark_node)
/* Instantiation failed.  */
continue;
Index: cp/cp-tree.h
===
--- cp/cp-tree.h(revision 201631)
+++ cp/cp-tree.h(working copy)
@@ -4251,6 +4251,7 @@ enum tsubst_flags {
for calls in decltype (5.2.2/11).  */
   tf_partial = 1 << 8,  /* Doing initial explicit argument
substitution in fn_type_unification.  */
+  tf_diagnostic = 1 << 9,/* Diagnostic ctx, see push_tinst_level.  */
   /* Convenient substitution flags combinations.  */
   tf_warning_or_error = tf_warning | tf_error
 };
@@ -5479,7 +5480,7 @@ extern tree instantiate_template  (tree, tree, tsu
 extern tree fn_type_unification(tree, tree, tree,
 const tree *, unsigned int,
 tree, unification_kind_t, int,
-bool, bool);
+bool, bool, bool);
 extern void mark_decl_instantiated (tree, int);
 extern int more_specialized_fn (tree, tree, int);
 extern void do_decl_instantiation  (tree, tree);
@@ -5541,7 +5542,7 @@ extern tree fold_non_dependent_expr_sfinae(tree,
 extern bool alias_type_or_template_p(tree);
 extern bool alias_template_specialization_p (const_tree);
 extern bool explicit_class_specialization_p (tree);
-extern int push_tinst_level (tree);
+extern int push_tinst_level (tree, bool);
 extern void pop_tinst_level (void);
 extern struct tinst_level *outermost_tinst_level(void);
 extern void init_template_processing   (void);
Index: cp/error.c
===
--- cp/error.c  (revision 201631)
+++ cp/error.c  (working copy)
@@ -321,7 +321,7 @@ dump_template_bindings (tree parms, tree args, vec
   pp_equal (cxx_pp);
   pp_cxx_whitespace (cxx_pp);
   push_deferring_access_checks (dk_no_check);
-  t = tsubst (t, args, tf_none, NULL_TREE);
+  t = tsubst (t, args, tf_diagnostic, NULL_TREE);
   pop_deferring_access_checks ();
   /* Strip typedefs.  We can't just use TFF_CHASE_TYPEDEF because
 pp_simple_type_specifier doesn't know about it.  */
Index: cp/mangle.c
===
--- cp/mang

RE: [PATCH] Cilk Keywords (_Cilk_spawn and _Cilk_sync) for C

2013-08-09 Thread Joseph S. Myers
On Thu, 8 Aug 2013, Iyer, Balaji V wrote:

> +enum add_variable_type  {

Two spaces before '{', should be one.

> +static HOST_WIDE_INT cilk_wrapper_count;

This is HOST_WIDE_INT but you use it later with sprintf with %ld; you need 
to use HOST_WIDE_INT_PRINT_DEC in such a case

> +  tree map = (tree)*map0;

There are several places like this where you are missing a space in a 
cast, which should be "(tree) *map0".

> +  /* Build the Cilk Stack Frame:
> + struct __cilkrts_stack_frame {
> +   uint32_t flags;
> +   uint32_t size;
> +   struct __cilkrts_stack_frame *call_parent;
> +   __cilkrts_worker *worker;
> +   void *except_data;
> +   void *ctx[4];
> +   uint32_t mxcsr;
> +   uint16_t fpcsr;
> +   uint16_t reserved;
> +   __cilkrts_pedigree pedigree;
> + };  */
> +
> +  tree frame = lang_hooks.types.make_type (RECORD_TYPE);
> +  tree frame_ptr = build_pointer_type (frame);
> +  tree worker_type = lang_hooks.types.make_type (RECORD_TYPE);
> +  tree worker_ptr = build_pointer_type (worker_type);
> +  tree s_type_node = build_int_cst (size_type_node, 4);
> +
> +  tree flags = add_field ("flags", unsigned_type_node, NULL_TREE);
> +  tree size = add_field ("size", unsigned_type_node, flags);

You refer to some fields as uint32_t above but then build them as unsigned 
int; you should be consistent.

I'm also suspicious of the "mxcsr" and "fpcsr" fields and associated enum 
values.  They don't really appear to be *used* for anything semantic in 
this patch - there's just boilerplate code dealing with creating them.  So 
I don't know what the point of them is at all - is there an associated 
runtime using them to be added by another patch in this series?  The 
problem is that they sound architecture-specific - they sound like they 
relate to registers on one particular architecture - meaning that they 
should actually be created by target hooks which might create different 
sets or sizes of such fields on different architectures (and they 
shouldn't appear at all in an enum in architecture-independent code, in 
that case).

> +  tree mxcsr = add_field ("mxcsr", uint32_type_node, context);
> +  tree fpcsr = add_field ("fpcsr", uint16_type_node, mxcsr);
> +  tree reserved = add_field ("reserved", uint16_type_node, fpcsr);

Note that uint32_type_node and uint16_type_node are internal types that 
may or may not correspond to the stdint.h C typedefs (one could be 
unsigned int and the other unsigned long, for example).  If this matters 
then you may need to work out how to use c_uint32_type_node etc. instead 
(but this code is outside the front end, so that may not be easy to do).  
(Cf. what I said in 
 about the 
remaining, presumably unmaintained, targets without stdint.h type 
information in GCC.)

> +   int32_t self;

> +  field = add_field ("self", unsigned_type_node, field);

Again, inconsistency of type.

> +Used to represent a spawning function in the Cilk Plus language extension.  
> +This tree has one field that holds the name of the spawning function.
> +_Cilk_spawn can be written in C in the following way:

@code{_Cilk_spawn} (and likewise _Cilk_sync), in several places.

> +Detailed description for usage and functionality of _Cilk_spawn can be found 
> at
> +http://www.cilkplus.org

Use an actual link.

> diff --git a/gcc/tree.h b/gcc/tree.h
> index 0058a4b..952362f 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -262,6 +262,7 @@ enum built_in_class
>NOT_BUILT_IN = 0,
>BUILT_IN_FRONTEND,
>BUILT_IN_MD,
> +  BUILT_IN_CILK,
>BUILT_IN_NORMAL
>  };
>  
> @@ -439,6 +440,8 @@ struct GTY(()) tree_base {
>unsigned protected_flag : 1;
>unsigned deprecated_flag : 1;
>unsigned default_def_flag : 1;
> +  unsigned is_cilk_spawn : 1;
> +  unsigned is_cilk_spawn_detach_point : 1;

No, absolutely not.  This would expand all trees by a whole word.  Find a 
way to do whatever you need without increasing memory consumption like 
that.

> @@ -3496,7 +3508,7 @@ struct GTY(()) tree_function_decl {
>   ???  The bitfield needs to be able to hold all target function
> codes as well.  */
>ENUM_BITFIELD(built_in_function) function_code : 11;
> -  ENUM_BITFIELD(built_in_class) built_in_class : 2;
> +  ENUM_BITFIELD(built_in_class) built_in_class : 3;
>  
>unsigned static_ctor_flag : 1;
>unsigned static_dtor_flag : 1;

Again, no.  See the comment "No bits left." at the bottom of this 
structure.  Expanding widely used datastructures is a bad idea, although 
this one isn't as bad to expand as tree_base.

-- 
Joseph S. Myers
jos...@codesourcery.com


Implement cross-module indirect call value profiling

2013-08-09 Thread Jan Hubicka
Hi,
this patch makes indirect call profiling to work cross module.  Unlike LIPO I
am not adding module IDs, since I do not know how to make them stable across
multiple uses of same .o files.  Instead I simply assign unique ID to each
possibly indirectly called function in program.  This is done by combining its
assembler name, file&line and gcov filename into single hash.  For GCC this
gives no colisions.

The rest of updates is quite obvious.  Currently we have moudle local
__gcov_indirect_call_callee and __gcov_indirect_call_counters to track the
calls.  I made the global and define them in libgcov.
__gcov_indirect_call_profiler used to take these two as parameters and I
replaced it by __gcov_indirect_call_profiler_v2 that has those two
hard coded to simplify the call sequence.

This patch has only purpose to measure the cross-module calls and get sane
histograms attached to indirect calls.  In the third patch of series I will
actually make them used by the LTO ipa-profile pass.

Bootstrapped/regtested x86_64-linux, will commit it shortly.

Honza

Index: libgcc/libgcov.c
===
--- libgcc/libgcov.c(revision 201539)
+++ libgcc/libgcov.c(working copy)
@@ -1121,6 +1121,20 @@ __gcov_one_value_profiler (gcov_type *co
 
 #ifdef L_gcov_indirect_call_profiler
 
+/* These two variables are used to actually track caller and callee.  Keep
+   them in TLS memory so races are not common (they are written to often).
+   The variables are set directly by GCC instrumented code, so declaration
+   here must match one in tree-profile.c  */
+
+#ifdef HAVE_CC_TLS
+__thread 
+#endif
+void * __gcov_indirect_call_callee;
+#ifdef HAVE_CC_TLS
+__thread 
+#endif
+gcov_type * __gcov_indirect_call_counters;
+
 /* By default, the C++ compiler will use function addresses in the
vtable entries.  Setting TARGET_VTABLE_USES_DESCRIPTORS to nonzero
tells the compiler to use function descriptors instead.  The value
@@ -1140,19 +1154,43 @@ __gcov_one_value_profiler (gcov_type *co
 
 /* Tries to determine the most common value among its inputs. */
 void
-__gcov_indirect_call_profiler (gcov_type* counter, gcov_type value,
-  void* cur_func, void* callee_func)
+__gcov_indirect_call_profiler_v2 (gcov_type value, void* cur_func)
 {
   /* If the C++ virtual tables contain function descriptors then one
  function may have multiple descriptors and we need to dereference
  the descriptors to see if they point to the same function.  */
-  if (cur_func == callee_func
-  || (VTABLE_USES_DESCRIPTORS && callee_func
- && *(void **) cur_func == *(void **) callee_func))
-__gcov_one_value_profiler_body (counter, value);
+  if (cur_func == __gcov_indirect_call_callee
+  || (VTABLE_USES_DESCRIPTORS && __gcov_indirect_call_callee
+ && *(void **) cur_func == *(void **) __gcov_indirect_call_callee))
+__gcov_one_value_profiler_body (__gcov_indirect_call_counters, value);
 }
 #endif
 
 
 #ifdef L_gcov_average_profiler
 /* Increase corresponding COUNTER by VALUE.  FIXME: Perhaps we want
Index: gcc/value-prof.c
===
--- gcc/value-prof.c(revision 201632)
+++ gcc/value-prof.c(working copy)
@@ -1173,24 +1173,56 @@ gimple_mod_subtract_transform (gimple_st
   return true;
 }
 
-static vec cgraph_node_map
-= vNULL;
+static pointer_map_t *cgraph_node_map;
 
 /* Initialize map from FUNCDEF_NO to CGRAPH_NODE.  */
 
 void
-init_node_map (void)
+init_node_map (bool local)
 {
   struct cgraph_node *n;
+  cgraph_node_map = pointer_map_create ();
 
-  if (get_last_funcdef_no ())
-cgraph_node_map.safe_grow_cleared (get_last_funcdef_no ());
-
-  FOR_EACH_FUNCTION (n)
-{
-  if (DECL_STRUCT_FUNCTION (n->symbol.decl))
-cgraph_node_map[DECL_STRUCT_FUNCTION (n->symbol.decl)->funcdef_no] = n;
-}
+  FOR_EACH_DEFINED_FUNCTION (n)
+if (cgraph_function_with_gimple_body_p (n)
+   && !cgraph_only_called_directly_p (n))
+  {
+   void **val;
+   if (local)
+ {
+   n->profile_id = coverage_compute_profile_id (n);
+   while ((val = pointer_map_contains (cgraph_node_map, (void 
*)(size_t)n->profile_id)) || !n->profile_id)
+ {
+   if (dump_file)
+ fprintf (dump_file, "Local profile-id %i conflict with nodes 
%s/%i %s/%i\n",
+  n->profile_id,
+  cgraph_node_name (n),
+  n->symbol.order,
+  symtab_node_name (*(symtab_node*)val),
+  (*(symtab_node *)val)->symbol.order);
+   n->profile_id = (n->profile_id + 1) & 0x7fff;
+ }
+ }
+   else if (!n->profile_id)
+ {
+   if (dump_file)
+ fprintf (dump_file, "Node %s/%i has no profile-id (profile 
feedback missing?)\n",
+  cgraph_node_n

[PING][PATCH] doc: mention that -ftls-model is subject to optimization

2013-08-09 Thread Alexander Monakov
Ping.  I've tried to phrase the additional doc text as concisely as possible.

Would a runtime warning be more appropriate?

Thanks.

On Wed, 24 Jul 2013, Alexander Monakov wrote:

> Hello,
> 
> As discussed here, the current behavior of -ftls-model is intended:
> http://gcc.gnu.org/ml/gcc/2013-07/msg00248.html
> (in short: unlike the attribute, -ftls-model sets a "most general" model to be
> used, but the compiler can use a less general model than specified, making it
> impossible to set global-dynamic model on the command line for non-PIC code).
> 
> I'd like to add a clarification to the docs. OK for trunk?
> 
> 2013-07-24  Alexander Monakov  
> 
>   * doc/invoke.texi: Mention that -ftls-model does not force the final
>   model.
> 
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index d99d217..4d1fbee 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -20802,6 +20802,9 @@ Not all targets provide complete support for this 
> switch.
>  Alter the thread-local storage model to be used (@pxref{Thread-Local}).
>  The @var{model} argument should be one of @code{global-dynamic},
>  @code{local-dynamic}, @code{initial-exec} or @code{local-exec}.
> +Note that the choice is subject to optimization: the compiler may use
> +a more efficient model for symbols not visible outside of the translation
> +unit, or if @option{-fpic} is not given on the command line.
>  
>  The default without @option{-fpic} is @code{initial-exec}; with
>  @option{-fpic} the default is @code{global-dynamic}.
> 
> 


Re: [Patch, Fortran] PR 58058: [4.7/4.8/4.9 Regression] Memory leak with transfer function

2013-08-09 Thread Janus Weil
Hi Mikael,

>>> the attached patch plugs a memory leak of the TRANSFER intrinsic,
>>> which can occur when transferring to CHARACTER strings. For details
>>> see the PR.
>>>
>>> Regtested on x86_64-unknown-linux-gnu. Ok for trunk/4.8/4.7?
>>>
> Looks good, yes.
> Thanks.

thanks a lot for the review! I have committed to trunk as r201633.
Will do 4.8 and 4.7 soon (assuming your OK also applies to those ...)

Cheers,
Janus


Re: [RFC Patch, Aarch64] : Macros for profile code generation to enable gprof support

2013-08-09 Thread Marcus Shawcroft

On 03/08/13 19:01, Venkataramanan Kumar wrote:



2013-08-02  Venkataramanan Kumar  

  * config/aarch64/aarch64.h (MCOUNT_NAME): Define.
(NO_PROFILE_COUNTERS): Likewise.
(PROFILE_HOOK): Likewise.
(FUNCTION_PROFILER): Likewise.
 *  config/aarch64/aarch64.c (aarch64_function_profiler): Remove.
.

regards,
Venkat.



Hi Venkat,

Looking at the various other ports it looks that the majority choose to 
use FUNCTION_PROFILER_HOOK rather than PROFILE_HOOK.


Using PROFILE_HOOK to inject a regular call to to _mcount() means that 
all arguments passed in registers in every function will be spilled and 
reloaded because the _mcount call will kill the caller save registers.


Using the FUNCTION_PROFILER_HOOK and taking care not to kill the caller 
save registers would be less invasive.  The LR argument to _mcount would 
need to be passed in a temporary register, say x9 and _mcount would also 
need to ensure caller save registers are saved and restored.


The latter seems to be a better option to me, is there compelling reason 
to choose PROFILE_HOOK over FUNCTION_PROFILER_HOOK ??


Cheers
/Marcus



Re: [PATCH] doc: mention that -ftls-model is subject to optimization

2013-08-09 Thread Ian Lance Taylor
On Wed, Jul 24, 2013 at 9:53 AM, Alexander Monakov  wrote:
>
> 2013-07-24  Alexander Monakov  
>
> * doc/invoke.texi: Mention that -ftls-model does not force the final
> model.

This is OK.

Thanks.

Ian


Re: [Patch, Fortran] PR 58099: [4.8/4.9 Regression] [F03] over-zealous procedure-pointer error checking

2013-08-09 Thread Mikael Morin
Le 07/08/2013 16:02, Janus Weil a écrit :
> Hi all,
> 
> here is a small regression-fix patch for a problem with procedure
> pointers and the PURE attribute, for details see the PR. In essence:
> gfc_compare_interfaces is asymmetric in the two interfaces it compares
> (e.g. regarding the PURE attribute), and we must not symmetrize it by
> calling it twice with exchanged arguments.
> 
> Regtested on x86_64-unknown-linux-gnu. Ok for trunk?
> 
This looks good to me, but I let Tobias have the final word as he
expressed some concerns in the PR audit trail.

Mikael


Re: [Patch, Fortran] PR 58099: [4.8/4.9 Regression] [F03] over-zealous procedure-pointer error checking

2013-08-09 Thread Mikael Morin
Le 07/08/2013 16:02, Janus Weil a écrit :
> Hi all,
> 
> here is a small regression-fix patch for a problem with procedure
> pointers and the PURE attribute, for details see the PR. In essence:
> gfc_compare_interfaces is asymmetric in the two interfaces it compares
> (e.g. regarding the PURE attribute), and we must not symmetrize it by
> calling it twice with exchanged arguments.
> 
> Regtested on x86_64-unknown-linux-gnu. Ok for trunk?
> 
This looks good to me, but I let Tobias have the final word as he
expressed some concerns in the PR audit trail.

Mikael


Re: [Patch, Fortran] PR 58058: [4.7/4.8/4.9 Regression] Memory leak with transfer function

2013-08-09 Thread Mikael Morin
Le 09/08/2013 19:06, Janus Weil a écrit :
> thanks a lot for the review! I have committed to trunk as r201633.
> Will do 4.8 and 4.7 soon (assuming your OK also applies to those ...)
> 
Yes, of course.

Mikael




Re: New parameters to control stringop expansion libcall strategy

2013-08-09 Thread Xinliang David Li
Is this version ok for trunk?

thanks,

David

On Thu, Aug 8, 2013 at 9:31 AM, Xinliang David Li  wrote:
> Updated.
>
> thanks,
>
> David
>
> On Thu, Aug 8, 2013 at 8:18 AM, Joseph S. Myers  
> wrote:
>> On Wed, 7 Aug 2013, Xinliang David Li wrote:
>>
>>> Updated patch attached (fixed header, buffer overflow, and warning -->
>>> error problems).
>>
>> You still have diagnostics starting with a capital letter, contrary to the
>> GNU Coding Standards.
>>
>> --
>> Joseph S. Myers
>> jos...@codesourcery.com


Re: New parameters to control stringop expansion libcall strategy

2013-08-09 Thread Jan Hubicka
> Is this version ok for trunk?

It looks resonable, but I still do not like much the removal of const for 
tables.
Doing so will push them all into David Malcom's per-thread global universe.

Currently the algorithm is selected based on cost->memset/cost->memcpy.
Instead of removing the const of all the CPU tables, I would preffer
introducing two readwrite global variables memset_algs/memcpy_algs and feed
them with proper table at a time we set up ix86_tune_features.

This has chance to do the right thing with optimize attribute specifying 
algorithms
and with the longer term threading plan.

Honza
> 
> thanks,
> 
> David
> 
> On Thu, Aug 8, 2013 at 9:31 AM, Xinliang David Li  wrote:
> > Updated.
> >
> > thanks,
> >
> > David
> >
> > On Thu, Aug 8, 2013 at 8:18 AM, Joseph S. Myers  
> > wrote:
> >> On Wed, 7 Aug 2013, Xinliang David Li wrote:
> >>
> >>> Updated patch attached (fixed header, buffer overflow, and warning -->
> >>> error problems).
> >>
> >> You still have diagnostics starting with a capital letter, contrary to the
> >> GNU Coding Standards.
> >>
> >> --
> >> Joseph S. Myers
> >> jos...@codesourcery.com


Re: [PATCH] Fix PR57980

2013-08-09 Thread Richard Biener
Marek Polacek  wrote:
>In this PR the problem was that when dealing with the gimple assign in
>the tailcall optimization, we, when the rhs operand is of a vector
>type, need to create -1 also of a vector type, but build_int_cst
>doesn't create vectors (ICEs).  Instead, we should use
>build_minus_one_cst
>because that can create even the VECTOR_TYPE constant (and, it can
>create even REAL_TYPE/COMPLEX_TYPE), as suggested by Marc.
>
>Regtested/bootstrapped on x86_64-linux, ok for trunk and 4.8?

Ok. Double-check that this function exists on the branch please.

Thanks,
Richard.

>2013-08-09  Marek Polacek  
>   Marc Glisse  
>
>   PR tree-optimization/57980
>   * tree-tailcall.c (process_assignment): Call build_minus_one_cst
>   when creating -1 constant.
>
>   * gcc.dg/pr57980.c: New test.
>
>--- gcc/tree-tailcall.c.mp 2013-08-09 16:07:55.778616996 +0200
>+++ gcc/tree-tailcall.c2013-08-09 16:08:00.068635823 +0200
>@@ -326,11 +326,7 @@ process_assignment (gimple stmt, gimple_
>   return true;
> 
> case NEGATE_EXPR:
>-  if (FLOAT_TYPE_P (TREE_TYPE (op0)))
>-*m = build_real (TREE_TYPE (op0), dconstm1);
>-  else
>-*m = build_int_cst (TREE_TYPE (op0), -1);
>-
>+  *m = build_minus_one_cst (TREE_TYPE (op0));
>   *ass_var = dest;
>   return true;
> 
>@@ -339,11 +335,7 @@ process_assignment (gimple stmt, gimple_
>  *a = fold_build1 (NEGATE_EXPR, TREE_TYPE (non_ass_var), non_ass_var);
>   else
> {
>-  if (FLOAT_TYPE_P (TREE_TYPE (non_ass_var)))
>-*m = build_real (TREE_TYPE (non_ass_var), dconstm1);
>-  else
>-*m = build_int_cst (TREE_TYPE (non_ass_var), -1);
>-
>+*m = build_minus_one_cst (TREE_TYPE (non_ass_var));
>  *a = fold_build1 (NEGATE_EXPR, TREE_TYPE (non_ass_var), non_ass_var);
> }
> 
>--- gcc/testsuite/gcc.dg/pr57980.c.mp  2013-08-09 16:07:25.967485356
>+0200
>+++ gcc/testsuite/gcc.dg/pr57980.c 2013-08-09 16:07:17.967450526 +0200
>@@ -0,0 +1,19 @@
>+/* PR tree-optimization/57980 */
>+/* { dg-do compile } */
>+/* { dg-options "-O -foptimize-sibling-calls" } */
>+
>+typedef int V __attribute__ ((vector_size (sizeof (int;
>+extern V f (void);
>+
>+V
>+bar (void)
>+{
>+  return -f ();
>+}
>+
>+V
>+foo (void)
>+{
>+  V v = { };
>+  return v - f ();
>+}
>
>   Marek




Re: [Patch, Fortran] gfc_get_code cleanup

2013-08-09 Thread Janus Weil
2013/8/9 Mikael Morin :
> Le 06/08/2013 17:12, Janus Weil a écrit :
>> Hi all,
>>
>> attached is a cleanup patch which concerns the gfc_code structure and
>> gfc_get_code function (in st.c). It basically does two things:
>>
>> 1) It replaces the many occurrences of "XCNEW (gfc_code)" in class.c
>> by "gfc_get_code ()", which internally sets the locus and saves us
>> from doing it manually afterward.
>>
>> 2) It adds an argument "op" to gfc_get_code to directly set the .op
>> component of gfc_code. Every time we set up a new gfc_code structure,
>> we certainly want to set its op.
>>
>> 2b) There are a few instances where we do not set the op after calling
>> gfc_get_code, but instead copy the whole structure from someplace
>> else. For those cases I'm using "XCNEW (gfc_code)" now (which also
>> avoids setting the locus twice).
>>
>> 2c) In one place I'm using "gfc_get_code (EXEC_NOP)" for technical
>> reasons, see 'new_level' in parse.c.
>>
>> Both items (1) and (2) result in more compact code and save a few
>> extra lines (see diffstat below). Regtested on
>> x86_64-unknown-linux-gnu. Ok for trunk with a suitable ChangeLog?
>>
> Yes, thanks

Thanks, committed as r201635:

http://gcc.gnu.org/viewcvs/gcc?limit_changes=0&view=revision&revision=201635


Cheers,
Janus


Re: [C++ Draft] PR 54080 / PR 52875, others

2013-08-09 Thread Paolo Carlini

. patch draft fixes c++/53401 too.

Paolo.


Re: New parameters to control stringop expansion libcall strategy

2013-08-09 Thread Xinliang David Li
On Fri, Aug 9, 2013 at 11:33 AM, Jan Hubicka  wrote:
>> Is this version ok for trunk?
>
> It looks resonable, but I still do not like much the removal of const for 
> tables.
> Doing so will push them all into David Malcom's per-thread global universe.
>
> Currently the algorithm is selected based on cost->memset/cost->memcpy.
> Instead of removing the const of all the CPU tables, I would preffer
> introducing two readwrite global variables memset_algs/memcpy_algs and feed
> them with proper table at a time we set up ix86_tune_features.
>

I can do that in this patch. In the future, when we need to do tunings
for those constants, we can revisit it.

thanks,

David

> This has chance to do the right thing with optimize attribute specifying 
> algorithms
> and with the longer term threading plan.
>
> Honza
>>
>> thanks,
>>
>> David
>>
>> On Thu, Aug 8, 2013 at 9:31 AM, Xinliang David Li  wrote:
>> > Updated.
>> >
>> > thanks,
>> >
>> > David
>> >
>> > On Thu, Aug 8, 2013 at 8:18 AM, Joseph S. Myers  
>> > wrote:
>> >> On Wed, 7 Aug 2013, Xinliang David Li wrote:
>> >>
>> >>> Updated patch attached (fixed header, buffer overflow, and warning -->
>> >>> error problems).
>> >>
>> >> You still have diagnostics starting with a capital letter, contrary to the
>> >> GNU Coding Standards.
>> >>
>> >> --
>> >> Joseph S. Myers
>> >> jos...@codesourcery.com


Re: New parameters to control stringop expansion libcall strategy

2013-08-09 Thread Jan Hubicka
> On Fri, Aug 9, 2013 at 11:33 AM, Jan Hubicka  wrote:
> >> Is this version ok for trunk?
> >
> > It looks resonable, but I still do not like much the removal of const for 
> > tables.
> > Doing so will push them all into David Malcom's per-thread global universe.
> >
> > Currently the algorithm is selected based on cost->memset/cost->memcpy.
> > Instead of removing the const of all the CPU tables, I would preffer
> > introducing two readwrite global variables memset_algs/memcpy_algs and feed
> > them with proper table at a time we set up ix86_tune_features.
> >
> 
> I can do that in this patch. In the future, when we need to do tunings
> for those constants, we can revisit it.

Yep, I think we can follow same strategy and just move them to a global 
constant.
Those are part of the context/universum since they will be user rewritable then.

Thanks, the patch is OK with this change.
Honza
> 
> thanks,
> 
> David
> 
> > This has chance to do the right thing with optimize attribute specifying 
> > algorithms
> > and with the longer term threading plan.
> >
> > Honza
> >>
> >> thanks,
> >>
> >> David
> >>
> >> On Thu, Aug 8, 2013 at 9:31 AM, Xinliang David Li  
> >> wrote:
> >> > Updated.
> >> >
> >> > thanks,
> >> >
> >> > David
> >> >
> >> > On Thu, Aug 8, 2013 at 8:18 AM, Joseph S. Myers 
> >> >  wrote:
> >> >> On Wed, 7 Aug 2013, Xinliang David Li wrote:
> >> >>
> >> >>> Updated patch attached (fixed header, buffer overflow, and warning -->
> >> >>> error problems).
> >> >>
> >> >> You still have diagnostics starting with a capital letter, contrary to 
> >> >> the
> >> >> GNU Coding Standards.
> >> >>
> >> >> --
> >> >> Joseph S. Myers
> >> >> jos...@codesourcery.com


Re: Speculative call support in the callgraph

2013-08-09 Thread Jan Hubicka
> I have not looked at the details. One high level question: this form
> seems to only support one indirect target case. LIPO uses TOPN
> indirect target profiling (tracking multiple targets), which can be
> used by LTO as well (when the topn profiling gets into trunk).

Well, adding multiple direct edges for given call will need extension into
cgraph_turn_edge_to_speculative and cgraph_speculative_call_info APIs (to allow
multple direct edges) to indirect_info common_target datastructure, to
profiling histograms and to the gimple_ic code.  Otherwise there is nothing
really hard coded about single direct target.

How much benefits do you see from having multiple direct targets? I would
expect them to be quite quickly disappearing as N increases...
How the TOPN profiling counter is implemented?

Honza


Re: [go-nuts] Solaris gccgo http.Get error?

2013-08-09 Thread Ian Lance Taylor
On Thu, Aug 8, 2013 at 11:22 PM, Jakob Borg  wrote:
>
> But, adding a
>
>  hints.ai_socktype = SOCK_STREAM;
>
> gives me
>
> jb@zlogin2:~ $ ./test
> canonical name: www.google.com
> 26 2 6
> 26 2 6
> 26 2 6
> 26 2 6
> 26 2 6
> 26 2 6
>
> It seems we might need a tweak to support Solaris... :/

Looks like it.  I committed a patch to the master repository.  This
patch copies it over to gccgo.  Bootstrapped and ran Go testsuite on
x86_64-unknown-linux-gnu.  Committed to mainline and 4.8 branch.

Note that I have not made the change on the 4.7 branch which is what
you are using.  The same patch should work for the 4.7 sources,
though, if you want to copy it over.

Ian


foo.patch
Description: Binary data


Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition

2013-08-09 Thread Teresa Johnson
On Fri, Aug 9, 2013 at 8:54 AM, Martin Liška  wrote:
> Hi
>
> On 9 August 2013 17:28, Jan Hubicka  wrote:
>>> > Do we sanity check that the cold partition does not contain any blocks of
>>> > count 0?  It may be that the profile is broken enough to make partitioning
>>> > not work.
>>>
>>> Do you mean sanity check that the cold partition does not contain any
>>> blocks of count > 0? (they should all be zero) I don't think that
>>> sanity check is there, but I can try adding that.
>>
>> Thanks, lets start with this - I suppose we need to figure out if
>>  1) the reachable blocks goes to cold section because partitioning decides
>> so even if they have non-0 count.
>>  2) the reachable blocks goes to cold section because they have incorrectly
>> updated count to 0 by someone
>>  3) profiling gets some blocks wrong.
>>>
>>> The issue with such a sanity check may be due to the later fixup I
>>> have in this patch (fixup_partitions). It is invoked after certain
>>> optimizations on the cfg that may make hot blocks previously reached
>>> by both hot and cold edges only reachable by cold blocks. These blocks
>>> are remarked cold. If the profile data hasn't been updated correctly
>>> it is possible that they would still have a non-0 count, although they
>>> are essentially cold after the cfg transformation.
>>
>> Well, or the other posibility is that the edges was updated wrong
>> and the blocks are really cold.  We need to figure out if that happens
>> commonly enough.
>>
>> I will try to think of some artificial testcases.
>>
>>>
>>> But certainly such a sanity check should always succeed after the
>>> original partitioning.
>>>
>>> > I can think of inlining where the count gets scaled all way down to 0.  
>>> > Perhaps
>>> > count scaling code can be modified to never round towards 0 for block 
>>> > executing
>>> > non-0 times...
>>>
>>> This reminds me of why this situation could happen. When I have been
>>> testing this on the google branch I found situations where COMDAT
>>> routines have 0 profile counts (if I remember correctly, this happens
>>> when profile-gen binary has call to out-of-line copy of COMDAT in
>>> module A, linker chooses the out-of-line copy from module B, therefore
>>> the profile data for COMDAT in module A is 0). When the COMDAT gets
>>> inlined, the 0 counts on its bbs are scaled to 0, even though the
>>> callsite is non-zero. I have a patch that I was planning to send as a
>>> follow-up that handles this case by propagating the callsite bb's
>>> count to the inlined code when it has 0 counts, scaling by the edge
>>> frequencies. I can either include that patch in this one, or send it
>>> for review separately right now. Do you want to give it a try with
>>> this one to see if it addresses the issue?
>>
>> This scenario should not happen with LTO setup: the LTO symbol tables 
>> contains
>> code before early optimization and should be identical with profiling or
>> without (modulo the new references and call from profile code).
>>
>> But this patch seems useful as a backup solution for non-LTO, so yes, please
>> send it separately and I can try to double check that it really do not happen
>> with LTO.
>> (acutally LTO symtab may just chose COMDAT from module that has counts with 
>> it.
>> It has all the info for it.  I was thinkin about it few weeks back.  It is
>> bit hard to do - you need to verify that all references from the function are
>> the same or linking might fail if you overwrite linker's decisiosns).
>>>
>>> Also, can you send me reproduction instructions for gimp? I don't
>>> think I need Martin's patch, but which version of gimp and what is the
>>> equivalent way for me to train it? I have some scripts to generate a
>>> similar type of instruction heat map graph that I have been using to
>>> tune partitioning and function reordering. Essentially it uses linux
>>> perf to sample on instructions_retired and then munge the data in
>>> several ways to produce various stats and graphs. One thing that has
>>> been useful has been to combine the perf data with nm output to
>>> determine which cold functions are being executed at runtime.
>>
>> Martin?
>
> I use gimp from git repository, commit:
> 88ecd59c3436d302b644a5d25c1938c0e7b60ae0 (from Fet 5 2013)
> Link: http://www.gimp.org/source/#gimp_from_git

Thanks. Were you building with LTO? And just -O2, or any other options
I should use?

Teresa

>
> Martin
>
>>>
>>> However, for this to tell me which split cold bbs are being executed I
>>> need to use a patch that Sri sent for review several months back that
>>> gives the split cold section its own name:
>>>   http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01571.html
>>> Steven had some follow up comments that Sri hasn't had a chance to address 
>>> yet:
>>>   http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00798.html
>>> (cc'ing Sri as we should probably revive this patch soon to address
>>> gdb and other issues with detecting split functions properly)
>>
>> Intresting

Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition

2013-08-09 Thread Teresa Johnson
On Fri, Aug 9, 2013 at 8:28 AM, Jan Hubicka  wrote:
>> > Do we sanity check that the cold partition does not contain any blocks of
>> > count 0?  It may be that the profile is broken enough to make partitioning
>> > not work.
>>
>> Do you mean sanity check that the cold partition does not contain any
>> blocks of count > 0? (they should all be zero) I don't think that
>> sanity check is there, but I can try adding that.
>
> Thanks, lets start with this - I suppose we need to figure out if
>  1) the reachable blocks goes to cold section because partitioning decides
> so even if they have non-0 count.

Right, this should be easy enough to check and should hopefully never happen.

>  2) the reachable blocks goes to cold section because they have incorrectly
> updated count to 0 by someone

A sanity check should find this too. But it can happen now for various
reasons like the comdat issue I described below. But it will be good
to flag these and fix them.

>  3) profiling gets some blocks wrong.

This is the one that will be tough to fix, if the training run isn't
representative.

>>
>> The issue with such a sanity check may be due to the later fixup I
>> have in this patch (fixup_partitions). It is invoked after certain
>> optimizations on the cfg that may make hot blocks previously reached
>> by both hot and cold edges only reachable by cold blocks. These blocks
>> are remarked cold. If the profile data hasn't been updated correctly
>> it is possible that they would still have a non-0 count, although they
>> are essentially cold after the cfg transformation.
>
> Well, or the other posibility is that the edges was updated wrong
> and the blocks are really cold.  We need to figure out if that happens
> commonly enough.
>
> I will try to think of some artificial testcases.
>
>>
>> But certainly such a sanity check should always succeed after the
>> original partitioning.
>>
>> > I can think of inlining where the count gets scaled all way down to 0.  
>> > Perhaps
>> > count scaling code can be modified to never round towards 0 for block 
>> > executing
>> > non-0 times...
>>
>> This reminds me of why this situation could happen. When I have been
>> testing this on the google branch I found situations where COMDAT
>> routines have 0 profile counts (if I remember correctly, this happens
>> when profile-gen binary has call to out-of-line copy of COMDAT in
>> module A, linker chooses the out-of-line copy from module B, therefore
>> the profile data for COMDAT in module A is 0). When the COMDAT gets
>> inlined, the 0 counts on its bbs are scaled to 0, even though the
>> callsite is non-zero. I have a patch that I was planning to send as a
>> follow-up that handles this case by propagating the callsite bb's
>> count to the inlined code when it has 0 counts, scaling by the edge
>> frequencies. I can either include that patch in this one, or send it
>> for review separately right now. Do you want to give it a try with
>> this one to see if it addresses the issue?
>
> This scenario should not happen with LTO setup: the LTO symbol tables contains
> code before early optimization and should be identical with profiling or
> without (modulo the new references and call from profile code).
>
> But this patch seems useful as a backup solution for non-LTO, so yes, please
> send it separately and I can try to double check that it really do not happen
> with LTO.
> (acutally LTO symtab may just chose COMDAT from module that has counts with 
> it.
> It has all the info for it.  I was thinkin about it few weeks back.  It is
> bit hard to do - you need to verify that all references from the function are
> the same or linking might fail if you overwrite linker's decisiosns).

I see, yes LTO can deal with this better since it has global
information. In non-LTO mode (including LIPO) we have the issue.

I take it gimp is built with LTO and therefore shouldn't be hitting
this comdat issue?

Let me do a couple things:
- port over my comdat inlining fix from the google branch to trunk and
send it for review. If you or Martin could try it to see if it helps
with function splitting to avoid the hits from the cold code that
would be great
- I'll add some new sanity checking to try to detect non-zero blocks
in the cold section, or 0 blocks reached by non-zero edges and see if
I can flush out any problems with my tests or a profiledbootstrap or
gimp.
- I'll try building and profiling gimp myself to see if I can
reproduce the issue with code executing out of the cold section.

Thanks,
Teresa

>>
>> Also, can you send me reproduction instructions for gimp? I don't
>> think I need Martin's patch, but which version of gimp and what is the
>> equivalent way for me to train it? I have some scripts to generate a
>> similar type of instruction heat map graph that I have been using to
>> tune partitioning and function reordering. Essentially it uses linux
>> perf to sample on instructions_retired and then munge the data in
>> several ways to pro

Re: Speculative call support in the callgraph

2013-08-09 Thread Andi Kleen
Jan Hubicka  writes:

> Hi,
> this patch adds support for speculative calls into callgraph.  The idea is 
> that
> any IPA optimization that believes it knows likely target of an indirect call
> (currently I use it for cross-module indirect call profiling, but I expect
> Martin J. can easily add support for ipa-cp and I hope to add speculative
> devirtualization in foreseeable future since it should make difference for
> Firefox). Speculative call replaces indirect call

Patch appears to break boot strap on x86_64-linux 
(or maybe your other one)

0x827ab7 crash_signal
../../gcc/gcc/toplev.c:335
0x827ab7 crash_signal
../../gcc/gcc/toplev.c:335
0x827ab7 crash_signal
../../gcc/gcc/toplev.c:335
0x827ab7 crash_signal
../../gcc/gcc/toplev.c:335
0x827ab7 crash_signal
../../gcc/gcc/toplev.c:335
0x827ab7 crash_signal
../../gcc/gcc/toplev.c:335
0x827ab7 crash_signal
../../gcc/gcc/toplev.c:335
0x5d0c6c cgraph_speculative_call_info(cgraph_edge*, cgraph_edge*&,
cgraph_edge*&, ipa_ref*&)
../../gcc/gcc/cgraph.c:1098
0x5d1270 cgraph_set_call_stmt(cgraph_edge*, gimple_statement_d*, bool)
../../gcc/gcc/cgraph.c:774
0x5d0c6c cgraph_speculative_call_info(cgraph_edge*, cgraph_edge*&,
cgraph_edge*&, ipa_ref*&)
../../gcc/gcc/cgraph.c:1098
0x5d1270 cgraph_set_call_stmt(cgraph_edge*, gimple_statement_d*, bool)
../../gcc/gcc/cgraph.c:774
0x5d0c6c cgraph_speculative_call_info(cgraph_edge*, cgraph_edge*&,
cgraph_edge*&, ipa_ref*&)
../../gcc/gcc/cgraph.c:1098
...


-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


Re: Speculative call support in the callgraph

2013-08-09 Thread Xinliang David Li
On Fri, Aug 9, 2013 at 1:24 PM, Jan Hubicka  wrote:
>> I have not looked at the details. One high level question: this form
>> seems to only support one indirect target case. LIPO uses TOPN
>> indirect target profiling (tracking multiple targets), which can be
>> used by LTO as well (when the topn profiling gets into trunk).
>
> Well, adding multiple direct edges for given call will need extension into
> cgraph_turn_edge_to_speculative and cgraph_speculative_call_info APIs (to 
> allow
> multple direct edges) to indirect_info common_target datastructure, to
> profiling histograms and to the gimple_ic code.  Otherwise there is nothing
> really hard coded about single direct target.
>
> How much benefits do you see from having multiple direct targets?

It can be large depending on applications. For some apps, there are
very callsites with 2 or 3 very hot targets.

> I would
> expect them to be quite quickly disappearing as N increases...

For the large apps I see, N rarely exceeds 4. For most of the cases,
the hot targets are 2 or 3 with 2 being very common.  In LIPO, we
track 4 targets, but only use top 2.

> How the TOPN profiling counter is implemented?

Currently FDO's value profiler is one value based -- it is quite
faulty because it can not handle cases where hot targets alternates.
For instance for sequence 1, 2, 1, 2, 1, 2, 3,3, 1 -- the winner will
be 3 instead of 1.

TOPN algorithm is a LFU based --- when match is not found, the least
frequent used entry will be evicted. To avoid ping-pong effect, total
eviction count is also tracked -- when it exceeds a threshold, the
bottom counter will be clear to make room for hot entries (instead of
letting them killing each other).

thanks,

David


>
> Honza


Cross-module indirect call transformation

2013-08-09 Thread Jan Hubicka
Hi,
this makes the whole indirect call machinery to fly.  The histograms for cross
module indirect calls now collected by value-prof are still at compile time
turned into common targets stored into cgraph edges.  (in
ipa_profile_generate_summary)

Common targets can be used not only for speculation: I expect that code
placement pass can actually look up the target even when the call stays
unspeculative.

At LTO WPA time the common targets are turned into speculative edges by
ipa_profile.

Finally the clonning machinery turns the speuclative edges into conditional
direct calls.

The most painful part of this patch was undoubtely creation of a testcase.  We
have no way to grep LTO dump files, so I had to verify that transformation
happens (or not happen when not welcome) by __builtin_constant_p.

Second lto.exp has no support for FDO compilation and tree-prof has
just partly working support for multiple source testcases.  While there
is way to add additional source, there is no way I can think of to avoid
the other source from being copmiled, so I just turned it into empty
testcase.

Bootstrapped/regtested x86_64-linux

* cgraph.c (cgraph_resolve_speculation): Cut frequency to
CGRAPH_FREQ_MAX.
(dump_cgraph_node): Dump profile-id.
* cgraph.h (cgraph_indirect_call_info): Add common_target_id
and common_target_probability.
* lto-cgraph.c (lto_output_edge): Stream common targets.
(lto_output_node): Stream profile ids.
(input_node): Stream profile ids.
(input_edge): Stream common targets.
* lto-streamer-in.c (fixup_call_stmt_edges_1): Fix formatting.
* ipa.c: Include value-prof.h
(ipa_profile_generate_summary): Turn indirect call statement histograms
into common targets.
(ipa_profile): Turn common targets into speculative edges.
Index: cgraph.c
===
*** cgraph.c(revision 201633)
--- cgraph.c(working copy)
*** cgraph_resolve_speculation (struct cgrap
*** 1176,1181 
--- 1176,1183 
  }
edge->count += e2->count;
edge->frequency += e2->frequency;
+   if (edge->frequency > CGRAPH_FREQ_MAX)
+ edge->frequency = CGRAPH_FREQ_MAX;
edge->speculative = false;
e2->speculative = false;
if (e2->indirect_unknown_callee || e2->inline_failed)
*** dump_cgraph_node (FILE *f, struct cgraph
*** 1801,1806 
--- 1803,1811 
  fprintf (f, "  Availability: %s\n",
 cgraph_availability_names [cgraph_function_body_availability 
(node)]);
  
+   if (node->profile_id)
+ fprintf (f, "  Profile id: %i\n",
+node->profile_id);
fprintf (f, "  Function flags:");
if (node->count)
  fprintf (f, " executed "HOST_WIDEST_INT_PRINT_DEC"x",
Index: cgraph.h
===
*** cgraph.h(revision 201634)
--- cgraph.h(working copy)
*** struct GTY(()) cgraph_indirect_call_info
*** 435,440 
--- 435,444 
int param_index;
/* ECF flags determined from the caller.  */
int ecf_flags;
+   /* Profile_id of common target obtrained from profile.  */
+   int common_target_id;
+   /* Probability that call will land in function with COMMON_TARGET_ID.  */
+   int common_target_probability;
  
/* Set when the call is a virtual call with the parameter being the
   associated object pointer rather than a simple direct call.  */
Index: lto-cgraph.c
===
*** lto-cgraph.c(revision 201633)
--- lto-cgraph.c(working copy)
*** lto_output_edge (struct lto_simple_outpu
*** 299,304 
--- 299,312 
 | ECF_NOVOPS)));
  }
streamer_write_bitpack (&bp);
+   if (edge->indirect_unknown_callee)
+ {
+   streamer_write_hwi_stream (ob->main_stream,
+edge->indirect_info->common_target_id);
+   if (edge->indirect_info->common_target_id)
+   streamer_write_hwi_stream
+  (ob->main_stream, edge->indirect_info->common_target_probability);
+ }
  }
  
  /* Return if LIST contain references from other partitions.  */
*** lto_output_node (struct lto_simple_outpu
*** 519,524 
--- 527,533 
streamer_write_uhwi_stream (ob->main_stream, node->thunk.fixed_offset);
streamer_write_uhwi_stream (ob->main_stream, node->thunk.virtual_value);
  }
+   streamer_write_hwi_stream (ob->main_stream, node->profile_id);
  }
  
  /* Output the varpool NODE to OB. 
*** input_node (struct lto_file_decl_data *f
*** 1057,1062 
--- 1066,1072 
  }
if (node->symbol.alias && !node->symbol.analyzed && node->symbol.weakref)
  node->symbol.alias_target = get_alias_symbol (node->symbol.decl);
+   node->profile_id = streamer_read_hwi (ib);
return node;
  }
  
*** input_edge (struct 

Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition

2013-08-09 Thread Jan Hubicka
> 
> I see, yes LTO can deal with this better since it has global
> information. In non-LTO mode (including LIPO) we have the issue.

Thinking about it, there is still one problem left: I usually suggest
users to train with -fno-lto to avoid excessive linking time with
instrumentation.  This actually will bring differences in between
compile and link decisions of linker.  
We can't even replace one body by another, since inlining done
at training stage may distribute the profile across multiple comdat
copies.  I suppose only way is to extend lto-symtab to actually
merge profiles.  To merge CFG profiles it will need to read in the body.

Martin is working on identical function merging, I suppose once we
implement CFG profile merging for that purpose, we can use it here.
I will try to look into this.
> 
> I take it gimp is built with LTO and therefore shouldn't be hitting
> this comdat issue?

I think gimp is still mostly C program. (did not double check my last
contribution to it is from 90s :)
> 
> Let me do a couple things:
> - port over my comdat inlining fix from the google branch to trunk and
> send it for review. If you or Martin could try it to see if it helps
> with function splitting to avoid the hits from the cold code that
> would be great
> - I'll add some new sanity checking to try to detect non-zero blocks
> in the cold section, or 0 blocks reached by non-zero edges and see if
> I can flush out any problems with my tests or a profiledbootstrap or
> gimp.
> - I'll try building and profiling gimp myself to see if I can
> reproduce the issue with code executing out of the cold section.
> 
> Thanks,
> Teresa
> 
> >>
> >> Also, can you send me reproduction instructions for gimp? I don't
> >> think I need Martin's patch, but which version of gimp and what is the
> >> equivalent way for me to train it? I have some scripts to generate a
> >> similar type of instruction heat map graph that I have been using to
> >> tune partitioning and function reordering. Essentially it uses linux
> >> perf to sample on instructions_retired and then munge the data in
> >> several ways to produce various stats and graphs. One thing that has
> >> been useful has been to combine the perf data with nm output to
> >> determine which cold functions are being executed at runtime.
> >
> > Martin?
> >
> >>
> >> However, for this to tell me which split cold bbs are being executed I
> >> need to use a patch that Sri sent for review several months back that
> >> gives the split cold section its own name:
> >>   http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01571.html
> >> Steven had some follow up comments that Sri hasn't had a chance to address 
> >> yet:
> >>   http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00798.html
> >> (cc'ing Sri as we should probably revive this patch soon to address
> >> gdb and other issues with detecting split functions properly)
> >
> > Intresting, I used linker script for this purposes, but that his GNU ld 
> > only...
> >
> > Honza
> >>
> >> Thanks!
> >> Teresa
> >>
> >> >
> >> > Honza
> >> >>
> >> >> Thanks,
> >> >> Teresa
> >> >>
> >> >> > I think we are really looking primarily for dead parts of the 
> >> >> > functions (sanity checks/error handling)
> >> >> > that should not be visited by train run.  We can then see how to make 
> >> >> > the heuristic more aggressive?
> >> >> >
> >> >> > Honza
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
> >>
> >>
> >>
> >> --
> >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: Speculative call support in the callgraph

2013-08-09 Thread Jan Hubicka
> Jan Hubicka  writes:
> 
> > Hi,
> > this patch adds support for speculative calls into callgraph.  The idea is 
> > that
> > any IPA optimization that believes it knows likely target of an indirect 
> > call
> > (currently I use it for cross-module indirect call profiling, but I expect
> > Martin J. can easily add support for ipa-cp and I hope to add speculative
> > devirtualization in foreseeable future since it should make difference for
> > Firefox). Speculative call replaces indirect call
> 
> Patch appears to break boot strap on x86_64-linux 
> (or maybe your other one)

How do you bootstrap? This should not be used w/o lto+profiledbootstrap and
that seems to work for me...

I will re-try with clean tree now.

Honza
> 
> 0x827ab7 crash_signal
> ../../gcc/gcc/toplev.c:335
> 0x827ab7 crash_signal
> ../../gcc/gcc/toplev.c:335
> 0x827ab7 crash_signal
> ../../gcc/gcc/toplev.c:335
> 0x827ab7 crash_signal
> ../../gcc/gcc/toplev.c:335
> 0x827ab7 crash_signal
> ../../gcc/gcc/toplev.c:335
> 0x827ab7 crash_signal
> ../../gcc/gcc/toplev.c:335
> 0x827ab7 crash_signal
> ../../gcc/gcc/toplev.c:335
> 0x5d0c6c cgraph_speculative_call_info(cgraph_edge*, cgraph_edge*&,
> cgraph_edge*&, ipa_ref*&)
> ../../gcc/gcc/cgraph.c:1098
> 0x5d1270 cgraph_set_call_stmt(cgraph_edge*, gimple_statement_d*, bool)
> ../../gcc/gcc/cgraph.c:774
> 0x5d0c6c cgraph_speculative_call_info(cgraph_edge*, cgraph_edge*&,
> cgraph_edge*&, ipa_ref*&)
> ../../gcc/gcc/cgraph.c:1098
> 0x5d1270 cgraph_set_call_stmt(cgraph_edge*, gimple_statement_d*, bool)
> ../../gcc/gcc/cgraph.c:774
> 0x5d0c6c cgraph_speculative_call_info(cgraph_edge*, cgraph_edge*&,
> cgraph_edge*&, ipa_ref*&)
> ../../gcc/gcc/cgraph.c:1098
> ...
> 
> 
> -Andi
> 
> -- 
> a...@linux.intel.com -- Speaking for myself only


Re: Speculative call support in the callgraph

2013-08-09 Thread Jan Hubicka
> On Fri, Aug 9, 2013 at 1:24 PM, Jan Hubicka  wrote:
> >> I have not looked at the details. One high level question: this form
> >> seems to only support one indirect target case. LIPO uses TOPN
> >> indirect target profiling (tracking multiple targets), which can be
> >> used by LTO as well (when the topn profiling gets into trunk).
> >
> > Well, adding multiple direct edges for given call will need extension into
> > cgraph_turn_edge_to_speculative and cgraph_speculative_call_info APIs (to 
> > allow
> > multple direct edges) to indirect_info common_target datastructure, to
> > profiling histograms and to the gimple_ic code.  Otherwise there is nothing
> > really hard coded about single direct target.
> >
> > How much benefits do you see from having multiple direct targets?
> 
> It can be large depending on applications. For some apps, there are
> very callsites with 2 or 3 very hot targets.
> 
> > I would
> > expect them to be quite quickly disappearing as N increases...
> 
> For the large apps I see, N rarely exceeds 4. For most of the cases,
> the hot targets are 2 or 3 with 2 being very common.  In LIPO, we
> track 4 targets, but only use top 2.
> 
> > How the TOPN profiling counter is implemented?
> 
> Currently FDO's value profiler is one value based -- it is quite
> faulty because it can not handle cases where hot targets alternates.
> For instance for sequence 1, 2, 1, 2, 1, 2, 3,3, 1 -- the winner will
> be 3 instead of 1.

Or rather no-one, since 3 won't have high enough frequency.  The
counter was really intended to look for most common destination by
a huge margin.
(like for GCC, most devirtualizations actually have probability of 1,
so these are more missed cases of static analysis)
> 
> TOPN algorithm is a LFU based --- when match is not found, the least
> frequent used entry will be evicted. To avoid ping-pong effect, total
> eviction count is also tracked -- when it exceeds a threshold, the
> bottom counter will be clear to make room for hot entries (instead of
> letting them killing each other).

Ok, this seems to make sense.  Lets merge the profiling/transformation
infrastructure from LIPO and I will add support for multiple speculation
into callgraph.  I was trying to keep in mind that devirtualization may
also want to eventually do more than one target, but I tought it is
better to keep things easy at beggining.

Thanks,
Honza


Re: [PATCH, i386, MPX 1/X] Support of Intel MPX ISA

2013-08-09 Thread Joseph S. Myers
On Mon, 29 Jul 2013, Ilya Enkovich wrote:

> Hi,
> 
> Here is updated version of the patch. I removed redundant
> mode_for_bound, added comments to BOUND_TYPE and added -mmpx option.
> I also fixed bndmk/bndldx/bndstx constraints to avoid incorrect
> register allocation (created two new constraints for that).

I think the -mmpx option should be documented in invoke.texi, and the new 
machine modes / mode class should be documented in rtl.texi where other 
machine modes / mode classes are documented.  Beyond that, I have no 
comments on this patch revision.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Speculative call support in the callgraph

2013-08-09 Thread Xinliang David Li
On Fri, Aug 9, 2013 at 4:21 PM, Jan Hubicka  wrote:
>> On Fri, Aug 9, 2013 at 1:24 PM, Jan Hubicka  wrote:
>> >> I have not looked at the details. One high level question: this form
>> >> seems to only support one indirect target case. LIPO uses TOPN
>> >> indirect target profiling (tracking multiple targets), which can be
>> >> used by LTO as well (when the topn profiling gets into trunk).
>> >
>> > Well, adding multiple direct edges for given call will need extension into
>> > cgraph_turn_edge_to_speculative and cgraph_speculative_call_info APIs (to 
>> > allow
>> > multple direct edges) to indirect_info common_target datastructure, to
>> > profiling histograms and to the gimple_ic code.  Otherwise there is nothing
>> > really hard coded about single direct target.
>> >
>> > How much benefits do you see from having multiple direct targets?
>>
>> It can be large depending on applications. For some apps, there are
>> very callsites with 2 or 3 very hot targets.
>>
>> > I would
>> > expect them to be quite quickly disappearing as N increases...
>>
>> For the large apps I see, N rarely exceeds 4. For most of the cases,
>> the hot targets are 2 or 3 with 2 being very common.  In LIPO, we
>> track 4 targets, but only use top 2.
>>
>> > How the TOPN profiling counter is implemented?
>>
>> Currently FDO's value profiler is one value based -- it is quite
>> faulty because it can not handle cases where hot targets alternates.
>> For instance for sequence 1, 2, 1, 2, 1, 2, 3,3, 1 -- the winner will
>> be 3 instead of 1.
>
> Or rather no-one, since 3 won't have high enough frequency.  The
> counter was really intended to look for most common destination by
> a huge margin.
> (like for GCC, most devirtualizations actually have probability of 1,
> so these are more missed cases of static analysis)
>>
>> TOPN algorithm is a LFU based --- when match is not found, the least
>> frequent used entry will be evicted. To avoid ping-pong effect, total
>> eviction count is also tracked -- when it exceeds a threshold, the
>> bottom counter will be clear to make room for hot entries (instead of
>> letting them killing each other).
>
> Ok, this seems to make sense.  Lets merge the profiling/transformation
> infrastructure from LIPO and I will add support for multiple speculation
> into callgraph.  I was trying to keep in mind that devirtualization may
> also want to eventually do more than one target, but I tought it is
> better to keep things easy at beggining.

Ok. Rong, can you help rip the TOPN profiler part from google branch
and submit to trunk? TopN profiler can be used in other scenarios
other than indirect call profiling.

thanks,

David


>
> Thanks,
> Honza


Re: [PATCH] Enable non-complex math builtins from C99 for Bionic

2013-08-09 Thread Joseph S. Myers
On Sun, 28 Jul 2013, Alexander Ivchenko wrote:

> Hi Joseph, thanks for your comments.
> 
> I updated the patch:
> 
> 1) The function name as a second argument in libc_has_function target
> hook was removed - was not usefull so far.
> 2) By using contrib/config-list.mk (thanks for the hint - great tool!)
> and analysing tm.h files and what is included in them I have checked
> 197 targets. That analysis includes all issues that you raised in your
> comments - everything is fixed now. I don't like that sometimes we
> have to redefine the version of the hook back to the default one due
> to a poisoning of including elfos.h, but I couldn't find a better
> solution - I commented all those cases.
> 
> Regtesting is in progress now. I have already tested the patch before,
> so I don't expect to see any new problems.
> 
> If all the tests pass, is the patch OK for trunk?

This is OK, with function_gnu removed (nothing appears to use it), if no 
OS port maintainers object to the changes for their OSes within the next 
week.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: RFC - Refactor tree.h

2013-08-09 Thread Mike Stump
On Aug 9, 2013, at 3:36 PM, Diego Novillo  wrote:
> This patch is still WIP.  It builds stage1, but I'm getting ICEs
> during stage 2.
> 
> The patch splits tree.h into three files:
> 
> - tree-core.h: All data structures, enums and typedefs from
>  tree.h
> 
> - tree-api.h: All extern function definitions from tree.h
> 
> - tree-macros.h: All macro accessors, tree checks and other
>  inline functions.

I don't like this split.  You focus in on the details and sort code by detail.  
I think this is wrong.  I want code sorted by the features and functions it 
provides, and all like this, go into explainable functional bins.  One day, a 
function might be implemented by a data structure, the next day, by a routine, 
or a macro, or an inline function, the form of it doesn't matter.

It is like sorting routines by the first character of the spelling of the name 
of the routine.  tree_to_int, goes into t.c, because tree starts with a t.  You 
rename the function, and suddenly it moves files?  Bad.

Examine double_int.  All macros for double int, go in double-int.h or 
double-int.c.  All function go in one or the other.  All routines goes in one 
or the other.  All double-int code goes into this bin.  What bin is it?  
macros?  No, the bin is double-int.

Likewise vec.[ch].  You want tree.h smaller, pick the largest abstract group 
that is contained in that file, and remove it to it's own file.  For example, 
fold.  fold is fold, fold is special.  fold can be in fold.h.  If you examine 
the nature of the code in tree.h:

/* In fold-const.c */

/* Non-zero if we are folding constants inside an initializer; zero 
   
   otherwise.  */
extern int folding_initializer;

This is a dead giveaway.  The second giveaway, fold-const.c exists.  That code 
isn't in tree.c.  Now, maybe you call it fold-const.h, maybe that's better, you 
get the idea.

Look for isolatable hunks that can exist in their own group of some sort.  Look 
for "/* In ", this is a better way to sort.

Re: Speculative call support in the callgraph

2013-08-09 Thread Andi Kleen
On Sat, Aug 10, 2013 at 01:15:11AM +0200, Jan Hubicka wrote:
> > Jan Hubicka  writes:
> > 
> > > Hi,
> > > this patch adds support for speculative calls into callgraph.  The idea 
> > > is that
> > > any IPA optimization that believes it knows likely target of an indirect 
> > > call
> > > (currently I use it for cross-module indirect call profiling, but I expect
> > > Martin J. can easily add support for ipa-cp and I hope to add speculative
> > > devirtualization in foreseeable future since it should make difference for
> > > Firefox). Speculative call replaces indirect call
> > 
> > Patch appears to break boot strap on x86_64-linux 
> > (or maybe your other one)
> 
> How do you bootstrap? This should not be used w/o lto+profiledbootstrap and
> that seems to work for me...

Neither LTO nor profiled

../gcc/configure --prefix=/pkg/gcc-$V-$D --enable-lto \
--with-plugin-ld=/usr/local/bin/ld-plugin  \
--disable-nls --enable-languages=$FRONTEND > /dev/null \
--enable-checking=release --disable-libstdcxx-pch  \
--disable-fixincl 

-Andi


[PATCH] MAINTAINERS: Update email address.

2013-08-09 Thread Carlos O'Donell
I'm working at Red Hat now as the glibc team lead within the tools group.

Please feel free to reach out to me if you have any glibc related questions.

Still having fun working on GNU tools :-)

Committed.

2013-08-09  Carlos O'Donell  

* MAINTAINERS (Write After Approval): Update email.

Index: MAINTAINERS
===
--- MAINTAINERS (revision 201643)
+++ MAINTAINERS (working copy)
@@ -478,7 +478,7 @@
 Dan Nicolaescu d...@ics.uci.edu
 Dorit Nuzman   do...@il.ibm.com
 David O'Brien  obr...@freebsd.org
-Carlos O'Donellcar...@codesourcery.com
+Carlos O'Donellcar...@redhat.com
 Peter O'Gorman po...@thewrittenword.com
 Andrea Ornsteinandrea.ornst...@st.com
 Seongbae Park  seongbae.p...@gmail.com
---

Cheers,
Carlos.


Re: Speculative call support in the callgraph

2013-08-09 Thread Andi Kleen
On Sat, Aug 10, 2013 at 02:25:21AM +0200, Andi Kleen wrote:
> On Sat, Aug 10, 2013 at 01:15:11AM +0200, Jan Hubicka wrote:
> > > Jan Hubicka  writes:
> > > 
> > > > Hi,
> > > > this patch adds support for speculative calls into callgraph.  The idea 
> > > > is that
> > > > any IPA optimization that believes it knows likely target of an 
> > > > indirect call
> > > > (currently I use it for cross-module indirect call profiling, but I 
> > > > expect
> > > > Martin J. can easily add support for ipa-cp and I hope to add 
> > > > speculative
> > > > devirtualization in foreseeable future since it should make difference 
> > > > for
> > > > Firefox). Speculative call replaces indirect call
> > > 
> > > Patch appears to break boot strap on x86_64-linux 
> > > (or maybe your other one)
> > 
> > How do you bootstrap? This should not be used w/o lto+profiledbootstrap and
> > that seems to work for me...
> 
> Neither LTO nor profiled

My tree bootstraps again when I revert

 * cgraphbuild.c (cgraph_rebuild_references): Rebuild only
 * non-speculative
   refs.
 * cgraph.c (cgraph_update_edge_in_call_site_hash): New
 * function.

svn+ssh://gcc.gnu.org/svn/gcc/trunk@201632

(and also the later patch)

-Andi