Re: Proposal for path splitting for reduction in register pressure for Loops.
On Sun, Mar 8, 2015 at 8:49 PM, Jeff Law wrote: > On 03/08/15 12:13, Richard Biener wrote: >> >> >> I see. This basically creates two loop latches and thus will make our >> loop detection code turn the loop into a fake loop nest. Not sure if that >> is a good idea. > > I'd have to sit down and ponder this for a while -- what would be the > register pressure implications of duplicating the contents of the join block > into its predecessors, leaving an empty joiner so that we still have a > single latch? Good question. Now another question is why we don't choose this way to disambiguate loops with multiple latches? (create a forwarder as new single latch block) Richard. > Jeff
wrong mirror on GCC mirror sites page
The list of mirror sites seems to have a bug: https://gcc.gnu.org/mirrors.html The mirror for Australia is listed as: http://mirrors-au.go-parts.com/gcc | ..., thanks to Dan Derebenskiy (dderebens...@go-parts.com) at Go-Parts. Going to the above site leads to a car parts seller. How did this get into the mirror list?
Re: wrong mirror on GCC mirror sites page
On 9 March 2015 at 13:04, Conrad S wrote: > The list of mirror sites seems to have a bug: > https://gcc.gnu.org/mirrors.html > > The mirror for Australia is listed as: > http://mirrors-au.go-parts.com/gcc | ..., thanks to Dan Derebenskiy > (dderebens...@go-parts.com) at Go-Parts. > > Going to the above site leads to a car parts seller. > > How did this get into the mirror list? Because they said they would provide mirrors: https://gcc.gnu.org/ml/gcc/2014-06/msg00251.html https://gcc.gnu.org/ml/gcc/2014-07/msg00156.html
Re: wrong mirror on GCC mirror sites page
On 9 March 2015 at 23:08, Jonathan Wakely wrote: >> How did this get into the mirror list? > > Because they said they would provide mirrors: > https://gcc.gnu.org/ml/gcc/2014-06/msg00251.html > https://gcc.gnu.org/ml/gcc/2014-07/msg00156.html Upon closer inspection there's actually more junk in the mirror list site: Australia: http://mirrors-au.go-parts.com/gcc Russia: http://mirrors-ru.go-parts.com/gcc UK: http://mirrors-uk.go-parts.com/gcc/ US: http://mirrors-usa.go-parts.com/gcc Perhaps all mirror sites need to be first checked before putting them up on the list.
RE: wrong mirror on GCC mirror sites page
Conrad S writes: > On 9 March 2015 at 23:08, Jonathan Wakely wrote: > >> How did this get into the mirror list? > > > > Because they said they would provide mirrors: > > https://gcc.gnu.org/ml/gcc/2014-06/msg00251.html > > https://gcc.gnu.org/ml/gcc/2014-07/msg00156.html > > Upon closer inspection there's actually more junk in the mirror list > site: > > Australia: http://mirrors-au.go-parts.com/gcc > Russia: http://mirrors-ru.go-parts.com/gcc > UK: http://mirrors-uk.go-parts.com/gcc/ > US: http://mirrors-usa.go-parts.com/gcc The last three here appear to work. I think you just got unlucky that the mirrors-au one is broken at the moment. Matthew
Re: wrong mirror on GCC mirror sites page
On 9 March 2015 at 13:12, Conrad S wrote: > Upon closer inspection there's actually more junk in the mirror list site: > > Australia: http://mirrors-au.go-parts.com/gcc > Russia: http://mirrors-ru.go-parts.com/gcc > UK: http://mirrors-uk.go-parts.com/gcc/ > US: http://mirrors-usa.go-parts.com/gcc > > Perhaps all mirror sites need to be first checked before putting them > up on the list. They do get checked before being added. Did you check them before declaring them junk? :-)
Re: wrong mirror on GCC mirror sites page
On 9 March 2015 at 23:31, Jonathan Wakely wrote: > They do get checked before being added. Did you check them before > declaring them junk? :-) Apologies. I incorrectly assumed that any mention of the "go-parts" domain was link spam, based on the Australian mirror being broken.
Re: Proposal for path splitting for reduction in register pressure for Loops.
On 03/09/15 05:40, Richard Biener wrote: On Sun, Mar 8, 2015 at 8:49 PM, Jeff Law wrote: On 03/08/15 12:13, Richard Biener wrote: I see. This basically creates two loop latches and thus will make our loop detection code turn the loop into a fake loop nest. Not sure if that is a good idea. I'd have to sit down and ponder this for a while -- what would be the register pressure implications of duplicating the contents of the join block into its predecessors, leaving an empty joiner so that we still have a single latch? Good question. Now another question is why we don't choose this way to disambiguate loops with multiple latches? (create a forwarder as new single latch block) Dunno. The RTL jump optimizer ought to eliminate the unnecessary jumping late in the optimization pipeline and creating the forwarder allows us to put the loop into a "more canonical" form for the loop optimizers. Seems like it'd be worth some experimentation. I'm having trouble seeing how Ajit's proposal helps reduce register pressure in any significant way except perhaps by exposing partially dead code. And if that's the primary benefit, we may better off actually building a proper framework for PDCE. I've often pondered if PDCE is really worth it. There's some good PLDI papers from many years ago. One is closely related to our RTL GCSE implementation (Knoop). But RTL seems the wrong place to be doing this. Click's GCM/GVN can achieve similar results by the nature of code motion algorithm IIRC, but as much as I like Click's algorithm, I wasn't ever able to get it to do anything significantly better than existing bits in GCC. Jeff
Proposal for adding splay_tree_find (to find elements without updating the nodes).
w.r.t, https://gcc.gnu.org/wiki/Speedup_areas where we want to improve the performance of splay trees. The function `splay_tree_node splay_tree_lookup (splay_tree, splay_tree_key);' updates the nodes every time a lookup is done. IIUC, There are places where we call this function in a loop i.e., we lookup different elements every time. e.g., In this exaple we are looking for a different `t' in each iteration. gcc/gimplify.c:1096: splay_tree_lookup (ctx->variables, (splay_tree_key) t) == NULL) Here, we change the tree itself `ctx' gcc/gimplify.c:5532: n = splay_tree_lookup (ctx->variables, (splay_tree_key)decl); I think we don't need to update the tree in these cases at least. -Aditya
Undefined behavior due to 6.5.16.1p3
I was wondering whether GCC uses 6.5.16.1p3 of the C11 standard as a license to perform certain optimizations. If so, could anyone provide me an example program. In particular, I am interested about the "then the overlap shall be exact" part of 6.5.16.1p3: If the value being stored in an object is read from another object that overlaps in any way the storage of the first object, then the overlap shall be exact and the two objects shall have qualified or unqualified versions of a compatible type; otherwise, the behavior is undefined.
Re: Proposal for adding splay_tree_find (to find elements without updating the nodes).
On Mon, Mar 09, 2015 at 06:59:10PM +, vax mzn wrote: > w.r.t, https://gcc.gnu.org/wiki/Speedup_areas where we want to improve the > performance of splay trees. > > The function `splay_tree_node splay_tree_lookup (splay_tree, splay_tree_key);' > updates the nodes every time a lookup is done. > > IIUC, There are places where we call this function in a loop i.e., we lookup > different elements every time. So, this makes sense, but I've always wondered if it wouldn't make more sense to just use the red black tree in libstdc++ (or does it have a splay tree of its own too?) Trev > e.g., > In this exaple we are looking for a different `t' in each iteration. > > gcc/gimplify.c:1096: splay_tree_lookup (ctx->variables, (splay_tree_key) t) > == NULL) > > Here, we change the tree itself `ctx' > gcc/gimplify.c:5532: n = splay_tree_lookup (ctx->variables, > (splay_tree_key)decl); > > > I think we don't need to update the tree in these cases at least. > > > -Aditya >
RE: Proposal for adding splay_tree_find (to find elements without updating the nodes).
> Date: Mon, 9 Mar 2015 15:26:26 -0400 > From: tbsau...@tbsaunde.org > To: gcc@gcc.gnu.org > Subject: Re: Proposal for adding splay_tree_find (to find elements without > updating the nodes). > > On Mon, Mar 09, 2015 at 06:59:10PM +, vax mzn wrote: >> w.r.t, https://gcc.gnu.org/wiki/Speedup_areas where we want to improve the >> performance of splay trees. >> >> The function `splay_tree_node splay_tree_lookup (splay_tree, >> splay_tree_key);' >> updates the nodes every time a lookup is done. >> >> IIUC, There are places where we call this function in a loop i.e., we lookup >> different elements every time. > > So, this makes sense, but I've always wondered if it wouldn't make more > sense to just use the red black tree in libstdc++ (or does it have a > splay tree of its own too?) > > Trev > Red-black trees do have better cost of balancing, although at a cost of higher complexity. I'm not sure if using red-black trees would improve the performance because I don't have any data to back. -Aditya >> e.g., >> In this exaple we are looking for a different `t' in each iteration. >> >> gcc/gimplify.c:1096: splay_tree_lookup (ctx->variables, (splay_tree_key) t) >> == NULL) >> >> Here, we change the tree itself `ctx' >> gcc/gimplify.c:5532: n = splay_tree_lookup (ctx->variables, >> (splay_tree_key)decl); >> >> >> I think we don't need to update the tree in these cases at least. >> >> >> -Aditya >>
Re: Proposal for adding splay_tree_find (to find elements without updating the nodes).
On Mon, Mar 9, 2015 at 7:59 PM, vax mzn wrote: > w.r.t, https://gcc.gnu.org/wiki/Speedup_areas where we want to improve the > performance of splay trees. > > The function `splay_tree_node splay_tree_lookup (splay_tree, splay_tree_key);' > updates the nodes every time a lookup is done. > > IIUC, There are places where we call this function in a loop i.e., we lookup > different elements every time. > e.g., > In this exaple we are looking for a different `t' in each iteration. If that's really true, then a splay tree is a horrible choice of data structure. The splay tree will simply degenerate to a linked list. The right thing to do would be, not to "break" one of the key features of splay trees (i.e. the latest lookup is always on top), but to use another data structure. Ciao! Steven
ARM emit jump insn
Hi I'm doing an investigation trying to modify the generation of prologues and epilogues for ARM cross compiler. So far I could successfully add some lines to the prologue generation, but Im having problems with some thing: How do you suggest is the best way to emit an assembler line that makes a branch (not branch with link) to a certain label. I've tried the following: char funcname[] ="funcname"; (global) rtx mylabel , from; LABEL_NAME(mylabel) = funcname; from = emit_jump_insn(gen_jump (mylabel)); JUMP_LABEL (from) = mylabel; ++LABEL_NUSES (mylabel); But this emits a jump to an internal label (for example .L0) what I wanted to do is that those lines generate a jump to funcname: b funcname Thanks! -- __ Marcos Díaz Software Engineer San Lorenzo 47, 3rd Floor, Office 5 Córdoba, Argentina Phone: +54 351 4217888 / +54 351 4218211/ +54 351 7617452 Skype: markdiaz22
RE: Proposal for path splitting for reduction in register pressure for Loops.
-Original Message- From: Jeff Law [mailto:l...@redhat.com] Sent: Monday, March 09, 2015 11:01 PM To: Richard Biener Cc: Ajit Kumar Agarwal; vmaka...@redhat.com; gcc@gcc.gnu.org; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala Subject: Re: Proposal for path splitting for reduction in register pressure for Loops. On 03/09/15 05:40, Richard Biener wrote: > On Sun, Mar 8, 2015 at 8:49 PM, Jeff Law wrote: >> On 03/08/15 12:13, Richard Biener wrote: >>> >>> >>> I see. This basically creates two loop latches and thus will make >>> our loop detection code turn the loop into a fake loop nest. Not >>> sure if that is a good idea. >> >> I'd have to sit down and ponder this for a while -- what would be the >> register pressure implications of duplicating the contents of the >> join block into its predecessors, leaving an empty joiner so that we >> still have a single latch? > > Good question. Now another question is why we don't choose this way > to disambiguate loops with multiple latches? (create a forwarder as > new single latch block) >>Dunno. The RTL jump optimizer ought to eliminate the unnecessary jumping >>late in the optimization pipeline and creating the forwarder allows us to put >>the loop into a "more canonical" >>form for the loop optimizers. Seems like >>it'd be worth some experimentation. I agree with Jeff on this. The above approach of path splitting will keep the loop into more canonical form for the Loop optimizers. >>I'm having trouble seeing how Ajit's proposal helps reduce register pressure >>in any significant way except perhaps by exposing partially dead code. And >>if that's the primary benefit, we >>may better off actually building a proper >>framework for PDCE. Jeff: The above approach of path splitting is very similar to superblock formation as we are duplicating the join nodes into all its predecessors. By doing so, The predecessors blocks duplicated from the join nodes will achieve more granularity with the scope of the scheduling and the register allocation. Due to The bigger granularity of the predecessors blocks, the advantages will have similar to having superblock with more granularity. This gives more scheduling Opportunities of basic blocks scheduling the independent operations. Due to the above path splitting approach, the LRA will have a more significant impact with the respect to register allocation. Because of more granular predecessors blocks the scope of LRA will increase, and the LRA can reuse the registers thus impacting the Live range and the register pressure and we can use better heuristics with respect to spill code in LRA. >>I've often pondered if PDCE is really worth it. There's some good PLDI >>papers from many years ago. One is closely related to our RTL GCSE >>implementation (Knoop). But RTL seems the >>wrong place to be doing this. >>Click's GCM/GVN can achieve similar results by the nature of code motion >>algorithm IIRC, but as much as I like Click's algorithm, I wasn't ever able >>to get it to do anything significantly >>better than existing bits in GCC. Thanks Jeff. Thanks & Regards Ajit Jeff
Re: ARM emit jump insn
On 03/09/15 17:42, Marcos Díaz wrote: Hi I'm doing an investigation trying to modify the generation of prologues and epilogues for ARM cross compiler. So far I could successfully add some lines to the prologue generation, but Im having problems with some thing: How do you suggest is the best way to emit an assembler line that makes a branch (not branch with link) to a certain label. I've tried the following: char funcname[] ="funcname"; (global) rtx mylabel , from; LABEL_NAME(mylabel) = funcname; from = emit_jump_insn(gen_jump (mylabel)); JUMP_LABEL (from) = mylabel; ++LABEL_NUSES (mylabel); But this emits a jump to an internal label (for example .L0) what I wanted to do is that those lines generate a jump to funcname: b funcname Two approaches. If you want to represent the call in RTL, emit_call_insn (...). That's generally the way to emit a function call. However there are also cases where the call you're making does not necessarily confirm to the target ABI. In those cases most folks have an insn in their backend (a define_insn) that internally performs the call. If you have a named pattern fubar, you can emit an insn with that pattern using emit_insn (gen_fubar (arguments))) jeff
Re: Proposal for path splitting for reduction in register pressure for Loops.
On 2015-03-09 8:10 PM, Ajit Kumar Agarwal wrote: -Original Message- From: Jeff Law [mailto:l...@redhat.com] Sent: Monday, March 09, 2015 11:01 PM To: Richard Biener Cc: Ajit Kumar Agarwal; vmaka...@redhat.com; gcc@gcc.gnu.org; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala Subject: Re: Proposal for path splitting for reduction in register pressure for Loops. On 03/09/15 05:40, Richard Biener wrote: On Sun, Mar 8, 2015 at 8:49 PM, Jeff Law wrote: On 03/08/15 12:13, Richard Biener wrote: I see. This basically creates two loop latches and thus will make our loop detection code turn the loop into a fake loop nest. Not sure if that is a good idea. I'd have to sit down and ponder this for a while -- what would be the register pressure implications of duplicating the contents of the join block into its predecessors, leaving an empty joiner so that we still have a single latch? Good question. Now another question is why we don't choose this way to disambiguate loops with multiple latches? (create a forwarder as new single latch block) Dunno. The RTL jump optimizer ought to eliminate the unnecessary jumping late in the optimization pipeline and creating the forwarder allows us to put the loop into a "more canonical" >>form for the loop optimizers. Seems like it'd be worth some experimentation. I agree with Jeff on this. The above approach of path splitting will keep the loop into more canonical form for the Loop optimizers. I'm having trouble seeing how Ajit's proposal helps reduce register pressure in any significant way except perhaps by exposing partially dead code. And if that's the primary benefit, we >>may better off actually building a proper framework for PDCE. Jeff: The above approach of path splitting is very similar to superblock formation as we are duplicating the join nodes into all its predecessors. By doing so, The predecessors blocks duplicated from the join nodes will achieve more granularity with the scope of the scheduling and the register allocation. Due to The bigger granularity of the predecessors blocks, the advantages will have similar to having superblock with more granularity. This gives more scheduling Opportunities of basic blocks scheduling the independent operations. Due to the above path splitting approach, the LRA will have a more significant impact with the respect to register allocation. Because of more granular predecessors blocks the scope of LRA will increase, and the LRA can reuse the registers thus impacting the Live range and the register pressure and we can use better heuristics with respect to spill code in LRA. We already have superblock formations (-ftracer and -fsched2-use-superblocks). So it can be tried. But I guess they are not default for a reason. The proposal can improve code for some cases, e.g. I can say it can definitely help the current inheritance in LRA. The problem is to get a stable improvements. So many things melt here, e.g. I-cache behaviour (loop unrolling has the same problems), alignments and so on. It is very hard to predict the results for such optimizations for modern Intel CPUs which are very complicated block box interpreter roughly speaking. It is easy for less sophisticated architectures. It was easy for much more predictable Itanium. That is why we have already such code. In any case, it would be interesting to try RA on bigger blocks and EBBs. May be some things can be found what can be improved for RA. Spec95 f (a famous benchmark with huge basic block and very high register pressure) is too big to analyze RA behaviour by a human.
RE: Proposal for adding splay_tree_find (to find elements without updating the nodes).
> From: stevenb@gmail.com > Date: Mon, 9 Mar 2015 23:59:52 +0100 > Subject: Re: Proposal for adding splay_tree_find (to find elements without > updating the nodes). > To: hiradi...@msn.com > CC: gcc@gcc.gnu.org > > On Mon, Mar 9, 2015 at 7:59 PM, vax mzn wrote: >> w.r.t, https://gcc.gnu.org/wiki/Speedup_areas where we want to improve the >> performance of splay trees. >> >> The function `splay_tree_node splay_tree_lookup (splay_tree, >> splay_tree_key);' >> updates the nodes every time a lookup is done. >> >> IIUC, There are places where we call this function in a loop i.e., we lookup >> different elements every time. >> e.g., >> In this exaple we are looking for a different `t' in each iteration. > > > If that's really true, then a splay tree is a horrible choice of data > structure. The splay tree will simply degenerate to a linked list. The > right thing to do would be, not to "break" one of the key features of > splay trees (i.e. the latest lookup is always on top), but to use > another data structure. > > Ciao! > Steven So I have this patch which replaces splay_tree_lookup with a new function splay_tree_find at some places. I hope this is helpful. commit 64f203f36661efd95958474f31b588a134dedb41 Author: Aditya Date: Mon Mar 9 22:47:04 2015 -0500 add splay_tree_find for finding elements without updating the tree diff --git a/gcc/gimplify.c b/gcc/gimplify.c index d822913..1053eee 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -1093,7 +1093,7 @@ gimplify_bind_expr (tree *expr_p, gimple_seq *pre_p) /* Mark variable as local. */ if (ctx && !DECL_EXTERNAL (t) && (! DECL_SEEN_IN_BIND_EXPR_P (t) - || splay_tree_lookup (ctx->variables, + || splay_tree_find (ctx->variables, (splay_tree_key) t) == NULL)) { if (ctx->region_type == ORT_SIMD @@ -5529,7 +5529,7 @@ omp_firstprivatize_variable (struct gimplify_omp_ctx *ctx, tree decl) do { - n = splay_tree_lookup (ctx->variables, (splay_tree_key)decl); + n = splay_tree_find (ctx->variables, (splay_tree_key)decl); if (n != NULL) { if (n->value & GOVD_SHARED) @@ -6428,7 +6428,7 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void *data) while (ctx != NULL) { splay_tree_node on - = splay_tree_lookup (ctx->variables, (splay_tree_key) decl); + = splay_tree_find (ctx->variables, (splay_tree_key) decl); if (on && (on->value & (GOVD_FIRSTPRIVATE | GOVD_LASTPRIVATE | GOVD_PRIVATE | GOVD_REDUCTION | GOVD_LINEAR | GOVD_MAP)) != 0) @@ -6529,7 +6529,7 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, tree *list_p) case OMP_CLAUSE_FIRSTPRIVATE: case OMP_CLAUSE_LINEAR: decl = OMP_CLAUSE_DECL (c); - n = splay_tree_lookup (ctx->variables, (splay_tree_key) decl); + n = splay_tree_find (ctx->variables, (splay_tree_key) decl); remove = !(n->value & GOVD_SEEN); if (! remove) { @@ -6551,7 +6551,7 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, tree *list_p) if (ctx->outer_context->combined_loop && !OMP_CLAUSE_LINEAR_NO_COPYIN (c)) { - n = splay_tree_lookup (ctx->outer_context->variables, + n = splay_tree_find (ctx->outer_context->variables, (splay_tree_key) decl); if (n == NULL || (n->value & GOVD_DATA_SHARE_CLASS) == 0) @@ -6578,7 +6578,7 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, tree *list_p) /* Make sure OMP_CLAUSE_LASTPRIVATE_FIRSTPRIVATE is set to accurately reflect the presence of a FIRSTPRIVATE clause. */ decl = OMP_CLAUSE_DECL (c); - n = splay_tree_lookup (ctx->variables, (splay_tree_key) decl); + n = splay_tree_find (ctx->variables, (splay_tree_key) decl); OMP_CLAUSE_LASTPRIVATE_FIRSTPRIVATE (c) = (n->value & GOVD_FIRSTPRIVATE) != 0; break; @@ -6587,7 +6587,7 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, tree *list_p) decl = OMP_CLAUSE_DECL (c); if (!is_global_var (decl)) { - n = splay_tree_lookup (ctx->variables, (splay_tree_key) decl); + n = splay_tree_find (ctx->variables, (splay_tree_key) decl); remove = n == NULL || !(n->value & GOVD_SEEN); if (!remove && TREE_CODE (TREE_TYPE (decl)) == POINTER_TYPE) { @@ -6600,7 +6600,7 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, tree *list_p) for (octx = ctx->outer_context; octx; octx = octx->outer_context) { - n =
GNU Tools Cauldron 2015 - Call for Abstracts and Participation
== GNU Tools Cauldron 2015 http://gcc.gnu.org/wiki/cauldron2015 Call for Abstracts and Participation August 7 to 9, Prague, Czech Republic == We are pleased to announce another gathering of GNU tools developers. The basic format of this meeting will be similar to the previous meetings. The purpose of this workshop is to gather all GNU tools developers, discuss current/future work, coordinate efforts, exchange reports on ongoing efforts, discuss development plans for the next 12 months, developer tutorials and any other related discussions. This time we will meet in Prague, Czech Republic from 7/Aug/2015 to 9/Aug/2015. Exact details on location and venue will be available shortly. We are inviting every developer working in the GNU toolchain: GCC, GDB, binutils, runtimes, etc. In addition to discussion topics selected at the conference, we are looking for advance submissions. If you have a topic that you would like to present, please submit an abstract describing what you plan to present. We are accepting three types of submissions: - Prepared presentations: demos, project reports, etc. - BoFs: coordination meetings with other developers. - Tutorials for developers. No user tutorials, please. Note that we will not be doing in-depth reviews of the presentations. Mainly we are looking for applicability and to decide scheduling. There will be time at the conference to add other topics of discussion, similarly to what we did at the previous meetings. To register your abstract, send e-mail to tools-cauldron-ad...@googlegroups.com. Your submission should contain the following information: Title: Authors: Abstract: If you intend to participate, but not necessarily present, please let us know as well. Send a message to tools-cauldron-ad...@googlegroups.com stating your intent to participate. Please indicate your dietary requirements, affiliation and t-shirt size.