Hi,
> > I noticed the code/comment below about the may_be_zero field in the loop
> > niter desc:
> >
> > tree may_be_zero;  /* The boolean expression.  If it evaluates to true,
> >                       the loop will exit in the first iteration (i.e.
> >                       its latch will not be executed),
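For context, a minimal sketch of how a consumer typically folds may_be_zero into the iteration count (the helper name guarded_niter is made up for illustration; fold_build3, integer_zerop and the tree_niter_desc fields are the real GCC interfaces):

  static tree
  guarded_niter (struct tree_niter_desc *desc)
  {
    tree niter = desc->niter;

    /* If may_be_zero can evaluate to true, the latch runs zero times,
       whatever the niter field says.  */
    if (!integer_zerop (desc->may_be_zero))
      niter = fold_build3 (COND_EXPR, TREE_TYPE (niter),
                           desc->may_be_zero,
                           build_int_cst (TREE_TYPE (niter), 0),
                           niter);
    return niter;
  }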
Hi,
> On Tue, 2013-04-23 at 15:24 -0600, Jeff Law wrote:
>
> > Well, you have to copy the blocks, adjust the edges and rewrite the SSA
> > graph. I'd use duplicate_block to help.
> >
> > You really want to look at tree-ssa-threadupdate.c. There's a nice big
> > block comment which gives the
Hi,
> > > > Why can't we replace function force_expr_to_var_cost directly with
> > > > function computation_cost in tree-ssa-loop-ivopts.c?
> > > >
> > > > Actually I think it is inaccurate for the current recursive algorithm
> > > > in force_expr_to_var_cost to estimate expr cost. Instead
> > > > co
Hi,
> > Why can't we replace function force_expr_to_var_cost directly with function
> > computation_cost in tree-ssa-loop-ivopts.c?
> >
> > Actually I think it is inaccurate for the current recursive algorithm in
> > force_expr_to_var_cost to estimate expr cost. Instead computation_cost can
> > cou
Hi,
> I'm still developing a new private target backend (gcc 4.5.2) and I noticed
> something strange in the assembler generated for a conditional jump.
>
>
> The compiled C code source is :
>
> int funct (int c) {
>   int a;
>   a = 7;
>   if (c < 0)
>     a = 4;
>   return a;
> }
>
Hi,
> I'm looking at a missed optimization in combine and it is similar to the one
> you've fixed in PR18942
> (http://thread.gmane.org/gmane.comp.gcc.patches/81504).
>
> I'm trying to make GCC optimize
> (leu:SI
> (plus:SI (reg:SI) (const_int -1))
> (const_int 1))
>
> into
>
> (leu:SI
>
Hi,
> I'm investigating an ICE in loop-iv.c:get_biv_step(). I hope you can shed
> some light on what the correct fix would be.
>
> The ICE happens when processing:
> ==
> (insn 111 (set (reg:SI 304)
> (plus (subreg:SI (reg:DI 251) 4)
> (const_int 1
>
> (
> On Mon, Nov 15, 2010 at 10:00 PM, Paolo Bonzini wrote:
> > We currently have 3 non-algorithmic maintainers:
> >
> > loop optimizer Zdenek Dvorak o...@ucw.cz
> > loop optimizer Daniel Berlin dber...@dberlin.org
> > l
Hi,
> Doloop optimization fails to be applied on the following inner loop
> when compiling for PowerPC (GCC -r162294) due to:
>
> Doloop: number of iterations too costly to compute.
strength reduction is performed in ivopts, introducing a new variable:
for (p = inptr; p < something; p += 3)
..
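A rough illustration of why that computation is considered costly, assuming p is a byte pointer stepped by 3 and the exit test is p < something:

  niter = ((something - inptr) + 2) / 3

i.e. a runtime division by 3 is required, which is presumably what the "too costly to compute" message is complaining about.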
Hi,
> > I'm working on decompiling x86-64 binary programs, using branches to rebuild
> > a control-flow graph and looking for loops. I've found a significant number
> > of irreducible loops in gcc-produced code (irreducible loops are loops with
> > more than one entry point), especially in -O3 opt
Hi,
> > Is there a way to pass to the unroller the maximum number of iterations
> > of the loop such that it can decide to avoid unrolling if
> > the maximum number is small.
> >
> > To be more specific, I am referring to the following case:
> > After the vectorizer decides to peel for alignment
Hi,
> I faced a similar issue a while ago. I filed a bug report
> (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36712) In the end,
> I implemented a simple tree-level unrolling pass in our port
> which uses all the existing infrastructure. It works quite well for
> our purpose, but I hesitated t
Hi,
> Ok, I've actually gone a different route. Instead of waiting for the
> middle end to perform this, I've directly modified the parser stage to
> unroll the loop directly there.
I think this is a very bad idea. First of all, getting the information
needed to decide at this stage whether unro
Hi,
> 2) I was using a simple example:
>
> #pragma unroll 2
> for (i = 0; i < 6; i++)
>   {
>     printf ("Hello world\n");
>   }
>
> If I do this, instead of transforming the code into:
> for (i = 0; i < 3; i++)
>   {
>     printf ("Hello world\n");
>
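For comparison, one common unrolled-by-two shape for this loop, written out by hand as an illustration rather than taken from GCC output:

  for (i = 0; i < 6; i += 2)
    {
      printf ("Hello world\n");
      printf ("Hello world\n");
    }

whereas the i < 3 form quoted above halves the trip count, presumably with the body duplicated inside the loop.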
> On Wed, Oct 7, 2009 at 7:21 PM, Tobias Grosser
> wrote:
> > On Wed, 2009-10-07 at 18:30 +0200, Tobias Grosser wrote:
> >> On Wed, 2009-10-07 at 17:44 +0200, Richard Guenther wrote:
> >> > On Wed, Oct 7, 2009 at 5:35 PM, Tobias Grosser
> >> > wrote:
> >> > > On Wed, 2009-10-07 at 17:23 +0200, Ri
Hi,
> > Ah, indeed. Sorry for being confused. Is tree_niter_desc->assumptions
> > or ->may_be_zero non-NULL?
>
> Yes both. I attached the gdb content for both.
you need to check may_be_zero, which in your case should contain
something like N <= 49. If this condition is true, the number of
ite
Hi,
> I was wondering if it was possible to turn off the unrolling to
> certain loops. Basically, I'd like the compiler not to consider
> certain loops for unrolling but fail to see how exactly I can achieve
> that.
>
> I've traced the unrolling code to multiple places in the code (I'm
> working
Hi,
> IVOpts cannot identify start_26, start_4 and ivtmp_32_7 to be copies.
> The root cause is that expression 'i + start' is identified as a common
> expression between the test in the header and the index operation in the
> latch. This is unified by copy propagation or FRE prior to loop
> opt
Hi,
> Can we somehow make this fix contingent on ports that have suitable
> integral modes?
yes; however, maybe it would be easier to wait till Richard finishes the
work on not representing the overflow semantics in types (assuming that's
going to happen say in a few weeks?), which should make th
Hi,
> > I obviously thought about this. The issue with using a flag is
> > that there is no convenient place to stick it and that it makes
> > the distinction between the two variants less visible. Consider
> > the folding routines that take split trees for a start.
> >
> > IMHO using new tree-
Hi,
> > introducing new codes seems like a bad idea to me. There are many
> > places that do not care about the distinction between PLUS_EXPR and
> > PLUSV_EXPR, and handling both cases will complicate the code (see eg.
> > the problems caused by introducing POINTER_PLUS_EXPR vs PLUS_EXPR
> > dis
Hi,
in general, I like this proposal a lot. However,
> As a start there will be no-overflow variants of NEGATE_EXPR,
> PLUS_EXPR, MINUS_EXPR, MULT_EXPR and POINTER_PLUS_EXPR.
>
> The sizetypes will simply be operated on in no-overflow variants
> by default (by size_binop and friends).
>
> Nami
Hi,
> As far as I get it, there is no real failure here.
> Parloop, unaware of the array's upper bound, inserts the 'enough
> iterations' condition (i>400-1), and thereby
> makes the last iteration range from 400 upwards.
> VRP now has a constant it can compare to the array's upper bound.
> Cor
Hi,
> > but you only take the hash of the argument of the phi node (i.e., the
> > ssa name), not the computations that it is based on
>
> Is this something like what you had in mind ?
>
> gen_hash (stmt)
> {
>
> if (stmt == NULL)
> return 0;
>
> use_operand_p use_p;
> ssa_op_
Hi,
> >> >> So if the ssa_names are infact reused they won't be the same
> >> >> computations.
> >> >
> >> > do you also check this for ssa names inside the loop (in your example,
> >> > D.10_1?
> >>
> >> If we have to reinsert a = phi (B), we do the following checks.
> >>
> >> 1. If the edge
Hi,
> [Sorry about dropping the ball on this. I've had some trouble with
> internet connectivity and was on vacation for a few days. ]
>
> On Thu, Oct 2, 2008 at 2:56 AM, Zdenek Dvorak <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> >> >> b) If a
Hi,
> > I would disagree on that. Whether a final value replacement is
> > profitable or not largely depends on whether it makes further
> > optimization of the loop possible or not; this makes it difficult
> > to find a good cost model. I think undoing FVR is a good approach
> > to solve this p
Hi,
> >> b) If any PHI node has count zero it can be inserted back and its
> >> corresponding computations removed, iff the argument of the PHI
> >> node
> >> still exists as an SSA variable. This means that we can insert
> >> a_1 = PHI if D.10_1 still exists and hasn't b
Hi,
> On Wed, Oct 1, 2008 at 3:59 PM, Richard Guenther
> <[EMAIL PROTECTED]> wrote:
> > On Wed, Oct 1, 2008 at 3:22 PM, Ramana Radhakrishnan <[EMAIL PROTECTED]>
> > wrote:
> >> Hi ,
> >>
> >> Based on the conversation in the thread at
> >> http://gcc.gnu.org/ml/gcc/2008-03/msg00513.html , we've t
Hi,
> > Based on the conversation in the thread at
> > http://gcc.gnu.org/ml/gcc/2008-03/msg00513.html , we've tried to get a
> > pass trying to undo final value replacement going. The initial
> > implementation was done by Pranav Bhandarkar when he was employed at
> > Azingo as part of work spons
Hi,
> b) If any PHI node has count zero it can be inserted back and its
> corresponding computations removed, iff the argument of the PHI node
> still exists as an SSA variable. This means that we can insert
> a_1 = PHI if D.10_1 still exists and hasn't been removed by
>
Hi,
> > I am probably missing something:
> >
> >> The basic idea is enabling cfglayout mode and then ensuring that insn
> >> stream and control flow are in sync with each other at all times. This
> >> is required because e.g. on Itanium the final bundling happens right
> >> after scheduling, and a
Hi,
I am probably missing something:
> The basic idea is enabling cfglayout mode and then ensuring that insn
> stream and control flow are in sync with each other at all times. This
> is required because e.g. on Itanium the final bundling happens right
> after scheduling, and any extra jumps emit
Hi,
> I'm trying to add a simple function to the callgraph using
> cgraph_add_new_function() (the new function body is obtained from the
> function actually being processed).
> I put my pass in pass_tree_loop.sub as the first pass just after the
> pass_tree_loop_init pass, but I have some problems because the code
> th
Hi,
> > > To clarify what Richard means, your assertion that "you have updated
> > > SSA information" is false.
> > > If you had updated the SSA information, the error would not occur :).
> > >
> > > How exactly are you updating the ssa information?
> > >
> > > The general way to update SSA
Hi,
> To clarify what Richard means, your assertion that "you have updated
> SSA information" is false.
> If you had updated the SSA information, the error would not occur :).
>
> How exactly are you updating the ssa information?
>
> The general way to update SSA for this case would be:
>
> For
Hi,
> The error is rectified. The bug is in the function that calls fuse_loops().
> Now I am trying to transfer all the statements, using the following code:
>
> /* The following function fuses two loops. */
>
> void
> fuse_loops (struct loop *loop_a, struct loop *loop_b)
> {
> debug_loop (loop_a, 10);
>
Hi,
> I am trying to fuse two loops at the tree level. For that, I am trying to
> transfer statements in the header of one loop to the header of the
> other one.
> The code " http://rafb.net/p/fha0IG57.html " contains the 2 loops.
> After moving a statement from one BB to another BB, do I need to
Hi,
> > > A statistics event consists of a function (optional), a statement
> > > (optional) and the counter ID. I converted the counters from
> > > tree-ssa-propagate.c as an example, instead of
> > >
> > > prop_stats.num_copy_prop++;
> > >
> > > you now write
> > >
> > >
Hi,
> This is an attempt to provide (pass) statistics collection. The
> goal is to provide infrastructure to handle the current (pass specific)
> statistics dumping that is done per function and per pass along the
> regular tree/rtl dumps as well as to allow CU wide "fancy" analysis.
>
> The mos
Hi,
> On 03/10/08 08:24, Richard Guenther wrote:
>
> >You could either do
> >
> >GIMPLE_ASSIGN
>
> But 'cond' would be an unflattened tree expression. I'm trying to avoid
> that.
>
> >or invent COND_GT_EXPR, COND_GE_EXPR, etc. (at least in GIMPLE
> >we always have a comparison in COND_EXPR_C
Hi,
> Now tree scalar evolution goes over PHI nodes and realises that
> aligned_src_35 has a scalar evolution {aligned_src_22 + 16, +, 16}_1)
> where aligned_src_22 is
> (const long int *) src0_12(D) i.e the original src pointer. Therefore
> to calculate aligned_src_62 before the second loop comp
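As a notation reminder, a chain of recurrences {base, +, step}_n denotes the value base + step * i in iteration i of loop number n; so the evolution above says aligned_src_35 starts at aligned_src_22 + 16 and advances by 16 (bytes, since this is a pointer evolution) on every iteration of loop 1.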
Hi,
> On 3/9/08 3:24 PM, Zdenek Dvorak wrote:
>
> >however, it would make things simpler. Now, we need to distinguish
> >three cases -- SINGLE, UNARY and BINARY; if we pretended that
> >GIMPLE_COPY is a unary operator, this would be reduced just
> >to UNARY and B
Hi,
> >>So, what about adding a GIMPLE_COPY code? The code would have 0
> >>operands and used only for its numeric value.
> >
> >another possibility would be to make GIMPLE_COPY a unary operator, and
> >get rid of the SINGLE_RHS case altogether (of course, unlike any other
> >unary operator, it
Hi,
> So, what about adding a GIMPLE_COPY code? The code would have 0
> operands and used only for its numeric value.
another possibility would be to make GIMPLE_COPY a unary operator, and
get rid of the SINGLE_RHS case altogether (of course, unlike any other
unary operator, it would not requir
Hi,
> On Sun, Mar 9, 2008 at 2:17 PM, Diego Novillo <[EMAIL PROTECTED]> wrote:
> > On Sun, Mar 9, 2008 at 08:15, Richard Guenther
> > <[EMAIL PROTECTED]> wrote:
> >
> > > What is GIMPLE_SINGLE_RHS after all?
> >
> > Represents a "copy" operation, an operand with no operator (e.g., a = 3, b
>
Hi,
I just noticed an error in a part of the code that I converted, that
looks this way:
switch (gimple_assign_subcode (stmt))
{
case SSA_NAME:
handle_ssa_name ();
break;
case PLUS_EXPR:
handle_plus ();
break;
default:
something ();
}
The problem of course is that for
Hi,
> I'd like to know your experiences with the gcc loop optimizations.
>
> What loop optimizations (in your opinion) can be applied to a large
> number of programs and yield a (significant) improvement of the
> program run-time?
in general, I would say invariant motion, load/store motion, stre
Hi,
> I'm trying to add a simple statement to GIMPLE code by adding a new pass,
> which I put in pass_tree_loop.sub as the last pass just before the
> pass_tree_loop_done pass. Just as a test I'd like to add a call like:
>
> .palp = shmalloc (16);
>
> This is the code I'm using:
>
> t = build_fu
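A minimal sketch of the same insertion against the current GIMPLE API (the original mail appears to predate tuples, so this is an assumption about how one would write it today; shmalloc_decl, palp_decl and gsi stand for a function declaration, the destination variable and a statement iterator obtained elsewhere):

  gcall *call = gimple_build_call (shmalloc_decl, 1,
                                   build_int_cst (size_type_node, 16));
  gimple_call_set_lhs (call, palp_decl);        /* .palp = shmalloc (16) */
  gsi_insert_before (&gsi, call, GSI_SAME_STMT);
  update_stmt (call);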
Hi,
> Zdenek, you committed changes to tree-tailcall.c but you didn't fully
> convert the file. Was that a mis-commit? The file does not compile and
> uses PHI_RESULT instead of gimple_phi_result.
the file compiles for me; it indeed uses PHI_RESULT, but since
that is equivalent to DEF_FROM_PT
Hi,
> Everything else should work well enough for passes to be converted.
> If anyone has some free cycles and is willing to put up with various
> broken bits, would you be willing to help converting passes? There is
> a list of the passes that need conversion in the tuples wiki
> (http://gcc.gn
Hi,
> >> I believe that this is something new and is most likely fallout from
> >> diego's reworking of the tree to rtl converter.
> >>
> >> To fix this will require a round of copy propagation, most likely in
> >> concert with some induction variable detection, since the most
> >> profitable plac
Hi,
> I believe that this is something new and is most likely fallout from
> diego's reworking of the tree to rtl converter.
>
> To fix this will require a round of copy propagation, most likely in
> concert with some induction variable detection, since the most
> profitable place for this will b
Hi,
> > > So I am guessing the Felix version is lucky there are
> > > no gratuitous temporaries to be saved when this happens,
> > > and the C code is unlucky and there are.
> > >
> > > Maybe someone who knows how the optimiser works can comment?
> >
> > One problem with departing from the ABI eve
Hi,
> traceback, tt, and ops follow. Why is this going wrong?
> [ gdb ] call debug_tree(arg0)
> type
> [ gdb ] call debug_tree(arg1)
> type
compilers in general (so that what
you say makes some sense)?
While I was mildly annoyed by your previous "contributions" to the
discussion in the gcc mailing list, I could tolerate those. But
answering a seriously meant question from a beginner with this confusing
and completely irrelevant drivel is another thing.
Sincerely,
Zdenek Dvorak
Hello,
> I have several global variables which are of type rtx. They are used
> in flow.c, ia64.c and final.c. As stated in the internals documentation on
> type information, I add a GTY(()) marker after the keyword 'extern', for example:
> extern GTY(()) rtx a;
> these 'extern's are added in regs.h which is in
Hello,
> An important missing piece is correction of exported information for
> loop unrolling. As far as I can tell, for a loop unrolled by factor N we
> need to clone MEM_ORIG_EXPRs and datarefs for newly-created MEMs, create
> no-dependence DDRs for those pairs, for which original DDR was
> no-d
Hello,
> And finally at the stage of rtl unrolling it looks like this:
> [6] r186 = r2 + C;
> r318 = r186 + 160;
> loop:
> r186 = r186 + 16
> if (r186 != r318) then goto loop else exit
>
> Then, in loop-unroll.c we call iv_number_of
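Working that out from the snippet: the exit compare is r186 != r318, with r318 equal to the initial r186 plus 160 and a step of 16, so iv_number_of_iterations should arrive at 160 / 16 = 10 iterations.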
Hello,
> >> And finally at the stage of rtl unrolling it looks like this:
> >> [6] r186 = r2 + C;
> >> r318 = r186 + 160;
> >> loop:
> >> r186 = r186 + 16
> >> if (r186 != r318) then goto loop else exit
> >>
> >> Then, in loop-unroll.c we call iv_number_of_iterations, whi
Hello,
> And finally at the stage of rtl unrolling it looks like this:
> [6] r186 = r2 + C;
> r318 = r186 + 160;
> loop:
> r186 = r186 + 16
> if (r186 != r318) then goto loop else exit
>
> Then, in loop-unroll.c we call iv_number_of_iterations, which eventually
> calls i
Hello,
> > Are there any folks out there who have projects for Stage 1 or Stage 2
> > that they are having trouble getting reviewed? Any comments
> > re. timing for Stage 3?
>
> Zadeck has the parloop branch patches, which I've been reviewing. I am
> not sure how many other patches are left, bu
Hello,
> I liked the idea of 'Reviewers' more than any of the other options.
> I would like to go with this patch, unless we find a much better
> option?
to cancel this category of maintainers completely? I guess it was
probably discussed before (I am too lazy to check), but the existence
of non
Hello,
> Can you send out your presentation too?
the slides and the example code are at
http://kam.mff.cuni.cz/~rakdver/slides-gcc2007.pdf
http://kam.mff.cuni.cz/~rakdver/diff_reverse.diff
Zdenek
Hello,
you can find the cheatsheet I used during my loop optimizations tutorial
at the GCC Summit at
http://kam.mff.cuni.cz/~rakdver/loopcheat.ps
Zdenek
Hello,
> Testing on the tree-vectorizer testsuite and some of the GCC source files
> showed that a frequent source of apparent loss of exported information
> was passes that performed basic block reordering or jump threading.
> The verifier asserted that number of loops was constant and the order
> the
Hello,
> > > It doesn't seem that the number of iterations analysis from loop-iv.c
> deals
> > > with EQ closing branches.
> >
> > loop-iv works just fine for EQ closing branches.
> >
>
> Thanks for the clarification (I didn't see EQ in iv_number_of_iterations's
> switch (cond)).
that is because
ed form here. */
> +
> + return 0;
> +}
> /* Return nonzero if the loop specified by LOOP is suitable for
>the use of special low-overhead looping instructions. DESC
>describes the number of iterations of the loop. */
> Index: modulo-sched.c
> =====
Hello,
> It doesn't seem that the number of iterations analysis from loop-iv.c deals
> with EQ closing branches.
loop-iv works just fine for EQ closing branches.
Zdenek
> One option is for sms to use
> doloop_condition_get/loop-iv analysis in their current form, and if failed
> check (on our ow
Hello,
> By "this change" I mean just commenting out the check in
> doloop_condition_get. After applying the patch that introduced DOLOOP
> patterns for SPU (http://gcc.gnu.org/ml/gcc-patches/2007-01/msg01470.html)
> we needed this hack in order to be able to use the doloop_condition_get to
> retu
anyway, you cannot submit new changes for 4.1).
Zdenek
> Thanks,
> Vladimir
>
> On 6/12/07, Zdenek Dvorak <[EMAIL PROTECTED]> wrote:
> >
> >Hello,
> >
> >> To make sure I understood you correctly, does it mean that the change
> >> (below in /*
Hello,
> So, I think I am still not convinced which way we want to access the RHS
> of a GS_ASSIGN.
>
> Since GS_ASSIGN can have various types of RHS, we originally had:
>
> gs_assign_unary_rhs (gs) <- Access the only operand on RHS
> gs_assign_binary_rhs1 (gs) <- Access the 1st RHS oper
Hello,
> Of course, instead of clock(), I'd like to use a non-intrusive
> mechanism. However, my research on this topic didn't lead to anything
> but perfsuite, which doesn't work very well for me (should it?).
>
> So here are the questions
>
> - how can I actually insert the code (I need to do
mewhere
else.
Zdenek
> Thanks,
> Vladimir
>
>
> On 6/12/07, Zdenek Dvorak <[EMAIL PROTECTED]> wrote:
> >Hello,
> >
> >> In file loop-doloop.c function doloop_condition_get makes sure that
> >> the condition is GE or NE
> >> otherwise
Hello,
> In file loop-doloop.c function doloop_condition_get makes sure that
> the condition is GE or NE
> otherwise it prevents doloop optimizations. This caused a problem for
> a loop which had NE condition without unrolling and EQ if unrolling
> was run.
actually, doloop_condition_get is not a
Hello,
> I am trying to understand the usage of some functions in the tree-affine.c
> file and I would appreciate your help.
>
> For example, for the two memory accesses
> arr[b+8].X and arr[b+9].X, what will their affine combinations
> look like after executing the following sequence of operations?
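As a sketch of the expected shape (my own illustration, assuming the element size of arr is S bytes and field X sits at offset 0; the (coefficient, value) pairs follow tree-affine.c's element representation):

  &arr[b+8].X  ->  base &arr, offset 8*S, elements { (S, b) }
  &arr[b+9].X  ->  base &arr, offset 9*S, elements { (S, b) }

so after converting both with tree_to_aff_combination, the two combinations share the same linear part and differ only by the constant S in the offset.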
Hello,
> The number of floating point ops. in loop body.
> The number of memory ops. in loop body.
> The number of operands in loop body.
> The number of implicit instructions in loop body.
> The number of unique predicates in loop body.
> The number of indirect references in loop body.
> The numb
Hello,
> > The problem is, that it does not give any speedups (it is almost
> > completely compile-time neutral for compilation of preprocessed
> > gcc sources). I will check whether moving also edges to pools
> > changes anything, but so far it does not seem very promising :-(
>
> Well, the ben
Hello,
> Ian Lance Taylor <[EMAIL PROTECTED]> writes:
>
> > Zdenek Dvorak <[EMAIL PROTECTED]> writes:
> >
> > > The problem is, that it does not give any speedups (it is almost
> > > completely compile-time neutral for compilation of preprocessed
Hello,
as discussed in http://gcc.gnu.org/ml/gcc-patches/2007-05/msg01133.html,
it might be a good idea to try moving cfg to alloc pools. The patch
below does that for basic blocks (each function has a separate pool
from which its basic blocks are allocated). At the moment, the patch
breaks preco
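For readers unfamiliar with the interface, the alloc-pool.h API of that era looked roughly like this (treat the exact signatures as an assumption, since the interface changed in later releases):

  alloc_pool bb_pool
    = create_alloc_pool ("basic blocks", sizeof (struct basic_block_def), 64);
  basic_block bb = (basic_block) pool_alloc (bb_pool);
  /* ... use bb ... */
  pool_free (bb_pool, bb);
  free_alloc_pool (bb_pool);

with one such pool hanging off each struct function, per the description above.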
Hello,
> > Because the patch had other effects like adding a DCE after Copyprop
> > in the loop optimizer section.
> >
>
> Disabling DCE after Copyprop in the loop optimizer section fixes my
> problem. Any idea why?
no, not really; it could be anything (it may even have nothing to do
with dce, pe
Hello,
> ii)
> In loop_version there are two calls to loop_split_edge_with
> 1. loop_split_edge_with (loop_preheader_edge (loop), NULL);
> 2. loop_split_edge_with (loop_preheader_edge (nloop), NULL);
> nloop is the versioned loop, loop is the original.
>
> loop_split_edge_with has the following
Hello,
> (based on gcc 4.1.1).
now that is a problem; things have changed a lot since then, so I am not
sure how much I will be able to help.
> 1. The problem was unveiled by compiling a testcase with dump turned
> on. The compilation failed while calling function get_loop_body from
> flow_loop_
Hello,
> 4. PR 31360: Missed optimization
>
> I don't generally mark missed optimization bugs as P1, but not hoisting
> loads of zero out of a 4-instruction loop is bad. Zdenek has fixed this
> on mainline. Andrew says that patch has a bug. So, what's the story here?
I found the problem, I wi
Hello,
> On Sun, 2007-04-22 at 14:44 +0200, Richard Guenther wrote:
> > On 4/22/07, Laurent GUERBY <[EMAIL PROTECTED]> wrote:
> > > > > but also does not make anyone actually use the options. Nobody reads
> > > > > the documention. Of course, this is a bit overstatement, but with a
> > > > > few
> Look from what we're starting:
>
> <<
> @item -funroll-loops
> @opindex funroll-loops
> Unroll loops whose number of iterations can be determined at compile
> time or upon entry to the loop. @option{-funroll-loops} implies
> @option{-frerun-cse-after-loop}. This option makes code larger,
> and
Hello,
> Steve Ellcey wrote:
>
> >This seems unfortunate. I was hoping I might be able to turn on loop
> >unrolling for IA64 at -O2 to improve performance. I have only started
> >looking into this idea but it seems to help performance quite a bit,
> >though it is also increasing size quite a bi
Hello,
> Well, the target architecture is actually quite peculiar, it's a
> parallel SPMD machine. The only similarity with MIPS is the ISA. The
> latency I'm trying to hide is somewhere around 24 cycles, but because it
> is a parallel machine, up to 1024 threads have to stall for 24 cycles in
Hello,
> 2. Right now I am inserting a __builting_prefetch(...) call immediately
> before the actual read, getting something like:
> D.1117_12 = &A[D.1101_14];
> __builtin_prefetch (D.1117_12, 0, 1);
> D.1102_16 = A[D.1101_14];
>
> However, if I enable the instruction scheduler pass, it doesn
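For reference, the documented signature is __builtin_prefetch (addr, rw, locality): the 0 above requests a prefetch for reading and the 1 requests low temporal locality, which matches the intent of prefetching just ahead of the load.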
Hello,
> >Remarks:
> >-- it would be guaranteed that the indices of each memory reference are
> > independent, i.e., that &ref[idx1][idx2] == &ref[idx1'][idx2'] only
> > if idx1 == idx1' and idx2 == idx2'; this is important for dependency
> > analysis (and for this reason we also need to reme
Hello,
> >> >> That is, unless we could share most of the index struct (upper,
> >> >> lower, step) among expressions that access them (IE make index be
> >> >> immutable, and require unsharing and resharing if you want to modify
> >> >> the expression).
> >> >
> >> >That appears a bit dangerous
Hello,
> >at the moment, any pass that needs to process memory references is
> >complicated (or restricted to handling just a limited set of cases) by
> >the need to interpret the quite complex representation of memory
> >references that we have in gimple. For example, there are about 1000 of
>
Hello,
> > -- base of the reference
> > -- constant offset
> > -- vector of indices
> > -- type of the accessed location
> > -- original tree of the memory reference (or another summary of the
> > structure of the access, for aliasing purposes)
> > -- flags
>
> What do you do with Ada COMPO
Hello,
> >> >-- flags
> >> >
> >> >for each index, we remember
> >> >-- lower and upper bound
> >> >-- step
> >> >-- value of the index
> >>
> >> This seems like a lot; however, since most of it can be derived from the
> >> types, why are we also keeping it in the references?
> >
> >The lower bound and
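A rough C sketch of a record with the fields listed in the excerpts above (my own reconstruction, not code from the thread):

  struct mem_ref_index
  {
    tree lower, upper;              /* lower and upper bound of the index */
    tree step;                      /* step */
    tree value;                     /* value of the index */
  };

  struct mem_ref_desc
  {
    tree base;                      /* base of the reference */
    tree offset;                    /* constant offset */
    struct mem_ref_index *indices;  /* vector of indices */
    unsigned n_indices;
    tree type;                      /* type of the accessed location */
    tree orig_ref;                  /* original tree, kept for aliasing */
    unsigned flags;
  };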
Hello,
> >Proposal:
> >
> >For each memory reference, we remember the following information:
> >
> >-- base of the reference
> >-- constant offset
> >-- vector of indices
> >-- type of the accessed location
> >-- original tree of the memory reference (or another summary of the
> > structure o
Hello,
> >> This looks like a very complicated (though very generic) way of
> >> specifying a memory
> >> reference. Last time we discussed this I proposed to just have BASE,
> >OFFSET
> >> and accessed TYPE (and an alias tag of the memory reference). I realize
> >> this
> >> doesn't cover acce
Hello,
> This looks like a very complicated (though very generic) way of
> specifying a memory
> reference. Last time we discussed this I proposed to just have BASE, OFFSET
> and accessed TYPE (and an alias tag of the memory reference). I realize
> this
> doesn't cover accesses to multi-dimensi
Hello,
> This looks like a very complicated (though very generic) way of
> specifying a memory
> reference. Last time we discussed this I proposed to just have BASE, OFFSET
> and accessed TYPE (and an alias tag of the memory reference). I realize
> this
> doesn't cover accesses to multi-dimensi
Hello,
at the moment, any pass that needs to process memory references is
complicated (or restricted to handling just a limited set of cases) by
the need to interpret the quite complex representation of memory
references that we have in gimple. For example, there are about 1000
lines of quite
Hello,
> > only gimple_vals (name or invariant). However, the expressions are
> > matched in the final_cleanup dump (after out-of-ssa and TER), so this no
> > longer is the case. I think just the regular expressions need to be
> > updated.
>
> Then IV-OPTs has an issue too but IV-OPTs dump gives:
> D