Pulling this one back as I have a better solution, patch coming shortly.
Thanks,
Teresa
On Fri, Mar 16, 2012 at 3:33 PM, Teresa Johnson wrote:
>
> Ping - now that stage 1 is open, could someone review?
>
> Thanks,
> Teresa
>
> On Sun, Dec 4, 2011 at 10:26 PM, Teresa Johnson wrote:
> > Latest pa
Ping - now that stage 1 is open, could someone review?
Thanks,
Teresa
On Sun, Dec 4, 2011 at 10:26 PM, Teresa Johnson wrote:
> Latest patch which improves the efficiency as described below is
> included here. Bootstrapped and checked again with
> x86_64-unknown-linux-gnu. Could someone review?
>
The patch is good for google branches for now while waiting for upstream review.
David
On Sun, Dec 4, 2011 at 10:26 PM, Teresa Johnson wrote:
> Latest patch which improves the efficiency as described below is
> included here. Bootstrapped and checked again with
> x86_64-unknown-linux-gnu. Could s
Latest patch which improves the efficiency as described below is
included here. Bootstrapped and checked again with
x86_64-unknown-linux-gnu. Could someone review?
Thanks,
Teresa
2011-12-04 Teresa Johnson
* loop-unroll.c (decide_unroll_constant_iterations): Call loop
unroll tar
On Fri, Dec 2, 2011 at 11:59 AM, Xinliang David Li wrote:
>>
>> +/* Determine whether LOOP contains floating-point computation. */
>> +bool
>> +loop_has_FP_comp(struct loop *loop)
>> +{
>> + rtx set, dest;
>
> This probably should be extended to detect other long latency
> operations in the f
On Fri, Dec 2, 2011 at 11:36 AM, Andi Kleen wrote:
> Teresa Johnson writes:
>
> Interesting optimization. I would be concerned a little bit
> about compile time, does it make a measurable difference?
I haven't measured compile time explicitly, but I don't think it should,
especially after I address yo
On Fri, Dec 2, 2011 at 11:36 AM, Andi Kleen wrote:
> Teresa Johnson writes:
>
> Interesting optimization. I would be concerned a little bit
> about compile time, does it make a measurable difference?
>
>> The attached patch detects loops containing instructions that tend to
>> incur high LCP (loo
>
> +/* Determine whether LOOP contains floating-point computation. */
> +bool
> +loop_has_FP_comp(struct loop *loop)
> +{
> + rtx set, dest;
This probably should be extended to detect other long latency
operations in the future.
> +
> + if (ix86_tune != PROCESSOR_COREI7_64 &&
> + ix86_
Teresa Johnson writes:
Interesting optimization. I would be concerned a little bit
about compile time, does it make a measurable difference?
> The attached patch detects loops containing instructions that tend to
> incur high LCP (length-changing prefix) stalls on Core i7, and limits
> their unrol
Thanks, Andreas. You are right in that fully peeling a loop is done by
a different code path (peel_loops_completely() and earlier in the tree
unroller).
Teresa
On Fri, Dec 2, 2011 at 12:54 AM, Andreas Krebbel
wrote:
> On Thu, Dec 01, 2011 at 11:39:36PM -0800, Teresa Johnson wrote:
>> To do this
On Thu, Dec 01, 2011 at 11:39:36PM -0800, Teresa Johnson wrote:
> To do this I leveraged the existing TARGET_LOOP_UNROLL_ADJUST target
> hook, which was previously only defined for s390. I added one
> additional call to this target hook, when unrolling for constant trip
> count loops. Previously it
The attached patch detects loops containing instructions that tend to
incur high LCP (length-changing prefix) stalls on Core i7, and limits
their unroll factor to try to keep the unrolled loop body small enough
to fit in the Core i7's loop stream detector, which can hide LCP stalls
in loops.
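
To make the idea concrete, here is a rough, standalone sketch of the
clamping described above. It is not the code from the patch: the
struct, the function name, and the capacity parameter are all
hypothetical, and the size units are whatever the caller uses to
estimate the loop body.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of a loop; this is not a GCC data structure.  */
struct loop_summary
{
  unsigned body_size;   /* Estimated size of one iteration.  */
  bool has_lcp_insn;    /* Body contains an LCP-stall-prone instruction.  */
};

/* Return the unroll factor to use for LOOP.  NUNROLL is the factor the
   unroller would otherwise pick; CAPACITY is an assumed size limit for
   the loop stream detector, in the same units as body_size.  Loops
   without LCP-prone instructions are left alone.  */
unsigned
adjust_unroll_factor (unsigned nunroll, const struct loop_summary *loop,
                      unsigned capacity)
{
  unsigned limit;

  assert (loop != NULL);

  if (!loop->has_lcp_insn || loop->body_size == 0)
    return nunroll;

  /* Largest factor that keeps the unrolled body within CAPACITY.  */
  limit = capacity / loop->body_size;
  if (limit < 1)
    limit = 1;

  return nunroll < limit ? nunroll : limit;
}
```

With an assumed capacity of 28 and a body size of 7, a requested factor
of 8 would be reduced to 4, while a loop with no LCP-prone instructions
keeps its original factor.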
To do thi