On 11/27/13 15:31, Wei Mi wrote:
Hmm, maybe attack from the other direction? -- could we clear SCHED_GROUP_P
for each insn at the start of this loop in sched_analyze?
It's not as clean in the sense that SCHED_GROUP_P "escapes" the scheduler,
but it might be an option.
for (insn = head;; insn = NEXT_INSN (insn))
On Tue, Nov 26, 2013 at 9:34 PM, Jeff Law wrote:
> On 11/26/13 12:33, Wei Mi wrote:
> >
> > On Mon, Nov 25, 2013 at 2:12 PM, Jeff Law wrote:
Doing the cleanup at the end of BB could ensure all the groups
inserted for macrofusion will be cleaned. For groups not at the end of
a block, no matter whether they are cleaned up or not, nothing will
happen because other passes will not mess up those groups -- you said
cc0-setter/cc0-user was
On Mon, Nov 25, 2013 at 11:25 AM, Jeff Law wrote:
> On 11/25/13 12:16, Wei Mi wrote:
I'll note you're doing an extra pass over all the RTL here. Is there any
clean way you can clean SCHED_GROUP_P without that extra pass over the RTL?
Perhaps when the group actually gets scheduled?
jeff
With your help to understand that sched group will not
On Mon, Nov 25, 2013 at 2:08 AM, Alexander Monakov wrote:
On Sat, 23 Nov 2013, Wei Mi wrote:
> For the failed testcase, it was compiled using -fmodulo-sched.
> modulo-sched phase set SCHED_GROUP_P of a jump insn to be true, which
> means the jump insn should be scheduled with prev insn as a group.
SMS doesn't set SCHED_GROUP_P by itself; did you mean tha
Sorry about the problem.
For the failed testcase, it was compiled using -fmodulo-sched.
modulo-sched phase set SCHED_GROUP_P of a jump insn to be true, which
means the jump insn should be scheduled with prev insn as a group.
When modulo scheduling is finished, the flag of SCHED_GROUP_P is not
clea
On Mon, Nov 4, 2013 at 1:51 PM, Wei Mi wrote:
> Thanks! The three patches are committed as r204367, r204369 and r204371.
>
r204369 caused:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59020
--
H.J.
Thanks! The three patches are committed as r204367, r204369 and r204371.
Regards,
Wei Mi.
On Sun, Nov 3, 2013 at 5:18 PM, Jan Hubicka wrote:
> Ping. Is it ok for x86 maintainer?
I thought I already approved the x86 bits.
>
> Thanks,
> Wei Mi.
>
> On Wed, Oct 16, 2013 at 4:25 PM, Wei Mi wrote:
> >> Go ahead and consider that pre-approved. Just send it to the list with a
> >> note that I approved it in this thread.
> >>
> >> Jeff
> >
Ping. Is it ok for x86 maintainer?
Thanks,
Wei Mi.
On Wed, Oct 16, 2013 at 4:25 PM, Wei Mi wrote:
>> Go ahead and consider that pre-approved. Just send it to the list with a
>> note that I approved it in this thread.
>>
>> Jeff
>
> Thanks! The new patch addressed Jeff's comments.
>
> Is it ok
On Thu, Oct 17, 2013 at 12:35 AM, Marek Polacek wrote:
On Wed, Oct 16, 2013 at 04:25:58PM -0700, Wei Mi wrote:
> +/* Return true if target platform supports macro-fusion.  */
> +
> +static bool
> +ix86_macro_fusion_p ()
> +{
> +  if (TARGET_FUSE_CMP_AND_BRANCH)
> +    return true;
> +  else
> +    return false;
> +}
That looks weird, why not just
static bool
ix86_macro_fusion_p ()
{
  return TARGET_FUSE_CMP_AND_BRANCH;
}
> Go ahead and consider that pre-approved. Just send it to the list with a
> note that I approved it in this thread.
>
> Jeff
Thanks! The new patch addressed Jeff's comments.
Is it ok for x86 maintainer?
Thanks,
Wei Mi.
2013-10-16  Wei Mi
	* gcc/config/i386/i386.c (memory_address_len
On 10/15/13 15:30, Wei Mi wrote:
Aren't you just trying to see if we have a comparison feeding the
conditional jump and if they're already adjacent? Do you actually need to
get the condition code regs to do that test?
Yes, I am trying to see if we have a comparison feeding the
conditional
Thanks for the comments. One question inlined. Preparing another patch
addressing the comments.
Regards,
Wei Mi.
On Tue, Oct 15, 2013 at 1:35 PM, Jeff Law wrote:
> On 10/03/13 12:24, Wei Mi wrote:
Thanks,
Wei Mi.
2013-10-03  Wei Mi
	* gcc/config/i386/i386.c (memory_address_length): Extract a part
	of code to rip_relative_addr_p.
	(rip_relative_addr_p): New function.
	(ix86_macro_fusion_p): Ditto.
	(ix86_macro_fusi
On Tue, Sep 24, 2013 at 4:32 PM, Wei Mi wrote:
>>> It doesn't look right. IP relative address is only possible
>>> with TARGET_64BIT and
>>>
>>> 1. base == pc. Or
>>> 2. UNSPEC_PCREL, UNSPEC_GOTPCREL, and
>>> UNSPEC_GOTNTPOFF.
>>
>> Target 64bit should be tested above. We however output RIP add
Wednesday, September 25, 2013 2:12 AM
To: Wei Mi
Cc: Jan Hubicka; Alexander Monakov; Steven Bosscher; GCC Patches; David Li;
Kirill Yukhin
Subject: Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
On Tue, Sep 24, 2013 at 12:06 PM, Wei Mi wrote:
> This is the updated pat
>> It doesn't look right. IP relative address is only possible
>> with TARGET_64BIT and
>>
>> 1. base == pc. Or
>> 2. UNSPEC_PCREL, UNSPEC_GOTPCREL, and
>> UNSPEC_GOTNTPOFF.
>
> Target 64bit should be tested above. We however output RIP addresses
> also for basic symbol references. I.e. when ba
> > +  gcc_assert (ok);
> > +  base = parts.base;
> > +  index = parts.index;
> > +  disp = parts.disp;
> > +
> > +  if (TARGET_64BIT && !base && !index)
> > +    {
> > +      rtx symbol = disp;
> > +
> > +      if (GET_CODE (disp) == CONST
> > +	  && GET_CODE (XEXP (disp, 0)) == PLUS
> > +
This is the updated patch2.
Changed:
1. For cmp/test with rip-relative addressing mem operand, don't group
insns. Bulldozer also doesn't support fusion for cmp/test with both
displacement MEM and immediate operand, while m_CORE_ALL doesn't
support fusion for cmp/test with MEM and immediate operand.
> You disable fusion for Bulldozer here since you did not add it into
> TARGET_FUSE_CMP_AND_BRANCH_64.
Ok, will add it.
>
> Perhaps we can have TARGET_FUSE_CMP_AND_BRANCH_64 and
> TARGET_FUSE_CMP_AND_BRANCH_32
> plus an macro TARGET_FUSE_CMP_AND_BRANCH that chose corresponding variant
> based
> o
>
> I am not sure if AMD hardware has any limitations here. It fuses only
> cmp/test
This is what Agner Fog says:
A CMP or TEST instruction immediately followed by a conditional jump can be
fused into a single macro-op. This applies to all versions of the CMP and TEST
instructions and all cond
> 2013-09-16  Wei Mi
>
> 	* gcc/config/i386/i386-c.c (ix86_target_macros_internal): Separate
> 	PROCESSOR_COREI7_AVX out from PROCESSOR_COREI7.
> 	* gcc/config/i386/i386.c (ix86_option_override_internal): Ditto.
> 	(ix86_issue_rate): Ditto.
> 	(ia32_multipass_d
>> Just notice another problem here:
>> processor_type only contains PROCESSOR_COREI7, so I cannot
>> differentiate Westmere and Sandybridge in x86-tune.def, which are
>> different for TARGET_FUSE_ALU_AND_BRANCH. So do I have to separate
>> m_SANDYBRIDGE out from m_COREI7?
>
> Yes, please.
>
> Than
Wei Mi writes:
>> Checking corei7/corei7-avx explicitly isn't a good idea.
>> It is also useful for Ivy Bridge and Haswell. I think you
>> should use a variable to control it, similar to
>> TARGET_FUSE_CMP_AND_BRANCH.
>>
>>
>> --
>> H.J.
>
> Different x86 microarchitectures support macro-fusion
> Checking corei7/corei7-avx explicitly isn't a good idea.
> It is also useful for Ivy Bridge and Haswell. I think you
> should use a variable to control it, similar to
> TARGET_FUSE_CMP_AND_BRANCH.
>
>
> --
> H.J.
Different x86 microarchitectures support macro-fusion for different
compare and br
> Thanks. At this point you need feedback from x86 and scheduler maintainers.
> I would recommend you to resubmit the patch with a Changelog text, and with
> the text of the patch inline in the email (your last mail has the patch as a
> binary attachment, which makes it harder to review and respon
On Thu, 12 Sep 2013, Wei Mi wrote:
> Thanks, fixed. New patch attached.
Thanks. At this point you need feedback from x86 and scheduler maintainers.
I would recommend you to resubmit the patch with a Changelog text, and with
the text of the patch inline in the email (your last mail has the patch
> Your new implementation is not efficient: when looping over BBs, you need to
> look only at the last insn of each basic block.
>
Thanks, fixed. New patch attached.
On Wed, 11 Sep 2013, Wei Mi wrote:
> I agree with you that explicit handling in sched-deps.c for this
> feature looks not good. So I move it to sched_init (Instead of
> ix86_sched_init_global because ix86_sched_init_global is used to
> install scheduling hooks), and then it is possible for other
Thanks! Your method to adjust 'last' is more concise. I try it and it
works for small testcases. bootstrap and regression are ok. More
performance test is going on.
I agree with you that explicit handling in sched-deps.c for this
feature looks not good. So I move it to sched_init (Instead of
ix86_
On Wed, Sep 4, 2013 at 12:33 PM, Alexander Monakov wrote:
> On Wed, Sep 4, 2013 at 9:53 PM, Steven Bosscher wrote:
>>
>> On Wed, Sep 4, 2013 at 10:58 AM, Alexander Monakov wrote:
>> > Hello,
>> >
>> > Could you use the existing facilities instead, such as adjust_priority
>> > hook,
>> > or makin
Taking the same issue slot is not enough for x86. The compare and
branch need to be consecutive in binary to be macro-fused on x86.
Thanks,
Wei Mi.
On Wed, Sep 11, 2013 at 10:45 AM, Andrew Pinski wrote:
> On Wed, Sep 4, 2013 at 12:33 PM, Alexander Monakov wrote:
>> On Wed, Sep 4, 2013 at 9:53 P
On Wed, 11 Sep 2013, Wei Mi wrote:
> I tried that and it caused some regressions, so I chose to do
> chain_to_prev_insn another time in add_branch_dependences. There could
> be some dependence between those two functions.
(please don't top-post on this list)
In that case you can adjust 'last
I tried that and it caused some regressions, so I chose to do
chain_to_prev_insn another time in add_branch_dependences. There could
be some dependence between those two functions.
On Wed, Sep 11, 2013 at 2:58 AM, Alexander Monakov wrote:
>
>
> On Tue, 10 Sep 2013, Wei Mi wrote:
>
>> Because de
On Tue, 10 Sep 2013, Wei Mi wrote:
> Because deps_analyze_insn only analyzes data deps but no control deps.
> Control deps are included by add_branch_dependences. Without the
> chain_to_prev_insn in the end of add_branch_dependences, jmp will be
> control dependent on every previous insn in the
Because deps_analyze_insn only analyzes data deps but no control deps.
Control deps are included by add_branch_dependences. Without the
chain_to_prev_insn in the end of add_branch_dependences, jmp will be
control dependent on every previous insn in the same bb, and the cmp
and jmp group could still
On Fri, 6 Sep 2013, Wei Mi wrote:
> SCHED_GROUP works after I add chain_to_prev_insn after
> add_branch_dependences, in order to chain control dependences to prev
> insn for sched group.
chain_to_prev_insn is done in the end of deps_analyze_insn, why is that not
sufficient?
Alexander
Add a testcase. bootstrap and regression ok for the patch in last mail.
2013-09-09  Wei Mi
	* gcc/testsuite/gcc.dg/macro-fusion-1.c: New.
Index: gcc/testsuite/gcc.dg/macro-fusion-1.c
===
--- gcc/testsuite/gcc.dg/macro-fusio
SCHED_GROUP works after I add chain_to_prev_insn after
add_branch_dependences, in order to chain control dependences to prev
insn for sched group. Here is the new patch. Testing is going on.
Thanks,
Wei Mi.
2013-09-06  Wei Mi
	* config/i386/i386.c (ix86_macro_fusion_p): New function.
Thanks for the suggestions! I take a look at adjust_priority, and find
it may not guarantee to schedule cmp and jmp together. The priority is
used to choose a candidate from ready list. If cmp is the only insn in
ready list and there is another insn-A in queued set (insn-A's
dependence has been res
On Wed, Sep 4, 2013 at 9:53 PM, Steven Bosscher wrote:
>
> On Wed, Sep 4, 2013 at 10:58 AM, Alexander Monakov wrote:
> > Hello,
> >
> > Could you use the existing facilities instead, such as adjust_priority hook,
> > or making the compare-branch insn sequence a SCHED_GROUP?
>
>
> Or a define_bypas
On Wed, Sep 4, 2013 at 10:58 AM, Alexander Monakov wrote:
> Hello,
>
> Could you use the existing facilities instead, such as adjust_priority hook,
> or making the compare-branch insn sequence a SCHED_GROUP?
Or a define_bypass?
Ciao!
Steven
Hello,
Could you use the existing facilities instead, such as adjust_priority hook,
or making the compare-branch insn sequence a SCHED_GROUP?
Alexander