Re: How to make use of instruction scheduling to improve performance?
2007/7/28, Ramana Radhakrishnan <[EMAIL PROTECTED]>:
> Hi,
>
> On 7/28/07, 吴曦 <[EMAIL PROTECTED]> wrote:
> > > > > > I am working on gcc 4.1.1 and itanium2 architecture. I instrumented
> > > > > > each ld and st instruction in final_scan_insn() by looking at the insn
> > > > > > template (These instrumentations are used to do some security checks).
> > > > > > These instrumentations incur high performance overhead when running
> > > > > > specint benchmarks. However, these instrumentations contain high
> > > > > > dependencies between instructions so that I want to use instruction
> > > > > > scheduling to improve the performance.
> > > > > > In the current implementation, the instrumentations are emitted as
> > > > > > assembly instructions (not insns). What should I do to make use of the
> > > > > > instruction scheduler?
> > > > >
> > > > > If I understand your description, you are adding instrumentation code,
> > > > > and you want to expose that code to the scheduler. What you need to
> > > > > do in that case is to add the code as RTL instructions before the
> > > > > scheduling pass runs. You will need to figure out the RTL which will
> > > > > do what you want. Then you will need to insert it around the
> > > > > instructions which you want to instrument. You will probably want to
> > > >
> > > > Before the second scheduling pass, how to identify that one insn will
> > > > be output as a load instruction (or store instruction)? In the final,
> > > > i use get_insn_template() to do this matching. Can I use the same
> > > > method before the second scheduling pass? If not, would you mind
> > > > giving some hints? thx
> > >
> > > Please send followups to the mailing list, not just to me. Thanks.
> > >
> > > You should just match on the RTL. I don't know enough about the
> > > Itanium to tell you precisely what to look for. But, for example, you
> > > might look for
> > >   s = single_set (insn);
> > >   if (s != NULL && (MEM_P (SET_SRC (s)) || MEM_P (SET_DEST (s))))
> > >     ...
> > >
> > > Ian
> >
> > Thanks. I observe that the 2nd instruction scheduling happens after
> > the local and global allocation. However, in my instrumentation, I
> > need several registers to do computation, can I allocate registers to
> > do computation in the instrumentation code just before the 2nd
> > instruction scheduling? If so, would you mind giving some hints on the
> > interfaces that I could make use of.
>
> Generally you should be able to create new temporaries for such
> calculations before register allocation / reload. Otherwise you might
> have to resort to reserving a couple of registers in your ABI for such
> computations if you wanted these generated after reload (you could
> have a split that did that after reload but where in the function do
> you want to insert the instrumentation code?)
>
> From what you are indicating - there isn't enough detail about where
> in the function body you are inserting such instrumentation code -

Thanks. As I have indicated, I want to add instrumentation for each
ld and st instruction in a function on Itanium. (In my current
implementation, I also instrument cmp and mv instructions on Itanium.)
For example, for a ld instruction in the original program:

    ld rX=[rY]

I want to instrument it as

    instrumentation prologue
    ld rX=[rY]
    instrumentation epilogue

Currently, to identify such a ld instruction, I put my instrumentation
in final, and use get_insn_template() to see what instruction this
insn will be output as.

To summarize, as I want to expose my instrumentation to instruction
scheduling, the following work should be done:
1. identify that an insn will be output as a ld instruction
2. allocate registers to do the instrumentation calculation (in my
   current implementation, I use dedicated registers to do this.)
3. emit the prepared instrumentation insns

> If you are doing such instrumentation in the prologue or epilogue of a
> function, you could choose to use gen_reg_rtx to obtain a temporary
> register.
>
> So typically obtain a temporary register in the following manner
>   rtx tmp_reg = gen_reg_rtx (mode);
>
> Use the tmp_reg in whatever instruction you want to generate using the
> corresponding register as one of the operands. For these you might
> want to use the corresponding gen_*** named functions.
>
> cheers
> Ramana
>
> > Besides, what happens if I move the insertion of instrumentation
> > before register allocation, or even before the 1st scheduling pass,
> > can I identify load/store i
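
The three steps summarized in the message above (identify the memory
access, obtain a temporary register, emit the instrumentation around
it) can be sketched on a toy model. This is only an illustration, not
GCC code: every toy_* name, the struct layout and the starting pseudo
number 1000 are hypothetical stand-ins for RTL, for the MEM_P test on
the source or destination of a single set, and for gen_reg_rtx. Real
RTL is considerably richer than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for RTL operands. These are NOT GCC's real types. */
enum toy_kind { TOY_REG, TOY_MEM, TOY_CONST };

struct toy_operand {
    enum toy_kind kind;
    int regno; /* register number (TOY_REG) or base register (TOY_MEM) */
};

/* A toy "single set" insn: dest = src, linked into a chain. */
struct toy_insn {
    struct toy_operand dest;
    struct toy_operand src;
    int is_instrumentation;
    struct toy_insn *prev;
    struct toy_insn *next;
};

/* Hypothetical counterpart of gen_reg_rtx: hand out fresh pseudo numbers. */
int toy_next_pseudo = 1000;

int toy_new_pseudo(void)
{
    return toy_next_pseudo++;
}

/* Step 1: a load reads memory, a store writes memory. */
int toy_is_load(const struct toy_insn *insn)
{
    return !insn->is_instrumentation && insn->src.kind == TOY_MEM;
}

int toy_is_store(const struct toy_insn *insn)
{
    return !insn->is_instrumentation && insn->dest.kind == TOY_MEM;
}

struct toy_insn *toy_make_insn(struct toy_operand dest,
                               struct toy_operand src,
                               int is_instrumentation)
{
    struct toy_insn *insn = calloc(1, sizeof *insn);
    if (insn == NULL)
        abort();
    insn->dest = dest;
    insn->src = src;
    insn->is_instrumentation = is_instrumentation;
    return insn;
}

void toy_link_before(struct toy_insn *pos, struct toy_insn *insn)
{
    insn->prev = pos->prev;
    insn->next = pos;
    if (pos->prev != NULL)
        pos->prev->next = insn;
    pos->prev = insn;
}

void toy_link_after(struct toy_insn *pos, struct toy_insn *insn)
{
    insn->next = pos->next;
    insn->prev = pos;
    if (pos->next != NULL)
        pos->next->prev = insn;
    pos->next = insn;
}

/* Steps 2 and 3: wrap every load/store in a prologue and an epilogue
   that share one fresh temporary. Returns the number of wrapped insns. */
int toy_instrument_chain(struct toy_insn **head)
{
    int count = 0;
    struct toy_insn *insn = *head;

    while (insn != NULL) {
        /* Remember the original successor so new insns are skipped. */
        struct toy_insn *next = insn->next;

        if (toy_is_load(insn) || toy_is_store(insn)) {
            int base = toy_is_load(insn) ? insn->src.regno : insn->dest.regno;
            struct toy_operand tmp = { TOY_REG, toy_new_pseudo() };
            struct toy_operand addr = { TOY_REG, base };
            struct toy_operand zero = { TOY_CONST, 0 };
            struct toy_insn *pro = toy_make_insn(tmp, addr, 1);
            struct toy_insn *epi = toy_make_insn(tmp, zero, 1);

            toy_link_before(insn, pro);
            if (insn == *head)
                *head = pro;
            toy_link_after(insn, epi);
            count++;
        }
        insn = next;
    }
    return count;
}

void toy_free_chain(struct toy_insn *head)
{
    while (head != NULL) {
        struct toy_insn *next = head->next;
        free(head);
        head = next;
    }
}
```

Calling toy_instrument_chain on a chain holding a register move, a
load and a store leaves the move untouched and wraps the other two.
In GCC itself the equivalent walk would presumably run as its own
step before the scheduling pass and before register allocation, so
that the temporaries can be ordinary pseudos.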
Re: How to make use of instruction scheduling to improve performance?
"吴曦" <[EMAIL PROTECTED]> writes:

> there are some questions after I read the source code today.
> 1st. if I add the instrumentation before 2nd scheduling; will gcc emit
> an insn which will be output as a ld instruction later? If this could
> happen, some ld instruction may not be instrumented...

No, gcc won't introduce any new memory load or store instructions
after the prologue and epilogue instructions are threaded. It may
still move them around or eliminate them, though.

> 2nd. to identify ld/st instruction (memory access op), I want to
> modify gen_rtx_SET, the method is that, if I find SRC or DST is an
> memory operand in gen_rtx_SET, then add instrumentation code before
> and after the insn to emit. Will this method work? Besides, if some
> false positives occur, how to correct them (I don't have some very
> clear idea.)

Modifying gen_rtx_SET is probably not the right way to go. That is
used in many places throughout the RTL passes. Not all of those
places are going to be able to cope with the new instructions you want
to add.

Ian
Re: How to make use of instruction scheduling to improve performance?
28 Jul 2007 09:04:01 -0700, Ian Lance Taylor <[EMAIL PROTECTED]>:
> "吴曦" <[EMAIL PROTECTED]> writes:
>
> > there are some questions after I read the source code today.
> > 1st. if I add the instrumentation before 2nd scheduling; will gcc emit
> > an insn which will be output as a ld instruction later? If this could
> > happen, some ld instruction may not be instrumented...
>
> No, gcc won't introduce any new memory load or store instructions
> after the prologue and epilogue instructions are threaded. It may

when are prologue and epilogue instructions threaded? (after register
allocation? besides, what is the exact meaning of "prologue and
epilogue instructions are threaded"? Would you mind explaining in more
detail? thx :-))

> still move them around or eliminate them, though.

emmm, I need to move/remove my instrumentation if necessary...

> > 2nd. to identify ld/st instruction (memory access op), I want to
> > modify gen_rtx_SET, the method is that, if I find SRC or DST is an
> > memory operand in gen_rtx_SET, then add instrumentation code before
> > and after the insn to emit. Will this method work? Besides, if some
> > false positives occur, how to correct them (I don't have some very
> > clear idea.)
>
> Modifying gen_rtx_SET is probably not the right way to go. That is

Then, what about modifying machine description file? Add define_expand
for the define_insn which will output ld/st instruction (this
define_expand can insert instrumentation insns. Of course, I need to
identify the operands to the define_expand contains a memory operand
and a reg operand.)

> used in many places throughout the RTL passes. Not all of those
> places are going to be able to cope with the new instructions you want
> to add.
>
> Ian

Thanks for your hints again :-)
Re: GCC 4.2.1 : bootstrap fails at stage 2. compiler produces wrong binary for wrong processor
> The default cpu is v8plus.

v9 actually, which automatically enables the V8+ stuff in 32-bit mode.

-- 
Eric Botcazou
Re: How to make use of instruction scheduling to improve performance?
"吴曦" <[EMAIL PROTECTED]> writes:

> 28 Jul 2007 09:04:01 -0700, Ian Lance Taylor <[EMAIL PROTECTED]>:
> > "吴曦" <[EMAIL PROTECTED]> writes:
> >
> > > there are some questions after I read the source code today.
> > > 1st. if I add the instrumentation before 2nd scheduling; will gcc emit
> > > an insn which will be output as a ld instruction later? If this could
> > > happen, some ld instruction may not be instrumented...
> >
> > No, gcc won't introduce any new memory load or store instructions
> > after the prologue and epilogue instructions are threaded. It may
>
> when are prologue and epilogue instructions threaded? (after register
> allocation? besides, what is the exact meaning of "prologue and
> epilogue instructions are threaded"? Would you mind explaining in more
> detail? thx :-))

If you look in gcc/passes.c you will see the list of passes. The
prologue and epilogue instructions are threaded in
pass_thread_prologue_and_epilogue. This happens after register
allocation. It means that the prologue and epilogue instructions are
added to the RTL, so that the second scheduling pass can see them.

> > still move them around or eliminate them, though.
>
> emmm, I need to move/remove my instrumentation if necessary...

Yes. This is true by definition, since you want to instrument before
the second scheduling pass. The scheduler can and will move load and
store instructions. You need to set up the dependencies so that your
instrumentation will still occur at the right time.

> > > 2nd. to identify ld/st instruction (memory access op), I want to
> > > modify gen_rtx_SET, the method is that, if I find SRC or DST is an
> > > memory operand in gen_rtx_SET, then add instrumentation code before
> > > and after the insn to emit. Will this method work? Besides, if some
> > > false positives occur, how to correct them (I don't have some very
> > > clear idea.)
> >
> > Modifying gen_rtx_SET is probably not the right way to go. That is
>
> Then, what about modifying machine description file? Add define_expand
> for the define_insn which will output ld/st instruction (this
> define_expand can insert instrumentation insns. Of course, I need to
> identify the operands to the define_expand contains a memory operand
> and a reg operand.)

That will work in some sense, but if a load or store instruction is
eliminated you are quite likely to still have the instrumentation
instructions lying around.

Ian
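
The two hazards raised in this exchange (the scheduler reordering
around a load, and a load being eliminated while its instrumentation
stays behind) can be made concrete with a small checker. This is a
sketch on made-up data, not anything from GCC: the toy_dep struct,
the integer insn ids and both function names are assumptions for
illustration. In GCC the scheduler works out dependencies from the
RTL itself, so in practice the instrumentation insns would need to
read or write something that ties them to the access they guard.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dependence edge: insn `before` must be issued
   earlier than insn `after`. Insns are plain integer ids here. */
struct toy_dep {
    int before;
    int after;
};

/* Position of insn `id` in the schedule, or -1 if it is absent. */
int toy_position(const int *order, size_t n, int id)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (order[i] == id)
            return (int)i;
    }
    return -1;
}

/* Returns 1 if the schedule honours every edge. An edge whose
   endpoint is missing (for example an eliminated load that left
   its instrumentation behind) counts as a violation. */
int toy_schedule_respects_deps(const int *order, size_t n,
                               const struct toy_dep *deps, size_t ndeps)
{
    size_t i;

    for (i = 0; i < ndeps; i++) {
        int pb = toy_position(order, n, deps[i].before);
        int pa = toy_position(order, n, deps[i].after);

        if (pb < 0 || pa < 0 || pb >= pa)
            return 0;
    }
    return 1;
}
```

With ids 0 (instrumentation prologue), 1 (the load), 2 (instrumentation
epilogue) and 3 (an unrelated insn), the edges 0 -> 1 and 1 -> 2 accept
any schedule that keeps the three in that relative order, wherever insn
3 ends up, and reject a schedule in which the load has disappeared.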
Re: GCC 4.2.1 : bootstrap fails at stage 2. compiler produces wrong binary for wrong processor
>> The default cpu is v8plus.
>
> v9 actually, which automatically enables the V8+ stuff in 32-bit mode.

That isn't what I see here. The output binary was definitely for a
v8plus processor. That would be an UltraSparc 1 at the least.

ELF Header
  ei_magic:   { 0x7f, E, L, F }
  ei_class:   ELFCLASS32          ei_data:      ELFDATA2MSB
  e_machine:  EM_SPARC32PLUS      e_version:    EV_CURRENT
  e_type:     ET_EXEC
  e_flags:    [ EF_SPARC_32PLUS ]
  e_entry:    0x121a8    e_ehsize:     52    e_shstrndx:  20
  e_shoff:    0x1ab50    e_shentsize:  40    e_shnum:     21
  e_phoff:    0x34       e_phentsize:  32    e_phnum:     5

so .. there you have it.

Dennis
Re: gcc register allocation
"Purll, Duncan" <[EMAIL PROTECTED]> writes:

> DISCLAIMER:
> Unless indicated otherwise, the information contained in this message is
> privileged and confidential, and is intended only for the use of the
> addressee(s) named above and others who have been specifically authorized to
> receive it. If you are not the intended recipient, you are hereby notified
> that any dissemination, distribution or copying of this message and/or
> attachments is strictly prohibited. The company accepts no liability for any
> damage caused by any virus transmitted by this email. Furthermore, the
> company does not warrant a proper and complete transmission of this
> information, nor does it accept liability for any delays. If you have
> received this message in error, please contact the sender and delete the
> message. Thank you.

Please do not send e-mail messages with this sort of disclaimer to
[EMAIL PROTECTED]. These disclaimers are prohibited by list policy,
which can be found at http://gcc.gnu.org/lists.html.

If you are unable to disable the disclaimer from your account, I
recommend using a free web-based e-mail account.

Thanks.

Ian
Re: Creating gcc-newbies mailing list
On Fri, 27 Jul 2007, Rask Ingemann Lambertsen wrote:
> This part of the documentation is fragmented in a way such that I
> sometimes can't find what I'm looking for, even if I know it is there
> (somewhere). For example, when it comes to submitting patches, we have
> <http://gcc.gnu.org/codingconventions.html> and
> <http://gcc.gnu.org/contribute.html> which both say something about
> ChangeLog entries while neither mention the patch tracker. Another
> example is that both <http://gcc.gnu.org/contribute.html> and
> <http://gcc.gnu.org/install/test.html> document how to test GCC, so
> you have to find and read both.

Are there concrete changes you think would make sense?

http://gcc.gnu.org/install/test.html is focused on users, so we will
probably have to have two (complementary) sources on testing, but for
the others, changes will probably be easier.

Gerald
You introduced a memory leak with the IPA-SSA stuff
It used to be that the bitmap obstack known as "alias_bitmap_obstack"
was released and renewed every time we called compute_may_aliases.
This didn't really leak because the absolute last one was destroyed at
the end of compilation.

You changed it to be only released if gimple_aliases_computed_p (cfun).

This, of course, leaks all the bitmaps in that obstack whenever we
change functions, because bitmap_obstack_initialize does not free the
old obstack if it is still in use; it just leaks it.

The code needs to be something like

  if (alias_bitmap_obstack.elements != NULL)
    bitmap_obstack_release (&alias_bitmap_obstack);
  bitmap_obstack_initialize (&alias_bitmap_obstack);

(or some other appropriate thing :P)

Right now we leak a couple meg per function if they have a lot of
symbols; we'd leak more otherwise.

--Dan
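
The release-before-reinitialize pattern proposed above can be shown on
a toy obstack. This is only a model under stated assumptions: the
toy_* functions and the toy_live_blocks counter are hypothetical, and
the single static marker block stands in for real bitmap storage. The
only behaviour borrowed from the message is that initializing an
obstack that is still in use drops the old contents without freeing
them.

```c
#include <assert.h>
#include <stddef.h>

/* Toy obstack: `elements` is non-NULL while storage is held. */
struct toy_obstack {
    void *elements;
};

/* Number of blocks handed out and not yet released. */
int toy_live_blocks = 0;

/* Marker standing in for real storage, so the demo itself
   never allocates or leaks heap memory. */
static char toy_block;

/* Like the initialize call described above: it simply overwrites
   the handle and does not free anything still attached to it. */
void toy_obstack_initialize(struct toy_obstack *o)
{
    o->elements = NULL;
}

void toy_obstack_alloc(struct toy_obstack *o)
{
    if (o->elements == NULL) {
        o->elements = &toy_block;
        toy_live_blocks++;
    }
}

void toy_obstack_release(struct toy_obstack *o)
{
    if (o->elements != NULL) {
        o->elements = NULL;
        toy_live_blocks--;
    }
}

/* Reinitializing without releasing: the old block is lost. */
void toy_reset_leaky(struct toy_obstack *o)
{
    toy_obstack_initialize(o);
}

/* The suggested shape: release if in use, then initialize. */
void toy_reset_safe(struct toy_obstack *o)
{
    if (o->elements != NULL)
        toy_obstack_release(o);
    toy_obstack_initialize(o);
}
```

After toy_reset_safe the live-block count returns to zero, while
toy_reset_leaky leaves one block counted as live with no handle
pointing at it, which is the per-function leak being described.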
Re: "Proceedings of the GCC Developers' Summit" now available
On Fri, 27 Jul 2007, Diego Novillo wrote:
>> Why not provide a permanent home for the GCC summit proceedings at
>> gcc.gnu.org? It seems the logical place.
> That's what I've done. The .pdf is *in* gcc.gnu.org. The others could
> be sucked in as well. They're now pointing to gccsummit.

Currently I only see the 2003 and 2004 proceedings at
ftp://gcc.gnu.org/pub/gcc/summit/

How about moving everything to one consistent place? Any preferences
on what that place should be?

Gerald
Re: How to make use of instruction scheduling to improve performance?
28 Jul 2007 12:16:51 -0700, Ian Lance Taylor <[EMAIL PROTECTED]>:
> "吴曦" <[EMAIL PROTECTED]> writes:
>
> > 28 Jul 2007 09:04:01 -0700, Ian Lance Taylor <[EMAIL PROTECTED]>:
> > > "吴曦" <[EMAIL PROTECTED]> writes:
> > >
> > > > there are some questions after I read the source code today.
> > > > 1st. if I add the instrumentation before 2nd scheduling; will gcc emit
> > > > an insn which will be output as a ld instruction later? If this could
> > > > happen, some ld instruction may not be instrumented...
> > >
> > > No, gcc won't introduce any new memory load or store instructions
> > > after the prologue and epilogue instructions are threaded. It may
> >
> > when are prologue and epilogue instructions threaded? (after register
> > allocation? besides, what is the exact meaning of "prologue and
> > epilogue instructions are threaded"? Would you mind explaining in more
> > detail? thx :-))
>
> If you look in gcc/passes.c you will see the list of passes. The
> prologue and epilogue instructions are threaded in
> pass_thread_prologue_and_epilogue. This happens after register

Sorry, I didn't find that pass in gcc 4.1.1. This pass is added in the
newest gcc? thx.

> allocation. It means that the prologue and epilogue instructions are

As you have indicated, this pass happens after register allocation, I
want to allocate register rather than dedicating register to do the
instrumentation calculation, are there any hints to do this?

> added to the RTL, so that the second scheduling pass can see them.
>
> > > still move them around or eliminate them, though.
> >
> > emmm, I need to move/remove my instrumentation if necessary...
>
> Yes. This is true by definition, since you want to instrument before
> the second scheduling pass. The scheduler can and will move load and
> store instructions. You need to set up the dependencies so that your
> instrumentation will still occur at the right time.
>
> > > > 2nd. to identify ld/st instruction (memory access op), I want to
> > > > modify gen_rtx_SET, the method is that, if I find SRC or DST is an
> > > > memory operand in gen_rtx_SET, then add instrumentation code before
> > > > and after the insn to emit. Will this method work? Besides, if some
> > > > false positives occur, how to correct them (I don't have some very
> > > > clear idea.)
> > >
> > > Modifying gen_rtx_SET is probably not the right way to go. That is
> >
> > Then, what about modifying machine description file? Add define_expand
> > for the define_insn which will output ld/st instruction (this
> > define_expand can insert instrumentation insns. Of course, I need to
> > identify the operands to the define_expand contains a memory operand
> > and a reg operand.)
>
> That will work in some sense, but if a load or store instruction is
> eliminated you are quite likely to still have the instrumentation
> instructions lying around.
>
> Ian

Thanks for your hints.
Re: How to make use of instruction scheduling to improve performance?
2007/7/29, 吴曦 <[EMAIL PROTECTED]>:
> 28 Jul 2007 12:16:51 -0700, Ian Lance Taylor <[EMAIL PROTECTED]>:
> > "吴曦" <[EMAIL PROTECTED]> writes:
> >
> > > 28 Jul 2007 09:04:01 -0700, Ian Lance Taylor <[EMAIL PROTECTED]>:
> > > > "吴曦" <[EMAIL PROTECTED]> writes:
> > > >
> > > > > there are some questions after I read the source code today.
> > > > > 1st. if I add the instrumentation before 2nd scheduling; will gcc emit
> > > > > an insn which will be output as a ld instruction later? If this could
> > > > > happen, some ld instruction may not be instrumented...
> > > >
> > > > No, gcc won't introduce any new memory load or store instructions
> > > > after the prologue and epilogue instructions are threaded. It may
> > >
> > > when are prologue and epilogue instructions threaded? (after register
> > > allocation? besides, what is the exact meaning of "prologue and
> > > epilogue instructions are threaded"? Would you mind explaining in more
> > > detail? thx :-))
> >
> > If you look in gcc/passes.c you will see the list of passes. The
> > prologue and epilogue instructions are threaded in
> > pass_thread_prologue_and_epilogue. This happens after register
>
> Sorry, I didn't find that pass in gcc 4.1.1. This pass is added in the
> newest gcc?
> thx.
>
> > allocation. It means that the prologue and epilogue instructions are
>
> As you have indicated, this pass happens after register allocation, I
> want to allocate register rather than dedicating register to do the
> instrumentation calculation, are there any hints to do this?
>
> > added to the RTL, so that the second scheduling pass can see them.
> >
> > > > still move them around or eliminate them, though.
> > >
> > > emmm, I need to move/remove my instrumentation if necessary...
> >
> > Yes. This is true by definition, since you want to instrument before
> > the second scheduling pass. The scheduler can and will move load and
> > store instructions. You need to set up the dependencies so that your
> > instrumentation will still occur at the right time.
> >
> > > > > 2nd. to identify ld/st instruction (memory access op), I want to
> > > > > modify gen_rtx_SET, the method is that, if I find SRC or DST is an
> > > > > memory operand in gen_rtx_SET, then add instrumentation code before
> > > > > and after the insn to emit. Will this method work? Besides, if some
> > > > > false positives occur, how to correct them (I don't have some very
> > > > > clear idea.)
> > > >
> > > > Modifying gen_rtx_SET is probably not the right way to go. That is
> > >
> > > Then, what about modifying machine description file? Add define_expand
> > > for the define_insn which will output ld/st instruction (this
> > > define_expand can insert instrumentation insns. Of course, I need to
> > > identify the operands to the define_expand contains a memory operand
> > > and a reg operand.)
> >
> > That will work in some sense, but if a load or store instruction is
> > eliminated you are quite likely to still have the instrumentation
> > instructions lying around.
> >
> > Ian
>
> Thanks for your hints.

rest_of_handle_flow2 calls thread_prologue_and_epilogue_insns, maybe I
need to move to a newer version of gcc