Re: How to make use of instruction scheduling to improve performance?

2007-07-28 Thread 吴曦
2007/7/28, Ramana Radhakrishnan <[EMAIL PROTECTED]>:
> Hi,
>
>
> On 7/28/07, 吴曦 <[EMAIL PROTECTED]> wrote:
> > > > > > I am working on gcc 4.1.1 and itanium2 architecture. I instrumented
> > > > > > each ld and st instruction in final_scan_insn() by looking at the
> > > > > > insn template (These instrumentations are used to do some security
> > > > > > checks). These instrumentations incur high performance overhead when
> > > > > > running specint benchmarks. However, these instrumentations contain
> > > > > > high dependencies between instructions so that I want to use
> > > > > > instruction scheduling to improve the performance.
> > > > > > In the current implementation, the instrumentations are emitted as
> > > > > > assembly instructions (not insns). What should I do to make use of
> > > > > > the instruction scheduler?
> > > > >
> > > > > If I understand your description, you are adding instrumentation code,
> > > > > and you want to expose that code to the scheduler.  What you need to
> > > > > do in that case is to add the code as RTL instructions before the
> > > > > scheduling pass runs.  You will need to figure out the RTL which will
> > > > > do what you want.  Then you will need to insert it around the
> > > >
> > > > > instructions which you want to instrument.  You will probably want to
> > > > ~
> > > > Before the second scheduling pass, how to identify that one insn will
> > > > be output as a load instruction (or store instruction)? In the final,
> > > > i use get_insn_template() to do this matching. Can I use the same
> > > > method before the second scheduling pass? If not, would you mind
> > > > giving some hints? thx
> > >
> > > Please send followups to the mailing list, not just to me.  Thanks.
> > >
> > > You should just match on the RTL.  I don't know enough about the
> > > Itanium to tell you precisely what to look for.  But, for example, you
> > > might look for
> > > >   s = single_set (insn);
> > > >   if (s != NULL && (MEM_P (SET_SRC (s)) || MEM_P (SET_DEST (s))))
> > > >     ...
> > >
> > > Ian
> > >
> >
> > Thanks. I observe that the 2nd instruction scheduling happens after
> > the local and global allocation. However, in my instrumentation, I
> > need several registers to do computation, can I allocate registers to
> > do computation in the instrumentation code just before the 2nd
> > instruction scheduling? If so, would you mind giving some hints on the
> > interfaces that I could make use of.
>
> Generally you should be able to create new temporaries for such
> calculations before register allocation / reload . Otherwise you might
> have to resort to reserving a couple of registers in your ABI for such
> computations if you wanted these generated after reload (you could
> have a split that did that after reload but where in the function do
> you want to insert the instrumentation code ?)
>
> From what you are indicating - there isn't enough detail about where
~
> in the function body you are inserting such instrumentation code  -

Thanks. As I indicated, I want to add instrumentation for each
ld and st instruction in a function on Itanium. (In my current
implementation, I also instrument cmp and mv instructions.)
For example, for a ld instruction in the original program:
 ld rX=[rY]
I want to instrument it as
 instrumentation prologue
 ld rX=[rY]
 instrumentation epilogue
currently, to identify such ld instruction, I put my instrumentation
in final, and use get_insn_template() to see what instruction this
insn will be output as.

To summarize, since I want to expose my instrumentation to instruction
scheduling, the following work needs to be done:
   1. identify that an insn will be output as a ld instruction
   2. allocate registers for the instrumentation calculation (in my
      current implementation, I use dedicated registers for this)
   3. emit the prepared instrumentation insns
>
> If you are doing such instrumentation in the prologue or epilogue of a
> function, you could choose to use gen_reg_rtx to obtain a temporary
> register.
>
> So typically obtain a temporary register in the following manner
>  rtx tmp_reg = gen_reg_rtx (mode);
>
> Use the tmp_reg in whatever instruction you want to generate using the
> corresponding register as one of the operands .  For these you might
> want to use the corresponding gen_*** named functions .
>
> cheers
> Ramana
>
> >Besides,  what happens if I move the insertion of instrumentation
> > before register allocation,  or even before the 1st scheduling pass,
> > can I identify load/store i

Re: How to make use of instruction scheduling to improve performance?

2007-07-28 Thread Ian Lance Taylor
"吴曦" <[EMAIL PROTECTED]> writes:

> there are some questions after I read the source code today.
> 1st. if I add the instrumentation before 2nd scheduling; will gcc emit
> an insn which will be output as a ld instruction later? If this could
> happen, some ld instruction may not be instrumented...

No, gcc won't introduce any new memory load or store instructions
after the prologue and epilogue instructions are threaded.  It may
still move them around or eliminate them, though.

> 2nd. to identify ld/st instruction (memory access op), I want to
> modify gen_rtx_SET, the method is that, if I find SRC or DST is an
> memory operand in gen_rtx_SET, then add instrumentation code before
> and after the insn to emit. Will this method work? Besides, if some
> false positives occur, how to correct them (I don't have some very
> clear idea.)

Modifying gen_rtx_SET is probably not the right way to go.  That is
used in many places throughout the RTL passes.  Not all of those
places are going to be able to cope with the new instructions you want
to add.

Ian
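
One alternative in line with the earlier advice in this thread (match on
the RTL, and add the instrumentation as insns before the scheduling pass)
is a single walk over the finished insn chain instead of a hook in
gen_rtx_SET. The following is only a toy model in plain C, not GCC code:
toy_insn, toy_kind, accesses_memory and instrument_chain are hypothetical
names, and a real pass would presumably work on rtx objects with
single_set and MEM_P.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for RTL operand codes; these are NOT GCC's types. */
enum toy_kind { TOY_REG, TOY_MEM, TOY_CONST };

/* A toy "single set" insn: dest = src.  is_check marks inserted
   instrumentation so that it is never instrumented again. */
struct toy_insn {
  enum toy_kind dest;
  enum toy_kind src;
  int is_check;
};

/* Same shape as the test suggested in the thread: the source or the
   destination of the set is a memory operand. */
static int accesses_memory(const struct toy_insn *insn)
{
  return !insn->is_check
         && (insn->src == TOY_MEM || insn->dest == TOY_MEM);
}

/* Copy IN to OUT, bracketing every load/store with one check before
   and one check after.  Returns the number of insns written to OUT.
   OUT must have room for at least 3 * n entries. */
static size_t instrument_chain(const struct toy_insn *in, size_t n,
                               struct toy_insn *out, size_t out_cap)
{
  const struct toy_insn check = { TOY_REG, TOY_REG, 1 };
  size_t k = 0;

  assert(out_cap >= 3 * n);
  for (size_t i = 0; i < n; i++) {
    int mem = accesses_memory(&in[i]);
    if (mem)
      out[k++] = check;   /* instrumentation prologue */
    out[k++] = in[i];
    if (mem)
      out[k++] = check;   /* instrumentation epilogue */
  }
  return k;
}
```

Because the walk runs once over insns that already exist, passes that
call gen_rtx_SET for their own purposes never see the extra insns.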


Re: How to make use of instruction scheduling to improve performance?

2007-07-28 Thread 吴曦
28 Jul 2007 09:04:01 -0700, Ian Lance Taylor <[EMAIL PROTECTED]>:
> "吴曦" <[EMAIL PROTECTED]> writes:
>
> > there are some questions after I read the source code today.
> > 1st. if I add the instrumentation before 2nd scheduling; will gcc emit
> > an insn which will be output as a ld instruction later? If this could
> > happen, some ld instruction may not be instrumented...
>
> No, gcc won't introduce any new memory load or store instructions
> after the prologue and epilogue instructions are threaded.  It may
~~~
when are prologue and epilogue instructions threaded? (after register
allocation? besides, what is the exact meaning of "prologue and
epilogue instructions are threaded"? Would you mind explaining in more
detail? thx :-))

> still move them around or eliminate them, though.
~~
emmm, I need to move/remove my instrumentation if necessary...

>
> > 2nd. to identify ld/st instruction (memory access op), I want to
> > modify gen_rtx_SET, the method is that, if I find SRC or DST is an
> > memory operand in gen_rtx_SET, then add instrumentation code before
> > and after the insn to emit. Will this method work? Besides, if some
> > false positives occur, how to correct them (I don't have some very
> > clear idea.)
>
> Modifying gen_rtx_SET is probably not the right way to go.  That is
~
Then what about modifying the machine description file? I could add a
define_expand for each define_insn that outputs a ld/st instruction, and
have that define_expand insert the instrumentation insns. (Of course, I
would need to check that the operands of the define_expand include a
memory operand and a reg operand.)

> used in many places throughout the RTL passes.  Not all of those
> places are going to be able to cope with the new instructions you want
> to add.
>
> Ian
>

Thanks for your hints again :-)


Re: GCC 4.2.1 : bootstrap fails at stage 2. compiler produces wrong binary for wrong processor

2007-07-28 Thread Eric Botcazou
> The default cpu is v8plus.

v9 actually, which automatically enables the V8+ stuff in 32-bit mode.

--
Eric Botcazou



Re: How to make use of instruction scheduling to improve performance?

2007-07-28 Thread Ian Lance Taylor
"吴曦" <[EMAIL PROTECTED]> writes:

> 28 Jul 2007 09:04:01 -0700, Ian Lance Taylor <[EMAIL PROTECTED]>:
> > "吴曦" <[EMAIL PROTECTED]> writes:
> >
> > > there are some questions after I read the source code today.
> > > 1st. if I add the instrumentation before 2nd scheduling; will gcc emit
> > > an insn which will be output as a ld instruction later? If this could
> > > happen, some ld instruction may not be instrumented...
> >
> > No, gcc won't introduce any new memory load or store instructions
> > after the prologue and epilogue instructions are threaded.  It may
> ~~~
> when are prologue and epilogue instructions threaded? (after register
> allocation? besides, what is the exact meaning of "prologue and
> epilogue instructions are threaded"? Would you mind explaining in more
> detail? thx :-))

If you look in gcc/passes.c you will see the list of passes.  The
prologue and epilogue instructions are threaded in
pass_thread_prologue_and_epilogue.  This happens after register
allocation.  It means that the prologue and epilogue instructions are
added to the RTL, so that the second scheduling pass can see them.

> > still move them around or eliminate them, though.
> ~~
> emmm, I need to move/remove my instrumentation if necessary...

Yes.  This is true by definition, since you want to instrument before
the second scheduling pass.  The scheduler can and will move load and
store instructions.  You need to set up the dependencies so that your
instrumentation will still occur at the right time.

> > > 2nd. to identify ld/st instruction (memory access op), I want to
> > > modify gen_rtx_SET, the method is that, if I find SRC or DST is an
> > > memory operand in gen_rtx_SET, then add instrumentation code before
> > > and after the insn to emit. Will this method work? Besides, if some
> > > false positives occur, how to correct them (I don't have some very
> > > clear idea.)
> >
> > Modifying gen_rtx_SET is probably not the right way to go.  That is
> ~
> Then, what about modifying machine description file? Add define_expand
> for the define_insn which will output ld/st instruction (this
> define_expand can insert instrumentation insns. Of course, I need to
> identify the operands to the define_expand contains a memory operand
> and a reg operand.)

That will work in some sense, but if a load or store instruction is
eliminated you are quite likely to still have the instrumentation
instructions lying around.

Ian
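
The leftover-instrumentation problem can be pictured with a toy model.
This is a sketch under assumptions, not something GCC provides: it
supposes each instrumentation insn records which load/store it guards,
so that a later sweep can delete checks whose guarded insn has been
eliminated. toy_insn, guards and sweep_orphan_checks are hypothetical
names.

```c
#include <assert.h>
#include <stddef.h>

/* Toy insn record; NOT a GCC type. */
struct toy_insn {
  int deleted;   /* set when some pass eliminates the insn */
  int guards;    /* index of the guarded load/store, or -1 if this
                    is an ordinary insn */
};

/* Delete every live check whose guarded insn has been eliminated.
   Returns the number of checks removed by this sweep. */
static size_t sweep_orphan_checks(struct toy_insn *insns, size_t n)
{
  size_t removed = 0;

  for (size_t i = 0; i < n; i++) {
    int g = insns[i].guards;
    if (g < 0 || insns[i].deleted)
      continue;                 /* ordinary insn, or already gone */
    assert((size_t) g < n);
    if (insns[g].deleted) {
      insns[i].deleted = 1;     /* check no longer guards anything */
      removed++;
    }
  }
  return removed;
}
```

Without some link of this kind between a check and its load/store,
nothing tells the compiler that the two should disappear together.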


Re: GCC 4.2.1 : bootstrap fails at stage 2. compiler produces wrong binary for wrong processor

2007-07-28 Thread Dennis Clarke

>> The default cpu is v8plus.
>
> v9 actually, which automatically enables the V8+ stuff in 32-bit mode.

That isn't what I see here.  The output binary was definitely for a v8plus
processor. That would be an UltraSparc 1 at the least.

ELF Header
  ei_magic:   { 0x7f, E, L, F }
  ei_class:   ELFCLASS32          ei_data:      ELFDATA2MSB
  e_machine:  EM_SPARC32PLUS      e_version:    EV_CURRENT
  e_type:     ET_EXEC
  e_flags:    [ EF_SPARC_32PLUS ]
  e_entry:    0x121a8   e_ehsize:     52   e_shstrndx:   20
  e_shoff:    0x1ab50   e_shentsize:  40   e_shnum:      21
  e_phoff:    0x34      e_phentsize:  32   e_phnum:       5

so .. there you have it.

Dennis



Re: gcc register allocation

2007-07-28 Thread Ian Lance Taylor
"Purll, Duncan" <[EMAIL PROTECTED]> writes:

> DISCLAIMER:
> Unless indicated otherwise, the information contained in this message is 
> privileged and confidential, and is intended only for the use of the 
> addressee(s) named above and others who have been specifically authorized to 
> receive it. If you are not the intended recipient, you are hereby notified 
> that any dissemination, distribution or copying of this message and/or 
> attachments is strictly prohibited. The company accepts no liability for any 
> damage caused by any virus transmitted by this email. Furthermore, the 
> company does not warrant a proper and complete transmission of this 
> information, nor does it accept liability for any delays. If you have 
> received this message in error, please contact the sender and delete the 
> message. Thank you.

Please do not send e-mail messages with this sort of disclaimer to
[EMAIL PROTECTED]  These disclaimers are prohibited by list policy,
which can be found at http://gcc.gnu.org/lists.html.  If you are
unable to disable the disclaimer from your account, I recommend using
a free web-based e-mail account.  Thanks.

Ian


Re: Creating gcc-newbies mailing list

2007-07-28 Thread Gerald Pfeifer
On Fri, 27 Jul 2007, Rask Ingemann Lambertsen wrote:
> This part of the documentation is fragmented in a way such that I
> sometimes can't find what I'm looking for, even if I know it is there
> (somewhere). For example, when it comes to submitting patches, we have
> <http://gcc.gnu.org/codingconventions.html> and
> <http://gcc.gnu.org/contribute.html> which both say something about
> ChangeLog entries while neither mentions the patch tracker. Another
> example is that both <http://gcc.gnu.org/contribute.html> and
> <http://gcc.gnu.org/install/test.html> document how to test GCC, so
> you have to find and read both.

Are there concrete changes you think would make sense?  

http://gcc.gnu.org/install/test.html is focused on users, so we will
probably have to have two (complementary) sources on testing, but for
the others, changes will probably be easier.

Gerald


You introduced a memory leak with the IPA-SSA stuff

2007-07-28 Thread Daniel Berlin
It used to be that the bitmap obstack known as "alias_bitmap_obstack"
was released and renewed every time we called compute_may_aliases.
This didn't really leak because the absolute last one was destroyed at
the end of compilation.

You changed it to be only released if gimple_aliases_computed_p (cfun).

This, of course, leaks all the bitmaps in that obstack whenever we
change functions, because bitmap_obstack_initialize does not free the
old obstack if it is still in use; it just leaks it.

The code needs to be something like
  if (alias_bitmap_obstack.elements != NULL)
    bitmap_obstack_release (&alias_bitmap_obstack);
  bitmap_obstack_initialize (&alias_bitmap_obstack);

(or some other appropriate thing :P)

Right now we leak a couple meg per function if they have a lot of
symbols, we'd leak more otherwise.
--Dan
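
The release-before-initialize pattern above can be tried outside GCC
with a toy arena. This is a sketch under assumptions, not the bitmap
obstack implementation: toy_obstack, toy_initialize, toy_release and
toy_reset are hypothetical names, and live_blocks exists only so the
leak can be observed.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy arena; `elements` is non-NULL while the arena owns memory.
   This is NOT GCC's bitmap_obstack. */
struct toy_obstack { void *elements; };

/* Number of blocks currently allocated and not yet freed. */
static int live_blocks = 0;

/* Models the behaviour described above: initializing an arena that is
   still in use overwrites the old pointer without freeing it. */
static void toy_initialize(struct toy_obstack *o)
{
  o->elements = malloc(64);
  assert(o->elements != NULL);
  live_blocks++;
}

/* Free the arena's block; does nothing if the arena is already empty. */
static void toy_release(struct toy_obstack *o)
{
  if (o->elements == NULL)
    return;
  free(o->elements);
  o->elements = NULL;
  live_blocks--;
}

/* The suggested shape: release only if in use, then initialize. */
static void toy_reset(struct toy_obstack *o)
{
  if (o->elements != NULL)
    toy_release(o);
  toy_initialize(o);
}
```

Calling toy_reset repeatedly keeps live_blocks at 1, whereas calling
toy_initialize repeatedly on the same arena would add one leaked block
per call.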


Re: "Proceedings of the GCC Developers' Summit" now available

2007-07-28 Thread Gerald Pfeifer
On Fri, 27 Jul 2007, Diego Novillo wrote:
>> Why not provide a permanent home for the GCC summit proceedings at
>> gcc.gnu.org?  It seems the logical place.
> That's what I've done.  The .pdf is *in* gcc.gnu.org.  The others could
> be sucked in as well.  They're now pointing to gccsummit.

Currently I only see the 2003 and 2004 proceedings at
  ftp://gcc.gnu.org/pub/gcc/summit/

How about moving everything to one consistent place?  Any preferences
on what that place should be?

Gerald


Re: How to make use of instruction scheduling to improve performance?

2007-07-28 Thread 吴曦
28 Jul 2007 12:16:51 -0700, Ian Lance Taylor <[EMAIL PROTECTED]>:
> "吴曦" <[EMAIL PROTECTED]> writes:
>
> > 28 Jul 2007 09:04:01 -0700, Ian Lance Taylor <[EMAIL PROTECTED]>:
> > > "吴曦" <[EMAIL PROTECTED]> writes:
> > >
> > > > there are some questions after I read the source code today.
> > > > 1st. if I add the instrumentation before 2nd scheduling; will gcc emit
> > > > an insn which will be output as a ld instruction later? If this could
> > > > happen, some ld instruction may not be instrumented...
> > >
> > > No, gcc won't introduce any new memory load or store instructions
> > > after the prologue and epilogue instructions are threaded.  It may
> > ~~~
> > when are prologue and epilogue instructions threaded? (after register
> > allocation? besides, what is the exact meaning of "prologue and
> > epilogue instructions are threaded"? Would you mind explaining in more
> > detail? thx :-))
>
> If you look in gcc/passes.c you will see the list of passes.  The
> prologue and epilogue instructions are threaded in
> pass_thread_prologue_and_epilogue.  This happens after register
~
Sorry, I couldn't find that pass in gcc 4.1.1. Was it added in a
newer gcc?
Thanks.

> allocation.  It means that the prologue and epilogue instructions are
~~
As you indicated, this pass happens after register allocation. I want
to allocate registers rather than dedicate registers to the
instrumentation calculation. Are there any hints on how to do this?

> added to the RTL, so that the second scheduling pass can see them.
>
> > > still move them around or eliminate them, though.
> > ~~
> > emmm, I need to move/remove my instrumentation if necessary...
>
> Yes.  This is true by definition, since you want to instrument before
> the second scheduling pass.  The scheduler can and will move load and
> store instructions.  You need to set up the dependencies so that your
> instrumentation will still occur at the right time.
>
> > > > 2nd. to identify ld/st instruction (memory access op), I want to
> > > > modify gen_rtx_SET, the method is that, if I find SRC or DST is an
> > > > memory operand in gen_rtx_SET, then add instrumentation code before
> > > > and after the insn to emit. Will this method work? Besides, if some
> > > > false positives occur, how to correct them (I don't have some very
> > > > clear idea.)
> > >
> > > Modifying gen_rtx_SET is probably not the right way to go.  That is
> > ~
> > Then, what about modifying machine description file? Add define_expand
> > for the define_insn which will output ld/st instruction (this
> > define_expand can insert instrumentation insns. Of course, I need to
> > identify the operands to the define_expand contains a memory operand
> > and a reg operand.)
>
> That will work in some sense, but if a load or store instruction is
> eliminated you are quite likely to still have the instrumentation
> instructions lying around.
>
> Ian
>
Thanks for your hints.


Re: How to make use of instruction scheduling to improve performance?

2007-07-28 Thread 吴曦
2007/7/29, 吴曦 <[EMAIL PROTECTED]>:
> 28 Jul 2007 12:16:51 -0700, Ian Lance Taylor <[EMAIL PROTECTED]>:
> > "吴曦" <[EMAIL PROTECTED]> writes:
> >
> > If you look in gcc/passes.c you will see the list of passes.  The
> > prologue and epilogue instructions are threaded in
> > pass_thread_prologue_and_epilogue.  This happens after register
> ~
> Sorry, I didn't find that pass in gcc 4.1.1. This pass is added in the
> newest gcc?
> thx.
>
rest_of_handle_flow2 calls thread_prologue_and_epilogue_insns. Maybe I
need to move to a newer version of gcc.