Fwd: Hi Friends...

2006-08-18 Thread Raghu :-)

Hi Friends,

   I was skimming through the code after downloading
gcc-core-4.1.1.tar.bz2, and it's really great to see! I was looking for
an option in gcc (basically a feature), and since I couldn't find it, I
thought I would see how gcc handles that area. I have made the code
change in a very crude way, but I was able to get what I wanted, and I
want to contribute it to the community so that everyone benefits. I have
downloaded a PDF version of the coding standards; it will take me some
time to go through it :-) but before I do, I wanted to know whether
there is a better way of doing this.

Whom should I contact for these things? For example, is there a
maintainer for libcpp? In my work I have found that developing any good
software requires a thorough understanding of the process, with the
logic and benefits listed before starting. So can anybody help me with
where I should start? Who will approve the design, and who will review
the code? How will the testcases, docs, results, and man/info pages be
updated? How will the code finally get checked in? How will the build be
done, and how is it released?

Finally, I am eager to know how to join the open source community. I
couldn't find answers to these questions while going through the FAQ.
Can anybody guide me :-) ?
Thanks,

Regards,
Raghavendra MB


Re: Use gcc front-end for metrics

2006-08-18 Thread Roel Meeuws

Hi,

I'm Roel Meeuws, a Dutch MSc student working on a project involving
software metrics. I need a program that can determine a wide variety
of code metrics based on control flow graphs and abstract syntax
trees. While browsing the internet I found no good GPL'ed metrics
tools, so I decided to build one myself.

In order to build a metrics tool I need a frontend that can provide me
with an abstract syntax tree containing information on all actual
language constructs in the code, and also a CFG representation. I
reckon GCC has these capabilities, and I was wondering if any of you
could tell me whether it is possible to use just GCC's frontend.
Furthermore, where should I start? How do I extract the frontend from
GCC? Which of GCC's intermediate representations could I use, and are
they documented?

I would like to thank you in advance for any help you can give me.

kind regards,

Roel

--
Roel Meeuws

Delft University of Technology
Faculty of Electrical Engineering Mathematics and Computer Science
Computer Engineering Laboratory
Mekelweg 4, 2628 CD Delft, The Netherlands

Email:[EMAIL PROTECTED]
Office phone: +31 (0)6 10 82 44 01



Re: Fwd: Hi Friends...

2006-08-18 Thread Paolo Bonzini

> Hi Friends,
>
>    I was skimming through the code after downloading
> gcc-core-4.1.1.tar.bz2, and it's really great to see! I was looking for
> an option in gcc (basically a feature), and since I couldn't find it, I
> thought I would see how gcc handles that area. I have made the code
> change in a very crude way, but I was able to get what I wanted, and I
> want to contribute it to the community so that everyone benefits. I have
> downloaded a PDF version of the coding standards; it will take me some
> time to go through it :-) but before I do, I wanted to know whether
> there is a better way of doing this.
>
> Whom should I contact for these things? For example, is there a
> maintainer for libcpp? In my work I have found that developing any good
> software requires a thorough understanding of the process, with the
> logic and benefits listed before starting. So can anybody help me with
> where I should start? Who will approve the design, and who will review
> the code? How will the testcases, docs, results, and man/info pages be
> updated? How will the code finally get checked in? How will the build be
> done, and how is it released?
>
> Finally, I am eager to know how to join the open source community. I
> couldn't find answers to these questions while going through the FAQ.
> Can anybody guide me :-) ?


http://gcc.gnu.org/contribute.html

Paolo


Re: Use gcc front-end for metrics

2006-08-18 Thread Steven Bosscher

On 8/18/06, Roel Meeuws <[EMAIL PROTECTED]> wrote:


> In order to build a metrics tool I need a frontend that can provide me
> with an abstract syntax tree containing information on all actual
> language constructs in the code, and also a CFG representation. I
> reckon GCC has these capabilities, and I was wondering if any of you
> could tell me whether it is possible to use just GCC's frontend.
> Furthermore, where should I start? How do I extract the frontend from
> GCC? Which of GCC's intermediate representations could I use, and are
> they documented?
>
> I would like to thank you in advance for any help you can give me.


Right, so you want to have a count of source level constructs, and
basically something similar at the lower levels...

If you're going to do source level metrics, you will have to
instrument the front ends. All front ends, perhaps with the exception
of Ada and C++, have a pretty quick lowering to a level where you
won't be able to e.g. distinguish a for-loop from a while-loop, if
that would be something you're interested in. Depending on what
language you'll be analyzing, or rather how many of them, I'd suggest
you instrument the parser for your metrics, or forget about source
level constructs and just look at lower level information only.

As for CFG work, you should probably write a tree pass and insert it
at some point in the compilation schedule (see passes.c). Depending on
how close you want to stay to the original source code, you could put
the pass early or late. If you put it late, you can analyze the
optimized representation. In any case, you're going to find that gcc
will produce a CFG pretty early on for GIMPLE (gcc's three-address,
high level intermediate representation), but this happens _after_ the
front ends are done, and _after_ lowering to GIMPLE.
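
To make the CFG side concrete: a metric such as McCabe's cyclomatic
complexity needs only the block and edge counts of a function's CFG. The
following is a minimal sketch in plain C over a toy graph. The struct and
function names are hypothetical and do not correspond to GCC's internal
basic block or edge types; a real tree pass would walk GCC's own data
structures instead.

```c
#include <assert.h>
#include <stddef.h>

/* Toy CFG: nodes are basic blocks, edges are (from, to) index pairs.
   Hypothetical stand-in, not GCC's internal representation. */
struct cfg_edge {
    size_t from;
    size_t to;
};

struct cfg {
    size_t num_blocks;            /* N: number of basic blocks */
    const struct cfg_edge *edges; /* edge list */
    size_t num_edges;             /* E: number of edges */
};

/* McCabe's cyclomatic complexity for one connected function CFG:
   M = E - N + 2.  Returns 0 for an empty graph. */
static long cyclomatic_complexity(const struct cfg *g)
{
    if (g->num_blocks == 0)
        return 0;
    return (long)g->num_edges - (long)g->num_blocks + 2;
}

/* Out-degree of one block; a block with two successors is a branch. */
static size_t out_degree(const struct cfg *g, size_t block)
{
    size_t count = 0;
    assert(block < g->num_blocks);
    for (size_t i = 0; i < g->num_edges; i++)
        if (g->edges[i].from == block)
            count++;
    return count;
}
```

For an if/else diamond (four blocks, four edges) this gives a complexity
of 2, the same value whether the branch came from an if statement or a
lowered loop test, which is why source-level distinctions have to be
collected in the front end.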

You can usually only find documentation on the front ends in the
source code, but the gcc online documentation can guide you a bit
there. So your first step would be to look at the GCC internals
documentation on http://gcc.gnu.org/onlinedocs/. You'll want to work
on GIMPLE (as opposed to RTL) which is reasonably well documented,
again in the GCC internals documentation.  And if you get stuck after
looking for a while, you'll usually find someone helpful on this list.

You may also want to look at the GCC wiki (http://gcc.gnu.org/wiki/)
and the Introspector Project (http://introspector.sourceforge.net/).

Hope this helps.

Gr.
Steven


PowerPC FPU support

2006-08-18 Thread Michael Eager

I'm adding support to GCC for a different PPC floating point unit.
It's similar to the standard PPC FPU in that it supports most of
the same instructions and all operations are in FP registers.
The FPU comes in single-precision and double-precision variants.
There's also an option of having no FPU.

Rather than creating yet another configuration with another
TARGET_ definition and ever more cluttered conditional
expressions, I've thought to replace TARGET_FPRS with
TARGET_FPRS_SINGLE and TARGET_FPRS_DOUBLE.  These would both have
the value 1 for standard PPC, and 1 or 0 depending on whether the
single- or double-precision FPU was available, as specified by a new
option -mfpu=.  There would be some added instruction patterns
for the single-precision operations.
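
As a rough sketch of that mapping, in standalone C rather than actual
rs6000 code: the option spellings "none", "single", and "double", the
enum, and the struct are all hypothetical, and the sketch assumes that a
double-precision FPU also handles single-precision operations, so it
sets both flags just as standard PPC would.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical settings for the proposed -mfpu= option. */
enum fpu_kind { FPU_NONE, FPU_SINGLE, FPU_DOUBLE };

struct fpu_flags {
    int fprs_single; /* stands in for TARGET_FPRS_SINGLE */
    int fprs_double; /* stands in for TARGET_FPRS_DOUBLE */
};

/* Parse the option argument; returns 0 on success, -1 if unrecognized. */
static int parse_mfpu(const char *arg, enum fpu_kind *out)
{
    assert(arg != NULL && out != NULL);
    if (strcmp(arg, "none") == 0)   { *out = FPU_NONE;   return 0; }
    if (strcmp(arg, "single") == 0) { *out = FPU_SINGLE; return 0; }
    if (strcmp(arg, "double") == 0) { *out = FPU_DOUBLE; return 0; }
    return -1;
}

/* Derive the two flags from the FPU kind.  Assumption: the
   double-precision variant also executes single-precision operations. */
static struct fpu_flags flags_for(enum fpu_kind kind)
{
    struct fpu_flags f = { 0, 0 };
    switch (kind) {
    case FPU_NONE:
        break;
    case FPU_SINGLE:
        f.fprs_single = 1;
        break;
    case FPU_DOUBLE:
        f.fprs_single = 1;
        f.fprs_double = 1;
        break;
    }
    return f;
}
```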

Does this sound like a reasonable approach or is there a better
way to do this?

--
Michael Eager    [EMAIL PROTECTED]
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077


Re: PowerPC FPU support

2006-08-18 Thread David Edelsohn
> Michael Eager writes:

Michael> I'm adding support to GCC for a different PPC floating point unit.
Michael> It's similar to the standard PPC FPU in that it supports most of
Michael> the same instructions and all operation are in FP registers.
Michael> The FPU comes in a single-precision and double-precision variant.
Michael> There's also an option of having no FPU.

Michael> Rather than creating yet another configuration with another
Michael> TARGET_ definition and creating ever more cluttered
Michael> condition expression, I've thought to replace TARGET_FPRS with
Michael> TARGET_FPRS_SINGLE and TARGET_FPRS_DOUBLE.  These would both have
Michael> the value 1 for standard PPC, and 1 or 0 depending on whether the
Michael> single-or double-precision FPU was available, as specified by a new
Michael> option -mfpu=.  There would be some added instruction patterns
Michael> for the single-precision operations.

I think you want to look at TARGET_HARD_FLOAT, not TARGET_FPRS.
TARGET_FPRS was added for Motorola e500 that has FP in GPRs.

David



Re: PowerPC FPU support

2006-08-18 Thread Michael Eager

David Edelsohn wrote:

Michael Eager writes:


Michael> I'm adding support to GCC for a different PPC floating point unit.
Michael> It's similar to the standard PPC FPU in that it supports most of
Michael> the same instructions and all operation are in FP registers.
Michael> The FPU comes in a single-precision and double-precision variant.
Michael> There's also an option of having no FPU.

Michael> Rather than creating yet another configuration with another
Michael> TARGET_ definition and creating ever more cluttered
Michael> condition expression, I've thought to replace TARGET_FPRS with
Michael> TARGET_FPRS_SINGLE and TARGET_FPRS_DOUBLE.  These would both have
Michael> the value 1 for standard PPC, and 1 or 0 depending on whether the
Michael> single-or double-precision FPU was available, as specified by a new
Michael> option -mfpu=.  There would be some added instruction patterns
Michael> for the single-precision operations.

> I think you want to look at TARGET_HARD_FLOAT, not TARGET_FPRS.
> TARGET_FPRS was added for Motorola e500 that has FP in GPRs.


TARGET_HARD_FLOAT means that you have hardware floating point of
some kind.  I guess I could split this into TARGET_HARD_FLOAT_SINGLE
and TARGET_HARD_FLOAT_DOUBLE.

--
Michael Eager    [EMAIL PROTECTED]
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077


strict aliasing issue

2006-08-18 Thread Stuart Hastings

Here's my reduced testcase:

typedef long GLint;
extern void aglChoosePixelFormat(const GLint *);
void find(const int *alistp) {
  const int *blist;
  int list[32];
  if (alistp)
    blist = alistp;
  else {
    list[3] = 42;   /* this store disappears with -O1 -fstrict-aliasing */
    blist = list;
  }
  aglChoosePixelFormat((GLint*)blist);
}

(The original testcase is C++, from the FLTK project.)

If I compile this with -O1 -fstrict-aliasing, the "= 42" store  
disappears.  I've confirmed this on mainline PPC and x86.


I'm not a language lawyer; is this a legal program?  (If the program  
is legal, should I file a PR?)


stuart hastings
Apple Computer


Re: PowerPC FPU support

2006-08-18 Thread Ian Lance Taylor
Michael Eager <[EMAIL PROTECTED]> writes:

> David Edelsohn wrote:
> >> Michael Eager writes:
> > Michael> I'm adding support to GCC for a different PPC floating
> > point unit.
> > Michael> It's similar to the standard PPC FPU in that it supports most of
> > Michael> the same instructions and all operation are in FP registers.
> > Michael> The FPU comes in a single-precision and double-precision variant.
> > Michael> There's also an option of having no FPU.
> > Michael> Rather than creating yet another configuration with another
> > Michael> TARGET_ definition and creating ever more cluttered
> > Michael> condition expression, I've thought to replace TARGET_FPRS with
> > Michael> TARGET_FPRS_SINGLE and TARGET_FPRS_DOUBLE.  These would both have
> > Michael> the value 1 for standard PPC, and 1 or 0 depending on whether the
> > Michael> single-or double-precision FPU was available, as specified by a new
> > Michael> option -mfpu=.  There would be some added instruction patterns
> > Michael> for the single-precision operations.
> > I think you want to look at TARGET_HARD_FLOAT, not TARGET_FPRS.
> > TARGET_FPRS was added for Motorola e500 that has FP in GPRs.
> 
> TARGET_HARD_FLOAT means that you have hardware floating point of
> some kind.  I guess I could split this into TARGET_HARD_FLOAT_SINGLE
> and TARGET_HARD_FLOAT_DOUBLE.

The MIPS has TARGET_SINGLE_FLOAT and TARGET_DOUBLE_FLOAT.  Then it
does this:

(define_mode_macro ANYF [(SF "TARGET_HARD_FLOAT")
                         (DF "TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT")
                         (V2SF "TARGET_PAIRED_SINGLE_FLOAT")])

and defines most of the floating point insns using ANYF, so that the
DF versions are only defined if TARGET_DOUBLE_FLOAT.

The command line options are -mhard-float, -msoft-float,
-msingle-float, -mdouble-float.  If -msoft-float is chosen,
-msingle-float and -mdouble-float are irrelevant.  If -mhard-float is
chosen, -mdouble-float is the default.  If -mhard-float and
-msingle-float are chosen, the single precision floating point
instructions are used but not the double precision floating point
instructions.
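
The option rules above can be written out as a small truth table. This
is a sketch in standalone C, not code from the MIPS port: the struct and
function names are hypothetical, and it assumes that double float is
simply the absence of -msingle-float.

```c
#include <assert.h>
#include <stdbool.h>

struct float_opts {
    bool hard_float;   /* -mhard-float (true) or -msoft-float (false) */
    bool single_float; /* -msingle-float given; otherwise -mdouble-float */
};

/* SF patterns follow the ANYF condition "TARGET_HARD_FLOAT". */
static bool sf_insns_enabled(struct float_opts o)
{
    return o.hard_float;
}

/* DF patterns follow "TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT". */
static bool df_insns_enabled(struct float_opts o)
{
    return o.hard_float && !o.single_float;
}

/* Invariant implied by the two conditions: DF is never on without SF. */
static void check_invariant(struct float_opts o)
{
    assert(!df_insns_enabled(o) || sf_insns_enabled(o));
}
```

With soft float both functions return false regardless of the
single/double setting, which matches the options being irrelevant in
that case.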

Ian


Re: strict aliasing issue

2006-08-18 Thread Andrew Pinski
> 
> Here's my reduced testcase:
> 
> typedef long GLint;
> extern void aglChoosePixelFormat(const GLint *);
> void find(const int *alistp) {
>   const int *blist;
>   int list[32];
>   if (alistp)
>     blist = alistp;
>   else {
>     list[3] = 42;   /* this store disappears with -O1 -fstrict-aliasing */
>     blist = list;
>   }
>   aglChoosePixelFormat((GLint*)blist);
> }
> 
> (The original testcase is C++, from the FLTK project.)
> 
> If I compile this with -O1 -fstrict-aliasing, the "= 42" store  
> disappears.  I've confirmed this on mainline PPC and x86.
> 
> I'm not a language lawyer; is this a legal program?  (If the program  
> is legal, should I file a PR?)

This is legal as long as aglChoosePixelFormat only accesses it as int.
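
A minimal sketch of what "only accesses it as int" can look like. The
callee below is a hypothetical stand-in for aglChoosePixelFormat, whose
real body is not shown in this thread; the union exists only so that the
int array is suitably aligned for long, which keeps the pointer
conversion itself well defined.

```c
#include <assert.h>

/* Union used only to give the int array alignment suitable for long. */
union int_list {
    int items[32];
    long align_as_long;
};

/* Hypothetical stand-in for aglChoosePixelFormat.  The parameter is
   typed const long *, but every access goes through const int *,
   matching the effective type of the stored objects. */
static int read_entry_as_int(const long *attribs, int index)
{
    const int *as_int = (const int *)attribs;
    assert(index >= 0 && index < 32);
    return as_int[index];
}

/* Mirrors the else-branch of find(): store through int, then pass the
   array under a long-typed pointer. */
static int store_then_read(int value)
{
    union int_list list = { { 0 } };
    list.items[3] = value;
    return read_entry_as_int((const long *)list.items, 3);
}
```

Because the callee could legitimately read the array this way, the
compiler cannot treat the store to list[3] as dead.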

-- Pinski


Re: strict aliasing issue

2006-08-18 Thread Andrew Pinski
> I'm not a language lawyer; is this a legal program?  (If the program  
> is legal, should I file a PR?)

Mike Stump already filed a PR about this, PR 28778, and I gave a full testcase
which shows that this is legal code and that the compiler should not be
removing the store.

Oh and I marked it as a regression after testing it on 3.4.0.

-- Pinski


ANNOUNCE: Gelato GCC Improvement on Itanium Workshop Summary, 7-8 August, Moscow Russia

2006-08-18 Thread Mark K. Smith
A meeting of the Gelato GCC Improvement on Itanium Workgroup took
place August 7 & 8, 2006 in Moscow, Russia. The workshop was hosted by
the Institute for System Programming at the Russian Academy of
Sciences and was sponsored by Intel and HP.
http://gcc.gelato.org/MoscowMeeting

Compiler experts from the GCC open-source community, Red Hat, SuSE,
Intel, HP, and the Gelato Member community discussed specific GCC
improvements for the Itanium platform. Several key areas were
identified to improve Itanium GCC performance.
http://gcc.gelato.org/MeetingNotes

Presentations from the workshop (all presentations can be found at
http://gcc.gelato.org/MoscowMeeting):

  *  An Architectural Overview of GCC, Diego Novillo 
  *  GCC RTL Backend, Steven Bosscher 
  *  The IA64 Backend in GCC, Jim Wilson 
  *  LTO in GCC, Ken Zadeck 
  *  LNO in GCC, Sebastian Pop 
  *  PDO in GCC, Jan Hubicka 
  *  Itanium Architecture and ICC Tutorial, Mark Davis 
  *  Performance Metrics & Measurement, Shin-Ming Liu 
  *  CERN Loops Dissected, Sverre Jarp 
  *  Alias Analysis in GCC, Diego Novillo 
  *  Software Pipelining in GCC, Vladimir Makarov 
  *  Data Prefetching in GCC, Zdenek Dvorak 
  *  Instruction Scheduling in GCC, Vladimir Makarov 
  *  Superblock Update, Robert Kidd 
  *  ISP RAS Scheduling Update, Andrey Belevantsev 




gcc-4.1-20060818 is now available

2006-08-18 Thread gccadmin
Snapshot gcc-4.1-20060818 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.1-20060818/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.1 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_1-branch 
revision 116251

You'll find:

gcc-4.1-20060818.tar.bz2            Complete GCC (includes all of below)
gcc-core-4.1-20060818.tar.bz2       C front end and core compiler
gcc-ada-4.1-20060818.tar.bz2        Ada front end and runtime
gcc-fortran-4.1-20060818.tar.bz2    Fortran front end and runtime
gcc-g++-4.1-20060818.tar.bz2        C++ front end and runtime
gcc-java-4.1-20060818.tar.bz2       Java front end and runtime
gcc-objc-4.1-20060818.tar.bz2       Objective-C front end and runtime
gcc-testsuite-4.1-20060818.tar.bz2  The GCC testsuite

Diffs from 4.1-20060811 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.1
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


unwind, x86, DW_CFA_GNU_args_size

2006-08-18 Thread Geoffrey Keating

Hi Alexandre,

your patch,


r112170 | aoliva | 2006-03-16 22:08:49 -0800 (Thu, 16 Mar 2006) | 4 lines


* dwarf2out.c (dwarf2out_stack_adjust): Always track the stack
pointer, instead of assuming it is possible to derive the
correct args size from a call insn.

discussed in , is causing g++.brendan/eh1.C to fail on x86-darwin.


The problem is that the patch assumes that it's possible to track the  
stack pointer.  I don't think this is possible and it's certainly not  
done right by this code.


The problem is that the code ignores all instructions in the  
prologue.  It happens, in eh1.C, that a stack adjustment (to enforce  
stack alignment) for the call is combined with a different stack  
adjustment (to allocate a local variable) in the prologue, by the  
'csa' phase.  There's no way to get this right without knowing that  
this is happening, and as far as I can see there's no information  
available that it happened.  The final code looks like:


        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ebx
        subl    $16, %esp       # the stack push that is combined
        call    L6
"L001$pb":
L6:
        popl    %ebx            # last instruction in prolog
        pushl   $4
        call    L___cxa_allocate_exception$stub   # this routine has 16 bytes of arguments
        movl    $99, (%eax)
        addl    $12, %esp       # stack adjust preparing for next call
        pushl   $0
        pushl   L__ZTIi$non_lazy_ptr-"L001$pb"(%ebx)
        pushl   %eax
        call    L___cxa_throw$stub

The EH information says that __cxa_allocate_exception has 4 bytes of  
args, and __cxa_throw has 12 bytes of args; but really they both have  
16 bytes of args.
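
The arithmetic can be replayed with a small standalone sketch. It is not
the dwarf2out.c code: the function name is hypothetical, the clamp at
zero is an assumption about how a tracker might treat a pop it cannot
account for, and the split of the combined "subl $16" into 4 bytes of
local variable plus 12 bytes of outgoing-argument padding is inferred
from the 16-byte totals above.

```c
#include <assert.h>
#include <stddef.h>

/* Apply a sequence of stack adjustments (positive = bytes pushed,
   negative = bytes popped) and record the running argument size after
   each step, clamping at zero. */
static void track_args_size(long start, const long *adjust, size_t n,
                            long *size_after)
{
    long args_size = start;
    assert(n == 0 || (adjust != NULL && size_after != NULL));
    for (size_t i = 0; i < n; i++) {
        args_size += adjust[i];
        if (args_size < 0)
            args_size = 0;
        size_after[i] = args_size;
    }
}
```

Feeding it the post-prologue adjustments { +4, -12, +4, +4, +4 } with a
start of 0 (prologue ignored) yields 4 at the first call and 12 at the
second; starting from 12 (the padding hidden in the prologue) yields 16
at both calls.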


The effect of all this on Darwin is that the stack becomes misaligned  
and then the dynamic loader crashes when it tries to use SSE (or  
something).


I think possible solutions are:

1. Revert this patch, and note that the original bug (which you  
mention you couldn't reproduce on 4.0+) might not be fixed.
2. Remove the whole routine and declare that if you want to use EH,  
you must not have ACCUMULATE_OUTGOING_ARGS set.
3. Try to patch around it by looking at the argument size in the  
call, and if it's greater than the apparent argument size, using the  
value in the call instead.
4. Make the routine much more intelligent, or ideally have the proper  
information attached to the call at the time it's generated (as a  
note?), and just use that.  Add gcc_assert calls to verify that (a)  
the argument size never becomes negative and (b) the argument size is  
never less than the size of each particular call.


I prefer (1).  Next best would be (4), but it's likely to break some  
ports.  (3) doesn't sound very good, and (2) removes a feature.


Any other suggestions?



Re: unwind, x86, DW_CFA_GNU_args_size

2006-08-18 Thread Ian Lance Taylor
Geoffrey Keating <[EMAIL PROTECTED]> writes:

> The problem is that the code ignores all instructions in the
> prologue.  It happens, in eh1.C, that a stack adjustment (to enforce
> stack alignment) for the call is combined with a different stack
> adjustment (to allocate a local variable) in the prologue, by the
> 'csa' phase.  There's no way to get this right without knowing that
> this is happening, and as far as I can see there's no information
> available that it happened.  The final code looks like:
> 
>         pushl   %ebp
>         movl    %esp, %ebp
>         pushl   %ebx
>         subl    $16, %esp       # the stack push that is combined
>         call    L6
> "L001$pb":
> L6:
>         popl    %ebx            # last instruction in prolog
>         pushl   $4
>         call    L___cxa_allocate_exception$stub   # this routine has 16 bytes of arguments
>         movl    $99, (%eax)
>         addl    $12, %esp       # stack adjust preparing for next call
>         pushl   $0
>         pushl   L__ZTIi$non_lazy_ptr-"L001$pb"(%ebx)
>         pushl   %eax
>         call    L___cxa_throw$stub
> 
> The EH information says that __cxa_allocate_exception has 4 bytes of
> args, and __cxa_throw has 12 bytes of args; but really they both have
> 16 bytes of args.
> 
> The effect of all this on Darwin is that the stack becomes misaligned
> and then the dynamic loader crashes when it tries to use SSE (or
> something).
> 
> I think possible solutions are:
> 
> 1. Revert this patch, and note that the original bug (which you
> mention you couldn't reproduce on 4.0+) might not be fixed.
> 2. Remove the whole routine and declare that if you want to use EH,
> you must not have ACCUMULATE_OUTGOING_ARGS set.
> 3. Try to patch around it by looking at the argument size in the
> call, and if it's greater than the apparent argument size, using the
> value in the call instead.
> 4. Make the routine much more intelligent, or ideally have the proper
> information attached to the call at the time it's generated (as a
> note?), and just use that.  Add gcc_assert calls to verify that (a)
> the argument size never becomes negative and (b) the argument size is
> never less than the size of each particular call.
> 
> I prefer (1).  Next best would be (4), but it's likely to break some
> ports.  (3) doesn't sound very good, and (2) removes a feature.
> 
> Any other suggestions?

We could change CSA to not combine a prologue instruction with a
non-prologue instruction.  Although that would remove a (minor)
optimization.

We could change CSA so that when it combines a prologue instruction
with a non-prologue instruction it resets the RTX_FRAME_RELATED flag.
That probably wouldn't work.

We could change CSA so that when it combines a prologue instruction
with a non-prologue instruction it sets a new flag on the instruction,
and uses a table on the side to record the original values in the
instruction.

We could avoid nesting memcpy calls on ACCUMULATE_OUTGOING_ARGS
machines, in which case I think Alex's patch is unnecessary.

I'm not sure about your option 1--Alex didn't say that he couldn't
reproduce the bug in mainline, he said he didn't have a test case for
the specific case of memcpy popping the arguments off the stack on
return.

Ian


Re: unwind, x86, DW_CFA_GNU_args_size

2006-08-18 Thread Geoffrey Keating


On 18/08/2006, at 5:42 PM, Ian Lance Taylor wrote:

...

> We could change CSA so that when it combines a prologue instruction
> with a non-prologue instruction it sets a new flag on the instruction,
> and uses a table on the side to record the original values in the
> instruction.


I guess that would work; but wouldn't it be easier to just have  
calls.c tell the dwarf output code what the right offset is?



> We could avoid nesting memcpy calls on ACCUMULATE_OUTGOING_ARGS
> machines, in which case I think Alex's patch is unnecessary.
>
> I'm not sure about your option 1--Alex didn't say that he couldn't
> reproduce the bug in mainline, he said he didn't have a test case for
> the specific case of memcpy popping the arguments off the stack on
> return.


I'm not sure how memcpy gets involved here.  memcpy doesn't throw, so  
there's no need for any args_size data for it---in fact, it's a small  
optimization bug that such data gets reflected in the EH information  
at all.


Alex said that he couldn't find a case that didn't involve memcpy.   
Maybe an appropriate 'fix' is to revert the patch and put an assert  
in calls.c to ensure that a nested function call must be nothrow?






Re: unwind, x86, DW_CFA_GNU_args_size

2006-08-18 Thread Ian Lance Taylor
Geoffrey Keating <[EMAIL PROTECTED]> writes:

> On 18/08/2006, at 5:42 PM, Ian Lance Taylor wrote:
> 
> ...
> > We could change CSA so that when it combines a prologue instruction
> > with a non-prologue instruction it sets a new flag on the instruction,
> > and uses a table on the side to record the original values in the
> > instruction.
> 
> I guess that would work; but wouldn't it be easier to just have
> calls.c tell the dwarf output code what the right offset is?

Would that actually work correctly when the stack adjustments are
combined?  Wouldn't we get the wrong number when different stack
adjustments were combined?

> > We could avoid nesting memcpy calls on ACCUMULATE_OUTGOING_ARGS
> > machines, in which case I think Alex's patch is unnecessary.
> >
> > I'm not sure about your option 1--Alex didn't say that he couldn't
> > reproduce the bug in mainline, he said he didn't have a test case for
> > the specific case of memcpy popping the arguments off the stack on
> > return.
> 
> I'm not sure how memcpy gets involved here.  memcpy doesn't throw, so
> there's no need for any args_size data for it---in fact, it's a small
> optimization bug that such data gets reflected in the EH information
> at all.
> 
> Alex said that he couldn't find a case that didn't involve memcpy.
> Maybe an appropriate 'fix' is to revert the patch and put an assert
> in calls.c to ensure that a nested function call must be nothrow?

That seems quite plausible to me.  Alex?

Ian


How do I teach GCC about automatic vec_concat and vec_select?

2006-08-18 Thread Erich Plondke

I'm doing some research on a pretty plain 32-bit RISC architecture that has
some extra facilities for doing vector operations.  Not exactly new, I know.

The difference with this one is that the vectors are pairs of normal
registers.

This isn't all that new; lots of architectures have normal register pair
loads and stores, lots of machines use pairs of registers to hold DI
values, and lots of architectures have the ability to do V2SI or V4HI
or V8QI on a 64-bit value... but usually they are special vector registers.

The main difference here is that we can access either half of
a V2SI result from any kind of vector operation (add, sub, shift, etc)
with any SI instruction without any additional copies or moves.

Similarly, several instructions can pick out either HI from any register,
which means that they can pick out any arbitrary element from a V2HI or
V4HI for free.

In the same way that we can use either 32-bit register of the pair as a source
for a normal SI instruction, if you allocate your registers such that
one operation lands in an even register and another operation lands in the odd
register next to it, you can use both of them together as a V2SI without any
additional modification... free vector formation.

As it turns out this is very handy when I'm writing code by hand, but I haven't
figured out a good way to teach GCC about it.  So my question is: what
strategy should I use to teach GCC about this?

Basically, (I think) all these are equivalent and free:

low word:
   (subreg:SI (match_operand:DI "register_pair_operand" "P") 0)
   (vec_select:SI (match_operand:V2SI "register_pair_operand" "P")
                  (parallel [(const_int 0)]))
   (truncate:SI (match_operand:DI "register_pair_operand" "P"))

high word:
   (subreg:SI (match_operand:DI "register_pair_operand" "P") 1)
   (vec_select:SI (match_operand:V2SI "register_pair_operand" "P")
                  (parallel [(const_int 1)]))
   (truncate:SI (lshiftrt:DI (match_operand:DI "register_pair_operand" "P")
                             (const_int 32)))

And maybe others, of course.  Similarly, any two instructions whose
destination is a register_operand can get a free

   (vec_concat:V2SI (match_operand:SI "even_register_operand" "r")
                    (match_operand:SI "odd_register_operand" "r"))

if the registers are placed in an even/odd pair.
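
The even/odd condition itself is simple to state in code. Here is a
sketch in standalone C of the checks that predicates such as
even_register_operand and odd_register_operand would have to encode; the
function names and the pair numbering (pair k holds registers 2k and
2k+1) are hypothetical, not taken from any existing port.

```c
#include <assert.h>
#include <stdbool.h>

/* True when two hard register numbers form an even/odd pair: the first
   is even and the second is the odd register immediately after it. */
static bool is_even_odd_pair(unsigned even_reg, unsigned odd_reg)
{
    return (even_reg % 2 == 0) && (odd_reg == even_reg + 1);
}

/* Given one register of a pair, return the number of its partner. */
static unsigned pair_partner(unsigned reg)
{
    return reg ^ 1u;
}

/* Number of the pair formed by a valid even/odd pair of registers. */
static unsigned pair_number(unsigned even_reg, unsigned odd_reg)
{
    assert(is_even_odd_pair(even_reg, odd_reg));
    return even_reg / 2;
}
```

The hard part is not this test but getting the register allocator to
pick registers that satisfy it.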

So my theories so far are:

A) Generate large numbers of define_insn's that cross all the instructions
   with all the ways of generating inputs and outputs, and let
   the combiner try and figure it out.  Either by using scripts to
   generate things or trying to use mode macros...

B) Trying to make peephole patterns for these things.  The problem here
   is that all peephole stuff happens after register allocation,
   and then the likelihood that we could do the even/odd pairing
   is much lower... so maybe add a "peephole3" pass that occurs before
   register allocation and attempts to peephole out all these
   formations of and extractions from vector types.

C) Generating the instructions to form or extract from vectors,
   telling GCC that they don't cost anything, and then trying
   to eliminate them later and rewrite registers... but that
   sounds scary and likely to cause problems deep in the bowels of
   reload.

I have the feeling that someone has already created a clever solution
to problems like this and it's probably best to ask...

It seems like several ports could benefit from this sort of thing.
For instance, it looks like the Sparc port has several peephole2's that
take two SI loads that are adjacent in both memory and the regfile
and combine them into an ldd... but it looks like they will only work
if the register allocator randomly happens to pick adjacent registers
for the two loads.  I've looked in other ports but I still haven't
found what I'm looking for...

If anyone has any ideas, I'm all ears.

Thanks in advance,

   Erich

--
Why are ``tolerant'' people so intolerant of intolerant people?