Re: GCC 4.3 branch created, 4.4 opens for stage1

2008-02-19 Thread Paolo Bonzini

Ralf Wildenhues wrote:

* Jakub Jelinek wrote on Tue, Feb 19, 2008 at 12:18:02AM CET:

PR35218 - I believe the latest patch worked for the tester,
  so we now have a patch and just need an approval?


Yes, the patch is at

and the confirmation at



Patch is ok, please commit! (don't post generated files next time, also).

Paolo


Re: [PATCH, DOC] PR 31549: move -frtl-abstract-sequences description

2008-02-19 Thread Gabor Loki

Gabor Loki wrote:

Gerald Pfeifer wrote:

The last time I tried this on ARM it didn't work because there were
ICEs and it might have been fixed by now.

However, searching Bugzilla turned up these:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33009
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33642


In the latter Richard wrote "I propose to kill all traces of it unless 
someone steps up and maintains this piece of code and we enable it for 
-Os".


Gabor, what are your plans?  Having a pass which seems to crash more
than anything else doesn't seem useful so I agree with Richard.


First of all, I am sorry, but I was really busy with other things over the past 
year.
I didn't have enough free time to keep track of CFO-related stuff.


I have just got confirmation that I can spend as much time as needed to
maintain CFO. :-)

So, I am going to fix those bugs.
Thanks for your patience.


--Gabor


Re: API for callgraph and IPA passes for whole program optimization

2008-02-19 Thread Diego Novillo

On 2/17/08 11:31 AM, Jan Hubicka wrote:


The plan would be to update passmanager first and transit main IPA passes
(inliner, constant propagation, alias analysis) on mainline.  The more advanced
IPA stuff, as struct reorg can go later since it is not a showstopper for first
incarnation of whole program optimizer anyway. On LTO branch we can do changes
related to memory management and overall pass queue organization.


Sounds reasonable.


 /* For IPA pass only.  Analyze the given function body and produce summary
    information.  */
 void analyze_function (struct cgraph_node *);
 /* For IPA pass only.  Analyze the given variable initializer and produce
    summary information.  */
 void analyze_variable (struct varpool_node *);
 /* For IPA pass only.  Read summary from disk.  */
 void read_function_local_info (struct cgraph_node *, whatever parameters
                                needed by LTO implementation);
 void read_variable_local_info (struct varpool_node *, whatever parameters
                                needed by LTO implementation);
 /* For IPA pass only.  Apply changes to given function body.  */
 void modify_function (struct cgraph_node *);
 /* For IPA pass only.  Apply changes to given variable body.  */
 void modify_body (struct varpool_node *);
 /* For IPA pass only.  Write summary to disk.  */
 void write_function_local_info (struct cgraph_node *, whatever parameters
                                 needed by LTO implementation);
 void write_variable_local_info (struct varpool_node *, whatever parameters
                                 needed by LTO implementation);

I find 'analyze' for the first stage confusing.  We do no analysis 
there, we just produce summary info.  The analysis is actually done by 
what you call 'read'.  How about some variant of:


generate_summary_{function/variable}
analyze_{function/variable}
transform_{function/variable}

?


For implementing stage C of the whopr document (i.e. being able to produce
.o files with decisions from global optimization in them), we would 
also need two extra hooks for reading and writing
ipa_optimization_info, but I would leave this out.


Note that besides these hooks we will also need the central driver for 
whole-program analysis.  My thinking is that this driver will be part of 
the IPA manager itself.  We may also want to write the optimization plan 
to a central file, instead of replicating it on every .o file.




I would propose doing this change along with killing RTL dump letter fields,
since most annoying change of this is actually updating all the initializers of
all GCC passes by hand.

   BTW what about instead of adding 8 NULL fields to each initializer adding a
   simple macro, like
 IPA_PASS (analyze_fun, analyze_var, write_fun, write_var, read_fun,
   read_var, execute, modify_fun, modify_var)
 LOCAL_PASS (execute_fun)
 RTL_PASS (execute_fun)
   macros so we don't have to go over it again?
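
A minimal sketch of the macro idea above, assuming a hypothetical `struct pass_desc` with one slot per hook. The struct layout, the extra `NAME` argument and the sample passes are illustrative assumptions, not GCC's actual pass structure.

```c
#include <stddef.h>

/* Hypothetical pass descriptor: one slot per hook named in the proposal.  */
struct pass_desc {
  const char *name;
  void (*analyze_fun) (void *node);
  void (*analyze_var) (void *node);
  void (*write_fun) (void *node);
  void (*write_var) (void *node);
  void (*read_fun) (void *node);
  void (*read_var) (void *node);
  unsigned (*execute) (void);
  void (*modify_fun) (void *node);
  void (*modify_var) (void *node);
};

/* An IPA pass spells out every hook.  */
#define IPA_PASS(NAME, AF, AV, WF, WV, RF, RV, EX, MF, MV) \
  { NAME, AF, AV, WF, WV, RF, RV, EX, MF, MV }

/* Local and RTL passes only have an execute hook; the rest stay NULL,
   so adding a hook later means touching the macros, not every pass.  */
#define LOCAL_PASS(NAME, EX) \
  { NAME, NULL, NULL, NULL, NULL, NULL, NULL, EX, NULL, NULL }
#define RTL_PASS(NAME, EX) LOCAL_PASS (NAME, EX)

/* Illustrative hooks (hypothetical).  */
static unsigned execute_dce (void) { return 0; }
static void analyze_inline (void *node) { (void) node; }
static unsigned execute_inline (void) { return 1; }
static void modify_inline (void *node) { (void) node; }

static struct pass_desc pass_dce = LOCAL_PASS ("dce", execute_dce);
static struct pass_desc pass_inline =
  IPA_PASS ("inline", analyze_inline, NULL, NULL, NULL, NULL, NULL,
            execute_inline, modify_inline, NULL);
```

With this shape, `pass_dce.analyze_fun` is NULL without the initializer having to say so, which is the point of the macros.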


Sounds good.


   I would be happy to do the non-macroized change however.

With these extra hooks, passmanager queue could be organized as follows:
   all_lowering_passes: executed per function as done now.
   all_early_ipa_passes: queue consisting of IPA passes with only execute
 function set.  Here we will do things like early inlining, early
 optimization and similar passes.
   all_interunit_ipa_passes: IPA passes with analyze/execute/modify pair.
 Pass manager will execute them after early_ipa_passes and will call
 all analyze hooks first.  Possibly followed by write hooks and
 exit, or with execute next and modify hooks last based on the fact if we
 do LTO or not.
   all_late_ipa_passes: If we opt for having late small IPA optimizer, we can
 put passes here.  Probably not in initial implementation.
   all_passes: Local optimization passes as we do now executed on topological
   order. This can be subpass of last pass of all_interunit_ipa_passes too.
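
The ordering described for all_interunit_ipa_passes could be sketched roughly as follows. This is one reading of the text (all analyze hooks, then either all write hooks and exit, or all execute hooks followed by all modify hooks); `struct ipa_pass`, `run_interunit_queue` and the trace buffer are hypothetical names used only for illustration.

```c
#include <stdio.h>
#include <string.h>

enum { MAX_TRACE = 256 };

/* Records the order in which hooks would be called.  */
static char trace[MAX_TRACE];

static void log_step (const char *pass, const char *hook)
{
  size_t used = strlen (trace);
  snprintf (trace + used, MAX_TRACE - used, "%s:%s ", pass, hook);
}

/* Hypothetical stand-in for a pass; real hooks are replaced by logging.  */
struct ipa_pass {
  const char *name;
};

/* Drive the inter-unit queue.  WRITE_ONLY models the case where summaries
   are written to disk and compilation stops; otherwise the propagation
   (execute) and the per-function changes (modify) follow.  */
static void run_interunit_queue (const struct ipa_pass *queue, size_t n,
                                 int write_only)
{
  size_t i;

  trace[0] = '\0';
  for (i = 0; i < n; i++)
    log_step (queue[i].name, "analyze");

  if (write_only)
    {
      for (i = 0; i < n; i++)
        log_step (queue[i].name, "write");
      return;
    }

  for (i = 0; i < n; i++)
    log_step (queue[i].name, "execute");
  for (i = 0; i < n; i++)
    log_step (queue[i].name, "modify");
}
```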


Yes.


With LTO linktime optimization the queue will start with
all_interunit_ipa_passes with read hooks followed by execute and modify hooks.


Hmm, well.  This could even be on two or three separate compilation 
passes.  The first pass calls all the 'generate' hooks (this can be done 
via make -jN with all the initial .c files), a second pass calls all the 
analysis hooks (this is done by a single GCC invocation) and the third 
pass (also done via make -jN) calls all the modify hooks.


We could structure things so that:

$ gcc -flto -O2 *.c

does everything in one invocation.  But I would also like to support the 
model where we operate in separate phases.




So the plan is to turn IPCP and other passes from doing real cloning into
same virtual cloning.


Sounds good.


That's about it.  I would welcome all comments and if it seems useful I can
turn it into a wiki page adding details to the current implementation plan on the 
wiki.


Thanks for the detailed plan.  Yes, please add it to the whopr wiki. 
The only aspects that are no

Re: Ada: building 4.3 cross-gnattools with gcc-4.2.1

2008-02-19 Thread Joel Sherrill

Arnaud Charlet wrote:

1) Is it supposed to work? (i.e. building an Ada cross compiler 4.3 by a
4.2 compiler)



No, as documented, you need a matching native compiler to build a cross Ada
compiler (e.g. build from the same 4.3 sources a native compiler first, and
then use it to build the cross compiler).

  

That's what I have been doing.  My procedure is:

+ build a native
+ build a C/C++ cross
+ build the Ada cross

With that I have managed to build 4.2.x and SVN trunk
and post ACATS for PowerPC, SPARC, and i386 on RTEMS.
SPARC and PowerPC look very good.  I am using qemu to
test the i386 and something isn't  quite right for i386.
It looks like it branches off to never never land and then
gets a fault. I haven't had a chance to look into it yet.

I posted results to gcc-testresults.

--joel

2) does it work to build 4.3 cross-gnattools with the current 4.3 Ada
compiler?



It does in general. I do not know for avr in particular.

  

3) Would a patch be accepted that makes the 4.3 gnattools compilable
by a 4.2 compiler?



In general, no. On a case by case basis, assuming the patch is simple and
does not make the code worse in terms of maintenance, then it can be OK.

Arno
  



--
Joel Sherrill, Ph.D.             Director of Research & Development
[EMAIL PROTECTED]                On-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35805
   Support Available             (256) 722-9985




Redundant logical operations left after early splitting

2008-02-19 Thread hutchinsonandy


I am stumped and hope someone more skilled can give me some clues to 
solve this problem I have with 4.3/4.4 gcc.


I have created RTL define/splits for the AVR port's logical operations 
(AND, IOR etc). These split larger mode operations into QImode 
operations. I also created similar splitters for zero_extend and some 
combined zero_extended shift operations.


All of these create QImode moves of subregs and all split before reload 
(ie unconditional).


They all split as expected at the first opportunity and produce the expected 
RTL. Whoopee!


When code involves zero constants (created by zero_extend, or shift) I 
can see propagation of constant into following logical operations. (Os 
or O3 optimizations). Such as:


Rx = 0
OR Ry,Rx

becoming

OR Ry,0

The propagation is fine, BUT the creation of OR Ry,0  is a totally 
redundant operation and  remains intact thru all further passes into 
final code - apparently not being removed by any optimisations after 
split1 pass!


I created an RTL pattern to remove these (splitting  OR Rx,0 into NOP) and 
that removes them - but surely this workaround should not be needed. I 
am stumped by what could be causing the problem. Help!
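
To make the redundancy concrete, here is a toy sketch of the cleanup one would expect: an inclusive OR with 0 (and, by the same reasoning, an AND with 0xff on an 8-bit value) leaves the destination unchanged and can become a no-op. The `struct insn` representation and function names are assumptions made for this illustration; this is not GCC's RTL or its simplification code.

```c
#include <stddef.h>

/* Toy instruction set: logical operations with an 8-bit immediate.  */
enum opcode { OP_IOR_CONST, OP_AND_CONST, OP_NOP };

struct insn {
  enum opcode op;
  int dest;           /* Register number, both source and destination.  */
  unsigned char imm;  /* QImode-sized constant operand.  */
};

/* True if the instruction cannot change its destination register.  */
static int is_noop_logical (const struct insn *i)
{
  return (i->op == OP_IOR_CONST && i->imm == 0x00)
         || (i->op == OP_AND_CONST && i->imm == 0xff);
}

/* Turn every no-op logical instruction into OP_NOP and return how many
   were found.  */
static size_t delete_noop_logicals (struct insn *insns, size_t n)
{
  size_t i, deleted = 0;

  for (i = 0; i < n; i++)
    if (is_noop_logical (&insns[i]))
      {
        insns[i].op = OP_NOP;
        deleted++;
      }
  return deleted;
}
```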


There also seem to be cases where zero constants are not propagated 
into instructions. Yet the testcase only involves simple operands, no loops 
or conditionals or any other side effects that might be a reason to block 
this. Can anyone suggest some non-obvious reasons for this?


Andy








Re: API for callgraph and IPA passes for whole program optimization

2008-02-19 Thread Kenneth Zadeck

>
> Thanks for the detailed plan.  Yes, please add it to the whopr wiki.
> The only aspects that are not too clear to me are what exactly do you
> plan to do in mainline.
>
> One idea would be to do all the basic framework during stage 1 and
> leave it in mainline.   I would suggest doing as much as possible in
> mainline, so that it's then pulled in by the LTO branch.
>
> Kenny, what do you expect we could pull out from the LTO branch for
> stage 1?  Does it make sense to open a new branch inheriting from LTO
> for this work?
>
>
I am not an expert on branches and merging and such.  I worry about
trying to pull all of these different branches that are touching the same
parts of the compiler back together at the end. 

I would prefer that honza work on the lto branch and that we selectively
pull patches from that and put them on the mainline. 

Kenny


> Thanks.  Diego.
>



Re: GCC 4.3 branch created, 4.4 opens for stage1

2008-02-19 Thread Paolo Bonzini



PR34950 - Jason/Mark, could you help with this?  It is 4.2/4.3
  regression, so perhaps doesn't need to hold the rc
PR35218 - I believe the latest patch worked for the tester,
  so we now have a patch and just need an approval?
PR35232 - I'm not sure I'm comfortable with a big reload patch
  this late
PR35239 - Rask, do you have a patch for this?


I've posted a fix for PR32009, which is a 4-5% hit on gcc performance for 
powerpc-darwin.


I'm applying it to mainline in 12 hours (if bootstrap/regtest passes, of 
course, but I'm confident); can I apply it on Friday to 4.3-branch?


Paolo


MELT branch created, barely usable.

2008-02-19 Thread Basile STARYNKEVITCH

Hello All,

I just created the MELT branch (basically merging my own source tree), 
but it is barely usable.


See the wiki and my 2007 summit paper and
http://gcc.gnu.org/wiki/MiddleEndLispTranslator

The bootstrap is not completed, because:

   make bootstrap might not actually work

and most importantly, it should have a different meaning:

   the file gcc/warm-basilys.c (which does not exist yet! it is 
generated by "itself") should be compiled into a sort of plugin 
warm-basilys.so which is dynamically loaded by cc1 to be able to 
generate this very file gcc/warm-basilys.c from melt/warm-basilys.melt 
(which is still buggy and incomplete, and which I still "compile" to C 
with a horrible contrib/cold-basilys.lisp file (for CLISP)).


So from a SubVersion point of view warm-basilys.c is like the configure 
files (generated, but in the source tree). I'm avoiding polluting the 
SubVersion repository with big generated files which are still buggy.


All this is *WORK IN PROGRESS*

Thanks for reading

--
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***


Re: GCC 4.3.0 Status Report (2008-02-14)

2008-02-19 Thread Jakub Jelinek
On Mon, Feb 18, 2008 at 01:10:57PM -0800, Janis Johnson wrote:
> I changed argument passing and function return of generic vectors to be
> consistent with and without the AltiVec ABI for powerpc-linux  and
> powerpc64-linux, but in so doing inadvertently changed the behavior for
> other powerpc targets as well.  Meanwhile the trunk has been more and
> more frozen, so the latest attempt is to switch the default ABI with as
> little other change as possible.

But non-generic vectors are still ABI incompatible, right?
Say:
typedef int v __attribute__((vector_size (16)));
v foo (v a, v b)
{
  return a + b;
}

has incompatible function argument passing and return value conventions,
even with e.g. -m64 vs. -m64 -mcpu=970 (and the plan is to have the
same between -m32 vs. -m32 -mcpu=970).
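
For readers who want to try the snippet, the sketch below wraps it in a small helper. It only exercises the language-level behaviour of the GCC `vector_size` extension (element-wise addition) and says nothing about which registers the arguments travel in. The `add4` helper is an illustrative assumption, and the code assumes a 4-byte `int`.

```c
#include <string.h>

/* Generic vector of four ints (GCC vector extension).  */
typedef int v __attribute__ ((vector_size (16)));

/* The function from the example: element-wise addition.  */
v foo (v a, v b)
{
  return a + b;
}

/* Hypothetical helper: add two int[4] arrays through foo.  The arrays are
   copied in and out with memcpy so that no vector subscripting is needed.  */
static void add4 (const int a[4], const int b[4], int out[4])
{
  v va, vb, vr;

  memcpy (&va, a, sizeof va);
  memcpy (&vb, b, sizeof vb);
  vr = foo (va, vb);
  memcpy (out, &vr, sizeof vr);
}
```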

Jakub


Re: GCC 4.3.0 Status Report (2008-02-14)

2008-02-19 Thread Janis Johnson
On Tue, 2008-02-19 at 01:34 +, Joseph S. Myers wrote:
> On Mon, 18 Feb 2008, Janis Johnson wrote:
> 
> > There are lots of inconsistencies in passing generic vectors as arguments
> > and return values, and I'll leave those alone until the PowerPC ELF ABI
> > group decides what to do with them.
> 
> Perhaps you'd care to recommend what the semantics *ought* to be, given 
> that they're currently inconsistent?  The provisional conclusion at the 
> last ABI call was to add ATR-SOFT-VECTOR-64 and ATR-SOFT-VECTOR-128 to the 
> ABI taxonomy, recognising that the existence of vector types does not 
> depend on the existence of vector registers, but we don't have any 
> associated ABI text to describe associated argument-passing and return 
> rules, only that for Altivec and SPE vectors conditional on ATR-ALTIVEC 
> and ATR-SPE respectively.

My recommendation is to have a very simple rule for passing arguments and
returning function results of vector types:

  With vector hardware support and an ABI that supports passing vectors
  in vector registers, vector types that map to hardware types are
  passed and returned in vector registers, except for unnamed arguments.

  In any other situation vector types are passed the same as aggregates
  of the same size.  Those other situations include:

no vector hardware support
no ABI support for passing vectors in vector registers
vector types smaller than hardware vector types
vector types larger than hardware vector types
vector types whose elements aren't supported by a hardware type
  (e.g. vector of double for AltiVec)
unnamed arguments
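
The proposed rule can be written down as a small decision function. This is only a sketch of the recommendation as stated above; the struct and enum names are hypothetical, and "maps to a hardware type" is approximated here by matching size plus a supported element type.

```c
#include <stdbool.h>
#include <stddef.h>

/* What we know about the target (hypothetical description).  */
struct target_info {
  bool has_vector_hw;              /* Vector hardware support.  */
  bool abi_passes_in_vector_regs;  /* ABI passes vectors in vector regs.  */
  size_t hw_vector_bytes;          /* Size of a hardware vector type.  */
};

/* What we know about one vector argument or return value.  */
struct vector_arg {
  size_t size_bytes;    /* Size of the vector type.  */
  bool elem_supported;  /* Element type supported by a hardware type.  */
  bool named;           /* False for unnamed (variadic) arguments.  */
};

enum pass_kind { PASS_IN_VECTOR_REG, PASS_AS_AGGREGATE };

/* Apply the proposed rule: vector registers only when everything lines
   up, otherwise pass like an aggregate of the same size.  */
static enum pass_kind classify_vector_arg (const struct target_info *t,
                                           const struct vector_arg *a)
{
  if (!t->has_vector_hw || !t->abi_passes_in_vector_regs)
    return PASS_AS_AGGREGATE;
  if (a->size_bytes != t->hw_vector_bytes)  /* Smaller or larger.  */
    return PASS_AS_AGGREGATE;
  if (!a->elem_supported)  /* E.g. vector of double for AltiVec.  */
    return PASS_AS_AGGREGATE;
  if (!a->named)
    return PASS_AS_AGGREGATE;
  return PASS_IN_VECTOR_REG;
}
```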

That ought to be a good starting point for discussion.

Janis



Re: GCC 4.3.0 Status Report (2008-02-14)

2008-02-19 Thread Janis Johnson
On Tue, 2008-02-19 at 18:47 +0100, Jakub Jelinek wrote:
> On Mon, Feb 18, 2008 at 01:10:57PM -0800, Janis Johnson wrote:
> > I changed argument passing and function return of generic vectors to be
> > consistent with and without the AltiVec ABI for powerpc-linux  and
> > powerpc64-linux, but in so doing inadvertently changed the behavior for
> > other powerpc targets as well.  Meanwhile the trunk has been more and
> > more frozen, so the latest attempt is to switch the default ABI with as
> > little other change as possible.
> 
> But non-generic vectors are still ABI incompatible, right?
> Say:
> typedef int v __attribute__((vector_size (16)));
> v foo (v a, v b)
> {
>   return a + b;
> }
> 
> has incompatible function argument passing and return value conventions,
> even with e.g. -m64 vs. -m64 -mcpu=970 (and the plan is to have the
> same between -m32 vs. -m32 -mcpu=970).

That's correct, and intended.  The AltiVec ABI specifies that
vector types that map to AltiVec vector types are passed and
returned in vector registers.  Without vector support they seemed
to be passed in rather random ways.  No one has complained about
that in the last several years, and the documentation says that
it's not well-defined or stable, so we'll wait to fix that until
the PowerPC ELF ABI group recommends what to do.

The important issue about the ABI for right now is not argument
passing, but saving and restoring registers.  As David points
out there is nothing that defines conventions for vector registers
if the AltiVec ABI is not used.  He asked that enabling AltiVec
support, either with -maltivec or -mcpu=970 or configuring to use
vector hardware by default, defaults to using the AltiVec ABI to
avoid surprising users who have no idea their code is using
AltiVec hardware and needs the AltiVec ABI.

Janis



Re: API for callgraph and IPA passes for whole program optimization

2008-02-19 Thread Jan Hubicka
> I find 'analyze' for the first stage confusing.  We do no analysis 
> there, we just produce summary info.  The analysis is actually done by 
> what you call 'read'.  How about some variant of:
> 
> generate_summary_{function/variable}
> analyze_{function/variable}
> transform_{function/variable}

We seem to have a bit of confusion here, but I guess it is just terminology
;)

What we want to do is, I think, clear.  The sequence should be:
 1) Look at function body (or variable) and produce some summary
 2) Do optional serialization of summary to disk and read back
 3) Perform interprocedural propagation based on knowledge of full
 callgraph and all the summaries
 4) Do whatever we concluded in 3) on the function body at a time it is
 being compiled.

So for terminology I tend to use:
1) is what I call analyze, since we do look at function and analyze its
local properties. I have no problem to call it generate_summary
3) is execute, since I want to use existing "execute" hook of
passmanager.  It is still having the meaning "do the real work of the
pass" so execute seems to match.
4) is called modify.

I have certainly no problem calling 1) generate_summary and 4)
transform hook.

But the "read" hook is really just intended to convert whatever is
on-disk format to in memory representation, so I don't see why it should
be called analyze_function/variable.

I see that the "execute" stage can be called "analyze", since we do the
IPA analysis to decide what we want to optimize, but then it would be
just an "analyze" hook (without the function/variable variants) that walks
the callgraph and varpool itself, instead of being called on each
function/variable in isolation.
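
The data flow of stages 1, 3 and 4 can be illustrated with a toy model. Everything here is an assumption for illustration: `struct func_node`, the "instruction count" summary and the size limit are made up, and stage 2 (serialization) is left out.

```c
#include <stddef.h>

/* Toy function node; not GCC's cgraph_node.  */
struct func_node {
  const char *name;
  int insn_count;        /* Stand-in for the function body.  */
  int summary_size;      /* Stage 1 result: local summary.  */
  int inline_candidate;  /* Stage 3 result: global decision.  */
  int transformed;       /* Stage 4 marker.  */
};

/* Stage 1: look at one function in isolation and summarize it.  */
static void generate_summary (struct func_node *f)
{
  f->summary_size = f->insn_count;
}

/* Stage 3: walk all nodes and decide, using only the summaries.  */
static void execute_ipa (struct func_node *nodes, size_t n, int limit)
{
  size_t i;

  for (i = 0; i < n; i++)
    nodes[i].inline_candidate = nodes[i].summary_size <= limit;
}

/* Stage 4: apply the decision to one function body.  */
static void transform (struct func_node *f)
{
  if (f->inline_candidate)
    f->transformed = 1;
}

/* Run the stages in order: per-function, whole-program, per-function.  */
static void run_ipa_pipeline (struct func_node *nodes, size_t n, int limit)
{
  size_t i;

  for (i = 0; i < n; i++)
    generate_summary (&nodes[i]);
  execute_ipa (nodes, n, limit);
  for (i = 0; i < n; i++)
    transform (&nodes[i]);
}
```

Note that only `execute_ipa` sees more than one node at a time, which matches the point that the whole-program step walks the callgraph itself rather than being called per function.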
> 
> Note that besides these hooks we will also need the central driver for 
> whole-program analysis.  My thinking is that this driver will be part of 

Currently the job of driving the compilation process is implemented in
cgraphunit and passmanager. I am leaning towards doing as much work as
possible on the passmanager side, with cgraphunit and rest_of_compilation
being in large part replaced by a few extra passes added to the queue.

I think this scheme scales to the planned IPA optimizer too: all we need
is to teach passmanager about the new hooks and reorganize the queue a
bit.  What we have now is:

 1) all_lowering_passes queue used by cgraphunit when constructing
 cgraph. I think it can stay this way
 2) all_ipa_passes driving all our IPA optimization
 3) all_passes together with some code in rest_of_compilation driving
 the local optimization.

I am slowly working towards making all_passes part of the ipa passes, so
cgraphunit will need to worry only about the initial analysis of
the compilation unit.

With IPA, I would propose adding the all_interunit_ipa_passes queue that will
be the point where compilation will start with LTO frontend and end with
LTO compilation. 
> 
> Hmm, well.  This could even be on two or three separate compilation 
> passes.  The first pass calls all the 'generate' hooks (this can be done 
> via make -jN with all the initial .c files), a second pass calls all the 
> analysis hooks (this is done by a single GCC invocation) and the third 
> pass (also done via make -jN) calls all the modify hooks.
> 
> We could structure things so that:
> 
> $ gcc -flto -O2 *.c
> 
> does everything in one invocation.  But I would also like to support the 
> model where we operate in separate phases.

Yes, in order to be able to do "execute" (or analyze in your names)
hooks once and perform modify based on the results in parallel in other
compilation projects, we need to be able to write the optimization
decisions into summaries on disk.  In my terminology this is "function
summary" and "optimization summary".

The function summary is produced early via generate_summary hook and is
placed in cgraph->local field, while "optimization summary" is result of
"execute" hook and is placed in cgraph->global field or realized by
changing the callgraph itself (ie producing new clones, function and
such).

I think we will need another pair of read/write methods to serialize
optimization summary on disk either to the newly shipped .o files or to common
optimization decision file.
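
As a rough illustration of such a read/write pair, the sketch below round-trips a small decision record through a text stream. The record fields and the one-line text format are purely hypothetical; the thread does not specify any on-disk format.

```c
#include <stdio.h>

/* Hypothetical per-function optimization summary.  */
struct opt_summary {
  int node_id;           /* Which function the decision belongs to.  */
  int inline_candidate;  /* Decision made by the "execute" stage.  */
  int clone_count;       /* Number of clones to create.  */
};

/* Write one record as a line of text.  Returns 0 on success.  */
static int write_opt_summary (FILE *f, const struct opt_summary *s)
{
  return fprintf (f, "%d %d %d\n", s->node_id, s->inline_candidate,
                  s->clone_count) > 0 ? 0 : -1;
}

/* Read one record back.  Returns 0 on success.  */
static int read_opt_summary (FILE *f, struct opt_summary *s)
{
  return fscanf (f, "%d %d %d", &s->node_id, &s->inline_candidate,
                 &s->clone_count) == 3 ? 0 : -1;
}
```

A later compilation step could then read these records and run only the modify stage, without repeating the whole-program analysis.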

But I think we want to implement this incrementally: first do the model
optimizing everything in one linktime process (keeping in mind that we
will want to do more) and then implementing the distribution perhaps
based on Kenny's idea of duplicating all the analysis work in all nodes
or via shipping the newly built .o files.
> 
> 
> >So the plan is to turn IPCP and other passes from doing real cloning into
> >same virtual cloning.
> 
> Sounds good.
> 
> >That's about it.  I would welcome all comments and if it seems useful I can
> >turn it into a wiki page adding details to the current implementation plan 
> >on the wiki.
> 
> Thanks for the detailed plan.  Yes, please add it to the whopr wiki. 
> The only aspects that are not too clear to me are what exactly do you 
> plan to do in mainline.
> 
> One idea would

-fpic support detection in testsuite

2008-02-19 Thread Jan Beulich
gcc/testsuite/lib/target-supports.exp checks whether the compiler spits
out any messages when using -fpic/-fPIC; this doesn't cover the case
where the compiler happily processes everything, but the linker cannot
deal with the result (in the given case, because the specific gas (x86) in
use accepts @ as a normal symbol character, and hence the usual
@ syntax doesn't yield the expected result; note
that the target doesn't really need PIC code, nor does it support TLS,
thus all the constructs are really meaningless).

Should the testsuite not instead test whether all involved tools
are able to handle -fPIC (and its results)? Or should the target simply
disallow -fPIC (and if so, how is that supposed to be done)?

Thanks, Jan



Re: API for callgraph and IPA passes for whole program optimization

2008-02-19 Thread Jan Hubicka
> Currently the job to drive compilation process is implemented in
> cgraphunit and passmanager. I am leaning to plan to do as much work as

BTW, for a while I have thought that the name of cgraphunit has outlived its
original meaning.

Just as a historical note, it was introduced because it wasn't
possible to put some bits into cgraph.c, since frontends not converted to
function-at-a-time didn't link then (by not defining walk_tree, for
instance).  So I split cgraph into stuff required by the middle end, linked
in unconditionally, and stuff required to drive unit-at-a-time
compilation, linked only to unit-at-a-time frontends.

It probably should be moved to driver.c after unrelated
cgraph/passmanager bits are pulled out of that file.

Honza


[PATCH] Complex arithmetic changes

2008-02-19 Thread Janne Blomqvist
Hello,

the attached patch fixes PRs c/35162 (just a documentation fix) and
fortran/29549.

c/35162 is just a small documentation fix where the documentation was
inconsistent with the code as well as C99.

For fortran/29549, it adds a new option -fcx-fortran-rules which sets
flag_complex_method to 1, and makes libgfortran use that switch, gaining a
factor of 4 speedup on complex matrix multiplication.
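
To illustrate what a range reduction step in complex division buys, here is a sketch contrasting the straightforward formula with the textbook scaled form (often attributed to Smith). This is an assumption-laden illustration: the patch does not say which algorithm GCC expands for flag_complex_method 1, and neither function below performs any NaN + I*NaN recovery. `struct cplx` and the function names are made up for the example.

```c
/* Minimal complex type for the illustration.  */
struct cplx { double re, im; };

static double abs_d (double v)
{
  return v < 0 ? -v : v;
}

/* Straightforward formula: (a+bi)/(c+di) = ((ac+bd) + (bc-ad)i)/(c*c+d*d).
   The products and c*c + d*d can overflow even when the true quotient is
   perfectly representable.  */
static struct cplx div_naive (struct cplx x, struct cplx y)
{
  double den = y.re * y.re + y.im * y.im;
  struct cplx r;

  r.re = (x.re * y.re + x.im * y.im) / den;
  r.im = (x.im * y.re - x.re * y.im) / den;
  return r;
}

/* Scaled form: divide through by the larger component of the divisor so
   that the intermediate ratio has magnitude at most 1.  */
static struct cplx div_scaled (struct cplx x, struct cplx y)
{
  struct cplx r;

  if (abs_d (y.re) >= abs_d (y.im))
    {
      double ratio = y.im / y.re;
      double den = y.re + y.im * ratio;
      r.re = (x.re + x.im * ratio) / den;
      r.im = (x.im - x.re * ratio) / den;
    }
  else
    {
      double ratio = y.re / y.im;
      double den = y.re * ratio + y.im;
      r.re = (x.re * ratio + x.im) / den;
      r.im = (x.im * ratio - x.re) / den;
    }
  return r;
}
```

For ordinary inputs the two agree up to rounding; for operands near the top of the double range, only the scaled form keeps its intermediates finite.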

Regtested on i686-pc-linux-gnu, doc changes tested with 'make info' and
'make pdf'. Ok for trunk?

-- 
Janne Blomqvist
gcc ChangeLog:

2008-02-19  Janne Blomqvist  <[EMAIL PROTECTED]>

PR fortran/29549
PR c/35162

* doc/invoke.texi (-fcx-limited-range): Correct to be in line with
actual behaviour and C99.
(-fcx-fortran-rules): Document new option.
* toplev.c (process_options): Handle -fcx-fortran-rules.
* common.opt: Add documentation for -fcx-fortran-rules


libgfortran ChangeLog:

2008-02-19  Janne Blomqvist  <[EMAIL PROTECTED]>

PR fortran/29549

* Makefile.am: Add -fcx-fortran-rules to AM_CFLAGS for all of
libgfortran.
* Makefile.in: Regenerated.
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi	(revision 132440)
+++ gcc/doc/invoke.texi	(working copy)
@@ -320,7 +320,8 @@ Objective-C and Objective-C++ Dialects}.
 -fauto-inc-dec -fbranch-probabilities -fbranch-target-load-optimize @gol
 -fbranch-target-load-optimize2 -fbtr-bb-exclusive -fcaller-saves @gol
 -fcheck-data-deps -fcprop-registers -fcrossjumping -fcse-follow-jumps @gol
--fcse-skip-blocks -fcx-limited-range -fdata-sections -fdce -fdce @gol
+-fcse-skip-blocks -fcx-fortran-rules -fcx-limited-range @gol
+-fdata-sections -fdce -fdce @gol
 -fdelayed-branch -fdelete-null-pointer-checks -fdse -fdse @gol
 -fearly-inlining -fexpensive-optimizations -ffast-math @gol
 -ffinite-math-only -ffloat-store -fforward-propagate @gol
@@ -6541,13 +6542,25 @@ implicitly converting it to double preci
 @item -fcx-limited-range
 @opindex fcx-limited-range
 When enabled, this option states that a range reduction step is not
-needed when performing complex division.  The default is
-@option{-fno-cx-limited-range}, but is enabled by @option{-ffast-math}.
+needed when performing complex division.  Also, there is no checking
+whether the result of a complex multiplication or division is NaN +
+I*NaN, with an attempt to rescue the situation in that case.
+The default is @option{-fno-cx-limited-range}, but is enabled by
+@option{-ffast-math}.
 
 This option controls the default setting of the ISO C99
 @code{CX_LIMITED_RANGE} pragma.  Nevertheless, the option applies to
 all languages.
 
+@item -fcx-fortran-rules
+@opindex fcx-fortran-rules
+Complex multiplication and division follow Fortran rules.  Range
+reduction is done as part of complex division, but there is no checking
+whether the result of a complex multiplication or division is NaN +
+I*NaN, with an attempt to rescue the situation in that case.
+
+The default is @option{-fno-cx-fortran-rules}.
+
 @end table
 
 The following options control optimizations that may improve
Index: gcc/toplev.c
===
--- gcc/toplev.c	(revision 132440)
+++ gcc/toplev.c	(working copy)
@@ -2001,6 +2001,10 @@ process_options (void)
   if (flag_cx_limited_range)
 flag_complex_method = 0;
 
+  /* With -fcx-fortran-rules, we do something in-between cheap and C99.  */
+  if (flag_cx_fortran_rules)
+flag_complex_method = 1;
+
   /* Targets must be able to place spill slots at lower addresses.  If the
  target already uses a soft frame pointer, the transition is trivial.  */
   if (!FRAME_GROWS_DOWNWARD && flag_stack_protect)
Index: gcc/common.opt
===
--- gcc/common.opt	(revision 132440)
+++ gcc/common.opt	(working copy)
@@ -390,6 +390,10 @@ fcx-limited-range
 Common Report Var(flag_cx_limited_range) Optimization
 Omit range reduction step when performing complex division
 
+fcx-fortran-rules
+Common Report Var(flag_cx_fortran_rules) Optimization
+Complex multiplication and division follow Fortran rules
+
 fdata-sections
 Common Report Var(flag_data_sections) Optimization
 Place data items into their own section
Index: libgfortran/Makefile.am
===
--- libgfortran/Makefile.am	(revision 132440)
+++ libgfortran/Makefile.am	(working copy)
@@ -28,6 +28,9 @@ AM_CPPFLAGS = -iquote$(srcdir)/io -I$(sr
 	  -I$(srcdir)/$(MULTISRCTOP)../gcc/config \
 	  -I$(MULTIBUILDTOP)../../$(host_subdir)/gcc -D_GNU_SOURCE
 
+# Fortran rules for complex multiplication and division
+AM_CFLAGS += -fcx-fortran-rules
+
 gfor_io_src= \
 io/close.c \
 io/file_pos.c \
Index: libgfortran/Makefile.in
===
--- libgfortran/Makefile.in	(revision 132440)
+++ lib

Re: Redundant logical operations left after early splitting

2008-02-19 Thread Jeff Law

[EMAIL PROTECTED] wrote:


I am stumped and hope someone more skilled can give me some clues to 
solve this problem I have with 4.3/4.4 gcc.


I have created RTL define/splits for the AVR port's logical operations (AND, IOR 
etc). These split larger mode operations into QImode operations. I also 
created similar splitters for zero_extend and some combined 
zero_extended shift operations.


All of these create QImode moves of subregs and all split before reload 
(ie unconditional).


They all split as expected at the first opportunity and produce the expected 
RTL. Whoopee!


When code involves zero constants (created by zero_extend, or shift) I 
can see propagation of constant into following logical operations. (Os 
or O3 optimizations). Such as:


Rx = 0
OR Ry,Rx

becoming

OR Ry,0

The propagation is fine, BUT the creation of OR Ry,0  is a totally 
redundant operation and  remains intact thru all further passes into 
final code - apparently not being removed by any optimisations after 
split1 pass!


I created an RTL pattern to remove these (splitting  OR Rx,0 into NOP) and 
that removes them - but surely this workaround should not be needed. I 
am stumped by what could be causing the problem. Help!


There also seem to be cases where zero constants are not propagated into 
instructions. Yet testcase only involves simple operands, no loops or 
conditionals or any other side effects that might be reason to block 
this. Can anyone suggest some non-obvious reasons for this?

You'll need to look at the RTL dumps to determine why these redundant
operations are not being removed or why some propagations aren't being
performed.

GCC certainly has code to do things like remove X IOR 0, so you just
need to figure out why it isn't triggering.  If your code had lots of
SUBREGs, then that's definitely worth investigating -- many of GCC's
optimizers aren't particularly adept at dealing with SUBREGs.

jeff


SSA alias representation

2008-02-19 Thread Fran Baena
Hi everybody,

I am studying how GCC carries out alias representation, and some questions have come up.

For instance, given this code portion:

 if ( ... )
   p  = &a;
 else
   if ( ... )
 p = &b;
   else
 p = &c;

 a = 5;
 b = 3;
 d = *p4;

My questions are:

- Do both p and *p need a Name Memory Tag structure, or is one enough?

- About versioning: every name can be versioned, can't it?
   - When are virtual operands inserted? Do they have to be versioned
at the same time as real operands? For instance, # a6 = VDEF ; a7 =
3;. Does this imply that alias computation is performed at the same time as
SSA renaming?

   - Can the other names (TMT, NMT, SFT, etc.) be versioned?
(Every name that can be part of a virtual operand could be
versioned.)

Thank you in advance,

Fran


Re: Redundant logical operations left after early splitting

2008-02-19 Thread hutchinsonandy

Jeff,

thanks for the help - I desperately need ideas - this problem is driving me 
nuts


The RTL for IOR Rx,0 does use subregs  (since I use simplify_gen_subreg 
in splitter.)


Perhaps I should  generate new pseudo QI registers instead before 
reload?


Is there any particular function or pass that should be dealing with 
IOR rx,0 - that I could trace thru and figure out why it does not like 
it (or never gets there)?


Andy



-Original Message-
From: Jeff Law <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: gcc@gcc.gnu.org
Sent: Tue, 19 Feb 2008 2:12 pm
Subject: Re: Redundant logical operations left after early splitting


[EMAIL PROTECTED] wrote:
> I am stumped and hope someone more skilled can give me some clues to
> solve this problem I have with 4.3/4.4 gcc.
>
> I have created RTL define/splits for the AVR port's logical operations
> (AND, IOR etc). These split larger mode operations into QImode
> operations. I also created similar splitters for zero_extend and some
> combined zero_extended shift operations.
>
> All of these create QImode moves of subregs and all split before
> reload (ie unconditional).
>
> They all split as expected at the first opportunity and produce the
> expected RTL. Whoopee!
>
> When code involves zero constants (created by zero_extend, or shift) I
> can see propagation of constant into following logical operations. (Os
> or O3 optimizations). Such as:
>
> Rx = 0
> OR Ry,Rx
>
> becoming
>
> OR Ry,0
>
> The propagation is fine, BUT the creation of OR Ry,0 is a totally
> redundant operation and remains intact thru all further passes into
> final code - apparently not being removed by any optimisations after
> split1 pass!
>
> I created an RTL pattern to remove these (splitting OR Rx,0 into NOP)
> and that removes them - but surely this workaround should not be
> needed. I am stumped by what could be causing the problem. Help!
>
> There also seem to be cases where zero constants are not propagated
> into instructions. Yet the testcase only involves simple operands, no
> loops or conditionals or any other side effects that might be a reason
> to block this. Can anyone suggest some non-obvious reasons for this?

You'll need to look at the RTL dumps to determine why these redundant 
operations are not being removed or why some propagations aren't being 
performed. 
 
GCC certainly has code to do things like remove X IOR 0, so you just 
need to figure out why it isn't triggering. If your code had lots of 
SUBREGs, then that's definitely worth investigating -- many of GCC's 
optimizers aren't particularly adept at dealing with SUBREGs. 
 
jeff 





Re: [PATCH] Complex arithmetic changes

2008-02-19 Thread Janne Blomqvist
Janne Blomqvist wrote:
> Hello,
> 
> the attached patch fixes PR:s c/35162 (just a documentation fix) and
> fortran/29549.

Uh, sorry, wrong list (I intended to send to gcc-patches and fortran).

-- 
Janne Blomqvist





Re: Redundant logical operations left after early splitting

2008-02-19 Thread Jeff Law

[EMAIL PROTECTED] wrote:

> The RTL for IOR Rx,0 does use subregs (since I use simplify_gen_subreg
> in splitter.)
>
> Perhaps I should generate new pseudo QI registers instead before reload?

It's been a long time, but yes, you could look into creating new
registers if you're early in the optimization pipeline.  Your
alternative is to extend the optimizers to handle subregs
better.


The latter is far more general and would probably help in numerous
situations and is definitely worth a looksie to see if it can be
done easily.




> Is there any particular function or pass that should be dealing with IOR
> rx,0 - that I could trace thru and figure out why it does not like it
> (or never gets there)?

I would be looking in combine and simplify-rtx (which is called by
combine).  If your splitter triggers after combine, then I'm not
immediately sure where to look -- I'm not offhand aware of a pass
after combine which would call into simplify-rtx to perform this
optimization.
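
As an illustration of the kind of identity being discussed, the following is a toy simplifier that folds x | 0 to x. It is a sketch only: the toy_expr type and toy_simplify function are hypothetical and are not GCC's rtx or the simplify-rtx code; the point is just that the fold happens only if some pass actually calls the simplifier on the expression.

```c
#include <assert.h>
#include <stddef.h>

/* Toy expression node; NOT GCC's rtx.  All names here are hypothetical.  */
enum toy_code { TOY_REG, TOY_CONST, TOY_IOR };

struct toy_expr {
    enum toy_code code;
    long value;                 /* register number or constant value */
    struct toy_expr *op0, *op1; /* operands, used by TOY_IOR only */
};

/* Return a simpler equivalent of E when one is known, else E itself.
   Only the identity (x | 0) == x is modelled.  */
struct toy_expr *toy_simplify(struct toy_expr *e)
{
    assert(e != NULL);
    if (e->code != TOY_IOR)
        return e;

    struct toy_expr *lhs = toy_simplify(e->op0);
    struct toy_expr *rhs = toy_simplify(e->op1);

    if (rhs->code == TOY_CONST && rhs->value == 0)
        return lhs;             /* x | 0  ->  x */
    if (lhs->code == TOY_CONST && lhs->value == 0)
        return rhs;             /* 0 | x  ->  x */

    e->op0 = lhs;
    e->op1 = rhs;
    return e;
}
```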

jeff


Re: GCC 4.3 branch created, 4.4 opens for stage1

2008-02-19 Thread Ralf Wildenhues
* Paolo Bonzini wrote on Tue, Feb 19, 2008 at 01:28:05PM CET:
> Ralf Wildenhues wrote:
> >* Jakub Jelinek wrote on Tue, Feb 19, 2008 at 12:18:02AM CET:
> >>PR35218 - I believe the latest patch worked for the tester,
> >>  so we now have a patch and just need an approval?
> >
> >Yes, the patch is at
> >
> >and the confirmation at
> >
> 
> Patch is ok, please commit! (don't post generated files next time, also).

Erm, so I committed the patch to trunk and 4.2 now, and marked the PR as
fixed.  But I forgot that the 4.3 branch has opened in the meantime.  I
read that it needs special approval.

So does your above OK also extend to the 4.3 branch?

Thanks, and sorry if the question sounds pedantic,
Ralf


Re: Redundant logical operations left after early splitting

2008-02-19 Thread hutchinsonandy

Jeff,

thanks again for suggestions.

As I understand it (perhaps wrongly), actual splitting only occurs 
after combine pass (by split1 pass).


Combine does match one of my patterns (lshift:SI 
(zero_extend:QI), 8/16/24). The other splitters (zero_extend) are 
"matched" initially - since they are standard insns.


I don't think this matters, if neither is split until split1 - (or does 
combine perform a split that is hidden from the dump files?)


If your description is correct, you may have answered the question - 
combine is before split1 so there are no further optimisations performed 
on any split - since simplify-rtx is never called.


I could use expand - but I know that will cause a world of hurt by 
missing optimisations at the "word" level. So new pseudos seem to be my 
only way forward (at the target level).


Which optimiser/pass would benefit from handling subregs better?

best regards

Andy




-Original Message-
From: Jeff Law <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: gcc@gcc.gnu.org
Sent: Tue, 19 Feb 2008 2:47 pm
Subject: Re: Redundant logical operations left after early splitting


[EMAIL PROTECTED] wrote:

> The RTL for IOR Rx,0 does use subregs (since I use simplify_gen_subreg
> in splitter.)
>
> Perhaps I should generate new pseudo QI registers instead before reload?

It's been a long time, but yes, you could look into creating new
registers if you're early in the optimization pipeline. Your
alternative is to extend the optimizers to handle subregs
better.

The latter is far more general and would probably help in numerous
situations and is definitely worth a looksie to see if it can be
done easily.

> Is there any particular function or pass that should be dealing with IOR
> rx,0 - that I could trace thru and figure out why it does not like it
> (or never gets there)?

I would be looking in combine and simplify-rtx (which is called by
combine). If your splitter triggers after combine, then I'm not
immediately sure where to look -- I'm not offhand aware of a pass
after combine which would call into simplify-rtx to perform this
optimization.
 
jeff 





Re: -fpic support detection in testsuite

2008-02-19 Thread Janis Johnson
On Tue, 2008-02-19 at 17:04 +, Jan Beulich wrote:
> gcc/testsuite/lib/target-supports.exp checks whether the compiler spits
> out any messages when using -fpic/-fPIC; this doesn't cover the case
> where the compiler happily processes everything, but the linker cannot
> deal with the result (in the given case, because the specific gas (x86) in
> use accepts @ as a normal symbol character, and hence the usual
> @ syntax doesn't yield the expected result; note
> that the target doesn't really need PIC code, nor does it support TLS,
> thus all the constructs are really meaningless).
> 
> Should the testsuite not instead do a test whether all involved tools
> are able to handle -fPIC and its results?
> disallow -fPIC (and if so, how is that supposed to be done)?

Procedure check_effective_target_fpic invokes check_no_compiler_messages
with "object" but you want it to use "executable" instead.  The support
is there, try changing the call and see if it works for you.

Janis



New GCC ICI v0.9.5 (bug fixes + new examples)

2008-02-19 Thread Grigori Fursin
Hi all,

Just a small note, that we released a new GCC-ICI (Interactive
Compilation Interface) version 0.9.5. It allows function-level
optimization and specialization by selecting or reordering
only appropriate passes. It uses external plugins to monitor
and improve default compiler optimization heuristic. 

The new ICI is used in the MILEPOST project to automatically
learn how to optimize programs using machine learning. It is
merged with the Program Feature Extractor from IBM Haifa
and will soon be available under the GCC MILEPOST branch.

More information can be found here:
http://gcc-ici.sourceforge.net
http://www.milepost.eu

Yours,
Grigori Fursin

=
Grigori Fursin, PhD
Research Scientist, INRIA, France
http://fursin.net/research




Re: GCC 4.3 branch created, 4.4 opens for stage1

2008-02-19 Thread Gerald Pfeifer
On Tue, 19 Feb 2008, Ralf Wildenhues wrote:
> Erm, so I committed the patch to trunk and 4.2 now, and marked the PR as
> fixed.  But I forgot that the 4.3 branch has opened in the meantime.  I
> read that it needs special approval.
> 
> So does your above OK also extend to the 4.3 branch?

Heh, by committing to the 4.2 branch you made this a regression in 4.3
now. :-}  It's not in any GCC 4.2 release yet, so not a high priority
for GCC 4.3.0, I guess, but would be nice to see this in GCC 4.3.1 at
least (after a bit more exposure in HEAD perhaps).

My 2 cent.

Gerald


Re: GCC 4.3 branch created, 4.4 opens for stage1

2008-02-19 Thread Joe Buck
On Tue, Feb 19, 2008 at 10:11:41PM +0100, Ralf Wildenhues wrote:
> * Gerald Pfeifer wrote on Tue, Feb 19, 2008 at 10:06:14PM CET:
> > On Tue, 19 Feb 2008, Ralf Wildenhues wrote:
> > > 
> > > So does your above OK also extend to the 4.3 branch?
> > 
> > Heh, by committing to the 4.2 branch you made this a regression in 4.3
> > now. :-}  It's not in any GCC 4.2 release yet, so not a high priority
> > for GCC 4.3.0, I guess, but would be nice to see this in GCC 4.3.1 at
> > least (after a bit more exposure in HEAD perhaps).
> 
> Jakub gave me ok for 4.2 branch, and I committed there.

I guess we're OK as long as 4.3.1 comes out before the next 4.2.x release.
But I think in general we should avoid this kind of thing (creating a
new regression by fixing a bug in 4.2.x and not in 4.3.x).


Re: Redundant logical operations left after early splitting

2008-02-19 Thread Jeff Law

[EMAIL PROTECTED] wrote:

> Jeff,
>
> thanks again for suggestions.
>
> As I understand it (perhaps wrongly), actual splitting only occurs after
> combine pass (by split1 pass).

Combine has some limited splitting capabilities.  For example it can
try to combine 3 insns, which might not match a pattern, but can match
a splitter which generates 2 insns.  I don't recall the other cases
where combine splits, nor do I offhand recall if the split insns are
ever simplified.  Search for "split" in combine.c



> If your description is correct, you may have answered the question -
> combine is before split1 so there are no further optimisations performed
> on any split - since simplify-rtx is never called.
>
> I could use expand - but I know that will cause a world of hurt by
> missing optimisations at the "word" level. So new pseudos seem to be my only
> way forward (at the target level).
>
> Which optimiser/pass would benefit from handling subregs better?

Well, if you get it into simplify-rtx, then it can be used by multiple
passes.  Look for something like simplify_binary.   A well placed
conditional breakpoint can be invaluable.

jeff


Re: GCC 4.3 branch created, 4.4 opens for stage1

2008-02-19 Thread Ralf Wildenhues
* Gerald Pfeifer wrote on Tue, Feb 19, 2008 at 10:06:14PM CET:
> On Tue, 19 Feb 2008, Ralf Wildenhues wrote:
> > 
> > So does your above OK also extend to the 4.3 branch?
> 
> Heh, by committing to the 4.2 branch you made this a regression in 4.3
> now. :-}  It's not in any GCC 4.2 release yet, so not a high priority
> for GCC 4.3.0, I guess, but would be nice to see this in GCC 4.3.1 at
> least (after a bit more exposure in HEAD perhaps).

Jakub gave me ok for 4.2 branch, and I committed there.

Cheers,
Ralf


Re: GCC 4.3 branch created, 4.4 opens for stage1

2008-02-19 Thread Mark Mitchell

Joe Buck wrote:

> On Tue, Feb 19, 2008 at 10:11:41PM +0100, Ralf Wildenhues wrote:
> > * Gerald Pfeifer wrote on Tue, Feb 19, 2008 at 10:06:14PM CET:
> > > On Tue, 19 Feb 2008, Ralf Wildenhues wrote:
> > > > So does your above OK also extend to the 4.3 branch?
> > >
> > > Heh, by committing to the 4.2 branch you made this a regression in 4.3
> > > now. :-}  It's not in any GCC 4.2 release yet, so not a high priority
> > > for GCC 4.3.0, I guess, but would be nice to see this in GCC 4.3.1 at
> > > least (after a bit more exposure in HEAD perhaps).
> >
> > Jakub gave me ok for 4.2 branch, and I committed there.
>
> I guess we're OK as long as 4.3.1 comes out before the next 4.2.x release.
> But I think in general we should avoid this kind of thing (creating a
> new regression by fixing a bug in 4.2.x and not in 4.3.x).


We really do not want that to happen.

This is also why we decided relatively recently (after the last release) 
to try to keep release branches releasing around the same time; in other 
words, to avoid releasing 4.3.0 today, 4.2.4 a month later, and 4.3.1 
two months after that.  That creates a period of time where 4.2.4 may 
have a bug fix, but no 4.3.x release does.


Please don't commit patches to one release branch unless you are also 
patching all later branches.


Thanks,

--
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713


Re: -fpic support detection in testsuite

2008-02-19 Thread Hans-Peter Nilsson
On Tue, 19 Feb 2008, Jan Beulich wrote:
> Should the testsuite not instead do a test whether all involved tools
> are able to handle -fPIC and its results?

*shrug*  Then, would you want to build a dso or just link
something compiled with -fpic/-fPIC or both?

(I guess just the link; save the other case for some other
effective_target test.)

> Or should the target simply
> disallow -fPIC (and if so, how is that supposed to be done)?

If you go this route, see error call in cris_override_options
(OVERRIDE_OPTIONS worker).

brgds, H-P


how to correct logs on my MELT branch?

2008-02-19 Thread Basile STARYNKEVITCH

Hello All,

On my MELT branch http://gcc.gnu.org/wiki/MiddleEndLispTranslator
I committed some code with badly formatted (really free form) logs.

How can I correct the logs now to put them in ChangeLog format?

Regards.
--
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***


RE: Redundant logical operations left after early splitting

2008-02-19 Thread Dave Korn
On 19 February 2008 19:26, [EMAIL PROTECTED] wrote:

> The RTL for IOR Rx,0 does use subregs  (since I use simplify_gen_subreg
> in splitter.)

  Are there any notes on any of these insns?

cheers,
  DaveK
-- 
Can't think of a witty .sigline today



Re: GCC 4.3 branch created, 4.4 opens for stage1

2008-02-19 Thread Janis Johnson
On Mon, 2008-02-18 at 18:18 -0500, Jakub Jelinek wrote:
> Hi!
> 
> As I've mentioned last week, I've created branches/gcc-4_3-branch.
> The trunk is now 4.4 stage 1, the branch is open for regression bugfixes
> and documentation fixes only, but additionally all checkins require
> RM approval in addition to normal approval.
> Before the release candidate is cut, it would be good to fix the 4 P1
> bugs we have now:

> and:

> ppc-linux -maltivec stuff - assuming a solution is agreed on quickly

See http://gcc.gnu.org/ml/gcc-patches/2008-02/msg00802.html for
tested version of latest (final?) patch.

Janis



Re: Redundant logical operations left after early splitting

2008-02-19 Thread hutchinsonandy



Dave and Jeff,

(sorry if you get more than one copy of this email,  it's playing up!)

Here are more details and I have included the testcase, splitter patterns and
RTL dump to show the problem in more detail.


The testcase is:


unsigned long f (unsigned char  *P)
{
  unsigned long C;
  C  = ((unsigned long)P[1] << 24)
     | ((unsigned long)P[2] << 16)
     | ((unsigned long)P[3] <<  8)
     | ((unsigned long)P[4] <<  0);
  return C;
}


which normally produces horrible code, with optimisation having no
significant impact.


To solve this, back end patterns for zero_extend and the left shift by
multiples of 8 were split into QImode moves.

The hope was that gcc would then collapse the QImode expression lists
such as  0|0|0|x or 0|0|y|0 into simple moves.
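
The intended collapse can be sketched in plain C. This is an illustration under assumptions, not the AVR back end's output: uint32_t stands in for AVR's 32-bit unsigned long, and the names f_shifts and f_bytes are made up. Each shifted term is zero in all bytes but one, so every result byte has the form 0|0|0|x and could be a single byte move.

```c
#include <assert.h>
#include <stdint.h>

/* The testcase, with uint32_t standing in for a 32-bit unsigned long.  */
uint32_t f_shifts(const unsigned char *P)
{
    assert(P != 0);
    return ((uint32_t)P[1] << 24)
         | ((uint32_t)P[2] << 16)
         | ((uint32_t)P[3] << 8)
         | ((uint32_t)P[4] << 0);
}

/* The same value built from four byte moves, with no OR of zero left
   over.  b[0] is the least significant byte.  */
uint32_t f_bytes(const unsigned char *P)
{
    unsigned char b[4];
    assert(P != 0);
    b[3] = P[1];
    b[2] = P[2];
    b[1] = P[3];
    b[0] = P[4];
    /* Reassemble only so the two functions can be compared.  */
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```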

Here are splitter patterns (followed by more stuff):


;; xx<---x xx<---x xx<---x xx<---x xx<---x xx<---x xx<---x xx<---x xx<---x

;; zero extend
(define_insn_and_split "zero_extendqi2"
  [(set (match_operand:HIDI 0 "nonimmediate_operand" "")
        (zero_extend:HIDI (match_operand:QI 1 "nonimmediate_operand" "")))]
  ""
  "#"
  ""
  [(const_int 0)]
  "
  int i;
  enum machine_mode dmode = GET_MODE (operands[0]);
  int dsize = GET_MODE_SIZE (dmode);
  enum machine_mode smode = GET_MODE (operands[1]);
  int ssize = GET_MODE_SIZE (smode);
  rtx dword =  simplify_gen_subreg (smode, operands[0], dmode, 0);
  emit_move_insn (dword, operands[1]);
  for (i = ssize; i < dsize; i++)
  {
    rtx dbyte =  simplify_gen_subreg (QImode, operands[0], dmode, i);
    emit_move_insn (dbyte, const0_rtx);
  }
  DONE;
  ")
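
What this split is meant to do can be modelled on plain byte arrays. The sketch below is an assumption-laden model, not back end code: zero_extend_bytes is a made-up name, and byte 0 is taken to be the least significant byte, matching the subreg byte offsets used above on a little-endian target.

```c
#include <assert.h>
#include <stddef.h>

/* Model of the zero_extend split: copy the narrow source into the low
   part of the destination, then clear each remaining byte separately.  */
void zero_extend_bytes(unsigned char *dst, size_t dsize,
                       const unsigned char *src, size_t ssize)
{
    size_t i;
    assert(ssize <= dsize);
    for (i = 0; i < ssize; i++)
        dst[i] = src[i];        /* the move of the narrow source */
    for (i = ssize; i < dsize; i++)
        dst[i] = 0;             /* one "move 0" per upper byte */
}
```

Every upper byte becomes a separate constant-zero move, which is where the zero constants feeding the later IOR instructions come from.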


;byte shift is just a series of moves
;check src OR dest is in register, so the move will be ok

(define_insn_and_split "ashl3_const2p"
  [(set (match_operand:HIDI 0 "nonimmediate_operand" "")
        (ashift:HIDI (match_operand:HIDI 1 "register_operand" "")
                     (match_operand:HIDI 2 "const_int_operand" "i")))]
  "((INTVAL (operands[2]) % 8) == 0)"
  "#"
  ""
  [(const_int 0)]
  {
    int i;
    enum machine_mode mode;
    mode = GET_MODE(operands[0]);
    int size = GET_MODE_SIZE(mode);
    HOST_WIDE_INT x = INTVAL (operands[2]);
    rtx dbytes[8], sbytes[8];

    for (i = 0; i < size; i++)
    {
      dbytes[i] =  simplify_gen_subreg (QImode, operands[0], mode, i);
      sbytes[i] =  simplify_gen_subreg (QImode, operands[1], mode, i);
    }
    int shift = x / 8;
    if (shift > size) shift = size;
    for (i = shift; i < size; i++)
    {
      emit_move_insn (dbytes[i], sbytes[i - shift]);
    }

    for (i = 0; i < shift; i++)
    {
      emit_move_insn (dbytes[i], const0_rtx);
    }
  }
)
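
The move sequence this splitter emits can be modelled on byte arrays as follows. This is only a sketch: shift_left_bytes is a hypothetical name, byte 0 is assumed to be the least significant byte, and the source and destination are assumed not to overlap (with overlapping operands the ascending copy order would overwrite source bytes before they are read).

```c
#include <assert.h>
#include <stddef.h>

/* Model of the byte-shift split: a left shift by a multiple of 8 bits
   moves source byte (i - shift) into destination byte i and clears the
   low bytes.  dst and src must not overlap in this sketch.  */
void shift_left_bytes(unsigned char *dst, const unsigned char *src,
                      size_t size, unsigned bits)
{
    size_t shift, i;
    assert(bits % 8 == 0);
    shift = bits / 8;
    if (shift > size)
        shift = size;
    for (i = shift; i < size; i++)
        dst[i] = src[i - shift];    /* byte moves */
    for (i = 0; i < shift; i++)
        dst[i] = 0;                 /* constant-zero moves */
}
```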


;


The result kinda works but we are left with OR x,0 (and some missed
opportunities to propagate zero constant forward into OR)


The splitters are matched up initially (zero_extend) or at combine -
just as expected.


All the subregs appear as expected in split1. Naturally this produces a
bunch of QI subregs many of which  contain zero. No real change happens
in RTL  until local register allocation (lreg dump file). There are no
redundant  IOR Rm,0 in dump files before lreg pass. The only note  is a
reg dead on the pointer argument when it gets moved to a pointer
register. (no reg equals or other dead notes until lreg pass)


In the lreg dump file  I can see  the propagation of  many (but not all)
constant 0 forward into the IOR instructions (eg Rn = 0, Rm= Rm | Rn
=>  Rm = Rm|0).  These remain in RTL and are output into final code.
Loads  of zero into registers which end up being unused are removed in
later passes.


I can remove IOR Rm,0 with a targeted splitter to create a NOP - which
is my last resort.


So here is lreg dump extract:



;; Function f (f)

starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called
df_worklist_dataflow_overeager:n_basic_blocks 3 n_edges 2 count 3 (1)
df_worklist_dataflow_overeager:n_basic_blocks 3 n_edges 2 count 3 (1)


Pass 0

 Register 42 costs: POINTER_X_REGS:0 POINTER_Y_REGS:0 POINTER_Z_REGS:0
BASE_POINTER_REGS:0 POINTER_REGS:0 ADDW_REGS:8000 SIMPLE_LD_REGS:8000
LD_REGS:8000 NO_LD_REGS:8000 GENERAL_REGS:8000 ALL_REGS:1 MEM:2
 Register 58 costs: POINTER_X_REGS:0 POINTER_Z_REGS:0
BASE_POINTER_REGS:0 POINTER_REGS:0 ADDW_REGS:0 SIMPLE_LD_REGS:0
LD_REGS:0 NO_LD_REGS:2000 GENERAL_REGS:2000 ALL_REGS:16000 MEM:16000
 Register 59 costs: POINTER_X_REGS:0 POINTER_Z_REGS:0
BASE_POINTER_REGS:0 POINTER_REGS:0 ADDW_REGS:0 SIMPLE_LD_REGS:0
LD_REGS:0 NO_LD_REGS:2000 GENERAL_REGS:2000 ALL_REGS:16000 MEM:16000
 Register 60 costs: POINTER_X_REGS:0 POINTER_Z_REGS:0
BASE_POINTER_REGS:0 POINTER_REGS:0 ADDW_REGS:0 SIMPLE_LD_REGS:0
LD_REGS:0 NO_LD_REGS:2000 GENERAL_REGS:2000 ALL_REGS:16000 MEM:16000
 Register 61 costs: POINTER_X_REGS:0 POINTER_Z_REGS:0
BASE_POINTER_REGS:0 POINTER_REGS:0 ADDW_REGS:0 SIMPLE_LD_REGS:0
LD_REGS:0 NO_LD_REGS:0 GENERAL_REGS:0 ALL_REGS:16000 MEM:16000
 Register 62 costs: 

Re: Redundant logical operations left after early splitting

2008-02-19 Thread Jeff Law

[EMAIL PROTECTED] wrote:



> Dave and Jeff,
>
> (sorry if you get more than one copy of this email,  it's playing up!)
>
> Here are more details and I have included the testcase, splitter patterns and
> RTL dump to show the problem in more detail.

[ ... ]
Can you send the .combine dump as well as the dump for whatever pass
runs immediately before local-alloc?

Thanks,
JEff


Re: Redundant logical operations left after early splitting

2008-02-19 Thread Andy H
After some digging, I can confirm local-alloc.c is creating OR Rx,0 
instructions but not simplifying them.
local-alloc.c is not the problem - but right now it is the only help I'm 
getting for post-split optimization.


This occurs when source registers are replaced with equivalent constant 
using validate_replace_rtx() (which has very minimal simplifications)


I added validate_simplify_rtx() after the normal 
update_equiv_regs/validate_replace_rtx()  and the OR Rx,0 got removed.


I also found that the limited propagation of constants is also due to 
limitations of local-alloc.c. In particular two restrictions:


1) Constants are not propagated into  operands that are both input and 
output. For example:

Ra = 0
Ra=Ra | Rb

Not sure why - maybe just deemed too difficult.

2) The method used only replaces the first use in a daisy chain of 
moves. So if we have


Ra = 0
Rb = Ra
Rc = Rc | Rb

it will only reduce to:

Rb = 0
Rc = Rc | Rb

rather than

Rc = Rc | 0

and ideally

*NOTHING*

Propagating  REG_EQUIV notes across register-register moves would seem 
to be an obviously simple way to fix this. Thoughts?
I am not sure local-alloc is the best place to address the overall 
problem; I doubt it is intended to provide such optimizations.

An additional cse pass after split would seem a better way perhaps?
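
The effect being asked for can be sketched on a toy three-address IR. Everything below is hypothetical (toy_insn, toy_propagate and the register numbering are invented for illustration and are not local-alloc.c or cse code): a single forward pass over one basic block that carries known constants across register-register moves and deletes an IOR whose constant operand is zero.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NREGS 8

enum toy_op { TOY_SET_CONST, TOY_COPY, TOY_IOR, TOY_NOP };

struct toy_insn {
    enum toy_op op;
    int dst;            /* destination register */
    int src;            /* source register (TOY_COPY, TOY_IOR) */
    bool src_is_const;  /* TOY_IOR: operand is the constant imm */
    int imm;            /* constant for TOY_SET_CONST / constant IOR */
};

/* Forward pass over one basic block.  Returns the number of insns
   turned into TOY_NOP.  */
int toy_propagate(struct toy_insn *insns, int n)
{
    bool known[TOY_NREGS] = { false };
    int value[TOY_NREGS] = { 0 };
    int deleted = 0;

    for (int i = 0; i < n; i++) {
        struct toy_insn *in = &insns[i];
        assert(in->dst >= 0 && in->dst < TOY_NREGS);
        assert(in->src >= 0 && in->src < TOY_NREGS);
        switch (in->op) {
        case TOY_SET_CONST:
            known[in->dst] = true;
            value[in->dst] = in->imm;
            break;
        case TOY_COPY:
            if (known[in->src]) {       /* Rb = Ra, Ra == c  ->  Rb = c */
                in->op = TOY_SET_CONST;
                in->imm = value[in->src];
                known[in->dst] = true;
                value[in->dst] = in->imm;
            } else {
                known[in->dst] = false;
            }
            break;
        case TOY_IOR:
            if (!in->src_is_const && known[in->src]) {
                in->src_is_const = true;    /* Rc | Rb  ->  Rc | c */
                in->imm = value[in->src];
            }
            if (in->src_is_const && in->imm == 0) {
                in->op = TOY_NOP;           /* Rc = Rc | 0 does nothing */
                deleted++;
            } else if (in->src_is_const && known[in->dst]) {
                value[in->dst] |= in->imm;
            } else {
                known[in->dst] = false;
            }
            break;
        case TOY_NOP:
            break;
        }
    }
    return deleted;
}
```

The now-unused constant loads are left in place here; removing them is a separate dead-code step.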

Andy






Re: Redundant logical operations left after early splitting

2008-02-19 Thread hutchinsonandy

RTL dumps for Combine pass and ASMCONS (last one before local-alloc)


COMBINE


;; Function f (f)



starting the processing of deferred insns

ending the processing of deferred insns

df_analyze called

insn_cost 2: 4

insn_cost 6: 16

insn_cost 7: 12

insn_cost 8: 16

insn_cost 9: 16

insn_cost 10: 16

insn_cost 11: 16

insn_cost 12: 16

insn_cost 13: 16

insn_cost 14: 16

insn_cost 15: 16

insn_cost 35: 4

insn_cost 36: 4

insn_cost 37: 4

insn_cost 38: 4

insn_cost 26: 0

Failed to match this instruction:

(parallel [

   (set (reg:SI 44)

   (zero_extend:SI (mem:QI (plus:HI (reg:HI 24 r24 [ P ])

   (const_int 1 [0x1])) [0 S1 A8])))

   (set (reg/v/f:HI 42 [ P ])

   (reg:HI 24 r24 [ P ]))

   ])

Failed to match this instruction:

(parallel [

   (set (reg:SI 44)

   (zero_extend:SI (mem:QI (plus:HI (reg:HI 24 r24 [ P ])

   (const_int 1 [0x1])) [0 S1 A8])))

   (set (reg/v/f:HI 42 [ P ])

   (reg:HI 24 r24 [ P ]))

   ])

Failed to match this instruction:

(set (reg:SI 45)

   (ashift:SI (zero_extend:SI (mem:QI (plus:HI (reg/v/f:HI 42 [ P ])

   (const_int 1 [0x1])) [0 S1 A8]))

   (const_int 24 [0x18])))

Failed to match this instruction:

(parallel [

   (set (reg:SI 45)

   (ashift:SI (zero_extend:SI (mem:QI (plus:HI (reg:HI 24 r24
[ P ])

   (const_int 1 [0x1])) [0 S1 A8]))

   (const_int 24 [0x18])))

   (set (reg/v/f:HI 42 [ P ])

   (reg:HI 24 r24 [ P ]))

   ])

Failed to match this instruction:

(parallel [

   (set (reg:SI 45)

   (ashift:SI (zero_extend:SI (mem:QI (plus:HI (reg:HI 24 r24
[ P ])

   (const_int 1 [0x1])) [0 S1 A8]))

   (const_int 24 [0x18])))

   (set (reg/v/f:HI 42 [ P ])

   (reg:HI 24 r24 [ P ]))

   ])

Successfully matched this instruction:

(set (reg/v/f:HI 42 [ P ])

   (reg:HI 24 r24 [ P ]))

Failed to match this instruction:

(set (reg:SI 45)

   (ashift:SI (zero_extend:SI (mem:QI (plus:HI (reg:HI 24 r24 [ P ])

   (const_int 1 [0x1])) [0 S1 A8]))

   (const_int 24 [0x18])))

Failed to match this instruction:

(set (reg:SI 47)

   (ashift:SI (zero_extend:SI (mem:QI (plus:HI (reg/v/f:HI 42 [ P ])

   (const_int 2 [0x2])) [0 S1 A8]))

   (const_int 16 [0x10])))

Failed to match this instruction:

(set (reg:SI 48)

   (ior:SI (ashift:SI (reg:SI 44)

   (const_int 24 [0x18]))

   (reg:SI 47)))

Failed to match this instruction:

(set (reg:SI 48)

   (ior:SI (ashift:SI (reg:SI 46)

   (const_int 16 [0x10]))

   (reg:SI 45)))

Failed to match this instruction:

(set (reg:SI 48)

   (ior:SI (ashift:SI (zero_extend:SI (mem:QI (plus:HI (reg/v/f:HI 42
[ P ])

   (const_int 1 [0x1])) [0 S1 A8]))

   (const_int 24 [0x18]))

   (reg:SI 47)))

Successfully matched this instruction:

(set (reg:SI 45)

   (zero_extend:SI (mem:QI (plus:HI (reg/v/f:HI 42 [ P ])

   (const_int 1 [0x1])) [0 S1 A8])))

Failed to match this instruction:

(set (reg:SI 48)

   (ior:SI (ashift:SI (reg:SI 45)

   (const_int 24 [0x18]))

   (reg:SI 47)))

Failed to match this instruction:

(set (reg:SI 48)

   (ior:SI (ashift:SI (zero_extend:SI (mem:QI (plus:HI (reg/v/f:HI 42
[ P ])

   (const_int 2 [0x2])) [0 S1 A8]))

   (const_int 16 [0x10]))

   (reg:SI 45)))

Successfully matched this instruction:

(set (reg:SI 47)

   (zero_extend:SI (mem:QI (plus:HI (reg/v/f:HI 42 [ P ])

   (const_int 2 [0x2])) [0 S1 A8])))

Failed to match this instruction:

(set (reg:SI 48)

   (ior:SI (ashift:SI (reg:SI 47)

   (const_int 16 [0x10]))

   (reg:SI 45)))

Failed to match this instruction:

(set (reg:SI 48)

   (ior:SI (ashift:SI (reg:SI 46)

   (const_int 16 [0x10]))

   (ashift:SI (reg:SI 44)

   (const_int 24 [0x18]

Successfully matched this instruction:

(set (reg:SI 47)

   (ashift:SI (reg:SI 44)

   (const_int 24 [0x18])))

Failed to match this instruction:

(set (reg:SI 48)

   (ior:SI (ashift:SI (reg:SI 46)

   (const_int 16 [0x10]))

   (reg:SI 47)))

Failed to match this instruction:

(set (reg:SI 50)

   (ior:SI (ior:SI (reg:SI 45)

   (reg:SI 47))

   (reg:SI 49)))

Failed to match this instruction:

(set (reg:SI 50)

   (ior:SI (zero_extend:SI (mem:QI (plus:HI (reg/v/f:HI 42 [ P ])

   (const_int 4 [0x4])) [0 S1 A8]))

   (reg:SI 48)))

Failed to match this instruction:

(set (reg:SI 50)

   (ior:SI (ior:SI (ashift:SI (reg:SI 44)

   (const_int 24 [0x18]))

   (reg:SI 47))

   (reg:SI 49)))

Successfully matched this instruction:

(set (reg:SI 48)

   (ashift:SI (reg:SI 44)

   (const_int 24 [0x18])))

Failed to match this instruction:

(set (reg:SI 50)

   (ior:SI (ior:SI (reg:SI 48)

   (reg:SI 47

Re: Redundant logical operations left after early splitting

2008-02-19 Thread Andy H

Dave and Jeff,

Here are more details and I have included the testcase, splitter patterns and 
RTL dump to show the problem in more detail.


The testcase is:

unsigned long f (unsigned char  *P)
{
 unsigned long C;
 C  = ((unsigned long)P[1] << 24)
    | ((unsigned long)P[2] << 16)
    | ((unsigned long)P[3] <<  8)
    | ((unsigned long)P[4] <<  0);
 return C;
}

which normally produces horrible code, with optimisation having no 
significant impact.


To solve this, back end patterns for zero_extend and the left shift by 
multiples of 8 were split into QImode moves.
The hope was that gcc would then collapse the QImode expression lists 
such as  0|0|0|x or 0|0|y|0 into simple moves.

Here are splitter patterns (followed by more stuff):

;; xx<---x xx<---x xx<---x xx<---x xx<---x xx<---x xx<---x xx<---x xx<---x
;; zero extend
(define_insn_and_split "zero_extendqi2"
 [(set (match_operand:HIDI 0 "nonimmediate_operand" "")
   (zero_extend:HIDI (match_operand:QI 1 "nonimmediate_operand" "")))]
 ""
 "#"
 ""
 [(const_int 0)]
 "
 int i;
   enum machine_mode dmode = GET_MODE (operands[0]);
   int dsize = GET_MODE_SIZE (dmode);
   enum machine_mode smode = GET_MODE (operands[1]);
   int ssize = GET_MODE_SIZE (smode);
   rtx dword =  simplify_gen_subreg (smode, operands[0], dmode, 0);
   emit_move_insn (dword, operands[1]);
   for (i = ssize; i < dsize; i++)
   {
   rtx dbyte =  simplify_gen_subreg (QImode, operands[0], dmode, i);
   emit_move_insn (dbyte, const0_rtx);
   }
   DONE;
 ")

;byte shift is just a series of moves
;check src OR dest is in register, so the move will be ok

(define_insn_and_split "ashl3_const2p"
 [(set (match_operand:HIDI 0 "nonimmediate_operand""")

   (ashift:HIDI (match_operand:HIDI 1 "register_operand"  "")
  (match_operand:HIDI 2 "const_int_operand" "i"))
)
   ]
 "((INTVAL (operands[2]) % 8) == 0)
  "
 "#"
 ""
 [(const_int 0)]
 {
   int i;
   enum machine_mode mode;
   mode = GET_MODE(operands[0]);
   int size = GET_MODE_SIZE(mode);
   HOST_WIDE_INT x = INTVAL (operands[2]);   
   rtx dbytes[8], sbytes[8];

   for (i = 0; i < size; i++)
   {
   dbytes[i] =  simplify_gen_subreg (QImode, operands[0], mode, i);
   sbytes[i] =  simplify_gen_subreg (QImode, operands[1], mode, i);
   }
   int shift = x / 8;
   if (shift > size) shift = size;
   for (i = shift; i < size; i++)
   {
   emit_move_insn (dbytes[i], sbytes[i - shift]);   
   }

   for (i = 0; i < shift; i++)
   {
   emit_move_insn (dbytes[i], const0_rtx);   
   }

 }
)

;

The result kinda works but we are left with OR x,0 (and some missed 
opportunities to propagate zero constant forward into OR)


The splitters are matched up initially (zero_extend) or at combine - 
just as expected.


All the subregs appear as expected in split1. Naturally this produces a 
bunch of QI subregs many of which  contain zero. No real change happens 
in RTL  until local register allocation (lreg dump file). There are no 
redundant  IOR Rm,0 in dump files before lreg pass. The only note  is a 
reg dead on the pointer argument when it gets moved to a pointer 
register. (no reg equals or other dead notes until lreg pass)


In the lreg dump file  I can see  the propagation of  many (but not all) 
constant 0 forward into the IOR instructions (eg Rn = 0, Rm= Rm | Rn  
=>  Rm = Rm|0).  These remain in RTL and are output into final code.  
Loads  of zero into registers which end up being unused are removed in 
later passes.


I can remove IOR Rm,0 with a targeted splitter to create a NOP - which 
is my last resort.


So here is lreg dump extract:


;; Function f (f)

starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called
df_worklist_dataflow_overeager:n_basic_blocks 3 n_edges 2 count 3 (1)
df_worklist_dataflow_overeager:n_basic_blocks 3 n_edges 2 count 3 (1)


Pass 0

 Register 42 costs: POINTER_X_REGS:0 POINTER_Y_REGS:0 POINTER_Z_REGS:0 
BASE_POINTER_REGS:0 POINTER_REGS:0 ADDW_REGS:8000 SIMPLE_LD_REGS:8000 
LD_REGS:8000 NO_LD_REGS:8000 GENERAL_REGS:8000 ALL_REGS:1 MEM:2
 Register 58 costs: POINTER_X_REGS:0 POINTER_Z_REGS:0 
BASE_POINTER_REGS:0 POINTER_REGS:0 ADDW_REGS:0 SIMPLE_LD_REGS:0 
LD_REGS:0 NO_LD_REGS:2000 GENERAL_REGS:2000 ALL_REGS:16000 MEM:16000
 Register 59 costs: POINTER_X_REGS:0 POINTER_Z_REGS:0 
BASE_POINTER_REGS:0 POINTER_REGS:0 ADDW_REGS:0 SIMPLE_LD_REGS:0 
LD_REGS:0 NO_LD_REGS:2000 GENERAL_REGS:2000 ALL_REGS:16000 MEM:16000
 Register 60 costs: POINTER_X_REGS:0 POINTER_Z_REGS:0 
BASE_POINTER_REGS:0 POINTER_REGS:0 ADDW_REGS:0 SIMPLE_LD_REGS:0 
LD_REGS:0 NO_LD_REGS:2000 GENERAL_REGS:2000 ALL_REGS:16000 MEM:16000
 Register 61 costs: POINTER_X_REGS:0 POINTER_Z_REGS:0 
BASE_POINTER_REGS:0 POINTER_REGS:0 ADDW_REGS:0 SIMPLE_LD_REGS:0 
LD_REGS:0 NO_LD_REGS:0 GENERAL_REGS:0 ALL_REGS:16000 MEM:16000
 Register 62 costs: POINTER_X_REGS:0 POINTER_Z_REGS:0 
BASE_POINTER_REGS:0 POINTER_RE