date:20101223

register allocation

2010-12-23 Thread roy rosen

Hi All,

I am looking at the code generated by my port, and it seems that too
many copies between registers are generated.
I looked a bit at the register allocation and wanted to verify that I
understand its behavior.

Is it true that it first chooses a register class for each pseudo
and only then starts coloring?

I think that my problem is that in my architecture there are two
register classes which can do all arithmetic operations, but class X can
also do loads and stores and class Y can also do DSP operations.

So when there are, for example, two DSP operations with some
arithmetic operations between them, I expect only class Y to be used,
but GCC prefers to copy registers and do the arithmetic operations
using X because, for some reason, it determined that the preferred
class for the registers in the arithmetic operations is X.

It seems that determining the class does not look at the whole flow
but rather looks only at insns in which the register appears.
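
As a rough illustration of the behaviour described above, the following toy model in C picks a class for each pseudo from only the insns that mention it. It is not GCC code: `toy_preferred_class`, the `ALLOW_X`/`ALLOW_Y` masks, and the tie-break in favour of X are all assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

enum reg_class { CLASS_X, CLASS_Y, N_CLASSES };

/* Hypothetical masks: which classes one insn accepts for the pseudo. */
#define ALLOW_X (1u << CLASS_X)   /* e.g. a load or store */
#define ALLOW_Y (1u << CLASS_Y)   /* e.g. a DSP operation */

/* Choose a class for one pseudo by looking only at the insns that
   mention it.  allowed[i] is the class mask of the i-th such insn.
   Nothing about the surrounding flow is taken into account. */
static enum reg_class toy_preferred_class(const unsigned *allowed,
                                          size_t n_insns)
{
    unsigned cost[N_CLASSES] = {0};

    assert(allowed != NULL || n_insns == 0);
    for (size_t i = 0; i < n_insns; i++)
        for (unsigned c = 0; c < N_CLASSES; c++)
            if (!(allowed[i] & (1u << c)))
                cost[c]++;   /* this insn would need a copy for class c */

    /* Assumption: ties are resolved in favour of the first class, X. */
    return cost[CLASS_Y] < cost[CLASS_X] ? CLASS_Y : CLASS_X;
}
```

With a pseudo that appears in one DSP insn and one arithmetic insn, the model returns class Y. A pseudo that appears only in arithmetic insns ties and falls back to X, which is where the extra copies between the two DSP operations would come from.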

Do I understand the situation correctly?
Is there something I can do about it?

Thanks, Roy.

[RFC] Improving GCSE to reduce constant splits on ARM

2010-12-23 Thread Dmitry Melnik


Hi,

We've found that constant splitting on ARM can be very inefficient if
it's done inside a loop.

For example, the expression

  a = a & 0xff00ff00;

will be translated into the following code (on ARM, only 8-bit values
rotated by an even number of bits can be used as immediate arguments):


  bic r0, r0, #16711680
  bic r0, r0, #255

This makes perfect sense, unless this code is in a loop and there are
many instructions using the same bit mask.  In that case, we would want
to put the constant 0xff00ff00 into a register, let
pass_rtl_move_loop_invariants hoist it out of the loop, and reuse it for
every appropriate bitwise AND inside the loop.
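
A minimal sketch of the encoding rule mentioned above, assuming the classic ARM-mode immediate form (an 8-bit value rotated right by an even amount); Thumb-2 uses a different scheme and is not modelled here. The helper names are made up for illustration and do not come from GCC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits. */
static uint32_t rotl32(uint32_t v, unsigned n)
{
    assert(n < 32);
    return n == 0 ? v : (v << n) | (v >> (32 - n));
}

/* True if v is an 8-bit value rotated right by an even amount. */
static bool arm_imm_encodable(uint32_t v)
{
    for (unsigned rot = 0; rot < 32; rot += 2)
        /* Undo a right-rotation by rot; the result must fit in 8 bits. */
        if ((rotl32(v, rot) & ~UINT32_C(0xff)) == 0)
            return true;
    return false;
}

/* What the two BIC instructions from the example compute. */
static uint32_t mask_with_two_bics(uint32_t a)
{
    a &= ~UINT32_C(0x00ff0000);   /* bic r0, r0, #16711680 */
    a &= ~UINT32_C(0x000000ff);   /* bic r0, r0, #255 */
    return a;
}
```

Under these assumptions neither 0xff00ff00 nor its complement is encodable, while 0x00ff0000 and 0xff each are, which is why the AND turns into two BIC instructions.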


This is a real-life example (from the evas rasterization library), where
fixing this issue speeds up the expedite test suite by 6% on average and
by up to 20% on several tests.


Why does the splitting happen?
On 4.4, the only problem was GCSE, which propagated a constant held in a
separate pseudo register into its consumer insn, i.e.

  r123 = 0xff00ff00; r124 = r125 & r123
was transformed into
  r124 = r125 & 0xff00ff00
After that, the constant within the AND expression is no longer
considered loop invariant, and is not moved out of the loop.  This can be
fixed by checking whether the insn transformed by GCSE will require
splitting, and if it does, the transformation should not be done in the
earlier GCSE passes.  We may check this by comparing the rtx_cost of the
constant we're going to propagate with the rtx_cost of const_int(1).
If moving the loop invariant fails (e.g. due to register pressure), then
pass_combine can still propagate it into the AND, and in this case it
will result in the same code.
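
The proposed check can be modelled outside GCC roughly as follows. This is only a sketch: `toy_const_cost` is a hypothetical stand-in for `rtx_cost` (it greedily counts the 8-bit, even-aligned chunks needed to cover a constant, assuming ARM-mode immediates), and `toy_should_propagate` is an invented name for the comparison against the cost of 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value right by n bits. */
static uint32_t rotr32(uint32_t v, unsigned n)
{
    assert(n < 32);
    return n == 0 ? v : (v >> n) | (v << (32 - n));
}

/* Hypothetical stand-in for rtx_cost: a greedy count of the 8-bit,
   even-aligned chunks needed to cover the set bits of v. */
static unsigned toy_const_cost(uint32_t v)
{
    unsigned best = 4;                    /* four byte chunks always suffice */

    for (unsigned start = 0; start < 32; start += 2) {
        uint32_t r = rotr32(v, start);    /* try each even starting point */
        unsigned chunks = 0;

        for (unsigned pos = 0; pos < 32; ) {
            if ((r >> pos) & 3u) {        /* a set bit opens a new chunk */
                chunks++;
                pos += 8;
            } else {
                pos += 2;
            }
        }
        if (chunks < best)
            best = chunks;
    }
    return best;
}

/* The gate from the proposal: propagate only if the constant costs
   no more than the constant 1. */
static bool toy_should_propagate(uint32_t constant)
{
    return toy_const_cost(constant) <= toy_const_cost(1);
}
```

In this model 0xff00ff00 costs two chunks against one for the constant 1, so it would stay in its register and remain visible to loop invariant motion, whereas 0xff would still be propagated.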


After this patch (http://gcc.gnu.org/ml/gcc-patches/2009-08/msg01032.html),
such constants are split as early as the expand pass, so the loop
invariant code motion pass has no chance to deal with them.


So, the questions are:
1) Is it really necessary to split constants on ARM at the time of 
expand?  At least, loop invariant code motion can work better if 
splitting happens later.
2) Is there any reason we shouldn't prevent GCSE from propagating 
constants that we know will be split?


Attached is a prototype patch that changes GCSE to propagate only those
constants that won't cause a split, and disables splitting in expand on
ARM.


--
Best regards,
  Dmitry

--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2512,13 +2512,13 @@ arm_split_constant (enum rtx_code code, enum machine_mode mode, rtx insn,
          && (arm_gen_constant (code, mode, NULL_RTX, val, target, source,
                                1, 0)
              > (arm_constant_limit (optimize_function_for_size_p (cfun))
-                + (code != SET))))
+                + (code != SET) - arm_fix_split)))
        {
          if (code == SET)
            {
              /* Currently SET is the only monadic value for CODE, all
                 the rest are diadic.  */
-             if (TARGET_USE_MOVT)
+             if (TARGET_USE_MOVT && !arm_fix_split)
                arm_emit_movpair (target, GEN_INT (val));
              else
                emit_set_insn (target, GEN_INT (val));
@@ -2529,7 +2529,7 @@ arm_split_constant (enum rtx_code code, enum machine_mode mode, rtx insn,
            {
              rtx temp = subtargets ? gen_reg_rtx (mode) : target;
 
-             if (TARGET_USE_MOVT)
+             if (TARGET_USE_MOVT && !arm_fix_split)
                arm_emit_movpair (temp, GEN_INT (val));
              else
                emit_set_insn (temp, GEN_INT (val));
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index a39bb3a..0c3952d 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -169,3 +169,8 @@ mfix-cortex-m3-ldrd
 Target Report Var(fix_cm3_ldrd) Init(2)
 Avoid overlapping destination and address registers on LDRD instructions
 that may trigger Cortex-M3 errata.
+
+mfix-split
+Target Report Var(arm_fix_split) Init(0)
+Deny to use movt in case of arm thumb2, and prefer to use memory loads
+than split insns
diff --git a/gcc/gcse.c b/gcc/gcse.c
index 9ff0da8..ed15997 100644
--- a/gcc/gcse.c
+++ b/gcc/gcse.c
@@ -2571,6 +2572,7 @@ constprop_register (rtx insn, rtx from, rtx to)
 
   /* Handle normal insns next.  */
   if (NONJUMP_INSN_P (insn)
+      && rtx_cost (to, GET_CODE (to), false)
+         <= rtx_cost (GEN_INT(1), CONST_INT, false)
       && try_replace_reg (from, to, insn))
     return 1;

Re: register allocation

2010-12-23 Thread Vladimir Makarov


On 12/23/2010 03:13 AM, roy rosen wrote:

> Hi All,
>
> I am looking at the code generated by my port and it seems that I have
> a problem that too many copies between registers are generated.
> I looked a bit at the register allocation and wanted to verify that I
> understand its behavior.
>
> Is that true that it first chooses a register class for each pseudo
> and only then starts coloring?

Yes, that is true.

> I think that my problem is that in my architecture there are two
> register classes which can do all arithmetic operation but class X can
> also do loads and stores and class Y can also do DSP operations.
>
> So when there are for example two DSP operations and between them some
> arithmetic operations I expect to use only class Y but GCC prefers to
> copy registers and do the arithmetic operations using X because for
> some reason it determined that the preferred class for the registers in
> the arithmetic operations is X.
>
> It seems that determining the class does not look at the whole flow
> but rather looks only at insns in which the register appears.

Defining classes for pseudos is already one of the most expensive
operations in IRA.  Looking at the flow would make it even more
complicated (I don't even know how to use this to improve the allocation
because it means live range splitting before coloring and before
defining classes which could help do live range splitting reasonably
taking register pressure into account).

> Do I understand the situation correctly?

Yes, I guess.

> Is there something I can do about it?

I'd recommend trying the ira-improv branch.  I think that part of the
problem is in the usage of cover classes.  The branch removes the cover
classes and permits IRA to use intersected register classes, and that
helps to assign better hard registers.

Re: BImode is treated as normal byte-wide mode and causes bug.

2010-12-23 Thread Richard Henderson

On 12/22/2010 06:54 AM, Paolo Bonzini wrote:
> On 12/22/2010 03:43 PM, Bingfeng Mei wrote:
>> Thanks for letting me know this. Since only our target experiences such
>> issue, I guess no other processors have such requirements of manipulating
>> BImode. I can live with the workaround now.
> 
> Perhaps Blackfin, but it has a BI->SI extension instruction so it doesn't see 
> this bug.

I've always thought that ia64 would benefit from representing _Bool
variables as BImode "in registers".  At which point they could be
stored in predicate registers and trivially used for conditionals.
That said, it does have a special BI->SI extension pattern.

r~

Re: register allocation

2010-12-23 Thread Jeff Law


On 12/23/10 09:50, Vladimir Makarov wrote:

> Defining classes for pseudos is already one of the most expensive
> operations in IRA.  Looking at the flow would make it even more
> complicated (I don't even know how to use this to improve the
> allocation because it means live range splitting before coloring and
> before defining classes which could help do live range splitting
> reasonably taking register pressure into account).

I've often wondered if we could use some of the class information to
guide range splitting.  If a pseudo has contexts where it must be in
class A and other contexts where it could be in class B, then there may
be a reasonable split point where we could split the pseudo so that the
split pseudos could be allocated into A & B respectively.


I looked at this eons ago, trying to split pseudos which had to be
allocated to a particular hard reg over a small range, but could be
allocated in a much larger class of regs elsewhere.  It worked, but was
unmaintainable.  The other downside is that we had defined the problem so
narrowly that, while the generated code clearly looked better, the net
effect was unmeasurable, as the reloads we avoided were typically outside
of loops.


Jeff

default system path questions

2010-12-23 Thread Edward Peschko

All,

I found much to my dismay today that -I doesn't always work as
intuited. Namely, if I set CFLAGS to:

-I/path/to/gcc/include

where the default system path is:

/path/to/gcc/lib/gcc/i686-pc-linux-gnu/3.4.6/include
/usr/local/include
/path/to/gcc/include
/usr/include

the expected behavior would be for that directory to be searched before
any of the above. But no, gcc silently ignores this request and finds
an unwanted version of the header I want in /usr/local/include, due to
not 'wanting to defeat system headers'.

This I guess I can understand (although it would be very nice to get a
warning). What I can't understand is why /usr/local/include is placed
*above* /path/to/gcc/include in this ordering. Since when is a
directory that has arbitrary installs from userland considered a
necessary part of system headers? Shouldn't the detection order be:

/path/to/gcc/lib/gcc/i686-pc-linux-gnu/3.4.6/include
/path/to/gcc/include
/usr/local/include
/usr/include

if /usr/local/include is to be included at all..

And come to think of it, why *is* the -I ignored? Why doesn't the
preprocessor just trust the user and that they know what they are
doing? Why is -nostdinc even necessary?

Ed

gcc-4.5-20101223 is now available

2010-12-23 Thread gccadmin

Snapshot gcc-4.5-20101223 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.5-20101223/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_5-branch 
revision 168215

You'll find:

 gcc-4.5-20101223.tar.bz2 Complete GCC (includes all of below)

  MD5=7176e7b41e12bd4b03e0b7fc6f14578e
  SHA1=82f197322da5baf1696eb65293fe72dce263e9ad

 gcc-core-4.5-20101223.tar.bz2 C front end and core compiler

  MD5=1418e9fe36cc70275543e0b53a1ced85
  SHA1=4e5bf68f3ec470a484c3cac1182c001e4c8a3d46

 gcc-ada-4.5-20101223.tar.bz2 Ada front end and runtime

  MD5=304028dfea7be8c1c489b4ca0c3e0ff5
  SHA1=38332bfe8e971e995ec0e8e2e4043ad2e4836825

 gcc-fortran-4.5-20101223.tar.bz2 Fortran front end and runtime

  MD5=25b23d59a53bb637436ce10284b29702
  SHA1=1b777529656707137268dea519d8d472d565b3d9

 gcc-g++-4.5-20101223.tar.bz2 C++ front end and runtime

  MD5=0b2e2407a661c1cd3a359c550f4cb7e3
  SHA1=dd2e1325efc3be354178b5a165267aa7d7526820

 gcc-go-4.5-20101223.tar.bz2  Go front end and runtime

  MD5=57837bd4ca622cfddc15e3779de5f216
  SHA1=bec22331707cdc669342d72772174ff72d23b8ad

 gcc-java-4.5-20101223.tar.bz2 Java front end and runtime

  MD5=6de6a68b54f2d53dcc2e925b13b8eb0e
  SHA1=b3e8bd91ea265d6394bc5b582ab4ca7811b7598e

 gcc-objc-4.5-20101223.tar.bz2 Objective-C front end and runtime

  MD5=e5d7433447407344554a6a9997311177
  SHA1=1d076d2dd9fbaae76682be0d29050bc849c33ab6

 gcc-testsuite-4.5-20101223.tar.bz2   The GCC testsuite

  MD5=68dee9bfcd2232774e41b7f2121d4466
  SHA1=de8f7181dbc82be565a87d40fe85b2b6b26e8546

Diffs from 4.5-20101216 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.5
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.

Re: default system path questions

2010-12-23 Thread Jonathan Wakely

On 23 December 2010 22:17, Edward Peschko wrote:
> All,
>
> I found much to my dismay today that -I doesn't always work as
> intuited. Namely, if I set CFLAGS to:
>
> -I/path/to/gcc/include
> where the default system path is:
>
> /path/to/gcc/lib/gcc/i686-pc-linux-gnu/3.4.6/include
> /usr/local/include
> /path/to/gcc/include
> /usr/include
>
> the expected behavior would be to have the libraries searched before

Your email doesn't seem appropriate for this mailing list, it should
probably be sent to the gcc-help list, or submitted to bugzilla.

You've apparently read the documentation, because you refer to the
text about not wanting to defeat the system headers, so you're
probably aware that there may be a good reason for the current
behaviour.  Please send any further questions to the gcc-help mailing
list.
