register allocation
Hi All,

I am looking at the code generated by my port, and it seems that too many copies between registers are generated. I looked a bit at the register allocation and wanted to verify that I understand its behavior. Is it true that it first chooses a register class for each pseudo and only then starts coloring?

I think my problem is that my architecture has two register classes which can both do all arithmetic operations, but class X can also do loads and stores and class Y can also do DSP operations. So when there are, for example, two DSP operations with some arithmetic operations between them, I expect only class Y to be used. Instead GCC prefers to copy registers and do the arithmetic operations using X, because for some reason it determined that the preferred class for the registers in the arithmetic operations is X.

It seems that determining the class does not look at the whole flow but rather looks only at the insns in which the register appears.

Do I understand the situation correctly? Is there something I can do about it?

Thanks, Roy.
[RFC] Improving GCSE to reduce constant splits on ARM
Hi,

We've found that constant splitting on ARM can be very inefficient if it's done inside a loop. For example, the expression

    a = a & 0xff00ff00;

will be translated into the following code (on ARM, only 8-bit values shifted by an even number can be used as immediate arguments):

    bic r0, r0, #16711680
    bic r0, r0, #255

This makes perfect sense unless the code is in a loop and many instructions use the same bit mask. In that case, we would want to put the 0xff00ff00 constant into a register, let pass_rtl_move_loop_invariants move it outside the loop, and reuse it for every appropriate bitwise AND inside the loop.

This is a real-life example (from the evas rasterization library), where fixing this issue speeds up the expedite test suite by 6% on average and by up to 20% on several tests.

Why does the splitting happen?

On 4.4, the only problem was GCSE, which propagated a separate pseudo register holding a constant into a consumer insn, i.e.

    r123 = 0xff00ff00;
    r124 = r125 & r123

was transformed into

    r124 = r125 & 0xff00ff00

After that, the constant within the AND expression is no longer considered a loop invariant and is not moved outside the loop.

This can be fixed by checking whether the insn transformed by GCSE will require splitting; if it does, the transformation should not be done in the earlier GCSE passes. We may check this by comparing the rtx_cost of the constant we're going to propagate with the rtx_cost of const_int(1). If moving the loop invariant fails (e.g. due to register pressure), pass_combine can still propagate the constant into the AND, and in that case the result is the same code as before.

After this patch http://gcc.gnu.org/ml/gcc-patches/2009-08/msg01032.html , such constants are split as early as the expand pass, so the loop invariant code motion pass has no chance to deal with them.

So, the questions are:

1) Is it really necessary to split constants on ARM at the time of expand? At least, loop invariant code motion can work better if splitting happens later.
2) Is there any reason we shouldn't prevent GCSE from propagating constants that we know will be split?

In the attachment is the prototype patch that fixes GCSE to allow propagating only those constants that won't cause a split, and disables splitting in expand on ARM.

--
Best regards,
Dmitry

--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2512,13 +2512,13 @@ arm_split_constant (enum rtx_code code, enum machine_mode mode, rtx insn,
       && (arm_gen_constant (code, mode, NULL_RTX, val, target, source,
                             1, 0)
           > (arm_constant_limit (optimize_function_for_size_p (cfun))
-             + (code != SET
+             + (code != SET) - arm_fix_split)))
     {
       if (code == SET)
         {
           /* Currently SET is the only monadic value for CODE, all
              the rest are diadic.  */
-          if (TARGET_USE_MOVT)
+          if (TARGET_USE_MOVT && !arm_fix_split)
             arm_emit_movpair (target, GEN_INT (val));
           else
             emit_set_insn (target, GEN_INT (val));
@@ -2529,7 +2529,7 @@ arm_split_constant (enum rtx_code code, enum machine_mode mode, rtx insn,
         {
           rtx temp = subtargets ? gen_reg_rtx (mode) : target;

-          if (TARGET_USE_MOVT)
+          if (TARGET_USE_MOVT && !arm_fix_split)
             arm_emit_movpair (temp, GEN_INT (val));
           else
             emit_set_insn (temp, GEN_INT (val));
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index a39bb3a..0c3952d 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -169,3 +169,8 @@ mfix-cortex-m3-ldrd
 Target Report Var(fix_cm3_ldrd) Init(2)
 Avoid overlapping destination and address registers on LDRD instructions
 that may trigger Cortex-M3 errata.
+
+mfix-split
+Target Report Var(arm_fix_split) Init(0)
+Deny to use movt in case of arm thumb2, and prefer to use memory loads
+than split insns
diff --git a/gcc/gcse.c b/gcc/gcse.c
index 9ff0da8..ed15997 100644
--- a/gcc/gcse.c
+++ b/gcc/gcse.c
@@ -2571,6 +2572,7 @@ constprop_register (rtx insn, rtx from, rtx to)
   /* Handle normal insns next.  */
   if (NONJUMP_INSN_P (insn)
+      && rtx_cost (to, GET_CODE (to), false)
+         <= rtx_cost (GEN_INT(1), CONST_INT, false)
       && try_replace_reg (from, to, insn))
     return 1;
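For readers who want to check the immediate-encoding claim in the message above, here is a minimal sketch in C. It assumes the classic ARM-mode data-processing immediate (an 8-bit value rotated right by an even amount); Thumb-2 uses a different scheme. The names rotl32, is_arm_data_imm and and_via_two_bics are hypothetical helpers written for this illustration and are not GCC functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n is taken modulo 32). */
static uint32_t rotl32(uint32_t v, unsigned n)
{
    n &= 31u;
    return n ? (v << n) | (v >> (32u - n)) : v;
}

/* Return true if value can be written as an 8-bit constant rotated
   right by an even amount.  Rotating the value left by the same
   amount undoes the rotation, so we try every even rotation and
   check whether the result fits in the low 8 bits. */
bool is_arm_data_imm(uint32_t value)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        if ((rotl32(value, rot) & ~(uint32_t)0xff) == 0)
            return true;
    }
    return false;
}

/* Model of the two-instruction sequence from the message:
   bic r0, r0, #16711680 ; bic r0, r0, #255 */
uint32_t and_via_two_bics(uint32_t a)
{
    a &= ~(uint32_t)16711680u;  /* clear bits 16..23 */
    a &= ~(uint32_t)255u;       /* clear bits 0..7  */
    return a;
}
```

Under this model neither 0xff00ff00 nor its complement 0x00ff00ff is encodable, while 16711680 (0x00ff0000) and 255 each are, which is consistent with the AND being emitted as two BIC instructions.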
Re: register allocation
On 12/23/2010 03:13 AM, roy rosen wrote:
> Hi All, I am looking at the code generated by my port and it seems that I have a problem that too many copies between registers are generated. I looked a bit at the register allocation and wanted to verify that I understand its behavior. Is that true that it first chooses a register class for each pseudo and only then starts coloring?

Yes, that is true.

> I think that my problem is that in my architecture there are two register classes which can do all arithmetic operation but class X can also do loads and stores and class Y can also do DSP operations. So when there are for example two DSP operations and between them some arithmetic operations I expect to use only class Y but GCC prefers to copy registers and do the arithmetic operations using X because for some reason it determined that the preferred class for the registers in the arithmetic operations is X. It seems that determining the class does not look at the whole flow but rather looks only at insns in which the register appears.

Defining classes for pseudos is already one of the most expensive operations in IRA. Looking at the flow would make it even more complicated (I don't even know how to use this to improve the allocation, because it means live range splitting before coloring and before defining classes, which could help do live range splitting reasonably, taking register pressure into account).

> Do I understand the situation correctly?

Yes, I guess.

> Is there something I can do about it?

I'd recommend trying the ira-improv branch. I think that part of the problem is in the usage of cover classes. The branch removes the cover classes and permits IRA to use intersected register classes, which helps to assign better hard registers.
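The effect described in this exchange can be pictured with a toy model in C. This is only a sketch under stated assumptions and is not IRA's actual algorithm: the cost table, the tie-break in favour of class X, and every name below (reg_class, insn_kind, class_cost, preferred_class) are hypothetical. The only property taken from the thread is that a pseudo's class is chosen from the insns in which that pseudo appears, not from the surrounding flow.

```c
#include <stddef.h>

enum reg_class { CLASS_X, CLASS_Y, NUM_CLASSES };
enum insn_kind { INSN_ARITH, INSN_LOAD_STORE, INSN_DSP, NUM_KINDS };

/* Assumed penalty for using a class an insn cannot accept. */
#define COST_IMPOSSIBLE 1000

/* Assumed per-insn cost of holding the operand in each class:
   arithmetic works in both, loads/stores only in X, DSP only in Y. */
static const int class_cost[NUM_KINDS][NUM_CLASSES] = {
    [INSN_ARITH]      = { 0, 0 },
    [INSN_LOAD_STORE] = { 0, COST_IMPOSSIBLE },
    [INSN_DSP]        = { COST_IMPOSSIBLE, 0 },
};

/* Pick a class for one pseudo by summing the costs of only those
   insns that mention it.  Ties go to CLASS_X (an assumption). */
enum reg_class preferred_class(const enum insn_kind *uses, size_t n_uses)
{
    int total[NUM_CLASSES] = { 0 };

    for (size_t i = 0; i < n_uses; i++)
        for (int c = 0; c < NUM_CLASSES; c++)
            total[c] += class_cost[uses[i]][c];

    enum reg_class best = CLASS_X;
    for (int c = 1; c < NUM_CLASSES; c++)
        if (total[c] < total[best])
            best = (enum reg_class)c;
    return best;
}
```

In this model a pseudo used only in arithmetic insns ties between the two classes and lands in X, even when its neighbours are DSP results that must live in Y, so a copy between classes is needed at each boundary. A pseudo that itself appears in a DSP insn is placed in Y.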
Re: BImode is treated as normal byte-wide mode and causes bug.
On 12/22/2010 06:54 AM, Paolo Bonzini wrote:
> On 12/22/2010 03:43 PM, Bingfeng Mei wrote:
>> Thanks for letting me know this. Since only our target experiences such
>> issue, I guess no other processors have such requirements of manipulating
>> BImode. I can live with the workaround now.
>
> Perhaps Blackfin, but it has a BI->SI extension instruction so it doesn't see
> this bug.

I've always thought that ia64 would benefit from representing _Bool variables as BImode "in registers". At which point they could be stored in predicate registers and trivially used for conditionals.

That said, it does have a special BI->SI extension pattern.

r~
Re: register allocation
On 12/23/10 09:50, Vladimir Makarov wrote:
> Defining classes for pseudos is already one of the most expensive operations in IRA. Looking at the flow would make it even more complicated (I don't even know how to use this to improve the allocation, because it means live range splitting before coloring and before defining classes, which could help do live range splitting reasonably, taking register pressure into account).

I've often wondered if we could use some of the class information to guide range splitting. If a pseudo has contexts where it must be in class A and other contexts where it could be in class B, then there may be a reasonable split point where we could split the pseudo so that the split pseudos could be allocated into A & B respectively.

I looked at this eons ago, trying to split pseudos which had to be allocated to a particular hard reg over a small range but could be allocated in a much larger class of regs elsewhere. It worked, but was unmaintainable. The other downside is that we had defined the problem so narrowly that, while the generated code clearly looked better, the net effect was unmeasurable, as the reloads we avoided were typically outside of loops.

Jeff
default system path questions
All,

I found much to my dismay today that -I doesn't always work as intuited. Namely, if I set CFLAGS to:

    -I/path/to/gcc/include

where the default system path is:

    /path/to/gcc/lib/gcc/i686-pc-linux-gnu/3.4.6/include
    /usr/local/include
    /path/to/gcc/include
    /usr/include

the expected behavior would be to have the libraries searched before any of the above are searched. But no, gcc silently ignores this request, and finds the unwanted version of the include that I want in /usr/local/include, due to not 'wanting to defeat system headers'.

This I guess I can understand (although it would be very nice to get a warning). What I can't understand is why /usr/local/include is placed *above* /path/to/gcc/include in this ordering. Since when is a directory that has arbitrary installs from userland considered a necessary part of system headers? Shouldn't the detection order be:

    /path/to/gcc/lib/gcc/i686-pc-linux-gnu/3.4.6/include
    /path/to/gcc/include
    /usr/local/include
    /usr/include

if /usr/local/include is to be included at all..

And come to think of it, why *is* the -I ignored? Why doesn't the preprocessor just trust the user and that they know what they are doing? Why is -nostdinc even necessary?

Ed
gcc-4.5-20101223 is now available
Snapshot gcc-4.5-20101223 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.5-20101223/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_5-branch revision 168215

You'll find:

gcc-4.5-20101223.tar.bz2              Complete GCC (includes all of below)
  MD5=7176e7b41e12bd4b03e0b7fc6f14578e
  SHA1=82f197322da5baf1696eb65293fe72dce263e9ad

gcc-core-4.5-20101223.tar.bz2         C front end and core compiler
  MD5=1418e9fe36cc70275543e0b53a1ced85
  SHA1=4e5bf68f3ec470a484c3cac1182c001e4c8a3d46

gcc-ada-4.5-20101223.tar.bz2          Ada front end and runtime
  MD5=304028dfea7be8c1c489b4ca0c3e0ff5
  SHA1=38332bfe8e971e995ec0e8e2e4043ad2e4836825

gcc-fortran-4.5-20101223.tar.bz2      Fortran front end and runtime
  MD5=25b23d59a53bb637436ce10284b29702
  SHA1=1b777529656707137268dea519d8d472d565b3d9

gcc-g++-4.5-20101223.tar.bz2          C++ front end and runtime
  MD5=0b2e2407a661c1cd3a359c550f4cb7e3
  SHA1=dd2e1325efc3be354178b5a165267aa7d7526820

gcc-go-4.5-20101223.tar.bz2           Go front end and runtime
  MD5=57837bd4ca622cfddc15e3779de5f216
  SHA1=bec22331707cdc669342d72772174ff72d23b8ad

gcc-java-4.5-20101223.tar.bz2         Java front end and runtime
  MD5=6de6a68b54f2d53dcc2e925b13b8eb0e
  SHA1=b3e8bd91ea265d6394bc5b582ab4ca7811b7598e

gcc-objc-4.5-20101223.tar.bz2         Objective-C front end and runtime
  MD5=e5d7433447407344554a6a9997311177
  SHA1=1d076d2dd9fbaae76682be0d29050bc849c33ab6

gcc-testsuite-4.5-20101223.tar.bz2    The GCC testsuite
  MD5=68dee9bfcd2232774e41b7f2121d4466
  SHA1=de8f7181dbc82be565a87d40fe85b2b6b26e8546

Diffs from 4.5-20101216 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.5 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
Re: default system path questions
On 23 December 2010 22:17, Edward Peschko wrote:
> All,
>
> I found much to my dismay today that -I doesn't always work as
> intuited. Namely, if I set CFLAGS to:
>
> -I/path/to/gcc/include
> where the default system path is:
>
> /path/to/gcc/lib/gcc/i686-pc-linux-gnu/3.4.6/include
> /usr/local/include
> /path/to/gcc/include
> /usr/include
>
> the expected behavior would be to have the libraries searched before

Your email doesn't seem appropriate for this mailing list; it should probably be sent to the gcc-help list, or submitted to bugzilla.

You've apparently read the documentation, because you refer to the text about not wanting to defeat the system headers, so you're probably aware that there may be a good reason for the current behaviour.

Please send any further questions to the gcc-help mailing list.