Re: RFC: extend cprop_hardreg into a global pass
On Wed, Jul 25, 2012 at 3:35 AM, Bin.Cheng wrote:
> On Wed, Jul 25, 2012 at 2:14 AM, Steven Bosscher wrote:
>> Bin Cheng wrote:
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44025
>>
>> You could foster-parent and fix the attached patch to address this issue.
>> (I'm not interested in pursuing this further myself.)
>
> Thanks for your comments.
> I haven't look into the patch yet,

If you will look at it, please make sure to add a free_dominance_info call:

+  /* Finalize and clean up.  */
+  fini_walk_dominator_tree (&walk_data);
+
+  /* We must free this, or cfgcleanup barfs.  */
+  free_dominance_info (CDI_DOMINATORS);
+

You could also add support for unpropagating constants found in REG_EQUIV notes (usually those are the expensive ones), and for related constants (e.g. "constval+{1,2,4,8}" and "constval<<{-3,-2,-1,1,2,3}"). You probably also would want to put the pass in its own file and add an rtl_opt_pass for it (I hacked it at the bottom of cse.c because I didn't want to change Makefile.in etc.).

> but doing it before ira may
> increase register pressure, while on the other hand, loading simple
> constant is trivial operation on modern machines.

I don't think register pressure is an issue as long as the equivalence of the register to a constant is known and the register allocator can re-materialize it. If the equivalence is lost, or RA decides that spilling is cheaper than re-materializing, then this unpropagation idea will probably be an overall loss.

I didn't investigate any of this. Writing this whole pass took ~30 minutes; I just wanted to show how easy it can be to write a simple experimental GCC pass. If you disable the un-propagation itself (i.e. the validate_change call), you can at least use the pass to see how often opportunities for this kind of transformation arise in real-world software, and decide where to spend your effort most efficiently.

For the other PR you mentioned, that looks like a register allocation regression, which should be addressed in IRA rather than in regcprop.

Ciao!
Steven
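To make the "related constants" idea above a little more concrete, here is a minimal standalone sketch in C. It is not GCC code and does not use any GCC internals: the names derive_from, apply_derivation and struct derivation are hypothetical, constants are modelled as 32-bit unsigned values, and the negative shift counts in "constval<<{-3,-2,-1,1,2,3}" are assumed to mean right shifts. It only checks whether a wanted constant could be rebuilt from one that is already available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; nothing here is a GCC internal.  */
enum derive_kind { DERIVE_NONE, DERIVE_ADD, DERIVE_SHL, DERIVE_SHR };

struct derivation
{
  enum derive_kind kind;
  int amount; /* Addend for DERIVE_ADD, shift count for the shifts.  */
};

/* Try to express WANTED as KNOWN plus {1,2,4,8}, or as KNOWN shifted
   left or right by 1..3 bits (assumed reading of the negative counts).  */
static struct derivation
derive_from (uint32_t known, uint32_t wanted)
{
  static const uint32_t addends[] = { 1, 2, 4, 8 };
  struct derivation d = { DERIVE_NONE, 0 };

  for (size_t i = 0; i < sizeof addends / sizeof addends[0]; i++)
    if ((uint32_t) (known + addends[i]) == wanted)
      {
        d.kind = DERIVE_ADD;
        d.amount = (int) addends[i];
        return d;
      }

  for (int shift = 1; shift <= 3; shift++)
    {
      if ((uint32_t) (known << shift) == wanted)
        {
          d.kind = DERIVE_SHL;
          d.amount = shift;
          return d;
        }
      if ((uint32_t) (known >> shift) == wanted)
        {
          d.kind = DERIVE_SHR;
          d.amount = shift;
          return d;
        }
    }
  return d;
}

/* Recompute the wanted constant from KNOWN using derivation D.  */
static uint32_t
apply_derivation (uint32_t known, struct derivation d)
{
  assert (d.kind != DERIVE_NONE);
  switch (d.kind)
    {
    case DERIVE_ADD:
      return known + (uint32_t) d.amount;
    case DERIVE_SHL:
      return known << d.amount;
    default:
      return known >> d.amount;
    }
}
```

For example, derive_from (100, 104) reports an add of 4, and derive_from (3, 24) reports a left shift by 3. Whether such a derivation is actually cheaper than loading the constant is target-dependent and is not modelled here.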
Integer promotion for register based arguments
Hi,

I've tried compiling the following program targeting MIPS, LM32 and ARM.

long a, b;

void func(short p)
{
  b = (long)p;
}

int main()
{
  if(a < 2)
    func((short)a);
  return 0;
}

For MIPS and LM32, truncation is performed in the calling function and sign extension in the called function. One of these operations seems redundant. For ARM, truncation is performed in the caller, but sign-extension isn't performed in the callee, which seems more efficient. Why might this be?

- PROMOTE_MODE is defined for all targets such that HImode should be promoted.
- TARGET_PROMOTE_FUNCTION_MODE is also defined for all targets such that function arguments should be promoted.

Are there other target macros that control this?

Thanks,
Jon
Re: Integer promotion for register based arguments
On 07/25/2012 12:15 PM, Jon Beniston wrote:
> For MIPS and LM32, truncation is performed in the calling function
> and sign extension in the called function. One of these operations
> seems redundant. For ARM, truncation is performed in the caller,
> but sign-extension isn't performed in the callee, which seems more
> efficient. Why might this be?

This is defined by the system ABI, which specifies when zero- or sign-extension gets done. The ARM ABI explicitly requires a caller to extend types appropriately before they are passed, and a callee can depend on that. We in GCC have to follow the rules, and we can take advantage of them.

I suspect the answer to your question will be found in the ABIs of the MIPS and LM32, but I'm not familiar with either of those.

Andrew.
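As a rough illustration of why the second extension is redundant once the ABI makes the caller responsible, here is a small model in C. It is only a sketch: the argument register is modelled as a 32-bit integer, and the function names (sign_extend16, caller_prepare_arg, callee_trusting, callee_defensive) are made up for this example and do not correspond to anything in GCC or in any ABI document.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of a 32-bit register image.  */
static int32_t
sign_extend16 (uint32_t reg)
{
  uint32_t low = reg & 0xFFFFu;
  return (int32_t) (low ^ 0x8000u) - 0x8000;
}

/* Caller side: narrow the value to 16 bits and leave it sign-extended
   in the (modelled) argument register.  */
static int32_t
caller_prepare_arg (int32_t a)
{
  return sign_extend16 ((uint32_t) a);
}

/* Callee that relies on the caller having extended the argument.  */
static int32_t
callee_trusting (int32_t arg_reg)
{
  return arg_reg;
}

/* Callee that extends again; redundant if the caller already did.  */
static int32_t
callee_defensive (int32_t arg_reg)
{
  return sign_extend16 ((uint32_t) arg_reg);
}
```

When the register was produced by caller_prepare_arg, both callees return the same value, so the extra extension in callee_defensive buys nothing. It only matters if the callee cannot rely on the caller, which is exactly what the ABI decides.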
RE: Integer promotion for register based arguments
Hi Andrew,

> On 07/25/2012 12:15 PM, Jon Beniston wrote:
> > For MIPS and LM32, truncation is performed in the calling function and
> > sign extension in the called function. One of these operations seems
> > redundant. For ARM, truncation is performed in the caller, but
> > sign-extension isn't performed in the callee, which seems more
> > efficient. Why might this be?
>
> This is defined by the system ABI, which specifies when zero- or sign-
> extension get done. The ARM ABI explicitly requires a caller to extend types
> appropriately before they are passed, and a callee can depend on that. We
> in GCC have to follow the rules, and we can take advantage of them.
>
> I suspect the answer to your question will be found in the ABIs of the MIPS
> and LM32, but I'm not familiar with either of those.

In the LM32 case, this is something that was overlooked rather than something the ABI requires. I guess my question is: what would I need to change to make it work like the ARM port? I can't see how this is being controlled.

Thanks,
Jon
Problems with pragma and attribute optimize.
Hi,

I have been experimenting with marking specific functions to be auto-vectorized in GCC, but have had problems getting it to work.

It seems the optimize attribute works sometimes, but only if the function it is used on is not static, and pragma optimize never seems to work.

See the attached test-case. If you compile it with -ftree-vectorizer-verbose, you will see that only the first function is vectorized, but the last two are not.

Anyone know what is wrong here?

Best regards
`Allan

#include <stdint.h>

void __attribute__((optimize("tree-vectorize")))
innerloop_1(int16_t* destination, const int16_t* source1, const int16_t* source2, int length)
{
    while (length--) {
        *(destination++) = *(source1++) + *(source2++);
    }
}

static void __attribute__((optimize("tree-vectorize")))
innerloop_2(int16_t* destination, const int16_t* source1, const int16_t* source2, int length)
{
    while (length--) {
        *(destination++) = *(source1++) + *(source2++);
    }
}

void caller(int16_t* destination, const int16_t* source1, const int16_t* source2, int length)
{
    innerloop_2(destination, source1, source2, length);
}

#pragma GCC optimize("tree-vectorize")
void innerloop_3(int16_t* destination, const int16_t* source1, const int16_t* source2, int length)
{
    while (length--) {
        *(destination++) = *(source1++) + *(source2++);
    }
}
Re: Problems with pragma and attribute optimize.
On Wed, Jul 25, 2012 at 2:23 PM, Allan Sandfeld Jensen wrote:
> Hi,
>
> I have been experimenting with marking specific functions to be auto-
> vectorized in GCC, but have had problems getting it to work.
>
> It seems the optimize attribute works sometimes, but only if the function it
> is used on is not static, but pragma optimize never seems to work.
>
> See the attached test-case. If you compile it with -ftree-vectorizer-verbose,
> you will see that only the first function is vectorized, but the two last are
> not.
>
> Anyone know what is wrong here?

The attribute doesn't work in the face of inlining and generally was designed for debugging, not for controlling things like you do.

Richard.

> Best regards
> `Allan
Optimize attribute and inlining
Declaring a function with __attribute__((optimize("O0"))) turns off inlining for the translation unit (at least) containing the function (see output at the end). Is this expected behavior?

I tracked this down to the fact that when processing the optimize attribute with O0, flag_no_inline is set to 1 and is not restored. The early_inliner pass in ipa-inline.c (obviously) skips processing if flag_no_inline is set, and therefore inlining does not occur.

I did see that handle_optimize_attribute in c-family/c-common.c saves and restores whatever options have corresponding fields in struct cl_optimization, but inlining is not one of them. I guess that happens because inline is not described as an "Optimization" option in common.opt.

Test output

$ cat test.c
static void callee() { static int a; a++; }
void caller() { callee(); }
$
$ ~/Code/gcc/repo/native_install/bin/gcc -O1 -fdump-tree-all -S test.c
$
$ cat test.c.*.optimized

;; Function caller (caller, funcdef_no=1, decl_uid=1358, cgraph_uid=1)

caller ()
{
  int a.0;
  int a.1;
  static int a;
  static int a;

:
  a.0_3 = a;
  a.1_4 = a.0_3 + 1;
  a = a.1_4;
  return;

}

$ cat test.c
static void callee() { static int a; a++; }
void caller() { callee(); }
static void somefunc() __attribute__((optimize("O0")));
$
$ ~/Code/gcc/repo/native_install/bin/gcc -O1 -fdump-tree-all -S test.c
$
$ cat test.c.*.optimized

;; Function callee (callee, funcdef_no=0, decl_uid=1355, cgraph_uid=0)
callee ()
{
  static int a;
  int a.1;
  int a.0;

:
  a.0_1 = a;
  a.1_2 = a.0_1 + 1;
  a = a.1_2;
  return;

}

;; Function caller (caller, funcdef_no=1, decl_uid=1358, cgraph_uid=1)
caller ()
{
:
  callee ();
  return;

}

Regards
Senthil
Re: Problems with pragma and attribute optimize.
On Wednesday 25 July 2012, Richard Guenther wrote:
> On Wed, Jul 25, 2012 at 2:23 PM, Allan Sandfeld Jensen wrote:
> > Hi,
> >
> > I have been experimenting with marking specific functions to be auto-
> > vectorized in GCC, but have had problems getting it to work.
> >
> > It seems the optimize attribute works sometimes, but only if the function
> > it is used on is not static, but pragma optimize never seems to work.
> >
> > See the attached test-case. If you compile it with
> > -ftree-vectorizer-verbose, you will see that only the first function is
> > vectorized, but the two last are not.
> >
> > Anyone know what is wrong here?
>
> The attribute doesn't work in the face of inlining and generally was
> designed for debugging, not for controlling things like you do.
>

In that case the GCC manual should probably be updated to reflect that. If what you say is true, it seems it has been developed for one purpose but then documented for another. (The documentation needs updating anyway, since the attribute is no longer allowed after the function declaration, as all the examples show it, but only before.)

I found the problem with the pragma though: it is apparently a long-standing bug, PR 48026, with a related version in PR 41201. The latter bug actually has a patch for the problem, which is apparently caused by an incorrect short-cut.

`Allan
Re: Optimize attribute and inlining
On Wed, Jul 25, 2012 at 4:07 PM, Selvaraj, Senthil_Kumar wrote:
> Declaring a function with __attribute__((optimize("O0")) turns off inlining
> for the translation unit (atleast) containing the function (see output at the
> end). Is this expected behavior?

Not really. The optimize attribute processing should only affect flags it saves. -f[no-]inline is not meaningful per function and we have the noinline attribute for more proper handling.

That said, I consider the optimize attribute code seriously broken and unmaintained (but sometimes useful for debugging - and only that).

> I tracked this down to the fact that when processing the optimize attribute
> with O0, flag_no_inline is set to 1 and is not restored back. The
> early_inliner pass in ipa-inline.c (obviously) skips processing if
> flag_no_inline is set, and therefore inlining does not occur. I did see that
> handle_optimize_attribute in c-family/c-common.c saves and restores whatever
> options have corresponding fields in struct cl_optimization, but inlining is
> not one of them. That I guess is happening because inline is not described as
> an "Optimization" option in common.opt.
>
> Test output
>
> $ cat test.c
> static void callee() { static int a; a++; }
> void caller() { callee(); }
> $
> $ ~/Code/gcc/repo/native_install/bin/gcc -O1 -fdump-tree-all -S test.c
> $
> $ cat test.c.*.optimized
>
> ;; Function caller (caller, funcdef_no=1, decl_uid=1358, cgraph_uid=1)
>
> caller ()
> {
>   int a.0;
>   int a.1;
>   static int a;
>   static int a;
>
> :
>   a.0_3 = a;
>   a.1_4 = a.0_3 + 1;
>   a = a.1_4;
>   return;
>
> }
>
> $ cat test.c
> static void callee() { static int a; a++; }
> void caller() { callee(); }
> static void somefunc() __attribute__((optimize("O0")));
> $
> $ ~/Code/gcc/repo/native_install/bin/gcc -O1 -fdump-tree-all -S test.c
> $
> $ cat test.c.*.optimized
>
> ;; Function callee (callee, funcdef_no=0, decl_uid=1355, cgraph_uid=0)
> callee ()
> {
>   static int a;
>   int a.1;
>   int a.0;
>
> :
>   a.0_1 = a;
>   a.1_2 = a.0_1 + 1;
>   a = a.1_2;
>   return;
>
> }
>
> ;; Function caller (caller, funcdef_no=1, decl_uid=1358, cgraph_uid=1)
> caller ()
> {
> :
>   callee ();
>   return;
>
> }
>
> Regards
> Senthil
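For reference, the noinline attribute mentioned in this reply applies to a single function and leaves the inlining decisions for the rest of the translation unit alone. The following is a minimal sketch with made-up function names (bump_out_of_line, bump_inlinable, run_both); it shows only the syntax, with the attribute placed before the declarator, and makes no claim about what a particular GCC version will actually inline.

```c
static int counter;

/* noinline asks the compiler to keep just this function out of line;
   it does not touch any translation-unit-wide inlining flag.  */
static void __attribute__((noinline))
bump_out_of_line (void)
{
  counter++;
}

/* No attribute: the inliner is free to decide for itself.  */
static void
bump_inlinable (void)
{
  counter++;
}

/* Calls both helpers once and returns the updated counter.  */
int
run_both (void)
{
  bump_out_of_line ();
  bump_inlinable ();
  return counter;
}
```

Compiling this at -O1 and looking at the dump or the assembly should show a remaining call to bump_out_of_line, while bump_inlinable may or may not be inlined depending on the compiler's heuristics.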
Re: Problems with pragma and attribute optimize.
On Wed, Jul 25, 2012 at 4:25 PM, Allan Sandfeld Jensen wrote:
> On Wednesday 25 July 2012, Richard Guenther wrote:
>> On Wed, Jul 25, 2012 at 2:23 PM, Allan Sandfeld Jensen wrote:
>> > Hi,
>> >
>> > I have been experimenting with marking specific functions to be auto-
>> > vectorized in GCC, but have had problems getting it to work.
>> >
>> > It seems the optimize attribute works sometimes, but only if the function
>> > it is used on is not static, but pragma optimize never seems to work.
>> >
>> > See the attached test-case. If you compile it with
>> > -ftree-vectorizer-verbose, you will see that only the first function is
>> > vectorized, but the two last are not.
>> >
>> > Anyone know what is wrong here?
>>
>> The attribute doesn't work in the face of inlining and generally was
>> designed for debugging, not for controlling things like you do.
>>
>
> In that case the GCC manual should probably be updated to reflect that. If
> what you say it true, it seems it has been developed for one purpose but then
> documented for another. (The documentation needs updating anyway, since the
> attribute is no longer allowed after the function declaration like all the
> examples does, but only before).
>
> I found the problem with pragma though, it is apparently a long standing bug
> in PR 48026 and a related version in PR 41201. The last bug actually has a
> patch for the problem, it is apparently caused by an incorrect short-cut.

CCing the original author.

Richard.

> `Allan
Re: Integer promotion for register based arguments
> I guess my question is what would I need to change to make it work like the > ARM port? I can't see how this is being controlled. Try TARGET_PROMOTE_PROTOTYPES. -- Eric Botcazou
RE: Integer promotion for register based arguments
Hi Eric,

> > I guess my question is what would I need to change to make it work
> > like the ARM port? I can't see how this is being controlled.
>
> Try TARGET_PROMOTE_PROTOTYPES.

For all 3 targets I believe this returns true (both MIPS and LM32 use hook_bool_const_tree_true), so I presume it must be something else.

Regards,
Jon
Re: Integer promotion for register based arguments
On 07/25/2012 04:52 PM, Jon Beniston wrote:
> Hi Eric,
>
>>> I guess my question is what would I need to change to make it work
>>> like the ARM port? I can't see how this is being controlled.
>>
>> Try TARGET_PROMOTE_PROTOTYPES.
>
> For all 3 targets I believe this returns true (Both MIPS and LM32 use
> hook_bool_const_tree_true), so I presume it must be something else.

I'd just step through the code at the point the sign extension is generated.

Andrew.
RE: Identifying Compiler Options to Minimize Energy Consumption by Embedded Programs
Hi James,

> - Which set of benchmarks are suitable for embedded applications and representative of possible applications?

Have a look at CoreMark: http://www.coremark.org/

EEMBC also have EnergyBench: http://www.eembc.org/benchmark/power_sl.php
I think that might be commercial, but it may give you some ideas.

Regards,
Jon
Re: Optimize attribute and inlining
On 25/07/12 17:30, Richard Guenther wrote:
> On Wed, Jul 25, 2012 at 4:07 PM, Selvaraj, Senthil_Kumar wrote:
>> Declaring a function with __attribute__((optimize("O0")) turns off inlining
>> for the translation unit (atleast) containing the function (see output at the
>> end). Is this expected behavior?
>
> Not really. The optimize attribute processing should only affect flags it saves.
> -f[no-]inline is not meaningful per function and we have the noinline attribute
> for more proper handling.
>
> That said, I consider the optimize attribute code seriously broken and
> unmaintained (but sometimes useful for debugging - and only that).

That's a pity. It's understandable: changing optimisation levels on different functions is always going to be problematic, since inter-function optimisations (like inlining) are going to be difficult to define. But sometimes it could be nice to use specific optimisations in specific places, such as loop unrolling in a critical function while other code is to be optimised for code size.

Does "#pragma GCC optimize" work more reliably?
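For readers who have not seen the pragma form, this is roughly what the "loop unrolling in one critical function" case would look like with GCC's documented push_options, optimize and pop_options pragmas. It is a syntax sketch only: the function names sum_values_unrolled and sum_values_plain are invented for the example, and whether the option actually takes effect reliably is precisely the open question in this thread.

```c
/* Save the current options, request unrolling for the functions that
   follow, then restore the saved options afterwards.  */
#pragma GCC push_options
#pragma GCC optimize ("unroll-loops")
int
sum_values_unrolled (const int *values, int count)
{
  int total = 0;
  for (int i = 0; i < count; i++)
    total += values[i];
  return total;
}
#pragma GCC pop_options

/* Same loop outside the pragma region, compiled with the command-line
   options only.  */
int
sum_values_plain (const int *values, int count)
{
  int total = 0;
  for (int i = 0; i < count; i++)
    total += values[i];
  return total;
}
```

Both functions compute the same result; any difference would only show up in the generated code, for instance when comparing the assembly produced with -Os.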
Re: Reserving a bit in ELF segment flags for huge page mappings
On Tue, Jul 24, 2012 at 1:40 PM, Cary Coutant wrote:
>> To do this, I would like to reserve a bit in the segment flags to
>> indicate that this segment is to be mapped to huge pages if possible.
>> Can I reserve something like a PF_LARGE_PAGE bit?
>
> HP-UX has a PF_HP_PAGE_SIZE (0x0010) bit that says "Segment should
> be mapped with page size specified in p_align field".

OK to define PF_LINUX_PAGE_SIZE similarly, using the same bit (0x0010)? I want this to be a hint to the loader.

Thanks,
-Sri.

>
> -cary