Re: RFC: extend cprop_hardreg into a global pass

2012-07-25 Thread Steven Bosscher
On Wed, Jul 25, 2012 at 3:35 AM, Bin.Cheng  wrote:
> On Wed, Jul 25, 2012 at 2:14 AM, Steven Bosscher  
> wrote:
>> Bin Cheng wrote:
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44025
>>
>> You could foster-parent and fix the attached patch to address this issue.
>> (I'm not interested in pursuing this further myself.)
>
> Thanks for your comments.
> I haven't looked into the patch yet,

If you will look at it, please make sure to add a free_dominance_info call:

+  /* Finalize and clean up.  */
+  fini_walk_dominator_tree (&walk_data);
+
+  /* We must free this, or cfgcleanup barfs.  */
+  free_dominance_info (CDI_DOMINATORS);
+

You could also add support for unpropagating constants found in
REG_EQUIV notes (usually those are the expensive ones), and for
related constants (e.g. "constval+{1,2,4,8}" and
"constval<<{-3,-2,-1,1,2,3}").

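As a rough illustration of the "related constants" idea above, the check can be sketched on plain integers instead of RTL. Everything here (`related_constant`, `struct rel`, the `REL_*` names) is hypothetical and not taken from the attached patch; negative shift counts are read as right shifts, and unsigned arithmetic is assumed so the shifts are well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification of how WANTED relates to BASE.  */
enum rel_kind { REL_NONE, REL_SAME, REL_PLUS, REL_SHL, REL_SHR };

struct rel
{
  enum rel_kind kind;
  int amount;                   /* offset added, or shift count */
};

/* Return how WANTED can be derived from BASE, a constant assumed to be
   already live in a register: BASE + {1,2,4,8} or BASE shifted by 1..3
   bits in either direction.  Illustrative only.  */
static struct rel
related_constant (uint64_t base, uint64_t wanted)
{
  static const int offsets[] = { 1, 2, 4, 8 };
  struct rel r = { REL_NONE, 0 };

  if (wanted == base)
    {
      r.kind = REL_SAME;
      return r;
    }

  for (unsigned i = 0; i < sizeof offsets / sizeof offsets[0]; i++)
    if (base + (uint64_t) offsets[i] == wanted)
      {
        r.kind = REL_PLUS;
        r.amount = offsets[i];
        return r;
      }

  for (int s = 1; s <= 3; s++)
    {
      if ((base << s) == wanted)
        {
          r.kind = REL_SHL;
          r.amount = s;
          return r;
        }
      if ((base >> s) == wanted)
        {
          r.kind = REL_SHR;
          r.amount = s;
          return r;
        }
    }

  return r;
}
```

A real pass would additionally have to compare the cost of the add or shift against the cost of loading the constant directly; that part is not modelled here.
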
You probably also would want to put the pass in its own file and add
an rtl_opt_pass for it (I hacked it at the bottom of cse.c because I
didn't want to change Makefile.in etc.).

> but doing it before ira may
> increase register pressure, while on the other hand, loading simple
> constant is trivial operation on modern machines.

I don't think register pressure is an issue as long as the equivalence
of the register to a constant is known and the register allocator can
re-materialize it.

If the equivalence is lost, or RA decides that spilling is cheaper
than re-materializing, then this unpropagation idea will probably be
an overall loss.

I didn't investigate any of this. Writing this whole pass took ~30
minutes; I just wanted to show how easy it can be to write a simple
experimental GCC pass. If you disable the un-propagation itself (i.e.
the validate_change call), you can at least use the pass to see how
often opportunities for this kind of transformation arise in
real-world software, and decide where to spend your effort most
efficiently.

The other PR you mentioned looks like a register allocation
regression that should be addressed in IRA rather than in regcprop.

Ciao!
Steven


Integer promotion for register based arguments

2012-07-25 Thread Jon Beniston
Hi,

I've tried compiling the following program targeting MIPS, LM32 and
ARM.

long a, b;

void func(short p)
{
    b = (long)p;
}

int main()
{
    if (a < 2)
        func((short)a);
    return 0;
}

For MIPS and LM32, truncation is performed in the calling function and sign
extension in the called function. One of these operations seems redundant.
For ARM, truncation is performed in the caller, but sign-extension isn't
performed in the callee, which seems more efficient. Why might this be? 

- PROMOTE_MODE is defined for all targets such that HImode should be
promoted.
- TARGET_PROMOTE_FUNCTION_MODE is also defined for all targets such that
function arguments should be promoted.

Are there other target macros that control this?

Thanks,
Jon




Re: Integer promotion for register based arguments

2012-07-25 Thread Andrew Haley
On 07/25/2012 12:15 PM, Jon Beniston wrote:
> For MIPS and LM32, truncation is performed in the calling function
> and sign extension in the called function. One of these operations
> seems redundant.  For ARM, truncation is performed in the caller,
> but sign-extension isn't performed in the callee, which seems more
> efficient. Why might this be?

This is defined by the system ABI, which specifies when zero- or
sign-extension gets done.  The ARM ABI explicitly requires a caller to
extend types appropriately before they are passed, and a callee can
depend on that.  We in GCC have to follow the rules, and we can take
advantage of them.
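
To make the point concrete, here is a minimal model of a 16-bit argument travelling in a 32-bit register. The function names are made up for illustration and this is not generated compiler code; it only shows why a callee may skip the extension when the ABI guarantees the caller already did it, and why it may not otherwise.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 16 bits of a 32-bit register image.  */
static int32_t
sign_extend_16 (uint32_t reg)
{
  uint32_t low = reg & 0xffffu;
  return (low & 0x8000u) ? (int32_t) low - 0x10000 : (int32_t) low;
}

/* Caller side under an ABI that requires the caller to extend narrow
   arguments before the call.  */
static int32_t
caller_prepares_short (int32_t a)
{
  return sign_extend_16 ((uint32_t) a);
}

/* Callee that relies on the ABI guarantee: no extension needed.  */
static long
callee_trusting (int32_t reg)
{
  return (long) reg;
}

/* Callee that cannot rely on the caller and extends again.  */
static long
callee_defensive (int32_t reg)
{
  return (long) sign_extend_16 ((uint32_t) reg);
}
```

When the register was prepared by `caller_prepares_short`, both callees return the same value, so the second extension is redundant. Fed a raw value such as 0x18001, they differ, which is why the callee can only drop the extension if the ABI promises it.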

I suspect the answer to your question will be found in the ABIs of
the MIPS and LM32, but I'm not familiar with either of those.

Andrew.


RE: Integer promotion for register based arguments

2012-07-25 Thread Jon Beniston
Hi Andrew,

> On 07/25/2012 12:15 PM, Jon Beniston wrote:
> > For MIPS and LM32, truncation is performed in the calling function and
> > sign extension in the called function. One of these operations seems
> > redundant.  For ARM, truncation is performed in the caller, but
> > sign-extension isn't performed in the callee, which seems more
> > efficient. Why might this be?
> 
> This is defined by the system ABI, which specifies when zero- or
> sign-extension get done.  The ARM ABI explicitly requires a caller to
> extend types appropriately before they are passed, and a callee can
> depend on that.  We in GCC have to follow the rules, and we can take
> advantage of them.
> 
> I suspect the answer to your question will be found in the ABIs of the
> MIPS and LM32, but I'm not familiar with either of those.

In the LM32 case, this is something that was overlooked; it isn't that
way because anything requires it.

I guess my question is what would I need to change to make it work like the
ARM port? I can't see how this is being controlled.

Thanks,
Jon




Problems with pragma and attribute optimize.

2012-07-25 Thread Allan Sandfeld Jensen
Hi,

I have been experimenting with marking specific functions to be auto-
vectorized in GCC, but have had problems getting it to work.

It seems the optimize attribute works sometimes, but only if the function it 
is used on is not static, but pragma optimize never seems to work.

See the attached test-case. If you compile it with -ftree-vectorizer-verbose, 
you will see that only the first function is vectorized, but the two last are 
not. 

Anyone know what is wrong here?

Best regards
`Allan
#include <stdint.h>

void  __attribute__((optimize("tree-vectorize"))) innerloop_1(int16_t* destination, const int16_t* source1, const int16_t* source2, int length)
{
    while (length--) {
        *(destination++) = *(source1++) + *(source2++);
    }
}

static void  __attribute__((optimize("tree-vectorize"))) innerloop_2(int16_t* destination, const int16_t* source1, const int16_t* source2, int length)
{
    while (length--) {
        *(destination++) = *(source1++) + *(source2++);
    }
}

void caller(int16_t* destination, const int16_t* source1, const int16_t* source2, int length)
{
    innerloop_2(destination, source1, source2, length);
}

#pragma GCC optimize("tree-vectorize")

void innerloop_3(int16_t* destination, const int16_t* source1, const int16_t* source2, int length)
{
    while (length--) {
        *(destination++) = *(source1++) + *(source2++);
    }
}

Re: Problems with pragma and attribute optimize.

2012-07-25 Thread Richard Guenther
On Wed, Jul 25, 2012 at 2:23 PM, Allan Sandfeld Jensen
 wrote:
> Hi,
>
> I have been experimenting with marking specific functions to be auto-
> vectorized in GCC, but have had problems getting it to work.
>
> It seems the optimize attribute works sometimes, but only if the function it
> is used on is not static, but pragma optimize never seems to work.
>
> See the attached test-case. If you compile it with -ftree-vectorizer-verbose,
> you will see that only the first function is vectorized, but the two last are
> not.
>
> Anyone know what is wrong here?

The attribute doesn't work in the face of inlining and generally was designed
for debugging, not for controlling things like you do.

Richard.

> Best regards
> `Allan


Optimize attribute and inlining

2012-07-25 Thread Selvaraj, Senthil_Kumar
Declaring a function with __attribute__((optimize("O0"))) turns off inlining for
the translation unit (at least) containing the function (see output at the end).
Is this expected behavior?

I tracked this down to the fact that when processing the optimize attribute 
with O0, flag_no_inline is set to 1 and is not restored back. The early_inliner 
pass in ipa-inline.c (obviously) skips processing if flag_no_inline is set, and 
therefore inlining does not occur. I did see that handle_optimize_attribute in 
c-family/c-common.c saves and restores whatever options have corresponding 
fields in struct cl_optimization, but inlining is not one of them. That I guess 
is happening because inline is not described as an "Optimization" option in 
common.opt.
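
The save/restore problem described above can be modelled in a few lines. This is a toy, not GCC source: `toy_optimize`, `toy_no_inline`, `struct toy_saved_opts` and both functions are invented stand-ins, on the assumption that only fields present in the saved structure get restored.

```c
#include <assert.h>

/* Toy stand-ins for global option state; names are illustrative only.  */
static int toy_optimize = 1;    /* part of the saved set */
static int toy_no_inline = 0;   /* NOT part of the saved set */

/* Only options with a field here are saved and restored.  */
struct toy_saved_opts
{
  int optimize;
};

/* Applying level 0 also turns inlining off as a side effect.  */
static void
toy_apply_level (int level)
{
  toy_optimize = level;
  if (level == 0)
    toy_no_inline = 1;
}

/* Model of processing one optimize attribute: save, apply, restore.  */
static void
toy_handle_optimize_attribute (int level)
{
  struct toy_saved_opts saved = { toy_optimize };

  toy_apply_level (level);
  /* ... the per-function options would be recorded here ... */

  toy_optimize = saved.optimize;  /* restored */
  /* toy_no_inline has no saved field, so its new value leaks out.  */
}
```

After `toy_handle_optimize_attribute (0)`, `toy_optimize` is back to 1 but `toy_no_inline` stays at 1, which mirrors the reported behaviour of inlining being disabled for the rest of the translation unit.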

Test output

$ cat test.c
static void callee() { static int a; a++; }
void caller() { callee(); }
$ 
$ ~/Code/gcc/repo/native_install/bin/gcc -O1 -fdump-tree-all -S test.c
$ 
$ cat test.c.*.optimized

;; Function caller (caller, funcdef_no=1, decl_uid=1358, cgraph_uid=1)

caller ()
{
  int a.0;
  int a.1;
  static int a;
  static int a;

:
  a.0_3 = a;
  a.1_4 = a.0_3 + 1;
  a = a.1_4;
  return;

}

$ cat test.c
static void callee() { static int a; a++; }
void caller() { callee(); }
static void somefunc() __attribute__((optimize("O0")));
$ 
$ ~/Code/gcc/repo/native_install/bin/gcc -O1 -fdump-tree-all -S test.c
$ 
$ cat test.c.*.optimized

;; Function callee (callee, funcdef_no=0, decl_uid=1355, cgraph_uid=0)
callee ()
{
  static int a;
  int a.1;
  int a.0;

:
  a.0_1 = a;
  a.1_2 = a.0_1 + 1;
  a = a.1_2;
  return;

}

;; Function caller (caller, funcdef_no=1, decl_uid=1358, cgraph_uid=1)
caller ()
{
:
  callee ();
  return;

}

Regards
Senthil


Re: Problems with pragma and attribute optimize.

2012-07-25 Thread Allan Sandfeld Jensen
On Wednesday 25 July 2012, Richard Guenther wrote:
> On Wed, Jul 25, 2012 at 2:23 PM, Allan Sandfeld Jensen
> 
>  wrote:
> > Hi,
> > 
> > I have been experimenting with marking specific functions to be auto-
> > vectorized in GCC, but have had problems getting it to work.
> > 
> > It seems the optimize attribute works sometimes, but only if the function
> > it is used on is not static, but pragma optimize never seems to work.
> > 
> > See the attached test-case. If you compile it with
> > -ftree-vectorizer-verbose, you will see that only the first function is
> > vectorized, but the two last are not.
> > 
> > Anyone know what is wrong here?
> 
> The attribute doesn't work in the face of inlining and generally was
> designed for debugging, not for controlling things like you do.
> 

In that case the GCC manual should probably be updated to reflect that. If
what you say is true, it seems it has been developed for one purpose but then
documented for another. (The documentation needs updating anyway, since the
attribute is no longer allowed after the function declaration like all the
examples do, but only before).

I found the problem with the pragma, though: it is apparently a long-standing
bug, PR 48026, with a related version in PR 41201. The latter actually has a
patch for the problem, which is apparently caused by an incorrect short-cut.

`Allan


Re: Optimize attribute and inlining

2012-07-25 Thread Richard Guenther
On Wed, Jul 25, 2012 at 4:07 PM, Selvaraj, Senthil_Kumar
 wrote:
> Declaring a function with __attribute__((optimize("O0"))) turns off inlining
> for the translation unit (at least) containing the function (see output at the
> end). Is this expected behavior?

Not really.  The optimize attribute processing should only affect flags it
saves.  -f[no-]inline is not meaningful per function and we have the noinline
attribute for more proper handling.
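
For reference, a minimal use of the noinline attribute mentioned above might look like the following. The function names are made up, and the attribute is placed before the declarator, the position reported elsewhere in this thread as the accepted one.

```c
#include <assert.h>

static int counter;

/* Ask the compiler not to inline this one function, leaving inlining
   decisions for the rest of the translation unit untouched.  */
static void __attribute__((noinline))
bump (void)
{
  counter++;
}

/* Calls bump twice and returns by how much the counter advanced.  */
int
call_twice (void)
{
  int before = counter;

  bump ();
  bump ();
  assert (counter == before + 2);
  return counter - before;
}
```

Unlike an optimize("O0") attribute, this affects only the marked function.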

That said, I consider the optimize attribute code seriously broken and
unmaintained (but sometimes useful for debugging - and only that).

> I tracked this down to the fact that when processing the optimize attribute 
> with O0, flag_no_inline is set to 1 and is not restored back. The 
> early_inliner pass in ipa-inline.c (obviously) skips processing if 
> flag_no_inline is set, and therefore inlining does not occur. I did see that 
> handle_optimize_attribute in c-family/c-common.c saves and restores whatever 
> options have corresponding fields in struct cl_optimization, but inlining is 
> not one of them. That I guess is happening because inline is not described as 
> an "Optimization" option in common.opt.
>
> Test output
> 
> $ cat test.c
> static void callee() { static int a; a++; }
> void caller() { callee(); }
> $
> $ ~/Code/gcc/repo/native_install/bin/gcc -O1 -fdump-tree-all -S test.c
> $
> $ cat test.c.*.optimized
>
> ;; Function caller (caller, funcdef_no=1, decl_uid=1358, cgraph_uid=1)
>
> caller ()
> {
>   int a.0;
>   int a.1;
>   static int a;
>   static int a;
>
> :
>   a.0_3 = a;
>   a.1_4 = a.0_3 + 1;
>   a = a.1_4;
>   return;
>
> }
>
> $ cat test.c
> static void callee() { static int a; a++; }
> void caller() { callee(); }
> static void somefunc() __attribute__((optimize("O0")));
> $
> $ ~/Code/gcc/repo/native_install/bin/gcc -O1 -fdump-tree-all -S test.c
> $
> $ cat test.c.*.optimized
>
> ;; Function callee (callee, funcdef_no=0, decl_uid=1355, cgraph_uid=0)
> callee ()
> {
>   static int a;
>   int a.1;
>   int a.0;
>
> :
>   a.0_1 = a;
>   a.1_2 = a.0_1 + 1;
>   a = a.1_2;
>   return;
>
> }
>
> ;; Function caller (caller, funcdef_no=1, decl_uid=1358, cgraph_uid=1)
> caller ()
> {
> :
>   callee ();
>   return;
>
> }
>
> Regards
> Senthil


Re: Problems with pragma and attribute optimize.

2012-07-25 Thread Richard Guenther
On Wed, Jul 25, 2012 at 4:25 PM, Allan Sandfeld Jensen
 wrote:
> On Wednesday 25 July 2012, Richard Guenther wrote:
>> On Wed, Jul 25, 2012 at 2:23 PM, Allan Sandfeld Jensen
>>
>>  wrote:
>> > Hi,
>> >
>> > I have been experimenting with marking specific functions to be auto-
>> > vectorized in GCC, but have had problems getting it to work.
>> >
>> > It seems the optimize attribute works sometimes, but only if the function
>> > it is used on is not static, but pragma optimize never seems to work.
>> >
>> > See the attached test-case. If you compile it with
>> > -ftree-vectorizer-verbose, you will see that only the first function is
>> > vectorized, but the two last are not.
>> >
>> > Anyone know what is wrong here?
>>
>> The attribute doesn't work in the face of inlining and generally was
>> designed for debugging, not for controlling things like you do.
>>
>
> In that case the GCC manual should probably be updated to reflect that. If
> what you say is true, it seems it has been developed for one purpose but then
> documented for another. (The documentation needs updating anyway, since the
> attribute is no longer allowed after the function declaration like all the
> examples do, but only before).
>
> I found the problem with the pragma, though: it is apparently a long-standing
> bug, PR 48026, with a related version in PR 41201. The latter actually has a
> patch for the problem, which is apparently caused by an incorrect short-cut.

CCing the original author.

Richard.

> `Allan


Re: Integer promotion for register based arguments

2012-07-25 Thread Eric Botcazou
> I guess my question is what would I need to change to make it work like the
> ARM port? I can't see how this is being controlled.

Try TARGET_PROMOTE_PROTOTYPES.

-- 
Eric Botcazou


RE: Integer promotion for register based arguments

2012-07-25 Thread Jon Beniston
Hi Eric,
 
> > I guess my question is what would I need to change to make it work
> > like the ARM port? I can't see how this is being controlled.
> 
> Try TARGET_PROMOTE_PROTOTYPES.

For all 3 targets I believe this returns true (Both MIPS and LM32 use
hook_bool_const_tree_true), so I presume it must be something else.

Regards,
Jon




Re: Integer promotion for register based arguments

2012-07-25 Thread Andrew Haley
On 07/25/2012 04:52 PM, Jon Beniston wrote:
> Hi Eric,
>  
>>> I guess my question is what would I need to change to make it work
>>> like the ARM port? I can't see how this is being controlled.
>>
>> Try TARGET_PROMOTE_PROTOTYPES.
>
> For all 3 targets I believe this returns true (Both MIPS and LM32 use
> hook_bool_const_tree_true), so I presume it must be something else.

I'd just step through the code at the point the sign extension is
generated.

Andrew.


RE: Identifying Compiler Options to Minimize Energy Consumption by Embedded Programs

2012-07-25 Thread Jon Beniston
Hi James,

>  - Which set of benchmarks are suitable for embedded applications and
> representative of possible applications?

Have a look at CoreMark: http://www.coremark.org/

EEMBC also have EnergyBench: http://www.eembc.org/benchmark/power_sl.php
although I think that might be commercial, but it may give you some ideas.

Regards,
Jon




Re: Optimize attribute and inlining

2012-07-25 Thread David Brown

On 25/07/12 17:30, Richard Guenther wrote:
> On Wed, Jul 25, 2012 at 4:07 PM, Selvaraj, Senthil_Kumar wrote:
>> Declaring a function with __attribute__((optimize("O0"))) turns off
>> inlining for the translation unit (at least) containing the function
>> (see output at the end). Is this expected behavior?
>
> Not really.  The optimize attribute processing should only affect
> flags it saves.  -f[no-]inline is not meaningful per function and we
> have the noinline attribute for more proper handling.
>
> That said, I consider the optimize attribute code seriously broken
> and unmaintained (but sometimes useful for debugging - and only
> that).


That's a pity.  It's understandable - changing optimisation levels on 
different functions is always going to be problematic, since 
inter-function optimisations (like inlining) are going to be difficult 
to define.  But sometimes it could be nice to use specific optimisations 
in specific places, such as loop unrolling in a critical function while 
other code is to be optimised for code size.  Does "#pragma GCC
optimize" work more reliably?




Re: Reserving a bit in ELF segment flags for huge page mappings

2012-07-25 Thread Sriraman Tallam
On Tue, Jul 24, 2012 at 1:40 PM, Cary Coutant  wrote:
>>   To do this, I would like to reserve a bit in the segment flags to
>> indicate that this segment is to be mapped to huge pages if possible.
>> Can I reserve something like a PF_LARGE_PAGE bit?
>
> HP-UX has a PF_HP_PAGE_SIZE (0x0010) bit that says "Segment should
> be mapped with page size specified in p_align field".

Is it OK to define PF_LINUX_PAGE_SIZE similarly, using the same bit
(0x0010)? I want this to be a hint to the loader.
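
A loader-side reading of such a hint could look roughly like this. All names (`SK_PF_PAGE_SIZE_HINT`, `struct sk_phdr`, `sk_preferred_page_size`) are hypothetical, the struct is a cut-down stand-in for a program header, and the numeric bit value is a placeholder assumption rather than the value discussed in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit for the proposed flag; the real value would be
   whatever ends up being reserved.  */
#define SK_PF_PAGE_SIZE_HINT 0x00100000u

/* Cut-down stand-in for an ELF program header.  */
struct sk_phdr
{
  uint32_t p_flags;
  uint64_t p_align;
};

/* Return the page size the loader would try for this segment: p_align
   when the hint bit is set and asks for more than the default page,
   otherwise the default.  Being a hint, the loader remains free to
   fall back to the default if large pages are unavailable.  */
static uint64_t
sk_preferred_page_size (const struct sk_phdr *ph, uint64_t default_page)
{
  if ((ph->p_flags & SK_PF_PAGE_SIZE_HINT) && ph->p_align > default_page)
    return ph->p_align;
  return default_page;
}
```

With a 4096-byte default page, a segment flagged with the hint and p_align of 0x200000 would be tried with 2 MiB pages, while an unflagged segment keeps the default.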

Thanks,
-Sri.

>
> -cary