Re: Is there any reason to use vfork() ?

2014-05-14 Thread Florian Weimer

On 05/13/2014 12:12 PM, niXman wrote:


I'm curious whether there is reason to use 'vfork()' rather than 'fork()'?


Without memory overcommitment, fork needs physical backing storage (RAM 
or swap) for all copy-on-write pages in the new process.  vfork doesn't.
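
A minimal sketch of the usual vfork-then-exec pattern, assuming a
POSIX-like system that still provides vfork() (for example Linux with
glibc).  The helper name spawn_with_vfork is made up for this
illustration; the point is only that the child never touches memory
before exec, so no copy-on-write backing is needed for it.

```c
#define _DEFAULT_SOURCE         /* assumption: glibc, exposes vfork() */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with arguments argv in a child created by vfork().
   Returns the child's exit status, or -1 on failure.  */
int
spawn_with_vfork (char *const argv[])
{
  pid_t pid = vfork ();
  if (pid < 0)
    return -1;                  /* vfork itself failed */

  if (pid == 0)
    {
      /* Child: it borrows the parent's address space, so the only
         safe actions are exec or _exit.  */
      execvp (argv[0], argv);
      _exit (127);              /* exec failed */
    }

  /* Parent: resumes once the child has called exec or _exit.  */
  int status;
  if (waitpid (pid, &status, 0) < 0)
    return -1;
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

Calling spawn_with_vfork with { "true", NULL } should return 0 on a
system where "true" is on the PATH.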


--
Florian Weimer / Red Hat Product Security Team


Re: [GSoC] writing test-case

2014-05-14 Thread Richard Biener
On Tue, May 13, 2014 at 11:06 PM, Prathamesh Kulkarni
 wrote:
> On Tue, May 13, 2014 at 2:36 PM, Richard Biener
>  wrote:
>> On Sun, May 11, 2014 at 5:00 PM, Prathamesh Kulkarni
>>  wrote:
>>> On Sun, May 11, 2014 at 8:10 PM, Andreas Schwab  
>>> wrote:
>>>> Prathamesh Kulkarni writes:
>>>>
>>>>> a) I am not able to follow why 3 slashes are required here
>>>>> in x_.\\\(D\\\) ? Why does x_.\(D\) not work ?
>>>>
>>>> Two of the three backslashes are eaten by the tcl parser.  But actually
>>>> only two backslashes are needed, since the parens are not special to tcl
>>>> (but are special to the regexp engine, so you want a single backslash
>>>> surviving the tcl parser).
>>>>
>>>>> b) The expression after folding would be of the form:
>>>>> t2_ = x_(D) - y_(D)
>>>>> I have used the operator "." in the pattern to match digit.
>>>>> While that works in the above case, I think a better
>>>>> idea would be to match using [0-9].
>>>>> I tried the following but it does not work:
>>>>> t_[0-9] = x_[0-9]\\\(D\\\) - y_[0-9]\\\(D\\\)
>>>>> Neither does \\\[ and \\\] work.
>>>>
>>>> Brackets are special in tcl (including inside double quotes), so they
>>>> need to be quoted.  But you want the brackets to appear unquoted to the
>>>> regexp engine, so a single backslash will do the Right Thing.
>>>>
>>>> See tcl(n) for the tcl parsing rules.
>>>>
>>> Thanks. Now I get it, the double backslash \\ is an escape sequence
>>> for \, and special characters like (, [
>>> retain their meaning in quotes, so to match input text: (D), the
>>> pattern has to be written as: "\\(D\\)".
>>> I believe "\(D\)" would only match D in the input ?
>>> I have modified the test-case. Is this version correct ?
>>
>> I usually verify that by running the testcase in isolation on a GCC version
>> that should FAIL it and on one that should PASS it (tcl quoting is also
>> try-and-error for me most of the time...).
>>
>> Thus I do
>>
>> gcc/> make check-gcc RUNTESTFLAGS="tree-ssa.exp=match-2.c"
>> 
>> 
>> gcc/> make cc1
>> ... compiles cc1 ...
>> gcc/> make check-gcc RUNTESTFLAGS="tree-ssa.exp=match-2.c"
>> 
>>
>> A more complete matching for an SSA name would be (allowing
>> for SSA name versions > 9) _\\d\+ with \\(D\\) appended if
>> suitable (that's usually known from the testcase).  \\d\+ should match
>> at least one decimal digit.
> I thought that SSA name version wouldn't exceed 9 for that test-case,
> so I decided for matching only one digit. I will change it to match
> one or more digits.
>
> * I have written test-cases for patterns in match.pd (attached patch), which
> result in PASS. Could you review them for me ?
> I couldn't write for following ones:
>
> 1] Patterns involving COMPLEX_EXPR, REALPART_EXPR, IMAGPART_EXPR
> (match_and_simplify
>   (complex (realpart @0) (imagpart @0))
>   @0)
> (match_and_simplify
>   (realpart (complex @0 @1))
>   @0)
> (match_and_simplify
>   (imagpart (complex @0 @1))
>   @1)
>
> Sorry to be daft, but I couldn't understand what these patterns meant
> (I know complex numbers).
> Could you give an example that matches one of these patterns ?
> Thanks.

The existing match-1.c testcase has some ideas.  For the first
pattern I'd do

_Complex double foo (_Complex double z)
{
  double r = __real z;
  double i = __imag z;
  return r + 1.0iF * i;
}

where the return expression is folded (yeah ...) to a COMPLEX_EXPR.

For the other two patterns, something like

double foo (double r)
{
  _Complex double z = r;
  return __real z;
}

and

double foo (double i)
{
  _Complex double z = 1.0iF * i;
  return __imag z;
}

should work.
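
As a plain illustration of what the three patterns mean (not of
whether forwprop actually applies them), the identities can be written
down as ordinary C functions.  This is a sketch that assumes the GNU
__real__/__imag__ extensions (GCC or Clang); the function names are
made up for the example.

```c
/* Build a complex value from its parts.  */
double _Complex
make_complex (double r, double i)
{
  double _Complex z;
  __real__ z = r;
  __imag__ z = i;
  return z;
}

/* complex (realpart z, imagpart z) is just z.  */
double _Complex
rebuild (double _Complex z)
{
  return make_complex (__real__ z, __imag__ z);
}

/* realpart (complex r, i) is just r.  */
double
real_of (double r, double i)
{
  return __real__ make_complex (r, i);
}

/* imagpart (complex r, i) is just i.  */
double
imag_of (double r, double i)
{
  return __imag__ make_complex (r, i);
}
```

Each function computes the left-hand side of one pattern; the
simplified form is the value named in its comment.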

> 2] Test-case for FMA_EXPR. I am not sure how to generate FMA_EXPR from C code.
> (match_and_simplify
>   (fma INTEGER_CST_P@0 INTEGER_CST_P@1 @3)
>   (plus (mult @0 @1) @3))

I believe it's not possible.  FMA is matched by the optimize_widen_mult
pass which runs quite late, after the last forwprop pass.  So I don't think
it's possible to write a testcase that triggers with the existing compiler.

> 3] Test-case for COND_EXPR
> (match_and_simplify
>   (cond (bit_not @0) @1 @2)
>   (cond @0 @2 @1))
>
> I believe cond corresponds to C's ternary operator ?
> However c-expression of the form:
> t2 = (x ? y : z)
> gets translated to gimple as an if-else statement, with "x" being condition,
> "y" being then-statement, and "z" being else-statement.
> So I guess we need to handle this case specially in genmatch ?
> Or am I mistaken ?

One idea was to also match if-then-else as COND_EXPR (something
to explore later), but you can also see COND_EXPRs in the GIMPLE IL;
for example, they are created by if-conversion, which runs before
vectorization.  But as above, it's difficult to create a testcase that
matches on a forwprop transform (those patterns are more likely to
match from the code generation of the various passes, which needs to be
updated to use the gimple_build interface provided on the branch).
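
To make the meaning of the quoted cond pattern concrete, here is a
small sketch of the identity it encodes.  It assumes the condition is
a boolean (one-bit) value, so that bit_not behaves like logical
negation; for wider integer types ~c and !c differ.  The function
names are made up.

```c
#include <stdbool.h>

/* Left-hand side of the pattern: select on the negated condition.  */
int
select_negated (bool c, int a, int b)
{
  return !c ? a : b;
}

/* Right-hand side: original condition, arms swapped.  */
int
select_swapped (bool c, int a, int b)
{
  return c ? b : a;
}
```

Both functions return the same value for every input, which is all
the simplification relies on.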

As for the if-then-else idea, for example

int foo (int x)
{
  return x ? 3 : 5;
}

is seen as

  :
  if (x_2(D) != 0)
 

Re: GIMPLE tree dumping of, for example, GIMPLE_OMP_PARALLEL's CHILD_FN

2014-05-14 Thread Tom de Vries
On 21/03/14 17:30, Thomas Schwinge wrote:
> Hi!
> 
> Certain GIMPLE codes, such as OpenMP ones, have a structured block
> attached to them, for example, gcc/gimple.def:GIMPLE_OMP_PARALLEL:
> 
> /* GIMPLE_OMP_PARALLEL  represents
> 
>#pragma omp parallel [CLAUSES]
>BODY
> 
>BODY is a the sequence of statements to be executed by all threads.
> [...]
>CHILD_FN is set when outlining the body of the parallel region.
>All the statements in BODY are moved into this newly created
>function when converting OMP constructs into low-GIMPLE.
> [...]
> DEFGSCODE(GIMPLE_OMP_PARALLEL, "gimple_omp_parallel", 
> GSS_OMP_PARALLEL_LAYOUT)
> 
> Using -ftree-dump-all, I can see this structured block (BODY) getting
> dumped, but it then "disappears" in the ompexp pass', and "reappears" (as
> function main._omp_fn.0) in the next ssa pass' dump.
> 
> If I'm correctly understanding the GCC sources as well as operating GDB,
> in the gimple pass we get main._omp_fn.0 dumped because
> gcc/cgraphunit.c:analyze_functions iterates over all functions
> (analyze_function -> dump_function).  In the following passes,
> presumably, this is not done anymore: omplower, lower, eh, cfg.  In
> ompexp, the GIMPLE_OMP_PARALLEL is expanded into a
> »__builtin_GOMP_parallel (main._omp_fn.0)« call, but the main._omp_fn.0
> is not dumped (and there is no BODY anymore to dump).  In the next ssa
> pass, main._omp_fn.0 again is being dumped, by means of
> gcc/passes.c:do_per_function_toporder (execute_pass_list ->
> execute_one_pass -> execute_function_dump -> dump_function_to_file), as I
> understand it.  What do I need to modify to get main._omp_fn.0 included
> in the dumps before the ssa pass, too?

Hi Thomas,

I think the answer to your question lies in two pieces of code.

1. gcc/omp-low.c:expand_omp_taskreg:
...
  /* Inform the callgraph about the new function.  */
  DECL_STRUCT_FUNCTION (child_fn)->curr_properties = cfun->curr_properties;
  cgraph_add_new_function (child_fn, true);
...
Note, the second parameter of cgraph_add_new_function is 'lowered' and set to 
true:

2.  gcc/cgraphunit.c:analyze_function:
...
  /* Make sure to gimplify bodies only once.  During analyzing a
     function we lower it, which will require gimplified nested
     functions, so we can end up here with an already gimplified
     body.  */
  if (!gimple_has_body_p (decl))
    gimplify_function_tree (decl);
  dump_function (TDI_generic, decl);

  /* Lower the function.  */
  if (!node->lowered)
    {
      if (node->nested)
        lower_nested_functions (node->decl);
      gcc_assert (!node->nested);

      gimple_register_cfg_hooks ();
      bitmap_obstack_initialize (NULL);
      execute_pass_list (cfun, g->get_passes ()->all_lowering_passes);
      free_dominance_info (CDI_POST_DOMINATORS);
      free_dominance_info (CDI_DOMINATORS);
      compact_blocks ();
      bitmap_obstack_release (NULL);
      node->lowered = true;
    }
...

The code marked by the parallel directive travels through the passes omplower,
lower, eh, and cfg as a part of main.

In ompexp, it's split off into a new function in expand_omp_taskreg. That new
function is marked as already being lowered.

When encountering the new function in analyze_function (after running the
lowering passes on main), we don't lower the code again.  The confusing thing is
that we dump the already-lowered code in the gimplify dump, which suggests that
the function goes 'missing' in the dumps for a while.

Perhaps it would make more sense in this scenario to dump the new function to
the expand_omp dump.
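
The interaction of the two code fragments can be pictured with a toy
model.  This is only a sketch: the struct and function names below are
hypothetical stand-ins, not GCC's data structures, and only the
'lowered' flag is modelled.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a call graph node.  */
struct toy_node
{
  const char *name;
  bool lowered;
  int lowering_dumps;   /* times the lowering passes ran (and dumped) */
};

/* Toy version of cgraph_add_new_function (fn, lowered).  */
struct toy_node
toy_add_new_function (const char *name, bool lowered)
{
  struct toy_node node = { name, lowered, 0 };
  return node;
}

/* Toy version of the check in analyze_function: the lowering passes
   run only for nodes that are not yet marked as lowered.  */
void
toy_analyze (struct toy_node *node)
{
  if (!node->lowered)
    {
      node->lowering_dumps++;
      node->lowered = true;
    }
}
```

A node created with lowered == true, as the child function is, goes
through toy_analyze without ever producing a lowering dump, while a
node created with lowered == false produces exactly one.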

Thanks,
- Tom


Re: [GSoC] writing test-case

2014-05-14 Thread Richard Biener
On Tue, May 13, 2014 at 11:06 PM, Prathamesh Kulkarni
 wrote:
> On Tue, May 13, 2014 at 2:36 PM, Richard Biener
>  wrote:
>> On Sun, May 11, 2014 at 5:00 PM, Prathamesh Kulkarni
>>  wrote:
>>> On Sun, May 11, 2014 at 8:10 PM, Andreas Schwab  
>>> wrote:
>>>> Prathamesh Kulkarni writes:
>>>>
>>>>> a) I am not able to follow why 3 slashes are required here
>>>>> in x_.\\\(D\\\) ? Why does x_.\(D\) not work ?
>>>>
>>>> Two of the three backslashes are eaten by the tcl parser.  But actually
>>>> only two backslashes are needed, since the parens are not special to tcl
>>>> (but are special to the regexp engine, so you want a single backslash
>>>> surviving the tcl parser).
>>>>
>>>>> b) The expression after folding would be of the form:
>>>>> t2_ = x_(D) - y_(D)
>>>>> I have used the operator "." in the pattern to match digit.
>>>>> While that works in the above case, I think a better
>>>>> idea would be to match using [0-9].
>>>>> I tried the following but it does not work:
>>>>> t_[0-9] = x_[0-9]\\\(D\\\) - y_[0-9]\\\(D\\\)
>>>>> Neither does \\\[ and \\\] work.
>>>>
>>>> Brackets are special in tcl (including inside double quotes), so they
>>>> need to be quoted.  But you want the brackets to appear unquoted to the
>>>> regexp engine, so a single backslash will do the Right Thing.
>>>>
>>>> See tcl(n) for the tcl parsing rules.
>>>>
>>> Thanks. Now I get it, the double backslash \\ is an escape sequence
>>> for \, and special characters like (, [
>>> retain their meaning in quotes, so to match input text: (D), the
>>> pattern has to be written as: "\\(D\\)".
>>> I believe "\(D\)" would only match D in the input ?
>>> I have modified the test-case. Is this version correct ?
>>
>> I usually verify that by running the testcase in isolation on a GCC version
>> that should FAIL it and on one that should PASS it (tcl quoting is also
>> try-and-error for me most of the time...).
>>
>> Thus I do
>>
>> gcc/> make check-gcc RUNTESTFLAGS="tree-ssa.exp=match-2.c"
>> 
>> 
>> gcc/> make cc1
>> ... compiles cc1 ...
>> gcc/> make check-gcc RUNTESTFLAGS="tree-ssa.exp=match-2.c"
>> 
>>
>> A more complete matching for an SSA name would be (allowing
>> for SSA name versions > 9) _\\d\+ with \\(D\\) appended if
>> suitable (that's usually known from the testcase).  \\d\+ should match
>> at least one decimal digit.
> I thought that SSA name version wouldn't exceed 9 for that test-case,
> so I decided for matching only one digit. I will change it to match
> one or more digits.
>
> * I have written test-cases for patterns in match.pd (attached patch), which
> result in PASS. Could you review them for me ?

Sure.  It looks good to me.  You can also look at the changed match-1.c
testcase on the branch, where I've changed the matching to look for the
debug output that the forwprop pass dumps with -fdump-tree-forwprop1-details.
That makes sure no earlier pass did the transform.  (You can also
move the individual dg-final lines after the testcase function to more
easily associate them with a function.)
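
A sketch of what such a testcase layout could look like, with the
dg-final line placed directly after the function it checks.  The
function, the folding it relies on (x + -y becoming x - y) and the
scanned text are assumptions made for illustration; the real match-1.c
on the branch scans for the pass's own debug output, whose exact
wording is not reproduced here.

```c
/* { dg-do compile } */
/* { dg-options "-O -fdump-tree-forwprop1-details" } */

/* Hypothetical test function: x + (-y) is expected to become x - y.  */
int
minus_from_negate (int x, int y)
{
  int t1 = -y;
  int t2 = x + t1;
  return t2;
}

/* The pattern uses the quoting discussed above: \\d\+ for one or more
   digits and \\(D\\) for the literal "(D)" suffix.  */
/* { dg-final { scan-tree-dump "t2_\\d\+ = x_\\d\+\\(D\\) - y_\\d\+\\(D\\)" "forwprop1" } } */
```

Keeping each dg-final next to its function makes it easier to see
which scan belongs to which transform once the file grows.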

At some point the testcase should be split up as well.

How do you manage your sources at the moment?  Just a svn
checkout of the branch with local modifications?

Thanks,
Richard.


Why is this not optimized?

2014-05-14 Thread Bingfeng Mei
Hi,
I am looking at some code for our target that is not optimized as expected.
For the following RTL, I expect the source of insn 17 to be propagated into
insn 20, so that insn 17 is eliminated.  On our target, the result would be a
single predicated xor instruction instead of two instructions.  Initially, I
thought the fwprop pass should do this.

(insn 17 16 18 3 (set (reg/v:HI 102 [ crc ])
        (xor:HI (reg/v:HI 108 [ crc ])
            (const_int 16386 [0x4002]))) coremark.c:1632 725 {xorhi3}
     (nil))
(insn 18 17 19 3 (set (reg:BI 113)
        (ne:BI (reg:QI 101 [ D.4446 ])
            (const_int 1 [0x1]))) 1397 {cmp_qimode}
     (nil))
(jump_insn 19 18 55 3 (set (pc)
        (if_then_else (ne (reg:BI 113)
                (const_int 0 [0]))
            (label_ref 23)
            (pc))) 1477 {cbranchbi4}
     (expr_list:REG_DEAD (reg:BI 113)
        (expr_list:REG_BR_PROB (const_int 7100 [0x1bbc])
            (expr_list:REG_PRED_WIDTH (const_int 1 [0x1])
                (nil))))
 -> 23)
(note 55 19 20 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
(insn 20 55 23 4 (set (reg:HI 112 [ crc ])
        (reg/v:HI 102 [ crc ])) 502 {fp_movhi}
     (expr_list:REG_DEAD (reg/v:HI 102 [ crc ])
        (nil)))
(code_label 23 20 56 5 2 "" [1 uses])


But it can't.  First, propagate_rtx_1 returns false because PR_CAN_APPEAR is
not set and the following code is executed:

  if (x == old_rtx)
    {
      *px = new_rtx;
      return can_appear;
    }

Even if I force PR_CAN_APPEAR to be set in flags, fwprop still won't go ahead in
try_fwprop_subst, because old_cost is 0 (a REG-only rtx) and set_src_cost
(SET_SRC (set), speed) is bigger than 0.  So the change is deemed not
profitable, which is not correct IMO.
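
The effect described above can be shown with a toy cost model.  This
is a sketch under stated assumptions, not GCC's code: the types, the
cost function and the comparison are made up to mirror the
description (a bare register costs 0, an operation costs more).

```c
#include <stdbool.h>

/* Toy expression codes, standing in for REG, CONST_INT and XOR.  */
enum toy_code { TOY_REG, TOY_CONST, TOY_XOR };

struct toy_rtx
{
  enum toy_code code;
  const struct toy_rtx *op0, *op1;
};

/* Assumed cost model: leaves are free, each operation costs 1 plus
   the cost of its operands.  */
int
toy_src_cost (const struct toy_rtx *x)
{
  if (x->code == TOY_REG || x->code == TOY_CONST)
    return 0;
  return 1 + toy_src_cost (x->op0) + toy_src_cost (x->op1);
}

/* Simplified profitability test: accept the substitution only if the
   new source is not more expensive than the old one.  */
bool
toy_profitable (const struct toy_rtx *old_src, const struct toy_rtx *new_src)
{
  return toy_src_cost (new_src) <= toy_src_cost (old_src);
}
```

With the old source being a plain register and the new source an xor,
toy_profitable returns false.  The comparison looks only at the insn
being rewritten and does not credit the removal of the defining insn,
which is the objection raised above.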

If fwprop is not the place to do this optimization, where should it be done? I 
am working on up-to-date GCC 4.8. 

Thanks,
Bingfeng Mei


Re: Live range shrinkage in pre-reload scheduling

2014-05-14 Thread Vladimir Makarov

On 2014-05-13, 6:27 AM, Kyrill Tkachov wrote:

Hi all,

In haifa-sched.c (in rank_for_schedule) I notice that live range
shrinkage is not performed when SCHED_PRESSURE_MODEL is used and the
comment mentions that it results in much worse code.

Could anyone elaborate on this? Was it just empirically noticed on x86_64?



It was empirically noticed on SPEC2000.  Practice is the only
criterion for heuristic optimizations.  Sometimes a new heuristic
optimization looks promising, but the reality can be quite different.


In this connection, I remember a story Bob Morgan told me about the
invention of bin-packing RA.  It was just a quick and simple first RA
implementation for a new compiler.  After that, the DEC compiler team tried
many times to improve the RA by implementing more complicated optimizations,
but the first bin-packing RA was always better.




Re: Live range shrinkage in pre-reload scheduling

2014-05-14 Thread Richard Sandiford
Vladimir Makarov  writes:
> On 2014-05-13, 6:27 AM, Kyrill Tkachov wrote:
>> Hi all,
>>
>> In haifa-sched.c (in rank_for_schedule) I notice that live range
>> shrinkage is not performed when SCHED_PRESSURE_MODEL is used and the
>> comment mentions that it results in much worse code.
>>
>> Could anyone elaborate on this? Was it just empirically noticed on x86_64?
>>
>
> It was empirically noticed on SPEC2000.  The practice is a single 
> criteria for heuristic optimizations.  Sometimes a new heuristic 
> optimization might look promising but the reality might be quite different.

Hey, I resent that.  You make it sound like I came up with SCHED_PRESSURE_MODEL
on a whim without any evidence to back it up.  I implemented it because
it gave better EEMBC results on ARM, at least at the time that I wrote
it, and it didn't affect SPEC2000 for ARM much one way or the other.
It also produced better results for s390x on SPEC2006 at the time it
was tested, which is why it was turned on by default there too.

For anyone interested in the background and rationale, the original
posting was here: https://gcc.gnu.org/ml/gcc-patches/2011-12/msg01684.html

I'm not claiming it's a great heuristic or anything.  There's bound to
be room for improvement.  But it was based on "reality" and real results.

Of course, if it turns out not to be a win for ARM or s390x any more then it
should be disabled.

> In this relation I am remembering a story told me by Bob Morgan about 
> bin packing RA invention.  It was just a quick and simple first RA 
> implementation for a new compiler.  After that DEC compiler team tried 
> many times to improve the RA implementing more complicated optimizations 
> but the first bin packing RA was always better.

You make it sound like your original -fsched-pressure is unlikely
to be beaten, in the way that you think bin packing wasn't beaten.
But both versions of -fsched-pressure are off by default on most
targets for a reason.  (AFAIK the only two targets that enable it by
default are the two that use SCHED_PRESSURE_MODEL: arm and s390x.)
I think this is still an area that could be improved.  I don't mind
whether that's through improving one of the two existing heuristics
or doing something different, but it seems pessimistic to say that
scheduling based on register pressure is always going to be the optional
feature that it is now.

E.g. tracking pressure classes isn't always the right thing for
targets like PowerPC where only part of the vector register set
can be used for floating-point operations.

Thanks,
Richard


Live Range Splitting in Integrated Register Allocator

2014-05-14 Thread Ajit Kumar Agarwal
Sorry for resending this as plain text; my earlier mail was sent with HTML
enabled.  Plain text makes it possible to send it to gcc@gcc.gnu.org.

Sorry once again.

Thanks & Regards
Ajit

From: Ajit Kumar Agarwal 
Sent: Wednesday, May 14, 2014 10:43 PM
To: 'gcc@gcc.gnu.org'; 'vmaka...@redhat.com'
Cc: 'Michael Eager'; Vinod Kathail; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: RE: Live Range Splitting in Integrated Register Allocator

Adding the gcc@gcc.gnu.org mailing list.

From: Ajit Kumar Agarwal 
Sent: Wednesday, May 14, 2014 10:33 PM
To: 'gcc@gcc.gnu.org'; 'vmaka...@redhat.com'
Cc: 'Michael Eager'; Vinod Kathail; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Live Range Splitting in Integrated Register Allocator


Hello All:

I am planning to implement live range splitting in the Integrated Register
Allocator (IRA) for the following case.

Consider a live range that spans from the outer region into the inner region
of a loop: it is live-in at the entry of the loop header and live-out at the
exit of the loop, but there are no references to it inside the loop.  Such live
ranges lead to suboptimal spills and fetches inside the loop, because they
conflict with the shorter live ranges that are local to the loop.

Call such a live range L1.  L1 can be split at the loop boundary by making a
store at the header of the loop and a load at the exit of the loop.  This makes
L1 conflict less with the live ranges that are local to the loop region,
reducing the spills and fetches inside the loop.
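
A source-level picture of the situation, as a sketch only: the
function names are made up, and the volatile slot merely stands in for
the store and load that the allocator would insert, so this is not how
IRA itself represents the split.

```c
/* v is live across the loop but never referenced inside it.  */
int
live_across (const int *a, int n, int v)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += a[i] * 2;          /* loop-local values compete with v */
  return sum + v;             /* first use of v after the loop */
}

/* Hand-written picture of the split: store v before the loop and load
   it after the loop, so v needs no register while the loop runs.  */
int
live_split (const int *a, int n, int v)
{
  volatile int slot = v;      /* store at the loop entry */
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += a[i] * 2;
  int v2 = slot;              /* load at the loop exit */
  return sum + v2;
}
```

Both functions compute the same result; the second one only makes
explicit where the store and the load would sit.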

From the code and documentation of the Integrated Register Allocator, my
understanding is as follows.

Live range L1 is live in the outer region, but there are no references to it
inside the loop region.  For the variable v behind L1, two allocnos v1 and v2
are created: v1 is the allocno for the outer region and v2 is the allocno for
the inner loop region.  This allows information from the inner loop region to
be accumulated into the outer region.

Will the current Integrated Register Allocator consider live range L1 as live
both inside the loop and in the outer region?  If yes, then it will conflict
with the live ranges that are local to the loop region, leading to spills and
fetches inside the loop.  If allocnos v1 (for the outer region) and v2 (for the
inner region) are created, then v2 will conflict with the local live ranges
inside the loop region and v1 will conflict with the live ranges of the outer
region.  Is this how live range splitting at the loop boundary is handled for
a live range that spans the loop but is not referenced inside it?

If such cases are not handled by the Integrated Register Allocator, then it
would be useful to implement them in IRA, which would benefit the MicroBlaze
target.

Please let me know what you think.

Thanks & Regards
Ajit


Pass empty struct in C++ on x86-64 like C?

2014-05-14 Thread H.J. Lu
There is a discrepancy when passing empty struct in C++ on x86-64
between GCC and Clang:

https://groups.google.com/forum/#!topic/x86-64-abi/EZzVyvSxUx4

An empty struct of size 1 byte is classified as NO_CLASS.
GCC uses an eight-byte slot to pass it on the stack and returns it in
EAX, while Clang just skips it.  Is it possible to pass/return an empty
struct in C++ the same way as in C?
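
For reference, the C side of the comparison can be sketched as
follows.  It assumes GCC or Clang in GNU mode, where an empty struct
is accepted in C as an extension and has size 0; in C++ the same
declaration has size 1, which is the case discussed above.  The names
are made up for the example.

```c
#include <stddef.h>

/* Empty struct: a GNU extension in C, standard in C++.  */
struct empty { };

/* An empty argument placed between two ordinary ones.  */
int
take_empty (int a, struct empty e, int b)
{
  (void) e;
  return a + b;
}

/* Size as seen by the compiler: 0 in GNU C, 1 in C++.  */
size_t
empty_size (void)
{
  return sizeof (struct empty);
}
```

Compiling take_empty as C and as C++ and comparing the generated code
is one way to observe the difference in argument passing.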

-- 
H.J.


Re: [GSoC] writing test-case

2014-05-14 Thread Prathamesh Kulkarni
On Wed, May 14, 2014 at 3:54 PM, Richard Biener
 wrote:
> On Tue, May 13, 2014 at 11:06 PM, Prathamesh Kulkarni
>  wrote:
>> On Tue, May 13, 2014 at 2:36 PM, Richard Biener
>>  wrote:
>>> On Sun, May 11, 2014 at 5:00 PM, Prathamesh Kulkarni
>>>  wrote:
>>>> On Sun, May 11, 2014 at 8:10 PM, Andreas Schwab
>>>> wrote:
>>>>> Prathamesh Kulkarni writes:
>>>>>
>>>>>> a) I am not able to follow why 3 slashes are required here
>>>>>> in x_.\\\(D\\\) ? Why does x_.\(D\) not work ?
>>>>>
>>>>> Two of the three backslashes are eaten by the tcl parser.  But actually
>>>>> only two backslashes are needed, since the parens are not special to tcl
>>>>> (but are special to the regexp engine, so you want a single backslash
>>>>> surviving the tcl parser).
>>>>>
>>>>>> b) The expression after folding would be of the form:
>>>>>> t2_ = x_(D) - y_(D)
>>>>>> I have used the operator "." in the pattern to match digit.
>>>>>> While that works in the above case, I think a better
>>>>>> idea would be to match using [0-9].
>>>>>> I tried the following but it does not work:
>>>>>> t_[0-9] = x_[0-9]\\\(D\\\) - y_[0-9]\\\(D\\\)
>>>>>> Neither does \\\[ and \\\] work.
>>>>>
>>>>> Brackets are special in tcl (including inside double quotes), so they
>>>>> need to be quoted.  But you want the brackets to appear unquoted to the
>>>>> regexp engine, so a single backslash will do the Right Thing.
>>>>>
>>>>> See tcl(n) for the tcl parsing rules.
>>>>>
>>>> Thanks. Now I get it, the double backslash \\ is an escape sequence
>>>> for \, and special characters like (, [
>>>> retain their meaning in quotes, so to match input text: (D), the
>>>> pattern has to be written as: "\\(D\\)".
>>>> I believe "\(D\)" would only match D in the input ?
>>>> I have modified the test-case. Is this version correct ?
>>>
>>> I usually verify that by running the testcase in isolation on a GCC version
>>> that should FAIL it and on one that should PASS it (tcl quoting is also
>>> try-and-error for me most of the time...).
>>>
>>> Thus I do
>>>
>>> gcc/> make check-gcc RUNTESTFLAGS="tree-ssa.exp=match-2.c"
>>> 
>>> 
>>> gcc/> make cc1
>>> ... compiles cc1 ...
>>> gcc/> make check-gcc RUNTESTFLAGS="tree-ssa.exp=match-2.c"
>>> 
>>>
>>> A more complete matching for an SSA name would be (allowing
>>> for SSA name versions > 9) _\\d\+ with \\(D\\) appended if
>>> suitable (that's usually known from the testcase).  \\d\+ should match
>>> at least one decimal digit.
>> I thought that SSA name version wouldn't exceed 9 for that test-case,
>> so I decided for matching only one digit. I will change it to match
>> one or more digits.
>>
>> * I have written test-cases for patterns in match.pd (attached patch), which
>> result in PASS. Could you review them for me ?
>> I couldn't write for following ones:
>>
>> 1] Patterns involving COMPLEX_EXPR, REALPART_EXPR, IMAGPART_EXPR
>> (match_and_simplify
>>   (complex (realpart @0) (imagpart @0))
>>   @0)
>> (match_and_simplify
>>   (realpart (complex @0 @1))
>>   @0)
>> (match_and_simplify
>>   (imagpart (complex @0 @1))
>>   @1)
>>
>> Sorry to be daft, but I couldn't understand what these patterns meant
>> (I know complex numbers).
>> Could you give an example that matches one of these patterns ?
>> Thanks.
>
> The existing match-1.c testcase has some ideas.  For the first
> pattern I'd do
>
> _Complex double foo (_Complex double z)
> {
>   double r = __real z;
>   double i = __imag z;
>   return r + 1.0iF * i;
> }
>
> where the return expression is folded (yeah ...) to a COMPLEX_EXPR.
>
> For the other two patterns sth like
>
> double foo (double r)
> {
>   _Complex double z = r;
>   return __real z;
> }
>
> and
>
> double foo (double i)
> {
>   _Complex double z = 1.0iF * i;
>   return __imag z;
> }
>
> should work.
>
Thanks. Now I understand what the patterns mean.
The first pattern should return z itself instead of building a new complex
number from r and i; however, the test-case for it doesn't appear to work.
The other two transforms, real (complex x) -> real x and imag (complex
x) -> imag x, were simplified.
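
As a source-level sanity check of what the three patterns assert, here is
a minimal sketch.  It reuses the suggested functions under made-up names
(rebuild, real_of_complex, imag_of_complex, patterns_hold) and relies on
the GNU C extensions __real, __imag and the 1.0iF imaginary constant.  It
only checks values at run time; it says nothing about which pass performs
the simplification:

```c
/* Pattern 1: (complex (realpart z) (imagpart z)) should give back z.  */
_Complex double rebuild(_Complex double z)
{
  double r = __real z;
  double i = __imag z;
  return r + 1.0iF * i;
}

/* Pattern 2: (realpart (complex r 0)) should give back r.  */
double real_of_complex(double r)
{
  _Complex double z = r;
  return __real z;
}

/* Pattern 3: (imagpart (complex 0 i)) should give back i.  */
double imag_of_complex(double i)
{
  _Complex double z = 1.0iF * i;
  return __imag z;
}

/* Returns 1 if all three identities hold for the finite inputs r, i.  */
int patterns_hold(double r, double i)
{
  _Complex double z = r + 1.0iF * i;
  _Complex double w = rebuild(z);
  return __real w == r && __imag w == i
         && real_of_complex(r) == r
         && imag_of_complex(i) == i;
}
```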

>> 2] Test-case for FMA_EXPR. I am not sure how to generate FMA_EXPR from C 
>> code.
>> (match_and_simplify
>>   (fma INTEGER_CST_P@0 INTEGER_CST_P@1 @3)
>>   (plus (mult @0 @1) @3))
>
> I believe it's not possible.  FMA is matched by the optimize_widen_mult
> pass which runs quite late, after the last forwprop pass.  So I don't think
> it's possible to write a testcase that triggers with the existing compiler.
>
I was wondering whether we could use the Gimple front-end to write test cases?
If that's not possible, should we write C extensions (only for
testing) that can generate the required pattern?
For example, something like:
int f(int x)
{
  return __fma_expr (3, 4, x);  // transform to x + 12 ?
}
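
To make the intended result concrete, here is a minimal sketch that models
the idea in plain C.  fma_expr_model is a hypothetical stand-in for the
proposed __fma_expr (no such builtin exists); it only spells out the
arithmetic a * b + c that the pattern rewrites an FMA with two constant
operands into:

```c
/* Hypothetical model of the proposed test-only extension:
   computes a * b + c, which is what (plus (mult @0 @1) @3) denotes.  */
int fma_expr_model(int a, int b, int c)
{
  return a * b + c;
}

/* The test function from the proposal, written against the model.  */
int f(int x)
{
  return fma_expr_model(3, 4, x);
}

/* What the function should look like after the pattern fires and the
   constant multiplication 3 * 4 has been folded.  */
int f_simplified(int x)
{
  return x + 12;
}
```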

>> 3] Test-case for COND_EXPR
>> (match_and_simplify
>>   (cond (bit_not @0) @1 @2)
>>   (cond @0 @2 @1))
>>
>> I believe cond corresponds to C's ternary operator ?
>> However c-expression of the form:
>> t2 = (x ? y : z)
>> gets

Re: [GSoC] writing test-case

2014-05-14 Thread Prathamesh Kulkarni
On Wed, May 14, 2014 at 4:33 PM, Richard Biener
 wrote:
> On Wed, May 14, 2014 at 12:30 PM, Richard Biener
>  wrote:
>> On Wed, May 14, 2014 at 12:24 PM, Richard Biener
>>  wrote:
>>> On Tue, May 13, 2014 at 11:06 PM, Prathamesh Kulkarni
>>>  wrote:
>>>> On Tue, May 13, 2014 at 2:36 PM, Richard Biener
>>>> wrote:
>>>>> On Sun, May 11, 2014 at 5:00 PM, Prathamesh Kulkarni
>>>>> wrote:
>>>>>> On Sun, May 11, 2014 at 8:10 PM, Andreas Schwab
>>>>>> wrote:
>>>>>>> Prathamesh Kulkarni writes:
>>>>>>>
>>>>>>>> a) I am not able to follow why 3 slashes are required here
>>>>>>>> in x_.\\\(D\\\) ? Why does x_.\(D\) not work ?
>>>>>>>
>>>>>>> Two of the three backslashes are eaten by the tcl parser.  But actually
>>>>>>> only two backslashes are needed, since the parens are not special to tcl
>>>>>>> (but are special to the regexp engine, so you want a single backslash
>>>>>>> surviving the tcl parser).
>>>>>>>
>>>>>>>> b) The expression after folding would be of the form:
>>>>>>>> t2_ = x_(D) - y_(D)
>>>>>>>> I have used the operator "." in the pattern to match digit.
>>>>>>>> While that works in the above case, I think a better
>>>>>>>> idea would be to match using [0-9].
>>>>>>>> I tried the following but it does not work:
>>>>>>>> t_[0-9] = x_[0-9]\\\(D\\\) - y_[0-9]\\\(D\\\)
>>>>>>>> Neither does \\\[ and \\\] work.
>>>>>>>
>>>>>>> Brackets are special in tcl (including inside double quotes), so they
>>>>>>> need to be quoted.  But you want the brackets to appear unquoted to the
>>>>>>> regexp engine, so a single backslash will do the Right Thing.
>>>>>>>
>>>>>>> See tcl(n) for the tcl parsing rules.
>>>>>>>
>>>>>> Thanks. Now I get it, the double backslash \\ is an escape sequence
>>>>>> for \, and special characters like (, [
>>>>>> retain their meaning in quotes, so to match input text: (D), the
>>>>>> pattern has to be written as: "\\(D\\)".
>>>>>> I believe "\(D\)" would only match D in the input ?
>>>>>> I have modified the test-case. Is this version correct ?
>>>>>
>>>>> I usually verify that by running the testcase in isolation on a GCC
>>>>> version
>>>>> that should FAIL it and on one that should PASS it (tcl quoting is also
>>>>> try-and-error for me most of the time...).
>>>>>
>>>>> Thus I do
>>>>>
>>>>> gcc/> make check-gcc RUNTESTFLAGS="tree-ssa.exp=match-2.c"
>>>>>
>>>>>
>>>>> gcc/> make cc1
>>>>> ... compiles cc1 ...
>>>>> gcc/> make check-gcc RUNTESTFLAGS="tree-ssa.exp=match-2.c"
>>>>>
>>>>>
>>>>> A more complete matching for an SSA name would be (allowing
>>>>> for SSA name versions > 9) _\\d\+ with \\(D\\) appended if
>>>>> suitable (that's usually known from the testcase).  \\d\+ should match
>>>>> at least one decimal digit.
>>>> I thought that SSA name version wouldn't exceed 9 for that test-case,
>>>> so I decided for matching only one digit. I will change it to match
>>>> one or more digits.
>>>>
>>>> * I have written test-cases for patterns in match.pd (attached patch),
>>>> which result in PASS. Could you review them for me ?
>>>> I couldn't write for following ones:
>>>>
>>>> 1] Patterns involving COMPLEX_EXPR, REALPART_EXPR, IMAGPART_EXPR
>>>> (match_and_simplify
>>>>   (complex (realpart @0) (imagpart @0))
>>>>   @0)
>>>> (match_and_simplify
>>>>   (realpart (complex @0 @1))
>>>>   @0)
>>>> (match_and_simplify
>>>>   (imagpart (complex @0 @1))
>>>>   @1)
>>>>
>>>> Sorry to be daft, but I couldn't understand what these patterns meant
>>>> (I know complex numbers).
>>>> Could you give an example that matches one of these patterns ?
>>>> Thanks.
>>>
>>> The existing match-1.c testcase has some ideas.  For the first
>>> pattern I'd do
>>>
>>> _Complex double foo (_Complex double z)
>>> {
>>>   double r = __real z;
>>>   double i = __imag z;
>>>   return r + 1.0iF * i;
>>> }
>>>
>>> where the return expression is folded (yeah ...) to a COMPLEX_EXPR.
>>>
>>> For the other two patterns sth like
>>>
>>> double foo (double r)
>>> {
>>>   _Complex double z = r;
>>>   return __real z;
>>> }
>>>
>>> and
>>>
>>> double foo (double i)
>>> {
>>>   _Complex double z = 1.0iF * i;
>>>   return __imag z;
>>> }
>>>
>>> should work.
>>>
>>>> 2] Test-case for FMA_EXPR. I am not sure how to generate FMA_EXPR from C
>>>> code.
>>>> (match_and_simplify
>>>>   (fma INTEGER_CST_P@0 INTEGER_CST_P@1 @3)
>>>>   (plus (mult @0 @1) @3))
>>>
>>> I believe it's not possible.  FMA is matched by the optimize_widen_mult
>>> pass which runs quite late, after the last forwprop pass.  So I don't think
>>> it's possible to write a testcase that triggers with the existing compiler.
>>>
>>>> 3] Test-case for COND_EXPR
>>>> (match_and_simplify
>>>>   (cond (bit_not @0) @1 @2)
>>>>   (cond @0 @2 @1))
>>>>
>>>> I believe cond corresponds to C's ternary operator ?
>>>> However c-expression of the form:
>>>> t2 = (x ? y : z)
>>>> gets translated to gimple as an if-else statement, with "x" being
>>>> condition,
>>>> "y" being then-statement, and "z" being else-statement.
>>>> So I guess we need to handle this case

Re: [GSoC] writing test-case

2014-05-14 Thread Prathamesh Kulkarni
On Wed, May 14, 2014 at 4:40 PM, Richard Biener
 wrote:
> On Tue, May 13, 2014 at 11:06 PM, Prathamesh Kulkarni
>  wrote:
>> On Tue, May 13, 2014 at 2:36 PM, Richard Biener
>>  wrote:
>>> On Sun, May 11, 2014 at 5:00 PM, Prathamesh Kulkarni
>>>  wrote:
>>>> On Sun, May 11, 2014 at 8:10 PM, Andreas Schwab
>>>> wrote:
>>>>> Prathamesh Kulkarni writes:
>>>>>
>>>>>> a) I am not able to follow why 3 slashes are required here
>>>>>> in x_.\\\(D\\\) ? Why does x_.\(D\) not work ?
>>>>>
>>>>> Two of the three backslashes are eaten by the tcl parser.  But actually
>>>>> only two backslashes are needed, since the parens are not special to tcl
>>>>> (but are special to the regexp engine, so you want a single backslash
>>>>> surviving the tcl parser).
>>>>>
>>>>>> b) The expression after folding would be of the form:
>>>>>> t2_ = x_(D) - y_(D)
>>>>>> I have used the operator "." in the pattern to match digit.
>>>>>> While that works in the above case, I think a better
>>>>>> idea would be to match using [0-9].
>>>>>> I tried the following but it does not work:
>>>>>> t_[0-9] = x_[0-9]\\\(D\\\) - y_[0-9]\\\(D\\\)
>>>>>> Neither does \\\[ and \\\] work.
>>>>>
>>>>> Brackets are special in tcl (including inside double quotes), so they
>>>>> need to be quoted.  But you want the brackets to appear unquoted to the
>>>>> regexp engine, so a single backslash will do the Right Thing.
>>>>>
>>>>> See tcl(n) for the tcl parsing rules.
>>>>>
>>>> Thanks. Now I get it, the double backslash \\ is an escape sequence
>>>> for \, and special characters like (, [
>>>> retain their meaning in quotes, so to match input text: (D), the
>>>> pattern has to be written as: "\\(D\\)".
>>>> I believe "\(D\)" would only match D in the input ?
>>>> I have modified the test-case. Is this version correct ?
>>>
>>> I usually verify that by running the testcase in isolation on a GCC version
>>> that should FAIL it and on one that should PASS it (tcl quoting is also
>>> try-and-error for me most of the time...).
>>>
>>> Thus I do
>>>
>>> gcc/> make check-gcc RUNTESTFLAGS="tree-ssa.exp=match-2.c"
>>> 
>>> 
>>> gcc/> make cc1
>>> ... compiles cc1 ...
>>> gcc/> make check-gcc RUNTESTFLAGS="tree-ssa.exp=match-2.c"
>>> 
>>>
>>> A more complete matching for an SSA name would be (allowing
>>> for SSA name versions > 9) _\\d\+ with \\(D\\) appended if
>>> suitable (that's usually known from the testcase).  \\d\+ should match
>>> at least one decimal digit.
>> I thought that SSA name version wouldn't exceed 9 for that test-case,
>> so I decided for matching only one digit. I will change it to match
>> one or more digits.
>>
>> * I have written test-cases for patterns in match.pd (attached patch), which
>> result in PASS. Could you review them for me ?
>
> Sure.  It looks good to me, though you can look at the changed match-1.c
> testcase on the branch where I've changed the matching to look for the
> debug output the forwprop pass dumps with -fdump-tree-forwprop1-details,
> that makes sure no other pass before did the transform (you can also
> move the individual dg-final lines after the testcase function to more
> easily associate them with a function).
Thanks, modified the patch to scan for "gimple_match_and_simplified" instead.
>
> At some point the testcase should be split up as well.
>
> How do you manage your sources at the moment?  Just a svn
> checkout of the branch with local modifications?
Yes.
>
> Thanks,
> Richard.
Index: gcc/match.pd
===
--- gcc/match.pd	(revision 210434)
+++ gcc/match.pd	(working copy)
@@ -21,7 +21,6 @@ You should have received a copy of the G
 along with GCC; see the file COPYING3.  If not see
 .  */
 
-
 /* Transforms formerly done by tree-ssa-forwprop.c:associate_plusminus  */
 
 /* ???  Have match_and_simplify groups guarded with common
@@ -98,6 +97,32 @@ to (minus @1 @0)
(T)(P + A) - (T)P  -> (T)A
  */
 
+/* ~A + A -> -1 */
+(match_and_simplify
+  (plus (bit_not @0) @0)
+  if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+  { build_int_cst (TREE_TYPE (@0), -1); })
+
+/* ~A + 1 -> -A */
+(match_and_simplify
+  (plus (bit_not @0) integer_onep)
+  if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+  (negate @0)) 
+
+/* A - (A +- B) -> -+ B */
+(match_and_simplify
+  (minus @0 (plus @0 @1))
+  (negate @1))
+
+(match_and_simplify
+  (minus @0 (minus @0 @1))
+  @1)
+
+/* (T)(P + A) - (T)P -> (T) A */
+(match_and_simplify
+  (minus (convert (pointer_plus @0 @1))
+	 (convert @0))
+  (convert @1)) 
 
 /* Patterns required to avoid SCCVN testsuite regressions.  */
 
Index: gcc/testsuite/gcc.dg/tree-ssa/match-2.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/match-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/match-2.c	(working copy)
@@ -0,0 +1,118 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-forwprop-details" }  */
+
+/* x + (-y) -> x - y */
+int f1(int x, int y)
+{
+  int t1 = -y;
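
The arithmetic identities named in the comments of the match.pd hunk above
can be sanity-checked outside the compiler.  A minimal sketch in plain C,
assuming unsigned operands so that wraparound is well defined (the
patterns themselves apply to GIMPLE integral types); the function names
are made up for illustration and are not part of the patch:

```c
#include <limits.h>

/* ~A + A -> -1  (all bits set, i.e. UINT_MAX for unsigned)  */
int not_plus_self_holds(unsigned a)
{
  return ~a + a == UINT_MAX;
}

/* ~A + 1 -> -A  (two's complement negation)  */
int not_plus_one_holds(unsigned a)
{
  return ~a + 1u == 0u - a;
}

/* A - (A + B) -> -B  */
int minus_plus_holds(unsigned a, unsigned b)
{
  return a - (a + b) == 0u - b;
}

/* A - (A - B) -> B  */
int minus_minus_holds(unsigned a, unsigned b)
{
  return a - (a - b) == b;
}
```

Note that the third identity yields the negation of B, not of A.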

gcc-4.9-20140514 is now available

2014-05-14 Thread gccadmin
Snapshot gcc-4.9-20140514 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.9-20140514/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.9 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_9-branch 
revision 210447

You'll find:

 gcc-4.9-20140514.tar.bz2 Complete GCC

  MD5=cc24b1e46859c24912728c16bd454827
  SHA1=02f0489cf4fad9a15b055ddd4d0e40ba092a

Diffs from 4.9-20140507 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.9
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: Live range shrinkage in pre-reload scheduling

2014-05-14 Thread Vladimir Makarov

On 2014-05-14, 12:38 PM, Richard Sandiford wrote:

> Vladimir Makarov writes:
>> On 2014-05-13, 6:27 AM, Kyrill Tkachov wrote:
>>> Hi all,
>>>
>>> In haifa-sched.c (in rank_for_schedule) I notice that live range
>>> shrinkage is not performed when SCHED_PRESSURE_MODEL is used and the
>>> comment mentions that it results in much worse code.
>>>
>>> Could anyone elaborate on this? Was it just empirically noticed on x86_64?
>>
>> It was empirically noticed on SPEC2000.  The practice is a single
>> criteria for heuristic optimizations.  Sometimes a new heuristic
>> optimization might look promising but the reality might be quite different.
>
> Hey, I resent that.  You make it sound I came up with SCHED_PRESSURE_MODEL
> on a whim without any evidence to back it up.  I implemented it because
> it gave better EEMBC results on ARM, at least at the time that I wrote
> it, and it didn't effect SPEC2000 for ARM much one way or the other.
> It also produced better results for s390x on SPEC2006 at the time it
> was tested, which is why it was turned on by default there too.
>
> For anyone interested in the background and rationale, the original
> posting was here: https://gcc.gnu.org/ml/gcc-patches/2011-12/msg01684.html
>
> I'm not claiming it's a great heuristic or anything.  There's bound to
> be room for improvement.  But it was based on "reality" and real results.
>
> Of course, if it turns out not be a win for ARM or s390x any more then it
> should be disabled.
>
>> In this relation I am remembering a story told me by Bob Morgan about
>> bin packing RA invention.  It was just a quick and simple first RA
>> implementation for a new compiler.  After that DEC compiler team tried
>> many times to improve the RA implementing more complicated optimizations
>> but the first bin packing RA was always better.
>
> You make it sound like your original -fsched-pressure is unlikely
> to be beaten, in the way that you think bin packing wasn't beaten.


Richard, I did not really mean it.  Quite the opposite: I was glad that you
added your implementation, as I believe the most important thing I
did was the infrastructure for implementing register-pressure scheduling
(more accurate register pressure evaluation).  The more people use it,
the better for me.


That said, like you I am not satisfied with how GCC resolves the conflict
between the 1st insn scheduler and RA.  Ideally, I'd like to see the 1st
insn scheduler (with some register pressure heuristics or better
communication with RA) improve code for x86/x86-64.  This goal is still
far away and I am not sure how to achieve it.  I will probably finish
active large-scale development of RA and the insn scheduler and switch to
something else during this year.



> But both versions of -fsched-pressure are off by default on most
> targets for a reason.  (AFAIK the only two targets that enable it by
> default are the two that use SCHED_PRESSURE_MODEL: arm and s390x.)
> I think this is still an area that could be improved.  I don't mind
> whether that's through improving one of the two existing heuristics
> or doing something different, but it seems pessimistic to say that
> scheduling based on register pressure is always going to be the optional
> feature that it is now.
>
> E.g. tracking pressure classes isn't always the right thing for
> targets like PowerPC where only part of the vector register set
> can be used for floating-point operations.




Re: Live Range Splitting in Integrated Register Allocator

2014-05-14 Thread Vladimir Makarov

On 2014-05-14, 1:33 PM, Ajit Kumar Agarwal wrote:



> Hello All:
>
> I am planning to implement the Live range splitting based on the following
> cases in the Integrated Register Allocator.
>
> For a given Live range that spans from  from outer region to inner region of
> the loop. Such Live ranges which are LiveIn at the entry of the header of the
> Loop and Live Out at the exit of the loop but there are no references inside
> the  Loop. Such Live ranges lead to unoptimal spill and fetch inside the Loop
> conflicting with the shorter live ranges that spans inside the Loop.
>
> Lets say such Live range as L1. L1 can be splitted at the Loop Boundary
> splitting the Live range by making a store at the header of the Loop and the
> Load at the exit of the Loop. This makes the Live range less conflicting with
> the Live ranges that are local to the Loop regions reducing the spill and Fetch
> inside the Loops.
>
> From the code and documentation of Integrated Register Allocator following is
> the understanding.
>
> As Live range L1 is live in the outer region but as there are no reference
> inside the Loop region. Since the allocno for L1 for a given variable v is
> assigned two allocno v1 and v2 . V1 being assigned allocno for the outer region
> and v2 as allocno for the inner Loop region. This allows to accumulate the
> information from the inner loop region to outer region.
>
> Will the current Integrated Register Allocator will consider the Live range L1
> as Live inside the Loop and outer region? If Yes then there will be conflicting
> with the Live ranges that are local to the Loop region leading to spill and
> fetch inside the Loop.  If the v1 and v2 allocno are created v1 for the outer
> region and v2 for the inner region then there will v2 will be conflicting the
> local live ranges inside the Loop region and v1 will be conflicting with the
> Live ranges of the outer regions.  This is how its been considered as Live
> range splitting at the Loop Boundary for the Live range that spans inside the
> Loop but not not being referenced?
>
> If Such cases are not being considered in the Integrated Register Allocator,
> then it will be useful to implement such cases in IRA which will be benefitted
> the microblaze target.
>
> Please let me know what do you think.



Allocno v2, corresponding to the live range inside the loop, has a very
small spill cost; therefore it will be spilled if we still need
registers for pseudos local to the loop.  If allocno v1, corresponding to
the live ranges outside the loop *and* inside the loop, gets a hard register,
we will have live range splitting as you propose.  So I do not see a
necessity for the optimization you propose.


Moreover, my experience shows that making a lot of explicit
transformations (e.g. the proposed splitting), even if we have
transformations to undo them (e.g. coalescing), results in worse code.
In a good register allocator, explicit transformations during RA
should be kept as minimal as possible.  So I guess the optimization you
propose will actually result in worse code.  Although I might be wrong,
because it is always hard to predict the result of heuristic optimizations.


What RA really lacks is good splitting on BB boundaries.  I
have plans to try this as part of a more general pass to decrease
register pressure, on which I'll start work soon.


In any case, thanks for the proposal.  You are free to try it to confirm or
reject my prediction.  Unfortunately, that is the only way to be sure
about the result.


Re: [GSoC] writing test-case

2014-05-14 Thread Prathamesh Kulkarni
On Thu, May 15, 2014 at 4:00 AM, Prathamesh Kulkarni
 wrote:
> On Wed, May 14, 2014 at 3:54 PM, Richard Biener
>  wrote:
>> On Tue, May 13, 2014 at 11:06 PM, Prathamesh Kulkarni
>>  wrote:
>>> On Tue, May 13, 2014 at 2:36 PM, Richard Biener
>>>  wrote:
>>>> On Sun, May 11, 2014 at 5:00 PM, Prathamesh Kulkarni
>>>> wrote:
>>>>> On Sun, May 11, 2014 at 8:10 PM, Andreas Schwab
>>>>> wrote:
>>>>>> Prathamesh Kulkarni writes:
>>>>>>
>>>>>>> a) I am not able to follow why 3 slashes are required here
>>>>>>> in x_.\\\(D\\\) ? Why does x_.\(D\) not work ?
>>>>>>
>>>>>> Two of the three backslashes are eaten by the tcl parser.  But actually
>>>>>> only two backslashes are needed, since the parens are not special to tcl
>>>>>> (but are special to the regexp engine, so you want a single backslash
>>>>>> surviving the tcl parser).
>>>>>>
>>>>>>> b) The expression after folding would be of the form:
>>>>>>> t2_ = x_(D) - y_(D)
>>>>>>> I have used the operator "." in the pattern to match digit.
>>>>>>> While that works in the above case, I think a better
>>>>>>> idea would be to match using [0-9].
>>>>>>> I tried the following but it does not work:
>>>>>>> t_[0-9] = x_[0-9]\\\(D\\\) - y_[0-9]\\\(D\\\)
>>>>>>> Neither does \\\[ and \\\] work.
>>>>>>
>>>>>> Brackets are special in tcl (including inside double quotes), so they
>>>>>> need to be quoted.  But you want the brackets to appear unquoted to the
>>>>>> regexp engine, so a single backslash will do the Right Thing.
>>>>>>
>>>>>> See tcl(n) for the tcl parsing rules.
>>>>>>
>>>>> Thanks. Now I get it, the double backslash \\ is an escape sequence
>>>>> for \, and special characters like (, [
>>>>> retain their meaning in quotes, so to match input text: (D), the
>>>>> pattern has to be written as: "\\(D\\)".
>>>>> I believe "\(D\)" would only match D in the input ?
>>>>> I have modified the test-case. Is this version correct ?
>>>>
>>>> I usually verify that by running the testcase in isolation on a GCC version
>>>> that should FAIL it and on one that should PASS it (tcl quoting is also
>>>> try-and-error for me most of the time...).
>>>>
>>>> Thus I do
>>>>
>>>> gcc/> make check-gcc RUNTESTFLAGS="tree-ssa.exp=match-2.c"
>>>>
>>>>
>>>> gcc/> make cc1
>>>> ... compiles cc1 ...
>>>> gcc/> make check-gcc RUNTESTFLAGS="tree-ssa.exp=match-2.c"
>>>>
>>>>
>>>> A more complete matching for an SSA name would be (allowing
>>>> for SSA name versions > 9) _\\d\+ with \\(D\\) appended if
>>>> suitable (that's usually known from the testcase).  \\d\+ should match
>>>> at least one decimal digit.
>>> I thought that SSA name version wouldn't exceed 9 for that test-case,
>>> so I decided for matching only one digit. I will change it to match
>>> one or more digits.
>>>
>>> * I have written test-cases for patterns in match.pd (attached patch), which
>>> result in PASS. Could you review them for me ?
>>> I couldn't write for following ones:
>>>
>>> 1] Patterns involving COMPLEX_EXPR, REALPART_EXPR, IMAGPART_EXPR
>>> (match_and_simplify
>>>   (complex (realpart @0) (imagpart @0))
>>>   @0)
>>> (match_and_simplify
>>>   (realpart (complex @0 @1))
>>>   @0)
>>> (match_and_simplify
>>>   (imagpart (complex @0 @1))
>>>   @1)
>>>
>>> Sorry to be daft, but I couldn't understand what these patterns meant
>>> (I know complex numbers).
>>> Could you give an example that matches one of these patterns ?
>>> Thanks.
>>
>> The existing match-1.c testcase has some ideas.  For the first
>> pattern I'd do
>>
>> _Complex double foo (_Complex double z)
>> {
>>   double r = __real z;
>>   double i = __imag z;
>>   return r + 1.0iF * i;
>> }
>>
>> where the return expression is folded (yeah ...) to a COMPLEX_EXPR.
>>
>> For the other two patterns sth like
>>
>> double foo (double r)
>> {
>>   _Complex double z = r;
>>   return __real z;
>> }
>>
>> and
>>
>> double foo (double i)
>> {
>>   _Complex double z = 1.0iF * i;
>>   return __imag z;
>> }
>>
>> should work.
>>
> Thanks. Now I understood the meaning of patterns.
> The first pattern should return z instead of returning a new complex
> number from r and i.
> however the test-case doesn't appear to work.
> The other two transforms real (complex x) -> real x and imag (complex
> x) -> imag x were simplified.
>
>>> 2] Test-case for FMA_EXPR. I am not sure how to generate FMA_EXPR from C 
>>> code.
>>> (match_and_simplify
>>>   (fma INTEGER_CST_P@0 INTEGER_CST_P@1 @3)
>>>   (plus (mult @0 @1) @3))
>>
>> I believe it's not possible.  FMA is matched by the optimize_widen_mult
>> pass which runs quite late, after the last forwprop pass.  So I don't think
>> it's possible to write a testcase that triggers with the existing compiler.
>>
> I was wondering if we could possibly use Gimple front-end to write test cases 
> ?
> If that's not possible, should we write c-extensions (only for
> testing) that can generate the required pattern ?
> For example something like:
> int f(int x)
> {
>   return __fma_expr (3, 4, x);  // transform to x + 12 ?
> }
Writing c-e

Re: Why is this not optimized?

2014-05-14 Thread Bin.Cheng
On Wed, May 14, 2014 at 9:14 PM, Bingfeng Mei  wrote:
> Hi,
> I am looking at some code of our target, which is not optimized as expected. 
> For the following RTX, I expect source of insn 17 should be propagated into 
> insn 20, and insn 17 is eliminated as a result. On our target, it will become 
> a predicated xor instruction instead of two. Initially, I thought fwprop pass 
> should do this.
>
> (insn 17 16 18 3 (set (reg/v:HI 102 [ crc ])
> (xor:HI (reg/v:HI 108 [ crc ])
> (const_int 16386 [0x4002]))) coremark.c:1632 725 {xorhi3}
>  (nil))
> (insn 18 17 19 3 (set (reg:BI 113)
> (ne:BI (reg:QI 101 [ D.4446 ])
> (const_int 1 [0x1]))) 1397 {cmp_qimode}
>  (nil))
> (jump_insn 19 18 55 3 (set (pc)
> (if_then_else (ne (reg:BI 113)
> (const_int 0 [0]))
> (label_ref 23)
> (pc))) 1477 {cbranchbi4}
>  (expr_list:REG_DEAD (reg:BI 113)
> (expr_list:REG_BR_PROB (const_int 7100 [0x1bbc])
> (expr_list:REG_PRED_WIDTH (const_int 1 [0x1])
> (nil))))
>  -> 23)
> (note 55 19 20 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
> (insn 20 55 23 4 (set (reg:HI 112 [ crc ])
> (reg/v:HI 102 [ crc ])) 502 {fp_movhi}
>  (expr_list:REG_DEAD (reg/v:HI 102 [ crc ])
> (nil)))
> (code_label 23 20 56 5 2 "" [1 uses])
>
>
> But it can't. First propagate_rtx_1 will return false because PR_CAN_APPEAR 
> is false and
> following code is executed.
>
>   if (x == old_rtx)
> {
>   *px = new_rtx;
>   return can_appear;
> }
>
> Even I forces PR_CAN_APPEAR to be set in flags, fwprop still won't go ahead in
> try_fwprpp_subst because old_cost is 0 (REG only rtx), and set_src_cost 
> (SET_SRC (set),
> speed) is bigger than 0. So the change is deemed as not profitable, which is 
> not correct
> IMO.
The fwprop pass is too conservative about propagation
opportunities outside of memory references; it just gives up in many
places.  Also, as in your case, it doesn't seem to take into
consideration that the original insn can be removed after propagation.

Wei Mi once sent a patch re-implementing the fwprop pass at
https://gcc.gnu.org/ml/gcc-patches/2013-03/msg00617.html .
I also did some experiments and worked out a local patch doing similar
work to handle cases exactly like yours.
The problem is that even though one instruction can be saved (as in your
case), it's not always a win, because it tends to generate more complex
instructions, and such insns are somewhat more vulnerable to pipeline
hazards.  Unfortunately, it's more or less impossible for fwprop to
understand the pipeline risk.
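
To illustrate the cost argument, here is a toy model in plain C.  It is
not GCC's fwprop code: the struct, its fields and both function names are
hypothetical, and the costs are abstract numbers.  It contrasts a check
that looks only at the use (a plain REG source has cost 0, so any
non-trivial replacement looks worse) with a variant that also credits the
removal of the defining insn when the propagated use was its only use:

```c
#include <stdbool.h>

/* Hypothetical cost summary for one propagation candidate.  */
struct prop_costs {
  int old_use_cost;      /* source cost of the use before propagation
                            (0 for a plain REG)  */
  int new_use_cost;      /* source cost of the use after substitution  */
  int def_insn_cost;     /* cost of the defining insn  */
  bool def_becomes_dead; /* true if no other use of the def remains  */
};

/* Only the use is considered: the change is rejected whenever the
   new source is more expensive than the old one.  */
bool profitable_use_only(const struct prop_costs *c)
{
  return c->new_use_cost <= c->old_use_cost;
}

/* Also accounts for the defining insn disappearing after the
   propagation, by comparing the total cost of both insns.  */
bool profitable_with_def_removal(const struct prop_costs *c)
{
  int before = c->old_use_cost + c->def_insn_cost;
  int after = c->new_use_cost
              + (c->def_becomes_dead ? 0 : c->def_insn_cost);
  return after <= before;
}
```

With a REG-only use (cost 0), an xor of cost 4 and a def that becomes
dead, the first check rejects the change and the second accepts it.
Neither models the pipeline effects mentioned above.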

Thanks,
bin
>
> If fwprop is not the place to do this optimization, where should it be done? 
> I am working on up-to-date GCC 4.8.
>
> Thanks,
> Bingfeng Mei



-- 
Best Regards.


problem with fprintf and OpenMP

2014-05-14 Thread Siegmar Gross
Hi,

I'm using gcc-4.9.0 and have a problem with the following program.
I already reported the problem on gcc-help some days ago,
but didn't get any replies. Perhaps somebody on this list knows
whether the behaviour is intended.

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main (void)
{
  #pragma omp parallel default (none)
  fprintf (stderr, "Hello!\n");
  return EXIT_SUCCESS;
}


I get the following error, when I try to compile the program
on "Solaris 10 Sparc".

tyr OpenMP 116 \gcc -fopenmp omp_fprintf.c 
In file included from /usr/include/stdio.h:66:0,
 from omp_fprintf.c:38:
omp_fprintf.c: In function 'main':
omp_fprintf.c:45:12: error: '__iob' not specified in enclosing parallel
   fprintf (stderr, "Hello!\n");
^
omp_fprintf.c:44:11: error: enclosing parallel
   #pragma omp parallel default (none)
   ^
tyr OpenMP 117 


I can solve the problem if I add "shared(__iob)" to "#pragma",
but then the program isn't portable any longer.
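
One possible workaround that avoids naming the platform's internal
variable is to copy stderr into a local pointer and list only that local
in the data-sharing clause.  This is a minimal sketch, not a statement
about what the compiler should do; the names hello_parallel, err and
failed are chosen for illustration:

```c
#include <stdio.h>

/* Prints "Hello!" once per thread; returns 0 if every fprintf
   succeeded and 1 otherwise.  */
int hello_parallel(void)
{
  FILE *err = stderr;  /* local copy, independent of how stderr is defined */
  int failed = 0;

  /* Only local variables appear in the clause, so default(none) does
     not need to know about __iob, _iob or any other libc internal.  */
  #pragma omp parallel default(none) shared(err, failed)
  {
    if (fprintf(err, "Hello!\n") < 0) {
      #pragma omp atomic write
      failed = 1;
    }
  }
  return failed;
}
```

Built without -fopenmp the pragmas are ignored and the function prints a
single line.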

tyr OpenMP 118 grep stderr /usr/include/iso/stdio_iso.h 
#define stderr  (&__iob[2])
#define stderr  (&_iob[2])
tyr OpenMP 119 


I have a similar problem with Linux and Mac OS X, which need
a different solution, because "stderr" is defined in a different
way.

In my opinion the compiler knows that I use OpenMP and that
I use "default(none)" so that it can automatically add all
necessary internal variables to the "shared clause" or that
it only reports errors for variables which are not part of
system header files. I don't know how many macros or variables
will cause such an error. If their number is small, the compiler
can maintain a small table with these names. Otherwise it may
be a little bit more complicated. Furthermore I don't know if
there might be a reason to have a private copy of an internal
variable (although I cannot see any reason for such a thing,
because an application program shouldn't know about internal
variables or how system macros expand to internal variables).

I would be happy if it were possible to write operating-system
independent code when I use standards (C11, OpenMP,
pthreads, ...), and if the compiler solved all operating-system
dependent problems as long as I only use standardized constructs.
Perhaps somebody knows whether it is possible to fulfill my wish in
an upcoming release of gcc.


Kind regards

Siegmar



Re: Live range shrinkage in pre-reload scheduling

2014-05-14 Thread Ramana Radhakrishnan
On Wed, May 14, 2014 at 5:38 PM, Richard Sandiford
 wrote:
> Vladimir Makarov  writes:
>> On 2014-05-13, 6:27 AM, Kyrill Tkachov wrote:
>>> Hi all,
>>>
>>> In haifa-sched.c (in rank_for_schedule) I notice that live range
>>> shrinkage is not performed when SCHED_PRESSURE_MODEL is used and the
>>> comment mentions that it results in much worse code.
>>>
>>> Could anyone elaborate on this? Was it just empirically noticed on x86_64?
>>>
>>
>> It was empirically noticed on SPEC2000.  The practice is a single
>> criteria for heuristic optimizations.  Sometimes a new heuristic
>> optimization might look promising but the reality might be quite different.

Vlad - Was that based on experiments on x86_64 ?

>
> Hey, I resent that.  You make it sound I came up with SCHED_PRESSURE_MODEL
> on a whim without any evidence to back it up.  I implemented it because
> it gave better EEMBC results on ARM, at least at the time that I wrote
> it, and it didn't effect SPEC2000 for ARM much one way or the other.
> It also produced better results for s390x on SPEC2006 at the time it
> was tested, which is why it was turned on by default there too.
>
> For anyone interested in the background and rationale, the original
> posting was here: https://gcc.gnu.org/ml/gcc-patches/2011-12/msg01684.html

I no longer have those results. IIRC, the reason we turned it on was
that this new weighted algorithm gave significant improvements on A8
and A9 and it seemed to work well.

>
> I'm not claiming it's a great heuristic or anything.  There's bound to
> be room for improvement.  But it was based on "reality" and real results.
>
> Of course, if it turns out not to be a win for ARM or s390x any more
> then it should be disabled.

Kyrill is currently investigating a case where the first scheduler
pass, with the A15 scheduler, is a bit too aggressive in creating ILP
opportunities. This causes performance differences between turning
off the first scheduler pass and using the defaults.

>
>> In this connection I remember a story Bob Morgan told me about the
>> invention of the bin packing RA.  It was just a quick and simple first
>> RA implementation for a new compiler.  After that, the DEC compiler
>> team tried many times to improve the RA by implementing more complicated
>> optimizations, but the first bin packing RA was always better.
>
> You make it sound like your original -fsched-pressure is unlikely
> to be beaten, in the way that you think bin packing wasn't beaten.
> But both versions of -fsched-pressure are off by default on most
> targets for a reason.  (AFAIK the only two targets that enable it by
> default are the two that use SCHED_PRESSURE_MODEL: arm and s390x.)
> I think this is still an area that could be improved.  I don't mind
> whether that's through improving one of the two existing heuristics
> or doing something different, but it seems pessimistic to say that
> scheduling based on register pressure is always going to be the optional
> feature that it is now.


>
> E.g. tracking pressure classes isn't always the right thing for
> targets like PowerPC where only part of the vector register set
> can be used for floating-point operations.

Is there another post that deals with this particular case? I tried
digging through the archives but couldn't easily find anything.

regards
Ramana


>
> Thanks,
> Richard