Re: target specific builtin expansion (middle end and back end definition inconsistence problem?).

2012-04-23 Thread Feng LI
Hi Ian,

2012/4/22 Ian Lance Taylor :
> Feng LI  writes:
>
>> Yes, you are right. But how could I reference to a backend defined builtin
>> function in the middle end (I need to generate the builtin function in the
>> middle end and expand it in x86 backend)?
>
> If the function doesn't have a machine-independent definition, then use
> a target hook.

I then removed the duplicate builtin definition from the x86 backend and
defined the builtin in builtins.def with built_in_class BUILT_IN_MD, so
that expand_builtin calls the target-specific hook:

  if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_MD)
    return targetm.expand_builtin (exp, target, subtarget, mode, ignore);

In the middle end I can then refer to this builtin function with
tree tcreate_fn = built_in_decls[BUILT_IN_TCREATE], and it will call
the target-specific expansion in the backend.

The problem is shown by the code below:

expand_builtin ()
{
  enum built_in_function fcode = DECL_FUNCTION_CODE (fndecl);
}

ix86_expand_builtin ()
{
  enum ix86_builtins fcode = DECL_FUNCTION_CODE (fndecl);
}

The middle end and the backend use different sets of builtin codes
(enum built_in_function and enum ix86_builtins), so the same function
code means different things on each side, and I end up in this code block:

  if (ix86_builtins_isa[fcode].isa
      && !(ix86_builtins_isa[fcode].isa & ix86_isa_flags))
    {
      char *opts = ix86_target_string (ix86_builtins_isa[fcode].isa, 0, NULL,
                                       NULL, NULL, false);

      if (!opts)
        error ("%qE needs unknown isa option", fndecl);
      else
        {
          gcc_assert (opts != NULL);
          error ("%qE needs isa option %s", fndecl, opts);
          free (opts);
        }
      return const0_rtx;
    }

Not sure if I'm doing it in the right way, help needed...

Thanks,
Feng
>
> (That said I've thought for a while that we need better mechanisms for
> target-specific optimization passes.  If we had those I would tell you
> to write one.  E.g., reg-stack.c is a target-specific pass.)
>
> Ian
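
The mismatch described in this message can be reproduced in miniature.
The following is a minimal sketch, not GCC code: the enums generic_code
and target_code, the table target_isa, and the function
target_builtin_ok are hypothetical stand-ins for enum built_in_function,
enum ix86_builtins, ix86_builtins_isa, and the ISA check quoted above.
It shows how a code taken from one numbering picks the wrong entry when
it is used to index a table built for the other numbering.

```c
#include <assert.h>

/* Hypothetical stand-ins; these are NOT GCC's real enums or tables.  */
enum generic_code { GEN_MEMCPY, GEN_TCREATE, GEN_LAST };             /* middle-end numbering */
enum target_code  { TGT_ADDPS, TGT_AESENC, TGT_TCREATE, TGT_LAST };  /* back-end numbering   */

#define ISA_SSE 1u
#define ISA_AES 2u

/* ISA bits required by each target builtin, indexed by enum target_code.  */
static const unsigned target_isa[TGT_LAST] = { ISA_SSE, ISA_AES, 0u };

/* Return nonzero when the builtin with target code FCODE may be expanded
   with the ISA bits in ENABLED.  Mirrors the shape of the quoted check:
   fail when an ISA is required and none of its bits are enabled.  */
int
target_builtin_ok (int fcode, unsigned enabled)
{
  unsigned need;

  assert (fcode >= 0 && fcode < TGT_LAST);
  need = target_isa[fcode];
  return !(need != 0 && (need & enabled) == 0);
}
```

With only ISA_SSE enabled, target_builtin_ok (TGT_TCREATE, ISA_SSE)
succeeds, while target_builtin_ok (GEN_TCREATE, ISA_SSE) fails: the
middle-end value 1 lands on the entry for TGT_AESENC, which needs an ISA
bit that is not enabled. That is the same kind of spurious "needs isa
option" failure.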


Re: Attempting changes to the GIMPLifier

2012-04-23 Thread Richard Guenther
2012/4/22  :
> Dear all,
>
> I have a few questions regarding how to augment the information dumped in
> "004t" GIMPLE dumps (prior to any optimization).
>
> My main concerns are:
>
> 1. Printing global variables.

Look at the cgraph (.000i.cgraph) dump.

> 2. Preserving function arguments (what I call an "interface").

I think we do that now.

> Both 1 and 2 are not currently addressed, at least in the gcc-4.5.1 and
> gcc-4.6.0 gimplifiers that I work with.
>
> Is this information available in internal data structures so I can expose it
> via use of the GIMPLE API?
>
> I've also noticed inconsistencies among GIMPLE dumps produced following
> different optimizations, but this is another topic.
>
> Thanks in advance.
>
> Best regards,
> Nikolaos Kavvadias
>
>


Re: [RFC] Converting end of loop computations to MIN_EXPRs.

2012-04-23 Thread Richard Guenther
On Sun, Apr 22, 2012 at 8:50 AM, Ramana Radhakrishnan
 wrote:
> Hi,
>
> A colleague noticed that we were not vectorizing loops that had end of
> loop computations that were MIN type operations that weren't expressed
> in the form of a typical min operation. A transform from  (i < x ) &&
> ( i < y)  to ( i < min (x, y)) is only something that we should do in
> these situations rather than as a general transformation where we
> might be able to end up generating slightly better code because the
> condition would end up being dependent on an invariant outside the
> loop. However, in the general case such a transformation is not
> advisable - it's up to the reader to work that out.

I don't exactly understand why the general transform is not advisable.
We already synthesize min/max operations.

Can you elaborate on why you think that better code might be generated
when not doing this transform?

> #define min(x,y) ((x) <= (y) ? (x) : (y))
>
> void foo (int x, int y, int *  a, int * b, int *c)
> {
>  int i;
>
>  for (i = 0;
>       i < x && i < y;
>       /* i < min (x, y); */
>       i++)
>    a[i] = b[i] * c[i];
>
> }
>
> The patch below deals with this case and I'm guessing that it could
> also handle more of the comparison cases and come up with more
> intelligent choices and should be made quite a lot more robust than
> what it is right now.

Yes.  At least if you have i < 5 && i < y we canonicalize it to
i <= 4 && i < y, so your pattern matching would fail.

Btw, the canonical case this happens in is probably

   for (i = 0; i < n; ++i)
     for (j = 0; j < m && j < i; ++j)
       a[i][j] = ...

thus iterating over the lower/upper triangular part of a non-square matrix
(including or not including the diagonal, thus also j < m && j <= i)

Richard.
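
The equivalence under discussion can be checked at the source level. The
following is a minimal sketch under stated assumptions: the function
names are made up for illustration, and the code only counts loop trips;
it says nothing about how GCC's loop invariant motion pass would do the
rewrite. The last function uses the canonicalized form i <= 4 && i < y
mentioned above, which is the same loop as i < 5 && i < y but no longer
uses the same comparison code in both tests.

```c
#include <assert.h>

static int
min_int (int x, int y)
{
  return x <= y ? x : y;
}

/* Trip count of: for (i = 0; i < x && i < y; i++).  */
int
trips_two_tests (int x, int y)
{
  int i, n = 0;

  for (i = 0; i < x && i < y; i++)
    n++;
  return n;
}

/* Trip count with the bound hoisted: the min is loop invariant.  */
int
trips_min_test (int x, int y)
{
  int bound = min_int (x, y);
  int i, n = 0;

  for (i = 0; i < bound; i++)
    n++;
  return n;
}

/* Canonicalized constant form: i < 5 becomes i <= 4.  */
int
trips_const_canonical (int y)
{
  int i, n = 0;

  for (i = 0; i <= 4 && i < y; i++)
    n++;
  return n;
}

/* Compare both loop forms on a small grid of bounds.  */
void
check_min_equivalence (int limit)
{
  int x, y;

  for (x = -limit; x <= limit; x++)
    for (y = -limit; y <= limit; y++)
      assert (trips_two_tests (x, y) == trips_min_test (x, y));
}
```

Calling check_min_equivalence (20) exercises negative, zero, and positive
bounds, and trips_const_canonical (y) agrees with trips_two_tests (5, y).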

> regards,
> Ramana
>
>
>
> diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
> index ce5eb20..a529536 100644
> --- a/gcc/tree-ssa-loop-im.c
> +++ b/gcc/tree-ssa-loop-im.c
> @@ -563,6 +563,7 @@ stmt_cost (gimple stmt)
>
>   switch (gimple_assign_rhs_code (stmt))
>     {
> +    case MIN_EXPR:
>     case MULT_EXPR:
>     case WIDEN_MULT_EXPR:
>     case WIDEN_MULT_PLUS_EXPR:
> @@ -971,6 +972,124 @@ rewrite_reciprocal (gimple_stmt_iterator *bsi)
>   return stmt1;
>  }
>
> +/* We look for a sequence that is :
> +   def_stmt1  : x = a < b
> +   def_stmt2  : y = a < c
> +   stmt: z = x & y
> +   use_stmt_cond: if ( z != 0)
> +
> +   where b, c are loop invariant .
> +
> +   In which case we might as well replace this by :
> +
> +   t = min (b, c)
> +   if ( a < t )
> +*/
> +
> +static gimple
> +rewrite_min_test (gimple_stmt_iterator *bsi)
> +{
> +  gimple stmt, def_stmt_x, def_stmt_y, use_stmt_cond, stmt1;
> +  tree x, y, z, a, b, c, var, t, name;
> +  use_operand_p use;
> +  bool is_lhs_of_comparison = false;
> +
> +  stmt = gsi_stmt (*bsi);
> +  z = gimple_assign_lhs (stmt);
> +
> +  /* We start by looking at whether x is used in the
> +     right set of conditions.  */
> +  if (TREE_CODE (z) != SSA_NAME
> +      || !single_imm_use (z, &use, &use_stmt_cond)
> +      || gimple_code (use_stmt_cond) != GIMPLE_COND)
> +    return stmt;
> +
> +  x = gimple_assign_rhs1 (stmt);
> +  y = gimple_assign_rhs2 (stmt);
> +
> +  if (TREE_CODE (x) != SSA_NAME
> +      || TREE_CODE (y) != SSA_NAME)
> +    return stmt;
> +
> +  def_stmt_x = SSA_NAME_DEF_STMT (x);
> +  def_stmt_y = SSA_NAME_DEF_STMT (y);
> +
> +  /* def_stmt_x and def_stmt_y should be of the
> +     form
> +
> +     x = a cmp b
> +     y = a cmp c
> +
> +     or
> +
> +     x = b cmp a
> +     y = c cmp a
> +  */
> +  if (!is_gimple_assign (def_stmt_x)
> +      || !is_gimple_assign (def_stmt_y)
> +      || (gimple_assign_rhs_code (def_stmt_x)
> +         != gimple_assign_rhs_code (def_stmt_y)))
> +    return stmt;
> +
> +  if (gimple_assign_rhs1 (def_stmt_x) == gimple_assign_rhs1 (def_stmt_y)
> +      && (gimple_assign_rhs_code (def_stmt_x) == LT_EXPR
> +         || gimple_assign_rhs_code (def_stmt_x) == LE_EXPR))
> +    {
> +      a = gimple_assign_rhs1 (def_stmt_x);
> +      b = gimple_assign_rhs2 (def_stmt_x);
> +      c = gimple_assign_rhs2 (def_stmt_y);
> +      is_lhs_of_comparison = true;
> +    }
> +  else
> +    {
> +      if (gimple_assign_rhs2 (def_stmt_x) == gimple_assign_rhs2 (def_stmt_y)
> +         && (gimple_assign_rhs_code (def_stmt_x) == GT_EXPR
> +             || gimple_assign_rhs_code (def_stmt_x) == GE_EXPR))
> +       {
> +         a = gimple_assign_rhs2 (def_stmt_x);
> +         b = gimple_assign_rhs1 (def_stmt_x);
> +         c = gimple_assign_rhs1 (def_stmt_y);
> +       }
> +      else
> +       return stmt;
> +    }
> +
> +  if (outermost_invariant_loop (b, loop_containing_stmt (def_stmt_x)) != NULL
> +      && outermost_invariant_loop (c, loop_containing_stmt (def_stmt_y)) != NULL)
> +
> +    {
> +      if (dump_file)
> +       fprintf (dump_file, "Found a potential transformation to min\n");
> +
> +      /* mintmp = min (b , c).  */
> +
> +      var = create_tmp_var (TRE

A case where PHI-OPT pessimizes the code

2012-04-23 Thread Steven Bosscher
Hello,

I ported the code to expand switch() statements with bit tests from
stmt.c to GIMPLE, and looked at the resulting code to verify that the
transformation applies correctly, when I noticed this strange PHI-OPT
transformation that results in worse code for the test case of PR45830
which looks like this:

int
foo (int *a)
{
  switch (*a)
    {
    case 0: case 1: case 2: case 3: case 4: case 5:
    case 19: case 20: case 21: case 22: case 23:
    case 26: case 27:
      return 1;
    default:
      return 0;
    }
}


After transforming the switch() to a series of bit tests, the code
looks like this:


;; Function foo (foo, funcdef_no=0, decl_uid=1996, cgraph_uid=0)

beginning to process the following SWITCH statement (pr45830.c:8) : ---
switch (D.2013_3) , case 0 ... 5: , case 19 ...
23: , case 26 ... 27: >

  expanding as bit test is preferableSwitch converted

foo (int * a)
{
  _Bool D.2023;
  long unsigned int D.2022;
  long unsigned int D.2021;
  long unsigned int csui.1;
  _Bool D.2019;
  int D.2014;
  int D.2013;

:
  D.2013_3 = *a_2(D);
  D.2019_5 = D.2013_3 > 27;
  if (D.2019_5 != 0)
goto  ();
  else
goto ;

:
  D.2021_7 = (long unsigned int) D.2013_3;
  csui.1_4 = 1 << D.2021_7;
  D.2022_8 = csui.1_4 & 217579583;
  D.2023_9 = D.2022_8 != 0;
  if (D.2023_9 != 0)
goto  ();
  else
goto ;

:

:

  # D.2014_1 = PHI <1(5), 0(3)>
:
  return D.2014_1;

}


This is the equivalent code of what the expander in stmt.c would
generate. Unfortunately, the first PHI-OPT pass (phiopt1) changes the
code as follows:

;; Function foo (foo, funcdef_no=0, decl_uid=1996, cgraph_uid=0)

Removing basic block 4
foo (int * a)
{
  int D.2026;
  _Bool D.2025;
  long unsigned int D.2022;
  long unsigned int D.2021;
  long unsigned int csui.1;
  int D.2014;
  int D.2013;

:
  D.2013_3 = *a_2(D);
  if (D.2013_3 > 27)
goto  ();
  else
goto ;

:
  D.2021_7 = (long unsigned int) D.2013_3;
  csui.1_4 = 1 << D.2021_7;
  D.2022_8 = csui.1_4 & 217579583;
  D.2025_9 = D.2022_8 != 0;
  D.2026_5 = (int) D.2025_9;  // new statement

  # D.2014_1 = PHI  // modified PHI node
:
  return D.2014_1;

}


This results in worse code on powerpc64:

BEFORE                  AFTER
foo:                    foo:
       lwz 9,0(3)              lwz 9,0(3)
       cmplwi 7,9,27 |         cmpwi 7,9,27
       bgt 7,.L4     |         bgt 7,.L3
       li 8,1        |         li 3,1
       lis 10,0xcf8            lis 10,0xcf8
       sld 9,8,9     |         sld 9,3,9
       ori 10,10,63            ori 10,10,63
       and. 8,9,10             and. 8,9,10
       li 3,1        |         mfcr 10
       bnelr 0       |         rlwinm 10,10,3,1
.L4:                 |         xori 3,10,1
                     >         blr
                     >         .p2align 4,,15
                     > .L3:
       li 3,0                  li 3,0
       blr                     blr

BEFORE is the code that results if stmt.c expands the switch to bit
tests (i.e. PHI-OPT never gets to transform the code as shown), and
AFTER is with my equivalent GIMPLE implementation. Apparently, the
optimizers are unable to recover from the transformation PHI-OPT
performs.

I am not sure how to fix this problem. I am somewhat surprised by the
code generated by the powerpc backend for "t=(int)(_Bool)some_bool",
because I would have expected the type range for _Bool to be <0,1> so
that the type conversion should be a single bit test. On the other
hand, maybe PHI-OPT should recognize this pattern and reject the
transformation???

Your thoughts/comments/suggestions, please?

Ciao!
Steven


P.S. Unfortunately I haven't been able to produce a test case that
shows the problem without my switch conversion pass.
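
For readers who want to check the constant in the dumps: 217579583 is the
mask with bits 0-5, 19-23 and 26-27 set (0x0cf8003f, which is also what
the lis/ori pair in the assembly builds). The following is a minimal
source-level sketch of the bit-test form, not the output of the pass;
the function names are made up, and the unsigned range check is a choice
made here so that negative values fall through to the default case.

```c
#include <assert.h>

/* The switch from the PR45830 test case, taking the value directly.  */
int
foo_switch (int a)
{
  switch (a)
    {
    case 0: case 1: case 2: case 3: case 4: case 5:
    case 19: case 20: case 21: case 22: case 23:
    case 26: case 27:
      return 1;
    default:
      return 0;
    }
}

/* Bit-test form: one range check, then one AND against the case mask.  */
int
foo_bittest (int a)
{
  const unsigned long mask = 217579583UL;  /* bits 0-5, 19-23, 26-27 */

  if ((unsigned int) a > 27u)              /* also rejects negative a */
    return 0;
  return ((1UL << a) & mask) != 0;
}

/* Compare both forms over a range that covers every case label.  */
void
check_bittest (void)
{
  int a;

  for (a = -64; a <= 64; a++)
    assert (foo_switch (a) == foo_bittest (a));
}
```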


Re: A case where PHI-OPT pessimizes the code

2012-04-23 Thread Richard Guenther
On Mon, Apr 23, 2012 at 2:15 PM, Steven Bosscher  wrote:
> Hello,
>
> I ported the code to expand switch() statements with bit tests from
> stmt.c to GIMPLE, and looked at the resulting code to verify that the
> transformation applies correctly, when I noticed this strange PHI-OPT
> transformation that results in worse code for the test case of PR45830
> which looks like this:
>
> int
> foo (int *a)
> {
>  switch (*a)
>    {
>    case 0:     case 1:    case 2:    case 3:    case 4:    case 5:
>    case 19:    case 20:    case 21:    case 22:    case 23:
>    case 26:    case 27:
>      return 1;
>    default:
>      return 0;
>    }
> }
>
>
> After transforming the switch() to a series of bit tests, the code
> looks like this:
>
>
> ;; Function foo (foo, funcdef_no=0, decl_uid=1996, cgraph_uid=0)
>
> beginning to process the following SWITCH statement (pr45830.c:8) : ---
> switch (D.2013_3) , case 0 ... 5: , case 19 ...
> 23: , case 26 ... 27: >
>
>  expanding as bit test is preferableSwitch converted
> 
> foo (int * a)
> {
>  _Bool D.2023;
>  long unsigned int D.2022;
>  long unsigned int D.2021;
>  long unsigned int csui.1;
>  _Bool D.2019;
>  int D.2014;
>  int D.2013;
>
> :
>  D.2013_3 = *a_2(D);
>  D.2019_5 = D.2013_3 > 27;
>  if (D.2019_5 != 0)
>    goto  ();
>  else
>    goto ;
>
> :
>  D.2021_7 = (long unsigned int) D.2013_3;
>  csui.1_4 = 1 << D.2021_7;
>  D.2022_8 = csui.1_4 & 217579583;
>  D.2023_9 = D.2022_8 != 0;
>  if (D.2023_9 != 0)
>    goto  ();
>  else
>    goto ;
>
> :
>
> :
>
>  # D.2014_1 = PHI <1(5), 0(3)>
> :
>  return D.2014_1;
>
> }
>
>
> This is the equivalent code of what the expander in stmt.c would
> generate. Unfortunately, the first PHI-OPT pass (phiopt1) changes the
> code as follows:
>
> ;; Function foo (foo, funcdef_no=0, decl_uid=1996, cgraph_uid=0)
>
> Removing basic block 4
> foo (int * a)
> {
>  int D.2026;
>  _Bool D.2025;
>  long unsigned int D.2022;
>  long unsigned int D.2021;
>  long unsigned int csui.1;
>  int D.2014;
>  int D.2013;
>
> :
>  D.2013_3 = *a_2(D);
>  if (D.2013_3 > 27)
>    goto  ();
>  else
>    goto ;
>
> :
>  D.2021_7 = (long unsigned int) D.2013_3;
>  csui.1_4 = 1 << D.2021_7;
>  D.2022_8 = csui.1_4 & 217579583;
>  D.2025_9 = D.2022_8 != 0;
>  D.2026_5 = (int) D.2025_9;  // new statement
>
>  # D.2014_1 = PHI  // modified PHI node
> :
>  return D.2014_1;
>
> }
>
>
> This results in worse code on powerpc64:
>
> BEFORE                  AFTER
> foo:                    foo:
>        lwz 9,0(3)              lwz 9,0(3)
>        cmplwi 7,9,27 |         cmpwi 7,9,27
>        bgt 7,.L4     |         bgt 7,.L3
>        li 8,1        |         li 3,1
>        lis 10,0xcf8            lis 10,0xcf8
>        sld 9,8,9     |         sld 9,3,9
>        ori 10,10,63            ori 10,10,63
>        and. 8,9,10             and. 8,9,10
>        li 3,1        |         mfcr 10
>        bnelr 0       |         rlwinm 10,10,3,1
> .L4:                  |         xori 3,10,1
>                      >         blr
>                      >         .p2align 4,,15
>                      > .L3:
>        li 3,0                  li 3,0
>        blr                     blr
>
> BEFORE is the code that results if stmt.c expands the switch to bit
> tests (i.e. PHI-OPT never gets to transform the code as shown), and
> AFTER is with my equivalent GIMPLE implementation. Apparently, the
> optimizers are unable to recover from the transformation PHI-OPT
> performs.
>
> I am not sure how to fix this problem. I am somewhat surprised by the
> code generated by the powerpc backend for "t=(int)(_Bool)some_bool",
> because I would have expected the type range for _Bool to be <0,1> so
> that the type conversion should be a single bit test. On the other
> hand, maybe PHI-OPT should recognize this pattern and reject the
> transformation???
>
> Your thoughts/comments/suggestions, please?
>
> Ciao!
> Steven
>
>
> P.S. Unfortunately I haven't been able to produce a test case that
> shows the problem without my switch conversion pass.

int foo (_Bool b)
{
  if (b)
    return 1;
  else
    return 0;
}

PHI-OPT tries to do conditional replacement, thus transform

 bb0:
  if (cond) goto bb2; else goto bb1;
 bb1:
 bb2:
  x = PHI <0 (bb1), 1 (bb0), ...>;

to

 bb0:
  x' = cond;
  goto bb2;
 bb2:
  x = PHI ;

trying to save a compare (assuming the target has a set-cc like instruction).

I think the ppc backend should be fixed here (if possible), or the generic
expansion of this kind of pattern needs to improve.  On x86_64 we simply
do

(insn 7 6 8 (set (reg:SI 63 [ D.1715 ])
(zero_extend:SI (reg/v:QI 61 [ b ]))) t.c:4 -1
 (nil))

Richard.
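
At the source level, the conditional replacement described above amounts
to the following. This is a minimal sketch with made-up function names;
it only illustrates the before and after shapes and is not how the
phiopt pass is implemented.

```c
#include <assert.h>
#include <stdbool.h>

/* Before: the result is selected by control flow (a PHI of 1 and 0).  */
int
select_with_branch (bool cond)
{
  int x;

  if (cond)
    x = 1;
  else
    x = 0;
  return x;
}

/* After: the result is computed directly from the condition, which a
   target with a set-cc style instruction can do without a branch.  */
int
select_with_conversion (bool cond)
{
  int x = (int) cond;

  return x;
}

/* Both forms agree for the only two values a bool can hold.  */
void
check_conditional_replacement (void)
{
  assert (select_with_branch (false) == select_with_conversion (false));
  assert (select_with_branch (true) == select_with_conversion (true));
}
```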


Re: A case where PHI-OPT pessimizes the code

2012-04-23 Thread Steven Bosscher
On Mon, Apr 23, 2012 at 2:27 PM, Richard Guenther
 wrote:
> int foo (_Bool b)
> {
>  if (b)
>    return 1;
>  else
>    return 0;
> }

Indeed PHI-OPT performs the transformation on this code, too. But the
resulting code on powerpc64 is fine:

[stevenb@gcc1-power7 gcc]$ cat t.c.149t.optimized

;; Function foo (foo, funcdef_no=0, decl_uid=1996, cgraph_uid=0)

foo (_Bool b)
{
  int D.2006;

:
  D.2006_4 = (int) b_2(D);
  return D.2006_4;

}


[stevenb@gcc1-power7 gcc]$ cat t.s
        .file   "t.c"
        .section        ".toc","aw"
        .section        ".text"
        .align 2
        .p2align 4,,15
        .globl foo
        .section        ".opd","aw"
        .align 3
foo:
        .quad   .L.foo,.TOC.@tocbase,0
        .previous
        .type   foo, @function
.L.foo:
        blr
        .long 0
        .byte 0,0,0,0,0,0,0,0
        .size   foo,.-.L.foo
        .ident  "GCC: (GNU) 4.8.0 20120418 (experimental) [trunk revision 186580]"
[stevenb@gcc1-power7 gcc]$


However, this C test case shows the problem:

[stevenb@gcc1-power7 gcc]$ head -n 24 t.c
#define ONEUL (1UL)

int
foo (long unsigned int a)
{
  _Bool b;
  long unsigned int cst, csui;

  if (a > 27) goto return_zero;
/*cst = 217579583UL;*/
  cst = (ONEUL <<  0) | (ONEUL <<  1) | (ONEUL <<  2) | (ONEUL <<  3)
        | (ONEUL <<  4) | (ONEUL <<  5) | (ONEUL << 19) | (ONEUL << 20)
        | (ONEUL << 21) | (ONEUL << 22) | (ONEUL << 23) | (ONEUL << 26)
        | (ONEUL << 27);
  csui = (ONEUL << a);
  b = ((csui & cst) != 0);
  if (b)
    return 1;
  else
    return 0;

return_zero:
   return 0;
}

[stevenb@gcc1-power7 gcc]$ ./cc1 -quiet -O2 -fdump-tree-all t.c
[stevenb@gcc1-power7 gcc]$ cat t.s
        .file   "t.c"
        .section        ".toc","aw"
        .section        ".text"
        .align 2
        .p2align 4,,15
        .globl foo
        .section        ".opd","aw"
        .align 3
foo:
        .quad   .L.foo,.TOC.@tocbase,0
        .previous
        .type   foo, @function
.L.foo:
        cmpldi 7,3,27
        bgt 7,.L3
        li 10,1
        lis 9,0xcf8
        sld 3,10,3
        ori 9,9,63
        and. 10,3,9
        mfcr 9
        rlwinm 9,9,3,1
        xori 3,9,1
        blr
        .p2align 4,,15
.L3:
.L2:
        li 3,0
        blr
        .long 0
        .byte 0,0,0,0,0,0,0,0
        .size   foo,.-.L.foo
        .ident  "GCC: (GNU) 4.8.0 20120418 (experimental) [trunk revision 186580]"


I will file a PR for this later today, maybe after trying on a few
other targets to see if this is a middle-end problem or a target
issue.

Ciao!
Steven


Failed access check

2012-04-23 Thread Peter A. Felvegi

Hello,

clang gave an error on code that has compiled with gcc so far. The reduced
test case is:


8<8<8<8<---
class V;

struct E
{
    E(const V& v_);

    char* c;
    V* v;
    int i;
};

class V
{
private:
    union {
        char* c;
        struct {
            V* v;
            int i;
        };
    };
};

E::E(const V& v_) :
    c(v_.c), // line 25
    v(v_.v),
    i(v_.i)
{
}

8<8<8<8<---

Tried with gcc 4.4, 4.5, 4.6, 4.7, 4.8, all gave the same error:

gcc-4.8 -c gccaccessbug.cpp -Wall -Wextra
gccaccessbug.cpp: In constructor ‘E::E(const V&)’:
gccaccessbug.cpp:16:9: error: ‘char* Vc’ is private
gccaccessbug.cpp:25:7: error: within this context

Line 25 is where E::c is initialized. V::c is private, so the error is
warranted. However, V::v and V::i are also private, yet no diagnostic is
given for them. If I comment out 'c(v_.c)', the source compiles w/o error.


Should I file a bug report? Checked BZ for 'access control', but found 
nothing relevant, only bugs related to templated code.


Regards, Peter



Is it possible to get unpartitioned LTO with Fortran?

2012-04-23 Thread AJM-2

Hi,

I have a simple IPA pass that requires access to all function bodies in a
program.  For C and small Fortran programs doing this at link time causes no
issues.  However, when I attempt to compile a larger Fortran program the
pass is called multiple times at link time, each time with only a portion of
the function bodies available.  Is there any way to force unpartitioned (i.e.
single execution) of my pass on these Fortran programs, with access to all
the function bodies?

I am using GCC 4.7 with gold linker.

Cheers,
Andrew
-- 
View this message in context: 
http://old.nabble.com/Is-it-possible-to-get-unpartitioned-LTO-with-Fortran--tp33732132p33732132.html
Sent from the gcc - Dev mailing list archive at Nabble.com.



Re: Is it possible to get unpartitioned LTO with Fortran?

2012-04-23 Thread Jan Hubicka
> 
> Hi,
> 
> I have a simple IPA pass that requires access to all function bodies in a
> program.  For C and small Fortran programs doing this at link time causes no
> issues.  However, when I attempt to compile a larger Fortran program the
> pass is called multiple times at link time, each time with only a portion of
> the function bodies available.  Is there anyway to force unpartitioned (i.e.
> single execution) of my pass on these Fortran programs, with access to all
> the function bodies?
> 
> I am using GCC 4.7 with gold linker.

-flto-partition=none should work.
Honza
> 
> Cheers,
> Andrew
> -- 
> View this message in context: 
> http://old.nabble.com/Is-it-possible-to-get-unpartitioned-LTO-with-Fortran--tp33732132p33732132.html
> Sent from the gcc - Dev mailing list archive at Nabble.com.


Re: A case where PHI-OPT pessimizes the code

2012-04-23 Thread Alan Modra
On Mon, Apr 23, 2012 at 02:50:13PM +0200, Steven Bosscher wrote:
>   csui = (ONEUL << a);
>   b = ((csui & cst) != 0);
>   if (b)
> return 1;
>   else
> return 0;

We (powerpc) would be much better if this were

   csui = (ONEUL << a);
   return (csui & cst) >> a;

Other targets would probably benefit too.
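
The suggested form can be checked against the branchy one for every value
that passes the range check. This is a minimal sketch: the function
names and the CASE_MASK macro are made up for illustration, and the mask
is the constant from the test case. Since csui has exactly one bit set,
(csui & mask) is either 0 or 1 << a, so shifting right by a yields 0 or 1.

```c
#include <assert.h>

#define ONEUL (1UL)
#define CASE_MASK 217579583UL   /* bits 0-5, 19-23, 26-27 */

/* Form from the test case: compare against zero, then branch.  */
int
bit_test_branch (unsigned long a)
{
  unsigned long csui;
  _Bool b;

  assert (a <= 27);             /* caller has done the range check */
  csui = ONEUL << a;
  b = (csui & CASE_MASK) != 0;
  if (b)
    return 1;
  else
    return 0;
}

/* Suggested form: shift the selected bit back down to bit 0.  */
int
bit_test_shift (unsigned long a)
{
  unsigned long csui;

  assert (a <= 27);
  csui = ONEUL << a;
  return (int) ((csui & CASE_MASK) >> a);
}
```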

-- 
Alan Modra
Australia Development Lab, IBM


Re: A case where PHI-OPT pessimizes the code

2012-04-23 Thread Jeff Law

On 04/23/2012 06:27 AM, Richard Guenther wrote:

On Mon, Apr 23, 2012 at 2:15 PM, Steven Bosscher  wrote:

Hello,

I ported the code to expand switch() statements with bit tests from
stmt.c to GIMPLE, and looked at the resulting code to verify that the
transformation applies correctly, when I noticed this strange PHI-OPT
transformation that results in worse code for the test case of PR45830
which looks like this:

int
foo (int *a)
{
  switch (*a)
{
case 0: case 1:case 2:case 3:case 4:case 5:
case 19:case 20:case 21:case 22:case 23:
case 26:case 27:
  return 1;
default:
  return 0;
}
}


After transforming the switch() to a series of bit tests, the code
looks like this:


;; Function foo (foo, funcdef_no=0, decl_uid=1996, cgraph_uid=0)

beginning to process the following SWITCH statement (pr45830.c:8) : ---
switch (D.2013_3), case 0 ... 5:, case 19 ...
23:, case 26 ... 27:>

  expanding as bit test is preferableSwitch converted

foo (int * a)
{
  _Bool D.2023;
  long unsigned int D.2022;
  long unsigned int D.2021;
  long unsigned int csui.1;
  _Bool D.2019;
  int D.2014;
  int D.2013;

:
  D.2013_3 = *a_2(D);
  D.2019_5 = D.2013_3>  27;
  if (D.2019_5 != 0)
goto  ();
  else
goto;

:
  D.2021_7 = (long unsigned int) D.2013_3;
  csui.1_4 = 1<<  D.2021_7;
  D.2022_8 = csui.1_4&  217579583;
  D.2023_9 = D.2022_8 != 0;
  if (D.2023_9 != 0)
goto  ();
  else
goto;

:

:

  # D.2014_1 = PHI<1(5), 0(3)>
:
  return D.2014_1;

}


This is the equivalent code of what the expander in stmt.c would
generate. Unfortunately, the first PHI-OPT pass (phiopt1) changes the
code as follows:

;; Function foo (foo, funcdef_no=0, decl_uid=1996, cgraph_uid=0)

Removing basic block 4
foo (int * a)
{
  int D.2026;
  _Bool D.2025;
  long unsigned int D.2022;
  long unsigned int D.2021;
  long unsigned int csui.1;
  int D.2014;
  int D.2013;

:
  D.2013_3 = *a_2(D);
  if (D.2013_3>  27)
goto  ();
  else
goto;

:
  D.2021_7 = (long unsigned int) D.2013_3;
  csui.1_4 = 1<<  D.2021_7;
  D.2022_8 = csui.1_4&  217579583;
  D.2025_9 = D.2022_8 != 0;
  D.2026_5 = (int) D.2025_9;  // new statement

  # D.2014_1 = PHI  // modified PHI node
:
  return D.2014_1;

}


This results in worse code on powerpc64:

BEFORE  AFTER
foo:foo:
lwz 9,0(3)  lwz 9,0(3)
cmplwi 7,9,27 | cmpwi 7,9,27
bgt 7,.L4 | bgt 7,.L3
li 8,1| li 3,1
lis 10,0xcf8lis 10,0xcf8
sld 9,8,9 | sld 9,3,9
ori 10,10,63ori 10,10,63
and. 8,9,10 and. 8,9,10
li 3,1| mfcr 10
bnelr 0   | rlwinm 10,10,3,1
.L4:  | xori 3,10,1
  >   blr
  >   .p2align 4,,15
  >  .L3:
li 3,0  li 3,0
blr blr

BEFORE is the code that results if stmt.c expands the switch to bit
tests (i.e. PHI-OPT never gets to transform the code as shown), and
AFTER is with my equivalent GIMPLE implementation. Apparently, the
optimizers are unable to recover from the transformation PHI-OPT
performs.

I am not sure how to fix this problem. I am somewhat surprised by the
code generated by the powerpc backend for "t=(int)(_Bool)some_bool",
because I would have expected the type range for _Bool to be<0,1>  so
that the type conversion should be a single bit test. On the other
hand, maybe PHI-OPT should recognize this pattern and reject the
transformation???

Your thoughts/comments/suggestions, please?

Ciao!
Steven


P.S. Unfortunately I haven't been able to produce a test case that
shows the problem without my switch conversion pass.


int foo (_Bool b)
{
   if (b)
 return 1;
   else
 return 0;
}

PHI-OPT tries to do conditional replacement, thus transform

  bb0:
   if (cond) goto bb2; else goto bb1;
  bb1:
  bb2:
   x = PHI<0 (bb1), 1 (bb0), ...>;

to

  bb0:
   x' = cond;
   goto bb2;
  bb2:
   x = PHI;

trying to save a compare (assuming the target has a set-cc like instruction).

I think the ppc backend should be fixed here (if possible), or the generic
expansion of this kind of pattern needs to improve.  On x86_64 we simply
do

(insn 7 6 8 (set (reg:SI 63 [ D.1715 ])
 (zero_extend:SI (reg/v:QI 61 [ b ]))) t.c:4 -1
  (nil))

FWIW, there's a patch buried in a BZ where phi-opt is extended to
eliminate PHIs using casts, arithmetic, etc.  I never followed up on it
because my tests showed that it wasn't a win.  It might be possible to
retask those bits to improve this code.


jeff

ps.  It was related to missing a conditional move in a loop, so a search 
for missing cmov or something like that might find the bug.  Alternately 
it was probably attached to the 4

Re: old archives from 1998

2012-04-23 Thread Jeff Law

On 04/22/2012 11:43 AM, Ian Lance Taylor wrote:

> When EGCS and GCC merged back together again, the changes made to the
> FSF version of GCC (that is, the non-EGCS version) were put into
> FSFChangelog, which is where you found them.  There was no attempt to
> copy all the entries from FSFChangeLog to the regular ChangeLog files,
> so it is not surprising that the change is not there.

If I remember correctly, we just copied the ChangeLog from the old gcc2
tree to FSFChangeLog at each import.  We made no attempt to weave the
egcs & gcc2 ChangeLogs together.

> As I recall most changes to the FSF version of GCC were discussed on the
> gcc2 mailing list.  But I might be misremembering.  And I can't find any
> archives of the gcc2 mailing list anywhere.

Cygnus kept some archives of the old gcc2 development list, but never
released them as the list had always been considered.  I doubt I could
even find those old archives anymore.


Jeff


MIPS: 2'nd pass of ira, causes weird register allocation for 2-op mult

2012-04-23 Thread Klaus Pedersen
The summary goes something like this:

It is possible for the second pass of ira to get confused and decide that
NO_REGS or a hard float register is a better choice for the result of the
2-operand mult, even though the first pass had already allocated it
optimally in GR_AND_MD1_REGS.

Two pass ira is enabled with "-fexpensive-optimizations".


Below is the code that will provoke the problem (pre-processed fixed point
function from libgcc)

-8<
typedef unsigned long size_t;
typedef int HItype __attribute__ ((mode (HI)));
typedef unsigned int UHItype __attribute__ ((mode (HI)));
typedef _Fract HQtype __attribute__ ((mode (HQ)));
typedef unsigned _Fract UHQtype __attribute__ ((mode (UHQ)));
typedef int SItype __attribute__ ((mode (SI)));
typedef unsigned int USItype __attribute__ ((mode (SI)));
extern void *memcpy (void *, const void *, size_t);
extern USItype __saturate1uhq (USItype);

UHQtype
__mulhelperuhq (UHQtype a, UHQtype b, int satp)
{
  UHQtype c;
  UHItype x, y, z;
  USItype dx, dy, dz;

  memcpy (&x, &a, 2);
  memcpy (&y, &b, 2);
  dx = (USItype) x;
  dy = (USItype) y;
  dz = dx * dy;
  dz += ((USItype) 1 << (16 - 1));
  dz = dz >> 16;
  if (satp)
    dz = __saturate1uhq (dz);
  z = (UHItype) dz;

  memcpy (&c, &z, 2);
  return c;
}
-8<
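
The arithmetic in __mulhelperuhq is an unsigned 16-bit fractional
multiply with rounding. The following is a minimal sketch in portable C:
it uses uint16_t in place of the _Fract types and leaves out the
saturation call, and the name mul_uhq_round is made up for illustration.
The rounding constant 1 << 15 is half of the 16 bits that the final
shift discards.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two unsigned 0.16 fixed-point values and round to nearest.
   The saturation step (satp) of the original is omitted.  */
uint16_t
mul_uhq_round (uint16_t x, uint16_t y)
{
  /* Widen first so the product is computed in 32 bits, unsigned.  */
  uint32_t dz = (uint32_t) x * (uint32_t) y;

  /* The largest product is 0xFFFE0001, so adding the rounding
     constant cannot wrap around.  */
  assert (dz <= UINT32_MAX - 0x8000u);
  dz += (uint32_t) 1 << (16 - 1);
  dz >>= 16;
  return (uint16_t) dz;
}
```

For example, 0x8000 represents 0.5, and mul_uhq_round (0x8000, 0x8000)
returns 0x4000, which represents 0.25.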

Compiling with -O1 gives pretty optimal code (check out the impressive
optimization of memcpy()):

-8<
        .file   1 "u1.c"
        .section .mdebug.abi32
        .previous
        .gnu_attribute 4, 3

 # -G value = 8, Arch = mips1, ISA = 1
 # GNU C version 4.7.0 (mips-sde-elf)
 #  compiled by GNU C version 4.6.3 20120306 (Red Hat 4.6.3-2), GMP version 4.3.2, MPFR version 3.0.0, MPC version 0.9
 # GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
 # options passed:  u1.c -mno-mips16 -O1 -march=mips1 -fdump-tree-all
 # -fdump-ipa-all -ftree-vectorizer-verbose=9 -fdump-rtl-all -fverbose-asm
 # -frandom-seed=0 -O1 -msoft-float -fno-expensive-optimizations
...
__mulhelperuhq:
        .frame  $sp,24,$31      # vars= 0, regs= 1/0, args= 16, gp= 0
        .mask   0x8000,-4
        .fmask  0x,0
        .set    noreorder
        .set    nomacro
        andi    $5,$5,0x        # b, b
        andi    $4,$4,0x        # a, a
        mult    $5,$4           # b, a
        mflo    $2              # dz
        li      $3,32768        # 0x8000 # tmp209,
        addu    $2,$2,$3        # dz, dz, tmp209
        beq     $6,$0,.L5       #, satp,,
        srl     $2,$2,16        # dz, dz,

        addiu   $sp,$sp,-24     #,,
        sw      $31,20($sp)     #,
        jal     __saturate1uhq  #
        move    $4,$2           #, dz

        lw      $31,20($sp)     #,
        addiu   $sp,$sp,24      #,,
.L5:
        j       $31
        nop
...
.ident  "GCC: 4.7.0"
-8<

It looks as optimal as it gets...

Unfortunately, when enabling -fexpensive-optimizations the code gets really
bad:

-8<
...
 # options passed:  u1.c -mno-mips16 -O1 -march=mips1 -fdump-tree-all
 # -fdump-ipa-all -ftree-vectorizer-verbose=9 -fdump-rtl-all -fverbose-asm
 # -frandom-seed=0 -O1 -msoft-float -fexpensive-optimizations
...
__mulhelperuhq:
        .frame  $sp,32,$31      # vars= 8, regs= 1/0, args= 16, gp= 0
...
        addiu   $sp,$sp,-32     # <<< set up stack frame
        sw      $31,28($sp)     # <<< save link reg
        andi    $5,$5,0x        # b, b
        andi    $4,$4,0x        # a, a
        mult    $5,$4           # b, a
        mflo    $2              # <<< move from mdlo
        sw      $2,16($sp)      # <<< store mdlo on the stack
        li      $2,32768
        mflo    $3              # <<< move from mdlo again!
        addu    $2,$3,$2        # dz,, tmp209
        beq     $6,$0,.L2       #, satp,,
        srl     $2,$2,16        # dz, dz,

        jal     __saturate1uhq
        move    $4,$2           #, dz

.L2:
        lw      $31,28($sp)
        nop
        j       $31
        addiu   $sp,$sp,32
...
-8<

Here two additional instructions, which fetch mdlo and store it on the
stack, have been added. Notice how the valid mdlo value is overwritten and
then immediately reloaded, and how 16($sp) is never actually used:

        mflo    $2              # <<< move from mdlo
        sw      $2,16($sp)      # <<< store mdlo on the stack
        li      $2,32768
        mflo    $3              # <<< move from mdlo again!


The problem seems to originate in find_costs_and_classes() (ira-costs.c),
when the second pass fails to find something better than the first pass.

One reason for this could be the way mflo is penalized:

-8<
static int
mips_move_to_gpr_cost (enum machine_mode mode ATTRIBUTE_UNUSED,
   reg_class_t from)
{
  switch (from)
{
case G

Re: [SPAM] Re: Attempting changes to the GIMPLifier

2012-04-23 Thread nkavv

Hi Richard

>> 1. Printing global variables.
>
> Look at the cgraph (.000i.cgraph) dump.

>> 2. Preserving function arguments (what I call an "interface").
>
> I think we do that now.

Thank you very much! I'll grab the latest release and have a look.

Best regards,
Nikolaos Kavvadias

>> Both 1 and 2 are not currently addressed, at least in the gcc-4.5.1 and
>> gcc-4.6.0 gimplifiers that I work with.
>>
>> Is this information available in internal data structures so I can expose it
>> via use of the GIMPLE API?
>>
>> I've also noticed inconsistencies among GIMPLE dumps produced following
>> different optimizations, but this is another topic.
>>
>> Thanks in advance.
>>
>> Best regards,
>> Nikolaos Kavvadias


Re: A case where PHI-OPT pessimizes the code

2012-04-23 Thread Steven Bosscher
On Mon, Apr 23, 2012 at 4:43 PM, Alan Modra  wrote:
> On Mon, Apr 23, 2012 at 02:50:13PM +0200, Steven Bosscher wrote:
>>   csui = (ONEUL << a);
>>   b = ((csui & cst) != 0);
>>   if (b)
>> return 1;
>>   else
>> return 0;
>
> We (powerpc) would be much better if this were
>
>   csui = (ONEUL << a);
>   return (csui & cst) >> a;
>
> Other targets would probably benefit too.

Yes, this has been discussed before. See here:

  http://gcc.gnu.org/ml/gcc-patches/2003-01/msg01791.html
  http://gcc.gnu.org/ml/gcc-patches/2003-01/msg01950.html

However, like Roger, I would prefer to not implement this right now. I
only want to port the code from stmt.c to GIMPLE, at least initially.
Later on, we could look at different code generation approaches for
this kind of switch() statement.

Ciao!
Steven


Re: target specific builtin expansion (middle end and back end definition inconsistence problem?).

2012-04-23 Thread Ian Lance Taylor
Feng LI  writes:

> Hi Ian,
>
> 2012/4/22 Ian Lance Taylor :
>> Feng LI  writes:
>>
>>> Yes, you are right. But how could I reference to a backend defined builtin
>>> function in the middle end (I need to generate the builtin function in the
>>> middle end and expand it in x86 backend)?
>>
>> If the function doesn't have a machine-independent definition, then use
>> a target hook.
>
> Then I remove the duplicate builtin definition in x86 backend.
> I define the builtin function with built_in_class as BUILT_IN_MD in
> builtins.def.

Sorry, I meant use a target hook to actually generate the call
expression.  The target hook can refer to the target-specific builtin
function.

Ian


Re: A case where PHI-OPT pessimizes the code

2012-04-23 Thread Steven Bosscher
On Mon, Apr 23, 2012 at 2:50 PM, Steven Bosscher  wrote:
> I will file a PR for this later today, maybe after trying on a few
> other targets to see if this is a middle-end problem or a target
> issue.

This is now PR target/53087 (http://gcc.gnu.org/PR53087).

Actually the poor code looks to be coming from the &-operation, not
from the _Bool->int conversion. But I don't know enough powerpc-speak
to be sure...

Ciao!
Steven


Re: target specific builtin expansion (middle end and back end definition inconsistence problem?).

2012-04-23 Thread Feng LI
Hi Ian,

2012/4/23 Ian Lance Taylor :
> Feng LI  writes:
>
>> Hi Ian,
>>
>> 2012/4/22 Ian Lance Taylor :
>>> Feng LI  writes:
>>>
>>>> Yes, you are right. But how could I reference to a backend defined builtin
>>>> function in the middle end (I need to generate the builtin function in the
>>>> middle end and expand it in x86 backend)?
>>>
>>> If the function doesn't have a machine-independent definition, then use
>>> a target hook.
>>
>> Then I remove the duplicate builtin definition in x86 backend.
>> I define the builtin function with built_in_class as BUILT_IN_MD in
>> builtins.def.
>
> Sorry, I meant use a target hook to actually generate the call
> expression.  The target hook can refer to the target-specific builtin
> function.
Just to confirm, do you mean calling this hook:
targetm.builtin_decl (unsigned code, bool initialized_p)
in the middle end to get the builtin definition from the backend?

This is probably a silly question, but when are the backend builtin
functions initialized? I'm referring to it in the GCC middle end, near the
OpenMP expansion pass (omp-low.c).

Thanks,
Feng
>
> Ian


Re: Failed access check

2012-04-23 Thread Ian Lance Taylor
"Peter A. Felvegi"  writes:

> Should I file a bug report?

Yes, please.  Thanks.

Ian


Re: target specific builtin expansion (middle end and back end definition inconsistence problem?).

2012-04-23 Thread Ian Lance Taylor
Feng LI  writes:

>>>>> Yes, you are right. But how could I reference to a backend defined builtin
>>>>> function in the middle end (I need to generate the builtin function in the
>>>>> middle end and expand it in x86 backend)?
>>>>
>>>> If the function doesn't have a machine-independent definition, then use
>>>> a target hook.
>>>
>>> Then I remove the duplicate builtin definition in x86 backend.
>>> I define the builtin function with built_in_class as BUILT_IN_MD in
>>> builtins.def.
>>
>> Sorry, I meant use a target hook to actually generate the call
>> expression.  The target hook can refer to the target-specific builtin
>> function.
> Just for confirmation, do you mean by calling this hook:
> targetm.builtin_decl (unsigned code, bool initialized_p)
> in the middle end for getting the builtin definition in the backend?
>
> Probably I'm asking a silly question, when is the time of the initialization
> of the backend builtin functions. I'm referring to it in gcc middle end, near
> OPENMP expansion (omp-low.c) pass.

No, I mean adding a new target hook build_my_magic_call and calling
that.  That target hook would build a call to the function.

You haven't really described the background, so I suppose I don't know
if this is appropriate.  It's not the right approach if you want to
contribute this back to GCC mainline, but then of course GCC mainline
also doesn't want a target-specific function in builtins.def.
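
A rough illustration of the hook idea, as a self-contained mock rather
than real GCC code: the middle end only calls through a hook table, and
the backend decides which builtin the call refers to. Everything here
is an assumption made for illustration: struct call_expr stands in for
GCC's tree, struct target_hooks stands in for targetm, and the builtin
name "__builtin_tcreate_x86" is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a GCC CALL_EXPR tree node (hypothetical, illustration only). */
struct call_expr {
    const char *callee;   /* name of the function being called */
    int arg;              /* single argument, to keep the mock small */
};

/* Stand-in for the targetm hook table; only the new hook is modelled. */
struct target_hooks {
    /* Fills *out and returns 1 if the target can build the call, else 0. */
    int (*build_my_magic_call)(int arg, struct call_expr *out);
};

/* Backend side: the only place that knows the target-specific builtin. */
int x86_build_my_magic_call(int arg, struct call_expr *out)
{
    out->callee = "__builtin_tcreate_x86";   /* hypothetical builtin name */
    out->arg = arg;
    return 1;
}

/* Default for targets without such a builtin: report "not supported". */
int default_build_my_magic_call(int arg, struct call_expr *out)
{
    (void) arg;
    (void) out;
    return 0;
}

/* Middle-end side: never mentions a target builtin code, only the hook. */
int lower_magic(const struct target_hooks *t, int arg, struct call_expr *out)
{
    assert(t != NULL && t->build_my_magic_call != NULL && out != NULL);
    return t->build_my_magic_call(arg, out);
}
```

With this split, the two function-code enums never meet: the middle
end asks the target for a finished call and does not look at
DECL_FUNCTION_CODE itself.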

Ian


avr-size -C --format=avr options

2012-04-23 Thread Clive Webster
I've previously used WinAVR, and its port of gcc has an 'enhancement' in
avr-size: the option '-C' or '--format=avr', which also requires you to
pass the chip type, for example '--mmcu=atmega328p'. This shows the total
amount of flash required (i.e. code + data) and RAM (i.e. data + bss) and
compares these against the capacity of the chip you have specified. This is
very useful in build scripts to fail code that is too big to upload to the
processor.
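
To make the arithmetic concrete, here is a minimal sketch of the check
described above. It is not the WinAVR patch; the struct and function
names are made up for illustration, and the section sizes would have to
come from whatever tool reports them. The ATmega328P limits used in the
note below (32768 bytes of flash, 2048 bytes of SRAM) are the chip's
nominal sizes and ignore any space reserved for a bootloader.

```c
#include <assert.h>
#include <stddef.h>

/* Section sizes in bytes, as reported by a size-like tool. */
struct section_sizes {
    unsigned long text;
    unsigned long data;
    unsigned long bss;
};

/* Capacity of the target chip in bytes. */
struct chip_limits {
    unsigned long flash;
    unsigned long ram;
};

/* Flash holds the code plus the initial values of .data. */
unsigned long flash_used(const struct section_sizes *s)
{
    return s->text + s->data;
}

/* RAM holds initialized data plus zero-initialized data. */
unsigned long ram_used(const struct section_sizes *s)
{
    return s->data + s->bss;
}

/* Returns 1 if the image fits the chip, 0 otherwise. */
int image_fits(const struct section_sizes *s, const struct chip_limits *c)
{
    assert(s != NULL && c != NULL);
    return flash_used(s) <= c->flash && ram_used(s) <= c->ram;
}
```

A build script could call image_fits() with limits of { 32768, 2048 }
for an ATmega328P and fail the build when it returns 0.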

I'm trying to create a newer version of the AVR toolchain as WinAVR is now
quite old (January 2010!) - but need to work around this issue. 

I'm not sure if it was functionality the WinAVR folk added themselves,
whether it's a 'no longer supported' option in gcc, or if it originates
somewhere else!

Anyone know?



Re: RFC: Add STB_GNU_SECONDARY

2012-04-23 Thread H.J. Lu
On Sat, Apr 21, 2012 at 12:01 PM, Joern Rennecke  wrote:
> Quoting "H.J. Lu" :
>
>> Putting our own foo in a section with a special prefix in section name,
>> like .secondary_*, works with linker support.  But it isn't very reliable.
>
>
> In what way is requiring linker support for STB_GNU_SECONDARY more reliable
> than requiring linker support for .secondary_* sections?

It may conflict with the section __attribute__ in C source and the section
directive in assembly code.

-- 
H.J.


Re: Failed access check

2012-04-23 Thread Jonathan Wakely
On 23 April 2012 18:48, Ian Lance Taylor wrote:
> "Peter A. Felvegi"  writes:
>
>> Should I file a bug report?
>
> Yes, please.  Thanks.

Please check it's not http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24926 first


Re: target specific builtin expansion (middle end and back end definition inconsistence problem?).

2012-04-23 Thread Richard Guenther
On Mon, Apr 23, 2012 at 8:03 PM, Ian Lance Taylor  wrote:
> Feng LI  writes:
>
>>>>>> Yes, you are right. But how could I reference to a backend defined builtin
>>>>>> function in the middle end (I need to generate the builtin function in the
>>>>>> middle end and expand it in x86 backend)?
>>>>>
>>>>> If the function doesn't have a machine-independent definition, then use
>>>>> a target hook.
>>>>
>>>> Then I remove the duplicate builtin definition in x86 backend.
>>>> I define the builtin function with built_in_class as BUILT_IN_MD in
>>>> builtins.def.
>>>
>>> Sorry, I meant use a target hook to actually generate the call
>>> expression.  The target hook can refer to the target-specific builtin
>>> function.
>> Just for confirmation, do you mean by calling this hook:
>> targetm.builtin_decl (unsigned code, bool initialized_p)
>> in the middle end for getting the builtin definition in the backend?
>>
>> Probably I'm asking a silly question, when is the time of the initialization
>> of the backend builtin functions. I'm referring to it in gcc middle end, near
>> OPENMP expansion (omp-low.c) pass.
>
> No, I mean adding a new target hook build_my_magic_call and calling
> that.  That target hook would build a call to the function.
>
> You haven't really described the background, so I suppose I don't know
> if this is appropriate.  It's not the right approach if you want to
> contribute this back to GCC mainline, but then of course GCC mainline
> also doesn't want a target-specific function in builtins.def.

Just to add my 2 cents: target-specific builtins either cover a generic
concept (and thus can be created by the middle end via a target hook),
or they are completely target specific, in which case they are _not_
created by the middle end at all.

Richard.

> Ian


Re: avr-size -C --format=avr options

2012-04-23 Thread Georg-Johann Lay

Clive Webster wrote:

> I've previously used WinAVR and their port of gcc has an 'enhancement' in
> the avr-size which is option '-C' or '--format=avr' which also needs you to

Please note that this mailing list is about GCC development, not about
binutils.

size is not a part of GCC. size is part of GNU binutils.

> pass the chip type in '--mmcu=atmega328p' for example. This shows the total
> amount of flash required (ie code + data), and RAM (ie data + bss)  and
> compares these against the abilities of the chip you have specified. This is
> very useful in build scripts to fail code that is too big to upload to the
> processor.
>
> I'm trying to create a newer version of the AVR toolchain as WinAVR is now
> quite old (January 2010!) - but need to work around this issue.
>
> I'm not sure if it was functionality the WinAVR folk added themselves or
> whether it's a 'no longer supported' option in gcc  or if it originates
> somewhere else!

For patches applied to the tools shipped with WinAVR, see the respective
patches in the ./source folder of your WinAVR distribution.

For a list of files contained in your WinAVR distribution,
read ./WinAVR-manifest.log

For links to the projects, see the files ./source/SOURCE and
WinAVR-user-manual.txt in your WinAVR distribution.

Johann



Re: A case where PHI-OPT pessimizes the code

2012-04-23 Thread Alan Modra
On Mon, Apr 23, 2012 at 06:07:52PM +0200, Steven Bosscher wrote:
> On Mon, Apr 23, 2012 at 4:43 PM, Alan Modra  wrote:
> > On Mon, Apr 23, 2012 at 02:50:13PM +0200, Steven Bosscher wrote:
> >>   csui = (ONEUL << a);
> >>   b = ((csui & cst) != 0);
> >>   if (b)
> >>     return 1;
> >>   else
> >>     return 0;
> >
> > We (powerpc) would be much better if this were
> >
> >   csui = (ONEUL << a);
> >   return (csui & cst) >> a;
> >
> > Other targets would probably benefit too.
> 
> Yes, this has been discussed before. See here:
> 
>   http://gcc.gnu.org/ml/gcc-patches/2003-01/msg01791.html
>   http://gcc.gnu.org/ml/gcc-patches/2003-01/msg01950.html

I'm suggesting something slightly different to either of these.  I
realize it's probably not that easy to implement, and is really
outside the scope of the switch statement code you're working on, but
it would be nice if we could avoid the comparison.  On high end
powerpc machines, int -> cc -> int costs the equivalent of many
operations just on int.
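
For reference, the two forms can be compared in plain C. This is only a
sketch to show that they compute the same value (bit a of cst) for
every valid shift count; the function names are made up, ONEUL is
assumed to be 1UL, and nothing here says how GCC would or should
generate the second form.

```c
#include <assert.h>
#include <limits.h>

#define ONEUL 1UL   /* assumed definition of the constant in the quoted code */

/* Form from the quoted code: mask the bit, then compare against zero. */
int bit_test_compare(unsigned long cst, unsigned a)
{
    assert(a < sizeof(unsigned long) * CHAR_BIT);
    unsigned long csui = ONEUL << a;
    return (csui & cst) != 0;
}

/* Suggested form: mask the bit, then shift it down to position 0. */
int bit_test_shift(unsigned long cst, unsigned a)
{
    assert(a < sizeof(unsigned long) * CHAR_BIT);
    unsigned long csui = ONEUL << a;
    return (int) ((csui & cst) >> a);
}

/* Returns 1 if both forms agree for every valid shift count. */
int forms_agree(unsigned long cst)
{
    unsigned a;
    for (a = 0; a < sizeof(unsigned long) * CHAR_BIT; a++)
        if (bit_test_compare(cst, a) != bit_test_shift(cst, a))
            return 0;
    return 1;
}
```

The shift form never needs a condition-code result, which is the cost
being discussed here.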

(In the powerpc code you showed, the comparison is folded into the
AND, emitted as "and.", and the move from cc is "mfcr; rlwinm; xori".
"and." isn't cheap and "mfcr" is relatively expensive.)

-- 
Alan Modra
Australia Development Lab, IBM