
Re: [ping][PATCH][1 of 2] Add value range info to SSA_NAME for zero sign extension elimination in RTL

2013-09-02 Thread Kugan

I'd like to ping this patch (1 of 2), which removes redundant zero/sign
extensions using value range information.


Bootstrapped with no new regressions on x86_64-unknown-linux-gnu and
arm-none-linux-gnueabi.


Thank you for your time.
Kugan

On 14/08/13 16:49, Kugan wrote:

Hi Richard,

Here is an attempt to address your earlier review comments. Bootstrapped
with no new regressions on x86_64 and ARM. Thank you very much
for your time.

Thanks,
Kugan

--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,25 @@
+2013-08-14  Kugan Vivekanandarajah  
+
+* tree-flow.h (mark_range_info_unknown): New function definition.
+* tree-ssa-alias.c (dump_alias_info): Check pointer type.
+* tree-ssa-copy.c (fini_copy_prop): Check pointer type and copy
+range info.
+* tree-ssanames.c (make_ssa_name_fn): Check pointer type in
+initialize.
+* (mark_range_info_unknown): New function.
+* (duplicate_ssa_name_range_info): Likewise.
+* (duplicate_ssa_name_fn): Check pointer type and call correct
+duplicate function.
+* tree-vrp.c (extract_exp_value_range): New function.
+* (simplify_stmt_using_ranges): Call extract_exp_value_range and
+tree_ssa_set_value_range.
+* tree-ssaname.c (ssa_range_info): New function.
+* tree.h (SSA_NAME_PTR_INFO): Changed to access via union.
+* tree.h (SSA_NAME_RANGE_INFO): New macro.
+* gimple-pretty-print.c (print_double_int): New function.
+* gimple-pretty-print.c (dump_gimple_phi): Dump range info.
+* (pp_gimple_stmt_1): Likewise.
+
   2013-08-09  Jan Hubicka  

   * cgraph.c (cgraph_create_edge_1): Clear speculative flag.

On 03/07/13 21:55, Kugan wrote:

On 17/06/13 18:33, Richard Biener wrote:

On Mon, 17 Jun 2013, Kugan wrote:
+/* Extract the value range of the assigned expression for a GIMPLE_ASSIGN stmt.
+   If the extracted value range is valid, return true else return
+   false.  */
+static bool
+extract_exp_value_range (gimple stmt, value_range_t *vr)
+{
+  gcc_assert (is_gimple_assign (stmt));
+  tree rhs1 = gimple_assign_rhs1 (stmt);
+  tree lhs = gimple_assign_lhs (stmt);
+  enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
...
@@ -8960,6 +9016,23 @@ simplify_stmt_using_ranges (gimple_stmt_iterator *gsi)
     {
       enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
       tree rhs1 = gimple_assign_rhs1 (stmt);
+      tree lhs = gimple_assign_lhs (stmt);
+
+      /* Set value range information for ssa.  */
+      if (!POINTER_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt)))
+          && (TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME)
+          && INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt)))
+          && !SSA_NAME_RANGE_INFO (lhs))
+        {
+          value_range_t vr = VR_INITIALIZER;
...
+          if (extract_exp_value_range (stmt, &vr))
+            tree_ssa_set_value_range (lhs,
+                                      tree_to_double_int (vr.min),
+                                      tree_to_double_int (vr.max),
+                                      vr.type == VR_RANGE);
+        }

This looks overly complicated to me.  In vrp_finalize you can simply do

   for (i = 0; i < num_vr_values; i++)
     if (vr_value[i])
       {
         tree name = ssa_name (i);
         if (POINTER_TYPE_P (TREE_TYPE (name)))
           continue;
         if (vr_value[i]->type == VR_RANGE
             || vr_value[i]->type == VR_ANTI_RANGE)
           tree_ssa_set_value_range (name,
                                     tree_to_double_int (vr_value[i]->min),
                                     tree_to_double_int (vr_value[i]->max),
                                     vr_value[i]->type == VR_RANGE);
       }



Thanks, Richard, for taking the time to review it.

I was doing something like what you suggest earlier, but I noticed
some problems, which is why I changed it.

For example, for the following testcase from the test suite,

unsigned long l = (unsigned long)-2;
unsigned short s;

int main () {
   long t = l + 1;
   s = l;
   if (s != (unsigned short) -2)
 abort ();
   exit (0);
}

with the following gimple stmts

main ()
{
   short unsigned int s.1;
   long unsigned int l.0;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
   l.0_2 = l;
   s.1_3 = (short unsigned int) l.0_2;
   s = s.1_3;
   if (s.1_3 != 65534)
     goto <bb 3>;
   else
     goto <bb 4>;
;;succ:   3
;;4

;;   basic block 3, loop depth 0
;;pred:   2
   abort ();
;;succ:

;;   basic block 4, loop depth 0
;;pred:   2
   exit (0);
;;succ:

}



has the following value range.

l.0_2: VARYING
s.1_3: [0, +INF]


From the zero/sign extension point of view, s.1_3 is expected to have a
range that overflows (or is VARYING), because a wider value is assigned
to a narrower variable. extract_range_from_assignment initially
calculates the value range as VARYING, but extract_range_basic later
changes it to [0, +INF]. What I need here is the range of the value
produced by the rhs expression, not the range the lhs has after the
assignment.

I understa

Re: [PATCH][2 of 2] RTL expansion for zero sign extension elimination with VRP

2013-09-02 Thread Kugan

I'd like to ping this patch (2 of 2), which removes redundant zero/sign
extensions using value range information.


Bootstrapped with no new regressions on x86_64-unknown-linux-gnu and
arm-none-linux-gnueabi.


Thank you for your time.
Kugan

On 14/08/13 16:59, Kugan wrote:

Hi Eric,

Thanks for reviewing the patch.

On 01/07/13 18:51, Eric Botcazou wrote:

[Sorry for the delay]


For example, when an expression is evaluated and its value is assigned
to a variable of type short, the generated RTL would look something like
the following.

(set (reg:SI 110)
   (zero_extend:SI (subreg:HI (reg:SI 117) 0)))

However, if value range propagation can say for certain that the value
of the expression in register 117 is within the limits of short and
there is no sign conversion, we do not need to perform the subreg and
zero_extend; instead we can generate the following RTL.

(set (reg:SI 110)
   (reg:SI 117))

Same could be done for other assign statements.
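
As a rough illustration of the check described above (this is not code
from the patch), here is a minimal sketch in plain C. The helper names
zero_ext_is_redundant and truncate_then_zero_extend are hypothetical, and
plain 64-bit integers stand in for the range that VRP would supply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: return true when every value
   in [min, max] already fits in an unsigned type of PRECISION bits, so
   truncating to that type and zero-extending back changes nothing.  */
static bool
zero_ext_is_redundant (int64_t min, int64_t max, unsigned precision)
{
  assert (precision > 0 && precision < 63);
  int64_t type_max = ((int64_t) 1 << precision) - 1;
  return min >= 0 && min <= max && max <= type_max;
}

/* Models (zero_extend:SI (subreg:HI x 0)) on a 32-bit value.  */
static uint32_t
truncate_then_zero_extend (uint32_t x)
{
  return (uint32_t) (uint16_t) x;
}
```

When zero_ext_is_redundant holds for a 16-bit precision,
truncate_then_zero_extend returns its argument unchanged for every value
in the range, which is the situation in which the plain register move
would be enough.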


The idea looks interesting.  Some remarks:


+2013-06-03  Kugan Vivekanandarajah  
+
+* gcc/dojump.c (do_compare_and_jump): generates rtl without
+zero/sign extension if redundant.
+* gcc/cfgexpand.c (expand_gimple_stmt_1): Likewise.
+* gcc/gimple.c (gimple_assign_is_zero_sign_ext_redundant) : New
+function.
+* gcc/gimple.h (gimple_assign_is_zero_sign_ext_redundant) : New
+function definition.


No gcc/ prefix in entries for gcc/ChangeLog.  "Generate RTL without..."


I have now changed it to:

--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,13 @@
+2013-08-14  Kugan Vivekanandarajah  
+
+* dojump.c (do_compare_and_jump): Generate rtl without
+zero/sign extension if redundant.
+* cfgexpand.c (expand_gimple_stmt_1): Likewise.
+* gimple.c (gimple_assign_is_zero_sign_ext_redundant) : New
+function.
+* gimple.h (gimple_assign_is_zero_sign_ext_redundant) : New
+function definition.
+
  2013-08-09  Jan Hubicka  

  * cgraph.c (cgraph_create_edge_1): Clear speculative flag.



+/* If the value in SUBREG of temp fits that SUBREG (does not
+   overflow) and is assigned to target SUBREG of the same mode
+   without sign conversion, we can skip the SUBREG
+   and extension.  */
+else if (promoted
+         && gimple_assign_is_zero_sign_ext_redundant (stmt)
+         && (GET_CODE (temp) == SUBREG)
+         && (GET_MODE (target) == GET_MODE (temp))
+         && (GET_MODE (SUBREG_REG (target))
+             == GET_MODE (SUBREG_REG (temp))))
+  emit_move_insn (SUBREG_REG (target), SUBREG_REG (temp));
  else if (promoted)
{
  int unsignedp = SUBREG_PROMOTED_UNSIGNED_P (target);

Can we relax the strict mode equality here?  This change augments the
same transformation applied to the RHS when it is also a
SUBREG_PROMOTED_VAR_P at the beginning of convert_move, but the
condition on the mode is less strict in the latter case, so maybe it
can serve as a model here.



I have now changed it, based on convert_move, to:
+else if (promoted
+         && gimple_assign_is_zero_sign_ext_redundant (stmt)
+         && (GET_CODE (temp) == SUBREG)
+         && (GET_MODE_PRECISION (GET_MODE (SUBREG_REG (temp)))
+             >= GET_MODE_PRECISION (GET_MODE (target)))
+         && (GET_MODE (SUBREG_REG (target))
+             == GET_MODE (SUBREG_REG (temp))))
+  {

Is this what you wanted me to do?


+  /* Is zero/sign extension redundant as per VRP.  */
+  bool op0_ext_redundant = false;
+  bool op1_ext_redundant = false;
+
+  /* If promoted and the value in SUBREG of op0 fits (does not overflow),
+     it is a candidate for extension elimination.  */
+  if (GET_CODE (op0) == SUBREG && SUBREG_PROMOTED_VAR_P (op0))
+    op0_ext_redundant =
+      gimple_assign_is_zero_sign_ext_redundant (SSA_NAME_DEF_STMT (treeop0));
+
+  /* If promoted and the value in SUBREG of op1 fits (does not overflow),
+     it is a candidate for extension elimination.  */
+  if (GET_CODE (op1) == SUBREG && SUBREG_PROMOTED_VAR_P (op1))
+    op1_ext_redundant =
+      gimple_assign_is_zero_sign_ext_redundant (SSA_NAME_DEF_STMT (treeop1));

Are the gimple_assign_is_zero_sign_ext_redundant checks necessary here?
When set on a SUBREG, SUBREG_PROMOTED_VAR_P guarantees that SUBREG_REG is
always properly extended (otherwise it's a bug) so don't you just need to
compare SUBREG_PROMOTED_UNSIGNED_SET?  See do_jump for an existing case.


I am sorry, I don’t think I understood you here. How would I know that
the extension is redundant without calling
gimple_assign_is_zero_sign_ext_redundant? Could you please elaborate?



+  /* If zero/sign extension is r

Re: [ping][PATCH][1 of 2] Add value range info to SSA_NAME for zero sign extension elimination in RTL

2013-09-03 Thread Kugan


Thanks Richard for reviewing.

On 02/09/13 22:15, Richard Biener wrote:

On Wed, Jul 3, 2013 at 2:25 PM, Kugan  wrote:

On 17/06/13 18:33, Richard Biener wrote:


On Mon, 17 Jun 2013, Kugan wrote:
[snip]


I don't understand this.  The relevant statement is

   s.1_3 = (short unsigned int) l.0_2;

right?  You have value-ranges for both s.1_3 and l.0_2 as above.  And
you clearly cannot optimize the truncation away (and if you could,
you wouldn't need value-range information for that fact).

This is true. But just by looking at the value range of s.1_3 we will
only see [0, +INF], as we are transferring its value range directly from
the lattice to the lhs.


Here [0, +INF] only tells us that the maximum satisfies vrp_val_is_max
and is not is_positive_overflow_infinity (or varying). That's why we need
the value range of the RHS expression, which will tell us the actual
range. We can then use this range and see if it fits the lhs type
without truncation.



I understand that the above code of mine needs to be changed, but I am
not convinced about the best way to do that.

I can possibly refactor extract_range_from_assignment to give me this
information with an additional argument. Could you kindly let me know your
preference?





/* SSA name annotations.  */

+  union vrp_info_type {
+    /* Pointer attributes used for alias analysis.  */
+    struct GTY ((tag ("TREE_SSA_PTR_INFO"))) ptr_info_def *ptr_info;
+    /* Value range attributes used for zero/sign extension elimination.  */

/* Value range information.  */

+    struct GTY ((tag ("TREE_SSA_RANGE_INFO"))) range_info_def *range_info;
+  } GTY ((desc ("%1.def_stmt && !POINTER_TYPE_P (TREE_TYPE ((tree)&%1))"))) vrp;

Why do you need to test %1.def_stmt here?

Re: [ping][PATCH][1 of 2] Add value range info to SSA_NAME for zero sign extension elimination in RTL

2013-09-08 Thread Kugan



On 06/09/13 16:16, Richard Biener wrote:

On 9/3/13 2:15 PM, Kugan wrote:

Thanks Richard for reviewing.

On 02/09/13 22:15, Richard Biener wrote:

On Wed, Jul 3, 2013 at 2:25 PM, Kugan
 wrote:

On 17/06/13 18:33, Richard Biener wrote:


On Mon, 17 Jun 2013, Kugan wrote:


[snip]





Well, my point is you want to look at the l.0_2 value-range for this.
Storing the l.0_2 value-range for s.1_3 is wrong.



Yes, a tree SSA_NAME should have its correct value range. But assigning
the rhs expression's value range is not totally wrong; it is just that it
can be a conservative value range (please correct me if I am wrong here)
in a few cases, as it can be wider.


I can use the rhs value range in the above case. We can also eliminate
redundant zero/sign extensions for gimple binary and ternary stmts. In
that case we will have to calculate the value range, which means reusing
the logic in tree-vrp.


Another option is to add an attribute to range_info_t to indicate whether
set_value_range_to_nonnegative was used in value range extraction.


What is your preferred solution, please?




Why do you need to test %1.def_stmt here?




I have seen some tree_ssa_name nodes with a NULL def_stmt. That's why I
added this.
Is that something that should never happen?


It should never happen - they should have a GIMPLE_NOP.



I am seeing def_stmt of NULL for TREE_NOTHROW node.
debug_tree dumps the following in this case:

def_stmt

 version 11 in-free-list>

Re: [ping][PATCH][1 of 2] Add value range info to SSA_NAME for zero sign extension elimination in RTL

2013-09-09 Thread Kugan


On 09/09/13 19:01, Richard Biener wrote:

On Mon, Sep 9, 2013 at 1:09 AM, Kugan  wrote:


On 06/09/13 16:16, Richard Biener wrote:


On 9/3/13 2:15 PM, Kugan wrote:


Thanks Richard for reviewing.

On 02/09/13 22:15, Richard Biener wrote:


On Wed, Jul 3, 2013 at 2:25 PM, Kugan
 wrote:


On 17/06/13 18:33, Richard Biener wrote:



On Mon, 17 Jun 2013, Kugan wrote:



[snip]





Well, my point is you want to look at the l.0_2 value-range for this.
Storing the l.0_2 value-range for s.1_3 is wrong.



Yes, a tree SSA_NAME should have its correct value range. But assigning
the rhs expression's value range is not totally wrong; it is just that it
can be a conservative value range (please correct me if I am wrong here)
in a few cases, as it can be wider.


If it's a sign-changing conversion it can be surely wrong.



It is not a sign-changing conversion. Rather, when the rhs expression
has a value range of VR_VARYING, it is set to [0, +INF].



That is, in extract_range_from_assignment, if the value range is
VR_VARYING, the following is done:

  if (vr->type == VR_VARYING)
    extract_range_basic (vr, stmt);

In extract_range_basic (when the value range is varying), the following
code changes VR_VARYING to [0, +INF]:


  if (INTEGRAL_TYPE_P (type)
      && gimple_stmt_nonnegative_warnv_p (stmt, &sop))
    set_value_range_to_nonnegative (vr, type,
                                    sop || stmt_overflow_infinity (stmt));

This will happen only when we have VR_VARYING for the rhs expression.
This is wrong from the zero/sign extension elimination point of view, as
we cannot rely on this converted value range.



Currently I am leaving this as varying so that we can decide whether to 
eliminate the zero/sign extension. This is not completely wrong.


unsigned short s;
s.1_3 = (short unsigned int) l.0_2;
l.0_2: VARYING
s.1_3: [0, +INF]

Similarly (extracted from a testcase)

unsigned char _4;
unsigned char _2;
unsigned char _5;

  _5 = _4 + _2;
The value range that extract_range_from_binary_expr extracts for the
expression (_4 + _2) is VARYING, and _5 has the value range [0, +INF]
(that is, [0, 255]) after set_value_range_to_nonnegative is done.
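
To make the distinction concrete, here is a minimal sketch in plain C
(not GCC code). It assumes the sum is computed without wrapping, as C
does after promoting the operands to int; simple_range, range_add and
range_fits are hypothetical names, and long stands in for double_int.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for a value range; the real value_range_t in
   tree-vrp.c carries more information.  */
struct simple_range { long min; long max; };

/* Range of a + b computed without wrapping, that is, before the result
   is truncated to the type of the left-hand side.  */
static struct simple_range
range_add (struct simple_range a, struct simple_range b)
{
  struct simple_range r = { a.min + b.min, a.max + b.max };
  return r;
}

/* True when every value of R is representable in [type_min, type_max].  */
static bool
range_fits (struct simple_range r, long type_min, long type_max)
{
  return r.min >= type_min && r.max <= type_max;
}
```

With both operands in [0, 255] the untruncated sum lies in [0, 510],
which does not fit in unsigned char, so the truncation cannot be dropped
even though the range recorded for the result is [0, 255].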




I can use the rhs value ran

Re: [ping][PATCH][1 of 2] Add value range info to SSA_NAME for zero sign extension elimination in RTL

2013-09-10 Thread Kugan


On 10/09/13 22:47, Richard Biener wrote:

On Tue, 10 Sep 2013, Kugan wrote:


On 09/09/13 19:01, Richard Biener wrote:

On Mon, Sep 9, 2013 at 1:09 AM, Kugan 
wrote:


On 06/09/13 16:16, Richard Biener wrote:


On 9/3/13 2:15 PM, Kugan wrote:


Thanks Richard for reviewing.

On 02/09/13 22:15, Richard Biener wrote:


On Wed, Jul 3, 2013 at 2:25 PM, Kugan
 wrote:


On 17/06/13 18:33, Richard Biener wrote:



On Mon, 17 Jun 2013, Kugan wrote:



[snip]





unsigned short s;
s.1_3 = (short unsigned int) l.0_2;
l.0_2: VARYING
s.1_3: [0, +INF]


Note that [0, +INF] is the same as VARYING and [-INF, +INF] and VARYING for
l.0_2 is the same as [-INF, +INF].



I get it now. I will treat all of the above as varying.
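
Richard's point can be written down as a small check. This is only a
sketch in plain C, not GCC code; range_is_effectively_varying is a
hypothetical name, and unsigned long stands in for the bounds of an
unsigned type.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: a range that spans the whole type carries no more
   information than VARYING does.  */
static bool
range_is_effectively_varying (unsigned long min, unsigned long max,
                              unsigned long type_min, unsigned long type_max)
{
  return min <= type_min && max >= type_max;
}
```

For unsigned short, [0, +INF] is [0, USHRT_MAX], so the helper returns
true for it and false for any narrower range such as [0, 100].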


Similarly (extracted from a testcase)

unsigned char _4;

Re: [ping][PATCH][1 of 2] Add value range info to SSA_NAME for zero sign extension elimination in RTL

2013-09-10 Thread Kugan


Thanks Jakub for the review.

On 10/09/13 23:10, Jakub Jelinek wrote:

On Tue, Sep 10, 2013 at 03:17:50PM +0200, Richard Biener wrote:

unsigned short s;
s.1_3 = (short unsigned int) l.0_2;
l.0_2: VARYING
s.1_3: [0, +INF]


Note that [0, +INF] is the same as VARYING and [-INF, +INF] and VARYING for
l.0_2 is the same as [-INF, +INF].


Yeah, I don't see much value in differentiating between VR_VARYING and
VR_RANGE [TYPE_MIN_VALUE, TYPE_MAX_VALUE] (perhaps a question is what to do
for types with precisions different from TYPE_MODE's bitsize, if we should
store for VARYING/UNDEFINED a range of all possible values in the mode).
Unsigned type will be always >= 0, even if it is VARYING or UNDEFINED.
What is the valid bit good for?  Is it meant just for integrals with >
2*HOST_BITS_PER_WIDE_INT precision, which we can't represent in double_int?
I'd say we just don't want to keep track of the value ranges for those.


OK, I will remove the valid bit.


And, do we need to distinguish between VR_RANGE and VR_ANTI_RANGE?
I mean, can't we always store the range in VR_RANGE format?  Instead of
-[3,7] we'd store [8,2] and define that if the min double_int is bigger than
max double_int, then it is [min,+infinity] merged with [-infinity,max] range
(i.e. -[max+1,min-1])?
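
A minimal sketch of that encoding in plain C, for illustration only. The
names stored_range, encode_anti_range and range_contains are
hypothetical, long stands in for double_int, and overflow at the extremes
of the type is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* A single min/max pair; min > max means the range wraps around.  */
struct stored_range { long min; long max; };

/* Encode the anti-range ~[lo, hi] as the wrapped pair [hi + 1, lo - 1].  */
static struct stored_range
encode_anti_range (long lo, long hi)
{
  struct stored_range r = { hi + 1, lo - 1 };
  return r;
}

/* Membership test that handles both ordinary and wrapped ranges.  */
static bool
range_contains (struct stored_range r, long v)
{
  if (r.min <= r.max)
    return v >= r.min && v <= r.max;
  /* Wrapped: [min, +infinity] merged with [-infinity, max].  */
  return v >= r.min || v <= r.max;
}
```

Encoding ~[3, 7] this way gives the pair [8, 2]; range_contains then
rejects 3 through 7 and accepts everything else, so no separate
VR_ANTI_RANGE flag is needed to interpret the stored pair.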



Ok, I will change this too.


Thanks,
Kugan



Micha just suggested

   union vrp_info_type {
     /* Pointer attributes used for alias analysis.  */
     struct GTY ((tag ("0"))) ptr_info_def *ptr_info;
     /* Value range attributes used for zero/sign extension elimination.  */
     struct GTY ((tag ("1"))) range_info_def *range_info;
   } GTY ((desc ("%1.typed.type ? !POINTER_TYPE_P (TREE_TYPE


Why not TREE_TYPE(&%1) here and why the (tree) cast?

Jakub

Re: [ping][PATCH][1 of 2] Add value range info to SSA_NAME for zero sign extension elimination in RTL

2013-09-12 Thread Kugan



Here is the modified patch that addresses the comments from Richard and
Jakub.


This also includes:
1. Added TDF_RANGE to dump range_info
2. Moved enum value_range_type to tree.h (Is this the right place?)

Bootstrapped and regtested on x86_64-unknown-linux-gnu and
arm-none-linux-gnueabi.


Is this OK?

Thanks,
Kugan

+2013-09-12  Kugan Vivekanandarajah  
+
+   * cfgexpand.c (maybe_dump_rtl_for_gimple_stmt): Add range to dump.
+   * gimple-pretty-print.c (print_double_int): New function.
+   * gimple-pretty-print.c (dump_gimple_phi): Dump range info.
+   * (pp_gimple_stmt_1): Likewise.
+   * tree-ssa-alias.c (dump_alias_info): Check pointer type.
+   * tree-ssa-copy.c (fini_copy_prop): Check pointer type and copy
+   range info.
+   * tree-ssanames.c (make_ssa_name_fn): Check pointer type in
+   initialize.
+   * (set_range_info): New function.
+   * (get_range_info): Likewise.
+   * (duplicate_ssa_name_range_info): Likewise.
+   * (duplicate_ssa_name_fn): Check pointer type and call correct
+   duplicate function.
+   * tree-vrp.c (vrp_finalize): Call set_range_info to update
+   value range of SSA_NAMEs.
+   * tree.h (SSA_NAME_PTR_INFO): Changed to access via union.
+   * tree.h (SSA_NAME_RANGE_INFO): New macro.
+





diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index a7d9170..f3fdd49 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -1820,7 +1820,7 @@ maybe_dump_rtl_for_gimple_stmt (gimple stmt, rtx since)
 {
   fprintf (dump_file, "\n;; ");
   print_gimple_stmt (dump_file, stmt, 0,
-			 TDF_SLIM | (dump_flags & TDF_LINENO));
+			 TDF_SLIM | TDF_RANGE | (dump_flags & TDF_LINENO));
   fprintf (dump_file, "\n");
 
   print_rtl (dump_file, since ? NEXT_INSN (since) : since);
diff --git a/gcc/dumpfile.h b/gcc/dumpfile.h
index 77f5de6..354dd92 100644
--- a/gcc/dumpfile.h
+++ b/gcc/dumpfile.h
@@ -82,9 +82,10 @@ enum tree_dump_index
 #define TDF_CSELIB	(1 << 23)	/* Dump cselib details.  */
 #define TDF_SCEV	(1 << 24)	/* Dump SCEV details.  */
 #define TDF_COMMENT	(1 << 25)	/* Dump lines with prefix ";;"  */
-#define MSG_OPTIMIZED_LOCATIONS  (1 << 26)  /* -fopt-info optimized sources */
-#define MSG_MISSED_OPTIMIZATION  (1 << 27)  /* missed opportunities */
-#define MSG_NOTE (1 << 28)  /* general optimization info */
+#define TDF_RANGE   (1 << 26)   /* Dump range information.  */ 
+#define MSG_OPTIMIZED_LOCATIONS  (1 << 27)  /* -fopt-info optimized sources */
+#define MSG_MISSED_OPTIMIZATION  (1 << 28)  /* missed opportunities */
+#define MSG_NOTE (1 << 29)  /* general optimization info */
 #define MSG_ALL (MSG_OPTIMIZED_LOCATIONS | MSG_MISSED_OPTIMIZATION \
  | MSG_NOTE)
 
diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index 1d40680..af1a13d 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -1581,6 +1581,24 @@ dump_gimple_asm (pretty_printer *buffer, gimple gs, int spc, int flags)
 }
 }
 
+/* Dumps double_int CST to BUFFER.  */
+
+static void
+print_double_int (pretty_printer *buffer, double_int cst)
+{
+  tree node = double_int_to_tree (integer_type_node, cst);
+  if (TREE_INT_CST_HIGH (node) == 0)
+pp_printf (buffer, HOST_WIDE_INT_PRINT_UNSIGNED, TREE_INT_CST_LOW (node));
+  else if (TREE_INT_CST_HIGH (node) == -1
+   && TREE_INT_CST_LOW (node) != 0)
+pp_printf (buffer, "-" HOST_WIDE_INT_PRINT_UNSIGNED,
+   -TREE_INT_CST_LOW (node));
+  else
+pp_printf (buffer, "0x%" HOST_LONG_FORMAT "x%" HOST_LONG_FORMAT "x",
+   (unsigned HOST_WIDE_INT) TREE_INT_CST_HIGH (node),
+   (unsigned HOST_WIDE_INT) TREE_INT_CST_LOW (node));
+}
+
 
 /* Dump a PHI node PHI.  BUFFER, SPC and FLAGS are as in pp_gimple_stmt_1.
The caller is responsible for calling pp_flush on BUFFER to finalize
@@ -1609,6 +1627,27 @@ dump_gimple_phi (pretty_printer *buffer, gimple phi, int spc, int flags)
   pp_string (buffer, "# ");
 }
 
+  if ((flags & TDF_RANGE)
+  && !POINTER_TYPE_P (TREE_TYPE (lhs))
+  && SSA_NAME_RANGE_INFO (lhs))
+{
+  double_int min, max;
+  value_range_type range_type;
+  get_range_info (lhs, min, max, range_type);
+  if (range_type == VR_VARYING)
+pp_printf (buffer, "# RANGE  VR_VARYING");
+  else if (range_type == VR_RANGE || range_type == VR_ANTI_RANGE)
+  {
+pp_printf (buffer, "# RANGE ");
+pp_printf (buffer, "%s[", range_type == VR_RANGE ? "" : "~");
+print_double_int (buffer, min);
+pp_printf (buffer, ", ");
+print_double_int (buffer, max);
+pp_printf (buffer, "]");
+newline_and_

Re: [ping][PATCH][1 of 2] Add value range info to SSA_NAME for zero sign extension elimination in RTL

2013-09-15 Thread Kugan


Hi,

Updated the patch for the latest changes in trunk that split tree.h. I
also noticed an error in printing double_int and fixed it.


Is this OK?

Thanks,
Kugan


+2013-09-12  Kugan Vivekanandarajah  
+
+   * cfgexpand.c (maybe_dump_rtl_for_gimple_stmt) : Add range to dump.
+   * gimple-pretty-print.c (print_double_int) : New function.
+   * gimple-pretty-print.c (dump_gimple_phi) : Dump range info.
+   * (pp_gimple_stmt_1) : Likewise.
+   * tree-ssa-alias.c (dump_alias_info) : Check pointer type.
+   * tree-ssa-copy.c (fini_copy_prop) : Check pointer type and copy
+   range info.
+   * tree-ssanames.c (make_ssa_name_fn) : Check pointer type in
+   initialize.
+   * (set_range_info) : New function.
+   * (get_range_info) : Likewise.
+   * (duplicate_ssa_name_range_info) : Likewise.
+   * (duplicate_ssa_name_fn) : Check pointer type and call correct
+   duplicate function.
+   * tree-vrp.c (vrp_finalize): Call set_range_info to update
+   value range of SSA_NAMEs.
+   * tree.h (SSA_NAME_PTR_INFO) : changed to access via union
+   * tree.h (SSA_NAME_RANGE_INFO) : New macro
+


diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 88e48c2..302188e 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -1820,7 +1820,7 @@ maybe_dump_rtl_for_gimple_stmt (gimple stmt, rtx since)
 {
   fprintf (dump_file, "\n;; ");
   print_gimple_stmt (dump_file, stmt, 0,
-			 TDF_SLIM | (dump_flags & TDF_LINENO));
+			 TDF_SLIM | TDF_RANGE | (dump_flags & TDF_LINENO));
   fprintf (dump_file, "\n");
 
   print_rtl (dump_file, since ? NEXT_INSN (since) : since);
diff --git a/gcc/dumpfile.h b/gcc/dumpfile.h
index ddc770a..8896d89 100644
--- a/gcc/dumpfile.h
+++ b/gcc/dumpfile.h
@@ -83,9 +83,10 @@ enum tree_dump_index
 #define TDF_CSELIB	(1 << 23)	/* Dump cselib details.  */
 #define TDF_SCEV	(1 << 24)	/* Dump SCEV details.  */
 #define TDF_COMMENT	(1 << 25)	/* Dump lines with prefix ";;"  */
-#define MSG_OPTIMIZED_LOCATIONS  (1 << 26)  /* -fopt-info optimized sources */
-#define MSG_MISSED_OPTIMIZATION  (1 << 27)  /* missed opportunities */
-#define MSG_NOTE (1 << 28)  /* general optimization info */
+#define TDF_RANGE   (1 << 26)   /* Dump range information.  */
+#define MSG_OPTIMIZED_LOCATIONS  (1 << 27)  /* -fopt-info optimized sources */
+#define MSG_MISSED_OPTIMIZATION  (1 << 28)  /* missed opportunities */
+#define MSG_NOTE (1 << 29)  /* general optimization info */
 #define MSG_ALL (MSG_OPTIMIZED_LOCATIONS | MSG_MISSED_OPTIMIZATION \
  | MSG_NOTE)
 
diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index 01a1ab5..6531010 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -1600,6 +1600,25 @@ dump_gimple_asm (pretty_printer *buffer, gimple gs, int spc, int flags)
 }
 }
 
+/* Dumps double_int CST to BUFFER.  */
+
+static void
+print_double_int (pretty_printer *buffer, double_int cst)
+{
+  tree node = double_int_to_tree (integer_type_node, cst);
+  if (TREE_INT_CST_HIGH (node) == 0)
+pp_printf (buffer, HOST_WIDE_INT_PRINT_UNSIGNED, TREE_INT_CST_LOW (node));
+  else if (TREE_INT_CST_HIGH (node) == -1
+   && TREE_INT_CST_LOW (node) != 0)
+pp_printf (buffer, "-" HOST_WIDE_INT_PRINT_UNSIGNED,
+   -TREE_INT_CST_LOW (node));
+  else
+sprintf (pp_buffer (buffer)->digit_buffer,
+ HOST_WIDE_INT_PRINT_DOUBLE_HEX,
+ (unsigned HOST_WIDE_INT) TREE_INT_CST_HIGH (node),
+ (unsigned HOST_WIDE_INT) TREE_INT_CST_LOW (node));
+}
+
 
 /* Dump a PHI node PHI.  BUFFER, SPC and FLAGS are as in pp_gimple_stmt_1.
The caller is responsible for calling pp_flush on BUFFER to finalize
@@ -1628,6 +1647,27 @@ dump_gimple_phi (pretty_printer *buffer, gimple phi, int spc, int flags)
   pp_string (buffer, "# ");
 }
 
+  if ((flags & TDF_RANGE)
+  && !POINTER_TYPE_P (TREE_TYPE (lhs))
+  && SSA_NAME_RANGE_INFO (lhs))
+{
+  double_int min, max;
+  value_range_type range_type;
+  get_range_info (lhs, min, max, range_type);
+  if (range_type == VR_VARYING)
+pp_printf (buffer, "# RANGE  VR_VARYING");
+  else if (range_type == VR_RANGE || range_type == VR_ANTI_RANGE)
+  {
+pp_printf (buffer, "# RANGE ");
+pp_printf (buffer, "%s[", range_type == VR_RANGE ? "" : "~");
+print_double_int (buffer, min);
+pp_printf (buffer, ", ");
+print_double_int (buffer, max);
+pp_printf (buffer, "]");
+newline_and_indent (buffer, spc);
+  }
+}
+
   if (flags & TDF_RAW)
   dump_gimple_fmt (buffer, spc, flags, "%G <%T, ", phi,

Re: [ping][PATCH][1 of 2] Add value range info to SSA_NAME for zero sign extension elimination in RTL

2013-09-17 Thread Kugan



Thanks Richard for the review.
On 16/09/13 23:43, Richard Biener wrote:

On Mon, 16 Sep 2013, Kugan wrote:


Hi,

Updated the patch for the latest changes in trunk that split tree.h. I also
noticed an error in printing double_int and fixed it.

Is this OK?


print_gimple_stmt (dump_file, stmt, 0,
-TDF_SLIM | (dump_flags & TDF_LINENO));
+TDF_SLIM | TDF_RANGE | (dump_flags & TDF_LINENO));

This should be (dump_flags & (TDF_LINENO|TDF_RANGE)); do not always
dump range info.  I'd have simply re-used TDF_ALIAS (and interpreted
it as SSA annotation info), but adding -range to the dump file modifiers
is OK with me.
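
The difference between the two flag expressions can be sketched in isolation. This is a minimal illustration, not code from the patch: only the TDF_RANGE bit position (1 << 26) comes from the diff above, while the other bit values and the `ex_` names are assumptions made for the example.

```c
#include <assert.h>

/* Illustrative flag bits.  Only the TDF_RANGE position is taken from the
   patch; the other values are assumed for this sketch.  */
enum
{
  EX_TDF_SLIM = 1 << 1,
  EX_TDF_LINENO = 1 << 7,
  EX_TDF_RANGE = 1 << 26
};

/* Flags as built by the patch under review: the range bit is OR-ed in
   unconditionally, so range info is dumped for every statement.  */
int
ex_flags_always_range (int dump_flags)
{
  return EX_TDF_SLIM | EX_TDF_RANGE | (dump_flags & EX_TDF_LINENO);
}

/* Flags as suggested in the review: the range bit is passed through only
   if the caller's dump_flags already has it set.  */
int
ex_flags_on_request (int dump_flags)
{
  /* Flag words are treated as non-negative bit sets in this sketch.  */
  assert (dump_flags >= 0);
  return EX_TDF_SLIM | (dump_flags & (EX_TDF_LINENO | EX_TDF_RANGE));
}
```

With the second form, a dump requested without the range modifier stays unchanged, which appears to be the point of the review comment.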

+static void
+print_double_int (pretty_printer *buffer, double_int cst)
+{
+  tree node = double_int_to_tree (integer_type_node, cst);
+  if (TREE_INT_CST_HIGH (node) == 0)
+pp_printf (buffer, HOST_WIDE_INT_PRINT_UNSIGNED, TREE_INT_CST_LOW (node));
+  else if (TREE_INT_CST_HIGH (node) == -1
+   && TREE_INT_CST_LOW (node) != 0)
+pp_printf (buffer, "-" HOST_WIDE_INT_PRINT_UNSIGNED,
+   -TREE_INT_CST_LOW (node));
+  else
+sprintf (pp_buffer (buffer)->digit_buffer,
+ HOST_WIDE_INT_PRINT_DOUBLE_HEX,
+ (unsigned HOST_WIDE_INT) TREE_INT_CST_HIGH (node),
+ (unsigned HOST_WIDE_INT) TREE_INT_CST_LOW (node));

using sprintf here looks like a layering violation to me.  You
probably want to factor out code from the INTEGER_CST handling
of tree-pretty-print.c:dump_generic_node into a pp_double_int
function in pretty-print.[ch] instead.

@@ -1628,6 +1647,27 @@ dump_gimple_phi (pretty_printer *buffer, gimple phi, int spc, int flags)
pp_string (buffer, "# ");
  }

+  if ((flags & TDF_RANGE)
+  && !POINTER_TYPE_P (TREE_TYPE (lhs))
+  && SSA_NAME_RANGE_INFO (lhs))
+{
+  double_int min, max;
+  value_range_type range_type;

I realize the scheme is pre-existing but can you try factoring
out the dumping of SSA_NAME_PTR_INFO / SSA_NAME_RANGE_INFO into
a separate routine that can be shared by dump_gimple_phi and
pp_gimple_stmt_1?

+get_range_info (tree name, double_int &min, double_int &max,
+enum value_range_type &range_type)
+{
+  gcc_assert (!POINTER_TYPE_P (TREE_TYPE (name)));
+  gcc_assert (TREE_CODE (name) == SSA_NAME);
+  range_info_def *ri = SSA_NAME_RANGE_INFO (name);

the TREE_CODE (name) == SSA_NAME assert is redundant with the
tree-checking performed by SSA_NAME_RANGE_INFO.  Likewise in
the other functions.

+void
+get_range_info (tree name, double_int &min, double_int &max,
+enum value_range_type &range_type)

I'm not sure we want to use references.  Well - first time.

+  /* If min > max, it is  VR_ANTI_RANGE.  */
+  if (ri->min.scmp (ri->max) == 1)
+{

I think that's wrong and needs to be conditional on TYPE_UNSIGNED
of the SSA name.

+  else if (vr_value[i]->type == VR_ANTI_RANGE)
+{
+  /* VR_ANTI_RANGE ~[min, max] is encoded compactly as
+ [max + 1, min - 1] without additional attributes.
+ When min value > max value, we know that it is
+ VR_ANTI_RANGE; it is VR_RANGE otherwise.  */
+  set_range_info (name,
+  tree_to_double_int (vr_value[i]->max)
+  + double_int_one,
+  tree_to_double_int (vr_value[i]->min)
+  - double_int_one);

there is a complication for when max + 1 or min - 1 overflow - those
should be non-canonical ranges I think, but double-check this
(check set_and_canonicalize_value_range).

I have now added a check for min == 0 for unsigned types. AFAIK, for
double_int this is the only case we need to check.
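
The compact encoding discussed above can be tried out on plain integers. The following is only a sketch under stated assumptions: it uses `int64_t` in place of double_int, handles signed comparison only (the review notes that the real check must depend on TYPE_UNSIGNED), and all `ex_` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the stored range, using int64_t instead of
   double_int.  */
struct ex_range
{
  int64_t min;
  int64_t max;
};

/* Store the anti-range ~[lo, hi] as [hi + 1, lo - 1].  Returns false,
   storing nothing, when either bound would overflow.  */
bool
ex_encode_anti_range (int64_t lo, int64_t hi, struct ex_range *out)
{
  assert (lo <= hi);
  if (hi == INT64_MAX || lo == INT64_MIN)
    return false;
  out->min = hi + 1;
  out->max = lo - 1;
  return true;
}

/* Recover the bounds.  A stored min greater than the stored max marks an
   anti-range; the function returns true in that case.  */
bool
ex_decode (struct ex_range r, int64_t *lo, int64_t *hi)
{
  bool anti = r.min > r.max;
  *lo = anti ? r.max + 1 : r.min;
  *hi = anti ? r.min - 1 : r.max;
  return anti;
}
```

For example, ~[3, 5] is stored as [6, 2] and decodes back to ~[3, 5]; an anti-range touching the type limits cannot be stored this way, which is the overflow case raised in the review.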


I have also made the other changes you have asked me to do. Please find 
the modified patch and ChangeLog.


Bootstrapped and regtested for x86_64-unknown-linux-gnu.  Is this OK?

Thanks,
Kugan


+2013-09-17  Kugan Vivekanandarajah  
+
+   * gimple-pretty-print.c (dump_ssaname_info) : New function.
+   * gimple-pretty-print.c (dump_gimple_phi) : Dump range info.
+   * (pp_gimple_stmt_1) : Likewise.
+   * tree-pretty-print.c (dump_intger_cst_node) : New function.
+   * (dump_generic_node) : Call dump_intger_cst_node for INTEGER_CST.
+   * tree-ssa-alias.c (dump_alias_info) : Check pointer type.
+   * tree-ssa-copy.c (fini_copy_prop) : Check pointer type and copy
+   range info.
+   * tree-ssanames.c (make_ssa_name_fn) : Check pointer type in
+   initialize.
+   * (set_range_info) : New function.
+   * (get_range_info) : Likewise.
+   * (duplicate_ssa_name_range_info) : Likewise.
+   * (duplicate_ssa_name_fn) : Check pointer type and call correct
+   duplicate function.
+   * tree-vrp.c (vrp_finalize): Call set

Re: [ping][PATCH][1 of 2] Add value range info to SSA_NAME for zero sign extension elimination in RTL

2013-09-18 Thread Kugan


Thanks Richard for the review.

On 18/09/13 18:55, Richard Biener wrote:

On Wed, 18 Sep 2013, Kugan wrote:



Thanks Richard for the review.
On 16/09/13 23:43, Richard Biener wrote:

On Mon, 16 Sep 2013, Kugan wrote:



[Snip]



+2013-09-17  Kugan Vivekanandarajah  
+
+   * gimple-pretty-print.c (dump_ssaname_info) : New function.
+   * gimple-pretty-print.c (dump_gimple_phi) : Dump range info.
+   * (pp_gimple_stmt_1) : Likewise.


ChangeLog should be formatted

* gimple-pretty-print.c (dump_ssaname_info): New function.
(dump_gimple_phi): Call it.
(pp_gimple_stmt_1): Likewise.
* tree-pretty-print.c (dump_intger_cst_node): New function.
...


+pp_printf (buffer, "# RANGE ");
+pp_printf (buffer, "%s[", range_type == VR_RANGE ? "" : "~");
+dump_intger_cst_node (buffer,
+  double_int_to_tree (TREE_TYPE (node), min));

I was asking for a pp_double_int, not a dump_integer_cst_node function
as now you are creating a tree node in GC memory just to dump its
contents ...  pp_double_int needs to be passed information on the
signedness of the value.  It would roughly look like



Sorry, I understood it wrong.


void
pp_double_int (pretty_printer *pp, double_int d, bool uns)
{
  if (d.fits_shwi ())
    pp_wide_integer (pp, d.low);
  else if (d.fits_uhwi ())
    pp_unsigned_wide_integer (pp, d.low);
  else
    {
      unsigned HOST_WIDE_INT low = d.low;
      HOST_WIDE_INT high = d.high;
      if (!uns && d.is_negative ())
        {
          pp_minus (pp);
          high = ~high + !low;
          low = -low;
        }
      /* Would "%x%0*x" or "%x%*0x" get zero-padding on all
         systems?  */
      sprintf (pp_buffer (pp)->digit_buffer,
               HOST_WIDE_INT_PRINT_DOUBLE_HEX,
               (unsigned HOST_WIDE_INT) high, low);
      pp_string (pp, pp_buffer (pp)->digit_buffer);
    }
}

and the INTEGER_CST case would use it like

 if (TREE_CODE (TREE_TYPE (node)) == POINTER_TYPE)
   ...
 else
   pp_double_int (buffer, tree_to_double_int (node),
 TYPE_UNSIGNED (TREE_TYPE (node)));
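
The negation step in the sketch above (`high = ~high + !low; low = -low;`) is ordinary two's complement negation spread over two words. A small stand-alone illustration follows; the 32-bit word size and the `ex_` names are assumptions chosen so that the result can be compared against 64-bit arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Negate the two-word value (high:low) in place.  -x equals ~x + 1, and
   the +1 carries into the high word only when the low word is zero.  */
void
ex_negate_two_words (uint32_t *high, uint32_t *low)
{
  assert (high && low);
  *high = ~*high + (*low == 0);
  *low = 0u - *low;
}

/* Reference result: join the words and negate them as one 64-bit
   number.  */
uint64_t
ex_negate_joined (uint32_t high, uint32_t low)
{
  uint64_t v = ((uint64_t) high << 32) | low;
  return 0 - v;
}
```

Negating (0:1) gives all ones in both words, and negating (1:0) gives all ones in the high word with a zero low word, matching the joined 64-bit result in both cases.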


+enum value_range_type
+get_range_info (tree name, double_int *min, double_int *max)
+{

ah, I see you have already made an appropriate change here.

+ /* Check for an empty range with minimum zero (of type
+ unsigned) that will wrap around.  */
+  if (!(TYPE_UNSIGNED (TREE_TYPE (name))
+  && integer_zerop (vr_value[i]->min)))
+set_range_info (name,
+tree_to_double_int (vr_value[i]->max)
++ double_int_one,
+tree_to_double_int (vr_value[i]->min)
+- double_int_one);

Yeah, I think ~[0,0] is the only anti-range that can be represented as
range that we keep.  So maybe

if (TYPE_UNSIGNED (TREE_TYPE (name))
&& integer_zerop (vr_value[i]->min)
 && integer_zerop (vr_value[i]->max))
   set_range_info (name,
   double_int_one,
   double_int::max_value
  (TYPE_PRECISION (TREE_TYPE (name)), true));
else
   set_range_info (name,
+tree_to_double_int (vr_value[i]->max)
++ double_int_one,
+tree_to_double_int (vr_value[i]->min)
+- double_int_one);

to preserve ~[0,0] which looks like an important case when for example
looking at a divisor in a division.
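
To see why ~[0,0] on an unsigned name needs its own branch: the generic formula would store [0 + 1, 0 - 1], and 0 - 1 wraps, while the intended meaning is simply the ordinary range [1, max]. The sketch below shows that special case on `uint64_t`; `ex_unsigned_max` is a hypothetical stand-in for `double_int::max_value (precision, true)`, and the other names are likewise illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value of an unsigned type with PRECISION bits (1..64).  */
uint64_t
ex_unsigned_max (unsigned precision)
{
  assert (precision >= 1 && precision <= 64);
  return precision == 64 ? UINT64_MAX : (UINT64_C (1) << precision) - 1;
}

/* For an unsigned name, "anything but zero" is the plain range
   [1, max].  */
void
ex_nonzero_range (unsigned precision, uint64_t *min, uint64_t *max)
{
  *min = 1;
  *max = ex_unsigned_max (precision);
}

/* A consumer such as a division check only needs to know whether the
   stored range excludes zero.  */
int
ex_range_excludes_zero (uint64_t min, uint64_t max)
{
  return min > 0 && min <= max;
}
```

For a 16-bit unsigned name this yields [1, 65535], which a later pass could use to conclude that the divisor is never zero.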

Ok with those changes.



I have made all of the above changes in the attached patch and ChangeLog. If
this is OK, could someone please commit it for me? I don’t have commit
access.


Bootstrapped and regtested on x86_64-unknown-linux-gnu and
arm-none-linux-gnueabi.


Thanks,
Kugan


diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index ad70c24..6331636 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,24 @@
+2013-09-19  Kugan Vivekanandarajah  
+
+	* gimple-pretty-print.c (dump_ssaname_info) : New function.
+	* gimple-pretty-print.c (dump_gimple_phi) : Call dump_ssaname_info.
+	* (pp_gimple_stmt_1) : Likewise.
+	* tree-pretty-print.c (pp_double_int) : New function.
+	* (dump_generic_node) : Call pp_double_int.
+	* tree-ssa-alias.c (dump_alias_info) : Check pointer type.
+	* tree-ssanames.c (make_ssa_name_fn) : Check pointer type in
+	initialize.
+	* (set_range_info) : New function.
+	* (get_range_info) : Likewise.
+	* (duplicate_ssa_name_range_info) : Likewise.
+	* (duplicate_ssa_name_fn) : Check pointer type and call
+	duplicate_ssa_name_range_info.
+	*

Re: [ping][PATCH][1 of 2] Add value range info to SSA_NAME for zero sign extension elimination in RTL

2013-09-24 Thread Kugan


On 24/09/13 19:23, Richard Biener wrote:

On Mon, Sep 23, 2013 at 10:34 PM, Eric Botcazou  wrote:

I have committed it for you (rev 202831), with a few modifications
(ChangeLog formatting, typos).
Here is what I have committed:

2013-09-23  Kugan Vivekanandarajah  

 * gimple-pretty-print.c (dump_ssaname_info): New function.
 (dump_gimple_phi): Call it.
 (pp_gimple_stmt_1): Likewise.
 * tree-core.h (tree_ssa_name): New union ssa_name_info_type field.
 (range_info_def): Declare.
 * tree-pretty-print.c (pp_double_int): New function.
 (dump_generic_node): Call it.
 * tree-pretty-print.h (pp_double_int): Declare.
 * tree-ssa-alias.c (dump_alias_info): Check pointer type.
 * tree-ssanames.h (range_info_def): New structure.
 (value_range_type): Move definition here.
 (set_range_info, value_range_type, duplicate_ssa_name_range_info):
 Declare.
 * tree-ssanames.c (make_ssa_name_fn): Check pointer type at
 initialization.
 (set_range_info): New function.
 (get_range_info): Likewise.
 (duplicate_ssa_name_range_info): Likewise.
 (duplicate_ssa_name_fn): Check pointer type and call
 duplicate_ssa_name_range_info.
 * tree-ssa-copy.c (fini_copy_prop): Likewise.
 * tree-vrp.c (value_range_type): Remove definition, now in
 tree-ssanames.h.
 (vrp_finalize): Call set_range_info to update value range of
 SSA_NAMEs.
 * tree.h (SSA_NAME_PTR_INFO): Macro changed to access via union.
 (SSA_NAME_RANGE_INFO): New macro.


Nice patch, but the formatting is totally wrong wrt spaces, please reformat
using 2-space indentation and 8-space TABs, as already used in the files.



I am looking at everything and will send a patch to fix that.


The patch has also introduced 2 regressions in Ada:

 === acats tests ===
FAIL:   c37211b
FAIL:   c37211c

 === acats Summary ===
# of expected passes      2318
# of unexpected failures  2



I am sorry I missed this, as I didn't test Ada. I wrongly assumed that all
the front ends are enabled by default.




Program received signal SIGSEGV, Segmentation fault.
vrp_finalize () at /home/eric/svn/gcc/gcc/tree-vrp.c:9458
9458  if (POINTER_TYPE_P (TREE_TYPE (name))
(gdb) bt


I'm testing a trivial patch to fix that.

I think the return value of ssa_name () (i.e. name) can be NULL, so it
has to be checked for NULL. In tree-vrp.c it is not checked in some
other places related to debugging. In other places (e.g. in
tree-ssa-pre.c) there are checks.
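
The failure mode described here, and the guard that avoids it, can be shown with a small stand-alone loop. This is an assumption-laden sketch rather than the actual fix: `struct ex_name` and the table are hypothetical stand-ins for SSA names and for a table in which released slots are NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an SSA name; only the property needed for
   the example is modelled.  */
struct ex_name
{
  int is_pointer;
};

/* Count the non-pointer names, skipping empty slots before any field is
   read.  Without the NULL test, the first empty slot would be
   dereferenced, as in the backtrace quoted in this message.  */
size_t
ex_count_integral_names (struct ex_name *const *table, size_t n)
{
  size_t count = 0;
  assert (table || n == 0);
  for (size_t i = 0; i < n; i++)
    {
      const struct ex_name *name = table[i];
      if (!name)
        continue;
      if (!name->is_pointer)
        count++;
    }
  return count;
}
```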


Thanks for looking into it and I will wait for your fix.

Thanks,
Kugan



Richard.


#0  vrp_finalize () at /home/eric/svn/gcc/gcc/tree-vrp.c:9458
#1  execute_vrp () at /home/eric/svn/gcc/gcc/tree-vrp.c:9583
#2  (anonymous namespace)::pass_vrp::execute (this=)
 at /home/eric/svn/gcc/gcc/tree-vrp.c:9673
#3  0x00c52c9a in execute_one_pass (pass=pass@entry=0x22e2210)
 at /home/eric/svn/gcc/gcc/passes.c:2201
#4  0x00c52e76 in execute_pass_list (pass=0x22e2210)
 at /home/eric/svn/gcc/gcc/passes.c:2253
#5  0x00c52e88 in execute_pass_list (pass=0x22e04d0)
 at /home/eric/svn/gcc/gcc/passes.c:2254
#6  0x009b9c49 in expand_function (node=0x76d12e40)
 at /home/eric/svn/gcc/gcc/cgraphunit.c:1750
#7  0x009bbc17 in expand_all_functions ()
 at /home/eric/svn/gcc/gcc/cgraphunit.c:1855
#8  compile () at /home/eric/svn/gcc/gcc/cgraphunit.c:2192
#9  0x009bc1fa in finalize_compilation_unit ()
 at /home/eric/svn/gcc/gcc/cgraphunit.c:2269
#10 0x006681b5 in gnat_write_global_declarations ()
 at /home/eric/svn/gcc/gcc/ada/gcc-interface/utils.c:5630
#11 0x00d4577d in compile_file ()
 at /home/eric/svn/gcc/gcc/toplev.c:560
#12 0x00d4750a in do_compile () at
/home/eric/svn/gcc/gcc/toplev.c:1891
#13 toplev_main (argc=14, argv=0x7fffdca8)
 at /home/eric/svn/gcc/gcc/toplev.c:1967
#14 0x76f2a23d in __libc_start_main () from /lib64/libc.so.6
#15 0x00635381 in _start () at ../sysdeps/x86_64/elf/start.S:113
(gdb) p name
$1 = (tree) 0x0


--
Eric Botcazou

[ARM, PR58578] Split shift di patterns

2013-10-01 Thread Kugan

Hi,

I am attaching a patch that reverts the "Split shift di patterns" change
(r197527), as it introduced PR58578. I am also attaching a patch that adds
a testcase based on this failure.

No regression on qemu for arm-none-eabi and new testcase now passes.

Is this OK?

Thanks,
Kugan
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 7c9a6c5..abc545f 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,15 @@
+2013-10-01  Kugan Vivekanandarajah  
+
+   PR target/58578
+   Revert
+   2013-04-05  Greta Yorsh  
+   * config/arm/arm.md (arm_ashldi3_1bit):  define_insn into
+   define_insn_and_split.
+   (arm_ashrdi3_1bit,arm_lshrdi3_1bit): Likewise.
+   (shiftsi3_compare): New pattern.
+   (rrx): New pattern.
+   * config/arm/unspecs.md (UNSPEC_RRX): New.
+
 2013-09-30  Richard Sandiford  
 
* vec.h (vec_prefix, vec): Prefix member names with "m_".
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 3b22081..4b9f991 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2013-10-01  Kugan Vivekanandarajah  
+
+   PR Target/58578
+   * gcc.target/arm/pr58578.c: New test.
+
 2013-09-30  Jakub Jelinek  
 
PR middle-end/58564
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index b094cff..e8d5464 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -3867,26 +3867,13 @@
   "
 )
 
-(define_insn_and_split "arm_ashldi3_1bit"
+(define_insn "arm_ashldi3_1bit"
   [(set (match_operand:DI 0 "s_register_operand" "=r,&r")
 (ashift:DI (match_operand:DI 1 "s_register_operand" "0,r")
(const_int 1)))
(clobber (reg:CC CC_REGNUM))]
   "TARGET_32BIT"
-  "#"   ; "movs\\t%Q0, %Q1, asl #1\;adc\\t%R0, %R1, %R1"
-  "&& reload_completed"
-  [(parallel [(set (reg:CC CC_REGNUM)
-  (compare:CC (ashift:SI (match_dup 1) (const_int 1))
-   (const_int 0)))
- (set (match_dup 0) (ashift:SI (match_dup 1) (const_int 1)))])
-   (set (match_dup 2) (plus:SI (plus:SI (match_dup 3) (match_dup 3))
-  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
-  {
-operands[2] = gen_highpart (SImode, operands[0]);
-operands[0] = gen_lowpart (SImode, operands[0]);
-operands[3] = gen_highpart (SImode, operands[1]);
-operands[1] = gen_lowpart (SImode, operands[1]);
-  }
+  "movs\\t%Q0, %Q1, asl #1\;adc\\t%R0, %R1, %R1"
   [(set_attr "conds" "clob")
(set_attr "length" "8")
(set_attr "type" "multiple")]
@@ -3964,43 +3951,18 @@
   "
 )
 
-(define_insn_and_split "arm_ashrdi3_1bit"
+(define_insn "arm_ashrdi3_1bit"
   [(set (match_operand:DI  0 "s_register_operand" "=r,&r")
 (ashiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
  (const_int 1)))
(clobber (reg:CC CC_REGNUM))]
   "TARGET_32BIT"
-  "#"   ; "movs\\t%R0, %R1, asr #1\;mov\\t%Q0, %Q1, rrx"
-  "&& reload_completed"
-  [(parallel [(set (reg:CC CC_REGNUM)
-   (compare:CC (ashiftrt:SI (match_dup 3) (const_int 1))
-   (const_int 0)))
-  (set (match_dup 2) (ashiftrt:SI (match_dup 3) (const_int 1)))])
-   (set (match_dup 0) (unspec:SI [(match_dup 1)
-  (reg:CC_C CC_REGNUM)]
- UNSPEC_RRX))]
-  {
-operands[2] = gen_highpart (SImode, operands[0]);
-operands[0] = gen_lowpart (SImode, operands[0]);
-operands[3] = gen_highpart (SImode, operands[1]);
-operands[1] = gen_lowpart (SImode, operands[1]);
-  }
+  "movs\\t%R0, %R1, asr #1\;mov\\t%Q0, %Q1, rrx"
   [(set_attr "conds" "clob")
(set_attr "length" "8")
(set_attr "type" "multiple")]
 )
 
-(define_insn "*rrx"
-  [(set (match_operand:SI 0 "s_register_operand" "=r")
-(unspec:SI [(match_operand:SI 1 "s_register_operand" "r")
-(reg:CC_C CC_REGNUM)]
-   UNSPEC_RRX))]
-  "TARGET_32BIT"
-  "mov\\t%0, %1, rrx"
-  [(set_attr "conds" "use")
-   (set_attr "type" "mov_shift")]
-)
-
 (define_expand "ashrsi3"
   [(set (match_operand:SI  0 "s_register_operand" "")
(ashiftrt:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4070,27 +4032,13 @@
   "
 )
 
-(define_insn_and_split "arm_lshrdi3_1bit"
+(define_insn "arm_lshrdi3_1bit"
   [(set (match_operand:DI  0 "s_register_operand" "=r,&r&q

Re: [PATCH][2 of 2] RTL expansion for zero sign extension elimination with VRP

2013-10-08 Thread Kugan

Ping~

Thanks,
Kugan

+2013-09-25  Kugan Vivekanandarajah  
+
+   * dojump.c (do_compare_and_jump): Generate rtl without
+   zero/sign extension if redundant.
+   * cfgexpand.c (expand_gimple_stmt_1): Likewise.
+   * gimple.c (gimple_assign_is_zero_sign_ext_redundant) : New
+   function.
+   * gimple.h (gimple_assign_is_zero_sign_ext_redundant) : Declare.
+


On 26/09/13 18:04, Kugan Vivekanandarajah wrote:
> Hi,
> 
> This is the updated patch for expanding gimple stmts without zero/sign
> extensions when it is safe to do so. It is based on the
> latest changes to propagating value range information to SSA_NAMEs
> and addresses review comments from Eric.
> 
> Bootstrapped and regtested on x86_64-unknown-linux-gnu and
> arm-none-linux-gnueabi. Is this OK?
> 
> Thanks,
> Kugan
> 

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 88e48c2..6a22f8b 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -2311,6 +2311,20 @@ expand_gimple_stmt_1 (gimple stmt)
 
 	if (temp == target)
 	  ;
+	/* If the value in SUBREG of temp fits that SUBREG (does not
+	   overflow) and is assigned to target SUBREG of the same mode
+	   without sign conversion, we can skip the SUBREG
+	   and extension.  */
+	else if (promoted
+		 && gimple_assign_is_zero_sign_ext_redundant (stmt)
+		 && (GET_CODE (temp) == SUBREG)
+		 && (GET_MODE_PRECISION (GET_MODE (SUBREG_REG (temp)))
+			 >= GET_MODE_PRECISION (GET_MODE (target)))
+		 && (GET_MODE (SUBREG_REG (target))
+			 == GET_MODE (SUBREG_REG (temp))))
+	  {
+		emit_move_insn (SUBREG_REG (target), SUBREG_REG (temp));
+	  }
 	else if (promoted)
 	  {
 		int unsignedp = SUBREG_PROMOTED_UNSIGNED_P (target);
diff --git a/gcc/dojump.c b/gcc/dojump.c
index 3f04eac..9ea5995 100644
--- a/gcc/dojump.c
+++ b/gcc/dojump.c
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ggc.h"
 #include "basic-block.h"
 #include "tm_p.h"
+#include "gimple.h"
 
 static bool prefer_and_bit_test (enum machine_mode, int);
 static void do_jump_by_parts_greater (tree, tree, int, rtx, rtx, int);
@@ -1108,6 +1109,64 @@ do_compare_and_jump (tree treeop0, tree treeop1, enum rtx_code signed_code,
 
   type = TREE_TYPE (treeop0);
   mode = TYPE_MODE (type);
+
+  /* Is zero/sign extension redundant.  */
+  bool op0_ext_redundant = false;
+  bool op1_ext_redundant = false;
+
+  /* If promoted and the value in SUBREG of op0 fits (does not overflow),
+ it is a candidate for extension elimination.  */
+  if (GET_CODE (op0) == SUBREG && SUBREG_PROMOTED_VAR_P (op0))
+op0_ext_redundant =
+  gimple_assign_is_zero_sign_ext_redundant (SSA_NAME_DEF_STMT (treeop0));
+
+  /* If promoted and the value in SUBREG of op1 fits (does not overflow),
+ it is a candidate for extension elimination.  */
+  if (GET_CODE (op1) == SUBREG && SUBREG_PROMOTED_VAR_P (op1))
+op1_ext_redundant =
+  gimple_assign_is_zero_sign_ext_redundant (SSA_NAME_DEF_STMT (treeop1));
+
+  /* If zero/sign extension is redundant, generate RTL
+ for operands without zero/sign extension.  */
+  if ((op0_ext_redundant || TREE_CODE (treeop0) == INTEGER_CST)
+  && (op1_ext_redundant || TREE_CODE (treeop1) == INTEGER_CST))
+{
+  if ((TREE_CODE (treeop1) == INTEGER_CST)
+	  && (!mode_signbit_p (GET_MODE (op1), op1)))
+	{
+	  /* First operand is constant and signbit is not set (not
+	 represented in RTL as a negative constant).  */
+	  rtx new_op0 = gen_reg_rtx (GET_MODE (SUBREG_REG (op0)));
+	  emit_move_insn (new_op0, SUBREG_REG (op0));
+	  op0 = new_op0;
+	}
+  else if ((TREE_CODE (treeop0) == INTEGER_CST)
+	   && (!mode_signbit_p (GET_MODE (op0), op0)))
+	{
+	  /* Other operand is constant and signbit is not set (not
+	 represented in RTL as a negative constant).  */
+	  rtx new_op1 = gen_reg_rtx (GET_MODE (SUBREG_REG (op1)));
+
+	  emit_move_insn (new_op1, SUBREG_REG (op1));
+	  op1 = new_op1;
+	}
+  else if ((TREE_CODE (treeop0) != INTEGER_CST)
+	   && (TREE_CODE (treeop1) != INTEGER_CST)
+	   && (GET_MODE (op0) == GET_MODE (op1))
+	   && (GET_MODE (SUBREG_REG (op0)) == GET_MODE (SUBREG_REG (op1))))
+	{
+	  /* Compare registers fits SUBREG and of the
+	 same mode.  */
+	  rtx new_op0 = gen_reg_rtx (GET_MODE (SUBREG_REG (op0)));
+	  rtx new_op1 = gen_reg_rtx (GET_MODE (SUBREG_REG (op1)));
+
+	  emit_move_insn (new_op0, SUBREG_REG (op0));
+	  emit_move_insn (new_op1, SUBREG_REG (op1));
+	  op0 = new_op0;
+	  op1 = new_op1;
+	}
+}
+
   if (TREE_CODE (treeop0) == INTEGER_CST
   && (TREE_CODE (treeop1) != INTEGER_CST
   || (GET_MODE_BITSIZE (mode)
diff --git a/gcc/gimple.c b/gcc/gimple.c
index 59fcf43..7bb93a6 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -200,6 +200,102 @@ gimple_call_re

[PING^2][PATCH][2 of 2] RTL expansion for zero sign extension elimination with VRP

2013-10-15 Thread Kugan

Hi Eric,

Can you please help to review this patch?
http://gcc.gnu.org/ml/gcc-patches/2013-10/msg00452.html

Thanks,
Kugan

> +2013-09-25  Kugan Vivekanandarajah  
> +
> + * dojump.c (do_compare_and_jump): Generate rtl without
> + zero/sign extension if redundant.
> + * cfgexpand.c (expand_gimple_stmt_1): Likewise.
> + * gimple.c (gimple_assign_is_zero_sign_ext_redundant) : New
> + function.
> + * gimple.h (gimple_assign_is_zero_sign_ext_redundant) : Declare.
> +
> 
>

Re: [PING^2][PATCH][2 of 2] RTL expansion for zero sign extension elimination with VRP

2013-10-15 Thread Kugan

Thanks Richard for the review.

On 15/10/13 23:55, Richard Biener wrote:
> On Tue, 15 Oct 2013, Kugan wrote:
> 
>> Hi Eric,
>>
>> Can you please help to review this patch?
>> http://gcc.gnu.org/ml/gcc-patches/2013-10/msg00452.html
> 
> I think that gimple_assign_is_zero_sign_ext_redundant and its
> description is somewhat confused.  You seem to have two cases
> here, one being NOP_EXPR or CONVERT_EXPR that are in the IL
> because they are required for type correctness.  So,
> 

I have changed the name and the comments to make it more clear.

>long = (long) int
> 
> which usually expands to
> 
>(set:DI (sext:DI reg:SI))
> 
> you want to expand as
> 
>(set:DI (subreg:DI reg:SI))
> 
> where you have to be careful for modes smaller than word_mode.
> You don't seem to implement this optimization though (but
> the gimple_assign_is_zero_sign_ext_redundant talks about it).
> 


I am actually handling only the cases smaller than word_mode. If there
is any sign change, I don't make any changes. Where the RTL expansion is
done, I have added these conditions, as is done for convert_move and
others.

For example, when an expression is evaluated and its value is assigned
to a variable of type short, the generated RTL would look something like
the following.

(set (reg:SI 110)
(zero_extend:SI (subreg:HI (reg:SI 117) 0)))

However, if value range propagation lets us say for certain
that the value of the expression held in register 117 is
within the limits of short and there is no sign conversion, we do not
need to perform the subreg and zero_extend; instead we can generate the
following RTL.

(set (reg:SI 110)
(reg:SI 117))
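
The condition under which the zero_extend is a no-op can be written down on plain integers. The sketch below is an illustration under stated assumptions, not the patch's `gimple_assign_is_zero_sign_ext_redundant`: it considers zero extension only, takes the range as two `int64_t` bounds, and uses hypothetical `ex_` names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if every value in [min, max] is unchanged by truncating it to
   PRECISION bits and zero-extending it back, that is, if the range
   already lies in [0, 2^PRECISION - 1].  */
bool
ex_zero_extend_is_noop (int64_t min, int64_t max, unsigned precision)
{
  assert (min <= max && precision >= 1 && precision < 63);
  return min >= 0 && max <= (INT64_C (1) << precision) - 1;
}

/* What (zero_extend:SI (subreg:HI x)) computes for a 16-bit subreg.  */
uint32_t
ex_zero_extend_hi (uint32_t x)
{
  return x & 0xFFFFu;
}
```

A value such as 1234 passes through `ex_zero_extend_hi` unchanged, while 0x12345 does not, so only ranges accepted by the first function would allow the plain register move.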


> Second are promotions required by the target (PROMOTE_MODE)
> that do arithmetic on wider registers like for
> 
>   char = char + char
> 
> where they cannot do
> 
>   (set:QI (plus:QI reg:QI reg:QI))
> 
> but instead have, for example reg:SI only and you get
> 
>   (set:SI (plus:SI reg:SI reg:SI))
>   (set:SI ([sz]ext:SI (subreg:QI reg:SI)))
> 
> that you try to address with the cfgexpand hunk but I believe
> it doesn't work the way you do it.  That is because on GIMPLE
> you do not see SImode temporaries and thus no SImode value-ranges.
> Consider
> 
>   tem = (char) 255 + (char) 1;
> 
> which has a value-range of [0,0] but clearly when computed in
> SImode the value-range is [256, 256].  That is, VRP computes
> value-ranges in the expression type, not in some arbitrary
> larger type.
> 
> So what you'd have to do is take the value-ranges of the
> two operands of the plus and see whether the plus can overflow
> QImode when computed in SImode (for the example).
>

Yes. Instead of calculating the value ranges of the two operands in
SImode, what I am doing in this case is to look at the value range of
tem and check whether it is within [CHAR_MIN + 1, CHAR_MAX - 1]. As you
explained earlier, we can't rely on it being within [CHAR_MIN, CHAR_MAX],
as the range could have been modified to fit the LHS type. This of course
will miss some of the cases where we can remove extensions, but it
simplifies the logic.

> [exposing the effect of PROMOTE_MODE earlier than at RTL expansion
> time may make this less awkward]

Please find the modified patch attached.


+2013-10-16  Kugan Vivekanandarajah  
+
+   * dojump.c (do_compare_and_jump): Generate rtl without
+   zero/sign extension if redundant.
+   * cfgexpand.c (expand_gimple_stmt_1): Likewise.
+   * gimple.c (gimple_is_rhs_value_fits_in_assign) : New
+   function.
+   * gimple.h (gimple_is_rhs_value_fits_in_assign) : Declare.
+

Thanks,
Kugan
> 
> Thanks,
> Richard.
> 
>> Thanks,
>> Kugan
>>
>>> +2013-09-25  Kugan Vivekanandarajah  
>>> +
>>> +   * dojump.c (do_compare_and_jump): Generate rtl without
>>> +   zero/sign extension if redundant.
>>> +   * cfgexpand.c (expand_gimple_stmt_1): Likewise.
>>> +   * gimple.c (gimple_assign_is_zero_sign_ext_redundant) : New
>>> +   function.
>>> +   * gimple.h (gimple_assign_is_zero_sign_ext_redundant) : Declare.
>>> +
>>>
>>>

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 88e48c2..60869ce 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -2311,6 +2311,20 @@ expand_gimple_stmt_1 (gimple stmt)
 
if (temp == target)
  ;
+   /* If the value in SUBREG of temp fits that SUBREG (does not
+  overflow) and is assigned to target SUBREG of the same mode
+  without sign conversion, we can skip the SUBREG
+  and extension.  */
+   else if (promoted
+&& gimple_is_rhs_value_fits_in_assign (stmt)
+&& (

Re: [PING^2][PATCH][2 of 2] RTL expansion for zero sign extension elimination with VRP

2013-10-22 Thread Kugan


>>>   tem = (char) 255 + (char) 1;
>>>
>>> which has a value-range of [0,0] but clearly when computed in
>>> SImode the value-range is [256, 256].  That is, VRP computes
>>> value-ranges in the expression type, not in some arbitrary
>>> larger type.
>>>
>>> So what you'd have to do is take the value-ranges of the
>>> two operands of the plus and see whether the plus can overflow
>>> QImode when computed in SImode (for the example).
>>>
Ok, I will handle it as you have suggested here.

> Not sure if I understand what you are saying here.  As for the above
> case
> 
>>>   tem = (char) 255 + (char) 1;
> 
> tem is always of type 'char' in GIMPLE (even if later promoted
> via PROMOTE_MODE) the value-range is a 'char' value-range and thus
> never will exceed [CHAR_MIN, CHAR_MAX].  The only way you can
> use that directly is if you can rely on undefined behavior
> happening for signed overflow - but if you argue that way you
> can simply _always_ drop the (sext:SI (subreg:QI part and you
> do not need value ranges for this.  For unsigned operations
> for example [250, 254] + [8, 10] will simply wrap to [3, 7]
> (if I got the math correct) which is inside your [CHAR_MIN + 1,
> CHAR_MAX - 1] but if performed in SImode you can get 259 and
> thus clearly you cannot drop the (zext:SI (subreg:QI parts.
> The same applies to signed types if you do not want to rely
> on signed overflow being undefined of course.
> 

Thanks for the explanation. I now get it and I will rework the patch.
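
The wrap-around in the quoted unsigned example can be checked with a small C sketch. The type and helper names are hypothetical and 'unsigned char' is assumed to be 8 bits wide; this is not code from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Closed interval [lo, hi] of unsigned values.  Hypothetical type.  */
typedef struct { unsigned lo, hi; } range_t;

/* Range of a + b when the addition is carried out in a wide
   (SImode-like) type, so no wrap-around happens.  */
static range_t add_wide(range_t a, range_t b)
{
    return (range_t){ a.lo + b.lo, a.hi + b.hi };
}

/* What a single wide result looks like after truncation to 8 bits.  */
static unsigned wrap_qi(unsigned v)
{
    return (uint8_t)v;
}

/* The extension can only be dropped when the wide sum never leaves
   the 8-bit unsigned range.  */
static bool sum_fits_qi(range_t a, range_t b)
{
    return add_wide(a, b).hi <= UINT8_MAX;
}
```

For the operand ranges [250, 254] and [8, 10] the wide sums are 258..264, which this sketch wraps to 2..8. The wrapped range looks harmless, but the SImode register holds a value above 255, so the extension has to stay.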

Thanks,
Kugan

[ARM][PATCH] Fix testsuite testcase neon-vcond-[ltgt,unordered].c

2013-10-23 Thread Kugan

Hi,

The arm testcases neon-vcond-ltgt.c and neon-vcond-unordered.c fail on the
Linaro 4.8 branch. The failure is not reproducible with trunk, but it can
happen there too. Both neon-vcond-ltgt.c and neon-vcond-unordered.c scan
for the vbsl instruction, along with other vector instructions. However,
as per the comment for the "neon_vbsl_internal" md pattern defined in
neon.md, gcc can generate vbsl, vbit or vbif depending on the register
allocation. Therefore, these testcases should scan for any one of these
three instructions instead of just vbsl. I have updated the testcases to
scan for vbsl, vbit or vbif.
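
What the updated scan pattern accepts can be sketched in C with POSIX extended regular expressions. This is only a stand-in for the Tcl regex engine that scan-assembler uses; the assumption is that alternation has the lowest precedence in both. The helper and the grouped variant are illustrative and not part of the patch.

```c
#include <regex.h>
#include <stddef.h>

/* The pattern as written in the patch (after Tcl unescaping).  */
static const char UNGROUPED[] =
    "vbsl|vbit|vbif[\t ]*q[0-9]+,[\t ]*q[0-9]+,[\t ]*q[0-9]+";

/* A grouped variant, shown only for comparison.  */
static const char GROUPED[] =
    "(vbsl|vbit|vbif)[\t ]*q[0-9]+,[\t ]*q[0-9]+,[\t ]*q[0-9]+";

/* Returns 1 if TEXT contains a match for PATTERN, 0 if it does not,
   and -1 if PATTERN fails to compile.  */
static int matches(const char *pattern, const char *text)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Both patterns accept any of the three mnemonics followed by three q registers. Because alternation binds loosest, the ungrouped form attaches the operand part only to vbif, so a bare "vbsl" or "vbit" also matches; the grouped form requires the operands for all three.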

Is this OK?

Thanks,
Kugan

2013-10-23  Kugan Vivekanandarajah  
* gcc.target/arm/neon-vcond-ltgt.c: Scan for vbsl or vbit or vbif.
* gcc.target/arm/neon-vcond-unordered.c: Scan for vbsl or vbit or vbif.



diff --git a/gcc/testsuite/gcc.target/arm/neon-vcond-ltgt.c 
b/gcc/testsuite/gcc.target/arm/neon-vcond-ltgt.c
index acb23a9..c8306e3 100644
--- a/gcc/testsuite/gcc.target/arm/neon-vcond-ltgt.c
+++ b/gcc/testsuite/gcc.target/arm/neon-vcond-ltgt.c
@@ -15,4 +15,4 @@ void foo (int ilast,float* w, float* w2)
 
 /* { dg-final { scan-assembler-times "vcgt\\.f32\[\\t \]*q\[0-9\]+,\[\\t 
\]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" 2 } } */
 /* { dg-final { scan-assembler "vorr\[\\t \]*q\[0-9\]+,\[\\t 
\]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */
-/* { dg-final { scan-assembler "vbsl\[\\t \]*q\[0-9\]+,\[\\t 
\]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */
+/* { dg-final { scan-assembler "vbsl|vbit|vbif\[\\t \]*q\[0-9\]+,\[\\t 
\]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon-vcond-unordered.c 
b/gcc/testsuite/gcc.target/arm/neon-vcond-unordered.c
index c3e448d..3bb67d3 100644
--- a/gcc/testsuite/gcc.target/arm/neon-vcond-unordered.c
+++ b/gcc/testsuite/gcc.target/arm/neon-vcond-unordered.c
@@ -16,4 +16,4 @@ void foo (int ilast,float* w, float* w2)
 /* { dg-final { scan-assembler "vcgt\\.f32\[\\t \]*q\[0-9\]+,\[\\t 
\]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */
 /* { dg-final { scan-assembler "vcge\\.f32\[\\t \]*q\[0-9\]+,\[\\t 
\]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */
 /* { dg-final { scan-assembler "vorr\[\\t \]*q\[0-9\]+,\[\\t 
\]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */
-/* { dg-final { scan-assembler "vbsl\[\\t \]*q\[0-9\]+,\[\\t 
\]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */
+/* { dg-final { scan-assembler "vbsl|vbit|vbif\[\\t \]*q\[0-9\]+,\[\\t 
\]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */

Re: [ARM][PATCH] Fix testsuite testcase neon-vcond-[ltgt,unordered].c

2013-10-24 Thread Kugan


> I can't seem to get it to fail on my checkout of the linaro 4.8 branch.
> I tried both arm-none-eabi and arm-none-linux-gnueabihf. What kind of
> options/configuration are needed to reproduce this? Also, what kind of
> assembly is produced when the testcase fails? It'd be nice to make sure
> that the allocator doesn't end up doing something sub-optimal and
> unnecessarily moving stuff around to satisfy the alternative constraints
> that produce the other bit-select variants.
> 

Hi Kyrill,

It happens for armv5te arm-none-linux-gnueabi. --with-mode=arm
--with-arch=armv5te --with-float=soft

You can also find the logs here in
http://cbuild.validation.linaro.org/build/gcc-linaro-4.8-2013.10/logs/armv7l-precise-cbuild461-calxeda02_21_00_precise_armel-armv5r2/

I changed neon-vcond-gt.c too.

Thanks,
Kugan

2013-10-23  Kugan Vivekanandarajah  

* gcc.target/arm/neon-vcond-gt.c: Scan for vbsl or vbit or vbif.
* gcc.target/arm/neon-vcond-ltgt.c: Scan for vbsl or vbit or vbif.
* gcc.target/arm/neon-vcond-unordered.c: Scan for vbsl or vbit or vbif.





diff --git a/gcc/testsuite/gcc.target/arm/neon-vcond-gt.c 
b/gcc/testsuite/gcc.target/arm/neon-vcond-gt.c
index 86ccf95..8e9f378 100644
--- a/gcc/testsuite/gcc.target/arm/neon-vcond-gt.c
+++ b/gcc/testsuite/gcc.target/arm/neon-vcond-gt.c
@@ -14,4 +14,4 @@ void foo (int ilast,float* w, float* w2)
 }
 
 /* { dg-final { scan-assembler "vcgt\\.f32\[\\t \]*q\[0-9\]+,\[\\t 
\]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */
-/* { dg-final { scan-assembler "vbit\[\\t \]*q\[0-9\]+,\[\\t 
\]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */
+/* { dg-final { scan-assembler "vbsl|vbit|vbif\[\\t \]*q\[0-9\]+,\[\\t 
\]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon-vcond-ltgt.c 
b/gcc/testsuite/gcc.target/arm/neon-vcond-ltgt.c
index acb23a9..c8306e3 100644
--- a/gcc/testsuite/gcc.target/arm/neon-vcond-ltgt.c
+++ b/gcc/testsuite/gcc.target/arm/neon-vcond-ltgt.c
@@ -15,4 +15,4 @@ void foo (int ilast,float* w, float* w2)
 
 /* { dg-final { scan-assembler-times "vcgt\\.f32\[\\t \]*q\[0-9\]+,\[\\t 
\]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" 2 } } */
 /* { dg-final { scan-assembler "vorr\[\\t \]*q\[0-9\]+,\[\\t 
\]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */
-/* { dg-final { scan-assembler "vbsl\[\\t \]*q\[0-9\]+,\[\\t 
\]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */
+/* { dg-final { scan-assembler "vbsl|vbit|vbif\[\\t \]*q\[0-9\]+,\[\\t 
\]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon-vcond-unordered.c 
b/gcc/testsuite/gcc.target/arm/neon-vcond-unordered.c
index c3e448d..3bb67d3 100644
--- a/gcc/testsuite/gcc.target/arm/neon-vcond-unordered.c
+++ b/gcc/testsuite/gcc.target/arm/neon-vcond-unordered.c
@@ -16,4 +16,4 @@ void foo (int ilast,float* w, float* w2)
 /* { dg-final { scan-assembler "vcgt\\.f32\[\\t \]*q\[0-9\]+,\[\\t 
\]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */
 /* { dg-final { scan-assembler "vcge\\.f32\[\\t \]*q\[0-9\]+,\[\\t 
\]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */
 /* { dg-final { scan-assembler "vorr\[\\t \]*q\[0-9\]+,\[\\t 
\]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */
-/* { dg-final { scan-assembler "vbsl\[\\t \]*q\[0-9\]+,\[\\t 
\]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */
+/* { dg-final { scan-assembler "vbsl|vbit|vbif\[\\t \]*q\[0-9\]+,\[\\t 
\]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */

Re: [ARM][PATCH] Fix testsuite testcase neon-vcond-[ltgt,unordered].c

2013-10-28 Thread Kugan

On 25/10/13 19:04, Kyrill Tkachov wrote:
> On 24/10/13 20:03, Kugan wrote:
>>
>> Hi Kyrill,
>>
>> It happens for armv5te arm-none-linux-gnueabi. --with-mode=arm
>> --with-arch=armv5te --with-float=soft
> 
> Ah ok, I can reproduce it now. So, while I agree that we add a scan for
> vbit and vbif to these testcases, there seems to be something dodgy
> going on with the register allocation.
> 
> With -march=armv5te I'm getting the following snippet of code in the
> ltgt case:
> 
> .L12:
> ldr r4, [ip]
> ldr r5, [ip, #4]
> ldr r6, [ip, #8]
> ldr r7, [ip, #12]
> vmov d20, r4, r5  @ v4sf
> vmov d21, r6, r7
> vcgt.f32 q8, q10, q9
> vcgt.f32 q10, q9, q10
> vorr q8, q8, q10
> vmov d22, r4, r5  @ v4sf
> vmov d23, r6, r7
> vbit q11, q9, q8
> vmov r4, r5, d22  @ v4sf
> vmov r6, r7, d23
> 
> The second vcgt.f32 trashes q10, then recreates it in q11 with:
> vmov d22, r4, r5  @ v4sf
> vmov d23, r6, r7
> 
> so it can do the vbit. Surely there's something better that can be done?
> 
> In contrast, with -march=armv7-a we get:
> 
> .L12:
> vld1.32 {q9}, [r4]!
> vcgt.f32 q8, q9, q10
> vcgt.f32 q11, q10, q9
> vorr q8, q8, q11
> vbsl q8, q10, q9
> vst1.32 {q8}, [lr]!
> 

This is because  of the unaligned access done for armv7-a. arm.c has the
following comment:

  /* Enable -munaligned-access by default for
 - all ARMv6 architecture-based processors
 - ARMv7-A, ARMv7-R, and ARMv7-M architecture-based processors.
 - ARMv8 architecture-base processors.

 Disable -munaligned-access by default for
 - all pre-ARMv6 architecture-based processors
 - ARMv6-M architecture-based processors.  */

Please look at the rtl difference.
- is armv7-a
+ is armv5te

;; vect_var_.18_61 = MEM[(float *)vect_pw2.14_59];

-(insn 71 70 72 (set (reg:V4SF 192)
-(unspec:V4SF [
-(mem:V4SF (reg:SI 163 [ ivtmp.47 ]) [0 MEM[(float
*)vect_pw2.14_59]+0 S16 A32])
-] UNSPEC_MISALIGNED_ACCESS)) neon-vcond-ltgt.c:12 -1
+(insn 71 70 72 (clobber (reg:V4SF 168 [ vect_var_.18 ]))
neon-vcond-ltgt.c:12 -1
+ (nil))
+
+(insn 72 71 73 (set (subreg:SI (reg:V4SF 168 [ vect_var_.18 ]) 0)
+(mem:SI (reg:SI 163 [ ivtmp.47 ]) [0 MEM[(float
*)vect_pw2.14_59]+0 S4 A32])) neon-vcond-ltgt.c:12 -1
+ (nil))
+
+(insn 73 72 74 (set (subreg:SI (reg:V4SF 168 [ vect_var_.18 ]) 4)
+(mem:SI (plus:SI (reg:SI 163 [ ivtmp.47 ])
+(const_int 4 [0x4])) [0 MEM[(float *)vect_pw2.14_59]+4
S4 A32])) neon-vcond-ltgt.c:12 -1
+ (nil))
+
+(insn 74 73 75 (set (subreg:SI (reg:V4SF 168 [ vect_var_.18 ]) 8)
+(mem:SI (plus:SI (reg:SI 163 [ ivtmp.47 ])
+(const_int 8 [0x8])) [0 MEM[(float *)vect_pw2.14_59]+8
S4 A32])) neon-vcond-ltgt.c:12 -1
  (nil))

-(insn 72 71 0 (set (reg:V4SF 168 [ vect_var_.18 ])
-(reg:V4SF 192)) neon-vcond-ltgt.c:12 -1
+(insn 75 74 0 (set (subreg:SI (reg:V4SF 168 [ vect_var_.18 ]) 12)
+(mem:SI (plus:SI (reg:SI 163 [ ivtmp.47 ])
+(const_int 12 [0xc])) [0 MEM[(float
*)vect_pw2.14_59]+12 S4 A32])) neon-vcond-ltgt.c:12 -1
  (nil))

Remove redundant unshare_expr from ipa-prop

2016-01-21 Thread Kugan

Hi,

There is a redundant unshare_expr in ipa-prop. Attached patch removes
it. Bootstrapped and regression tested on x86_64-pc-linux-gnu with no
new regressions.

Is this OK for trunk?

Thanks,
Kugan

gcc/ChangeLog:

2016-01-22  Kugan Vivekanandarajah  

* ipa-prop.c (ipa_set_jf_constant): Remove redundant unshare_expr.
diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index 06a9aa2..d62c704 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -402,9 +402,6 @@ static void
 ipa_set_jf_constant (struct ipa_jump_func *jfunc, tree constant,
 struct cgraph_edge *cs)
 {
-  constant = unshare_expr (constant);
-  if (constant && EXPR_P (constant))
-SET_EXPR_LOCATION (constant, UNKNOWN_LOCATION);
   jfunc->type = IPA_JF_CONST;
   jfunc->value.constant.value = unshare_expr_without_location (constant);

Re: Incorrect code due to indirect tail call of varargs function with hard float ABI

2016-01-25 Thread Kugan


This issue also remains in the 4.9 and 5.0 branches. Is it OK to backport
this to the release branches?

Thanks,
Kugan

On 02/12/15 10:00, Kugan wrote:
> 
>>>
>>> gcc/ChangeLog:
>>>
>>> 2015-11-18  Kugan Vivekanandarajah  
>>>
>>> PR target/68390
>>> * config/arm/arm.c (arm_function_ok_for_sibcall): Get function type
>>> for indirect function call.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2015-11-18  Kugan Vivekanandarajah  
>>>
>>> PR target/68390
>>> * gcc.target/arm/PR68390.c: New test.
>>>
>>
>> s/PR/pr in the test name and put this in gcc.c-torture/execute instead - 
>> there is nothing ARM specific about the test. Tests in gcc.target/arm should 
>> really only be architecture specific. This isn't.
>>
>>>
>>>
>>>
>>> p.txt
>>>
>>>
>>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>>> index a379121..0dae7da 100644
>>> --- a/gcc/config/arm/arm.c
>>> +++ b/gcc/config/arm/arm.c
>>> @@ -6680,8 +6680,13 @@ arm_function_ok_for_sibcall (tree decl, tree exp)
>>>  a VFP register but then need to transfer it to a core
>>>  register.  */
>>>rtx a, b;
>>> +  tree fn_decl = decl;
>>
>> Call it decl_or_type instead - it's really that ... 
>>
>>>  
>>> -  a = arm_function_value (TREE_TYPE (exp), decl, false);
>>> +  /* If it is an indirect function pointer, get the function type.  */
>>> +  if (!decl)
>>> +   fn_decl = TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (exp)));
>>> +
>>
>> This is probably just my mail client - but please watch out for indentation.
>>
>>> +  a = arm_function_value (TREE_TYPE (exp), fn_decl, false);
>>>b = arm_function_value (TREE_TYPE (DECL_RESULT (cfun->decl)),
>>>   cfun->decl, false);
>>>if (!rtx_equal_p (a, b))
>>
>>
>> OK with those changes.
>>
>> Ramana
>>

>

Re: [PR66726] Fixe regression caused by Factor conversion out of COND_EXPR

2016-02-11 Thread kugan




On 12/02/16 17:18, Markus Trippelsdorf wrote:

On 2016.02.08 at 09:49 -0700, Jeff Law wrote:

On 01/18/2016 08:52 PM, Kugan wrote:


2016-01-19  Kugan Vivekanandarajah  

PR middle-end/66726
* tree-ssa-reassoc.c (optimize_range_tests): Handle tcc_compare stmt
whose result is used in PHI.
(maybe_optimize_range_tests): Likewise.
(final_range_test_p): Likewise.


Otherwise this looks OK for the trunk.  It really hasn't changed much since
the version from July.  And while the PR is not marked as such, this is a
code quality regression fix for targets with a BRANCH_COST > 1.


This causes LTO/PGO bootstrap on ppc64le to ICE all over the place:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69781



Sorry for the breakage. I will revert the patch while I investigate this.

Thanks,
Kugan

[RFC][PATCH][PR40921] Convert x + (-y * z * z) into x - y * z * z

2016-02-25 Thread kugan




Hi,

This is an attempt to fix the missed optimization
x + (-y * z * z) => x - y * z * z, as reported in PR40921.
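
The two forms can be compared at the source level with a minimal C sketch. The function names are made up for illustration and the default round-to-nearest mode is assumed.

```c
/* Form reported in PR40921: the negate feeds the multiply chain.  */
static double with_negate(double x, double y, double z)
{
    return x + (-y * z * z);
}

/* Form the transformation aims for: the negate is folded into a
   subtraction.  */
static double with_minus(double x, double y, double z)
{
    return x - y * z * z;
}
```

Under round-to-nearest, negation is exact, so both functions return the same value. With a directed rounding mode the product can round differently depending on its sign, which is presumably why the patch bails out when flag_rounding_math is set.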


Regression tested and bootstrapped on x86-64-linux-gnu with no new 
regressions.


Is this OK for next stage1?

Thanks,
Kugan


gcc/ChangeLog:

2016-02-26  Kugan Vivekanandarajah  

PR middle-end/40921
* tree-ssa-reassoc.c (propagate_neg_to_sub_or_add): New.
(reassociate_bb): Call propagate_neg_to_sub_or_add.


gcc/testsuite/ChangeLog:

2016-02-26  Kugan Vivekanandarajah  

PR middle-end/40921
* gcc.dg/tree-ssa/pr40921.c: New test.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr40921.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr40921.c
index e69de29..6a3529b 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr40921.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr40921.c
@@ -0,0 +1,11 @@
+
+/* PR middle-end/40921.  */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-reassoc1 -fno-rounding-math" } */
+
+double foo (double x, double y, double z)
+{
+return x + (-y * z*z);
+}
+
+/* { dg-final { scan-tree-dump-times "= -" 0 "reassoc1" } } */
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index e54700e..f99635b 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -4784,6 +4784,78 @@ transform_stmt_to_multiply (gimple_stmt_iterator *gsi, 
gimple *stmt,
 }
 }
 
+/* Propagate NEGATE_EXPR to MINUS_EXPR/PLUS_EXPR when the negated
+   expression is multiplied and used in MINUS_EXPR/PLUS_EXPR.  */
+static void
+propagate_neg_to_sub_or_add (gimple_stmt_iterator *gsi, gimple *stmt)
+{
+  tree lhs = gimple_assign_lhs (stmt);
+  tree rhs1, rhs2, mult_lhs;
+  gimple *use_stmt;
+  gimple *use_stmt2;
+  use_operand_p use;
+  enum tree_code code;
+  gassign *g;
+
+  /* Note that -frounding-math should disable the proposed
+ optimization.  */
+  if (flag_rounding_math)
+return;
+
+  if (!single_imm_use (lhs, &use, &use_stmt))
+return;
+
+  if (!is_gimple_assign (use_stmt))
+return;
+
+  code = gimple_assign_rhs_code (use_stmt);
+  if (code != MULT_EXPR)
+return;
+  mult_lhs = gimple_assign_lhs (use_stmt);
+  while (code == MULT_EXPR)
+{
+  if (!single_imm_use (mult_lhs, &use, &use_stmt2))
+   break;
+  if (!is_gimple_assign (use_stmt2))
+   break;
+  code = gimple_assign_rhs_code (use_stmt2);
+  mult_lhs = gimple_assign_lhs (use_stmt2);
+  use_stmt = use_stmt2;
+}
+
+  if (code != PLUS_EXPR
+  && code != MINUS_EXPR)
+return;
+
+  lhs = gimple_assign_lhs (use_stmt);
+  rhs1 = gimple_assign_rhs1 (use_stmt);
+  rhs2 = gimple_assign_rhs2 (use_stmt);
+
+  if (rhs1 == USE_FROM_PTR (use))
+{
+  if (code == MINUS_EXPR)
+   return;
+  std::swap (rhs1, rhs2);
+  code = MINUS_EXPR;
+}
+  else
+{
+  if (code == PLUS_EXPR)
+   code = MINUS_EXPR;
+  else
+   code = PLUS_EXPR;
+}
+
+  g = gimple_build_assign (lhs, code, rhs1, rhs2);
+  gimple_stmt_iterator gsi2 = gsi_for_stmt (use_stmt);
+  gsi_replace (&gsi2, g, true);
+
+  lhs = gimple_assign_lhs (stmt);
+  rhs1 = gimple_assign_rhs1 (stmt);
+  g = gimple_build_assign (lhs, SSA_NAME, rhs1);
+  gsi_replace (gsi, g, true);
+}
+
 /* Reassociate expressions in basic block BB and its post-dominator as
children.
 
@@ -4809,6 +4881,11 @@ reassociate_bb (basic_block bb)
{
  tree lhs, rhs1, rhs2;
  enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
+ if (rhs_code == NEGATE_EXPR)
+   {
+ propagate_neg_to_sub_or_add (&gsi, stmt);
+ continue;
+   }
 
  /* If this is not a gimple binary expression, there is
 nothing for us to do with it.  */
@@ -4884,6 +4961,7 @@ reassociate_bb (basic_block bb)
  if (rhs_code == MULT_EXPR)
attempt_builtin_copysign (&ops);
 
+
  if (reassoc_insert_powi_p
  && rhs_code == MULT_EXPR
  && flag_unsafe_math_optimizations)

[RFC][PATCH][PR63586] Convert x+x+x+x into 4*x

2016-02-25 Thread kugan




Hi,

This is an attempt to fix the missed optimization x+x+x+x -> 4*x, as
reported in PR63586.


Regression tested and bootstrapped on x86-64-linux-gnu with no new 
regressions.


Is this OK for next stage1?

Thanks,
Kugan


gcc/testsuite/ChangeLog:

2016-02-26  Kugan Vivekanandarajah  

PR middle-end/63586
* gcc.dg/tree-ssa/reassoc-14.c: Fix multiply count.

gcc/ChangeLog:

2016-02-26  Kugan Vivekanandarajah  

PR middle-end/63586
* tree-ssa-reassoc.c (transform_add_to_multiply): New.
(reassociate_bb): Call transform_add_to_multiply.


diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-14.c 
b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-14.c
index 62802d1..16ebc86 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-14.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-14.c
@@ -19,6 +19,7 @@ unsigned int test2 (unsigned int x, unsigned int y, unsigned 
int z,
   return tmp1 + tmp2 + tmp3;
 }
 
-/* There should be one multiplication left in test1 and three in test2.  */
+/* There should be two multiplications left in test1 (including one generated
+   when converting addition to multiplication) and three in test2.  */
 
-/* { dg-final { scan-tree-dump-times "\\\*" 4 "reassoc1" } } */
+/* { dg-final { scan-tree-dump-times "\\\*" 5 "reassoc1" } } */
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index dfd0da1..2454b9d 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -1698,6 +1698,61 @@ eliminate_redundant_comparison (enum tree_code opcode,
   return false;
 }
 
+/* Recursively transform repeated addition of same values into multiply with
+   constant.  */
+static void
+transform_add_to_multiply (gimple_stmt_iterator *gsi, gimple *stmt, vec<operand_entry *> *ops)
+{
+  operand_entry *oe;
+  tree op = NULL_TREE;
+  int i, start = -1, end = 0, count = 0;
+
+  /* Look for repeated operands.  */
+  FOR_EACH_VEC_ELT (*ops, i, oe)
+{
+  if (start == -1)
+   {
+ count = 1;
+ op = oe->op;
+ start = i;
+   }
+  else if (operand_equal_p (oe->op, op, 0))
+   {
+ count++;
+ end = i;
+   }
+  else if (count == 1)
+   {
+ count = 1;
+ op = oe->op;
+ start = i;
+   }
+  else
+   break;
+}
+
+  if (count > 1)
+{
+  /* Convert repeated operand addition to multiplication.  */
+  for (i = end; i >= start; --i)
+   ops->unordered_remove (i);
+  tree tmp = make_temp_ssa_name (TREE_TYPE (op), NULL, "reassocmul");
+  gassign *mul_stmt = gimple_build_assign (tmp, MULT_EXPR,
+  op, build_int_cst 
(TREE_TYPE(op), count));
+  gimple_set_location (mul_stmt, gimple_location (stmt));
+  gimple_set_uid (mul_stmt, gimple_uid (stmt));
+  gsi_insert_before (gsi, mul_stmt, GSI_SAME_STMT);
+  oe = operand_entry_pool.allocate ();
+  oe->op = tmp;
+  oe->rank = get_rank (op) * count;
+  oe->id = 0;
+  oe->count = 1;
+  ops->safe_push (oe);
+  transform_add_to_multiply (gsi, stmt, ops);
+}
+}
+
+
 /* Perform various identities and other optimizations on the list of
operand entries, stored in OPS.  The tree code for the binary
operation between all the operands is OPCODE.  */
@@ -4854,6 +4909,12 @@ reassociate_bb (basic_block bb)
  && flag_unsafe_math_optimizations)
powi_result = attempt_builtin_powi (stmt, &ops);
 
+ if (rhs_code == PLUS_EXPR)
+   {
+ transform_add_to_multiply (&gsi, stmt, &ops);
+ ops.qsort (sort_by_operand_rank);
+   }
+
  /* If the operand vector is now empty, all operands were 
 consumed by the __builtin_powi optimization.  */
  if (ops.length () == 0)

Re: [RFC][PATCH][PR63586] Convert x+x+x+x into 4*x

2016-02-28 Thread kugan




That looks better, but I think the unordered_remove will break operand sorting
and thus you probably don't handle x + x + x + x + y + y + y + y + y +
y + z + z + z + z
optimally.

I'd say you simply want to avoid the recursion and collect a vector of
[start, end] pairs
before doing any modification to the ops vector.
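
The suggestion above, collecting the runs first and modifying the vector only afterwards, can be sketched in plain C on an array of integers. The names are hypothetical; the real code works on operand_entry vectors and compares with operand_equal_p.

```c
#include <stddef.h>

/* Inclusive [start, end] index pair of one run of equal operands.  */
typedef struct { size_t start, end; } run_t;

/* Records every run of two or more equal adjacent values in OPS without
   modifying OPS.  Stores at most MAX_RUNS entries in RUNS and returns the
   number stored.  */
static size_t collect_runs(const int *ops, size_t n,
                           run_t *runs, size_t max_runs)
{
    size_t nruns = 0;
    size_t i = 0;
    while (i < n) {
        size_t j = i;
        /* Extend the run while the next element equals the first one.  */
        while (j + 1 < n && ops[j + 1] == ops[i])
            j++;
        if (j > i && nruns < max_runs) {
            runs[nruns].start = i;
            runs[nruns].end = j;
            nruns++;
        }
        i = j + 1;
    }
    return nruns;
}
```

For x + x + x + x + y + y + y + y + y + y + z + z + z + z, encoded as four 1s, six 2s and four 3s, this yields the runs [0, 3], [4, 9] and [10, 13], that is the multipliers 4, 6 and 4. The rewriting step can then walk the recorded runs from last to first so that earlier indices stay valid.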


Hi Richard,

Does the attached patch look better?

Thanks,
Kugan
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr63586.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr63586.c
index e69de29..a002bdd 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr63586.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr63586.c
@@ -0,0 +1,59 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-reassoc1" } */
+
+unsigned f1 (unsigned x)
+{
+unsigned y = x + x;
+y = y + x;
+y = y + x;
+y = y + x;
+y = y + x;
+y = y + x;
+y = y + x;
+return y;
+}
+
+unsigned f2 (unsigned x, unsigned z)
+{
+unsigned y = x + x;
+y = y + x;
+y = y + x;
+y = y + z;
+y = y + z;
+y = y + z;
+y = y + z;
+return y;
+}
+
+unsigned f3 (unsigned x, unsigned z, unsigned k)
+{
+unsigned y = x + x;
+y = y + x;
+y = y + x;
+y = y + z;
+y = y + z;
+y = y + z;
+y = y + k;
+return y;
+}
+
+unsigned f4 (unsigned x, unsigned z, unsigned k)
+{
+unsigned y = k + x;
+y = y + x;
+y = y + x;
+y = y + z;
+y = y + z;
+y = y + z;
+y = y + x;
+return y;
+}
+
+unsigned f5 (unsigned x, unsigned y, unsigned z)
+{
+return x + x + x + x + y + y + y + y + y +
+  y + z + z + z + z;
+}
+
+
+/* { dg-final { scan-tree-dump-times "\\\*" 10 "reassoc1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-14.c 
b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-14.c
index 62802d1..16ebc86 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-14.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-14.c
@@ -19,6 +19,7 @@ unsigned int test2 (unsigned int x, unsigned int y, unsigned 
int z,
   return tmp1 + tmp2 + tmp3;
 }
 
-/* There should be one multiplication left in test1 and three in test2.  */
+/* There should be two multiplications left in test1 (including one generated
+   when converting addition to multiplication) and three in test2.  */
 
-/* { dg-final { scan-tree-dump-times "\\\*" 4 "reassoc1" } } */
+/* { dg-final { scan-tree-dump-times "\\\*" 5 "reassoc1" } } */
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 17eb64f..0a43faf 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -1698,6 +1698,79 @@ eliminate_redundant_comparison (enum tree_code opcode,
   return false;
 }
 
+/* Transform repeated addition of same values into multiply with
+   constant.  */
+static void
+transform_add_to_multiply (gimple_stmt_iterator *gsi, gimple *stmt, vec<operand_entry *> *ops)
+{
+  operand_entry *oe;
+  tree op = NULL_TREE;
+  int j;
+  int i, start = -1, end = 0, count = 0;
+  vec<int> start_inds = vNULL;
+  vec<int> end_inds = vNULL;
+  vec<tree> op_list = vNULL;
+
+  /* Look for repeated operands.  */
+  FOR_EACH_VEC_ELT (*ops, i, oe)
+{
+  if (start == -1)
+   {
+ count = 1;
+ op = oe->op;
+ start = i;
+   }
+  else if (operand_equal_p (oe->op, op, 0))
+   {
+ count++;
+ end = i;
+   }
+  else
+   {
+ if (count > 1)
+   {
+ start_inds.safe_push (start);
+ end_inds.safe_push (end);
+ op_list.safe_push (op);
+   }
+ count = 1;
+ op = oe->op;
+ start = i;
+   }
+}
+
+  if (count > 1)
+{
+  start_inds.safe_push (start);
+  end_inds.safe_push (end);
+  op_list.safe_push (op);
+}
+
+  for (j = start_inds.length () - 1; j >= 0; --j)
+{
+  /* Convert repeated operand addition to multiplication.  */
+  start = start_inds[j];
+  end = end_inds[j];
+  op = op_list[j];
+  count = end - start + 1;
+  for (i = end; i >= start; --i)
+   ops->unordered_remove (i);
+  tree tmp = make_temp_ssa_name (TREE_TYPE (op), NULL, "reassocmul");
+  gassign *mul_stmt = gimple_build_assign (tmp, MULT_EXPR,
+  op, build_int_cst 
(TREE_TYPE(op), count));
+  gimple_set_location (mul_stmt, gimple_location (stmt));
+  gimple_set_uid (mul_stmt, gimple_uid (stmt));
+  gsi_insert_before (gsi, mul_stmt, GSI_SAME_STMT);
+  oe = operand_entry_pool.allocate ();
+  oe->op = tmp;
+  oe->rank = get_rank (op) * count;
+  oe->id = 0;
+  oe->count = 1;
+  ops->safe_push (oe);
+}
+}
+
+
 /* Perform various identities and other optimizations on the list of
operand entries, stored in OPS.  The tree code for the binary
operation between all the operands is OPCODE.  */
@@ -4922,6 +4995,12 @@ reassociate_bb (basic_block bb)
  && f

Re: [RFC][PATCH][PR40921] Convert x + (-y * z * z) into x - y * z * z

2016-02-29 Thread kugan



Err.  I think the way you implement that in reassoc is ad-hoc and not
related to reassoc at all.

In fact what reassoc is missing is to handle

  -y * z * (-w) * x -> y * z * w * x

thus optimize negates as if they were additional * -1 entries in a
multiplication chain.  And
then optimize a single remaining * -1 in the result chain to a negate.

Then match.pd handles x + (-y) -> x - y (independent of -frounding-math btw).

So no, this isn't ok as-is, IMHO you want to expand the multiplication ops chain
pulling in the * -1 ops (if single-use, of course).
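
The idea of treating negates as extra * -1 entries in the chain can be sketched in C as follows. The types and helpers are hypothetical; the real implementation works on GIMPLE operands.

```c
#include <stdbool.h>
#include <stddef.h>

/* One operand of a multiplication chain; NEGATED marks an operand that
   is defined by a negate.  Hypothetical model type.  */
typedef struct { double value; bool negated; } factor_t;

/* Drops every negate from the chain.  Returns true when an odd number
   was dropped, meaning one negate of the whole product remains.  */
static bool strip_negates(factor_t *f, size_t n)
{
    size_t neg_count = 0;
    for (size_t i = 0; i < n; i++) {
        if (f[i].negated) {
            f[i].negated = false;
            neg_count++;
        }
    }
    return (neg_count % 2) != 0;
}

/* Multiplies the chain, honouring any remaining negate flags.  */
static double product(const factor_t *f, size_t n)
{
    double p = 1.0;
    for (size_t i = 0; i < n; i++)
        p *= f[i].negated ? -f[i].value : f[i].value;
    return p;
}
```

With an even number of negates, as in -y * z * (-w) * x, the stripped chain gives the same product and no negate is left. With an odd number, the product of the stripped chain has to be negated once.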



I agree. Here is the updated patch along the lines you suggested. Does
this look better?


Thanks,
Kugan
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 17eb64f..bbb5ffb 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -4674,6 +4674,41 @@ attempt_builtin_powi (gimple *stmt, vec<operand_entry *> *ops)
   return result;
 }
 
+/* Factor out NEGATE_EXPR from the multiplication operands.  */
+static void
+factor_out_negate_expr (gimple_stmt_iterator *gsi,
+   gimple *stmt, vec<operand_entry *> *ops)
+{
+  operand_entry *oe;
+  unsigned int i;
+  int neg_count = 0;
+
+  FOR_EACH_VEC_ELT (*ops, i, oe)
+{
+  if (TREE_CODE (oe->op) != SSA_NAME
+ || !has_single_use (oe->op))
+   continue;
+  gimple *def_stmt = SSA_NAME_DEF_STMT (oe->op);
+  if (!is_gimple_assign (def_stmt)
+ || gimple_assign_rhs_code (def_stmt) != NEGATE_EXPR)
+   continue;
+  oe->op = gimple_assign_rhs1 (def_stmt);
+  neg_count ++;
+}
+
+  if (neg_count % 2)
+{
+  tree lhs = gimple_assign_lhs (stmt);
+  tree tmp = make_temp_ssa_name (TREE_TYPE (lhs), NULL, "reassocneg");
+  gimple_set_lhs (stmt, tmp);
+  gassign *neg_stmt = gimple_build_assign (lhs, NEGATE_EXPR,
+  tmp);
+  gimple_set_location (neg_stmt, gimple_location (stmt));
+  gimple_set_uid (neg_stmt, gimple_uid (stmt));
+  gsi_insert_after (gsi, neg_stmt, GSI_SAME_STMT);
+}
+}
+
 /* Attempt to optimize
CST1 * copysign (CST2, y) -> copysign (CST1 * CST2, y) if CST1 > 0, or
CST1 * copysign (CST2, y) -> -copysign (CST1 * CST2, y) if CST1 < 0.  */
@@ -4917,6 +4952,12 @@ reassociate_bb (basic_block bb)
  if (rhs_code == MULT_EXPR)
attempt_builtin_copysign (&ops);
 
+ if (rhs_code == MULT_EXPR)
+   {
+ factor_out_negate_expr (&gsi, stmt, &ops);
+ ops.qsort (sort_by_operand_rank);
+   }
+
  if (reassoc_insert_powi_p
  && rhs_code == MULT_EXPR
  && flag_unsafe_math_optimizations)

[RFC][PR69708] IPA inline not working for function reference in static const struc

2016-02-29 Thread kugan


Hi,

As discussed in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69708 and 
corresponding mailing list discussion, IPA CP is not detecting  a 
jump-function with the sq function as value.



static int sq(int x) {
  return x * x;
}

static const F f = {sq};
...
dosomething (g(f, x));
...
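
A self-contained version of this kind of code might look like the sketch below. The definitions of F and g are assumptions made for illustration; the snippet above elides them, and the actual testcase is in the bugzilla entry.

```c
/* Assumed layout: a struct wrapping a single function pointer.  */
typedef int (*fn_t)(int);
typedef struct { fn_t fp; } F;

static int sq(int x)
{
    return x * x;
}

/* Read-only aggregate whose initializer names sq directly.  */
static const F f = { sq };

/* Hypothetical callee: receives the aggregate by value and calls
   through it.  */
static int g(F s, int x)
{
    return s.fp(x);
}
```

The initializer of f is a known constant, so a call such as g(f, 3) always ends up in sq. That is the information the aggregate jump function is meant to carry to IPA-CP and the inliner.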

I added a check in determine_locally_known_aggregate_parts to detect
this. This fixes the testcase and passes x86-64-linux-gnu LTO bootstrap
and regression testing with no new regressions. Does this look like a
sensible place to fix this?


Thanks,
Kugan

gcc/ChangeLog:

2016-03-01  Kugan Vivekanandarajah  

* ipa-prop.c (determine_locally_known_aggregate_parts): Determine jump
function for static constant initialization.

diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index 72c2fed..22da097 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -1562,6 +1562,57 @@ determine_locally_known_aggregate_parts (gcall *call, 
tree arg,
   jfunc->agg.by_ref = by_ref;
   build_agg_jump_func_from_list (list, const_count, arg_offset, jfunc);
 }
+  else if ((TREE_CODE (arg) == VAR_DECL)
+  && is_global_var (arg))
+{
+  /* PR69708:  Figure out aggregate jump-function with constant init
+value.  */
+  struct ipa_known_agg_contents_list *n, **p;
+  HOST_WIDE_INT offset = 0, size, max_size;
+  varpool_node *node = varpool_node::get (arg);
+  if (node
+ && DECL_INITIAL (node->decl)
+ && TREE_READONLY (node->decl)
+ && TREE_CODE (DECL_INITIAL (node->decl)) == CONSTRUCTOR)
+   {
+ tree exp = DECL_INITIAL (node->decl);
+ unsigned HOST_WIDE_INT ix;
+ tree field, val;
+ bool reverse;
+ FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (exp), ix, field, val)
+   {
+ bool already_there = false;
+ if (!field)
+   break;
+ get_ref_base_and_extent (field, &offset, &size,
+  &max_size, &reverse);
+ if (max_size == -1
+ || max_size != size)
+   break;
+ p = get_place_in_agg_contents_list (&list, offset, size,
+ &already_there);
+ if (!p)
+   break;
+ n = XALLOCA (struct ipa_known_agg_contents_list);
+ n->size = size;
+ n->offset = offset;
+ if (is_gimple_ip_invariant (val))
+   {
+ n->constant = val;
+ const_count++;
+   }
+ else
+   n->constant = NULL_TREE;
+ n->next = *p;
+ *p = n;
+   }
+   }
+  if (const_count)
+   {
+ jfunc->agg.by_ref = by_ref;
+ build_agg_jump_func_from_list (list, const_count, arg_offset, jfunc);
+   }
+}
 }
 
 static tree

Re: [RFC][LIBGCC][2 of 2] 64 bit divide implementation for processor without hw divide instruction

2013-11-26 Thread Kugan

On 27/11/13 02:07, Richard Earnshaw wrote:
> On 23/11/13 01:54, Kugan wrote:

[snip]

>> +2013-11-22  Kugan Vivekanandarajah  
>> +
>> +* libgcc/config/arm/pbapi-lib.h (HAVE_NO_HW_DIVIDE): Define for
> 
> It's bpabi-lib.h

Thanks for the review.

>> +__ARM_ARCH_7_A__.
>> +
>>
>>
> 
> No, this will:
> 1) Do the wrong thing for Cortex-a7, A12 and A15 (which all have HW
> divide, and currently define __ARM_ARCH_7_A__).
> 2) Do the wrong thing for v7-M and v7-R devices, which have Thumb HW
> division instructions.
> 3) Do the wrong thing for all pre-v7 devices, which don't have HW division.
> 
> I think the correct solution is to test !defined(__ARM_ARCH_EXT_IDIV__)

I understand it now and updated the code as attached.

+2013-11-27  Kugan Vivekanandarajah  
+
+   * config/arm/bpabi-lib.h (TARGET_HAS_NO_HW_DIVIDE): Define for
+   architectures that do not have a hardware divide instruction,
+   i.e. architectures that do not define __ARM_ARCH_EXT_IDIV__.
+
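
As a rough illustration of how such a macro could be consumed, the sketch below selects a shift-and-subtract 64-bit unsigned division when TARGET_HAS_NO_HW_DIVIDE is defined. The names soft_udivmod64 and udiv64 and the algorithm itself are assumptions made for illustration; this is not the libgcc routine from patch 1 of 2.

```c
#include <stdint.h>

#if !defined(__ARM_ARCH_EXT_IDIV__)
# define TARGET_HAS_NO_HW_DIVIDE
#endif

/* Hypothetical restoring (shift-and-subtract) division.  Returns n / d
   and stores n % d in *rem when rem is non-null.  d must be nonzero.  */
uint64_t
soft_udivmod64 (uint64_t n, uint64_t d, uint64_t *rem)
{
  uint64_t q = 0, r = 0;
  int i;

  for (i = 63; i >= 0; i--)
    {
      /* Remember the bit shifted out of r so large divisors still work.  */
      uint64_t carry = r >> 63;

      r = (r << 1) | ((n >> i) & 1);
      if (carry || r >= d)
        {
          r -= d;
          q |= (uint64_t) 1 << i;
        }
    }
  if (rem)
    *rem = r;
  return q;
}

/* Pick the software path only where no hardware divide is advertised.  */
uint64_t
udiv64 (uint64_t n, uint64_t d)
{
#ifdef TARGET_HAS_NO_HW_DIVIDE
  return soft_udivmod64 (n, d, 0);
#else
  return n / d;
#endif
}
```

The point of keying on __ARM_ARCH_EXT_IDIV__ rather than on an architecture macro is that the same source then does the right thing on cores with and without hardware divide.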


Is this OK for trunk now?
Thanks,
Kugan
diff --git a/libgcc/config/arm/bpabi-lib.h b/libgcc/config/arm/bpabi-lib.h
index e0e46a6..7c6b489 100644
--- a/libgcc/config/arm/bpabi-lib.h
+++ b/libgcc/config/arm/bpabi-lib.h
@@ -75,3 +75,7 @@
helper functions - not everything in libgcc - in the interests of
maintaining backward compatibility.  */
 #define LIBGCC2_FIXEDBIT_GNU_PREFIX
+
+#if (!defined(__ARM_ARCH_EXT_IDIV__))
+# define TARGET_HAS_NO_HW_DIVIDE
+#endif

Re: [RFC][LIBGCC][2 of 2] 64 bit divide implementation for processor without hw divide instruction

2013-12-02 Thread Kugan

ping

Thanks,
Kugan

On 27/11/13 15:30, Kugan wrote:
> On 27/11/13 02:07, Richard Earnshaw wrote:
>> On 23/11/13 01:54, Kugan wrote:
> 
> [snip]
> 
>>> +2013-11-22  Kugan Vivekanandarajah  
>>> +
>>> +   * libgcc/config/arm/pbapi-lib.h (HAVE_NO_HW_DIVIDE): Define for
>>
>> It's bpabi-lib.h
> 
> Thanks for the review.
> 
>>> +   __ARM_ARCH_7_A__.
>>> +
>>>
>>>
>>
>> No, this will:
>> 1) Do the wrong thing for Cortex-a7, A12 and A15 (which all have HW
>> divide, and currently define __ARM_ARCH_7_A__).
>> 2) Do the wrong thing for v7-M and v7-R devices, which have Thumb HW
>> division instructions.
>> 3) Do the wrong thing for all pre-v7 devices, which don't have HW division.
>>
>> I think the correct solution is to test !defined(__ARM_ARCH_EXT_IDIV__)
> 
> I understand it now and updated the code as attached.
> 
> +2013-11-27  Kugan Vivekanandarajah  
> +
> + * config/arm/bpapi-lib.h (TARGET_HAS_NO_HW_DIVIDE): Define for
> + architectures that does not have hardware divide instruction.
> + i.e. architectures that does not define __ARM_ARCH_EXT_IDIV__.
> +
> 
> 
> Is this OK for trunk now?
> Thanks,
> Kugan
>

AARCH64 configure check for gas -mabi support

2013-12-04 Thread Kugan

Hi,

gcc trunk aarch64 bootstrapping fails with gas version 2.23.2 (with an
error message similar to "cannot compute suffix of object files") as this
particular version does not support -mabi=lp64. It succeeds with later
versions of gas that support -mabi.

The attached patch adds a check for -mabi=lp64 and prompts the user to
upgrade the assembler. Is this OK?

Thanks,
Kugan

+2013-12-05  Kugan Vivekanandarajah  
+   * configure.ac: Add checks for aarch64 assembler -mabi support.
+   * configure: Regenerate.
+
diff --git a/gcc/configure b/gcc/configure
index fdf0cd0..17b6e85 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -24057,6 +24057,38 @@ $as_echo "#define HAVE_AS_NO_MUL_BUG_ABORT_OPTION 1" 
>>confdefs.h
 fi
 ;;
 
+ aarch64-*-*)
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for -mabi 
option" >&5
+$as_echo_n "checking assembler for -mabi option... " >&6; }
+if test "${gcc_cv_as_aarch64_mabi+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  gcc_cv_as_aarch64_mabi=no
+  if test x$gcc_cv_as != x; then
+$as_echo '.text' > conftest.s
+if { ac_try='$gcc_cv_as $gcc_cv_as_flags -mabi=lp64 -o conftest.o 
conftest.s >&5'
+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+  (eval $ac_try) 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }
+then
+   gcc_cv_as_aarch64_mabi=yes
+else
+  echo "configure: failed program was" >&5
+  cat conftest.s >&5
+fi
+rm -f conftest.o conftest.s
+  fi
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_as_aarch64_mabi" >&5
+$as_echo "$gcc_cv_as_aarch64_mabi" >&6; }
+
+if test x$gcc_cv_as_aarch64_mabi = xno; then
+   as_fn_error "Assembler support for -mabi=lp64 is required. Upgrade the 
Assembler." "$LINENO" 5
+fi
+;;
+
   sparc*-*-*)
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for .register" 
>&5
 $as_echo_n "checking assembler for .register... " >&6; }
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 91a22d5..730ada0 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -3532,6 +3532,15 @@ case "$target" in
[Define if your assembler supports the -no-mul-bug-abort 
option.])])
 ;;
 
+ aarch64-*-*)
+gcc_GAS_CHECK_FEATURE([-mabi option],
+  gcc_cv_as_aarch64_mabi,,
+  [-mabi=lp64], [.text],,,)
+if test x$gcc_cv_as_aarch64_mabi = xno; then
+   AC_MSG_ERROR([Assembler support for -mabi=lp64 is required. Upgrade the Assembler.])
+fi
+;;
+
   sparc*-*-*)
 gcc_GAS_CHECK_FEATURE([.register], gcc_cv_as_sparc_register_op,,,
   [.register %g2, #scratch],,

Re: [PING^2][PATCH][2 of 2] RTL expansion for zero sign extension elimination with VRP

2013-12-05 Thread Kugan


>>>>   tem = (char) 255 + (char) 1;
>>
>> tem is always of type 'char' in GIMPLE (even if later promoted
>> via PROMOTE_MODE) the value-range is a 'char' value-range and thus
>> never will exceed [CHAR_MIN, CHAR_MAX].  The only way you can
>> use that directly is if you can rely on undefined behavior
>> happening for signed overflow - but if you argue that way you
>> can simply _always_ drop the (sext:SI (subreg:QI part and you
>> do not need value ranges for this.  For unsigned operations
>> for example [250, 254] + [8, 10] will simply wrap to [2, 8]
>> (if I got the math correct) which is inside your [CHAR_MIN + 1,
>> CHAR_MAX - 1] but if performed in SImode you can get 259 and
>> thus clearly you cannot drop the (zext:SI (subreg:QI parts.
>> The same applies to signed types if you do not want to rely
>> on signed overflow being undefined of course.
>>
> 
> Thanks for the explanation. I now get it and I will rework the patch.
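
The wrap-around argument above can be checked with a small sketch. It assumes an 8-bit unsigned char standing in for QImode and int standing in for SImode; the function names are made up for illustration. With those assumptions the endpoints 250 + 8 and 254 + 10 wrap to 2 and 8 in the narrow type, while the same sums computed in the wide type are 258 and 264.

```c
/* Addition performed in the narrow type: the result is truncated,
   so it wraps modulo 256 when unsigned char is 8 bits wide.  */
unsigned char
add_narrow (unsigned char a, unsigned char b)
{
  return (unsigned char) (a + b);
}

/* The same addition carried out in int, with no truncation.  */
int
add_wide (unsigned char a, unsigned char b)
{
  return a + b;
}
```

Because the two results differ, the value range of the narrow result alone does not justify dropping the truncation and extension.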
> 

I have attempted to implement what Richard suggested. If you think this
is what you want, I will go ahead and implement the missing gimple
binary statements.

Thanks again.
Kugan

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 98983f4..60ce54b 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -3236,6 +3236,20 @@ expand_gimple_stmt_1 (gimple stmt)
 
if (temp == target)
  ;
+   /* If the value in SUBREG of temp fits that SUBREG (does not
+  overflow) and is assigned to target SUBREG of the same mode
+  without sign conversion, we can skip the SUBREG
+  and extension.  */
+   else if (promoted
+&& is_assigned_exp_fit_type (lhs)
+&& (GET_CODE (temp) == SUBREG)
+&& (GET_MODE_PRECISION (GET_MODE (SUBREG_REG (temp)))
+>= GET_MODE_PRECISION (GET_MODE (target)))
+&& (GET_MODE (SUBREG_REG (target))
+== GET_MODE (SUBREG_REG (temp))))
+ {
+   emit_move_insn (SUBREG_REG (target), SUBREG_REG (temp));
+ }
else if (promoted)
  {
int unsignedp = SUBREG_PROMOTED_UNSIGNED_P (target);
diff --git a/gcc/dojump.c b/gcc/dojump.c
index 2aef34d..0f3aeae 100644
--- a/gcc/dojump.c
+++ b/gcc/dojump.c
@@ -1143,6 +1143,62 @@ do_compare_and_jump (tree treeop0, tree treeop1, enum rtx_code signed_code,
 
   type = TREE_TYPE (treeop0);
   mode = TYPE_MODE (type);
+
+  /* Is zero/sign extension redundant.  */
+  bool op0_ext_redundant = false;
+  bool op1_ext_redundant = false;
+
+  /* If promoted and the value in SUBREG of op0 fits (does not overflow),
+ it is a candidate for extension elimination.  */
+  if (GET_CODE (op0) == SUBREG && SUBREG_PROMOTED_VAR_P (op0))
+op0_ext_redundant = is_assigned_exp_fit_type (treeop0);
+
+  /* If promoted and the value in SUBREG of op1 fits (does not overflow),
+ it is a candidate for extension elimination.  */
+  if (GET_CODE (op1) == SUBREG && SUBREG_PROMOTED_VAR_P (op1))
+op1_ext_redundant = is_assigned_exp_fit_type (treeop1);
+
+  /* If zero/sign extension is redundant, generate RTL
+ for operands without zero/sign extension.  */
+  if ((op0_ext_redundant || TREE_CODE (treeop0) == INTEGER_CST)
+  && (op1_ext_redundant || TREE_CODE (treeop1) == INTEGER_CST))
+{
+  if ((TREE_CODE (treeop1) == INTEGER_CST)
+ && (!mode_signbit_p (GET_MODE (op1), op1)))
+   {
+ /* First operand is constant and signbit is not set (not
+represented in RTL as a negative constant).  */
+ rtx new_op0 = gen_reg_rtx (GET_MODE (SUBREG_REG (op0)));
+ emit_move_insn (new_op0, SUBREG_REG (op0));
+ op0 = new_op0;
+   }
+  else if ((TREE_CODE (treeop0) == INTEGER_CST)
+  && (!mode_signbit_p (GET_MODE (op0), op0)))
+   {
+ /* Other operand is constant and signbit is not set (not
+represented in RTL as a negative constant).  */
+ rtx new_op1 = gen_reg_rtx (GET_MODE (SUBREG_REG (op1)));
+
+ emit_move_insn (new_op1, SUBREG_REG (op1));
+ op1 = new_op1;
+   }
+  else if ((TREE_CODE (treeop0) != INTEGER_CST)
+  && (TREE_CODE (treeop1) != INTEGER_CST)
+  && (GET_MODE (op0) == GET_MODE (op1))
+  && (GET_MODE (SUBREG_REG (op0)) == GET_MODE (SUBREG_REG (op1))))
+   {
+ /* If both compare registers fit the SUBREG and are of the
+same mode.  */
+ rtx new_op0 = gen_reg_rtx (GET_MODE (SUBREG_REG (op0)));
+ rtx new_op1 = gen_reg_rtx (GET_MODE (SUBREG_REG (op1)));
+
+ emit_move_insn (new_op0, SUBREG_REG (op0));
+ emit_move_insn (new_op1, SUBREG_REG (op1));
+

Re: AARCH64 configure check for gas -mabi support

2013-12-09 Thread Kugan

Thanks Yufeng for the review.

On 07/12/13 03:18, Yufeng Zhang wrote:

>> gcc trunk aarch64 bootstrapping fails with gas version 2.23.2 (with
>> error message similar to cannot compute suffix of object files) as this
>> particular version does not support -mabi=lp64. It succeeds with later
>> versions of gas that supports -mabi.
> 
> The -mabi option was introduced to gas when the support for ILP32 was
> added.  Initially the options were named -milp32 and -mlp64:
> 
>   http://sourceware.org/ml/binutils/2013-06/msg00178.html
> 
> and later on they were change to -mabi=ilp32 and -mabi=lp64 for
> consistency with those in the aarch64 gcc:
> 
>   http://sourceware.org/ml/binutils/2013-07/msg00180.html
> 
> The following gcc patch made the driver use the explicit option to drive
> gas:
> 
>   http://gcc.gnu.org/ml/gcc-patches/2013-07/msg00083.html
> 
> It is a neglect of the backward compatibility with binutils 2.23.
> 
>>
>> Attached patch add checking for -mabi=lp64 and prompts upgradation. Is
>> this Ok?
> 
> I think instead of mandating the support for the -mabi option, the
> compiler shall be changed able to work with binutils 2.23.  The 2.23
> binutils have a good support for aarch64 and the main difference from
> 2.24 is the ILP32 support.  I think it is necessary to maintain the
> backward compatibility, and it should be achieved by suppressing the
> compiler's support for ILP32 when the -mabi option is not found
> available in gas during the configuration time.
> 
> I had a quick look at areas need to be updated:
> 
> * multilib support
> 
> In gcc/config.gcc, the default and the only accepted value for
> --with-multilib-list and --with-abi shall be lp64 when -mabi is not
> available.
> 
> * -mabi option
> 
> I suggest we keep the -mabi option, but reject -mabi=ilp32 in
> gcc/config/aarch64/aarch64.c:aarch64_override_options ()
> 
> * driver spec
> 
> In gcc/config/aarch64/aarch64-elf.h, the DRIVER_SELF_SPECS and ASM_SPEC
> shall be updated to not pass/specify -mabi for gas.
> 
> * documentation
> 
> I think it needs to be mentioned in gcc/doc/install.texi the constraint
> of using pre-2.24 binutils with aarch64 gcc that is 4.9 or later.
> 
> It is a quick scouting, but hopefully it has provided some
> guidance.  If you need more help, just let me know.
> 
> 
> Yufeng
> 
> P.s. some minor comments on the attached patch.
> 
>>
>> diff --git a/gcc/configure b/gcc/configure
>> index fdf0cd0..17b6e85 100755
>> --- a/gcc/configure
>> +++ b/gcc/configure
> 
> Diff result of auto-generation is usually excluded from a patch.
> 
>> diff --git a/gcc/configure.ac b/gcc/configure.ac
>> index 91a22d5..730ada0 100644
>> --- a/gcc/configure.ac
>> +++ b/gcc/configure.ac
>> @@ -3532,6 +3532,15 @@ case "$target" in
>>   [Define if your assembler supports the -no-mul-bug-abort
>> option.])])
>>   ;;
>>
>> + aarch64-*-*)
> 
> aarch64*-*-*
> 
>> +gcc_GAS_CHECK_FEATURE([-mabi option],
>> +  gcc_cv_as_aarch64_mabi,,
>> +  [-mabi=lp64], [.text],,,)
>> +if test x$gcc_cv_as_aarch64_mabi = xno; then
>> +AC_MSG_ERROR([Assembler support for -mabi=lp64 is required.
>> Upgrade the Assembler.])
>> +fi
>> +;;
>> +
>> sparc*-*-*)
>>   gcc_GAS_CHECK_FEATURE([.register], gcc_cv_as_sparc_register_op,,,
>> [.register %g2, #scratch],,
>>
> 
> 

Here is an attempt to do it the way you have suggested.
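
The driver-spec part of this approach can be sketched in isolation. The two ASM_MABI_SPEC definitions are the ones from the patch; ASM_SPEC_SKETCH and asm_spec_sketch are hypothetical stand-ins for the real ASM_SPEC, shortened so the string concatenation is easy to see.

```c
/* HAVE_AS_MABI_OPTION would come from configure; it is left undefined
   here, which models an assembler without -mabi support.  */
#ifdef HAVE_AS_MABI_OPTION
# define ASM_MABI_SPEC "%{mabi=*:-mabi=%*}"
#else
# define ASM_MABI_SPEC "%{mabi=lp64:}"
#endif

/* Hypothetical, shortened stand-in for ASM_SPEC.  Adjacent string
   literals are concatenated by the compiler.  */
#define ASM_SPEC_SKETCH \
  "%{mcpu=*:-mcpu=%*} %{march=*:-march=%*} " ASM_MABI_SPEC

const char *
asm_spec_sketch (void)
{
  return ASM_SPEC_SKETCH;
}
```

As I understand the spec language, the fallback form accepts -mabi=lp64 on the gcc command line but substitutes nothing, so the option is not forwarded to an assembler that would reject it.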

Thanks,
Kugan

gcc/

+2013-12-09  Kugan Vivekanandarajah  
+   * configure.ac: Add check for aarch64 assembler -mabi support.
+   * configure: Regenerate.
+   * config.in: Regenerate.
+   * config/aarch64/aarch64-elf.h (ASM_MABI_SPEC): New define.
+   (ASM_SPEC): Update to substitute -mabi with ASM_MABI_SPEC.
+   * config/aarch64/aarch64.c (aarch64_override_options):  Issue error if
+   assembler does not support -mabi and option ilp32 is selected.
+   * doc/install.texi: Added note that building gcc 4.9 and after with pre
+   2.24 binutils will not support -mabi=ilp32.
+


diff --git a/gcc/config/aarch64/aarch64-elf.h b/gcc/config/aarch64/aarch64-elf.h
index 4757d22..b260b7c 100644
--- a/gcc/config/aarch64/aarch64-elf.h
+++ b/gcc/config/aarch64/aarch64-elf.h
@@ -134,13 +134,19 @@
   " %{!mbig-endian:%{!mlittle-endian:" ENDIAN_SPEC "}}" \
   " %{!mabi=*:" ABI_SPEC "}"
 
+#ifdef HAVE_AS_MABI_OPTION
+#define ASM_MABI_SPEC  "%{mabi=*:-mabi=%*}"
+#else
+#define ASM_MABI_SPEC  "%{mabi=lp64*:}"
+#endif
+
 #ifndef ASM_SPEC
 #define ASM_SPEC "\
 %{mbig-endian:-EB} \

Re: AARCH64 configure check for gas -mabi support

2013-12-09 Thread Kugan

Hi Yufeng,

Thanks for the quick response.

>> +#define ASM_MABI_SPEC"%{mabi=lp64*:}"
> 
> Is '*' necessary here?

Removed it.

>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index b1b4eef..c1a9cbd 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -5186,6 +5186,10 @@ aarch64_override_options (void)
>>   {
>> aarch64_parse_tune ();
>>   }
>> +#ifndef HAVE_AS_MABI_OPTION
>> +  if (TARGET_ILP32)
>> +error ("Assembler does not supprt -mabi=ilp32");
>> +#endif
> 
> A blank line before #ifndef and some comment to explain the reason please.

Blank line and comments are added.

>> + aarch64*-*-*)
> 
> Alphabetically, this should be placed before alpha*.

Moved it up.

> 
> It is not sufficient to only check with_abi itself.  By default,
> aarch64*-*-elf builds both ilp32 and lp64 libraries (e.g. libgcc).  This
> needs to be turned off if test x$gcc_cv_as_aarch64_mabi = xno.  We also
> need to detect the situation where users explicitly configure the
> toolchain with --with-multilib-list=lp64,ilp32
> 
> Here is an incremental diff based on your change to gcc/configure.ac to
> give an example on a more thorough check:
> 
> diff --git a/gcc/configure.ac b/gcc/configure.ac
> index c8cf274..c590ad7 100644
> --- a/gcc/configure.ac
> +++ b/gcc/configure.ac
> @@ -3488,12 +3488,27 @@ case "$target" in
>  gcc_GAS_CHECK_FEATURE([-mabi option],
>gcc_cv_as_aarch64_mabi,,
>[-mabi=lp64], [.text],,,)
> -if test $gcc_cv_as_aarch64_mabi = yes ; then
> +if test x$gcc_cv_as_aarch64_mabi = xyes ; then
>  AC_DEFINE(HAVE_AS_MABI_OPTION, 1,
>[Define if your assembler supports the -mabi option.])
> -fi
> -if test x$gcc_cv_as_aarch64_mabi = xno && test x$with_abi = xilp32;
> then
> -AC_MSG_ERROR([Assembler doesnot support -mabi=ilp32. Upgrade the
> Assembler.])
> +else
> +if test x$with_abi = xilp32; then
> +  AC_MSG_ERROR([Assembler does not support -mabi=ilp32.  Upgrade
> the Assembler.])
> +fi
> +if test x"$with_multilib_list" = xdefault; then
> +  TM_MULTILIB_CONFIG=lp64
> +else
> +  aarch64_multilibs=`echo $with_multilib_list | sed -e 's/,/ /g'`
> +  for aarch64_multilib in ${aarch64_multilibs}; do
> +case ${aarch64_multilib} in
> +  ilp32 )
> +AC_MSG_ERROR([Assembler does not support -mabi=ilp32.  Upgrade
> the Assembler.])
> +;;
> +  *)
> +;;
> +esac
> +  done
> +fi
>  fi
>  ;;

Updated it and tested with

1. binutils 2.23.2
   a. Bootstrapped with defaults and tested gcc: -mabi=lp64 compiles
      and -mabi=ilp32 gives an error.
   b. Bootstrap with --with-multilib-list=lp64,ilp32 fails with the
      error message.
   c. Bootstrap with --with-multilib-list=ilp32 fails with the error
      message.
   d. Bootstrap with --with-multilib-list=lp64 works.

2. binutils 2.24.51
   a. Bootstrapped with defaults and tested gcc: -mabi=lp64 compiles
      and -mabi=ilp32 compiles.
   b. Bootstrap with --with-multilib-list=lp64,ilp32 works; tested gcc:
      -mabi=lp64 compiles and -mabi=ilp32 compiles (* it gives a linker
      error in my setup: "aarch64:ilp32 architecture of input file
      `/tmp/ccIFqSxU.o' is incompatible with aarch64 output"; I believe
      this is not related to what I am testing).
   c. Bootstrap with default works.
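
The multilib check exercised by these tests amounts to splitting the comma-separated list and rejecting any ilp32 entry. The configure fragment does this with sed and a shell case statement; the C function below is only a sketch of the same logic, and its name is hypothetical.

```c
#include <string.h>

/* Return 1 if the comma-separated LIST contains an entry that is
   exactly "ilp32", and 0 otherwise.  */
int
multilib_list_has_ilp32 (const char *list)
{
  const char *p = list;

  while (*p)
    {
      /* Length of the current entry, up to the next comma or the end.  */
      size_t len = strcspn (p, ",");

      if (len == 5 && strncmp (p, "ilp32", 5) == 0)
        return 1;
      p += len;
      if (*p == ',')
        p++;
    }
  return 0;
}
```

Matching whole entries rather than substrings keeps a list such as lp64 from being rejected by accident.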


Thanks,
kugan

gcc/

+2013-12-09  Kugan Vivekanandarajah  
+   * configure.ac: Add check for aarch64 assembler -mabi support.
+   * configure: Regenerate.
+   * config.in: Regenerate.
+   * config/aarch64/aarch64-elf.h (ASM_MABI_SPEC): New define.
+   (ASM_SPEC): Update to substitute -mabi with ASM_MABI_SPEC.
+   * config/aarch64/aarch64.c (aarch64_override_options):  Issue error if
+   assembler does not support -mabi and option ilp32 is selected.
+   * doc/install.texi: Added note that building gcc 4.9 and after with pre
+   2.24 binutils will not support -mabi=ilp32.
+



diff --git a/gcc/config/aarch64/aarch64-elf.h b/gcc/config/aarch64/aarch64-elf.h
index 4757d22..a66c3db 100644
--- a/gcc/config/aarch64/aarch64-elf.h
+++ b/gcc/config/aarch64/aarch64-elf.h
@@ -134,13 +134,19 @@
   " %{!mbig-endian:%{!mlittle-endian:" ENDIAN_SPEC "}}" \
   " %{!mabi=*:" ABI_SPEC "}"
 
+#ifdef HAVE_AS_MABI_OPTION
+#define ASM_MABI_SPEC  "%{mabi=*:-mabi=%*}"
+#else
+#define ASM_MABI_SPEC  "%{mabi=lp64:}"
+#endif
+
 #ifndef ASM_SPEC
 #define ASM_SPEC "\
 %{mbig-endian:-EB} \
 %{mlittle-endian:-EL} \
 %{mcpu=*:-mcpu=%*} \
-%{march=*:-march=%*} \
-%{mabi=*:-mabi=%*}"
+%{march=*:-

Re: AARCH64 configure check for gas -mabi support

2013-12-10 Thread Kugan


[snip]

>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index b1b4eef..a53febc 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -5187,6 +5187,13 @@ aarch64_override_options (void)
>> aarch64_parse_tune ();
>>   }
>>
>> +/* Issue error if assembler does not support -mabi and option ilp32
>> +  is selected.  */
>
>I'd prefer the comment to be "The compiler may have been configured
>with 2.23.* binutils, which does not have support for ILP32."


>> +#ifndef HAVE_AS_MABI_OPTION
>> +  if (TARGET_ILP32)
>> +error ("Assembler does not supprt -mabi=ilp32");
>> +#endif
> 
> supprt/support
> 
[snip]

> I'm not very sure about the indent rules for configury files, but in
> other areas of configure.ac, it seems using a similar indent convention
> as in .c files.
> 

Thanks Yufeng. I have updated the patch based on the comments above.

Marcus, is this OK for trunk now?

Thanks,
Kugan


gcc/

+2013-12-11  Kugan Vivekanandarajah  
+   * configure.ac: Add check for aarch64 assembler -mabi support.
+   * configure: Regenerate.
+   * config.in: Regenerate.
+   * config/aarch64/aarch64-elf.h (ASM_MABI_SPEC): New define.
+   (ASM_SPEC): Update to substitute -mabi with ASM_MABI_SPEC.
+   * config/aarch64/aarch64.c (aarch64_override_options):  Issue error if
+   assembler does not support -mabi and option ilp32 is selected.
+   * doc/install.texi: Added note that building gcc 4.9 and after with pre
+   2.24 binutils will not support -mabi=ilp32.
+
diff --git a/gcc/config/aarch64/aarch64-elf.h b/gcc/config/aarch64/aarch64-elf.h
index 4757d22..a66c3db 100644
--- a/gcc/config/aarch64/aarch64-elf.h
+++ b/gcc/config/aarch64/aarch64-elf.h
@@ -134,13 +134,19 @@
   " %{!mbig-endian:%{!mlittle-endian:" ENDIAN_SPEC "}}" \
   " %{!mabi=*:" ABI_SPEC "}"
 
+#ifdef HAVE_AS_MABI_OPTION
+#define ASM_MABI_SPEC  "%{mabi=*:-mabi=%*}"
+#else
+#define ASM_MABI_SPEC  "%{mabi=lp64:}"
+#endif
+
 #ifndef ASM_SPEC
 #define ASM_SPEC "\
 %{mbig-endian:-EB} \
 %{mlittle-endian:-EL} \
 %{mcpu=*:-mcpu=%*} \
-%{march=*:-march=%*} \
-%{mabi=*:-mabi=%*}"
+%{march=*:-march=%*}" \
+ASM_MABI_SPEC
 #endif
 
 #undef TYPE_OPERAND_FMT
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b1b4eef..01dbe23 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5187,6 +5187,13 @@ aarch64_override_options (void)
   aarch64_parse_tune ();
 }
 
+#ifndef HAVE_AS_MABI_OPTION
+  /* The compiler may have been configured with 2.23.* binutils, which does
+ not have support for ILP32.  */
+  if (TARGET_ILP32)
+error ("Assembler does not support -mabi=ilp32");
+#endif
+
   initialize_aarch64_code_model ();
 
   aarch64_build_bitmask_table ();
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 91a22d5..0a3b97b 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -3495,6 +3495,35 @@ AC_DEFINE_UNQUOTED(HAVE_LTO_PLUGIN, $gcc_cv_lto_plugin,
 AC_MSG_RESULT($gcc_cv_lto_plugin)
 
 case "$target" in
+  aarch64*-*-*)
+gcc_GAS_CHECK_FEATURE([-mabi option], gcc_cv_as_aarch64_mabi,,
+  [-mabi=lp64], [.text],,,)
+if test x$gcc_cv_as_aarch64_mabi = xyes; then
+  AC_DEFINE(HAVE_AS_MABI_OPTION, 1,
+[Define if your assembler supports the -mabi option.])
+else
+  if test x$with_abi = xilp32; then
+AC_MSG_ERROR([Assembler does not support -mabi=ilp32.\
+ Upgrade the Assembler.])
+  fi
+  if test x"$with_multilib_list" = xdefault; then
+TM_MULTILIB_CONFIG=lp64
+  else
+aarch64_multilibs=`echo $with_multilib_list | sed -e 's/,/ /g'`
+for aarch64_multilib in ${aarch64_multilibs}; do
+  case ${aarch64_multilib} in
+ilp32)
+  AC_MSG_ERROR([Assembler does not support -mabi=ilp32.\
+Upgrade the Assembler.])
+  ;;
+*)
+  ;;
+  esac
+done
+  fi
+fi
+;;
+
   # All TARGET_ABI_OSF targets.
   alpha*-*-linux* | alpha*-*-*bsd*)
 gcc_GAS_CHECK_FEATURE([explicit relocation support],
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index a8f9f8a..00c4f0d 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -3735,6 +3735,15 @@ removed and the system libunwind library will always be 
used.
 
 @html
 
+@end html
+@anchor{aarch64-x-x}
+@heading aarch64*-*-*
+Pre 2.24 binutils does not have support for selecting -mabi and does not
+support ILP32.  If GCC 4.9 or later is built with pre 2.24, GCC will not
+support option -mabi=ilp32.
+
+@html
+
 
 @end html
 @anchor{x-ibm-aix}

Re: [PING^2][PATCH][2 of 2] RTL expansion for zero sign extension elimination with VRP

2014-01-07 Thread Kugan

ping ?

I have reorganised the last patch and am now handling only
VIEW_CONVERT_EXPR, CONVERT_EXPR and NOP_EXPR. Once it is reviewed and
the necessary changes are made, I will address the other cases in a
separate patch (when it reaches that stage).
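
The core test in is_assigned_exp_fit_type is an interval containment check. A minimal sketch, using plain 64-bit integers instead of double_int and a hypothetical function name, might look like this:

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if every value in [min, max] is representable in a type
   whose bounds are [type_min, type_max].  */
bool
range_fits_type (int64_t min, int64_t max,
                 int64_t type_min, int64_t type_max)
{
  return max <= type_max && type_min <= min;
}
```

For example, a range of [0, 100] fits a signed 8-bit type, while [0, 200] does not.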

Thanks,
Kugan

gcc/

+2014-01-07  Kugan Vivekanandarajah  
+
+   * dojump.c (do_compare_and_jump): Generate rtl without
+   zero/sign extension if redundant.
+   * cfgexpand.c (expand_gimple_stmt_1): Likewise.
+   (is_assigned_exp_fit_type) : New function.
+   * cfgexpand.h (is_assigned_exp_fit_type) : Declare.
+
diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 7a93975..b2e2f90 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -476,6 +476,66 @@ add_scope_conflicts_1 (basic_block bb, bitmap work, bool for_conflict)
 }
 }
 
+
+/* Check gimple assign stmt and see if zero/sign extension is
+   redundant.  i.e.  if an assignment gimple statement has RHS expression
+   value that can fit in LHS type, subreg and extension to fit can be
+   redundant.  Zero/sign extensions in this case can be removed.  */
+
+bool
+is_assigned_exp_fit_type (tree lhs)
+{
+  double_int type_min, type_max;
+  double_int min1, max1;
+  enum tree_code stmt_code;
+  tree rhs1;
+  gimple stmt = SSA_NAME_DEF_STMT (lhs);
+
+  if (gimple_code (stmt) != GIMPLE_ASSIGN)
+return false;
+
+  /* We remove extension for non-pointer and integral stmts.  */
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (lhs))
+  || POINTER_TYPE_P (TREE_TYPE (lhs)))
+return false;
+
+  stmt_code = gimple_assign_rhs_code (stmt);
+  rhs1 = gimple_assign_rhs1 (stmt);
+  type_max = tree_to_double_int (TYPE_MAX_VALUE (TREE_TYPE (lhs)));
+  type_min = tree_to_double_int (TYPE_MIN_VALUE (TREE_TYPE (lhs)));
+
+  if (TREE_CODE_CLASS (stmt_code) == tcc_unary)
+{
+  bool uns = TYPE_UNSIGNED (TREE_TYPE (rhs1));
+  /* Get the value range.  */
+  if (TREE_CODE (rhs1) == INTEGER_CST)
+   {
+ min1 = tree_to_double_int (rhs1);
+ max1 = tree_to_double_int (rhs1);
+   }
+  else if (get_range_info (rhs1, &min1, &max1) != VR_RANGE)
+   return false;
+
+  switch (stmt_code)
+   {
+   case VIEW_CONVERT_EXPR:
+   case CONVERT_EXPR:
+   case NOP_EXPR:
+ /* If rhs value range fits lhs type, zero/sign extension is
+   redundant.  */
+ if (max1.cmp (type_max, 0) != 1
+ && (type_min.cmp (min1, 0)) != 1)
+   return true;
+ else
+   return false;
+   default:
+ return false;
+   }
+}
+
+  return false;
+}
+
 /* Generate stack partition conflicts between all partitions that are
simultaneously live.  */
 
@@ -3247,6 +3307,20 @@ expand_gimple_stmt_1 (gimple stmt)
 
if (temp == target)
  ;
+   /* If the value in SUBREG of temp fits that SUBREG (does not
+  overflow) and is assigned to target SUBREG of the same mode
+  without sign conversion, we can skip the SUBREG
+  and extension.  */
+   else if (promoted
+&& is_assigned_exp_fit_type (lhs)
+&& (GET_CODE (temp) == SUBREG)
+&& (GET_MODE_PRECISION (GET_MODE (SUBREG_REG (temp)))
+>= GET_MODE_PRECISION (GET_MODE (target)))
+&& (GET_MODE (SUBREG_REG (target))
+== GET_MODE (SUBREG_REG (temp))))
+ {
+   emit_move_insn (SUBREG_REG (target), SUBREG_REG (temp));
+ }
else if (promoted)
  {
int unsignedp = SUBREG_PROMOTED_UNSIGNED_P (target);
diff --git a/gcc/cfgexpand.h b/gcc/cfgexpand.h
index 04517a3..c7d73e8 100644
--- a/gcc/cfgexpand.h
+++ b/gcc/cfgexpand.h
@@ -22,5 +22,6 @@ along with GCC; see the file COPYING3.  If not see
 
 extern tree gimple_assign_rhs_to_tree (gimple);
 extern HOST_WIDE_INT estimated_stack_frame_size (struct cgraph_node *);
+extern bool is_assigned_exp_fit_type (tree lhs);
 
 #endif /* GCC_CFGEXPAND_H */
diff --git a/gcc/dojump.c b/gcc/dojump.c
index 73df6d1..73a4b6b 100644
--- a/gcc/dojump.c
+++ b/gcc/dojump.c
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ggc.h"
 #include "basic-block.h"
 #include "tm_p.h"
+#include "cfgexpand.h"
 
 static bool prefer_and_bit_test (enum machine_mode, int);
 static void do_jump_by_parts_greater (tree, tree, int, rtx, rtx, int);
@@ -1166,6 +1167,62 @@ do_compare_and_jump (tree treeop0, tree treeop1, enum rtx_code signed_code,
 
   type = TREE_TYPE (treeop0);
   mode = TYPE_MODE (type);
+
+  /* Is zero/sign extension redundant.  */
+  bool op0_ext_redundant = false;
+  bool op1_ext_redundant = false;
+
+  /* If promoted and the value in SUBREG of op0 fits (does not overflow),
+ it is a candidate for extension elimination.  */
+  if (GET_CODE (op0) == SUBREG && SUBREG_PRO

Re: [PING^2][PATCH][2 of 2] RTL expansion for zero sign extension elimination with VRP

2014-01-07 Thread Kugan

On 07/01/14 23:23, Richard Biener wrote:
> On Tue, 7 Jan 2014, Kugan wrote:

[snip]

> Note that VIEW_CONVERT_EXPR is wrong here.  I think you are
> handling this wrong still.  From a quick look you want to avoid
> the actual promotion for
> 
>   reg_1 = 
> 
> when reg_1 is promoted and thus the target is (subreg:XX N).
> The RHS has been expanded in XXmode.  Dependent on the value-range
> of reg_1 you want to set N to a paradoxical subreg of the expanded
> result.  You can always do that if the reg is zero-extended
> and else if the MSB is not set for any of the values of reg_1.

Thanks Richard for the explanation. I just want to confirm that I
understand you correctly before I attempt to fix it. So let me try this
with the following example,

for a gimple stmt of the following form:
unsigned short _5;
short int _6;
_6 = (short int)_5;

;; _6 = (short int) _5;
target = (subreg/s/u:HI (reg:SI 110 [ D.4144 ]) 0)
temp = (subreg:HI (reg:SI 118) 0)

So, I must generate the following if it satisfies the other conditions.
(set (reg:SI 110 [ D.4144 ]) (subreg:SI temp ))

Is my understanding correct?

> I don't see how is_assigned_exp_fit_type reflects this in any way.
>

What I tried doing with the patch is:

(insn 13 12 0 (set (reg:SI 110 [ D.4144 ])
(zero_extend:SI (subreg:HI (reg:SI 118) 0))) c5.c:8 -1
 (nil))

If the values in register (reg:SI 118) fit HI mode (without
overflowing), I assume that it is not necessary to drop the higher
bits and zero_extend as done above, and we can generate the following instead.

(insn 13 12 0 (set (reg:SI 110 [ D.4144 ])
(((reg:SI 118) 0))) c5.c:8 -1
 (nil))

is_assigned_exp_fit_type just checks if the range fits (in the above
case, whether the value in (reg:SI 118) fits HI mode), and the checks before
emit_move_insn (SUBREG_REG (target), SUBREG_REG (temp)); check that the
modes match.
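
To make the claim concrete, the sketch below models the two extensions on plain integers, with uint32_t standing in for SImode and the low 16 bits for the HImode subreg. The function names are hypothetical. It shows that zero extension is the identity when the register value is at most 0xffff, and that sign extension is the identity when, in addition, bit 15 is clear.

```c
#include <stdint.h>

/* Models (zero_extend:SI (subreg:HI (reg:SI r) 0)): keep the low
   16 bits and clear the rest.  */
uint32_t
zero_extend_hi (uint32_t reg)
{
  return reg & 0xffffu;
}

/* Models (sign_extend:SI (subreg:HI (reg:SI r) 0)): keep the low
   16 bits and replicate bit 15 into the upper half.  */
int32_t
sign_extend_hi (uint32_t reg)
{
  int32_t low = (int32_t) (reg & 0xffffu);

  return (low ^ 0x8000) - 0x8000;
}
```

When the value range guarantees these identities, the extension changes nothing, which is the case the patch tries to detect.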

Is this wrong  or am I missing the whole point?

> Anyway, the patch should not introduce another if (promoted)
> case but only short-cut the final convert_move call of the existing
> one.
>

Thanks,
Kugan

[AARCH64][PATCH] PR59695

2014-01-11 Thread Kugan

Hi,

aarch64_build_constant incorrectly truncates the immediate when
constants are generated with MOVN. This causes coinor-osi tests to fail
(tracked also in https://bugs.launchpad.net/gcc-linaro/+bug/1263576)

Attached patch fixes this. Also attaching a reduced testcase that
reproduces this. Tested on aarch64-none-linux-gnu with no new
regressions. Is this OK for trunk?
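
The arithmetic behind the fix can be sketched as follows. MOVN writes the bitwise inverse of its immediate, so the value that ends up in the register is the complement of the 16-bit chunk taken from ~val. The mask 0xffff is an assumption (the hex constants in the hunk below are cut off in this archive), and the function names are hypothetical.

```c
#include <stdint.h>

/* Value a 64-bit register holds after MOVN with an unshifted 16-bit
   immediate: the bitwise inverse of the immediate.  */
uint64_t
movn_result (uint16_t imm16)
{
  return ~(uint64_t) imm16;
}

/* Constant passed to the move before the fix: only the inverted low
   chunk, with the upper bits cleared.  */
uint64_t
old_first_chunk (uint64_t val)
{
  return (~val) & 0xffff;
}

/* Constant passed to the move after the fix: the low chunk of val with
   all upper bits set, which is what MOVN actually produces.  */
uint64_t
new_first_chunk (uint64_t val)
{
  return ~((~val) & 0xffff);
}
```

For val = 0xffffffffffff1234 the old expression yields 0xedcb, whereas the new one yields val itself, matching what a MOVN with immediate 0xedcb would load.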

Thanks,
Kugan

gcc/
+2013-10-15  Matthew Gretton-Dann  
+   Kugan Vivekanandarajah  
+
+   PR target/59695
+   * config/aarch64/aarch64.c (aarch64_build_constant): Fix incorrect
+   truncation.
+


gcc/testsuite/
+2014-01-11  Matthew Gretton-Dann  
+   Kugan Vivekanandarajah  
+
+   PR target/59695
+   * g++.dg/pr59695.C: New file.
+
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 3d32ea5..854666f 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2486,7 +2486,7 @@ aarch64_build_constant (int regnum, HOST_WIDE_INT val)
   if (ncount < zcount)
{
  emit_move_insn (gen_rtx_REG (Pmode, regnum),
- GEN_INT ((~val) & 0x));
+ GEN_INT (~((~val) & 0x)));
  tval = 0x;
}
   else
diff --git a/gcc/testsuite/g++.dg/pr59695.C b/gcc/testsuite/g++.dg/pr59695.C
index e69de29..0da06cb 100644
--- a/gcc/testsuite/g++.dg/pr59695.C
+++ b/gcc/testsuite/g++.dg/pr59695.C
@@ -0,0 +1,125 @@
+
+/* PR target/53055 */
+/* { dg-do run { target aarch64*-*-* } } */
+/* { dg-options "-O0" } */
+
+#define  DEFINE_VIRTUALS_FNS(i)virtual void  xxx##i () {} \
+  virtual void  foo1_##i (){}\
+  virtual void  foo2_##i (){}\
+  virtual void  foo3_##i (){}\
+  virtual void  foo4_##i (){}\
+  virtual void  foo5_##i (){}\
+  virtual void  foo6_##i (){}\
+  virtual void  foo7_##i (){}\
+  virtual void  foo8_##i (){}\
+  virtual void  foo9_##i (){}\
+  virtual void  foo10_##i ()   {}\
+  virtual void  foo11_##i ()   {}\
+  virtual void  foo12_##i ()   {}\
+  virtual void  foo13_##i ()   {}\
+  virtual void  foo14_##i ()   {}\
+  virtual void  foo15_##i ()   {}\
+  virtual void  foo16_##i ()   {}\
+  virtual void  foo17_##i ()   {}\
+  virtual void  foo18_##i ()   {}\
+  virtual void  foo19_##i ()   {}\
+  virtual void  foo20_##i ()   {}\
+  virtual void  foo21_##i ()   {}\
+  virtual void  foo22_##i ()   {}\
+
+class base_class_2
+{
+
+public:
+  /* Define lots of virtual functions */
+  DEFINE_VIRTUALS_FNS (1)
+  DEFINE_VIRTUALS_FNS (2)
+  DEFINE_VIRTUALS_FNS (3)
+  DEFINE_VIRTUALS_FNS (4)
+  DEFINE_VIRTUALS_FNS (5)
+  DEFINE_VIRTUALS_FNS (6)
+  DEFINE_VIRTUALS_FNS (7)
+  DEFINE_VIRTUALS_FNS (8)
+  DEFINE_VIRTUALS_FNS (9)
+  DEFINE_VIRTUALS_FNS (10)
+  DEFINE_VIRTUALS_FNS (11)
+  DEFINE_VIRTUALS_FNS (12)
+  DEFINE_VIRTUALS_FNS (13)
+  DEFINE_VIRTUALS_FNS (14)
+  DEFINE_VIRTUALS_FNS (15)
+  DEFINE_VIRTUALS_FNS (16)
+  DEFINE_VIRTUALS_FNS (17)
+  DEFINE_VIRTUALS_FNS (18)
+  DEFINE_VIRTUALS_FNS (19)
+  DEFINE_VIRTUALS_FNS (20)
+
+  base_class_2();
+  virtual ~base_class_2 ();
+};
+
+base_class_2::base_class_2()
+{
+}
+
+base_class_2::~base_class_2 ()
+{
+}
+
+class base_class_1
+{
+public:
+  virtual ~base_class_1();
+  base_class_1();
+};
+
+base_class_1::base_class_1()
+{
+}
+
+base_class_1::~base_class_1()
+{
+}
+
+class base_Impl_class :
+  virtual public base_class_2, public base_class_1
+{
+public:
+  base_Impl_class ();
+  virtual ~base_Impl_class ();
+};
+
+base_Impl_class::base_Impl_class ()
+{
+}
+
+base_Impl_class::~base_Impl_class ()
+{
+}
+
+
+class test_cls : public base_Impl_class
+{
+public:
+  test_cls();
+  virtual ~test_cls();
+};
+
+test_cls::test_cls()
+{
+}
+
+test_cls::~test_cls()
+{
+}
+
+int main()
+{
+  test_cls *test = new test_cls;
+  base_class_2 *p1 = test;
+
+  /* PR 53055  destructor thunk offsets are not setup
+   correctly resulting in crash.  */
+  delete p1;
+  return 0;
+}
+

Re: [AARCH64][PATCH] PR59695

2014-01-15 Thread Kugan

On 13/01/14 21:05, Richard Earnshaw wrote:
> On 11/01/14 23:42, Kugan wrote:
>> Hi,
>>
>> aarch64_build_constant incorrectly truncates the immediate when
>> constants are generated with MOVN. This causes coinor-osi tests to fail
>> (tracked also in https://bugs.launchpad.net/gcc-linaro/+bug/1263576)
>>
>> Attached patch fixes this. Also attaching a reduced testcase that
>> reproduces this. Tested on aarch64-none-linux-gnu with no new
>> regressions. Is this OK for trunk?
>>
>> Thanks,
>> Kugan
>>
>> gcc/
>> +2013-10-15  Matthew Gretton-Dann  
>> +Kugan Vivekanandarajah  
>> +
>> +PR target/59588
>> +* config/aarch64/aarch64.c (aarch64_build_constant): Fix incorrect
>> +truncation.
>> +
>>
>>
>> gcc/testsuite/
>> +2014-01-11  Matthew Gretton-Dann  
>> +Kugan Vivekanandarajah  
>> +
>> +PR target/59695
>> +* g++.dg/pr59695.C: New file.
>> +
>>
>>
>> p.txt
>>
>>
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index 3d32ea5..854666f 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -2486,7 +2486,7 @@ aarch64_build_constant (int regnum, HOST_WIDE_INT val)
>>if (ncount < zcount)
>>  {
>>emit_move_insn (gen_rtx_REG (Pmode, regnum),
>> -  GEN_INT ((~val) & 0x));
>> +  GEN_INT (~((~val) & 0x)));
> 
> I think that would be better written as
> 
>   GEN_INT (val | ~(HOST_WIDE_INT) 0x);
> 
> Note the cast after the ~ to ensure we invert the right number of bits.
> 
> Otherwise OK.
> 
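
The two spellings discussed above are equal by De Morgan's law, which a standalone sketch can show. It assumes the elided mask is 0xffff and a 32-bit int; the third helper uses an unsigned constant that is not in the patch, purely to illustrate what the cast guards against.

```c
#include <stdint.h>

/* Assumption: 0xffff stands in for the constant elided in the archive.
   All three helpers are hypothetical illustrations.  */

/* The form in the submitted patch.  */
static int64_t
patch_form (int64_t val)
{
  return ~((~val) & 0xffff);
}

/* The form suggested in review: invert the mask at 64-bit width.  */
static int64_t
review_form (int64_t val)
{
  return val | ~(int64_t) 0xffff;
}

/* Without a cast, an unsigned 32-bit mask is inverted in 32 bits and then
   zero-extended, so bits 32..63 of the result stay clear.  */
static int64_t
no_cast_unsigned (int64_t val)
{
  return val | ~0xffffu;
}
```

With a plain int constant such as 0xffff the sign extension happens to give the same result either way; the cast makes the intended width explicit.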

Thanks Richard. Is this OK for back-porting to 4.8 as well?

Thanks,
Kugan

Re: [PR66726] Factor conversion out of COND_EXPR

2015-07-09 Thread Kugan



On 08/07/15 00:41, Jeff Law wrote:
> On 07/07/2015 06:50 AM, Kugan wrote:
>>
>> Thanks for the review. I have addressed your comments above in the
>> attached patch.
>>
>> I have one question with respect to unary operation. For generic unary
>> operation with INTEGER_CST, do we skip this or do we have to perform the
>> inverse operation so that the conversion after PHI will restore it? Is
>> there any easy way to do this safely?
> I think we'd have to invert the operation -- some of which are trivial,
> such as BIT_NOT_EXPR.
> 
> NEGATE_EXPR is trivial once you filter out the cases where inversion
> will create signed overflow (ie INT_MIN and like when arg1 is an
> INTEGER_CST).
> 
> Similarly ABS_EXPR is trivial once you filter out cases where arg1 is a
> negative INTEGER_CST.
> 
> If you want to try and handle those cases, I'm certainly comfortable
> with that as a follow-up.  Obviously we'll want to testcases for them,
> including the cases where we don't want to make the transformation for
> NEGATE_EXPR and ABS_EXPR.
> 
> There may be other special cases we need to handle for other unary
> operations.  I haven't walked through the full list.
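
The filtering described above can be sketched in plain C, outside GCC. The enum and function below are hypothetical; they only model finding a constant that the unary operation maps back to the original INTEGER_CST, and refusing when no safe one exists.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-ins for BIT_NOT_EXPR, NEGATE_EXPR and ABS_EXPR.  */
enum unary_op { OP_BIT_NOT, OP_NEGATE, OP_ABS };

/* Try to find PRE such that OP (PRE) == CST without signed overflow.
   Return false when the transformation should be skipped.  */
static bool
invert_constant (enum unary_op op, int cst, int *pre)
{
  switch (op)
    {
    case OP_BIT_NOT:
      /* ~(~cst) == cst for every value.  */
      *pre = ~cst;
      return true;
    case OP_NEGATE:
      /* -INT_MIN overflows, so filter it out.  */
      if (cst == INT_MIN)
        return false;
      *pre = -cst;
      return true;
    case OP_ABS:
      /* No input has a negative absolute value.  */
      if (cst < 0)
        return false;
      *pre = cst;
      return true;
    }
  return false;
}
```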

Thanks Jeff for the review. As you said later, I will skip the generic
unary case in this patch and work on that as an addition on top of this.


> 
>>
>> Bootstrapped and regression tested the attached patch on
>> x86-64-none-linux-gnu with no new regressions.
>>
>> Thanks,
>> Kugan
>>
>>
>> p.txt
>>
>>
>> diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
>> index 92b4ab0..1d6de9b 100644
>> --- a/gcc/tree-ssa-phiopt.c
>> +++ b/gcc/tree-ssa-phiopt.c
>> @@ -73,6 +73,7 @@ along with GCC; see the file COPYING3.  If not see
>>   static unsigned int tree_ssa_phiopt_worker (bool, bool);
>>   static bool conditional_replacement (basic_block, basic_block,
>>edge, edge, gphi *, tree, tree);
>> +static bool factor_out_conditional_conversion (edge, edge, gphi *,
>> tree, tree);
>>   static int value_replacement (basic_block, basic_block,
>> edge, edge, gimple, tree, tree);
>>   static bool minmax_replacement (basic_block, basic_block,
>> @@ -335,6 +336,17 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool
>> do_hoist_loads)
>>node.  */
>> gcc_assert (arg0 != NULL && arg1 != NULL);
>>
>> +  if (factor_out_conditional_conversion (e1, e2, phi, arg0, arg1))
>> +{
>> +  /* Update arg0 and arg1.  */
>> +  phis = phi_nodes (bb2);
>> +  phi = single_non_singleton_phi_for_edges (phis, e1, e2);
>> +  gcc_assert (phi);
>> +  arg0 = gimple_phi_arg_def (phi, e1->dest_idx);
>> +  arg1 = gimple_phi_arg_def (phi, e2->dest_idx);
>> +  gcc_assert (arg0 != NULL && arg1 != NULL);
>> +}
>> +
> So small comment before this block of code indicating why we're
> recomputing these values.  Something like this perhaps:
> 
> /* factor_out_conditional_conversion may create a new PHI in BB2 and
>eliminate an existing PHI in BB2.  Recompute values that may be
>affected by that change.  */
> 
> 
> Or something along those lines.

Done.

> 
> 
>> /* Do the replacement of conditional if it can be done.  */
>> if (conditional_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
>>   cfgchanged = true;
>> @@ -410,6 +422,133 @@ replace_phi_edge_with_variable (basic_block
>> cond_block,
>> bb->index);
>>   }
>>
>> +/* PR66726: Factor conversion out of COND_EXPR.  If the arguments of
>> the PHI
>> +   stmt are CONVERT_STMT, factor out the conversion and perform the
>> conversion
>> +   to the result of PHI stmt.  */
>> +
>> +static bool
>> +factor_out_conditional_conversion (edge e0, edge e1, gphi *phi,
>> +   tree arg0, tree arg1)
>> +{
>> +  gimple arg0_def_stmt = NULL, arg1_def_stmt = NULL, new_stmt;
>> +  tree new_arg0 = NULL_TREE, new_arg1 = NULL_TREE;
>> +  tree temp, result;
>> +  gphi *newphi;
>> +  gimple_stmt_iterator gsi, gsi_for_def;
>> +  source_location locus = gimple_location (phi);
>> +  enum tree_code convert_code;
>> +
>> +  /* Handle only PHI statements with two arguments.  TODO: If all
>> + other arguments to PHI are INTEGER_CST, we can handle more
>> + than two arguments too.  */
>> +  if (gimple_phi_num_args (phi) != 2)
>> +return false;
> Similarly we can hand

Re: [PR66726] Factor conversion out of COND_EXPR

2015-07-12 Thread Kugan



On 11/07/15 06:40, Jeff Law wrote:
> On 07/09/2015 05:08 PM, Kugan wrote:
> 
>> Done. Bootstrapped and regression tested on x86-64-none-linux-gnu with
>> no new regressions. Is this OK for trunk?
> Thanks for the additional testcases.
> 
> 
> 
>> +  else
>> +{
>> +  /* If arg1 is an INTEGER_CST, fold it to new type.  */
>> +  if (INTEGRAL_TYPE_P (TREE_TYPE (new_arg0))
>> +  && int_fits_type_p (arg1, TREE_TYPE (new_arg0)))
>> +{
>> +  if (gimple_assign_cast_p (arg0_def_stmt))
>> +new_arg1 = fold_convert (TREE_TYPE (new_arg0), arg1);
>> +  else
>> +return false;
>> +}
>> +  else
>> +return false;
>> +}
> Something looks goofy here formatting-wise.  Can you please check for
> horizontal whitespace consistency before committing.
> 
> 
> 
>> +
>> +  /* If types of new_arg0 and new_arg1 are different bailout.  */
>> +  if (TREE_TYPE (new_arg0) != TREE_TYPE (new_arg1))
>> +return false;
> Seems like this should use types_compatible_p here.  You're testing
> pointer equality, but as long as the types are compatible, we should be
> able to make the transformation.
> 
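
The difference between pointer equality and a compatibility test can be shown with a toy model. The struct and both functions below are hypothetical and far simpler than GCC's type nodes or the real types_compatible_p; they only illustrate why two distinct nodes can still describe interchangeable types.

```c
#include <stdbool.h>

/* Toy type node: only precision and signedness are modelled.  */
struct toy_type
{
  unsigned precision;
  bool is_unsigned;
};

/* Pointer equality: true only for the very same node.  */
static bool
same_node (const struct toy_type *a, const struct toy_type *b)
{
  return a == b;
}

/* Property-based comparison: true when the nodes describe the same
   layout, even if they are distinct objects.  */
static bool
toy_compatible (const struct toy_type *a, const struct toy_type *b)
{
  return a->precision == b->precision && a->is_unsigned == b->is_unsigned;
}
```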
> With the horizontal whitespace fixed and using types_compatible_p this
> is OK for the trunk.  So pre-approved with those two changes and a final
> bootstrap/regression test (due to the types_compatible_p change).
> 
> jeff
> 

Thanks. Committed as r225722 with the changes. Also did a fresh
bootstrap and regression testing on x86_64-none-linux-gnu before committing.

Thanks,
Kugan

Re: [PR66726] Factor conversion out of COND_EXPR

2015-07-15 Thread Kugan


Here is a patch to teach tree-ssa-reassoc about sinking the cast.
Bootstrapped and regression tested on x86-64-none-linux-gnu with no new
regressions. Also regression tested on qemu arm.

I also verified that the issue Andreas Schwab raised is fixed on arm
cortex-a5, where the same issue was present. Does this make sense?

Thanks,
Kugan

gcc/ChangeLog:

2015-07-15  Kugan Vivekanandarajah  

PR middle-end/66726
* tree-ssa-reassoc.c (optimize_range_tests): Handle sinking the cast
after PHI.
(final_range_test_p): Detect sinking the cast after PHI.
(maybe_optimize_range_tests): Handle sinking the cast after PHI.
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 932c83a..3058eb5 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -2707,18 +2707,32 @@ optimize_range_tests (enum tree_code opcode,
# _345 = PHI <_123(N), 1(...), 1(...)>
where _234 has bool type, _123 has single use and
bb N has a single successor M.  This is commonly used in
-   the last block of a range test.  */
+   the last block of a range test.
+
+   Also Return true if STMT is tcc_compare like:
+   :
+   ...
+   _234 = a_2(D) == 2;
 
+   :
+   # _345 = PHI <_234(N), 1(...), 1(...)>
+   _346 = (int) _345;
+   where _234 has booltype, single use and
+   bb N has a single successor M.  This is commonly used in
+   the last block of a range test.  */
 static bool
 final_range_test_p (gimple stmt)
 {
-  basic_block bb, rhs_bb;
+  basic_block bb, rhs_bb, lhs_bb;
   edge e;
   tree lhs, rhs;
   use_operand_p use_p;
   gimple use_stmt;
 
-  if (!gimple_assign_cast_p (stmt))
+  if (!gimple_assign_cast_p (stmt)
+  && (!is_gimple_assign (stmt)
+ || (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt))
+ != tcc_comparison)))
 return false;
   bb = gimple_bb (stmt);
   if (!single_succ_p (bb))
@@ -2729,9 +2743,8 @@ final_range_test_p (gimple stmt)
 
   lhs = gimple_assign_lhs (stmt);
   rhs = gimple_assign_rhs1 (stmt);
-  if (!INTEGRAL_TYPE_P (TREE_TYPE (lhs))
-  || TREE_CODE (rhs) != SSA_NAME
-  || TREE_CODE (TREE_TYPE (rhs)) != BOOLEAN_TYPE)
+  if (TREE_CODE (TREE_TYPE (rhs)) != BOOLEAN_TYPE
+  && TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE)
 return false;
 
   /* Test whether lhs is consumed only by a PHI in the only successor bb.  */
@@ -2743,10 +2756,21 @@ final_range_test_p (gimple stmt)
 return false;
 
   /* And that the rhs is defined in the same loop.  */
-  rhs_bb = gimple_bb (SSA_NAME_DEF_STMT (rhs));
-  if (rhs_bb == NULL
-  || !flow_bb_inside_loop_p (loop_containing_stmt (stmt), rhs_bb))
-return false;
+  if (gimple_assign_cast_p (stmt))
+{
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (lhs))
+ || TREE_CODE (rhs) != SSA_NAME
+ || !(rhs_bb = gimple_bb (SSA_NAME_DEF_STMT (rhs)))
+ || !flow_bb_inside_loop_p (loop_containing_stmt (stmt), rhs_bb))
+   return false;
+}
+  else
+{
+  if (TREE_CODE (lhs) != SSA_NAME
+ || !(lhs_bb = gimple_bb (SSA_NAME_DEF_STMT (lhs)))
+ || !flow_bb_inside_loop_p (loop_containing_stmt (stmt), lhs_bb))
+   return false;
+}
 
   return true;
 }
@@ -3132,6 +3156,8 @@ maybe_optimize_range_tests (gimple stmt)
 
  /* stmt is
 _123 = (int) _234;
+OR
+_234 = a_2(D) == 2;
 
 followed by:
 :
@@ -3161,6 +3187,8 @@ maybe_optimize_range_tests (gimple stmt)
 of the bitwise or resp. and, recursively.  */
  if (!get_ops (rhs, code, &ops,
loop_containing_stmt (stmt))
+ && (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt))
+ != tcc_comparison)
  && has_single_use (rhs))
{
  /* Otherwise, push the _234 range test itself.  */
@@ -3173,6 +3201,22 @@ maybe_optimize_range_tests (gimple stmt)
  ops.safe_push (oe);
  bb_ent.last_idx++;
}
+ else if (!get_ops (lhs, code, &ops,
+loop_containing_stmt (stmt))
+  && is_gimple_assign (stmt)
+  && (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt))
+  == tcc_comparison)
+  && has_single_use (lhs))
+   {
+ /* Push the _234 range test itself.  */
+ operand_entry_t oe = operand_entry_pool.allocate ();
+ oe->op = lhs;
+ oe->rank = code;
+ oe->id = 0;
+ oe->count = 1;
+ ops.safe_push (oe);
+ bb_ent.last_idx++;
+   }
  else
bb_ent.last_idx = ops.length ();
  bb_ent.op = rhs;
@@ -3267,7 +3311,8 @@ maybe_optimize_range_tests (gimple stmt)
else if (gimple_assign_cast_p (use_stmt))
  cast_stmt = use_stmt;
else
-

Re: [PR66726] Factor conversion out of COND_EXPR

2015-07-15 Thread Kugan


>>
>> diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
>> index 932c83a..3058eb5 100644
>> --- a/gcc/tree-ssa-reassoc.c
>> +++ b/gcc/tree-ssa-reassoc.c
> 
>>   return false;
>> bb = gimple_bb (stmt);
>> if (!single_succ_p (bb))
>> @@ -2729,9 +2743,8 @@ final_range_test_p (gimple stmt)
>>
>> lhs = gimple_assign_lhs (stmt);
>> rhs = gimple_assign_rhs1 (stmt);
>> -  if (!INTEGRAL_TYPE_P (TREE_TYPE (lhs))
>> -  || TREE_CODE (rhs) != SSA_NAME
>> -  || TREE_CODE (TREE_TYPE (rhs)) != BOOLEAN_TYPE)
>> +  if (TREE_CODE (TREE_TYPE (rhs)) != BOOLEAN_TYPE
>> +  && TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE)
>>   return false;
> So you're ensuring that one of the two is a boolean...  Note that
> previously we ensured that the rhs was a boolean and the lhs was an
> integral type (which I believe is true for booleans).
> 
> Thus if we had
> bool x;
> int y;
> 
> x = (bool) y;
> 
> The old code would have rejected that case.  But I think it gets through
> now, right?
> 
> I think once that issue is addressed, this will be good for the trunk.
> 
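
The gap pointed out above can be checked with a small truth-table model. The enum and predicates are hypothetical; they mirror only the type conditions in the two versions of the check, not the rest of final_range_test_p.

```c
#include <stdbool.h>

/* Toy classification of a type; booleans count as integral.  */
enum toy_kind { KIND_BOOLEAN, KIND_INTEGER, KIND_POINTER };

static bool
integral_p (enum toy_kind k)
{
  return k == KIND_BOOLEAN || k == KIND_INTEGER;
}

/* Original check: lhs integral, rhs an SSA name of boolean type.  */
static bool
rejected_by_original (enum toy_kind lhs, enum toy_kind rhs,
                      bool rhs_is_ssa_name)
{
  return !integral_p (lhs) || !rhs_is_ssa_name || rhs != KIND_BOOLEAN;
}

/* First revision: reject only when neither side is boolean.  */
static bool
rejected_by_first_revision (enum toy_kind lhs, enum toy_kind rhs)
{
  return rhs != KIND_BOOLEAN && lhs != KIND_BOOLEAN;
}
```

For x = (bool) y, lhs is KIND_BOOLEAN and rhs is KIND_INTEGER: the original check rejects it, the first revision lets it through.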

Thanks for the review. How about:

-  if (!INTEGRAL_TYPE_P (TREE_TYPE (lhs))
-  || TREE_CODE (rhs) != SSA_NAME
-  || TREE_CODE (TREE_TYPE (rhs)) != BOOLEAN_TYPE)
+  if (gimple_assign_cast_p (stmt)
+  && (!INTEGRAL_TYPE_P (TREE_TYPE (lhs))
+ || TREE_CODE (rhs) != SSA_NAME
+ || TREE_CODE (TREE_TYPE (rhs)) != BOOLEAN_TYPE))


Thanks,
Kugan

gcc/ChangeLog:

2015-07-16  Kugan Vivekanandarajah  

PR middle-end/66726
* tree-ssa-reassoc.c (optimize_range_tests): Handle tcc_compare stmt
whose result is used in PHI.
(maybe_optimize_range_tests): Likewise.
(final_range_test_p): Likewise.

diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 932c83a..78c80d6 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -2707,18 +2707,32 @@ optimize_range_tests (enum tree_code opcode,
# _345 = PHI <_123(N), 1(...), 1(...)>
where _234 has bool type, _123 has single use and
bb N has a single successor M.  This is commonly used in
-   the last block of a range test.  */
+   the last block of a range test.
+
+   Also Return true if STMT is tcc_compare like:
+   :
+   ...
+   _234 = a_2(D) == 2;
 
+   :
+   # _345 = PHI <_234(N), 1(...), 1(...)>
+   _346 = (int) _345;
+   where _234 has booltype, single use and
+   bb N has a single successor M.  This is commonly used in
+   the last block of a range test.  */
 static bool
 final_range_test_p (gimple stmt)
 {
-  basic_block bb, rhs_bb;
+  basic_block bb, rhs_bb, lhs_bb;
   edge e;
   tree lhs, rhs;
   use_operand_p use_p;
   gimple use_stmt;
 
-  if (!gimple_assign_cast_p (stmt))
+  if (!gimple_assign_cast_p (stmt)
+  && (!is_gimple_assign (stmt)
+ || (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt))
+ != tcc_comparison)))
 return false;
   bb = gimple_bb (stmt);
   if (!single_succ_p (bb))
@@ -2729,9 +2743,10 @@ final_range_test_p (gimple stmt)
 
   lhs = gimple_assign_lhs (stmt);
   rhs = gimple_assign_rhs1 (stmt);
-  if (!INTEGRAL_TYPE_P (TREE_TYPE (lhs))
-  || TREE_CODE (rhs) != SSA_NAME
-  || TREE_CODE (TREE_TYPE (rhs)) != BOOLEAN_TYPE)
+  if (gimple_assign_cast_p (stmt)
+  && (!INTEGRAL_TYPE_P (TREE_TYPE (lhs))
+ || TREE_CODE (rhs) != SSA_NAME
+ || TREE_CODE (TREE_TYPE (rhs)) != BOOLEAN_TYPE))
 return false;
 
   /* Test whether lhs is consumed only by a PHI in the only successor bb.  */
@@ -2743,10 +2758,20 @@ final_range_test_p (gimple stmt)
 return false;
 
   /* And that the rhs is defined in the same loop.  */
-  rhs_bb = gimple_bb (SSA_NAME_DEF_STMT (rhs));
-  if (rhs_bb == NULL
-  || !flow_bb_inside_loop_p (loop_containing_stmt (stmt), rhs_bb))
-return false;
+  if (gimple_assign_cast_p (stmt))
+{
+  if (TREE_CODE (rhs) != SSA_NAME
+ || !(rhs_bb = gimple_bb (SSA_NAME_DEF_STMT (rhs)))
+ || !flow_bb_inside_loop_p (loop_containing_stmt (stmt), rhs_bb))
+   return false;
+}
+  else
+{
+  if (TREE_CODE (lhs) != SSA_NAME
+ || !(lhs_bb = gimple_bb (SSA_NAME_DEF_STMT (lhs)))
+ || !flow_bb_inside_loop_p (loop_containing_stmt (stmt), lhs_bb))
+   return false;
+}
 
   return true;
 }
@@ -3132,6 +3157,8 @@ maybe_optimize_range_tests (gimple stmt)
 
  /* stmt is
 _123 = (int) _234;
+OR
+_234 = a_2(D) == 2;
 
 followed by:
 :
@@ -3161,6 +3188,8 @@ maybe_optimize_range_tests (gimple stmt)
 of the bitwise or resp. and, recursively.  */
  if (!get_ops (rhs, code, &ops,
loop_containing_stmt (stmt))
+ && (TREE_

Re: [PATCH 2/2] Set REG_EQUAL

2015-07-17 Thread Kugan

Ping?


On 28/06/15 21:30, Kugan wrote:
> This patch sets REG_EQUAL when emitting arm_emit_movpair.
> 
> Thanks,
> Kugan
> 
> gcc/testsuite/ChangeLog:
> 
> 2015-06-26  Kugan Vivekanandarajah  
> 
>   * gcc.target/arm/reg_equal_test.c: New test.
> 
> gcc.
> 
> 2015-06-26  Kugan Vivekanandarajah  
> 
>   * config/arm/arm.c (arm_emit_movpair): Add REG_EQUAL notes to
>   instruction.
>

Re: [PATCH 1/2] Allow REG_EQUAL for ZERO_EXTRACT

2015-07-19 Thread Kugan

I have made a mistake while addressing the review comments for this
patch. Unfortunately, it was not detected in my earlier testing. My
sincere apologies for the mistake.

I missed the STRICT_LOW_PART check in the first if-check, so the second
part (which is the ZERO_EXTRACT part) never got executed. The attached
patch fixes this along with some minor changes.
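
The control-flow problem can be modelled in a few lines of standalone C. This is a sketch that follows the shape of the removed and added lines in the diff; the enums and function names are hypothetical, and the CONST_INT_P conditions on the ZERO_EXTRACT branch are left out.

```c
#include <stdbool.h>

enum toy_dest { DEST_REG, DEST_STRICT_LOW_PART, DEST_ZERO_EXTRACT };
enum toy_branch { BRANCH_NONE, BRANCH_COPY_NOTE, BRANCH_ZERO_EXTRACT };

/* Shape before this fix: a note that differs from SET_SRC always takes
   the first branch, so the ZERO_EXTRACT branch is not reached.  */
static enum toy_branch
before_fix (bool note_equals_src, enum toy_dest dest)
{
  if (!note_equals_src || dest == DEST_STRICT_LOW_PART)
    return BRANCH_COPY_NOTE;
  else if (dest == DEST_ZERO_EXTRACT)
    return BRANCH_ZERO_EXTRACT;
  return BRANCH_NONE;
}

/* Shape after this fix: the rtx_equal_p test moves to the outer
   condition and the inner test looks only at the destination.  */
static enum toy_branch
after_fix (bool note_equals_src, enum toy_dest dest)
{
  if (note_equals_src)
    return BRANCH_NONE;
  if (dest == DEST_STRICT_LOW_PART)
    return BRANCH_COPY_NOTE;
  else if (dest == DEST_ZERO_EXTRACT)
    return BRANCH_ZERO_EXTRACT;
  return BRANCH_NONE;
}
```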

Bootstrapped and regression tested on arm-none-linux (Chromebook) and
x86-64-linux-gnu with no new regressions, along with the ARM enablement
patch.

Also did a complete arm qemu regression test with Christophe's scripts,
with no new regressions.
(http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/225987-reg4/report-build-info.html)

Is this OK for trunk?


Thanks,
Kugan

gcc/ChangeLog:

2015-07-20  Kugan Vivekanandarajah  

* cse.c (cse_insn): Fix missing check for STRICT_LOW_PART and minor
clean up.
diff --git a/gcc/cse.c b/gcc/cse.c
index 1c14d83..96adf18 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -4529,10 +4529,10 @@ cse_insn (rtx_insn *insn)
  this case, and if it isn't set, then there will be no equivalence
  for the destination.  */
   if (n_sets == 1 && REG_NOTES (insn) != 0
-  && (tem = find_reg_note (insn, REG_EQUAL, NULL_RTX)) != 0)
+  && (tem = find_reg_note (insn, REG_EQUAL, NULL_RTX)) != 0
+  && (! rtx_equal_p (XEXP (tem, 0), SET_SRC (sets[0].rtl
 {
-  if ((! rtx_equal_p (XEXP (tem, 0), SET_SRC (sets[0].rtl)))
- || GET_CODE (SET_DEST (sets[0].rtl)) == STRICT_LOW_PART)
+  if (GET_CODE (SET_DEST (sets[0].rtl)) == STRICT_LOW_PART)
src_eqv = copy_rtx (XEXP (tem, 0));
 
   /* If DEST is of the form ZERO_EXTACT, as in:
@@ -4544,14 +4544,14 @@ cse_insn (rtx_insn *insn)
 point.  Note that this is different from SRC_EQV. We can however
 calculate SRC_EQV with the position and width of ZERO_EXTRACT.  */
   else if (GET_CODE (SET_DEST (sets[0].rtl)) == ZERO_EXTRACT
-  && CONST_INT_P (src_eqv)
+  && CONST_INT_P (XEXP (tem, 0))
   && CONST_INT_P (XEXP (SET_DEST (sets[0].rtl), 1))
   && CONST_INT_P (XEXP (SET_DEST (sets[0].rtl), 2)))
{
  rtx dest_reg = XEXP (SET_DEST (sets[0].rtl), 0);
  rtx width = XEXP (SET_DEST (sets[0].rtl), 1);
  rtx pos = XEXP (SET_DEST (sets[0].rtl), 2);
- HOST_WIDE_INT val = INTVAL (src_eqv);
+ HOST_WIDE_INT val = INTVAL (XEXP (tem, 0));
  HOST_WIDE_INT mask;
  unsigned int shift;
  if (BITS_BIG_ENDIAN)

Re: [PATCH 1/2] Allow REG_EQUAL for ZERO_EXTRACT

2015-07-26 Thread Kugan



On 27/07/15 05:38, Andreas Schwab wrote:
> Kugan  writes:
> 
>>  * cse.c (cse_insn): Fix missing check for STRICT_LOW_PART and minor
>>  clean up.
> 
> This breaks 
> 
> gcc.target/m68k/tls-ie-xgot.c scan-assembler jsr __m68k_read_tp
> gcc.target/m68k/tls-ie.c scan-assembler jsr __m68k_read_tp
> gcc.target/m68k/tls-le-xtls.c scan-assembler jsr __m68k_read_tp
> gcc.target/m68k/tls-le.c scan-assembler jsr __m68k_read_tp

I am looking into it now.

Thanks,
Kugan

Re: [PR66726] Factor conversion out of COND_EXPR

2015-07-26 Thread Kugan



On 24/07/15 05:05, Jeff Law wrote:
> On 07/15/2015 11:52 PM, Kugan wrote:
>>
>>>>
>>>> diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
>>>> index 932c83a..3058eb5 100644
>>>> --- a/gcc/tree-ssa-reassoc.c
>>>> +++ b/gcc/tree-ssa-reassoc.c
>>>
>>>>return false;
>>>>  bb = gimple_bb (stmt);
>>>>  if (!single_succ_p (bb))
>>>> @@ -2729,9 +2743,8 @@ final_range_test_p (gimple stmt)
>>>>
>>>>  lhs = gimple_assign_lhs (stmt);
>>>>  rhs = gimple_assign_rhs1 (stmt);
>>>> -  if (!INTEGRAL_TYPE_P (TREE_TYPE (lhs))
>>>> -  || TREE_CODE (rhs) != SSA_NAME
>>>> -  || TREE_CODE (TREE_TYPE (rhs)) != BOOLEAN_TYPE)
>>>> +  if (TREE_CODE (TREE_TYPE (rhs)) != BOOLEAN_TYPE
>>>> +  && TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE)
>>>>return false;
>>> So you're ensuring that one of the two is a boolean...  Note that
>>> previously we ensured that the rhs was a boolean and the lhs was an
>>> integral type (which I believe is true for booleans).
>>>
>>> Thus if we had
>>> bool x;
>>> int y;
>>>
>>> x = (bool) y;
>>>
>>> The old code would have rejected that case.  But I think it gets through
>>> now, right?
>>>
>>> I think once that issue is addressed, this will be good for the trunk.
>>>
>>
>> Thanks for the review. How about:
>>
>> -  if (!INTEGRAL_TYPE_P (TREE_TYPE (lhs))
>> -  || TREE_CODE (rhs) != SSA_NAME
>> -  || TREE_CODE (TREE_TYPE (rhs)) != BOOLEAN_TYPE)
>> +  if (gimple_assign_cast_p (stmt)
>> +  && (!INTEGRAL_TYPE_P (TREE_TYPE (lhs))
>> +  || TREE_CODE (rhs) != SSA_NAME
>> +  || TREE_CODE (TREE_TYPE (rhs)) != BOOLEAN_TYPE))
> But then I think you need to verify that for the  _234 = a_2(D) == 2;
> case that type of the RHS is a boolean.
> 
> ie, each case has requirements for the types.  I don't think they can be
> reasonably unified.  So something like this:
> 
> if (gimple_assign_cast_p (stmt)
> && ! (correct types for cast)
>return false;
> 
> if (!gimple_assign_cast_p (stmt)
> && ! (correct types for tcc_comparison case))
>   return false;
> 
> 
> This works because we've already verified that it's either a type
> conversion or a comparison on the RHS.
>
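
The per-case validation suggested above could look like the following standalone sketch. The names are hypothetical, and for the comparison case the model checks the type of the comparison result, which is also the type of the lhs.

```c
#include <stdbool.h>

/* Toy classification of a type; booleans count as integral.  */
enum toy_kind { KIND_BOOLEAN, KIND_INTEGER, KIND_POINTER };

static bool
integral_p (enum toy_kind k)
{
  return k == KIND_BOOLEAN || k == KIND_INTEGER;
}

/* Return true when the types fit the statement shape.  IS_CAST selects
   between the conversion case and the comparison case.  */
static bool
types_ok (bool is_cast, enum toy_kind lhs, enum toy_kind rhs,
          bool rhs_is_ssa_name)
{
  /* _123 = (int) _234: integral lhs, boolean SSA name on the rhs.  */
  if (is_cast
      && !(integral_p (lhs) && rhs_is_ssa_name && rhs == KIND_BOOLEAN))
    return false;

  /* _234 = a_2(D) == 2: the comparison result must be boolean.  */
  if (!is_cast && lhs != KIND_BOOLEAN)
    return false;

  return true;
}
```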
I thought that when !gimple_assign_cast_p (stmt), the RHS will always
be boolean. I have now added this check in the attached patch.

I also noticed that in maybe_optimize_range_tests, GIMPLE_COND can
have incompatible types when new_op is updated
(boolean types coming from tcc_compare results) and hence they need to
be converted. I have changed that as well.

Bootstrapped and regression tested on x86-64-none-linux-gnu with no new
regressions. Is this OK for trunk?

Thanks,
Kugan

diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index efb813c..cc215b6 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -2707,18 +2707,32 @@ optimize_range_tests (enum tree_code opcode,
# _345 = PHI <_123(N), 1(...), 1(...)>
where _234 has bool type, _123 has single use and
bb N has a single successor M.  This is commonly used in
-   the last block of a range test.  */
+   the last block of a range test.
+
+   Also Return true if STMT is tcc_compare like:
+   :
+   ...
+   _234 = a_2(D) == 2;
 
+   :
+   # _345 = PHI <_234(N), 1(...), 1(...)>
+   _346 = (int) _345;
+   where _234 has booltype, single use and
+   bb N has a single successor M.  This is commonly used in
+   the last block of a range test.  */
 static bool
 final_range_test_p (gimple stmt)
 {
-  basic_block bb, rhs_bb;
+  basic_block bb, rhs_bb, lhs_bb;
   edge e;
   tree lhs, rhs;
   use_operand_p use_p;
   gimple use_stmt;
 
-  if (!gimple_assign_cast_p (stmt))
+  if (!gimple_assign_cast_p (stmt)
+  && (!is_gimple_assign (stmt)
+ || (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt))
+ != tcc_comparison)))
 return false;
   bb = gimple_bb (stmt);
   if (!single_succ_p (bb))
@@ -2729,11 +2743,16 @@ final_range_test_p (gimple stmt)
 
   lhs = gimple_assign_lhs (stmt);
   rhs = gimple_assign_rhs1 (stmt);
-  if (!INTEGRAL_TYPE_P (TREE_TYPE (lhs))
-  || TREE_CODE (rhs) != SSA_NAME
-  || TREE_CODE (TREE_TYPE (rhs)) != BOOLEAN_TYPE)
+  if (gimple_assign_cast_p (stmt)
+  && (!INTEGRAL_TYPE_P (TREE_TYPE (lhs))
+ || TREE_CODE (rhs) != SSA_NAME
+ || TREE_CODE (TREE_TYPE (rhs)) != BOOLEAN_TYPE))
 return false;
 
+  if (!gimple_assign_cast_p (stmt)
+  && (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE)

Re: [PATCH 1/2] Allow REG_EQUAL for ZERO_EXTRACT

2015-07-28 Thread Kugan



On 27/07/15 05:38, Andreas Schwab wrote:
> Kugan  writes:
> 
>>  * cse.c (cse_insn): Fix missing check for STRICT_LOW_PART and minor
>>  clean up.
> 
> This breaks 
> 
> gcc.target/m68k/tls-ie-xgot.c scan-assembler jsr __m68k_read_tp
> gcc.target/m68k/tls-ie.c scan-assembler jsr __m68k_read_tp
> gcc.target/m68k/tls-le-xtls.c scan-assembler jsr __m68k_read_tp
> gcc.target/m68k/tls-le.c scan-assembler jsr __m68k_read_tp
> 
> Andreas.
> 

Sorry for the breakage. My patch to add ZERO_EXTRACT unfortunately
restricted the behaviour in one other case. That is, even when the
REG_EQUAL note and src are the same, we were setting src_eqv to src when
the destination is STRICT_LOW_PART. I am not sure why, but I have
restored the old behaviour.
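
The case that regressed can be isolated with a standalone model of the condition that sets src_eqv. The two functions below follow the removed and added lines of the diff in this message; the enum and names are hypothetical, and the separate ZERO_EXTRACT handling is not modelled.

```c
#include <stdbool.h>

enum toy_dest { DEST_REG, DEST_STRICT_LOW_PART, DEST_ZERO_EXTRACT };

/* Previous revision: src_eqv was copied from the note only when the note
   differed from SET_SRC and the destination was STRICT_LOW_PART.  */
static bool
sets_src_eqv_previous (bool note_equals_src, enum toy_dest dest)
{
  return !note_equals_src && dest == DEST_STRICT_LOW_PART;
}

/* Restored behaviour: any non-ZERO_EXTRACT destination copies the note
   when it differs from SET_SRC, and STRICT_LOW_PART copies it always.  */
static bool
sets_src_eqv_restored (bool note_equals_src, enum toy_dest dest)
{
  return dest != DEST_ZERO_EXTRACT
         && (!note_equals_src || dest == DEST_STRICT_LOW_PART);
}
```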

I could reproduce this issue by inspecting the generated asm and made
sure that it is fixed. However, I could not run the regression tests for
m68k (sorry, I don't have access to the set-up).
I bootstrapped and regression tested on x86_64-linux-gnu and
arm-none-linux-gnu with no new regressions.

Thanks,
Kugan


gcc/ChangeLog:

2015-07-27  Kugan Vivekanandarajah  

* cse.c (cse_insn): Restoring old behaviour for src_eqv
 when dest and value in the REG_EQUAL are same and dest
 is STRICT_LOW_PART.
diff --git a/gcc/cse.c b/gcc/cse.c
index 96adf18..17c0954 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -4529,12 +4529,13 @@ cse_insn (rtx_insn *insn)
  this case, and if it isn't set, then there will be no equivalence
  for the destination.  */
   if (n_sets == 1 && REG_NOTES (insn) != 0
-  && (tem = find_reg_note (insn, REG_EQUAL, NULL_RTX)) != 0
-  && (! rtx_equal_p (XEXP (tem, 0), SET_SRC (sets[0].rtl
+  && (tem = find_reg_note (insn, REG_EQUAL, NULL_RTX)) != 0)
 {
-  if (GET_CODE (SET_DEST (sets[0].rtl)) == STRICT_LOW_PART)
-   src_eqv = copy_rtx (XEXP (tem, 0));
 
+  if (GET_CODE (SET_DEST (sets[0].rtl)) != ZERO_EXTRACT
+ && (! rtx_equal_p (XEXP (tem, 0), SET_SRC (sets[0].rtl))
+ || GET_CODE (SET_DEST (sets[0].rtl)) == STRICT_LOW_PART))
+   src_eqv = copy_rtx (XEXP (tem, 0));
   /* If DEST is of the form ZERO_EXTACT, as in:
 (set (zero_extract:SI (reg:SI 119)
  (const_int 16 [0x10])

Re: [RFC] Elimination of zext/sext - type promotion pass

2015-08-04 Thread kugan

nt _11;
+  unsigned int _12;
+  unsigned int _13;
+  unsigned int _15;
+  unsigned int _16;
+  unsigned int _18;
+  unsigned int _19;
+  unsigned int _21;
+  unsigned int _22;
+  unsigned int _24;
+  short unsigned int _25;
+  unsigned int _26;
+  unsigned int _27;
+  unsigned int _28;
+  unsigned int _29;

   :
+  _8 = (unsigned int) data_4(D);
+  _7 = (unsigned int) crc_30(D);

   :
-  # crc_28 = PHI 
-  # data_29 = PHI 
-  # ivtmp_18 = PHI 
-  _9 = (unsigned char) crc_28;
-  _10 = _9 ^ data_29;
-  x16_11 = _10 & 1;
-  data_12 = data_29 >> 1;
-  if (x16_11 == 1)
+  # _28 = PHI <_2(5), _7(2)>
+  # _29 = PHI <_12(5), _8(2)>
+  # _18 = PHI <_5(5), 8(2)>
+  _9 = _28 & 255;
+  _10 = _9 ^ _29;
+  _11 = _10 & 1;
+  _3 = _29 & 255;
+  _12 = _3 >> 1;
+  _27 = _11 & 255;
+  if (_27 == 1)
 goto ;
   else
 goto ;

   :
-  crc_13 = crc_28 ^ 16386;
-  crc_24 = crc_13 >> 1;
-  crc_15 = crc_24 | 32768;
+  _13 = _28 ^ 16386;
+  _26 = _13 & 65535;
+  _24 = _26 >> 1;
+  _15 = _24 | 4294934528;

   :
-  # crc_2 = PHI 
-  ivtmp_5 = ivtmp_18 - 1;
-  if (ivtmp_5 != 0)
+  # _2 = PHI <_15(4), _21(7)>
+  _5 = _18 - 1;
+  _22 = _5 & 255;
+  if (_22 != 0)
 goto ;
   else
 goto ;

   :
-  # crc_19 = PHI 
-  return crc_19;
+  # _19 = PHI <_2(5)>
+  _25 = (short unsigned int) _19;
+  return _25;

   :
-  crc_21 = crc_28 >> 1;
+  _16 = _28 & 65535;
+  _21 = _16 >> 1;
   goto ;

 }


--- crc.org.s   2015-08-05 08:54:17.491131520 +1000
+++ crc.new.s   2015-08-05 08:53:12.183132534 +1000
@@ -15,27 +15,28 @@
.global crc2
.syntax divided
.arm
.type   crc2, %function
 crc2:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
mov ip, #32768
movtip, 65535
str lr, [sp, #-4]!
-   mov r3, #8
+   mov r2, #8
movwlr, #16386
 .L3:
-   eor r2, r1, r0
-   sub r3, r3, #1
-   tst r2, #1
+   uxtbr3, r0
+   eor r3, r3, r1
mov r1, r1, lsr #1
+   tst r3, #1
eorne   r0, r0, lr
-   moveq   r0, r0, lsr #1
-   orrne   r0, ip, r0, lsr #1
-   uxthne  r0, r0
-   andsr3, r3, #255
+   ubfxeq  r0, r0, #1, #15
+   ubfxne  r0, r0, #1, #15
+   orrne   r0, r0, ip
+   subsr2, r2, #1
bne .L3
+   uxthr0, r0
ldr pc, [sp], #4
.size   crc2, .-crc2
.ident  "GCC: (GNU) 6.0.0 20150724 (experimental)"
.section.note.GNU-stack,"",%progbits



Testsuite regression for x86_64-unknown-linux-gnu:
Tests that now fail, but worked before:
gfortran.dg/graphite/pr42393-1.f90   -O  (test for excess errors)


Testsuite regression for  arm-linux-gnu:
Tests that now fail, but worked before:
arm-sim: gcc.dg/fixed-point/convert-sat.c execution test
arm-sim: gcc.dg/tree-ssa/20030729-1.c scan-tree-dump-times dom2 "\\(unsigned
int\\)" 0
arm-sim: gcc.dg/tree-ssa/pr54245.c scan-tree-dump-times slsr "Inserting
initializer" 0
arm-sim: gcc.dg/tree-ssa/shorten-1.c scan-tree-dump-not optimized
"\\(int\\)"
arm-sim: gcc.dg/tree-ssa/shorten-1.c scan-tree-dump-times optimized
"\\(unsigned char\\)" 8
arm-sim: gcc.target/arm/mla-2.c scan-assembler smlalbb
arm-sim: gcc.target/arm/unsigned-extend-2.c scan-assembler ands
arm-sim: gcc.target/arm/wmul-1.c scan-assembler-times smlabb 2
arm-sim: gcc.target/arm/wmul-2.c scan-assembler-times smulbb 1
arm-sim: gcc.target/arm/wmul-3.c scan-assembler-times smulbb 2
arm-sim: gcc.target/arm/wmul-9.c scan-assembler smlalbb
arm-sim: gfortran.dg/graphite/pr42393-1.f90   -O  (test for excess errors)

Tests that now work, but didn't before:
arm-sim: gcc.dg/tree-prof/time-profiler-2.c scan-ipa-dump-times profile
"Read tp_first_run: 0" 2
arm-sim: gcc.dg/tree-prof/time-profiler-2.c scan-ipa-dump-times profile
"Read tp_first_run: 2" 1
arm-sim: gcc.dg/tree-prof/time-profiler-2.c scan-ipa-dump-times profile
"Read tp_first_run: 3" 1
arm-sim: gcc.target/arm/builtin-bswap-1.c scan-assembler-times rev16ne\\t 1
arm-sim: gcc.target/arm/builtin-bswap-1.c scan-assembler-times revshne\\t 1
arm-sim: gcc.target/arm/smlaltb-1.c scan-assembler smlaltb\\t
arm-sim: gcc.target/arm/smlaltt-1.c scan-assembler smlaltt\\t


Testsuite regression for  aarch64-linux-gnu:
Tests that now fail, but worked before:
c-c++-common/torture/vector-compare-1.c   -O3 -g  (test for excess errors)
c-c++-common/torture/vector-compare-1.c   -O3 -g  (test for excess errors)
gcc.dg/tree-ssa/20030729-1.c scan-tree-dump-times dom2 "\\(unsigned int\\)"
0
gcc.dg/tree-ssa/pr54245.c scan-tree-dump-times slsr "Inserting initializer"
0
gcc.dg/tree-ssa/shorten-1.c scan-tree-dump-not optimized "\\(int\\)"
gcc.dg/tree-ssa/shorten-1.c scan-tree-dump-times optimized "\\(unsigned
char\\)" 8

Than

[AARCH64] Add missing entries in iterator vwcore

2015-10-01 Thread Kugan

Hi,

In "aarch64_get_lane" operand 0 is VEL, so  for %0,
iterator vwcore should (?) support all the modes in VEL.

Ran into following error with a local patch for an existing test case.
However it can also be reproduced with the attached test case.

function ‘fn1’:
t.c:25:1: internal compiler error: output_operand: invalid %-code
 }
 ^
0x8198fb output_operand_lossage(char const*, ...)
../../base/gcc/final.c:3417
0x81a45b output_asm_insn(char const*, rtx_def**)
../../base/gcc/final.c:3782
0x81b9d3 output_asm_insn(char const*, rtx_def**)
../../base/gcc/final.c:2364
0x81b9d3 final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
../../base/gcc/final.c:3029
0x81be2b final(rtx_insn*, _IO_FILE*, int)
../../base/gcc/final.c:2058
0x81c6e7 rest_of_handle_final
../../base/gcc/final.c:4449
0x81c6e7 execute
../../base/gcc/final.c:4524


Attached patch fixes this. Bootstrapped and regression tested for
aarch64-none-linux-gnu with no new regression. Is this OK for trunk?

Thanks,
Kugan

gcc/ChangeLog:

2015-10-02  Kugan Vivekanandarajah  

* config/aarch64/iterators.md: Add missing core element mode for
 mode.

gcc/testsuite/ChangeLog:

2015-10-02  Kugan Vivekanandarajah  

* gcc.target/aarch64/foo.c: New test.

diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 38c5a24..e49abd5 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -537,8 +537,11 @@
   (V4HI "w") (V8HI "w")
   (V2SI "w") (V4SI "w")
   (DI   "x") (V2DI "x")
+  (V4HF "w") (V8HF "w")
   (V2SF "w") (V4SF "w")
-  (V2DF "x")])
+  (V2DF "x") (SI   "x")
+  (HI   "x") (QI   "x")])
+
 
 ;; Double vector types for ALLX.
 (define_mode_attr Vallxd [(QI "8b") (HI "4h") (SI "2s")])
diff --git a/gcc/testsuite/gcc.target/aarch64/foo.c 
b/gcc/testsuite/gcc.target/aarch64/foo.c
index e69de29..77f161e 100644
--- a/gcc/testsuite/gcc.target/aarch64/foo.c
+++ b/gcc/testsuite/gcc.target/aarch64/foo.c
@@ -0,0 +1,25 @@
+
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+void fn2 ();
+
+typedef __Float16x4_t float16x4_t;
+__fp16 result_float16x4[1];
+float16x4_t exec_vst1_lane_vector_float16x4, exec_vst1_lane___trans_tmp_1;
+
+void fn1 ()
+{
+  exec_vst1_lane_vector_float16x4 = exec_vst1_lane___trans_tmp_1;
+  __fp16 *__a = result_float16x4;
+  float16x4_t __b = exec_vst1_lane___trans_tmp_1;
+  int __lane = 0;
+  *__a = ({ __b[__lane]; });
+  union {
+  short i;
+  __fp16 f;
+  } tmp_res;
+  tmp_res.f = result_float16x4[0];
+  if (tmp_res.i)
+fn2();
+}

Re: [AARCH64] Add missing entries in iterator vwcore

2015-10-05 Thread Kugan



On 05/10/15 21:33, James Greenhalgh wrote:
> On Thu, Oct 01, 2015 at 09:41:20PM +0100, Kugan wrote:
>> Hi,
>>
>> In "aarch64_get_lane" operand 0 is VEL, so  for %0,
>> iterator vwcore should (?) support all the modes in VEL.
>>
>> Ran into following error with a local patch for an existing test case.
>> However it can also be reproduced with the attached test case.
>>
>> function ‘fn1’:
>> t.c:25:1: internal compiler error: output_operand: invalid %-code
>>  }
>>  ^
>> 0x8198fb output_operand_lossage(char const*, ...)
>>  ../../base/gcc/final.c:3417
>> 0x81a45b output_asm_insn(char const*, rtx_def**)
>>  ../../base/gcc/final.c:3782
>> 0x81b9d3 output_asm_insn(char const*, rtx_def**)
>>  ../../base/gcc/final.c:2364
>> 0x81b9d3 final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
>>  ../../base/gcc/final.c:3029
>> 0x81be2b final(rtx_insn*, _IO_FILE*, int)
>>  ../../base/gcc/final.c:2058
>> 0x81c6e7 rest_of_handle_final
>>  ../../base/gcc/final.c:4449
>> 0x81c6e7 execute
>>  ../../base/gcc/final.c:4524
>>
>>
>> Attached patch fixes this. Bootstrapped and regression tested for
>> aarch64-none-linux-gnu with no new regression. Is this OK for trunk?
>>
>> gcc/ChangeLog:
>>
>> 2015-10-02  Kugan Vivekanandarajah  
>>
>>  * config/aarch64/iterators.md: Add missing core element mode for
>>   mode.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2015-10-02  Kugan Vivekanandarajah  
>>
>>  * gcc.target/aarch64/foo.c: New test.
>>
> 
> "foo.c" is not OK, please give this testcase a meaningful name.
> 
>> diff --git a/gcc/config/aarch64/iterators.md 
>> b/gcc/config/aarch64/iterators.md
>> index 38c5a24..e49abd5 100644
>> --- a/gcc/config/aarch64/iterators.md
>> +++ b/gcc/config/aarch64/iterators.md
>> @@ -537,8 +537,11 @@
>> (V4HI "w") (V8HI "w")
>> (V2SI "w") (V4SI "w")
>> (DI   "x") (V2DI "x")
>> +   (V4HF "w") (V8HF "w")
>> (V2SF "w") (V4SF "w")
>> -   (V2DF "x")])
>> +   (V2DF "x") (SI   "x")
>> +   (HI   "x") (QI   "x")])
> 
> I don't understand the reasoning here. Surely we want "w" for SI,HI,QI
> modes? Though are you sure we need them to fix your bug? I'd have expected
> the hunk for V4HF and V8HF to be enough.

Yes, the hunk for V4HF and V8HF is enough, and that is what I started
with. Then I thought that maybe we should cover all the modes in VEL.
Please find attached a patch that has just V4HF and V8HF.



> 
>>  
>>  ;; Double vector types for ALLX.
>>  (define_mode_attr Vallxd [(QI "8b") (HI "4h") (SI "2s")])
>> diff --git a/gcc/testsuite/gcc.target/aarch64/foo.c 
>> b/gcc/testsuite/gcc.target/aarch64/foo.c
>> index e69de29..77f161e 100644
>> --- a/gcc/testsuite/gcc.target/aarch64/foo.c
>> +++ b/gcc/testsuite/gcc.target/aarch64/foo.c
> 
> Again, please give this test a meaningful name.

Renamed the test case.

Is this OK now?

Thanks,
Kugan



gcc/ChangeLog:

2015-10-06  Kugan Vivekanandarajah  

* config/aarch64/iterators.md: Add missing core element mode for
 mode.

gcc/testsuite/ChangeLog:

2015-10-06  Kugan Vivekanandarajah  

* gcc.target/aarch64/vcore_ice_test.c: New test.



> 
> Thanks,
> James
> 
>> @@ -0,0 +1,25 @@
>> +
>> +/* { dg-do compile } */
>> +/* { dg-options "-O3" } */
>> +
>> +void fn2 ();
>> +
>> +typedef __Float16x4_t float16x4_t;
>> +__fp16 result_float16x4[1];
>> +float16x4_t exec_vst1_lane_vector_float16x4, exec_vst1_lane___trans_tmp_1;
>> +
>> +void fn1 ()
>> +{
>> +  exec_vst1_lane_vector_float16x4 = exec_vst1_lane___trans_tmp_1;
>> +  __fp16 *__a = result_float16x4;
>> +  float16x4_t __b = exec_vst1_lane___trans_tmp_1;
>> +  int __lane = 0;
>> +  *__a = ({ __b[__lane]; });
>> +  union {
>> +  short i;
>> +  __fp16 f;
>> +  } tmp_res;
>> +  tmp_res.f = result_float16x4[0];
>> +  if (tmp_res.i)
>> +fn2();
>> +}
> 
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 38c5a24..90e8533 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -537,6 +537,7 @

Re: [3/7] Optimize ZEXT_EXPR with tree-vrp

2015-10-06 Thread kugan



Hi Richard,

Thanks for the review.

On 15/09/15 23:08, Richard Biener wrote:

On Mon, Sep 7, 2015 at 4:58 AM, Kugan  wrote:

This patch adds tree-vrp handling and optimization for ZEXT_EXPR.


+  else if (code == SEXT_EXPR)
+{
+  gcc_assert (range_int_cst_p (&vr1));
+  unsigned int prec = tree_to_uhwi (vr1.min);
+  type = vr0.type;
+  wide_int tmin, tmax;
+  wide_int type_min, type_max;
+  wide_int may_be_nonzero, must_be_nonzero;
+
+  gcc_assert (!TYPE_UNSIGNED (expr_type));

hmm, I don't think we should restrict SEXT_EXPR this way.  SEXT_EXPR
should operate on both signed and unsigned types and the result type
should be the same as the type of operand 0.

+  type_min = wi::shwi (1 << (prec - 1),
+  TYPE_PRECISION (TREE_TYPE (vr0.min)));
+  type_max = wi::shwi (((1 << (prec - 1)) - 1),
+  TYPE_PRECISION (TREE_TYPE (vr0.max)));

there is wi::min_value and max_value for this.


As of now, SEXT_EXPR in gimple is of the form: x = y sext 8, and the types
of all the operands and the result are the wider type. Therefore we can't
use wi::min_value. Or do you want to convert this precision (in this
case 8) to a type and use wi::min_value?
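
For readers following along, here is a minimal standalone sketch of what "x = y sext 8" means when operand and result share one wider type. It is not GCC code: the helper name sext_from_prec and the fixed 32-bit type are assumptions made for illustration, and the final cast relies on GCC's documented modulo conversion to signed types.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low PREC bits of X; bits above PREC are ignored.
   Operand and result share one 32-bit type, as in "x = y sext 8".
   Hypothetical helper for illustration, not GCC code.  */
static int32_t
sext_from_prec (int32_t x, unsigned prec)
{
  assert (prec >= 1 && prec <= 32);
  uint32_t sign = (uint32_t) 1 << (prec - 1);
  /* Keep only the low PREC bits of the operand.  */
  uint32_t low = (uint32_t) x & (sign | (sign - 1));
  /* XOR then subtract gives a set sign bit its negative weight.  */
  return (int32_t) ((low ^ sign) - sign);
}
```

With prec equal to 8 the result always lies in [-128, 127], even though it is stored in the 32-bit type.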


Please find the patch that addresses the other comments.

Thanks,
Kugan



+ HOST_WIDE_INT int_may_be_nonzero = may_be_nonzero.to_uhwi ();
+ HOST_WIDE_INT int_must_be_nonzero = must_be_nonzero.to_uhwi ();

this doesn't need to fit a HOST_WIDE_INT, please use wi::bit_and (can't
find a test_bit with a quick search).

+  tmin = wi::sext (tmin, prec - 1);
+  tmax = wi::sext (tmax, prec - 1);
+  min = wide_int_to_tree (expr_type, tmin);
+  max = wide_int_to_tree (expr_type, tmax);

not sure why you need the extra sign-extensions here.

+case SEXT_EXPR:
+   {
+ gcc_assert (is_gimple_min_invariant (op1));
+ unsigned int prec = tree_to_uhwi (op1);

no need to assert, tree_to_uhwi will do that for you.

+ HOST_WIDE_INT may_be_nonzero = may_be_nonzero0.to_uhwi ();
+ HOST_WIDE_INT must_be_nonzero = must_be_nonzero0.to_uhwi ();

likewise with the HOST_WIDE_INT issue.

Otherwise looks ok to me.  Btw, this and adding of SEXT_EXPR could be
accompanied with a match.pd pattern detecting sign-extension patterns,
that would give some extra test coverage.

Thanks,
Richard.




gcc/ChangeLog:

2015-09-07  Kugan Vivekanandarajah  

 * tree-vrp.c (extract_range_from_binary_expr_1): Handle SEXT_EXPR.
 (simplify_bit_ops_using_ranges): Likewise.
 (simplify_stmt_using_ranges): Likewise.
From 75fb9b8bcacd36a1409bf94c38048de83a5eab62 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah 
Date: Mon, 17 Aug 2015 13:45:52 +1000
Subject: [PATCH 3/7] Optimize ZEXT_EXPR with tree-vrp

---
 gcc/tree-vrp.c | 73 ++
 1 file changed, 73 insertions(+)

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 2cd71a2..9c7d8d8 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -2317,6 +2317,7 @@ extract_range_from_binary_expr_1 (value_range_t *vr,
   && code != LSHIFT_EXPR
   && code != MIN_EXPR
   && code != MAX_EXPR
+  && code != SEXT_EXPR
   && code != BIT_AND_EXPR
   && code != BIT_IOR_EXPR
   && code != BIT_XOR_EXPR)
@@ -2877,6 +2878,53 @@ extract_range_from_binary_expr_1 (value_range_t *vr,
   extract_range_from_multiplicative_op_1 (vr, code, &vr0, &vr1);
   return;
 }
+  else if (code == SEXT_EXPR)
+{
+  gcc_assert (range_int_cst_p (&vr1));
+  unsigned int prec = tree_to_uhwi (vr1.min);
+  type = vr0.type;
+  wide_int tmin, tmax;
+  wide_int sign_bit;
+  wide_int type_min, type_max;
+  wide_int may_be_nonzero, must_be_nonzero;
+
+  type_min = wi::shwi (1 << (prec - 1),
+			   TYPE_PRECISION (TREE_TYPE (vr0.min)));
+  type_max = wi::shwi (((1 << (prec - 1)) - 1),
+			   TYPE_PRECISION (TREE_TYPE (vr0.max)));
+  sign_bit = wi::shwi (1 << (prec - 1),
+			   TYPE_PRECISION (TREE_TYPE (vr0.min)));
+  if (zero_nonzero_bits_from_vr (expr_type, &vr0,
+ &may_be_nonzero,
+ &must_be_nonzero))
+	{
+	  if (wi::bit_and (must_be_nonzero, sign_bit) == sign_bit)
+	{
+	  /* If to-be-extended sign bit is one.  */
+	  tmin = type_min;
+	  tmax = may_be_nonzero;
+	}
+	  else if (wi::bit_and (may_be_nonzero, sign_bit)
+		   != sign_bit)
+	{
+	  /* If to-be-extended sign bit is zero.  */
+	  tmin = must_be_nonzero;
+	  tmax = may_be_nonzero;
+	}
+	  else
+	{
+	  tmin = type_min;
+	  tmax = type_max;
+	}
+	}
+  else
+	{
+	  tmin = type_min;
+	  tmax = type_max;
+	}
+  min = wide_int_to_tree (expr_type, tmin);
+  max = wide_int_to_tree (expr_type, tmax);
+}
   else if (code == RSHIFT_EXPR
 	   ||

Re: [4/7] Use correct promoted mode sign for result of GIMPLE_CALL

2015-10-06 Thread kugan




On 15/09/15 22:47, Richard Biener wrote:

On Tue, Sep 8, 2015 at 11:50 PM, Jim Wilson  wrote:

On 09/08/2015 08:39 AM, Jeff Law wrote:

Is this another instance of the PROMOTE_MODE issue that was raised by
Jim Wilson a couple months ago?


It looks like a closely related problem.  The one I am looking at has
confusion with a function arg and a local variable as they have
different sign extension promotion rules.  Kugan's is with a function
return value and a local variable as they have different sign extension
promotion rules.

The bug report is
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65932

The gcc-patches thread spans a month end boundary, so it has multiple heads
https://gcc.gnu.org/ml/gcc-patches/2015-06/msg02132.html
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00112.html
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00524.html

Function args and function return values get the same sign extension
treatment when promoted, this is handled by
TARGET_PROMOTE_FUNCTION_MODE. Local variables are treated differently,
via PROMOTE_MODE. I think the function arg/return treatment is wrong,
but changing that is an ABI change which is undesirable.  I suppose we
could change local variables to match function args and return values,
but I think that is moving in the wrong direction.  Though Kugan's new
optimization pass will remove some of the extra unnecessary sign/zero
extensions added by the arm TARGET_PROMOTE_FUNCTION_MODE definition, so
maybe it won't matter enough to worry about any more.

If we can't fix this in the arm backend, then we may need different
middle fixes for these two cases.  I was looking at ways to fix this in
the tree-out-of-ssa pass.  I don't know if this will work for Kugan's
testcase, I'd need time to look at it.


As you said, I don't think the fix in the tree-out-of-ssa pass will fix
this case. Kyrill also saw the same problem with trunk, as in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67714


I think the function return value should have been "promoted" according to
the ABI by the lowering pass.  Thus the call stmt return type would be changed,
exposing the "mismatch" and compensating the IL with a sign-conversion.



Function return value is promoted as per ABI.
In the example from PR67714

 _8 = fn1D.5055 ();
  e_9 = (charD.4) _8;
  f_13 = _8;
...

_8 is sign extended correctly. But in f_13 = _8, it is promoted to 
unsigned and zero extended due to the backend PROMOTE_MODE. We thus have:


The zero-extension during expand:
;; f_13 = _8;

(insn 15 14 0 (set (reg/v:SI 110 [ f ])
(zero_extend:SI (subreg/u:QI (reg/v:SI 110 [ f ]) 0))) 
arm-zext.c:18 -1

 (nil))

This is wrong.
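
To make the mismatch concrete, here is a small standalone sketch (not GCC code; the helper names are hypothetical) of the two promotions applied to the same 8 bits. The register contents differ exactly when the char sign bit is set, which is why re-extending an ABI sign-extended value with zero_extend changes it.

```c
#include <assert.h>
#include <stdint.h>

/* Widen 8 bits to 32 the way a sign-extending promotion does.  */
static uint32_t
widen_sign (uint8_t byte)
{
  int32_t v = byte >= 0x80 ? (int32_t) byte - 0x100 : (int32_t) byte;
  return (uint32_t) v;
}

/* Widen the same 8 bits the way a zero-extending promotion does.  */
static uint32_t
widen_zero (uint8_t byte)
{
  return (uint32_t) byte;
}

/* Nonzero when both promotions give the same register contents.  */
static int
promotions_agree (uint8_t byte)
{
  int agree = widen_sign (byte) == widen_zero (byte);
  /* They can only differ when bit 7, the char sign bit, is set.  */
  assert (agree || (byte & 0x80));
  return agree;
}
```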


As for your original issue with function arguments, they should really get
similar treatment, eventually in function arg gimplification already, by making
the PARM_DECLs promoted and using a local variable for further uses
with the "local" type.  Eventually one can use DECL_VALUE_EXPR to fixup
the IL, not sure.  Or we can do this in the promotion pass as well.



I will try this and see whether it works.

Thanks,
Kugan


Richard.


Jim

Re: [3/7] Optimize ZEXT_EXPR with tree-vrp

2015-10-07 Thread Kugan



On 07/10/15 19:20, Richard Biener wrote:
> On Wed, Oct 7, 2015 at 1:12 AM, kugan  
> wrote:
>>
>> Hi Richard,
>>
>> Thanks for the review.
>>
>> On 15/09/15 23:08, Richard Biener wrote:
>>>
>>> On Mon, Sep 7, 2015 at 4:58 AM, Kugan 
>>> wrote:
>>>>
>>>> This patch tree-vrp handling and optimization for ZEXT_EXPR.
>>>
>>>
>>> +  else if (code == SEXT_EXPR)
>>> +{
>>> +  gcc_assert (range_int_cst_p (&vr1));
>>> +  unsigned int prec = tree_to_uhwi (vr1.min);
>>> +  type = vr0.type;
>>> +  wide_int tmin, tmax;
>>> +  wide_int type_min, type_max;
>>> +  wide_int may_be_nonzero, must_be_nonzero;
>>> +
>>> +  gcc_assert (!TYPE_UNSIGNED (expr_type));
>>>
>>> hmm, I don't think we should restrict SEXT_EXPR this way.  SEXT_EXPR
>>> should operate on both signed and unsigned types and the result type
>>> should be the same as the type of operand 0.
>>>
>>> +  type_min = wi::shwi (1 << (prec - 1),
>>> +  TYPE_PRECISION (TREE_TYPE (vr0.min)));
>>> +  type_max = wi::shwi (((1 << (prec - 1)) - 1),
>>> +  TYPE_PRECISION (TREE_TYPE (vr0.max)));
>>>
>>> there is wi::min_value and max_value for this.
>>
>>
>> As of now, SEXT_EXPR in gimple is of the form: x = y sext 8 and types of all
>> the operand and results are of the wider type. Therefore we cant use the
>> wi::min_value. Or do you want to convert this precision (in this case 8) to
>> a type and use wi::min_value?
> 
> I don't understand - wi::min/max_value get a precision and sign, not a type.
> your 1 << (prec - 1) is even wrong for prec > 32 (it's an integer type
> expression).
> Thus
> 
>   type_min = wi::min_value (prec, SIGNED);
>   type_max = wi::max_value (prec, SIGNED);
> 

Thanks for the comments. Does the attached patch look better? It is based
on the above. I am still assuming that the position of the sign bit in
SEXT_EXPR will be less than 64 (for calculating sign_bit in wide_int format).
I think this will always be the case, but please let me know if this is not OK.
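
As an illustration of the shift-width point raised in the review, the sketch below computes the bounds of a signed prec-bit field in plain C. It is only an analogy for wi::min_value and wi::max_value, not their implementation: the function names are hypothetical, and the sketch is still limited to 64 bits, which wide_int is not.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest value of a signed PREC-bit field, 1 <= PREC <= 64.  */
static int64_t
signed_min_for_prec (unsigned prec)
{
  assert (prec >= 1 && prec <= 64);
  /* Shift a 64-bit one; a plain "1 << (prec - 1)" is an int shift
     and is not valid once PREC exceeds the width of int.  */
  uint64_t sign = (uint64_t) 1 << (prec - 1);
  /* Written this way to avoid overflow when PREC is 64.  */
  return -(int64_t) (sign - 1) - 1;
}

/* Largest value of a signed PREC-bit field, 1 <= PREC <= 64.  */
static int64_t
signed_max_for_prec (unsigned prec)
{
  assert (prec >= 1 && prec <= 64);
  uint64_t sign = (uint64_t) 1 << (prec - 1);
  return (int64_t) (sign - 1);
}
```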

Thanks,
Kugan


> 
>> Please find the patch that addresses the other comments.
From 963e5ed4576bd7f82e83b21f35c58e9962dbbc74 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah 
Date: Mon, 17 Aug 2015 13:45:52 +1000
Subject: [PATCH 3/7] Optimize ZEXT_EXPR with tree-vrp

---
 gcc/tree-vrp.c | 69 ++
 1 file changed, 69 insertions(+)

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 2cd71a2..ada1c9f 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -2317,6 +2317,7 @@ extract_range_from_binary_expr_1 (value_range_t *vr,
   && code != LSHIFT_EXPR
   && code != MIN_EXPR
   && code != MAX_EXPR
+  && code != SEXT_EXPR
   && code != BIT_AND_EXPR
   && code != BIT_IOR_EXPR
   && code != BIT_XOR_EXPR)
@@ -2877,6 +2878,51 @@ extract_range_from_binary_expr_1 (value_range_t *vr,
   extract_range_from_multiplicative_op_1 (vr, code, &vr0, &vr1);
   return;
 }
+  else if (code == SEXT_EXPR)
+{
+  gcc_assert (range_int_cst_p (&vr1));
+  unsigned int prec = tree_to_uhwi (vr1.min);
+  type = vr0.type;
+  wide_int tmin, tmax;
+  wide_int may_be_nonzero, must_be_nonzero;
+
+  wide_int type_min = wi::min_value (prec, SIGNED);
+  wide_int type_max = wi::max_value (prec, SIGNED);
+  type_min = wide_int_to_tree (expr_type, type_min);
+  type_max = wide_int_to_tree (expr_type, type_max);
+  wide_int sign_bit = wi::shwi (1ULL << (prec - 1),
+TYPE_PRECISION (TREE_TYPE (vr0.min)));
+  if (zero_nonzero_bits_from_vr (expr_type, &vr0,
+ &may_be_nonzero,
+ &must_be_nonzero))
+	{
+	  if (wi::bit_and (must_be_nonzero, sign_bit) == sign_bit)
+	{
+	  /* If to-be-extended sign bit is one.  */
+	  tmin = type_min;
+	  tmax = may_be_nonzero;
+	}
+	  else if (wi::bit_and (may_be_nonzero, sign_bit)
+		   != sign_bit)
+	{
+	  /* If to-be-extended sign bit is zero.  */
+	  tmin = must_be_nonzero;
+	  tmax = may_be_nonzero;
+	}
+	  else
+	{
+	  tmin = type_min;
+	  tmax = type_max;
+	}
+	}
+  else
+	{
+	  tmin = type_min;
+	  tmax = type_max;
+	}
+  min = wide_int_to_tree (expr_type, tmin);
+  max = wide_int_to_tree (expr_type, tmax);
+}
   else if (code == RSHIFT_EXPR
 	   || code == LSHIFT_EXPR)
 {
@@ -9244,6 +9290,28 @@ simplify_bit_ops_using_ranges (gimple_stmt_iterator *gsi, gimple *stmt)
 	  break;
 	}

Re: [3/7] Optimize ZEXT_EXPR with tree-vrp

2015-10-10 Thread Kugan



On 09/10/15 21:29, Richard Biener wrote:
> +  unsigned int prec = tree_to_uhwi (vr1.min);
> 
> this should use unsigned HOST_WIDE_INT
> 
> +  wide_int sign_bit = wi::shwi (1ULL << (prec - 1),
> +   TYPE_PRECISION (TREE_TYPE (vr0.min)));
> 
> use wi::one (TYPE_PRECISION (TREE_TYPE (vr0.min))) << (prec - 1);
> 
> That is, you really need to handle precisions bigger than HOST_WIDE_INT.
> 
> But I suppose wide_int really misses a test_bit function (it has a set_bit
> one already).
> 
> + if (wi::bit_and (must_be_nonzero, sign_bit) == sign_bit)
> +   {
> + /* If to-be-extended sign bit is one.  */
> + tmin = type_min;
> + tmax = may_be_nonzero;
> 
> I think tmax should be zero-extended may_be_nonzero from prec.
> 
> + else if (wi::bit_and (may_be_nonzero, sign_bit)
> +  != sign_bit)
> +   {
> + /* If to-be-extended sign bit is zero.  */
> + tmin = must_be_nonzero;
> + tmax = may_be_nonzero;
> 
> likewise here tmin/tmax should be zero-extended may/must_be_nonzero from prec.
> 
> +case SEXT_EXPR:
> +   {
> + unsigned int prec = tree_to_uhwi (op1);
> + wide_int sign_bit = wi::shwi (1ULL << (prec - 1),
> +   TYPE_PRECISION (TREE_TYPE (vr0.min)));
> + wide_int mask = wi::shwi (((1ULL << (prec - 1)) - 1),
> +   TYPE_PRECISION (TREE_TYPE (vr0.max)));
> 
> this has the same host precision issues of 1ULL (HOST_WIDE_INT).
> There is wi::mask, eventually you can use wi::set_bit_in_zero to
> produce the sign-bit wide_int (also above).


Thanks Richard. Does the attached patch look better?

Thanks,
Kugan
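
The three cases discussed in this review (sign bit known one, known zero, unknown) can be sketched in standalone C as follows. This is not the patch's code: the struct and function names are hypothetical, the bounds are derived independently of the patch and may differ from it, and the sketch is limited to precisions below 64.

```c
#include <assert.h>
#include <stdint.h>

struct srange { int64_t min, max; };

/* MAY has a 1 wherever the operand bit might be 1; MUST has a 1 wherever
   it is known to be 1.  Returns a conservative range for the operand
   sign-extended from bit PREC - 1.  Illustration only.  */
static struct srange
sext_range (uint64_t may, uint64_t must, unsigned prec)
{
  assert (prec >= 1 && prec <= 63);
  assert ((must & ~may) == 0);  /* a known-one bit must also be possible */
  uint64_t sign = (uint64_t) 1 << (prec - 1);
  uint64_t low = sign - 1;      /* bits below the sign bit */
  struct srange r;
  if (must & sign)
    {
      /* Sign bit known one: every result is negative.  */
      r.min = (int64_t) (must & low) - (int64_t) sign;
      r.max = (int64_t) (may & low) - (int64_t) sign;
    }
  else if (!(may & sign))
    {
      /* Sign bit known zero: every result is non-negative.  */
      r.min = (int64_t) (must & low);
      r.max = (int64_t) (may & low);
    }
  else
    {
      /* Sign bit unknown: fall back to the full PREC-bit range.  */
      r.min = -(int64_t) sign;
      r.max = (int64_t) low;
    }
  return r;
}
```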
From cf5f75f5c96d30cdd968e71035a398cb0d5fcff7 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah 
Date: Mon, 17 Aug 2015 13:45:52 +1000
Subject: [PATCH 3/7] Optimize ZEXT_EXPR with tree-vrp

---
 gcc/tree-vrp.c | 70 ++
 1 file changed, 70 insertions(+)

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 2cd71a2..c04d290 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -2317,6 +2317,7 @@ extract_range_from_binary_expr_1 (value_range_t *vr,
   && code != LSHIFT_EXPR
   && code != MIN_EXPR
   && code != MAX_EXPR
+  && code != SEXT_EXPR
   && code != BIT_AND_EXPR
   && code != BIT_IOR_EXPR
   && code != BIT_XOR_EXPR)
@@ -2877,6 +2878,52 @@ extract_range_from_binary_expr_1 (value_range_t *vr,
   extract_range_from_multiplicative_op_1 (vr, code, &vr0, &vr1);
   return;
 }
+  else if (code == SEXT_EXPR)
+{
+  gcc_assert (range_int_cst_p (&vr1));
+  HOST_WIDE_INT prec = tree_to_uhwi (vr1.min);
+  type = vr0.type;
+  wide_int tmin, tmax;
+  wide_int may_be_nonzero, must_be_nonzero;
+
+  wide_int type_min = wi::min_value (prec, SIGNED);
+  wide_int type_max = wi::max_value (prec, SIGNED);
+  type_min = wide_int_to_tree (expr_type, type_min);
+  type_max = wide_int_to_tree (expr_type, type_max);
+  wide_int sign_bit
+	= wi::set_bit_in_zero (prec - 1,
+			   TYPE_PRECISION (TREE_TYPE (vr0.min)));
+  if (zero_nonzero_bits_from_vr (expr_type, &vr0,
+ &may_be_nonzero,
+ &must_be_nonzero))
+	{
+	  if (wi::bit_and (must_be_nonzero, sign_bit) == sign_bit)
+	{
+	  /* If to-be-extended sign bit is one.  */
+	  tmin = type_min;
+	  tmax = wi::zext (may_be_nonzero, prec);
+	}
+	  else if (wi::bit_and (may_be_nonzero, sign_bit)
+		   != sign_bit)
+	{
+	  /* If to-be-extended sign bit is zero.  */
+	  tmin = wi::zext (must_be_nonzero, prec);
+	  tmax = wi::zext (may_be_nonzero, prec);
+	}
+	  else
+	{
+	  tmin = type_min;
+	  tmax = type_max;
+	}
+	}
+  else
+	{
+	  tmin = type_min;
+	  tmax = type_max;
+	}
+  min = wide_int_to_tree (expr_type, tmin);
+  max = wide_int_to_tree (expr_type, tmax);
+}
   else if (code == RSHIFT_EXPR
 	   || code == LSHIFT_EXPR)
 {
@@ -9244,6 +9291,28 @@ simplify_bit_ops_using_ranges (gimple_stmt_iterator *gsi, gimple *stmt)
 	  break;
 	}
   break;
+case SEXT_EXPR:
+	{
+	  unsigned int prec = tree_to_uhwi (op1);
+	  wide_int sign_bit
+	= wi::set_bit_in_zero (prec - 1,
+   TYPE_PRECISION (TREE_TYPE (vr0.min)));
+	  wide_int mask = wi::mask (prec, true,
+TYPE_PRECISION (TREE_TYPE (vr0.min)));
+	  if (wi::bit_and (must_be_nonzero0, sign_bit) == sign_bit)
+	{
+	  /* If to-be-extended sign bit is one.  */
+	  if (wi::bit_and (must_be_nonzero0, mask) == mask)
+		op = op0;
+	}
+	  else if (wi::bit_and (may_be_nonzero0, sign_bit)

Re: [1/7] Add new tree code SEXT_EXPR

2015-10-11 Thread Kugan



On 15/09/15 23:18, Richard Biener wrote:
> On Mon, Sep 7, 2015 at 4:55 AM, Kugan  
> wrote:
>>
>> This patch adds support for new tree code SEXT_EXPR.
> 
> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> index d567a87..bbc3c10 100644
> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
> @@ -5071,6 +5071,10 @@ expand_debug_expr (tree exp)
>  case FMA_EXPR:
>return simplify_gen_ternary (FMA, mode, inner_mode, op0, op1, op2);
> 
> +case SEXT_EXPR:
> +  return op0;
> 
> that looks wrong.  Generate (sext:... ) here?
> 
> +case SEXT_EXPR:
> +   {
> + rtx op0 = expand_normal (treeop0);
> + rtx temp;
> + if (!target)
> +   target = gen_reg_rtx (TYPE_MODE (TREE_TYPE (treeop0)));
> +
> + machine_mode inner_mode
> +   = smallest_mode_for_size (tree_to_shwi (treeop1),
> + MODE_INT);
> + temp = convert_modes (inner_mode,
> +   TYPE_MODE (TREE_TYPE (treeop0)), op0, 0);
> + convert_move (target, temp, 0);
> + return target;
> +   }
> 
> Humm - is that really how we expand sign extensions right now?  No helper
> that would generate (sext ...) directly?  I wouldn't try using 'target' btw but
> simply return (sext:mode op0 op1) or so.  But I am in no way an RTL expert.
> 
> Note that if we don't disallow arbitrary precision SEXT_EXPRs we have to
> fall back to using shifts (and smallest_mode_for_size is simply wrong).
> 
> +case SEXT_EXPR:
> +  {
> +   if (!INTEGRAL_TYPE_P (lhs_type)
> +   || !INTEGRAL_TYPE_P (rhs1_type)
> +   || TREE_CODE (rhs2) != INTEGER_CST)
> 
> please constrain this some more, with
> 
>|| !useless_type_conversion_p (lhs_type, rhs1_type)
> 
> + {
> +   error ("invalid operands in sext expr");
> +   return true;
> + }
> +   return false;
> +  }
> 
> @@ -3414,6 +3422,9 @@ op_symbol_code (enum tree_code code)
>  case MIN_EXPR:
>return "min";
> 
> +case SEXT_EXPR:
> +  return "sext from bit";
> +
> 
> just "sext" please.
> 
> +/*  Sign-extend operation.  It will sign extend first operand from
> + the sign bit specified by the second operand.  */
> +DEFTREECODE (SEXT_EXPR, "sext_expr", tcc_binary, 2)
> 
> "from the INTEGER_CST sign bit specified"
> 
> Also add "The type of the result is that of the first operand."
> 



Thanks for the review. Attached patch attempts to address the above
comments. Does this look better?
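
For reference, the shift fallback mentioned in the review for precisions that have no matching integer mode can be sketched in standalone C like this. It is not the expander code: the helper name is hypothetical, the width is fixed at 32 bits, and it assumes that ">>" on a negative signed value is an arithmetic shift and that conversion to int32_t wraps, both of which GCC documents.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend X from bit PREC - 1 with a left/right shift pair.  This
   works for any PREC, not only for 8, 16 or 32.  Hypothetical helper
   for illustration, not GCC code.  */
static int32_t
sext_by_shifts (uint32_t x, unsigned prec)
{
  assert (prec >= 1 && prec <= 32);
  unsigned shift = 32 - prec;
  /* Move the field's sign bit up to bit 31, then shift it back down
     so that the arithmetic right shift replicates it.  */
  return (int32_t) (x << shift) >> shift;
}
```

A 5-bit field shows why rounding the precision up to a full mode would not do: extending 0x10 from bit 4 gives -16, whereas extending it from bit 7 would leave it as 16.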


Thanks,
Kugan
From 2326daf0e7088e01e87574c9824b1c7248395798 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah 
Date: Mon, 17 Aug 2015 13:37:15 +1000
Subject: [PATCH 1/7] Add new SEXT_EXPR tree code

---
 gcc/cfgexpand.c | 10 ++
 gcc/expr.c  | 20 
 gcc/fold-const.c|  4 
 gcc/tree-cfg.c  | 13 +
 gcc/tree-inline.c   |  1 +
 gcc/tree-pretty-print.c | 11 +++
 gcc/tree.def|  5 +
 7 files changed, 64 insertions(+)

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 58e55d2..dea0e37 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -5057,6 +5057,16 @@ expand_debug_expr (tree exp)
 case FMA_EXPR:
   return simplify_gen_ternary (FMA, mode, inner_mode, op0, op1, op2);
 
+case SEXT_EXPR:
+  gcc_assert (CONST_INT_P (op1));
+  inner_mode = mode_for_size (INTVAL (op1), MODE_INT, 0);
+  if (mode != inner_mode)
+	op0 = simplify_gen_unary (SIGN_EXTEND,
+  mode,
+  gen_lowpart_SUBREG (inner_mode, op0),
+  inner_mode);
+  return op0;
+
 default:
 flag_unsupported:
 #ifdef ENABLE_CHECKING
diff --git a/gcc/expr.c b/gcc/expr.c
index 0bbfccd..30898a2 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9296,6 +9296,26 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
   target = expand_vec_cond_expr (type, treeop0, treeop1, treeop2, target);
   return target;
 
+case SEXT_EXPR:
+	{
+	  machine_mode inner_mode = mode_for_size (tree_to_shwi (treeop1),
+		   MODE_INT, 0);
+	  rtx temp, result;
+	  rtx op0 = expand_normal (treeop0);
+	  op0 = force_reg (mode, op0);
+	  if (mode != inner_mode)
+	{
+	  result = gen_reg_rtx (mode);
+	  temp = simplify_gen_unary (SIGN_EXTEND, mode,
+	 gen_lowpart_SUBREG (inner_mode, op0),
+	 inner_mode);
+	  convert_move (result, temp, 0);
+	}
+	  else
+	result = op0;
+	  return result;
+	}
+
 default:
   gcc_unreachable ();
 }
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 7231fd6..d693b42 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -984,6 +984,10 @@ int

Re: [5/7] Allow gimple debug stmt in widen mode

2015-10-14 Thread Kugan



On 15/09/15 22:57, Richard Biener wrote:
> On Tue, Sep 8, 2015 at 2:00 AM, Kugan  
> wrote:
>>
>> Thanks for the review.
>>
>> On 07/09/15 23:20, Michael Matz wrote:
>>> Hi,
>>>
>>> On Mon, 7 Sep 2015, Kugan wrote:
>>>
>>>> Allow GIMPLE_DEBUG with values in promoted register.
>>>
>>> Patch does much more.
>>>
>>
>> Oops sorry. Copy and paste mistake.
>>
>> gcc/ChangeLog:
>>
>> 2015-09-07 Kugan Vivekanandarajah 
>>
>> * cfgexpand.c (expand_debug_locations): Remove assert as now we are
>> also allowing values in promoted register.
>> * gimple-ssa-type-promote.c (fixup_uses): Allow GIMPLE_DEBUG to bind
>> values in promoted register.
>> * rtl.h (wi::int_traits ::decompose): Accept zero extended value
>> also.
>>
>>
>>>> gcc/ChangeLog:
>>>>
>>>> 2015-09-07  Kugan Vivekanandarajah  
>>>>
>>>>  * expr.c (expand_expr_real_1): Set proper SUBREG_PROMOTED_MODE for
>>>>  SSA_NAME that was set by GIMPLE_CALL and assigned to another
>>>>  SSA_NAME of same type.
>>>
>>> ChangeLog doesn't match patch, and patch contains dubious changes:
>>>
>>>> --- a/gcc/cfgexpand.c
>>>> +++ b/gcc/cfgexpand.c
>>>> @@ -5240,7 +5240,6 @@ expand_debug_locations (void)
>>>> tree value = (tree)INSN_VAR_LOCATION_LOC (insn);
>>>> rtx val;
>>>> rtx_insn *prev_insn, *insn2;
>>>> -   machine_mode mode;
>>>>
>>>> if (value == NULL_TREE)
>>>>   val = NULL_RTX;
>>>> @@ -5275,16 +5274,6 @@ expand_debug_locations (void)
>>>>
>>>> if (!val)
>>>>   val = gen_rtx_UNKNOWN_VAR_LOC ();
>>>> -   else
>>>> - {
>>>> -   mode = GET_MODE (INSN_VAR_LOCATION (insn));
>>>> -
>>>> -   gcc_assert (mode == GET_MODE (val)
>>>> -   || (GET_MODE (val) == VOIDmode
>>>> -   && (CONST_SCALAR_INT_P (val)
>>>> -   || GET_CODE (val) == CONST_FIXED
>>>> -   || GET_CODE (val) == LABEL_REF)));
>>>> - }
>>>>
>>>> INSN_VAR_LOCATION_LOC (insn) = val;
>>>> prev_insn = PREV_INSN (insn);
>>>
>>> So it seems that the modes of the values location and the value itself
>>> don't have to match anymore, which seems dubious when considering how a
>>> debugger should load the value in question from the given location.  So,
>>> how is it supposed to work?
>>
>> For example (simplified test-case from creduce):
>>
>> fn1() {
>>   char a = fn1;
>>   return a;
>> }
>>
>> --- test.c.142t.veclower21  2015-09-07 23:47:26.362201640 +
>> +++ test.c.143t.promotion   2015-09-07 23:47:26.362201640 +
>> @@ -5,13 +5,18 @@
>>  {
>>char a;
>>long int fn1.0_1;
>> +  unsigned int _2;
>>int _3;
>> +  unsigned int _5;
>> +  char _6;
>>
>>:
>>fn1.0_1 = (long int) fn1;
>> -  a_2 = (char) fn1.0_1;
>> -  # DEBUG a => a_2
>> -  _3 = (int) a_2;
>> +  _5 = (unsigned int) fn1.0_1;
>> +  _2 = _5 & 255;
>> +  # DEBUG a => _2
>> +  _6 = (char) _2;
>> +  _3 = (int) _6;
>>return _3;
>>
>>  }
>>
>> Note that DEBUG now points to _2, which is in the promoted mode. I am
>> assuming that the debugger would load the required precision from the
>> promoted register. Maybe I am missing the details, but how else can we
>> handle this? Any suggestions?
> 
> I would have expected the DEBUG insn to be adjusted as
> 
> # DEBUG a => (char)_2

Thanks for the review. Please find the attached patch that attempts to
do this. I have also tested a version of this patch with the gdb testsuite.

As Michael wanted, I have also removed the changes in rtl.h and the
promotion of constants in GIMPLE_DEBUG.


> Btw, why do we have
> 
>> +  _6 = (char) _2;
>> +  _3 = (int) _6;
> 
> ?  I'd have expected
> 
>  unsigned int _6 = SEXT <_2, 8>
>  _3 = (int) _6;
>  return _3;

I am looking into it.

> 
> see my other mail about promotion of PARM_DECLs and RESULT_DECLs -- we should
> promote those as well.
> 

Just to be sure, are you referring to
https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00244.html
where you wanted an IPA p

Re: [1/7] Add new tree code SEXT_EXPR

2015-10-14 Thread Kugan



On 12/10/15 23:21, Richard Biener wrote:
> On Sun, Oct 11, 2015 at 12:35 PM, Kugan
>  wrote:
>>
>>
>> On 15/09/15 23:18, Richard Biener wrote:
>>> On Mon, Sep 7, 2015 at 4:55 AM, Kugan  
>>> wrote:
>>>>
>>>> This patch adds support for new tree code SEXT_EXPR.
>>>
>>> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
>>> index d567a87..bbc3c10 100644
>>> --- a/gcc/cfgexpand.c
>>> +++ b/gcc/cfgexpand.c
>>> @@ -5071,6 +5071,10 @@ expand_debug_expr (tree exp)
>>>  case FMA_EXPR:
>>>return simplify_gen_ternary (FMA, mode, inner_mode, op0, op1, op2);
>>>
>>> +case SEXT_EXPR:
>>> +  return op0;
>>>
>>> that looks wrong.  Generate (sext:... ) here?
>>>
>>> +case SEXT_EXPR:
>>> +   {
>>> + rtx op0 = expand_normal (treeop0);
>>> + rtx temp;
>>> + if (!target)
>>> +   target = gen_reg_rtx (TYPE_MODE (TREE_TYPE (treeop0)));
>>> +
>>> + machine_mode inner_mode
>>> +   = smallest_mode_for_size (tree_to_shwi (treeop1),
>>> + MODE_INT);
>>> + temp = convert_modes (inner_mode,
>>> +   TYPE_MODE (TREE_TYPE (treeop0)), op0, 0);
>>> + convert_move (target, temp, 0);
>>> + return target;
>>> +   }
>>>
>>> Humm - is that really how we expand sign extensions right now?  No helper
>>> that would generate (sext ...) directly?  I wouldn't try using 'target' btw
>>> but simply return (sext:mode op0 op1) or so.  But I am in no way an RTL expert.
>>>
>>> Note that if we don't disallow arbitrary precision SEXT_EXPRs we have to
>>> fall back to using shifts (and smallest_mode_for_size is simply wrong).
>>>
>>> +case SEXT_EXPR:
>>> +  {
>>> +   if (!INTEGRAL_TYPE_P (lhs_type)
>>> +   || !INTEGRAL_TYPE_P (rhs1_type)
>>> +   || TREE_CODE (rhs2) != INTEGER_CST)
>>>
>>> please constrain this some more, with
>>>
>>>|| !useless_type_conversion_p (lhs_type, rhs1_type)
>>>
>>> + {
>>> +   error ("invalid operands in sext expr");
>>> +   return true;
>>> + }
>>> +   return false;
>>> +  }
>>>
>>> @@ -3414,6 +3422,9 @@ op_symbol_code (enum tree_code code)
>>>  case MIN_EXPR:
>>>return "min";
>>>
>>> +case SEXT_EXPR:
>>> +  return "sext from bit";
>>> +
>>>
>>> just "sext" please.
>>>
>>> +/*  Sign-extend operation.  It will sign extend first operand from
>>> + the sign bit specified by the second operand.  */
>>> +DEFTREECODE (SEXT_EXPR, "sext_expr", tcc_binary, 2)
>>>
>>> "from the INTEGER_CST sign bit specified"
>>>
>>> Also add "The type of the result is that of the first operand."
>>>
>>
>>
>>
>> Thanks for the review. Attached patch attempts to address the above
>> comments. Does this look better?
> 
> +case SEXT_EXPR:
> +  gcc_assert (CONST_INT_P (op1));
> +  inner_mode = mode_for_size (INTVAL (op1), MODE_INT, 0);
> 
> We should add
> 
> gcc_assert (GET_MODE_BITSIZE (inner_mode) == INTVAL (op1));
> 
> +  if (mode != inner_mode)
> +   op0 = simplify_gen_unary (SIGN_EXTEND,
> + mode,
> + gen_lowpart_SUBREG (inner_mode, op0),
> + inner_mode);
> 
> as we're otherwise silently dropping things like SEXT (short-typed-var, 13)
> 
> +case SEXT_EXPR:
> +   {
> + machine_mode inner_mode = mode_for_size (tree_to_shwi (treeop1),
> +  MODE_INT, 0);
> 
> Likewise.  Also treeop1 should be unsigned, thus tree_to_uhwi?
> 
> + rtx temp, result;
> + rtx op0 = expand_normal (treeop0);
> + op0 = force_reg (mode, op0);
> + if (mode != inner_mode)
> +   {
> 
> Again, for the RTL bits I'm not sure they are correct.  For example I don't
> see why we need a lowpart SUBREG, isn't a "regular" SUBREG enough?
> 
> +case SEXT_EXPR:
> +  {
> +   if (!INTEGRAL_TYPE_P (lhs_type)
> +   || !useless_type_conver

Re: [2/7] Add new type promotion pass

2015-10-14 Thread Kugan



On 07/09/15 12:56, Kugan wrote:
> 
> This pass applies type promotion to SSA names in the function and
> inserts appropriate truncations to preserve the semantics.  The idea of
> this pass is to promote operations in such a way that we can minimize the
> generation of subregs in RTL, which in turn results in the removal of
> redundant zero/sign extensions.
> 
> gcc/ChangeLog:
> 
> 2015-09-07  Kugan Vivekanandarajah  
> 
>   * Makefile.in: Add gimple-ssa-type-promote.o.
>   * common.opt: New option -ftree-type-promote.
>   * doc/invoke.texi: Document -ftree-type-promote.
>   * gimple-ssa-type-promote.c: New file.
>   * passes.def: Define new pass_type_promote.
>   * timevar.def: Define new TV_TREE_TYPE_PROMOTE.
>   * tree-pass.h (make_pass_type_promote): New.
>   * tree-ssanames.c (set_range_info): Adjust range_info.
> 

Here is the latest version of the patch.
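
To illustrate the idea of "promote, then truncate where needed", here is
a minimal standalone sketch in plain C (not code from the patch). The
function names are hypothetical. It assumes an 8-bit unsigned value
promoted to a 32-bit register, with the truncation expressed as a mask,
in the same way as the `_2 = _5 & 255` statement in the dump discussed
earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* The operation as written in the source: computed in the narrow type.  */
static uint8_t
add_narrow (uint8_t a, uint8_t b)
{
  return (uint8_t) (a + b);
}

/* The same operation after promotion: operands and result live in a
   32-bit type, and a truncation (here a mask) is inserted so that the
   result still has the value the narrow type would have had.  */
static uint32_t
add_promoted (uint32_t a, uint32_t b)
{
  /* Operands are assumed to be already zero-extended 8-bit values.  */
  assert (a <= 255 && b <= 255);
  return (a + b) & 255u;
}
```

Both functions return 44 for the inputs 200 and 100; without the mask the
promoted version would return 300.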

Thanks,
Kugan
From 69c05e27b39cd9977e1a412e1c1b3255409ba351 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah 
Date: Mon, 17 Aug 2015 13:44:50 +1000
Subject: [PATCH 2/7] Add type promotion pass

---
 gcc/Makefile.in   |   1 +
 gcc/common.opt|   4 +
 gcc/doc/invoke.texi   |  10 +
 gcc/gimple-ssa-type-promote.c | 827 ++
 gcc/passes.def|   1 +
 gcc/timevar.def   |   1 +
 gcc/tree-pass.h   |   1 +
 gcc/tree-ssanames.c   |   3 +-
 8 files changed, 847 insertions(+), 1 deletion(-)
 create mode 100644 gcc/gimple-ssa-type-promote.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 009c745..0946055 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1498,6 +1498,7 @@ OBJS = \
 	tree-vect-slp.o \
 	tree-vectorizer.o \
 	tree-vrp.o \
+	gimple-ssa-type-promote.o \
 	tree.o \
 	valtrack.o \
 	value-prof.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index 94d1d88..b5a93b0 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2378,6 +2378,10 @@ ftree-vrp
 Common Report Var(flag_tree_vrp) Init(0) Optimization
 Perform Value Range Propagation on trees
 
+ftree-type-promote
+Common Report Var(flag_tree_type_promote) Init(1) Optimization
+Perform Type Promotion on trees
+
 funit-at-a-time
 Common Report Var(flag_unit_at_a_time) Init(1)
 Compile whole compilation unit at a time
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 50cc520..e6f0ce1 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9051,6 +9051,16 @@ enabled by default at @option{-O2} and higher.  Null pointer check
 elimination is only done if @option{-fdelete-null-pointer-checks} is
 enabled.
 
+@item -ftree-type-promote
+@opindex ftree-type-promote
+This pass applies type promotion to SSA names in the function and
+inserts appropriate truncations to preserve the semantics.  The idea of
+this pass is to promote operations in such a way that we can minimise
+the generation of subregs in RTL, which in turn results in the removal
+of redundant zero/sign extensions.
+
+This optimization is enabled by default.
+
 @item -fsplit-ivs-in-unroller
 @opindex fsplit-ivs-in-unroller
 Enables expression of values of induction variables in later iterations
diff --git a/gcc/gimple-ssa-type-promote.c b/gcc/gimple-ssa-type-promote.c
new file mode 100644
index 000..513d20d
--- /dev/null
+++ b/gcc/gimple-ssa-type-promote.c
@@ -0,0 +1,827 @@
+/* Type promotion of SSA names to minimise redundant zero/sign extension.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "hash-set.h"
+#include "machmode.h"
+#include "vec.h"
+#include "double-int.h"
+#include "input.h"
+#include "symtab.h"
+#include "wide-int.h"
+#include "inchash.h"
+#include "tree.h"
+#include "fold-const.h"
+#include "stor-layout.h"
+#include "predict.h"
+#include "function.h"
+#include "dominance.h"
+#include "cfg.h"
+#include "basic-block.h"
+#include "tree-ssa-alias.h"
+#include "gimple-fold.h"
+#include "tree-eh.h"
+#include "gimple-expr.h"
+#include "is-a.h"
+

Re: [5/7] Allow gimple debug stmt in widen mode

2015-10-18 Thread Kugan


> You remove
> 
> 
> @@ -5269,16 +5268,6 @@ expand_debug_locations (void)
> 
> if (!val)
>   val = gen_rtx_UNKNOWN_VAR_LOC ();
> -   else
> - {
> -   mode = GET_MODE (INSN_VAR_LOCATION (insn));
> -
> -   gcc_assert (mode == GET_MODE (val)
> -   || (GET_MODE (val) == VOIDmode
> -   && (CONST_SCALAR_INT_P (val)
> -   || GET_CODE (val) == CONST_FIXED
> -   || GET_CODE (val) == LABEL_REF)));
> - }
> 
> which is in place to ensure the debug insns are "valid" in some form(?)
> On what kind of insn does the assert trigger with your patch so that
> you have to remove it?

Thanks for the review. Please find the attached patch, which removes it
and does the conversion as part of the GIMPLE_DEBUG.

Does this look better?


Thanks,
Kugan



gcc/ChangeLog:

2015-10-19  Kugan Vivekanandarajah  

* gimple-ssa-type-promote.c (fixup_uses): For GIMPLE_DEBUG stmts,
convert the values computed in promoted_type to original and bind.


> 
> +
> + switch (TREE_CODE_CLASS (TREE_CODE (op)))
> +   {
> +   case tcc_exceptional:
> +   case tcc_unary:
> +   {
> 
> Hmm.  So when we promote _1 in
> 
>   _1 = ...;
>  # DEBUG i = _1 + 7;
> 
> to sth else it would probably best to instead of doing conversion of operands
> where necessary introduce a debug temporary like
> 
>  # DEBUG D#1 = (type-of-_1) replacement-of-_1;
> 
> and replace debug uses of _1 with D#1

From 47469bb461dcafdf0ce5fe5f020faed0e8d6d4d9 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah 
Date: Tue, 1 Sep 2015 08:40:40 +1000
Subject: [PATCH 5/7] debug stmt in widen mode

---
 gcc/gimple-ssa-type-promote.c | 82 +--
 1 file changed, 79 insertions(+), 3 deletions(-)

diff --git a/gcc/gimple-ssa-type-promote.c b/gcc/gimple-ssa-type-promote.c
index d4ca1a3..660bd3f 100644
--- a/gcc/gimple-ssa-type-promote.c
+++ b/gcc/gimple-ssa-type-promote.c
@@ -589,10 +589,86 @@ fixup_uses (tree use, tree promoted_type, tree old_type)
 	{
 	case GIMPLE_DEBUG:
 	{
-	  gsi = gsi_for_stmt (stmt);
-	  gsi_remove (&gsi, true);
-	  break;
+	  /* Change the GIMPLE_DEBUG stmt such that the value bound is
+		 computed in promoted_type and then converted to required
+		 type.  */
+	  tree op, new_op = NULL_TREE;
+	  gdebug *copy = NULL, *gs = as_a <gdebug *> (stmt);
+	  enum tree_code code;
+
+	  /* Get the value that is bound in debug stmt.  */
+	  switch (gs->subcode)
+		{
+		case GIMPLE_DEBUG_BIND:
+		  op = gimple_debug_bind_get_value (gs);
+		  break;
+		case GIMPLE_DEBUG_SOURCE_BIND:
+		  op = gimple_debug_source_bind_get_value (gs);
+		  break;
+		default:
+		  gcc_unreachable ();
+		}
+
+	  code = TREE_CODE (op);
+	  /* Convert the value computed in promoted_type to
+		 old_type.  */
+	  if (code == SSA_NAME && use == op)
+		new_op = build1 (NOP_EXPR, old_type, use);
+	  else if (TREE_CODE_CLASS (TREE_CODE (op)) == tcc_unary
+		   && code != NOP_EXPR)
+		{
+		  tree op0 = TREE_OPERAND (op, 0);
+		  if (op0 == use)
+		{
+		  tree temp = build1 (code, promoted_type, op0);
+		  new_op = build1 (NOP_EXPR, old_type, temp);
+		}
+		}
+	  else if (TREE_CODE_CLASS (TREE_CODE (op)) == tcc_binary
+		   /* Skip codes that are rejected in safe_to_promote_use_p.  */
+		   && code != LROTATE_EXPR
+		   && code != RROTATE_EXPR
+		   && code != COMPLEX_EXPR)
+		{
+		  tree op0 = TREE_OPERAND (op, 0);
+		  tree op1 = TREE_OPERAND (op, 1);
+		  if (op0 == use || op1 == use)
+		{
+		  if (TREE_CODE (op0) == INTEGER_CST)
+			op0 = convert_int_cst (promoted_type, op0, SIGNED);
+		  if (TREE_CODE (op1) == INTEGER_CST)
+			op1 = convert_int_cst (promoted_type, op1, SIGNED);
+		  tree temp = build2 (code, promoted_type, op0, op1);
+		  new_op = build1 (NOP_EXPR, old_type, temp);
+		}
+		}
+
+	  /* Create new GIMPLE_DEBUG stmt with the new value (new_op) to
+		 be bound, if new value has been calculated */
+	  if (new_op)
+		{
+		  if (gimple_debug_bind_p (stmt))
+		{
+		  copy = gimple_build_debug_bind
+			(gimple_debug_bind_get_var (stmt),
+			 new_op,
+			 stmt);
+		}
+		  if (gimple_debug_source_bind_p (stmt))
+		{
+		  copy = gimple_build_debug_source_bind
+			(gimple_debug_source_bind_get_var (stmt), new_op,
+			 stmt);
+		}
+
+		  if (copy)
+		{
+		  gsi = gsi_for_stmt (stmt);
+		  gsi_replace (&gsi, copy, false);
+		}
+		}
 	}
+	  break;
 
 	case GIMPLE_ASM:
 	case GIMPLE_CALL:
-- 
1.9.1

Re: [0/7] Type promotion pass and elimination of zext/sext

2015-10-20 Thread Kugan

On 07/09/15 12:53, Kugan wrote:
> 
> This is a new version of the patch posted in
> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00226.html. I have done
> more testing and split the patch to make it easier to review.
> There are still a couple of issues to be addressed and I am working on them.
> 
> 1. AARCH64 bootstrap now fails with the commit
> 94f92c36a83d66a893c3bc6f00a038ba3dbe2a6f. simplify-rtx.c is mis-compiled
> in stage2 and fwprop.c is failing. It looks to me that there is a latent
> issue which gets exposed by my patch. I can also reproduce this in x86_64
> if I use the same PROMOTE_MODE which is used in the aarch64 port. For the
> time being, I am using patch
> 0006-temporary-workaround-for-bootstrap-failure-due-to-co.patch as a
> workaround. This needs to be fixed before the patches are ready to be
> committed.
> 
> 2. vector-compare-1.c from c-c++-common/torture fails to assemble with
> -O3 -g Error: unaligned opcodes detected in executable segment. It works
> fine if I remove the -g. I am looking into it and it needs to be fixed as well.

Hi Richard,

Now that stage 1 is going to close, I would like to get these patches
accepted for stage1. I will try my best to address your review comments
ASAP.

* Issue 1 above (AARCH64 bootstrap now fails with the commit) is no
longer present as it is fixed in trunk. Patch-6 is no longer needed.

* Issue 2 is also reported as a known issue.

* Promotion of PARM_DECLs and RESULT_DECLs in an IPA pass, and patterns in
match.pd for SEXT_EXPR: I would like to propose them as a follow-up
patch once this is accepted.

* I am happy to turn this pass off by default till IPA and match.pd
changes are accepted. I can do regular testing to make sure that this
pass works properly till we enable it by default.

Please let me know what you think,

Thanks,
Kugan

Re: [0/7] Type promotion pass and elimination of zext/sext

2015-10-22 Thread Kugan



On 21/10/15 23:45, Richard Biener wrote:
> On Tue, Oct 20, 2015 at 10:03 PM, Kugan
>  wrote:
>>
>>
>> On 07/09/15 12:53, Kugan wrote:
>>>
>>> This is a new version of the patch posted in
>>> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00226.html. I have done
>>> more testing and split the patch to make it easier to review.
>>> There are still a couple of issues to be addressed and I am working on them.
>>>
>>> 1. AARCH64 bootstrap now fails with the commit
>>> 94f92c36a83d66a893c3bc6f00a038ba3dbe2a6f. simplify-rtx.c is mis-compiled
>>> in stage2 and fwprop.c is failing. It looks to me that there is a latent
>>> issue which gets exposed by my patch. I can also reproduce this in x86_64
>>> if I use the same PROMOTE_MODE which is used in the aarch64 port. For the
>>> time being, I am using patch
>>> 0006-temporary-workaround-for-bootstrap-failure-due-to-co.patch as a
>>> workaround. This needs to be fixed before the patches are ready to be
>>> committed.
>>>
>>> 2. vector-compare-1.c from c-c++-common/torture fails to assemble with
>>> -O3 -g Error: unaligned opcodes detected in executable segment. It works
>>> fine if I remove the -g. I am looking into it and it needs to be fixed as well.
>>
>> Hi Richard,
>>
>> Now that stage 1 is going to close, I would like to get these patches
>> accepted for stage1. I will try my best to address your review comments
>> ASAP.
> 
> Ok, can you make the whole patch series available so I can poke at the
> implementation a bit?  Please state the revision it was rebased on
> (or point me to a git/svn branch the work resides on).
> 

Thanks. Please find the patches rebased against trunk@229156. I have
skipped the test-case readjustment patches.


Thanks,
Kugan
From 2dc1cccfc59ae6967928b52396227b52a50803d9 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah 
Date: Thu, 22 Oct 2015 10:54:31 +1100
Subject: [PATCH 4/4] debug stmt in widen mode

---
 gcc/gimple-ssa-type-promote.c | 82 +--
 1 file changed, 79 insertions(+), 3 deletions(-)

diff --git a/gcc/gimple-ssa-type-promote.c b/gcc/gimple-ssa-type-promote.c
index e62a7c6..c0b6aa1 100644
--- a/gcc/gimple-ssa-type-promote.c
+++ b/gcc/gimple-ssa-type-promote.c
@@ -589,10 +589,86 @@ fixup_uses (tree use, tree promoted_type, tree old_type)
 	{
 	case GIMPLE_DEBUG:
 	{
-	  gsi = gsi_for_stmt (stmt);
-	  gsi_remove (&gsi, true);
-	  break;
+	  /* Change the GIMPLE_DEBUG stmt such that the value bound is
+		 computed in promoted_type and then converted to required
+		 type.  */
+	  tree op, new_op = NULL_TREE;
+	  gdebug *copy = NULL, *gs = as_a <gdebug *> (stmt);
+	  enum tree_code code;
+
+	  /* Get the value that is bound in debug stmt.  */
+	  switch (gs->subcode)
+		{
+		case GIMPLE_DEBUG_BIND:
+		  op = gimple_debug_bind_get_value (gs);
+		  break;
+		case GIMPLE_DEBUG_SOURCE_BIND:
+		  op = gimple_debug_source_bind_get_value (gs);
+		  break;
+		default:
+		  gcc_unreachable ();
+		}
+
+	  code = TREE_CODE (op);
+	  /* Convert the value computed in promoted_type to
+		 old_type.  */
+	  if (code == SSA_NAME && use == op)
+		new_op = build1 (NOP_EXPR, old_type, use);
+	  else if (TREE_CODE_CLASS (TREE_CODE (op)) == tcc_unary
+		   && code != NOP_EXPR)
+		{
+		  tree op0 = TREE_OPERAND (op, 0);
+		  if (op0 == use)
+		{
+		  tree temp = build1 (code, promoted_type, op0);
+		  new_op = build1 (NOP_EXPR, old_type, temp);
+		}
+		}
+	  else if (TREE_CODE_CLASS (TREE_CODE (op)) == tcc_binary
+		   /* Skip codes that are rejected in safe_to_promote_use_p.  */
+		   && code != LROTATE_EXPR
+		   && code != RROTATE_EXPR
+		   && code != COMPLEX_EXPR)
+		{
+		  tree op0 = TREE_OPERAND (op, 0);
+		  tree op1 = TREE_OPERAND (op, 1);
+		  if (op0 == use || op1 == use)
+		{
+		  if (TREE_CODE (op0) == INTEGER_CST)
+			op0 = convert_int_cst (promoted_type, op0, SIGNED);
+		  if (TREE_CODE (op1) == INTEGER_CST)
+			op1 = convert_int_cst (promoted_type, op1, SIGNED);
+		  tree temp = build2 (code, promoted_type, op0, op1);
+		  new_op = build1 (NOP_EXPR, old_type, temp);
+		}
+		}
+
+	  /* Create new GIMPLE_DEBUG stmt with the new value (new_op) to
+		 be bound, if new value has been calculated */
+	  if (new_op)
+		{
+		  if (gimple_debug_bind_p (stmt))
+		{
+		  copy = gimple_build_debug_bind
+			(gimple_debug_bind_get_var (stmt),
+			 new_op,
+			 stmt);
+		}
+		  if (gimple_debug_source_bind_p (stmt))
+		{
+		  copy = gimple_build_debug_source_bind
+			(gimple_debug_source_bind_get_var (stmt), new_op,
+			 stmt);
+		}
+
+		  if (copy)
+		{
+		  gsi = gsi_for_stm

Re: [0/7] Type promotion pass and elimination of zext/sext

2015-10-26 Thread kugan




On 23/10/15 01:23, Richard Biener wrote:

On Thu, Oct 22, 2015 at 12:50 PM, Kugan
 wrote:



On 21/10/15 23:45, Richard Biener wrote:

On Tue, Oct 20, 2015 at 10:03 PM, Kugan
 wrote:



On 07/09/15 12:53, Kugan wrote:


This is a new version of the patch posted in
https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00226.html. I have done
more testing and split the patch to make it easier to review.
There are still a couple of issues to be addressed and I am working on them.

1. AARCH64 bootstrap now fails with the commit
94f92c36a83d66a893c3bc6f00a038ba3dbe2a6f. simplify-rtx.c is mis-compiled
in stage2 and fwprop.c is failing. It looks to me that there is a latent
issue which gets exposed by my patch. I can also reproduce this in x86_64
if I use the same PROMOTE_MODE which is used in the aarch64 port. For the
time being, I am using patch
0006-temporary-workaround-for-bootstrap-failure-due-to-co.patch as a
workaround. This needs to be fixed before the patches are ready to be
committed.

2. vector-compare-1.c from c-c++-common/torture fails to assemble with
-O3 -g Error: unaligned opcodes detected in executable segment. It works
fine if I remove the -g. I am looking into it and it needs to be fixed as well.


Hi Richard,

Now that stage 1 is going to close, I would like to get these patches
accepted for stage1. I will try my best to address your review comments
ASAP.


Ok, can you make the whole patch series available so I can poke at the
implementation a bit?  Please state the revision it was rebased on
(or point me to a git/svn branch the work resides on).



Thanks. Please find the patches rebased against trunk@229156. I have
skipped the test-case readjustment patches.


Some quick observations.  On x86_64 when building


Hi Richard,

Thanks for the review.


short bar (short y);
int foo (short x)
{
   short y = bar (x) + 15;
   return y;
}

with -m32 -O2 -mtune=pentiumpro (which ends up promoting HImode regs)
I get

   :
   _1 = (int) x_10(D);
   _2 = (_1) sext (16);
   _11 = bar (_2);
   _5 = (int) _11;
   _12 = (unsigned int) _5;
   _6 = _12 & 65535;
   _7 = _6 + 15;
   _13 = (int) _7;
   _8 = (_13) sext (16);
   _9 = (_8) sext (16);
   return _9;

which looks fine but the VRP optimization doesn't trigger for the redundant sext
(ranges are computed correctly but the 2nd extension is not removed).

This also makes me notice trivial match.pd patterns are missing, like
for example

(simplify
  (sext (sext@2 @0 @1) @3)
  (if (tree_int_cst_compare (@1, @3) <= 0)
   @2
   (sext @0 @3)))

as VRP doesn't run at -O1 we must rely on those to remove redundant extensions,
otherwise generated code might get worse compared to without the pass(?)


Do you think that we should enable this pass only when VRP is enabled?
Otherwise, even when we do the simple optimizations you mentioned below,
we might not be able to remove all the redundancies.
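
As a sanity check of the nested-sext pattern quoted above, here is a
minimal standalone model in plain C (not GCC or match.pd code). The names
sext, simplify_nested_sext and nested_sext_matches are hypothetical, and
values are modelled as 64-bit integers. The rule it encodes is: when the
inner extension is from a bit no higher than the outer one, the outer
sext changes nothing; otherwise only the outer one matters.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of SEXT <x, bits> on 64-bit values.  */
static int64_t
sext (uint64_t x, unsigned bits)
{
  assert (bits >= 1 && bits <= 64);
  if (bits == 64)
    return (int64_t) x;
  uint64_t mask = ((uint64_t) 1 << bits) - 1;
  uint64_t sign = (uint64_t) 1 << (bits - 1);
  return (int64_t) (((x & mask) ^ sign) - sign);
}

/* What the simplification would produce for sext (sext (x, inner), outer).  */
static int64_t
simplify_nested_sext (uint64_t x, unsigned inner, unsigned outer)
{
  if (inner <= outer)
    return sext (x, inner);   /* outer extension is redundant */
  return sext (x, outer);     /* inner extension is redundant */
}

/* Return nonzero if the simplified form agrees with evaluating both
   extensions one after the other.  */
static int
nested_sext_matches (uint64_t x, unsigned inner, unsigned outer)
{
  int64_t direct = sext ((uint64_t) sext (x, inner), outer);
  return direct == simplify_nested_sext (x, inner, outer);
}
```

For the `_8 = (_13) sext (16); _9 = (_8) sext (16);` pair in the dump
above, inner and outer are both 16, so the second extension is the
redundant one.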




I also notice that the 'short' argument does not get its sign-extension removed
as redundant either even though we have

_1 = (int) x_8(D);
Found new range for _1: [-32768, 32767]



I am looking into it.
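
The check that should fire here can be sketched in plain C as follows.
This is a standalone model, not the tree-vrp.c code; the names sext and
sext_is_redundant are hypothetical and the range is modelled as a pair of
64-bit integers. The idea is that a sign extension from N bits is a no-op
when both bounds of the known value range are unchanged by it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of SEXT <x, bits> on 64-bit values.  */
static int64_t
sext (uint64_t x, unsigned bits)
{
  assert (bits >= 1 && bits <= 64);
  if (bits == 64)
    return (int64_t) x;
  uint64_t mask = ((uint64_t) 1 << bits) - 1;
  uint64_t sign = (uint64_t) 1 << (bits - 1);
  return (int64_t) (((x & mask) ^ sign) - sign);
}

/* Return nonzero if sign-extending from BITS bits cannot change any value
   in [MIN, MAX].  Checking the two bounds is enough: a value survives the
   extension exactly when it lies in [-2^(BITS-1), 2^(BITS-1) - 1], and
   that set is contiguous.  */
static int
sext_is_redundant (int64_t min, int64_t max, unsigned bits)
{
  assert (min <= max);
  return sext ((uint64_t) min, bits) == min
	 && sext ((uint64_t) max, bits) == max;
}
```

With the range [-32768, 32767] reported for _1 above, a sext from bit 16
is redundant under this model.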


In the end I suspect that keeping track of the "simple" cases in the promotion
pass itself (by keeping a lattice) might be a good idea (after we fix VRP to do
its work).  In some way whether the ABI guarantees promoted argument
registers might need some other target hook queries.

Now onto the 0002 patch.

+static bool
+type_precision_ok (tree type)
+{
+  return (TYPE_PRECISION (type)  == 8
+ || TYPE_PRECISION (type) == 16
+ || TYPE_PRECISION (type) == 32);
+}

that's a weird function to me.  You probably want
TYPE_PRECISION (type) == GET_MODE_PRECISION (TYPE_MODE (type))
here?  And guard that thing with POINTER_TYPE_P || INTEGRAL_TYPE_P?



I will change this. (I have a patch which I am testing with other 
changes you have asked for)



+/* Return the promoted type for TYPE.  */
+static tree
+get_promoted_type (tree type)
+{
+  tree promoted_type;
+  enum machine_mode mode;
+  int uns;
+  if (POINTER_TYPE_P (type)
+  || !INTEGRAL_TYPE_P (type)
+  || !type_precision_ok (type))
+return type;
+
+  mode = TYPE_MODE (type);
+#ifdef PROMOTE_MODE
+  uns = TYPE_SIGN (type);
+  PROMOTE_MODE (mode, uns, type);
+#endif
+  uns = TYPE_SIGN (type);
+  promoted_type = lang_hooks.types.type_for_mode (mode, uns);
+  if (promoted_type
+  && (TYPE_PRECISION (promoted_type) > TYPE_PRECISION (type)))
+type = promoted_type;

I think what you want to verify is that TYPE_PRECISION (promoted_type)
== GET_MODE_PRECISION (mode).
And to not even bother with this simply use

promoted_type = build_nonstandard_integer_type (GET_MODE_PRECISION (mode), uns);



I am changing this too.


You use a domwalk but also might create new basic-blocks during it
(insert_on_edge_immediate), that's a
no-no, commit edge inserts after the domwalk.


I am sorry, I don't understand "commit edge inserts after t

Re: [0/7] Type promotion pass and elimination of zext/sext

2015-11-02 Thread Kugan



On 29/10/15 02:45, Richard Biener wrote:
> On Tue, Oct 27, 2015 at 1:50 AM, kugan
>  wrote:
>>
>>
>> On 23/10/15 01:23, Richard Biener wrote:
>>>
>>> On Thu, Oct 22, 2015 at 12:50 PM, Kugan
>>>  wrote:
>>>>
>>>>
>>>>
>>>> On 21/10/15 23:45, Richard Biener wrote:
>>>>>
>>>>> On Tue, Oct 20, 2015 at 10:03 PM, Kugan
>>>>>  wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 07/09/15 12:53, Kugan wrote:
>>>>>>>
>>>>>>>
>>>>>>> This is a new version of the patch posted in
>>>>>>> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00226.html. I have done
>>>>>>> more testing and split the patch to make it easier to review.
>>>>>>> There are still a couple of issues to be addressed and I am working on
>>>>>>> them.
>>>>>>>
>>>>>>> 1. AARCH64 bootstrap now fails with the commit
>>>>>>> 94f92c36a83d66a893c3bc6f00a038ba3dbe2a6f. simplify-rtx.c is
>>>>>>> mis-compiled in stage2 and fwprop.c is failing. It looks to me that
>>>>>>> there is a latent issue which gets exposed by my patch. I can also
>>>>>>> reproduce this in x86_64 if I use the same PROMOTE_MODE which is used
>>>>>>> in the aarch64 port. For the time being, I am using patch
>>>>>>> 0006-temporary-workaround-for-bootstrap-failure-due-to-co.patch as a
>>>>>>> workaround. This needs to be fixed before the patches are ready to be
>>>>>>> committed.
>>>>>>>
>>>>>>> 2. vector-compare-1.c from c-c++-common/torture fails to assemble with
>>>>>>> -O3 -g Error: unaligned opcodes detected in executable segment. It
>>>>>>> works fine if I remove the -g. I am looking into it and it needs to be
>>>>>>> fixed as well.
>>>>>>
>>>>>>
>>>>>> Hi Richard,
>>>>>>
>>>>>> Now that stage 1 is going to close, I would like to get these patches
>>>>>> accepted for stage1. I will try my best to address your review comments
>>>>>> ASAP.
>>>>>
>>>>>
>>>>> Ok, can you make the whole patch series available so I can poke at the
>>>>> implementation a bit?  Please state the revision it was rebased on
>>>>> (or point me to a git/svn branch the work resides on).
>>>>>
>>>>
>>>> Thanks. Please find the patches rebased against trunk@229156. I have
>>>> skipped the test-case readjustment patches.
>>>
>>>
>>> Some quick observations.  On x86_64 when building
>>
>>
>> Hi Richard,
>>
>> Thanks for the review.
>>
>>>
>>> short bar (short y);
>>> int foo (short x)
>>> {
>>>short y = bar (x) + 15;
>>>return y;
>>> }
>>>
>>> with -m32 -O2 -mtune=pentiumpro (which ends up promoting HImode regs)
>>> I get
>>>
>>>:
>>>_1 = (int) x_10(D);
>>>_2 = (_1) sext (16);
>>>_11 = bar (_2);
>>>_5 = (int) _11;
>>>_12 = (unsigned int) _5;
>>>_6 = _12 & 65535;
>>>_7 = _6 + 15;
>>>_13 = (int) _7;
>>>_8 = (_13) sext (16);
>>>_9 = (_8) sext (16);
>>>return _9;
>>>
>>> which looks fine but the VRP optimization doesn't trigger for the
>>> redundant sext
>>> (ranges are computed correctly but the 2nd extension is not removed).

Thanks for the comments. Please find the attached patches, with which I
am now getting
cat .192t.optimized

;; Function foo (foo, funcdef_no=0, decl_uid=1406, cgraph_uid=0,
symbol_order=0)

foo (short int x)
{
  signed int _1;
  int _2;
  signed int _5;
  unsigned int _6;
  unsigned int _7;
  signed int _8;
  int _9;
  short int _11;
  unsigned int _12;
  signed int _13;

  :
  _1 = (signed int) x_10(D);
  _2 = _1;
  _11 = bar (_2);
  _5 = (signed int) _11;
  _12 = (unsigned int) _11;
  _6 = _12 & 65535;
  _7 = _6 + 15;
  _13 = (signed int) _7;
  _8 = (_13) sext (16);
  _9 = _8;
  return _9;

}


There are still some redundancies. The asm difference after RTL
optimizations is

-   addl$15, %eax
+   addw$15, %ax


>>>
>>>

Re: [0/7] Type promotion pass and elimination of zext/sext

2015-11-08 Thread Kugan

_154;


> +  || code == VIEW_CONVERT_EXPR
> +  || code == LROTATE_EXPR
> +  || code == RROTATE_EXPR
> +  || code == CONSTRUCTOR
> +  || code == BIT_FIELD_REF
> +  || code == COMPLEX_EXPR
> +  || code == ASM_EXPR
> +  || VECTOR_TYPE_P (TREE_TYPE (lhs)))
> +return false;
> +  return true;
> 
> ASM_EXPR can never appear here.  I think PROMOTE_MODE never
> promotes vector types - what cases did you need to add VECTOR_TYPE_P for?

Done
> 
> +/* Return true if the SSA_NAME has to be truncated to preserve the
> +   semantics.  */
> +static bool
> +truncate_use_p (gimple *stmt)
> +{
> +  enum tree_code code = gimple_assign_rhs_code (stmt);
> 
> I think the description can be improved.  This is about stray bits set
> beyond the original type, correct?
> 
> Please use NOP_EXPR wherever you use CONVERT_EXPR right now.
> 
> + if (TREE_CODE_CLASS (code)
> + == tcc_comparison)
> +   promote_cst_in_stmt (stmt, promoted_type, true);
> 
> don't you always need to promote constant operands?

I am promoting all the constants. Here, I am promoting the constants
that are part of the conditions.


Thanks,
Kugan
From a25f711713778cd3ed3d0976cc3f37d541479afb Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah 
Date: Thu, 22 Oct 2015 10:53:56 +1100
Subject: [PATCH 3/4] Optimize ZEXT_EXPR with tree-vrp

---
 gcc/match.pd   |  6 ++
 gcc/tree-vrp.c | 59 ++
 2 files changed, 65 insertions(+)

diff --git a/gcc/match.pd b/gcc/match.pd
index 0a9598e..1b152f1 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2585,3 +2585,9 @@ along with GCC; see the file COPYING3.  If not see
   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
(op @0 (ext @1 @2)
 
+(simplify
+ (sext (sext@2 @0 @1) @3)
+ (if (tree_int_cst_compare (@1, @3) <= 0)
+  @2
+  (sext @0 @3)))
+
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index fe34ffd..671a388 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -2241,6 +2241,7 @@ extract_range_from_binary_expr_1 (value_range *vr,
   && code != LSHIFT_EXPR
   && code != MIN_EXPR
   && code != MAX_EXPR
+  && code != SEXT_EXPR
   && code != BIT_AND_EXPR
   && code != BIT_IOR_EXPR
   && code != BIT_XOR_EXPR)
@@ -2801,6 +2802,52 @@ extract_range_from_binary_expr_1 (value_range *vr,
   extract_range_from_multiplicative_op_1 (vr, code, &vr0, &vr1);
   return;
 }
+  else if (code == SEXT_EXPR)
+{
+  gcc_assert (range_int_cst_p (&vr1));
+  HOST_WIDE_INT prec = tree_to_uhwi (vr1.min);
+  type = vr0.type;
+  wide_int tmin, tmax;
+  wide_int may_be_nonzero, must_be_nonzero;
+
+  wide_int type_min = wi::min_value (prec, SIGNED);
+  wide_int type_max = wi::max_value (prec, SIGNED);
+  type_min = wide_int_to_tree (expr_type, type_min);
+  type_max = wide_int_to_tree (expr_type, type_max);
+  wide_int sign_bit
+	= wi::set_bit_in_zero (prec - 1,
+			   TYPE_PRECISION (TREE_TYPE (vr0.min)));
+  if (zero_nonzero_bits_from_vr (expr_type, &vr0,
+ &may_be_nonzero,
+ &must_be_nonzero))
+	{
+	  if (wi::bit_and (must_be_nonzero, sign_bit) == sign_bit)
+	{
+	  /* If to-be-extended sign bit is one.  */
+	  tmin = type_min;
+	  tmax = wi::zext (may_be_nonzero, prec);
+	}
+	  else if (wi::bit_and (may_be_nonzero, sign_bit)
+		   != sign_bit)
+	{
+	  /* If to-be-extended sign bit is zero.  */
+	  tmin = wi::zext (must_be_nonzero, prec);
+	  tmax = wi::zext (may_be_nonzero, prec);
+	}
+	  else
+	{
+	  tmin = type_min;
+	  tmax = type_max;
+	}
+	}
+  else
+	{
+	  tmin = type_min;
+	  tmax = type_max;
+	}
+  min = wide_int_to_tree (expr_type, tmin);
+  max = wide_int_to_tree (expr_type, tmax);
+}
   else if (code == RSHIFT_EXPR
 	   || code == LSHIFT_EXPR)
 {
@@ -9166,6 +9213,17 @@ simplify_bit_ops_using_ranges (gimple_stmt_iterator *gsi, gimple *stmt)
 	  break;
 	}
   break;
+case SEXT_EXPR:
+	{
+	  unsigned int prec = tree_to_uhwi (op1);
+	  wide_int min = vr0.min;
+	  wide_int max = vr0.max;
+	  wide_int sext_min = wi::sext (min, prec);
+	  wide_int sext_max = wi::sext (max, prec);
+	  if (min == sext_min && max == sext_max)
+	op = op0;
+	}
+  break;
 default:
   gcc_unreachable ();
 }
@@ -9868,6 +9926,7 @@ simplify_stmt_using_ranges (gimple_stmt_iterator *gsi)
 
 	case BIT_AND_EXPR:
 	case BIT_IOR_EXPR:
+	case SEXT_EXPR:
 	  /* Optimize away BIT_AND_EXPR and BIT_IOR_EXPR
 	 if all the bits being cleared are already cleared or
 	 all the bits being set are already set.  */
-- 
1.9.1

From f1b226443b63eda75f38f204a0befa5578e6df0f Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah 
Date: Thu, 22 Oct 201

Re: [0/7] Type promotion pass and elimination of zext/sext

2015-11-11 Thread Kugan

/gcc/tree-ssa.c
>> @@ -752,7 +752,8 @@ verify_use (basic_block bb, basic_block def_bb,
>> use_operand_p use_p,
>>TREE_VISITED (ssa_name) = 1;
>>
>>if (gimple_nop_p (SSA_NAME_DEF_STMT (ssa_name))
>> -  && SSA_NAME_IS_DEFAULT_DEF (ssa_name))
>> +  && (SSA_NAME_IS_DEFAULT_DEF (ssa_name)
>> + || SSA_NAME_VAR (ssa_name) == NULL))
>>  ; /* Default definitions have empty statements.  Nothing to do.  */
>>else if (!def_bb)
>>  {
>>
>> Does this look OK?
> 
> Hmm, no, this looks bogus.

I have removed all the above.

> 
> I think the best thing to do is not promoting default defs at all and instead
> promote at the uses.
> 
>   /* Create a promoted copy of parameters.  */
>   bb = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun));
>   gcc_assert (bb);
>   gsi2 = gsi_after_labels (bb);
>   new_def = copy_ssa_name (def);
>   set_ssa_promoted (new_def);
>   set_ssa_default_def (cfun, SSA_NAME_VAR (def), new_def);
>   duplicate_default_ssa (new_def, def);
>   TREE_TYPE (def) = promoted_type;
> 
> AFAIK this is just an awkward way of replacing all uses by a new DEF, sth
> that should be supported by the machinery so that other default defs can just
> do
> 
>  new_def = get_or_create_default_def (create_tmp_reg
> (promoted_type));
> 
> and have all uses ('def') replaced by new_def.

I experimented with get_or_create_default_def. Here we have to have an
SSA_NAME_VAR (def) of the promoted type.

In the attached patch I am doing the following, and it seems to work. Does
this look OK?

+ }
+   else if (TREE_CODE (SSA_NAME_VAR (def)) != PARM_DECL)
+ {
+   tree var = copy_node (SSA_NAME_VAR (def));
+   TREE_TYPE (var) = promoted_type;
+   TREE_TYPE (def) = promoted_type;
+   SET_SSA_NAME_VAR_OR_IDENTIFIER (def, var);
+ }

I prefer to promote def, as otherwise iterating over the uses and
promoting them can get complicated (we would have to look at all the
different types of stmts again and do the right thing, as was done in the
earlier version of this patch before we moved to this approach).

>>>
>>> Note that as followup things like the rotates should be "expanded" like
>>> we'd do on RTL (open-coding the thing).  And we'd need a way to
>>> specify zero-/sign-extended loads.
>>>
>>> +/* Return true if it is safe to promote the use in the STMT.  */
>>> +static bool
>>> +safe_to_promote_use_p (gimple *stmt)
>>> +{
>>> +  enum tree_code code = gimple_assign_rhs_code (stmt);
>>> +  tree lhs = gimple_assign_lhs (stmt);
>>> +
>>> +  if (gimple_vuse (stmt) != NULL_TREE
>>> +  || gimple_vdef (stmt) != NULL_TREE
>>>
>>> I think the vuse/vdef check is bogus, you can have a use of 'i_3' in say
>>> _2 = a[i_3];
>>>
>> When I remove this, I see errors in stmts like:
>>
>> unsigned char
>> unsigned int
>> # .MEM_197 = VDEF <.MEM_187>
>> fs_9(D)->fde_encoding = _154;
> 
> Yeah, as said a stmt based check is really bogus without context.  As the
> predicate is only used in a single place it's better to inline it there.
> In this case you want to handle loads/stores differently.  From this
> context it looks like not iterating over uses in the caller but rather
> iterating over uses here makes most sense as you then can do
> 
>if (gimple_store_p (stmt))
>  {
> promote all uses that are not gimple_assign_rhs1 ()
>  }
> 
> you can also transparently handle constants for the cases where promoting
> is required.  At the moment their handling is intertwined with the def
> promotion code.  That makes the whole thing hard to follow.


I have updated the comments with:

+/* Promote constants in STMT to TYPE.  If PROMOTE_COND_EXPR is true,
+   promote only the constants in conditions part of the COND_EXPR.
+
+   We promote the constants when the associated operands are promoted.
+   This usually means that we promote the constants when we promote the
+   defining stmts (as part of promote_ssa). However for COND_EXPR, we
+   can promote only when we promote the other operand. Therefore, this
+   is done during fixup_use.  */


I am handling gimple_debug separately to avoid any code difference between
compiling with and without the -g option. I have updated the comments for
this.

I tested the attached patch on ppc64, aarch64 and x86-none-linux-gnu.
Regression testing for ppc64 is still in progress. I also noticed that
tree-ssa-uninit sometimes gives false positives due to the assumptions
it makes. Is it OK to move this pa

Re: [0/7] Type promotion pass and elimination of zext/sext

2015-11-13 Thread Kugan

t can
give some indication.
          Base     with pass   Percentage improvement
=====================================================
arm       10476    10372       0.9927453226
aarch64   9545     9521        0.2514405448
ppc64     12236    12052       1.5037593985
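
As a sanity check on the last column, the figures are consistent with
(Base - with pass) / Base * 100. A minimal sketch of that arithmetic,
assuming lower is better; percent_improvement is a hypothetical helper
name used only for illustration, not something from the patch:

```c
/* Hypothetical helper (not part of the patch): relative reduction of
   WITH_PASS versus BASE, expressed in percent.  */
double
percent_improvement (double base, double with_pass)
{
  return (base - with_pass) / base * 100.0;
}
```

For the arm row, percent_improvement (10476, 10372) comes out at roughly
0.9927, matching the table.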


After resolving the above issues, I would like to propose that we commit
the pass as not enabled by default (even though the patch as it stands
enables it by default; I am doing that for testing purposes).

Thanks,
Kugan


From 8e71ea17eaf6f282325076f588dbdf4f53c8b865 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah 
Date: Thu, 22 Oct 2015 10:53:56 +1100
Subject: [PATCH 3/5] Optimize ZEXT_EXPR with tree-vrp

---
 gcc/match.pd   |  6 ++
 gcc/tree-vrp.c | 61 ++
 2 files changed, 67 insertions(+)

diff --git a/gcc/match.pd b/gcc/match.pd
index 0a9598e..1b152f1 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2585,3 +2585,9 @@ along with GCC; see the file COPYING3.  If not see
   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
(op @0 (ext @1 @2)
 
+(simplify
+ (sext (sext@2 @0 @1) @3)
+ (if (tree_int_cst_compare (@1, @3) <= 0)
+  @2
+  (sext @0 @3)))
+
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index fe34ffd..024c8ef 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -2241,6 +2241,7 @@ extract_range_from_binary_expr_1 (value_range *vr,
   && code != LSHIFT_EXPR
   && code != MIN_EXPR
   && code != MAX_EXPR
+  && code != SEXT_EXPR
   && code != BIT_AND_EXPR
   && code != BIT_IOR_EXPR
   && code != BIT_XOR_EXPR)
@@ -2801,6 +2802,54 @@ extract_range_from_binary_expr_1 (value_range *vr,
   extract_range_from_multiplicative_op_1 (vr, code, &vr0, &vr1);
   return;
 }
+  else if (code == SEXT_EXPR)
+{
+  gcc_assert (range_int_cst_p (&vr1));
+  HOST_WIDE_INT prec = tree_to_uhwi (vr1.min);
+  type = vr0.type;
+  wide_int tmin, tmax;
+  wide_int may_be_nonzero, must_be_nonzero;
+
+  wide_int type_min = wi::min_value (prec, SIGNED);
+  wide_int type_max = wi::max_value (prec, SIGNED);
+  type_min = wide_int_to_tree (expr_type, type_min);
+  type_max = wide_int_to_tree (expr_type, type_max);
+  type_min = wi::sext (type_min, prec);
+  type_max = wi::sext (type_max, prec);
+  wide_int sign_bit
+	= wi::set_bit_in_zero (prec - 1,
+			   TYPE_PRECISION (TREE_TYPE (vr0.min)));
+  if (zero_nonzero_bits_from_vr (expr_type, &vr0,
+ &may_be_nonzero,
+ &must_be_nonzero))
+	{
+	  if (wi::bit_and (must_be_nonzero, sign_bit) == sign_bit)
+	{
+	  /* If to-be-extended sign bit is one.  */
+	  tmin = type_min;
+	  tmax = wi::zext (may_be_nonzero, prec);
+	}
+	  else if (wi::bit_and (may_be_nonzero, sign_bit)
+		   != sign_bit)
+	{
+	  /* If to-be-extended sign bit is zero.  */
+	  tmin = wi::zext (must_be_nonzero, prec);
+	  tmax = wi::zext (may_be_nonzero, prec);
+	}
+	  else
+	{
+	  tmin = type_min;
+	  tmax = type_max;
+	}
+	}
+  else
+	{
+	  tmin = type_min;
+	  tmax = type_max;
+	}
+  min = wide_int_to_tree (expr_type, tmin);
+  max = wide_int_to_tree (expr_type, tmax);
+}
   else if (code == RSHIFT_EXPR
 	   || code == LSHIFT_EXPR)
 {
@@ -9166,6 +9215,17 @@ simplify_bit_ops_using_ranges (gimple_stmt_iterator *gsi, gimple *stmt)
 	  break;
 	}
   break;
+case SEXT_EXPR:
+	{
+	  unsigned int prec = tree_to_uhwi (op1);
+	  wide_int min = vr0.min;
+	  wide_int max = vr0.max;
+	  wide_int sext_min = wi::sext (min, prec);
+	  wide_int sext_max = wi::sext (max, prec);
+	  if (min == sext_min && max == sext_max)
+	op = op0;
+	}
+  break;
 default:
   gcc_unreachable ();
 }
@@ -9868,6 +9928,7 @@ simplify_stmt_using_ranges (gimple_stmt_iterator *gsi)
 
 	case BIT_AND_EXPR:
 	case BIT_IOR_EXPR:
+	case SEXT_EXPR:
 	  /* Optimize away BIT_AND_EXPR and BIT_IOR_EXPR
 	 if all the bits being cleared are already cleared or
 	 all the bits being set are already set.  */
-- 
1.9.1
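
The two sign-extension simplifications in this patch can be illustrated
outside GCC. The sketch below is plain C, not the wide_int API: sext is a
hypothetical stand-in for wi::sext on a 64-bit value, and
sext_is_redundant mirrors the min/max test added to
simplify_bit_ops_using_ranges, assuming the operand's range is a constant
interval [min, max].

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the low PREC bits of X (1 <= PREC <= 64).  Stand-in for
   wi::sext; relies on the usual two's-complement conversion from
   uint64_t back to int64_t, which GCC and Clang provide.  */
static int64_t
sext (int64_t x, unsigned prec)
{
  if (prec >= 64)
    return x;
  uint64_t mask = (UINT64_C (1) << prec) - 1;
  uint64_t sign = UINT64_C (1) << (prec - 1);
  uint64_t low = (uint64_t) x & mask;
  /* Flipping and then subtracting the sign bit propagates it upwards.  */
  return (int64_t) ((low ^ sign) - sign);
}

/* If both ends of the range are unchanged by the extension, every value
   in between is unchanged too, so the SEXT_EXPR can be dropped.  */
static bool
sext_is_redundant (int64_t min, int64_t max, unsigned prec)
{
  return sext (min, prec) == min && sext (max, prec) == max;
}
```

With these definitions, sext (sext (x, 8), 16) equals sext (x, 8), which
is the case the match.pd pattern folds to the inner extension (@2), and
sext (sext (x, 16), 8) equals sext (x, 8), which is the other branch.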

From 42128668393c32c3860d346ead7b3118a090ffa4 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah 
Date: Thu, 22 Oct 2015 10:52:37 +1100
Subject: [PATCH 2/5] Add type promotion pass

---
 gcc/Makefile.in   |   1 +
 gcc/auto-profile.c|   2 +-
 gcc/common.opt|   4 +
 gcc/doc/invoke.texi   |  10 +
 gcc/gimple-ssa-type-promote.c | 867 ++
 gcc/passes.def|   1 +
 gcc/timevar.def   |   1 +
 gcc/tree-pass.h   |   1 +
 libiberty/cp-demangle.c   |   2 +-
 9 files changed, 887 insertions(+), 2 deletions(-)
 create mode 100644 gcc/gimple-ssa-type-promote.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index b91b8dc..c6aed45 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1

Incorrect code due to indirect tail call of varargs function with hard float ABI

2015-11-16 Thread Kugan

The following testcase fails on ARM (from
https://bugs.linaro.org/show_bug.cgi?id=1900).

__attribute__ ((noinline))
double direct(int x, ...)
{
  return x*x;
}

__attribute__ ((noinline))
double broken(double (*indirect)(int x, ...), int v)
{
  return indirect(v);
}

int main ()
{
  double d1, d2;
  int i = 2;
  d1 = broken (direct, i);
  if (d1 != i*i)
{
  __builtin_abort ();
}
  return 0;
}


Please note that we have a sibcall from "broken" to "indirect".

"direct" is a variadic function, so it conforms to the AAPCS base standard.

"broken" is a non-variadic function and will return its value in a
floating-point register for TARGET_HARD_FLOAT. Thus we should not be
doing a sibcall here.
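
The reasoning above can be modelled with a toy sketch. None of this is
GCC code: ret_loc, fn_type, double_return_loc and sibcall_ok are
hypothetical names, and the model only covers functions returning double.

```c
#include <stdbool.h>

/* Where a double return value lives (toy model).  */
enum ret_loc { RET_CORE_REGS, RET_VFP_REG };

/* Toy stand-in for a FUNCTION_TYPE: only records whether it is variadic.  */
struct fn_type { bool variadic; };

/* Variadic functions follow the base AAPCS even with hard float, so they
   return a double in core registers; non-variadic hard-float functions
   return it in a VFP register.  */
static enum ret_loc
double_return_loc (struct fn_type t, bool hard_float)
{
  return (hard_float && !t.variadic) ? RET_VFP_REG : RET_CORE_REGS;
}

/* A sibcall is only safe if the callee leaves its result exactly where
   the caller's own caller expects to find it.  */
static bool
sibcall_ok (struct fn_type caller, struct fn_type callee, bool hard_float)
{
  return double_return_loc (caller, hard_float)
	 == double_return_loc (callee, hard_float);
}
```

In this model, a non-variadic caller tail-calling a variadic callee is
rejected only when hard float is in use, which is the situation in the
testcase.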

The attached patch fixes this. Bootstrap and regression testing are ongoing.
Is this OK if there are no issues with the testing?

Thanks,
Kugan

gcc/ChangeLog:

2015-11-17  Kugan Vivekanandarajah  

* config/arm/arm.c (arm_function_ok_for_sibcall): Disable sibcall to
indirect function when TARGET_HARD_FLOAT.

gcc/testsuite/ChangeLog:

2015-11-17  Kugan Vivekanandarajah  

* gcc.target/arm/variadic_sibcall.c: New test.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index a379121..8b560bc 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -6681,6 +6681,12 @@ arm_function_ok_for_sibcall (tree decl, tree exp)
 register.  */
   rtx a, b;
 
+  /* When it is an indirect call (i.e, decl == NULL), it could be
+returning its result in a VFP or could be a variadic function.
+Thus return false.  */
+  if (!decl && TARGET_HARD_FLOAT)
+   return false;
+
   a = arm_function_value (TREE_TYPE (exp), decl, false);
   b = arm_function_value (TREE_TYPE (DECL_RESULT (cfun->decl)),
  cfun->decl, false);
diff --git a/gcc/testsuite/gcc.target/arm/variadic_sibcall.c 
b/gcc/testsuite/gcc.target/arm/variadic_sibcall.c
index e69de29..86f07fe 100644
--- a/gcc/testsuite/gcc.target/arm/variadic_sibcall.c
+++ b/gcc/testsuite/gcc.target/arm/variadic_sibcall.c
@@ -0,0 +1,27 @@
+/* { dg-do run }  */
+/* { dg-options "-O2" } */
+
+__attribute__ ((noinline))
+double direct(int x, ...)
+{
+  return x*x;
+}
+
+__attribute__ ((noinline))
+double broken(double (*indirect)(int x, ...), int v)
+{
+  return indirect(v);
+}
+
+int main ()
+{
+  double d1, d2;
+  int i = 2;
+  d1 = broken (direct, i);
+  if (d1 != i*i)
+{
+  __builtin_abort ();
+}
+  return 0;
+}
+

Re: Incorrect code due to indirect tail call of varargs function with hard float ABI

2015-11-16 Thread Kugan



On 17/11/15 12:00, Charles Baylis wrote:
> On 16 November 2015 at 22:24, Kugan  wrote:
> 
>> Please note that we have a sibcall from "broken" to "indirect".
>>
>> "direct" is variadic function so it is conforming to AAPCS base standard.
>>
>> "broken" is a non-variadic function and will return the value in
>> floating point register for TARGET_HARD_FLOAT. Thus we should not be
>> doing sibcall here.
>>
>> Attached patch fixes this. Bootstrap and regression testing is ongoing.
>> Is this OK if no issues with the testing?
> 
> Hi Kugan,
> 
> It looks like this patch should work, but I think this is an overly
> conservative fix, as it prevents all sibcalls for hardfloat targets.
> It would be better if only variadic sibcalls were prevented on
> hardfloat. You can check for variadic calls by checking the
> function_type in the call expression (exp) using stdarg_p().
> 
> As an example to show how to test for variadic function calls, this is
> how to test it in gdb:
> 
> (gdb) b arm_function_ok_for_sibcall
> Breakpoint 1 at 0xdae59c: file
> /home/cbaylis/srcarea/gcc/gcc-git/gcc/config/arm/arm.c, line 6634.
> (gdb) r
> ...
> Breakpoint 1, arm_function_ok_for_sibcall (decl=0x0, exp=0x76104ce8)
> at /home/cbaylis/srcarea/gcc/gcc-git/gcc/config/arm/arm.c:6634
> 6634  if (cfun->machine->sibcall_blocked)
> (gdb) print debug_tree(exp)
>   type  size 
> unit size 
> align 64 symtab 0 alias set -1 canonical type 0x762835e8
> precision 64
> pointer_to_this >
> side-effects addressable
> fn  type 
> ...
> (gdb) print stdarg_p((tree)0x760e9348)<--- from function_type ^
> $2 = true
> 

Hi Charles,

I wrongly thought that for an indirect call we wouldn't know whether it is
variadic or not. I should check stdarg_p here.

But we should really fix aapcs_allocate_return_reg, as it simply sets
pcs_variant = arm_pcs_default without checking whether the function is
stdarg_p. I will send an updated patch.

Thanks,
Kugan

Re: Incorrect code due to indirect tail call of varargs function with hard float ABI

2015-11-16 Thread Kugan



On 17/11/15 12:00, Charles Baylis wrote:
> On 16 November 2015 at 22:24, Kugan  wrote:
> 
>> Please note that we have a sibcall from "broken" to "indirect".
>>
>> "direct" is variadic function so it is conforming to AAPCS base standard.
>>
>> "broken" is a non-variadic function and will return the value in
>> floating point register for TARGET_HARD_FLOAT. Thus we should not be
>> doing sibcall here.
>>
>> Attached patch fixes this. Bootstrap and regression testing is ongoing.
>> Is this OK if no issues with the testing?
> 
> Hi Kugan,
> 
> It looks like this patch should work, but I think this is an overly
> conservative fix, as it prevents all sibcalls for hardfloat targets.
> It would be better if only variadic sibcalls were prevented on
> hardfloat. You can check for variadic calls by checking the
> function_type in the call expression (exp) using stdarg_p().
> 
> As an example to show how to test for variadic function calls, this is
> how to test it in gdb:
> 
> (gdb) b arm_function_ok_for_sibcall
> Breakpoint 1 at 0xdae59c: file
> /home/cbaylis/srcarea/gcc/gcc-git/gcc/config/arm/arm.c, line 6634.
> (gdb) r
> ...
> Breakpoint 1, arm_function_ok_for_sibcall (decl=0x0, exp=0x76104ce8)
> at /home/cbaylis/srcarea/gcc/gcc-git/gcc/config/arm/arm.c:6634
> 6634  if (cfun->machine->sibcall_blocked)
> (gdb) print debug_tree(exp)
>   type  size 
> unit size 
> align 64 symtab 0 alias set -1 canonical type 0x762835e8
> precision 64
> pointer_to_this >
> side-effects addressable
> fn  type 
> ...
> (gdb) print stdarg_p((tree)0x760e9348)<--- from function_type ^
> $2 = true
> 

How about:

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index a379121..2376d66 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -6681,6 +6681,13 @@ arm_function_ok_for_sibcall (tree decl, tree exp)
 register.  */
   rtx a, b;

+  /* If it is an indirect function pointer, get the function type.  */
+  if (!decl
+ && POINTER_TYPE_P (TREE_TYPE (CALL_EXPR_FN (exp)))
+ && (TREE_CODE (TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (exp
+ == FUNCTION_TYPE))
+   decl = TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (exp)));
+
   a = arm_function_value (TREE_TYPE (exp), decl, false);
   b = arm_function_value (TREE_TYPE (DECL_RESULT (cfun->decl)),
  cfun->decl, false);


Thanks,
Kugan

Re: Incorrect code due to indirect tail call of varargs function with hard float ABI

2015-11-17 Thread Kugan



On 17/11/15 21:05, Ramana Radhakrishnan wrote:
> Hi Kugan,
> 
> It does look like an issue.
> 
> Please open a bug report.
> 
>>
>>
>> On 17/11/15 12:00, Charles Baylis wrote:
>>> On 16 November 2015 at 22:24, Kugan  
>>> wrote:
>>>
>>>> Please note that we have a sibcall from "broken" to "indirect".
>>>>
>>>> "direct" is variadic function so it is conforming to AAPCS base standard.
>>>>
>>>> "broken" is a non-variadic function and will return the value in
>>>> floating point register for TARGET_HARD_FLOAT. Thus we should not be
>>>> doing sibcall here.
>>>>
>>>> Attached patch fixes this. Bootstrap and regression testing is ongoing.
>>>> Is this OK if no issues with the testing?
>>>
>>> Hi Kugan,
>>>
>>> It looks like this patch should work, but I think this is an overly
>>> conservative fix, as it prevents all sibcalls for hardfloat targets.
>>> It would be better if only variadic sibcalls were prevented on
>>> hardfloat. You can check for variadic calls by checking the
>>> function_type in the call expression (exp) using stdarg_p().
>>>
>>> As an example to show how to test for variadic function calls, this is
>>> how to test it in gdb:
>>>
>>> (gdb) b arm_function_ok_for_sibcall
>>> Breakpoint 1 at 0xdae59c: file
>>> /home/cbaylis/srcarea/gcc/gcc-git/gcc/config/arm/arm.c, line 6634.
>>> (gdb) r
>>> ...
>>> Breakpoint 1, arm_function_ok_for_sibcall (decl=0x0, exp=0x76104ce8)
>>> at /home/cbaylis/srcarea/gcc/gcc-git/gcc/config/arm/arm.c:6634
>>> 6634  if (cfun->machine->sibcall_blocked)
>>> (gdb) print debug_tree(exp)
>>>  >> type >> size 
>>> unit size 
>>> align 64 symtab 0 alias set -1 canonical type 0x762835e8
>>> precision 64
>>> pointer_to_this >
>>> side-effects addressable
>>> fn >> type >> 0x760e9348>
>>> ...
>>> (gdb) print stdarg_p((tree)0x760e9348)<--- from function_type ^
>>> $2 = true
>>>
>>
>> How about:
> 
> 
> 
> A run time testcase and a changelog would also be needed.
> 
>>
>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>> index a379121..2376d66 100644
>> --- a/gcc/config/arm/arm.c
>> +++ b/gcc/config/arm/arm.c
>> @@ -6681,6 +6681,13 @@ arm_function_ok_for_sibcall (tree decl, tree exp)
>>  register.  */
>>rtx a, b;
>>
>> +  /* If it is an indirect function pointer, get the function type.  */
>> +  if (!decl
>> + && POINTER_TYPE_P (TREE_TYPE (CALL_EXPR_FN (exp)))
>> + && (TREE_CODE (TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (exp
>> + == FUNCTION_TYPE))
>> +   decl = TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (exp)));
>> +
> 
> If decl is null it's guaranteed to be an indirect function call - drop the 
> additional checks in the if clause.
> 
> 
>>a = arm_function_value (TREE_TYPE (exp), decl, false);
>>b = arm_function_value (TREE_TYPE (DECL_RESULT (cfun->decl)),
>>   cfun->decl, false);
>>
> 
> 
> Please resubmit with a testcase, Changelog and after testing.

Hi Ramana,

Thanks for the review. I have opened a gcc bug-report for this. I tested
the attached patch for  arm-none-linux-gnueabihf and
arm-none-linux-gnueabi with no new regressions. Is this OK?


Thanks,
Kugan

gcc/ChangeLog:

2015-11-18  Kugan Vivekanandarajah  

PR target/68390
* config/arm/arm.c (arm_function_ok_for_sibcall): Get function type
for indirect function call.

gcc/testsuite/ChangeLog:

2015-11-18  Kugan Vivekanandarajah  

PR target/68390
* gcc.target/arm/PR68390.c: New test.


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index a379121..a4509f4 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -6681,6 +6681,10 @@ arm_function_ok_for_sibcall (tree decl, tree exp)
 register.  */
   rtx a, b;
 
+  /* If it is an indirect function pointer, get the function type.  */
+  if (!decl)
+   decl = TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (exp)));
+
   a = arm_function_value (TREE_TYPE (exp), decl, false);
   b = arm_function_value (TREE_TYPE (DECL_RESULT (cfun->decl)),
  cfun->decl, false);
diff --git a/gcc/testsuite/gcc.target/arm/PR68390.c 
b/gcc/testsuite/gcc.target/arm/PR68390.c
index e69de29..86f07fe 100644
--- a/gcc/testsuite/gcc.target/arm/PR68390.c
+++ b/gcc/testsuite/gcc.target/arm/PR68390.c
@@ -0,0 +1,27 @@
+/* { dg-do run }  */
+/* { dg-options "-O2" } */
+
+__attribute__ ((noinline))
+double direct(int x, ...)
+{
+  return x*x;
+}
+
+__attribute__ ((noinline))
+double broken(double (*indirect)(int x, ...), int v)
+{
+  return indirect(v);
+}
+
+int main ()
+{
+  double d1, d2;
+  int i = 2;
+  d1 = broken (direct, i);
+  if (d1 != i*i)
+{
+  __builtin_abort ();
+}
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.target/arm/variadic_sibcall.c 
b/gcc/testsuite/gcc.target/arm/variadic_sibcall.c
deleted file mode 100644
index e69de29..000

Re: Incorrect code due to indirect tail call of varargs function with hard float ABI

2015-11-17 Thread Kugan


> Hi Ramana,
> 
> Thanks for the review. I have opened a gcc bug-report for this. I tested
> the attached patch for  arm-none-linux-gnueabihf and
> arm-none-linux-gnueabi with no new regressions. Is this OK?
> 
> 
> Thanks,
> Kugan
> 
> gcc/ChangeLog:
> 
> 2015-11-18  Kugan Vivekanandarajah  
> 
>   PR target/68390
>   * config/arm/arm.c (arm_function_ok_for_sibcall): Get function type
>   for indirect function call.
> 
> gcc/testsuite/ChangeLog:
> 
> 2015-11-18  Kugan Vivekanandarajah  
> 
>   PR target/68390
>   * gcc.target/arm/PR68390.c: New test.
> 
> 
Hi Ramana,

With further testing on bare-metal, I found that decl has to remain null
for indirect functions in the following check:

  if (TARGET_AAPCS_BASED
  && arm_abi == ARM_ABI_AAPCS
  && decl
  && DECL_WEAK (decl))
return false;

Here is the updated patch and ChangeLog. Sorry for the noise.

Thanks,
Kugan


gcc/ChangeLog:

2015-11-18  Kugan Vivekanandarajah  

PR target/68390
* config/arm/arm.c (arm_function_ok_for_sibcall): Get function type
for indirect function call.

gcc/testsuite/ChangeLog:

2015-11-18  Kugan Vivekanandarajah  

PR target/68390
* gcc.target/arm/PR68390.c: New test.



diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index a379121..0dae7da 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -6680,8 +6680,13 @@ arm_function_ok_for_sibcall (tree decl, tree exp)
 a VFP register but then need to transfer it to a core
 register.  */
   rtx a, b;
+  tree fn_decl = decl;
 
-  a = arm_function_value (TREE_TYPE (exp), decl, false);
+  /* If it is an indirect function pointer, get the function type.  */
+  if (!decl)
+   fn_decl = TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (exp)));
+
+  a = arm_function_value (TREE_TYPE (exp), fn_decl, false);
   b = arm_function_value (TREE_TYPE (DECL_RESULT (cfun->decl)),
  cfun->decl, false);
   if (!rtx_equal_p (a, b))
diff --git a/gcc/testsuite/gcc.target/arm/PR68390.c 
b/gcc/testsuite/gcc.target/arm/PR68390.c
index e69de29..86f07fe 100644
--- a/gcc/testsuite/gcc.target/arm/PR68390.c
+++ b/gcc/testsuite/gcc.target/arm/PR68390.c
@@ -0,0 +1,27 @@
+/* { dg-do run }  */
+/* { dg-options "-O2" } */
+
+__attribute__ ((noinline))
+double direct(int x, ...)
+{
+  return x*x;
+}
+
+__attribute__ ((noinline))
+double broken(double (*indirect)(int x, ...), int v)
+{
+  return indirect(v);
+}
+
+int main ()
+{
+  double d1, d2;
+  int i = 2;
+  d1 = broken (direct, i);
+  if (d1 != i*i)
+{
+  __builtin_abort ();
+}
+  return 0;
+}
+

Re: [0/7] Type promotion pass and elimination of zext/sext

2015-11-23 Thread Kugan

Hi Richard,

Thanks for your comments. I am attaching an updated patch with details
below.

On 19/11/15 02:06, Richard Biener wrote:
> On Wed, Nov 18, 2015 at 3:04 PM, Richard Biener
>  wrote:
>> On Sat, Nov 14, 2015 at 2:15 AM, Kugan
>>  wrote:
>>>
>>> Attached is the latest version of the patch. With the patches
>>> 0001-Add-new-SEXT_EXPR-tree-code.patch,
>>> 0002-Add-type-promotion-pass.patch and
>>> 0003-Optimize-ZEXT_EXPR-with-tree-vrp.patch.
>>>
>>> I did bootstrap on ppc64-linux-gnu, aarch64-linux-gnu and
>>> x64-64-linux-gnu and regression testing on ppc64-linux-gnu,
>>> aarch64-linux-gnu arm64-linux-gnu and x64-64-linux-gnu. I ran into three
>>> issues in ppc64-linux-gnu regression testing. There are some other test
>>> cases which needs adjustment for scanning for some patterns that are not
>>> valid now.
>>>
>>> 1. rtl fwprop was going into infinite loop. Works with the following patch:
>>> diff --git a/gcc/fwprop.c b/gcc/fwprop.c
>>> index 16c7981..9cf4f43 100644
>>> --- a/gcc/fwprop.c
>>> +++ b/gcc/fwprop.c
>>> @@ -948,6 +948,10 @@ try_fwprop_subst (df_ref use, rtx *loc, rtx
>>> new_rtx, rtx_insn *def_insn,
>>>int old_cost = 0;
>>>bool ok;
>>>
>>> +  /* Value to be substituted is the same, nothing to do.  */
>>> +  if (rtx_equal_p (*loc, new_rtx))
>>> +return false;
>>> +
>>>update_df_init (def_insn, insn);
>>>
>>>/* forward_propagate_subreg may be operating on an instruction with
>>
>> Which testcase was this on?

After rebasing on trunk, I cannot reproduce it anymore.

>>
>>> 2. gcc.dg/torture/ftrapv-1.c fails
>>> This is because we are checking for the  SImode trapping. With the
>>> promotion of the operation to wider mode, this is i think expected. I
>>> think the testcase needs updating.
>>
>> No, it is not expected.  As said earlier you need to refrain from promoting
>> integer operations that trap.  You can use ! operation_no_trapping_overflow
>> for this.
>>

I have changed this.
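
A small illustration of the point about trapping operations: an addition
that overflows in 32 bits (and would trap under -ftrapv) no longer
overflows once it is carried out in a wider type, so promoting it would
silently drop the trap. The sketch assumes a compiler that provides
__builtin_add_overflow (GCC 5 or later, or Clang); the function names are
made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if A + B overflows when computed in 32-bit signed arithmetic.  */
static bool
add_overflows_32 (int32_t a, int32_t b)
{
  int32_t r;
  return __builtin_add_overflow (a, b, &r);
}

/* The same operands after promotion to 64 bits.  */
static bool
add_overflows_64 (int64_t a, int64_t b)
{
  int64_t r;
  return __builtin_add_overflow (a, b, &r);
}
```

add_overflows_32 (INT32_MAX, 1) reports an overflow while
add_overflows_64 (INT32_MAX, 1) does not, which is why such operations
have to be left unpromoted.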

>>> 3. gcc.dg/sms-3.c fails
>>> It fails with  -fmodulo-sched-allow-regmoves  and OK when I remove it. I
>>> am looking into it.
>>>
>>>
>>> I also have the following issues based on the previous review (as posted
>>> in the previous patch). Copying again for the review purpose.
>>>
>>> 1.
>>>> you still call promote_ssa on both DEFs and USEs and promote_ssa looks
>>>> at SSA_NAME_DEF_STMT of the passed arg.  Please call promote_ssa just
>>>> on DEFs and fixup_uses on USEs.
>>>
>>> I am doing this to promote SSA that are defined with GIMPLE_NOP. Is
>>> there anyway to iterate over this. I have added gcc_assert to make sure
>>> that promote_ssa is called only once.
>>
>>   gcc_assert (!ssa_name_info_map->get_or_insert (def));
>>
>> with --disable-checking this will be compiled away so you need to do
>> the assert in a separate statement.
>>
>>> 2.
>>>> Instead of this you should, in promote_all_stmts, walk over all uses
>>> doing what
>>>> fixup_uses does and then walk over all defs, doing what promote_ssa does.
>>>>
>>>> +case GIMPLE_NOP:
>>>> +   {
>>>> + if (SSA_NAME_VAR (def) == NULL)
>>>> +   {
>>>> + /* Promote def by fixing its type for anonymous def.  */
>>>> + TREE_TYPE (def) = promoted_type;
>>>> +   }
>>>> + else
>>>> +   {
>>>> + /* Create a promoted copy of parameters.  */
>>>> + bb = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun));
>>>>
>>>> I think the uninitialized vars are somewhat tricky and it would be best
>>>> to create a new uninit anonymous SSA name for them.  You can
>>>> have SSA_NAME_VAR != NULL and def _not_ being a parameter
>>>> btw.
>>>
>>> I experimented with get_or_create_default_def. Here  we have to have a
>>> SSA_NAME_VAR (def) of promoted type.
>>>
>>> In the attached patch I am doing the following and seems to work. Does
>>> this looks OK?
>>>
>>> + }
>>> +   else if (TREE_CODE (SSA_NAME_VAR (def)) != PARM_DECL)
>>> + {
>>> +   tree var = copy_node (SSA_NAME_VAR (def));
>>> +   TREE_TYPE (var) = promoted_type;
>>
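
On the gcc_assert remark in the quoted review: when assertion checking is
disabled, the asserted expression is not evaluated, so a call with side
effects placed inside the assert never happens. A toy sketch of that
pitfall; TOY_ASSERT and toy_get_or_insert are hypothetical stand-ins for
gcc_assert in a --disable-checking build and for
ssa_name_info_map->get_or_insert, not the real definitions:

```c
#include <stdbool.h>

/* Models an assert whose condition is compiled away.  */
#define TOY_ASSERT(expr) ((void) (0 && (expr)))

static int n_inserted;

/* Stand-in for get_or_insert: records an insertion and reports whether
   an entry already existed.  */
static bool
toy_get_or_insert (void)
{
  bool existed = n_inserted > 0;
  n_inserted++;
  return existed;
}

/* Problematic form: the insertion lives inside the assert and is lost.  */
static int
insert_inside_assert (void)
{
  n_inserted = 0;
  TOY_ASSERT (!toy_get_or_insert ());
  return n_inserted;
}

/* Safe form: do the work in its own statement, then assert on the result.  */
static int
insert_then_assert (void)
{
  n_inserted = 0;
  bool existed = toy_get_or_insert ();
  TOY_ASSERT (!existed);
  (void) existed;
  return n_inserted;
}
```

In this model insert_inside_assert leaves nothing inserted, while
insert_then_assert performs the insertion regardless of whether checking
is enabled.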

Re: Incorrect code due to indirect tail call of varargs function with hard float ABI

2015-12-01 Thread Kugan


>>
>> gcc/ChangeLog:
>>
>> 2015-11-18  Kugan Vivekanandarajah  
>>
>>  PR target/68390
>>  * config/arm/arm.c (arm_function_ok_for_sibcall): Get function type
>>  for indirect function call.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2015-11-18  Kugan Vivekanandarajah  
>>
>>  PR target/68390
>>  * gcc.target/arm/PR68390.c: New test.
>>
> 
> s/PR/pr in the test name and put this in gcc.c-torture/execute instead - 
> there is nothing ARM specific about the test. Tests in gcc.target/arm should 
> really only be architecture specific. This isn't.
> 
>>
>>
>>
>> p.txt
>>
>>
>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>> index a379121..0dae7da 100644
>> --- a/gcc/config/arm/arm.c
>> +++ b/gcc/config/arm/arm.c
>> @@ -6680,8 +6680,13 @@ arm_function_ok_for_sibcall (tree decl, tree exp)
>>   a VFP register but then need to transfer it to a core
>>   register.  */
>>rtx a, b;
>> +  tree fn_decl = decl;
> 
> Call it decl_or_type instead - it's really that ... 
> 
>>  
>> -  a = arm_function_value (TREE_TYPE (exp), decl, false);
>> +  /* If it is an indirect function pointer, get the function type.  */
>> +  if (!decl)
>> +fn_decl = TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (exp)));
>> +
> 
> This is probably just my mail client - but please watch out for indentation.
> 
>> +  a = arm_function_value (TREE_TYPE (exp), fn_decl, false);
>>b = arm_function_value (TREE_TYPE (DECL_RESULT (cfun->decl)),
>>cfun->decl, false);
>>if (!rtx_equal_p (a, b))
> 
> 
> OK with those changes.
> 
> Ramana
> 


Hi Ramana,

This issue is also present on the 4.9 and 5.0 branches. Is it OK to
backport this to the release branches?

Thanks,
Kugan

Re: [PATCH] Fix phiopt ICE in Factor conversion in COND_EXPR (PR tree-optimization/66949)

2015-12-08 Thread Kugan



On 09/12/15 03:21, Marek Polacek wrote:
> The following is a conservative fix for this PR.  This is an ICE transpiring
> in the new "Factor conversion in COND_EXPR" optimization added in r225722.
> 
> Before this optimization kicks in, we have
>   :
>   ...
>   p1_32 = (short unsigned int) _20;
> 
>   :
>   ...
>   iftmp.0_18 = (short unsigned int) _20;
> 
>   :
>   ...
>   # iftmp.0_19 = PHI 
> 
> after factor_out_conditional_conversion does its work, we end up with
> those two def stmts removed and instead of the PHI we'll have
> 
>   # _35 = PHI <_20(3), _20(2)>
>   iftmp.0_19 = (short unsigned int) _35;
> 
> That itself looks like a fine optimization, but after
> factor_out_conditional_conversion there's
>  320   phis = phi_nodes (bb2);
>  321   phi = single_non_singleton_phi_for_edges (phis, e1, e2);
>  322   gcc_assert (phi);
> and phis look like
>   b.2_38 = PHI 
>   _35 = PHI <_20(3), _20(2)>
> so single_non_singleton_phi_for_edges returns NULL and the subsequent
> assert triggers.
> 
> With this patch we won't ICE (and PRE should clean this up anyway), but I
> don't know, maybe I should try harder to optimize even this problematical
> case (not sure how hard it would be...)?

Hi Marek,

Thanks for fixing this. Yes, we can try to remove the PHI in
factor_out_conditional_conversion. But as you said, it might not be
important. In any case, please let me know if you want me to try doing that.

Thanks,
Kugan


> 
> pr66949-2.c only ICEd on powerpc64le and I have verified that this patch
> fixes it too.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> 2015-12-08  Marek Polacek  
> 
>   PR tree-optimization/66949
>   * tree-ssa-phiopt.c (factor_out_conditional_conversion): Return false if
>   NEW_ARG0 and NEW_ARG1 are equal.
> 
>   * gcc.dg/torture/pr66949-1.c: New test.
>   * gcc.dg/torture/pr66949-2.c: New test.
> 
> diff --git gcc/testsuite/gcc.dg/torture/pr66949-1.c 
> gcc/testsuite/gcc.dg/torture/pr66949-1.c
> index e69de29..1b765bc 100644
> --- gcc/testsuite/gcc.dg/torture/pr66949-1.c
> +++ gcc/testsuite/gcc.dg/torture/pr66949-1.c
> @@ -0,0 +1,28 @@
> +/* PR tree-optimization/66949 */
> +/* { dg-do compile } */
> +
> +int a, *b = &a, c;
> +
> +unsigned short
> +fn1 (unsigned short p1, unsigned int p2)
> +{
> +  return p2 > 1 || p1 >> p2 ? p1 : p1 << p2;
> +}
> +
> +void
> +fn2 ()
> +{
> +  int *d = &a;
> +  for (a = 0; a < -1; a = 1)
> +;
> +  if (a < 0)
> +c = 0;
> +  *b = fn1 (*d || c, *d);
> +}
> +
> +int
> +main ()
> +{
> +  fn2 ();
> +  return 0;
> +}
> diff --git gcc/testsuite/gcc.dg/torture/pr66949-2.c 
> gcc/testsuite/gcc.dg/torture/pr66949-2.c
> index e69de29..e6250a3 100644
> --- gcc/testsuite/gcc.dg/torture/pr66949-2.c
> +++ gcc/testsuite/gcc.dg/torture/pr66949-2.c
> @@ -0,0 +1,23 @@
> +/* PR tree-optimization/66949 */
> +/* { dg-do compile } */
> +
> +char a;
> +int b, c, d;
> +extern int fn2 (void);
> +
> +short
> +fn1 (short p1, short p2)
> +{
> +  return p2 == 0 ? p1 : p1 / p2;
> +}
> +
> +int
> +main (void)
> +{
> +  char e = 1;
> +  int f = 7;
> +  c = a >> f;
> +  b = fn1 (c, 0 < d <= e && fn2 ());
> +
> +  return 0;
> +}
> diff --git gcc/tree-ssa-phiopt.c gcc/tree-ssa-phiopt.c
> index 344cd2f..caac5d5 100644
> --- gcc/tree-ssa-phiopt.c
> +++ gcc/tree-ssa-phiopt.c
> @@ -477,6 +477,11 @@ factor_out_conditional_conversion (edge e0, edge e1, 
> gphi *phi,
>   return false;
>  }
>  
> +  /* If we were to continue, we'd create a PHI with same arguments for edges
> + E0 and E1.  That could get us in trouble later, so punt.  */
> +  if (operand_equal_for_phi_arg_p (new_arg0, new_arg1))
> +return false;
> +
>/*  If arg0/arg1 have > 1 use, then this transformation actually increases
>the number of expressions evaluated at runtime.  */
>if (!has_single_use (arg0)
> 
>   Marek
>

Re: [0/7] Type promotion pass and elimination of zext/sext

2015-12-09 Thread Kugan

Hi Richard,

Thanks for the reviews.

Since we still have some unresolved issues here, I think it is best to aim
for the next stage1. I would, however, appreciate any feedback so that I can
continue to improve this.

https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01063.html is also related
to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67714. I don't think
there is any agreement on this yet. Is there a better place to fix it?

Thanks,
Kugan

Re: [PATCH GCC][4/5]Improve loop distribution to handle hmmer

2017-06-07 Thread kugan

>vertices[j].post
+  /* We only need to remove edges connecting vertices in the same
+strong connected component to break it.  */
+  && component == cbdata->vertices_component[j]
+  /* Check if we want to break the strong connected component or not.  */
+  && !bitmap_bit_p (cbdata->sccs_to_merge, component))
+cbdata->alias_ddrs->safe_splice (edata->alias_ddrs);
+}
+
+/* This is the main function breaking strong connected components in
+   PARTITIONS giving reduced dependence graph RDG and data dependences
+   in DDR_TABLE.  Store data dependence relations for runtime alias
+   check in ALIAS_DDRS.  */
+
+static void
+break_alias_scc_partitions (struct graph *rdg,
+   hash_table *ddr_table,
+   vec *partitions,
+   vec *alias_ddrs)


I am not sure I understand this. In pg_add_dependence_edges, when you
record alias_ddrs for runtime checking you set this_dir to 1. That means
you have already broken the dependency right there. Were you planning to
keep this_dir = 2 and break the dependency here?



Thanks,
Kugan


+{
+  int i, j, num_sccs, num_sccs_no_alias;
+  /* Build partition dependence graph.  */
+  graph *pg = build_partition_graph (rdg, ddr_table, partitions, alias_ddrs);
+
+  alias_ddrs->truncate (0);
+  /* Find strong connected components in the graph, with all dependence edges
+ considered.  */
+  num_sccs = graphds_scc (pg, NULL);
+  /* All SCCs now can be broken by runtime alias checks because SCCs caused by
+ compilation time known dependences are merged before this function.  */
+  if ((unsigned) num_sccs < partitions->length ())
+{
+  struct pg_edge_callback_data cbdata;
+  auto_bitmap sccs_to_merge;
+  auto_vec scc_types;
+  struct partition *partition, *first;
+
+  /* If all partitions in an SCC have the same type, we can simply merge the
+SCC.  This loop finds such SCCs and records them in the bitmap.  */
+  bitmap_set_range (sccs_to_merge, 0, (unsigned) num_sccs);
+  for (i = 0; i < num_sccs; ++i)
+   {
+ for (j = 0; partitions->iterate (j, &first); ++j)
+   if (pg->vertices[j].component == i)
+ break;
+ for (++j; partitions->iterate (j, &partition); ++j)
+   {
+ if (pg->vertices[j].component != i)
+   continue;
+
+ if (first->type != partition->type)
+   {
+ bitmap_clear_bit (sccs_to_merge, i);
+ break;
+   }
+   }
+   }
+
+  /* Initialize callback data for traversing.  */
+  cbdata.sccs_to_merge = sccs_to_merge;
+  cbdata.alias_ddrs = alias_ddrs;
+  cbdata.vertices_component = XNEWVEC (int, pg->n_vertices);
+  /* Record the component information which will be corrupted by next
+graph scc finding call.  */
+  for (i = 0; i < pg->n_vertices; ++i)
+   cbdata.vertices_component[i] = pg->vertices[i].component;
+
+  /* Collect data dependences for runtime alias checks to break SCCs.  */
+  if (bitmap_count_bits (sccs_to_merge) != (unsigned) num_sccs)
+   {
+ /* Run SCC finding algorithm again, with alias dependence edges
+skipped.  This is to topologically sort partitions according to
+compilation time known dependence.  Note the topological order
+is stored in the form of pg's post order number.  */
+ num_sccs_no_alias = graphds_scc (pg, NULL, pg_skip_alias_edge);
+ gcc_assert (partitions->length () == (unsigned) num_sccs_no_alias);
+ /* With topological order, we can construct two subgraphs L and R.
+L contains edge <x, y> where x < y in terms of post order, while
+R contains edge <x, y> where x > y.  Edges for compilation time
+known dependence all fall in R, so we break SCCs by removing all
+(alias) edges in subgraph L.  */
+ for_each_edge (pg, pg_collect_alias_ddrs, &cbdata);
+   }
+
+  /* For SCC that doesn't need to be broken, merge it.  */
+  for (i = 0; i < num_sccs; ++i)
+   {
+ if (!bitmap_bit_p (sccs_to_merge, i))
+   continue;
+
+ for (j = 0; partitions->iterate (j, &first); ++j)
+   if (cbdata.vertices_component[j] == i)
+ break;
+ for (++j; partitions->iterate (j, &partition); ++j)
+   {
+ struct pg_vdata *data;
+
+ if (cbdata.vertices_component[j] != i)
+   continue;
+
+ partition_merge_into (first, partition, FUSE_SAME_SCC);
+ (*partitions)[j] = NULL;
+ partition_free (partition);
+ data = (struct pg_vdata *)pg->vertices[j].data;
+ gcc_assert (data->id == j);
+ data->partition = NULL;
+   }
+   }
+}
+
+

Re: [PATCH 2/2] Enable elimination of zext/sext

2014-08-06 Thread Kugan

On 06/08/14 23:29, Richard Biener wrote:
> On Wed, Aug 6, 2014 at 3:21 PM, Kugan  
> wrote:
>> On 06/08/14 22:09, Richard Biener wrote:
>>> On Tue, Aug 5, 2014 at 4:21 PM, Jakub Jelinek  wrote:
>>>> On Tue, Aug 05, 2014 at 04:17:41PM +0200, Richard Biener wrote:
>>>>> what's the semantic of setting SRP_SIGNED_AND_UNSIGNED
>>>>> on the subreg?  That is, for the created (subreg:lhs_mode
>>>>> (reg: N))?
>>>>
>>>> SRP_SIGNED_AND_UNSIGNED on a subreg should mean that
>>>> the subreg is both zero and sign extended, which means
>>>> that the topmost bit of the narrower mode is known to be zero,
>>>> and all bits above it in the wider mode are known to be zero too.
>>>> SRP_SIGNED means that the topmost bit of the narrower mode is
>>>> either 0 or 1 and depending on that the above wider mode bits
>>>> are either all 0 or all 1.
>>>> SRP_UNSIGNED means that regardless of the topmost bit value,
>>>> all above wider mode bits are 0.
>>>
>>> Ok, then from the context of the patch we already know that
>>> either SRP_UNSIGNED or SRP_SIGNED is true which means
>>> that the value is sign- or zero-extended.
>>>
>>> I suppose inside promoted_for_type_p
>>> TYPE_MODE (TREE_TYPE (ssa)) == lhs_mode, I'm not sure
>>> why you pass !unsignedp as lhs_uns.
>>
>> In expand_expr_real_1, it is already known that it is promoted for
>> unsigned_p and we are setting SUBREG_PROMOTED_SET (temp, unsignedp).
>>
>> If we can prove that it is also promoted for !unsignedp, we can set
>> SUBREG_PROMOTED_SET (temp, SRP_SIGNED_AND_UNSIGNED).
>>
>> promoted_for_type_p should prove this based on the value range info.
>>
>>>
>>> Now, from 'ssa' alone we can't tell anything about a larger mode
>>> registers value if that is either zero- or sign-extended.  But we
>>> know that those bits are properly zero-extended if unsignedp
>>> and properly sign-extended if !unsignedp?
>>>
>>> So what the predicate tries to prove is that sign- and zero-extending
>>> results in the same larger-mode value.  This is true if the
>>> MSB of the smaller mode is not set.
>>>
>>> Let's assume that smaller mode is that of 'ssa' then the test
>>> is just
>>>
>>>   return (!tree_int_cst_sign_bit (min) && !tree_int_cst_sign_bit (max));
>>>
>>> no?
>>
>> hmm,  is this because we will never have a call to promoted_for_type_p
>> with same sign (ignoring PROMOTE_MODE) for 'ssa' and the larger mode.
>> The case with larger mode signed and 'ssa' unsigned will not work.
>> Therefore larger mode unsigned and 'ssa' signed will be the only case
>> that we should consider.
>>
>> However, with PROMOTE_MODE, won't we miss some cases with this?
> 
> No, PROMOTE_MODE will still either sign- or zero-extend.  If either
> results in zeros in the upper bits then PROMOTE_MODE doesn't matter.
> 

Thanks for the explanation. Please find the attached patch that
implements this. I have updated the comments and predicate to match this.
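
The condition discussed above can be illustrated with a small standalone
sketch. This is not the code from the patch: it works on plain 32-bit and
64-bit integers instead of trees and machine modes, and all function names
are made up for the example. It only shows why "the sign bit of both range
bounds is clear" is enough for sign- and zero-extension to agree.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the predicate: given the value range
   [min, max] of a 32-bit value, return true if sign-extending and
   zero-extending it to 64 bits always give the same result.  That is
   the case exactly when the sign bit of both bounds is clear.  */
bool
range_is_signed_and_unsigned_promotable (int32_t min, int32_t max)
{
  return min >= 0 && max >= 0;
}

/* Zero-extension: the upper 32 bits are always 0.  */
uint64_t
zero_extend_32_to_64 (int32_t v)
{
  return (uint64_t) (uint32_t) v;
}

/* Sign-extension: the upper 32 bits are copies of bit 31.  */
uint64_t
sign_extend_32_to_64 (int32_t v)
{
  return (uint64_t) (int64_t) v;
}
```

For a range such as [8, 134] the predicate holds and both extensions agree
for every value in the range. For a range that includes -1 it does not hold,
and the two extensions of -1 differ in the upper 32 bits.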

Bootstrap tested on x86_64-unknown-linux-gnu and regression tested on
x86_64-unknown-linux-gnu and arm-none-linux-gnueabi with no new
regressions. Is this OK?

Thanks,
Kugan

gcc/
2014-08-07  Kugan Vivekanandarajah  

* calls.c (precompute_arguments): Check
 promoted_for_signed_and_unsigned_p and set the promoted mode.
(promoted_for_signed_and_unsigned_p): New function.
(expand_expr_real_1): Check promoted_for_signed_and_unsigned_p
and set the promoted mode.
* expr.h (promoted_for_signed_and_unsigned_p): New function definition.
* cfgexpand.c (expand_gimple_stmt_1): Call emit_move_insn if
SUBREG is promoted with SRP_SIGNED_AND_UNSIGNED.


gcc/testsuite
2014-08-07  Kugan Vivekanandarajah  

* gcc.dg/zero_sign_ext_test.c: New test.


diff --git a/gcc/calls.c b/gcc/calls.c
index 00c5028..4285ec1 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -1484,7 +1484,10 @@ precompute_arguments (int num_actuals, struct arg_data 
*args)
  args[i].initial_value
= gen_lowpart_SUBREG (mode, args[i].value);
  SUBREG_PROMOTED_VAR_P (args[i].initial_value) = 1;
- SUBREG_PROMOTED_SET (args[i].initial_value, args[i].unsignedp);
+ if (promoted_for_signed_and_unsigned_p (args[i].tree_value, mode))
+   SUBREG_PROMOTED_SET (args[i].initial_value, 
SRP_SIGNED_AND_UNSIGNED);
+ else
+   SUBREG_PROMOTED_SET (

PR tree-optimization/52904 testcase

2014-08-09 Thread Kugan

Hi,

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52904

The testcase was generating the warning "assuming signed overflow does not
occur when simplifying conditional to constant [-Wstrict-overflow]" because
VRP was missing the value range.

This seems to have been fixed and the PR is now closed. However, as
requested there in the PR, I am sending this patch to add the test-case
to test-suite.


Is this OK ?

Thanks,
Kugan

gcc/testsuite


2014-08-09  Kugan Vivekanandarajah  

PR tree-optimization/52904
* gcc.dg/PR52904.c: New test.
diff --git a/gcc/testsuite/gcc.dg/PR52904.c b/gcc/testsuite/gcc.dg/PR52904.c
index e69de29..e490d23 100644
--- a/gcc/testsuite/gcc.dg/PR52904.c
+++ b/gcc/testsuite/gcc.dg/PR52904.c
@@ -0,0 +1,26 @@
+
+/* { dg-do compile } */
+/* { dg-options "-S -Wstrict-overflow -O2 -fdump-tree-vrp2" } */
+
+extern int foo (int);
+
+
+int
+wait_reading_process_output (void)
+{
+  int nfds = 0;
+  int channel;
+  for (channel = 0; channel < 1024; ++channel)
+{
+  if (foo (channel))
+   nfds++;
+}
+  if (nfds < 0)
+return 1;
+  return 0;
+}
+
+/* { dg-bogus "assuming signed overflow does not occur when simplifying\
+   conditional to constant" */
+/* { dg-final { scan-tree-dump "\\\[0, 1023\\\]" "vrp2" } } */
+/* { dg-final { cleanup-tree-dump "vrp2" } } */

Re: PR tree-optimization/52904 testcase

2014-08-11 Thread Kugan


On 11/08/14 18:03, Richard Biener wrote:
> On Sat, Aug 9, 2014 at 2:33 PM, Kugan  
> wrote:
>> Hi,
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52904
>>
>> The testcase was generating the warning "assuming signed overflow does not
>> occur when simplifying conditional to constant [-Wstrict-overflow]" because
>> VRP was missing the value range.
>>
>> This seems to have been fixed and the PR is now closed. However, as
>> requested there in the PR, I am sending this patch to add the test-case
>> to test-suite.
>>
>>
>> Is this OK ?
> 
> Did you verify the testcase fails before the revision that fixed it?
> Esp. the placement of the dg-bogus looks bogus to me.

I tried it on Linaro 4.9 (it should be the same on the FSF GCC 4.9 branch)
and the test case fails there. It passes on trunk.

In any case, I have moved it to the top and reverified. I have also
trimmed the warning pattern to check, as there were some changes there
from 4.9 to trunk.

> 
> Also don't use -S in dg-options, use lower-case filenames and
> avoid spurious vertical white-space.  The VRP dump scan is
> also very unspecific - I suggest to drop it entirely.
> 

Done.


Is this OK?


Thanks,
Kugan

gcc/testsuite
2014-08-12  Kugan Vivekanandarajah  

PR tree-optimization/52904
    * gcc.dg/pr52904.c: New test.



> Thanks,
> Richard.
> 
>> Thanks,
>> Kugan
>>
>> gcc/testsuite
>>
>>
>> 2014-08-09  Kugan Vivekanandarajah  
>>
>> PR tree-optimization/52904
>> * gcc.dg/PR52904.c: New test.
diff --git a/gcc/testsuite/gcc.dg/pr52904.c b/gcc/testsuite/gcc.dg/pr52904.c
index e69de29..7c04187 100644
--- a/gcc/testsuite/gcc.dg/pr52904.c
+++ b/gcc/testsuite/gcc.dg/pr52904.c
@@ -0,0 +1,24 @@
+
+/* { dg-do compile } */
+/* { dg-options "-Wstrict-overflow -O2" } */
+/* { dg-bogus "assuming signed overflow does not occur when simplifying" */
+
+extern int foo (int);
+
+int
+wait_reading_process_output (void)
+{
+  int nfds = 0;
+  int channel;
+  for (channel = 0; channel < 1024; ++channel)
+{
+  if (foo (channel))
+   nfds++;
+}
+
+  if (nfds < 0)
+return 1;
+
+  return 0;
+}
+

Re: PR tree-optimization/52904 testcase

2014-08-12 Thread Kugan

>>> Did you verify the testcase fails before the revision that fixed it?
>>> Esp. the placement of the dg-bogus looks bogus to me.
>>
>> I tried it on Linaro 4.9 (It should be the same in fsf gcc 4.9 branch)
>> and the test cases is failing there. Passes on trunk.
> 
> Well, it probably fails because of excess errors, not because of
> the dg-bogus failing.  The dg-bogus has to be on the line that
> the warning triggers on.

It was indeed failing because of excess errors, and I wrongly assumed that
this was the failure I should expect. I have now moved the dg-bogus to the
line where the warning is generated and verified that the failure I get now
comes from the test for bogus messages.

> 
>> In any case, I have moved it to the top and reverified. I have also
>> trimmed the warning pattern to check as there was some changes there
>> from 4.9 to trunk.
>>
>>>
>>> Also don't use -S in dg-options, use lower-case filenames and
>>> avoid spurious vertical white-space.  The VRP dump scan is
>>> also very unspecific - I suggest to drop it entirely.
>>>
>>
>> Done.
>>
>>
>> Is this OK?
> 
> Err.
> 
> @@ -0,0 +1,24 @@
> +
> 
> Excessive vertical space
> 
> +/* { dg-do compile } */
> +/* { dg-options "-Wstrict-overflow -O2" } */
> +/* { dg-bogus "assuming signed overflow does not occur when simplifying" */
> +
I have fixed it.

Is this OK?

Thanks,
Kugan
diff --git a/gcc/testsuite/gcc.dg/pr52904.c b/gcc/testsuite/gcc.dg/pr52904.c
index e69de29..107d89e 100644
--- a/gcc/testsuite/gcc.dg/pr52904.c
+++ b/gcc/testsuite/gcc.dg/pr52904.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-Wstrict-overflow -O2" } */
+extern int foo (int);
+
+int
+wait_reading_process_output (void)
+{
+  int nfds = 0;
+  int channel;
+
+  for (channel = 0; channel < 1024; ++channel)
+{
+  if (foo (channel))
+   nfds++;
+}
+
+  if (nfds < 0) /* { dg-bogus "assuming signed overflow does not occur" } */
+return 1;
+  return 0;
+}

Re: [PATCH AArch64 3/3] Fix XOR_one_cmpl pattern; add SIMD-reg variants for BIC,ORN,EON

2014-08-12 Thread Kugan


On 13/08/14 00:55, Alan Lawrence wrote:
> ...patch attached...
> 
> Alan Lawrence wrote:
>> [When I wrote that xor was broken on GPRs and this fixes it, I meant
>> xor_one_cmpl rather than xor, sorry!]
>>
>> The pattern for xor_one_cmpl never matched, due to the action of
>> combine_simplify_rtx; hence, separate this pattern out from that for
>> ORN/BIC.
>>
>> ORN/BIC have equivalent SIMD-reg variants, so add those for the
>> benefit of values in vector registers (e.g. passed as [u]int64x1_t
>> parameters).
>>
>> EON does not have a SIMD-reg variant; however, it seems better to
>> split it (to XOR + NOT) than to move both arguments to GPRs, perform
>> EON, and move the result back.
>>


+;; (xor (not a) b) is simplify_rtx-ed down to (not (xor a b)).
+;; eon does not operate on SIMD registers so the vector variant must be split.
+(define_insn_and_split "*xor_one_cmpl3"
+  [(set (match_operand:GPI 0 "register_operand" "=r,w")
+(not:GPI (xor:GPI (match_operand:GPI 1 "register_operand" "r,?w")

Hi Alan,

Is there any specific reason why you are slightly disparaging this
alternative with '?'? Your earlier patch removes '!' from subdi3.

Thanks,
Kugan


+  (match_operand:GPI 2 "register_operand"
"r,w"]
+  ""
+  "eon\\t%0, %1, %2" ;; For GPR registers (only).
+  "reload_completed && (which_alternative == 1)" ;; For SIMD registers.
+  [(set (match_operand:GPI 0 "register_operand" "=w")
+(xor:GPI (match_operand:GPI 1 "register_operand" "w")
+ (match_operand:GPI 2 "register_operand" "w")))
+   (set (match_dup 0) (not:GPI (match_dup 0)))]

Re: [PATCH 2/2] Enable elimination of zext/sext

2014-08-27 Thread Kugan

On 27/08/14 20:01, Uros Bizjak wrote:
> Hello!
> 
>> 2014-08-07  Kugan Vivekanandarajah  
>>
>> * calls.c (precompute_arguments): Check
>> promoted_for_signed_and_unsigned_p and set the promoted mode.
>> (promoted_for_signed_and_unsigned_p): New function.
>> (expand_expr_real_1): Check promoted_for_signed_and_unsigned_p
>> and set the promoted mode.
>> * expr.h (promoted_for_signed_and_unsigned_p): New function definition.
>> * cfgexpand.c (expand_gimple_stmt_1): Call emit_move_insn if
>> SUBREG is promoted with SRP_SIGNED_AND_UNSIGNED.
> 
> This patch regresses:
> 
> Running target unix
> FAIL: libgomp.fortran/simd7.f90   -O2  execution test
> FAIL: libgomp.fortran/simd7.f90   -Os  execution test
> 

[snip]

> When compiling this code, we have:
> 
> lhs = _63
> target = (subreg/s/v/u:SI (reg:DI 145 [ D.1694 ]) 0)
> temp = (subreg:SI (reg:DI 540) 0)
> 
> So, the code assumes that it is possible to copy (reg:DI 540) directly
> to (reg:DI 145). However, this is not the case, since we still have
> garbage in the top 32bits.
> 
> Reverting the part above fixes the runtime failure, since (insn 599) is now:
> 
> (insn 599 598 0 (set (reg:DI 145 [ D.1694 ])
> (zero_extend:DI (subreg:SI (reg:DI 540) 0))) -1
>  (nil))
> 
> It looks to me that we have also to check the temp with SUBREG_PROMOTED_*.

Sorry for the breakage. I am looking into this now and I can reproduce
it on qemu-alpha.
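
The failure Uros describes can be modelled outside the compiler. The sketch
below is only an illustration under stated assumptions: it assumes a target
that keeps 32-bit results sign-extended in 64-bit registers, and the helper
names are invented for this example. It computes (iv + 2147483645) * 2 with
the doubling done as a full 64-bit operation, once without and once with an
extension of the low 32 bits afterwards.

```c
#include <stdint.h>

/* Sign-extend the low 32 bits of X to 64 bits.  */
static uint64_t
sign_extend32 (uint64_t x)
{
  uint64_t low = x & UINT64_C (0xffffffff);
  return (low ^ UINT64_C (0x80000000)) - UINT64_C (0x80000000);
}

/* Model of a 32-bit add whose result is kept sign-extended in a
   64-bit register (assumed target behaviour).  */
static uint64_t
model_add32 (uint64_t a, uint64_t b)
{
  return sign_extend32 (a + b);
}

/* Model of a full 64-bit add.  */
static uint64_t
model_add64 (uint64_t a, uint64_t b)
{
  return a + b;
}

/* (iv + 2147483645) * 2 with the doubling done in 64 bits and no
   extension afterwards, as in the failing code.  */
uint64_t
doubled_without_extension (uint32_t iv)
{
  uint64_t t = model_add32 (iv, UINT64_C (2147483645));
  return model_add64 (t, t);
}

/* The same computation followed by a zero-extension of the low
   32 bits, corresponding to the zero_extend:DI form of insn 599.  */
uint64_t
doubled_with_extension (uint32_t iv)
{
  return doubled_without_extension (iv) & UINT64_C (0xffffffff);
}
```

With iv = 7 the low 32 bits of both results are 8, but without the extension
the upper 32 bits are all ones, so a 64-bit comparison against a correctly
extended 8 fails.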

I have noticed the following VRP data which is used in deciding this
erroneous removal. It seems suspicious to me.

_343: [2147483652, 2147483715]
_344: [8, 134]
_345: [8, 134]

_343 = ivtmp.179_52 + 2147483645;
_344 = _343 * 2;
_345 = (integer(kind=4)) _344;

The error comes from the third statement.

Thanks,
Kugan

Re: [PATCH 2/2] Enable elimination of zext/sext

2014-08-27 Thread Kugan


On 27/08/14 23:02, Kugan wrote:
> On 27/08/14 20:01, Uros Bizjak wrote:
>> Hello!
>>
>>> 2014-08-07  Kugan Vivekanandarajah  
>>>
>>> * calls.c (precompute_arguments): Check
>>> promoted_for_signed_and_unsigned_p and set the promoted mode.
>>> (promoted_for_signed_and_unsigned_p): New function.
>>> (expand_expr_real_1): Check promoted_for_signed_and_unsigned_p
>>> and set the promoted mode.
>>> * expr.h (promoted_for_signed_and_unsigned_p): New function definition.
>>> * cfgexpand.c (expand_gimple_stmt_1): Call emit_move_insn if
>>> SUBREG is promoted with SRP_SIGNED_AND_UNSIGNED.
>>
>> This patch regresses:
>>
>> Running target unix
>> FAIL: libgomp.fortran/simd7.f90   -O2  execution test
>> FAIL: libgomp.fortran/simd7.f90   -Os  execution test
>>
> 
> [snip]
> 
>> When compiling this code, we have:
>>
>> lhs = _63
>> target = (subreg/s/v/u:SI (reg:DI 145 [ D.1694 ]) 0)
>> temp = (subreg:SI (reg:DI 540) 0)
>>
>> So, the code assumes that it is possible to copy (reg:DI 540) directly
>> to (reg:DI 145). However, this is not the case, since we still have
>> garbage in the top 32bits.
>>
>> Reverting the part above fixes the runtime failure, since (insn 599) is now:
>>
>> (insn 599 598 0 (set (reg:DI 145 [ D.1694 ])
>> (zero_extend:DI (subreg:SI (reg:DI 540) 0))) -1
>>  (nil))
>>
>> It looks to me that we have also to check the temp with SUBREG_PROMOTED_*.
> 
> Sorry for the breakage. I am looking into this now and I can reproduce
> it on qemu-alpha.
> 
> I have noticed the following VRP data which is used in deciding this
> erroneous removal. It seems suspicious to me.
> 
> _343: [2147483652, 2147483715]
> _344: [8, 134]
> _345: [8, 134]
> 
> _343 = ivtmp.179_52 + 2147483645;
> _344 = _343 * 2;
> _345 = (integer(kind=4)) _344;
> 
> Error comes from the third statement.

In tree-vrp.c, in extract_range_from_binary_expr_1, there is a loss of
precision and the value_range is truncated. For the test-case provided
by Uros, it is

_344 = _343 * 2;
[...,0x100000008], precision = 384
[...,0x100000086], precision = 384

and it is converted to following when it goes from wide_int to tree.
[8, 134]

How about doing something like this to fix it?

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index d16fd8a..c0fb902 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -2625,6 +2625,8 @@ extract_range_from_binary_expr_1 (value_range_t *vr,
   > vrp_int_cst;
  vrp_int sizem1 = wi::mask  (prec, false);
  vrp_int size = sizem1 + 1;
+ vrp_int type_min = vrp_int_cst (TYPE_MIN_VALUE (expr_type));
+ vrp_int type_max = vrp_int_cst (TYPE_MAX_VALUE (expr_type));

  /* Extend the values using the sign of the result to PREC2.
 From here on out, everthing is just signed math no matter
@@ -2688,7 +2690,9 @@ extract_range_from_binary_expr_1 (value_range_t *vr,

  /* diff = max - min.  */
  prod2 = prod3 - prod0;
- if (wi::geu_p (prod2, sizem1))
+ if (wi::geu_p (prod2, sizem1)
+ || wi::lts_p (prod0, type_min)
+ || wi::gts_p (prod3, type_max))
{
  /* the range covers all values.  */
  set_value_range_to_varying (vr);


If this looks reasonable I will do proper testing and post the results
with the Changelog.

Thanks,
Kugan

Re: [PATCH 2/2] Enable elimination of zext/sext

2014-08-28 Thread Kugan



On 28/08/14 16:44, Marc Glisse wrote:
> On Thu, 28 Aug 2014, Kugan wrote:
> 
>> On 27/08/14 23:02, Kugan wrote:
>>> On 27/08/14 20:01, Uros Bizjak wrote:
>>>> Hello!
>>>>
>>>>> 2014-08-07  Kugan Vivekanandarajah  
>>>>>
>>>>> * calls.c (precompute_arguments): Check
>>>>> promoted_for_signed_and_unsigned_p and set the promoted mode.
>>>>> (promoted_for_signed_and_unsigned_p): New function.
>>>>> (expand_expr_real_1): Check promoted_for_signed_and_unsigned_p
>>>>> and set the promoted mode.
>>>>> * expr.h (promoted_for_signed_and_unsigned_p): New function
>>>>> definition.
>>>>> * cfgexpand.c (expand_gimple_stmt_1): Call emit_move_insn if
>>>>> SUBREG is promoted with SRP_SIGNED_AND_UNSIGNED.
>>>>
>>>> This patch regresses:
>>>>
>>>> Running target unix
>>>> FAIL: libgomp.fortran/simd7.f90   -O2  execution test
>>>> FAIL: libgomp.fortran/simd7.f90   -Os  execution test
>>>>
>>>
>>> [snip]
>>>
>>>> When compiling this code, we have:
>>>>
>>>> lhs = _63
>>>> target = (subreg/s/v/u:SI (reg:DI 145 [ D.1694 ]) 0)
>>>> temp = (subreg:SI (reg:DI 540) 0)
>>>>
>>>> So, the code assumes that it is possible to copy (reg:DI 540) directly
>>>> to (reg:DI 145). However, this is not the case, since we still have
>>>> garbage in the top 32bits.
>>>>
>>>> Reverting the part above fixes the runtime failure, since (insn 599)
>>>> is now:
>>>>
>>>> (insn 599 598 0 (set (reg:DI 145 [ D.1694 ])
>>>> (zero_extend:DI (subreg:SI (reg:DI 540) 0))) -1
>>>>  (nil))
>>>>
>>>> It looks to me that we have also to check the temp with
>>>> SUBREG_PROMOTED_*.
>>>
>>> Sorry for the breakage. I am looking into this now and I can reproduce
>>> it on qemu-alpha.
>>>
>>> I have noticed the following VRP data which is used in deciding this
>>> erroneous removal. It seems suspicious to me.
>>>
>>> _343: [2147483652, 2147483715]
>>> _344: [8, 134]
>>> _345: [8, 134]
>>>
>>> _343 = ivtmp.179_52 + 2147483645;
>>> _344 = _343 * 2;
>>> _345 = (integer(kind=4)) _344;
>>>
>>> Error comes from the third statement.
>>
>> In tree-vrp.c, in extract_range_from_binary_expr_1, there is a loss of
>> precision and the value_range is truncated. For the test-case provided
>> by Uros, it is
>>
>> _344 = _343 * 2;
>> [...,0x100000008], precision = 384
>> [...,0x100000086], precision = 384
>>
>> and it is converted to following when it goes from wide_int to tree.
>> [8, 134]
> 
> Why do you believe that is wrong? Assuming _344 has a 32 bit type with
> wrapping overflow, this is just doing the wrapping modulo 2^32.
> 

Indeed. I missed the TYPE_OVERFLOW_WRAPS check earlier. Thanks for
pointing me to that.
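
For reference, the wrapping Marc describes can be checked with plain
unsigned arithmetic. This is just a worked example, not VRP code, and the
function name is made up: doubling the bounds of [2147483652, 2147483715]
in a 32-bit wrapping type gives [8, 134].

```c
#include <stdint.h>

/* Double X in a 32-bit unsigned type, i.e. modulo 2^32.  */
uint32_t
wrapping_double (uint32_t x)
{
  return (uint32_t) (x * 2u);
}
```

2147483652 * 2 = 4294967304 = 2^32 + 8 and 2147483715 * 2 = 4294967430 =
2^32 + 134. Every value in the input range has bit 31 set, so each product
wraps exactly once and the resulting interval stays contiguous.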

Kugan

Re: [PATCH 2/2] Enable elimination of zext/sext

2014-08-28 Thread Kugan



On 27/08/14 20:07, Richard Biener wrote:
> On Wed, Aug 27, 2014 at 12:01 PM, Uros Bizjak  wrote:
>> Hello!
>>
>>> 2014-08-07  Kugan Vivekanandarajah  
>>>
>>> * calls.c (precompute_arguments): Check
>>> promoted_for_signed_and_unsigned_p and set the promoted mode.
>>> (promoted_for_signed_and_unsigned_p): New function.
>>> (expand_expr_real_1): Check promoted_for_signed_and_unsigned_p
>>> and set the promoted mode.
>>> * expr.h (promoted_for_signed_and_unsigned_p): New function definition.
>>> * cfgexpand.c (expand_gimple_stmt_1): Call emit_move_insn if
>>> SUBREG is promoted with SRP_SIGNED_AND_UNSIGNED.
>>
>> This patch regresses:
>>
>> Running target unix
>> FAIL: libgomp.fortran/simd7.f90   -O2  execution test
>> FAIL: libgomp.fortran/simd7.f90   -Os  execution test
>>
>> on alphaev6-linux-gnu.
>>
>> The problem can be illustrated with attached testcase with a
>> crosscompiler to alphaev68-linux-gnu (-O2 -fopenmp). The problem is in
>> missing SImode extension after DImode shift of SImode subregs for this
>> part:
>>
>> --cut here--
>>   # test.23_12 = PHI <0(37), 1(36)>
>>   _242 = ivtmp.181_73 + 2147483645;
>>   _240 = _242 * 2;
>>   _63 = (integer(kind=4)) _240;
>>   if (ubound.6_99 <= 2)
>> goto ;
>>   else
>> goto ;
>> ;;succ:   39
>> ;;40
>>
>> ;;   basic block 39, loop depth 1
>> ;;pred:   38
>>   pretmp_337 = test.23_12 | l_76;
>>   goto ;
>> ;;succ:   45
>>
>> ;;   basic block 40, loop depth 1
>> ;;pred:   38
>>   _11 = *c_208[0];
>>   if (_11 != _63)
>> goto ;
>>   else
>> goto ;
>> --cut here--
>>
>> this expands to:
>>
>> (code_label 592 591 593 35 "" [0 uses])
>>
>> (note 593 592 0 NOTE_INSN_BASIC_BLOCK)
>>
>> ;; _63 = (integer(kind=4)) _240;
>>
>> (insn 594 593 595 (set (reg:SI 538)
>> (const_int 1073741824 [0x40000000])) -1
>>  (nil))
>>
>> (insn 595 594 596 (set (reg:SI 539)
>> (plus:SI (reg:SI 538)
>> (const_int 1073741824 [0x40000000]))) -1
>>  (nil))
>>
>> (insn 596 595 597 (set (reg:SI 537)
>> (plus:SI (reg:SI 539)
>> (const_int -3 [0xfffffffffffffffd]))) -1
>>  (expr_list:REG_EQUAL (const_int 2147483645 [0x7ffffffd])
>> (nil)))
>>
>> (insn 597 596 598 (set (reg:SI 536 [ D.1700 ])
>> (plus:SI (subreg/s/v/u:SI (reg:DI 144 [ ivtmp.181 ]) 0)
>> (reg:SI 537))) -1
>>  (nil))
>>
>> (insn 598 597 599 (set (reg:DI 540)
>> (ashift:DI (subreg:DI (reg:SI 536 [ D.1700 ]) 0)
>> (const_int 1 [0x1]))) -1
>>  (nil))
>>
>> (insn 599 598 0 (set (reg:DI 145 [ D.1694 ])
>> (reg:DI 540)) -1
>>  (nil))
>>
>> ...
>>
>> (note 610 609 0 NOTE_INSN_BASIC_BLOCK)
>>
>> ;; _11 = *c_208[0];
>>
>> (insn 611 610 0 (set (reg:DI 120 [ D.1694 ])
>> (sign_extend:DI (mem:SI (reg/v/f:DI 227 [ c ]) [7 *c_208+0 S4
>> A128]))) simd7.f90:12 -1
>>  (nil))
>>
>> ;; if (_11 != _63)
>>
>> (insn 612 611 613 40 (set (reg:DI 545)
>> (eq:DI (reg:DI 120 [ D.1694 ])
>> (reg:DI 145 [ D.1694 ]))) simd7.f90:12 -1
>>  (nil))
>>
>> (jump_insn 613 612 616 40 (set (pc)
>> (if_then_else (eq (reg:DI 545)
>> (const_int 0 [0]))
>> (label_ref 0)
>> (pc))) simd7.f90:12 -1
>>  (int_list:REG_BR_PROB 450 (nil)))
>>
>> which results in following asm:
>>
>> $L35:
>> addl $25,$7,$2 # 597addsi3/1[length = 4]
>> addq $2,$2,$2 # 598ashldi3/1[length = 4] <-- here
>> bne $24,$L145 # 601*bcc_normal[length = 4]
>> lda $4,4($20) # 627*adddi_internal/2[length = 4]
>> ldl $8,0($20) # 611*extendsidi2_1/2[length = 4]
>> lda $3,3($31) # 74*movdi/2[length = 4]
>> cmpeq $8,$2,$2 # 612*setcc_internal[length = 4]  <-- compare
>> bne $2,$L40 # 613*bcc_normal[length = 4]
>> br $31,$L88 # 2403jump[length = 4]
>> .align 4
>> ...
>>
>> Tracking the values with the debugger shows wrong calculation:
>>
>>0x00012000108c <+1788>:  addlt10,t12,t1
>>0x000120001090 <+17

Re: [PATCH 2/2] Enable elimination of zext/sext

2014-09-03 Thread Kugan

>> I added this part of the code (in cfgexpand.c) to handle binary/unary/..
>> gimple operations and used the LHS value range to infer the assigned
>> value range. I will revert this part of the code as this is wrong.
>>
>> I don't think checking promoted_mode for temp will be necessary here as
>> convert_move will handle it correctly if promoted_mode is set for temp.
>>
>> Thus, I will reimplement setting promoted_mode to temp (in
>> expand_expr_real_2) based on the gimple statement content on RHS. i.e.
>> by looking at the RHS operands and its value ranges and by calculating
>> the resulting value range. Does this sound OK to you?
> 
> No, this sounds backward again and won't work because those operands
> again could be just truncated - thus you can't rely on their value-range.
> 
> What you would need is VRP computing value-ranges in the promoted
> mode from the start (and it doesn't do that).


Hi Richard,

Here is an attempt to do the value range computation in promoted_mode's
type when it is overflowing. Bootstrapped on x86_64.

Based on your feedback, I will do more testing on this.

Thanks for your time,
Kugan

gcc/ChangeLog:

2014-09-04  Kugan Vivekanandarajah 

* tree-ssa-ccp.c (ccp_finalize): Adjust the nonzero_bits precision to
the type.
(evaluate_stmt): Likewise.
* tree-ssanames.c (set_range_info): Adjust if the precision of stored
value range is different.
* tree-vrp.c (normalize_int_cst_precision): New function.
(set_value_range): Add assert to check precision.
(set_and_canonicalize_value_range): Call normalize_int_cst_precision
on min and max.
(promoted_type): New function.
(promote_unary_vr): Likewise.
(promote_binary_vr): Likewise.
(extract_range_from_binary_expr_1): Adjust type to match value range.
Store value ranges in promoted type if they overflow.
(extract_range_from_unary_expr_1): Likewise.
(adjust_range_with_scev): Call normalize_int_cst_precision
on min and max.
(vrp_visit_assignment_or_call): Likewise.
(simplify_bit_ops_using_ranges): Adjust the value range precision.
(test_for_singularity): Likewise.
(simplify_stmt_for_jump_threading): Likewise.
(extract_range_from_assert): Likewise.
diff --git a/gcc/tree-ssa-ccp.c b/gcc/tree-ssa-ccp.c
index a90f708..1733073 100644
--- a/gcc/tree-ssa-ccp.c
+++ b/gcc/tree-ssa-ccp.c
@@ -916,7 +916,11 @@ ccp_finalize (void)
  unsigned int precision = TYPE_PRECISION (TREE_TYPE (val->value));
  wide_int nonzero_bits = wide_int::from (val->mask, precision,
  UNSIGNED) | val->value;
- nonzero_bits &= get_nonzero_bits (name);
+ wide_int nonzero_bits_name = get_nonzero_bits (name);
+ if (precision != nonzero_bits_name.get_precision ())
+   nonzero_bits = wi::shwi (*nonzero_bits.get_val (),
+nonzero_bits_name.get_precision ());
+ nonzero_bits &= nonzero_bits_name;
  set_nonzero_bits (name, nonzero_bits);
}
 }
@@ -1852,6 +1856,8 @@ evaluate_stmt (gimple stmt)
 {
   tree lhs = gimple_get_lhs (stmt);
   wide_int nonzero_bits = get_nonzero_bits (lhs);
+  if (TYPE_PRECISION (TREE_TYPE (lhs)) != nonzero_bits.get_precision ())
+ nonzero_bits = wide_int_to_tree (TREE_TYPE (lhs), nonzero_bits);
   if (nonzero_bits != -1)
{
  if (!is_constant)
diff --git a/gcc/tree-ssanames.c b/gcc/tree-ssanames.c
index 3af80a0..459c669 100644
--- a/gcc/tree-ssanames.c
+++ b/gcc/tree-ssanames.c
@@ -192,7 +192,7 @@ set_range_info (tree name, enum value_range_type range_type,
   gcc_assert (!POINTER_TYPE_P (TREE_TYPE (name)));
   gcc_assert (range_type == VR_RANGE || range_type == VR_ANTI_RANGE);
   range_info_def *ri = SSA_NAME_RANGE_INFO (name);
-  unsigned int precision = TYPE_PRECISION (TREE_TYPE (name));
+  unsigned int precision = min.get_precision ();
 
   /* Allocate if not available.  */
   if (ri == NULL)
@@ -204,6 +204,15 @@ set_range_info (tree name, enum value_range_type 
range_type,
   SSA_NAME_RANGE_INFO (name) = ri;
   ri->set_nonzero_bits (wi::shwi (-1, precision));
 }
+  else if (ri->get_min ().get_precision () != precision)
+{
+  size_t size = (sizeof (range_info_def)
++ trailing_wide_ints <3>::extra_size (precision));
+  ri = static_cast<range_info_def *> (ggc_realloc (ri, size));
+  ri->ints.set_precision (precision);
+  SSA_NAME_RANGE_INFO (name) = ri;
+  ri->set_nonzero_bits (wi::shwi (-1, precision));
+}
 
   /* Record the range type.  */
   if (SSA_NAME_RANGE_TYPE (name) != range_type)
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index d16fd8a..772676a 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -61

Re: [PATCH 2/2] Enable elimination of zext/sext

2014-09-04 Thread Kugan

>> Here is an attempt to do the value range computation in promoted_mode's
>> type when it is overflowing. Bootstrapped on x86-64.
> 
> Err - I think you misunderstood this as a suggestion to do this ;)
> value-ranges should be computed according to the type not according
> to the (promoted) mode.  Otherwise we will miss optimization
> opportunities.

Oops, sorry, I had my doubts about making trees aware of back-end stuff.

Coming back to the original problem, what would be the best approach to
handle this? Looking at the VRP pass, it seems to me that only MULT_EXPR
and LSHIFT_EXPR truncate values this way. All other operations set the
range to type_min, type_max. Can we rely on this?

Is it just a coincidence that this error does not show up when
PROMOTED_MODE <= word_mode (and the mode precision of the register from
which we take the SUBREG is <= word_mode precision)? Can we rely on this?

Is there any way we can fix this?

Thanks again,
Kugan

pr43550 - remove unnecessary uxts in bswap

2014-09-04 Thread Kugan

Hi All,

For the bswap built-in, unnecessary uxts are generated, as reported
in pr43550. Can we rely on the argument being unsigned and set the
SUBREG promoted flags accordingly?

At least in the ARM ABI, arguments are supposed to be properly zero/sign
extended.

Any thoughts?

Bootstrapped and regression tested on x86_64 and arm.

Thanks,
Kugan

gcc/testsuite
2014-09-05  Kugan Vivekanandarajah  

PR target/43550
* gcc.target/arm/pr43550.c: New test.

gcc/
2014-09-05  Kugan Vivekanandarajah  

PR target/43550
* builtins.c (expand_builtin_bswap): Generate promoted subreg.


diff --git a/gcc/builtins.c b/gcc/builtins.c
index e5a9b4d..a2f2358 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -4517,6 +4517,12 @@ expand_builtin_bswap (enum machine_mode target_mode, 
tree exp, rtx target,
   if (!validate_arglist (exp, INTEGER_TYPE, VOID_TYPE))
 return NULL_RTX;
 
+  if (!target
+  && (GET_MODE_PRECISION (word_mode)
+ > TYPE_PRECISION (TREE_TYPE (exp))))
+target = gen_lowpart_SUBREG (TYPE_MODE (TREE_TYPE (exp)),
+  gen_reg_rtx (word_mode));
+
   arg = CALL_EXPR_ARG (exp, 0);
   op0 = expand_expr (arg,
 subtarget && GET_MODE (subtarget) == target_mode
@@ -4528,8 +4534,13 @@ expand_builtin_bswap (enum machine_mode target_mode, 
tree exp, rtx target,
   target = expand_unop (target_mode, bswap_optab, op0, target, 1);
 
   gcc_assert (target);
-
-  return convert_to_mode (target_mode, target, 1);
+  target = convert_to_mode (target_mode, target, 1);
+  if (GET_CODE (target) == SUBREG)
+{
+  SUBREG_PROMOTED_VAR_P (target) = 1;
+  SUBREG_PROMOTED_SET (target, SRP_UNSIGNED);
+}
+  return target;
 }
 
 /* Expand a call to a unary builtin in EXP.
diff --git a/gcc/testsuite/gcc.target/arm/pr43550.c 
b/gcc/testsuite/gcc.target/arm/pr43550.c
index e69de29..7e4b2e0 100644
--- a/gcc/testsuite/gcc.target/arm/pr43550.c
+++ b/gcc/testsuite/gcc.target/arm/pr43550.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-times "uxt" 0 } } */
+
+typedef unsigned short uint16_t;
+typedef unsigned int uint32_t;
+
+uint16_t s16 (uint16_t v)
+{
+  return v >> 8 | v << 8;
+}
+
+uint16_t _s16 (uint16_t v)
+{
+  return __builtin_bswap16 (v);
+}
+
+uint32_t s32 (uint32_t v)
+{
+  return __builtin_bswap32 (v);
+}

Re: [PATCH 2/2] Enable elimination of zext/sext

2014-09-07 Thread Kugan

On 05/09/14 19:50, Richard Biener wrote:

> Well - the best way would be to expose the target specifics to GIMPLE
> at some point in the optimization pipeline.  My guess would be that it's
> appropriate after loop optimizations (but maybe before induction variable
> optimization).
> 
> That is, have a pass that applies register promotion to all SSA names
> in the function, inserting appropriate truncations and extensions.  That
> way you'd never see (set (subreg...) on RTL.  The VRP and DOM
> passes running after that pass would then be able to aggressively
> optimize redundant truncations and extensions.
> 
> Effects on debug information are to be considered.  You can change
> the type of SSA names in-place but you don't want to do that for
> user DECLs (and we can't have the SSA name type and its DECL
> type differ - and not sure if we might want to lift that restriction).

Thanks. I will try to implement this.

I still would like to keep the VRP based approach as there are some
cases that I think can only be done with range info. For example:

short foo(unsigned char c)
{
  c = c & (unsigned char)0x0F;
  if( c > 7 )
    return((short)(c - 5));
  else
    return(( short )c);
}

So, how about adding an overflow/wrap-around flag to range_info and
setting it? We now set static_flag for VR_RANGE/VR_ANTI_RANGE. If we go
back to max + 1, min - 1 for VR_ANTI_RANGE, we can use this
static_flag to encode overflow/wrap-around. Would that be
acceptable?

Thanks again,
Kugan

[RFC][LIBGCC][0 of 2] 64 bit divide implementation for processor without hw divide instruction

2013-11-22 Thread Kugan

Hi All,

This RFC patch series implements a simple align-divisor, shift-dividend
method for 64-bit division and enables it for ARMv7-A.

The algorithm runs K + 1 iterations, where K is the number of bit
positions the divisor is shifted left to align it under the dividend. I
timed repeated divides and found that this implementation performs better
on processors without a hardware divide instruction.

On a Chromebook, when K is large (close to 64) it is on average ~10%
faster. When K is small (8 to 24), it is on average ~100% faster.
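
For readers who want to try the method outside libgcc, here is a minimal
stand-alone sketch. It assumes a compiler that provides __builtin_clzll
(GCC or Clang); the name udivmod64_shift and the handling of a zero
divisor are illustrative assumptions, not part of the patch.

```c
#include <stdint.h>

/* Illustrative stand-alone model of the align-divisor, shift-dividend
   method.  udivmod64_shift is a hypothetical name, not a libgcc entry
   point.  */
uint64_t
udivmod64_shift (uint64_t n, uint64_t d, uint64_t *rp)
{
  uint64_t q = 0, r = n, y = d;

  /* Division by zero is undefined; this sketch just returns 0 and leaves
     the dividend as the remainder (an assumption made for the sketch).  */
  if (d != 0 && y <= r)
    {
      /* k = number of positions the divisor moves left so that its top
         set bit lines up with the dividend's top set bit.  */
      unsigned k = __builtin_clzll (d) - __builtin_clzll (n);
      y <<= k;

      /* First test-subtract: its quotient bit is kept separately and the
         dividend is not shifted, so no high-order bit is lost.  */
      if (r >= y)
        {
          r -= y;
          q = (uint64_t) 1 << k;
        }

      if (k > 0)
        {
          unsigned i;

          y >>= 1;
          /* k regular iterations: quotient bits accumulate in the low
             bits of r while the partial remainder shifts left.  */
          for (i = 0; i < k; i++)
            r = (r >= y) ? ((r - y) << 1) + 1 : r << 1;

          q += r;       /* the low k bits of r are quotient bits */
          r >>= k;      /* what is left is the true remainder */
          q -= r << k;  /* drop the remainder bits added to q above */
        }
    }

  if (rp)
    *rp = r;
  return q;
}
```

For example, udivmod64_shift (100, 7, &r) aligns 7 by k = 4 positions,
runs 5 test-subtract steps in total, and yields quotient 14 with r == 2.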

Regression tested on arm-none-linux-gnueabi with no issues.

OK?

Thanks,
Kugan

Re: [RFC][LIBGCC][1 of 2] 64 bit divide implementation for processor without hw divide instruction

2013-11-22 Thread Kugan

Hi All,

This RFC patch series implements a simple align divisor shift dividend
method.

Regression tested on arm-none-linux-gnueabi with no issues.

OK?

Thanks,
Kugan

+2013-11-22  Kugan Vivekanandarajah  
+
+   * libgcc/libgcc2.c (__udivmoddi4): Define new implementation when
+   HAVE_NO_HW_DIVIDE is defined, for processors without any divide
+ instructions.
+
diff --git a/libgcc/libgcc2.c b/libgcc/libgcc2.c
index bec411b..a1d3fbc 100644
--- a/libgcc/libgcc2.c
+++ b/libgcc/libgcc2.c
@@ -934,6 +934,74 @@ __parityDI2 (UDWtype x)
 #endif
 
 #ifdef L_udivmoddi4
+#ifdef HAVE_NO_HW_DIVIDE
+
+#if (defined (L_udivdi3) || defined (L_divdi3) || \
+ defined (L_umoddi3) || defined (L_moddi3))
+static inline __attribute__ ((__always_inline__))
+#endif
+UDWtype
+__udivmoddi4 (UDWtype n, UDWtype d, UDWtype *rp)
+{
+  UDWtype q = 0, r = n, y = d;
+  UWtype lz1, lz2, i, k;
+
+  /* Implements the align-divisor, shift-dividend method.  This algorithm
+ aligns the divisor under the dividend and then performs a number of
+ test-subtract iterations which shift the dividend left.  The number of
+ iterations is k + 1, where k is the number of bit positions the
+ divisor must be shifted left to align it under the dividend.
+ Quotient bits can be saved in the rightmost positions of the dividend
+ as it shifts left on each test-subtract iteration.  */
+
+  if (y <= r)
+{
+  lz1 = __builtin_clzll (d);
+  lz2 = __builtin_clzll (n);
+
+  k = lz1 - lz2;
+  y = (y << k);
+
+  /* The dividend can exceed 2 ^ (width - 1) - 1 but still be less than the
+aligned divisor.  A normal iteration can drop the high-order bit
+of the dividend.  Therefore, the first test-subtract iteration is a
+special case, saving its quotient bit in a separate location and
+not shifting the dividend.  */
+  if (r >= y)
+   {
+ r = r - y;
+ q =  (1ULL << k);
+   }
+
+  if (k > 0)
+   {
+ y = y >> 1;
+
+ /* Perform k additional iterations, each a regular test-subtract
+   step that shifts the dividend.  */
+ i = k;
+ do
+   {
+ if (r >= y)
+   r = ((r - y) << 1) + 1;
+ else
+   r =  (r << 1);
+ i = i - 1;
+   } while (i != 0);
+
+ /* First quotient bit is combined with the quotient bits resulting
+from the k regular iterations.  */
+ q = q + r;
+ r = r >> k;
+ q = q - (r << k);
+   }
+}
+
+  if (rp)
+*rp = r;
+  return q;
+}
+#else
 
 #if (defined (L_udivdi3) || defined (L_divdi3) || \
  defined (L_umoddi3) || defined (L_moddi3))
@@ -1152,6 +1220,7 @@ __udivmoddi4 (UDWtype n, UDWtype d, UDWtype *rp)
   return ww.ll;
 }
 #endif
+#endif
 
 #ifdef L_divdi3
 DWtype

Re: [RFC][LIBGCC][2 of 2] 64 bit divide implementation for processor without hw divide instruction

2013-11-22 Thread Kugan

Hi All,

This RFC patch enables the new divide algorithm for ARMv7-A.

Regression tested on arm-none-linux-gnueabi with no issues.

OK?

Thanks,
Kugan

+2013-11-22  Kugan Vivekanandarajah  
+
+   * libgcc/config/arm/bpabi-lib.h (HAVE_NO_HW_DIVIDE): Define for
+   __ARM_ARCH_7A__.
+
diff --git a/libgcc/config/arm/bpabi-lib.h b/libgcc/config/arm/bpabi-lib.h
index e0e46a6..85171c8 100644
--- a/libgcc/config/arm/bpabi-lib.h
+++ b/libgcc/config/arm/bpabi-lib.h
@@ -75,3 +75,7 @@
helper functions - not everything in libgcc - in the interests of
maintaining backward compatibility.  */
 #define LIBGCC2_FIXEDBIT_GNU_PREFIX
+
+#if defined(__ARM_ARCH_7A__)
+# define HAVE_NO_HW_DIVIDE
+#endif

Re: [RFC][LIBGCC][1 of 2] 64 bit divide implementation for processor without hw divide instruction

2013-11-25 Thread Kugan

On 24/11/13 02:14, Ian Lance Taylor wrote:
> Kugan  writes:
> 
>> This RFC patch series implements a simple align divisor shift dividend
>> method.
>>
>> Regression tested on arm-none-linux-gnueabi with no issues.
>>
>> OK?
>>
>> Thanks,
>> Kugan
>>
>> +2013-11-22  Kugan Vivekanandarajah  
>> +
>> +* libgcc/libgcc2.c (__udivmoddi4): Define new implementation when
>> +HAVE_NO_HW_DIVIDE is defined, for processors without any divide
>> + instructions.
> 
> 
> The code looks fine to me.
> 
> You should document HAVE_NO_HW_DIVIDE in gcc/doc/tm.texi in the Library
> Calls section.  The macro should probably be something like
> TARGET_HAS_NO_HW_DIVIDE.
> 
Thanks for the review. Is this OK for trunk now?


+2013-11-26  Kugan Vivekanandarajah  
+
+   * libgcc/libgcc2.c (__udivmoddi4): Define new implementation when
+   TARGET_HAS_NO_HW_DIVIDE is defined, for processors without any divide
+   instructions.
+


+2013-11-26  Kugan Vivekanandarajah  
+
+   * doc/tm.texi.in (TARGET_HAS_NO_HW_DIVIDE): Define.
+   * doc/tm.texi (TARGET_HAS_NO_HW_DIVIDE): Regenerate.
+

Thanks,
Kugan
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 925d93f..c9697f1 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5365,6 +5365,14 @@ If this macro evaluates to @code{false} the comparison 
functions return
 in @file{libgcc.a}, you do not need to define this macro.
 @end defmac
 
+@defmac TARGET_HAS_NO_HW_DIVIDE
+This macro should be defined if the target has no hardware divide
+instructions.  If this macro is defined, GCC will use an algorithm which
+makes use of simple logical and arithmetic operations for 64-bit
+division.  If the macro is not defined, GCC will use an algorithm which
+makes use of a 64-bit by 32-bit divide primitive.
+@end defmac
+
 @cindex @code{EDOM}, implicit usage
 @findex matherr
 @defmac TARGET_EDOM
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index edca600..03e6662 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4205,6 +4205,14 @@ If this macro evaluates to @code{false} the comparison 
functions return
 in @file{libgcc.a}, you do not need to define this macro.
 @end defmac
 
+@defmac TARGET_HAS_NO_HW_DIVIDE
+This macro should be defined if the target has no hardware divide
+instructions.  If this macro is defined, GCC will use an algorithm which
+makes use of simple logical and arithmetic operations for 64-bit
+division.  If the macro is not defined, GCC will use an algorithm which
+makes use of a 64-bit by 32-bit divide primitive.
+@end defmac
+
 @cindex @code{EDOM}, implicit usage
 @findex matherr
 @defmac TARGET_EDOM
diff --git a/libgcc/libgcc2.c b/libgcc/libgcc2.c
index bec411b..8c4cc6a 100644
--- a/libgcc/libgcc2.c
+++ b/libgcc/libgcc2.c
@@ -934,6 +934,74 @@ __parityDI2 (UDWtype x)
 #endif
 
 #ifdef L_udivmoddi4
+#ifdef TARGET_HAS_NO_HW_DIVIDE
+
+#if (defined (L_udivdi3) || defined (L_divdi3) || \
+ defined (L_umoddi3) || defined (L_moddi3))
+static inline __attribute__ ((__always_inline__))
+#endif
+UDWtype
+__udivmoddi4 (UDWtype n, UDWtype d, UDWtype *rp)
+{
+  UDWtype q = 0, r = n, y = d;
+  UWtype lz1, lz2, i, k;
+
+  /* Implements the align-divisor, shift-dividend method.  This algorithm
+ aligns the divisor under the dividend and then performs a number of
+ test-subtract iterations which shift the dividend left.  The number of
+ iterations is k + 1, where k is the number of bit positions the
+ divisor must be shifted left to align it under the dividend.
+ Quotient bits can be saved in the rightmost positions of the dividend
+ as it shifts left on each test-subtract iteration.  */
+
+  if (y <= r)
+{
+  lz1 = __builtin_clzll (d);
+  lz2 = __builtin_clzll (n);
+
+  k = lz1 - lz2;
+  y = (y << k);
+
+  /* The dividend can exceed 2 ^ (width - 1) - 1 but still be less than the
+aligned divisor.  A normal iteration can drop the high-order bit
+of the dividend.  Therefore, the first test-subtract iteration is a
+special case, saving its quotient bit in a separate location and
+not shifting the dividend.  */
+  if (r >= y)
+   {
+ r = r - y;
+ q =  (1ULL << k);
+   }
+
+  if (k > 0)
+   {
+ y = y >> 1;
+
+ /* Perform k additional iterations, each a regular test-subtract
+   step that shifts the dividend.  */
+ i = k;
+ do
+   {
+ if (r >= y)
+   r = ((r - y) << 1) + 1;
+ else
+   r =  (r << 1);
+ i = i - 1;
+   } while (i != 0);
+
+ /* First quotient bit is combined with the quotient bits resulting
+from the k regular iterations.  */
+ q = q + r;
+ r = r >> k;
+ q = q - (r << k);
+   }
+}
+
+  if (rp)
+*rp = r;
+  return

Re: [RFC][ARM] TARGET_ATOMIC_ASSIGN_EXPAND_FENV hook

2014-06-09 Thread Kugan

On 30/05/14 18:35, Ramana Radhakrishnan wrote:
>> +  if (!TARGET_VFP)
>> +return;
>> +
>> +  /* Generate the equivalence of :
> 
> s/equivalence/equivalent.
> 
> Ok with that change and if no regressions.

Hi Ramana,

Sorry, I missed the thumb1 part. There are no mrc/mcr versions of these
instructions in thumb1, so these should be conditional on not being
TARGET_THUMB1.

Regression tested with no new regressions on qemu for
arm-none-linux-gnueabi -march=armv7-a and on arm-none-linux-gnueabi
--with-mode=thumb and -march=armv5t.

Is this OK?

Thanks,
Kugan

gcc/

2014-06-10  Kugan Vivekanandarajah  

* config/arm/arm.c (arm_atomic_assign_expand_fenv): Call
default_atomic_assign_expand_fenv for TARGET_THUMB1.
(arm_init_builtins) : Initialize builtins __builtins_arm_set_fpscr and
__builtins_arm_get_fpscr only when !TARGET_THUMB1.
* config/arm/vfp.md (set_fpscr): Make pattern conditional on
!TARGET_THUMB1.
(get_fpscr) : Likewise.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index f8575b9..c9f02df 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -24760,7 +24760,7 @@ arm_init_builtins (void)
   if (TARGET_CRC32)
 arm_init_crc32_builtins ();
 
-  if (TARGET_VFP)
+  if (TARGET_VFP && !TARGET_THUMB1)
 {
   tree ftype_set_fpscr
= build_function_type_list (void_type_node, unsigned_type_node, NULL);
@@ -31452,8 +31452,8 @@ arm_atomic_assign_expand_fenv (tree *hold, tree *clear, 
tree *update)
   tree new_fenv_var, reload_fenv, restore_fnenv;
   tree update_call, atomic_feraiseexcept, hold_fnclex;
 
-  if (!TARGET_VFP)
-return;
+  if (!TARGET_VFP || TARGET_THUMB1)
+return default_atomic_assign_expand_fenv (hold, clear, update);
 
   /* Generate the equivalent of :
unsigned int fenv_var;
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index a8b27bc..44d2f38 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -1325,7 +1325,7 @@
 ;; Write Floating-point Status and Control Register.
 (define_insn "set_fpscr"
   [(unspec_volatile [(match_operand:SI 0 "register_operand" "r")] 
VUNSPEC_SET_FPSCR)]
-  "TARGET_VFP"
+  "TARGET_VFP && !TARGET_THUMB1"
   "mcr\\tp10, 7, %0, cr1, cr0, 0\\t @SET_FPSCR"
   [(set_attr "type" "mrs")])
 
@@ -1333,7 +1333,7 @@
 (define_insn "get_fpscr"
   [(set (match_operand:SI 0 "register_operand" "=r")
 (unspec_volatile:SI [(const_int 0)] VUNSPEC_GET_FPSCR))]
-  "TARGET_VFP"
+  "TARGET_VFP && !TARGET_THUMB1"
   "mrc\\tp10, 7, %0, cr1, cr0, 0\\t @GET_FPSCR"
   [(set_attr "type" "mrs")])

[PATCH 0/2] Zext/sext elimination using value range

2014-06-24 Thread Kugan

Hi,

This series of two patches implements zext/sext elimination using the
value ranges stored in SSA names. The implementation follows what was
suggested in the thread https://gcc.gnu.org/ml/gcc/2014-05/msg00213.html.

I have broken this into:

Patch 1 - Changes to store zero and sign extended promotions
(SRP_SIGNED_AND_UNSIGNED) in SUBREG with SUBREG_PROMOTED_VAR_P.
Patch 2 - Enables zext/sext elimination by checking the value range.

The test cases that motivated this, and the asm differences with the
patch, are:

1.
short foo(unsigned char c)
{
  c = c & (unsigned char)0x0F;
  if( c > 7 )
    return((short)(c - 5));
  else
    return(( short )c);
}

and r0, r0, #15
cmp r0, #7
subhi   r0, r0, #5
-   uxth    r0, r0
-   sxth    r0, r0
bx  lr

2.
unsigned short
crc2(unsigned short crc, unsigned char data)
{
   unsigned char i, x16, carry;
   for (i = 0; i < 8; i++)
 {
   x16 = (data ^ crc) & 1;
   data >>= 1;
   if (x16 == 1)
 {
   crc ^= 0x4002;
   carry = 1;
 }
   else
 carry = 0;
  crc >>= 1;
   if (carry)
 crc |= 0x8000;
   else
 crc &= 0x7fff;
 }
   return crc;
}

-   mov r3, #8
+   mov r2, #8
 .L3:
-   eor r2, r1, r0
-   sub r3, r3, #1
-   tst r2, #1
+   eor r3, r1, r0
mov r1, r1, lsr #1
+   tst r3, #1
eorne   r0, r0, #16384
moveq   r0, r0, lsr #1
eorne   r0, r0, #2
movne   r0, r0, lsr #1
orrne   r0, r0, #32768
-   ands    r3, r3, #255
+   subs    r2, r2, #1
bne .L3
bx  lr
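
As a rough sanity check of the first test case, the sketch below models
foo with the value held in a 32-bit register and confirms, for all 256
possible arguments, that zero-extending or sign-extending the low 16 bits
changes nothing. The helper names (foo_reg, zext16, sext16,
extensions_redundant) are made up for this illustration and do not appear
in the patch.

```c
#include <stdint.h>

/* Same computation as the first test case, done in a 32-bit "register"
   with no extension applied afterwards.  */
uint32_t
foo_reg (uint8_t c)
{
  uint32_t r = c & 0x0Fu;       /* value range [0, 15] */
  if (r > 7)
    r -= 5;                     /* value range [3, 10] on this path */
  return r;
}

/* What uxth does: keep the low 16 bits, clear the rest.  */
uint32_t
zext16 (uint32_t r)
{
  return r & 0xFFFFu;
}

/* What sxth does: replicate bit 15 into the upper 16 bits.  */
uint32_t
sext16 (uint32_t r)
{
  return ((r & 0xFFFFu) ^ 0x8000u) - 0x8000u;
}

/* Returns 1 when both extensions are no-ops for every argument.  */
int
extensions_redundant (void)
{
  unsigned c;

  for (c = 0; c < 256; c++)
    {
      uint32_t r = foo_reg ((uint8_t) c);
      if (zext16 (r) != r || sext16 (r) != r)
        return 0;
    }
  return 1;
}
```

Because the result always lies in [0, 10], it fits both the signed and the
unsigned 16-bit range, which is the situation in which the removed uxth
and sxth above are redundant.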

Tested both patches on x86_64-unknown-linux-gnu and
arm-none-linux-gnueabi with no new regressions. Is this OK?

Thanks,
Kugan

[PATCH 1/2] Enable setting sign and unsigned promoted mode (SPR_SIGNED_AND_UNSIGNED)

2014-06-24 Thread Kugan

Changes the SUBREG flags so that the promotion can be recorded as signed
(SRP_SIGNED), unsigned (SRP_UNSIGNED), or both signed and unsigned
(SRP_SIGNED_AND_UNSIGNED) for a SUBREG with SUBREG_PROMOTED_VAR_P set.

Thanks,
Kugan

gcc/

2014-06-24  Kugan Vivekanandarajah  

* gcc/calls.c (precompute_arguments): Use new SUBREG_PROMOTED_SET
instead of SUBREG_PROMOTED_UNSIGNED_SET
(expand_call) : Likewise.
* gcc/expr.c (convert_move) : Use new SUBREG_CHECK_PROMOTED_SIGN
instead of SUBREG_PROMOTED_UNSIGNED_P.
(convert_modes) : Likewise.
(store_expr) : Likewise.
(expand_expr_real_1) : Use new SUBREG_PROMOTED_SET
instead of SUBREG_PROMOTED_UNSIGNED_SET.
* gcc/function.c (assign_parm_setup_reg) : Use new SUBREG_PROMOTED_SET
instead of SUBREG_PROMOTED_UNSIGNED_SET.
* gcc/ifcvt.c (noce_emit_cmove) : Updated to use
SUBREG_PROMOTED_UNSIGNED_P and SUBREG_PROMOTED_SIGNED_P.
* gcc/internal-fn.c (ubsan_expand_si_overflow_mul_check) : Use
SUBREG_PROMOTED_SET instead of SUBREG_PROMOTED_UNSIGNED_SET.
* gcc/optabs.c (widen_operand): Use new SUBREG_CHECK_PROMOTED_SIGN
instead of SUBREG_PROMOTED_UNSIGNED_P.
* gcc/rtl.h (SUBREG_PROMOTED_UNSIGNED_SET) : Remove.
(SUBREG_PROMOTED_SET) : New define.
(SUBREG_PROMOTED_GET) : Likewise.
(SUBREG_PROMOTED_SIGNED_P) : Likewise.
(SUBREG_CHECK_PROMOTED_SIGN) : Likewise.
(SUBREG_PROMOTED_UNSIGNED_P) : Updated.
* gcc/rtlanal.c (simplify_unary_operation_1) : Use new
SUBREG_PROMOTED_SET instead of SUBREG_PROMOTED_UNSIGNED_SET.
* gcc/simplify-rtx.c (simplify_unary_operation_1) : Use new
SUBREG_PROMOTED_SIGNED_P instead of
!SUBREG_PROMOTED_UNSIGNED_P.
(simplify_subreg) : Use new SUBREG_PROMOTED_SET instead of
 SUBREG_PROMOTED_UNSIGNED_SET.
diff --git a/gcc/calls.c b/gcc/calls.c
index 78fe7d8..c1fe3b8 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -1484,8 +1484,7 @@ precompute_arguments (int num_actuals, struct arg_data 
*args)
  args[i].initial_value
= gen_lowpart_SUBREG (mode, args[i].value);
  SUBREG_PROMOTED_VAR_P (args[i].initial_value) = 1;
- SUBREG_PROMOTED_UNSIGNED_SET (args[i].initial_value,
-   args[i].unsignedp);
+ SUBREG_PROMOTED_SET (args[i].initial_value, args[i].unsignedp);
}
}
 }
@@ -3365,7 +3364,8 @@ expand_call (tree exp, rtx target, int ignore)
 
  target = gen_rtx_SUBREG (TYPE_MODE (type), target, offset);
  SUBREG_PROMOTED_VAR_P (target) = 1;
- SUBREG_PROMOTED_UNSIGNED_SET (target, unsignedp);
+ SUBREG_PROMOTED_SET (target, unsignedp);
+
}
 
   /* If size of args is variable or this was a constructor call for a stack
diff --git a/gcc/expr.c b/gcc/expr.c
index 512c024..a8db9f5 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -329,7 +329,7 @@ convert_move (rtx to, rtx from, int unsignedp)
   if (GET_CODE (from) == SUBREG && SUBREG_PROMOTED_VAR_P (from)
   && (GET_MODE_PRECISION (GET_MODE (SUBREG_REG (from)))
  >= GET_MODE_PRECISION (to_mode))
-  && SUBREG_PROMOTED_UNSIGNED_P (from) == unsignedp)
+  && (SUBREG_CHECK_PROMOTED_SIGN (from, unsignedp)))
 from = gen_lowpart (to_mode, from), from_mode = to_mode;
 
   gcc_assert (GET_CODE (to) != SUBREG || !SUBREG_PROMOTED_VAR_P (to));
@@ -703,7 +703,7 @@ convert_modes (enum machine_mode mode, enum machine_mode 
oldmode, rtx x, int uns
 
   if (GET_CODE (x) == SUBREG && SUBREG_PROMOTED_VAR_P (x)
   && GET_MODE_SIZE (GET_MODE (SUBREG_REG (x))) >= GET_MODE_SIZE (mode)
-  && SUBREG_PROMOTED_UNSIGNED_P (x) == unsignedp)
+  && (SUBREG_CHECK_PROMOTED_SIGN (x, unsignedp)))
 x = gen_lowpart (mode, SUBREG_REG (x));
 
   if (GET_MODE (x) != VOIDmode)
@@ -5202,8 +5202,7 @@ store_expr (tree exp, rtx target, int call_param_p, bool 
nontemporal)
  && GET_MODE_PRECISION (GET_MODE (target))
 == TYPE_PRECISION (TREE_TYPE (exp)))
{
- if (TYPE_UNSIGNED (TREE_TYPE (exp))
- != SUBREG_PROMOTED_UNSIGNED_P (target))
+ if (!(SUBREG_CHECK_PROMOTED_SIGN (target, TYPE_UNSIGNED (TREE_TYPE 
(exp)))))
{
  /* Some types, e.g. Fortran's logical*4, won't have a signed
 version, so use the mode instead.  */
@@ -9513,7 +9512,8 @@ expand_expr_real_1 (tree exp, rtx target, enum 
machine_mode tmode,
 
  temp = gen_lowpart_SUBREG (mode, decl_rtl);
  SUBREG_PROMOTED_VAR_P (temp) = 1;
- SUBREG_PROMOTED_UNSIGNED_SET (temp, unsignedp);
+ SUBREG_PROMOTED_SET (temp, unsignedp);
+
  return temp;
}
 
diff --git a/gcc/function.c b/gcc/function.c
index 441289e..9509622 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -3093,7 +3093,7 @@ assign_parm_setup

[PATCH 2/2] Enable elimination of zext/sext

2014-06-24 Thread Kugan

Sets proper flags on the SUBREG based on value
range info and enables elimination of zext/sext when possible.
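
The core idea can be sketched with plain integers. This is a simplified
model, not the patch's is_promoted_for_type: it represents the value range
as int64_t bounds and the promoted mode as a bit width (assumed to be
between 1 and 32), and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Does every value in [min, max] fit a BITS-wide integer type of the
   given signedness?  BITS is assumed to be in 1..32 for this sketch.  */
bool
range_fits_mode (int64_t min, int64_t max, unsigned bits, bool uns)
{
  int64_t type_min = uns ? 0 : -((int64_t) 1 << (bits - 1));
  int64_t type_max = uns ? ((int64_t) 1 << bits) - 1
                         : ((int64_t) 1 << (bits - 1)) - 1;

  return min >= type_min && max <= type_max;
}

/* If the range fits both the signed and the unsigned type, zero and sign
   extension give the same register contents, so either can be dropped.
   This corresponds to the SRP_SIGNED_AND_UNSIGNED case.  */
bool
both_extensions_redundant (int64_t min, int64_t max, unsigned bits)
{
  return range_fits_mode (min, max, bits, true)
         && range_fits_mode (min, max, bits, false);
}
```

For instance, a range of [0, 10] in a 16-bit mode satisfies both checks,
while [-1, 10] fails the unsigned check and [0, 40000] fails the signed
one.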

Thanks,
Kugan


gcc/
2014-06-24  Kugan Vivekanandarajah  

* gcc/calls.c (precompute_arguments) : Check is_promoted_for_type
and set the promoted mode.
(is_promoted_for_type) : New function.
(expand_expr_real_1) : Check is_promoted_for_type
and set the promoted mode.
* gcc/expr.h (is_promoted_for_type) : New function definition.
* gcc/cfgexpand.c (expand_gimple_stmt_1) : Call emit_move_insn if
SUBREG is promoted with SRP_SIGNED_AND_UNSIGNED.
diff --git a/gcc/calls.c b/gcc/calls.c
index c1fe3b8..4ef9df8 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -1484,7 +1484,10 @@ precompute_arguments (int num_actuals, struct arg_data 
*args)
  args[i].initial_value
= gen_lowpart_SUBREG (mode, args[i].value);
  SUBREG_PROMOTED_VAR_P (args[i].initial_value) = 1;
- SUBREG_PROMOTED_SET (args[i].initial_value, args[i].unsignedp);
+ if (is_promoted_for_type (args[i].tree_value, mode, 
!args[i].unsignedp))
+   SUBREG_PROMOTED_SET (args[i].initial_value, 
SRP_SIGNED_AND_UNSIGNED);
+ else
+   SUBREG_PROMOTED_SET (args[i].initial_value, args[i].unsignedp);
}
}
 }
diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index e8cd87f..0540b4d 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -3309,7 +3309,13 @@ expand_gimple_stmt_1 (gimple stmt)
  GET_MODE (target), temp, unsignedp);
  }
 
-   convert_move (SUBREG_REG (target), temp, unsignedp);
+   if ((SUBREG_PROMOTED_GET (target) == SRP_SIGNED_AND_UNSIGNED)
+   && (GET_CODE (temp) == SUBREG)
+   && (GET_MODE (target) == GET_MODE (temp))
+   && (GET_MODE (SUBREG_REG (target)) == GET_MODE (SUBREG_REG 
(temp))))
+ emit_move_insn (SUBREG_REG (target), SUBREG_REG (temp));
+   else
+ convert_move (SUBREG_REG (target), temp, unsignedp);
  }
else if (nontemporal && emit_storent_insn (target, temp))
  ;
diff --git a/gcc/expr.c b/gcc/expr.c
index a8db9f5..b2c8146 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9209,6 +9209,59 @@ expand_expr_real_2 (sepops ops, rtx target, enum 
machine_mode tmode,
 }
 #undef REDUCE_BIT_FIELD
 
+/* Return TRUE if value in SSA is already zero/sign extended for lhs type
+   (type here is the combination of LHS_MODE and LHS_UNS) using value range
+   information stored. Return FALSE otherwise. */
+bool
+is_promoted_for_type (tree ssa, enum machine_mode lhs_mode, bool lhs_uns)
+{
+  wide_int type_min, type_max;
+  wide_int min, max, limit;
+  unsigned int prec;
+  tree lhs_type;
+  bool rhs_uns;
+
+  if (flag_wrapv
+  || (flag_strict_overflow == false)
+  || (ssa == NULL_TREE)
+  || (TREE_CODE (ssa) != SSA_NAME)
+  || !INTEGRAL_TYPE_P (TREE_TYPE (ssa))
+  || POINTER_TYPE_P (TREE_TYPE (ssa)))
+return false;
+
+  /* Return FALSE if value_range is not recorded for SSA. */
+  if (get_range_info (ssa, &min, &max) != VR_RANGE)
+return false;
+
+  lhs_type = lang_hooks.types.type_for_mode (lhs_mode, lhs_uns);
+  rhs_uns = TYPE_UNSIGNED (TREE_TYPE (ssa));
+  prec = min.get_precision ();
+
+  /* Signed maximum value.  */
+  limit = wide_int::from (TYPE_MAX_VALUE (TREE_TYPE (ssa)), prec, SIGNED);
+
+  /* Signedness of LHS and RHS differs but values in range.  */
+  if ((rhs_uns != lhs_uns)
+  && ((!lhs_uns && !wi::neg_p (min, TYPE_SIGN (lhs_type)))
+ || (lhs_uns && (wi::cmp (max, limit, TYPE_SIGN (TREE_TYPE (ssa))) == 
-1))))
+lhs_uns = !lhs_uns;
+
+  /* Signedness of LHS and RHS should match.  */
+  if (rhs_uns != lhs_uns)
+return false;
+
+  type_min = wide_int::from (TYPE_MIN_VALUE (lhs_type), prec, TYPE_SIGN 
(TREE_TYPE (ssa)));
+  type_max = wide_int::from (TYPE_MAX_VALUE (lhs_type), prec, TYPE_SIGN 
(TREE_TYPE (ssa)));
+
+  /* Check if values lies in-between the type range.  */
+  if ((wi::neg_p (max, TYPE_SIGN (TREE_TYPE (ssa)))
+   || (wi::cmp (max, type_max, TYPE_SIGN (TREE_TYPE (ssa))) != 1))
+  && (!wi::neg_p (min, TYPE_SIGN (TREE_TYPE (ssa)))
+ || (wi::cmp (type_min, min, TYPE_SIGN (TREE_TYPE (ssa))) != 1)))
+return true;
+
+  return false;
+}
 
 /* Return TRUE if expression STMT is suitable for replacement.  
Never consider memory loads as replaceable, because those don't ever lead 
@@ -9512,7 +9565,10 @@ expand_expr_real_1 (tree exp, rtx target, enum 
machine_mode tmode,
 
  temp = gen_lowpart_SUBREG (mode, decl_rtl);
  SUBREG_PROMOTED_VAR_P (temp) = 1;
- SUBREG_PROMOTED_SET (temp, unsignedp);
+ if (is_promoted_for_type (ssa_name, mode, !unsignedp))
+   SUBREG_PROMOTED_SET (temp, S

Re: [PATCH 1/2] Enable setting sign and unsigned promoted mode (SPR_SIGNED_AND_UNSIGNED)

2014-06-25 Thread Kugan

>> +const unsigned int SRP_POINTER  = -1;
>> +const unsigned int SRP_SIGNED   = 0;
>> +const unsigned int SRP_UNSIGNED = 1;
>> +const unsigned int SRP_SIGNED_AND_UNSIGNED = 2;
> 
> But most importantly, I thought Richard Henderson suggested
> to use SRP_POINTER 0, SRP_SIGNED 1, SRP_UNSIGNED 2, SRP_SIGNED_AND_UNSIGNED 3,
> that way when checking e.g. SUBREG_PROMOTED_SIGNED_P or
> SUBREG_PROMOTED_UNSIGNED_P you can check just the single bit.
> Where something tested for SUBREG_PROMOTED_UNSIGNED_P () == -1 just
> use SUBREG_PROMOTED_GET.

The problem with SRP_POINTER 0, SRP_SIGNED 1, SRP_UNSIGNED 2,
SRP_SIGNED_AND_UNSIGNED 3 (as I understand it) is that it will be
incompatible with TYPE_UNSIGNED (tree) and the defined values of
POINTER_EXTEND_UNSIGNED. We would then have to translate when
setting the SRP_* values. Also, SUBREG_PROMOTED_UNSIGNED_P is now checked
in some cases for != 0 (meaning SRP_POINTER or SRP_UNSIGNED) and in some
cases for > 0 (meaning SRP_UNSIGNED).

Since our aim is to perform single-bit checks, why don’t we just use
this representation internally (i.e. _rtx->unchanging = 1 if SRP_SIGNED
and _rtx->volatil = 1 if SRP_UNSIGNED)? As for SUBREG_PROMOTED_UNSIGNED_P,
we still have to return -1 or 1 depending on SRP_POINTER or SRP_UNSIGNED.


const unsigned int SRP_POINTER  = -1;
const unsigned int SRP_SIGNED   = 0;
const unsigned int SRP_UNSIGNED = 1;
const unsigned int SRP_SIGNED_AND_UNSIGNED = 2;

/* Sets promoted mode for SUBREG_PROMOTED_VAR_P().  */
#define SUBREG_PROMOTED_SET(RTX, VAL) \
do { \
  rtx const _rtx = RTL_FLAG_CHECK1 ("SUBREG_PROMOTED_SET", \
                                    (RTX), SUBREG); \
  switch ((VAL)) \
  { \
    case SRP_POINTER: \
      _rtx->volatil = 0; \
      _rtx->unchanging = 0; \
      break; \
    case SRP_SIGNED: \
      _rtx->volatil = 0; \
      _rtx->unchanging = 1; \
      break; \
    case SRP_UNSIGNED: \
      _rtx->volatil = 1; \
      _rtx->unchanging = 0; \
      break; \
    case SRP_SIGNED_AND_UNSIGNED: \
      _rtx->volatil = 1; \
      _rtx->unchanging = 1; \
      break; \
  } \
} while (0)

/* Gets promoted mode for SUBREG_PROMOTED_VAR_P(). */
#define SUBREG_PROMOTED_GET(RTX)\
  (2 * (RTL_FLAG_CHECK1 ("SUBREG_PROMOTED_GET", (RTX), SUBREG)->volatil)\
   + (RTX)->unchanging - 1)

/* Predicate to check if RTX of SUBREG_PROMOTED_VAR_P() is promoted
   for SIGNED type.  */
#define SUBREG_PROMOTED_SIGNED_P(RTX) \
  (RTL_FLAG_CHECK1 ("SUBREG_PROMOTED_SIGNED_P", (RTX), SUBREG)->unchanging == 1)

/* Predicate to check if RTX of SUBREG_PROMOTED_VAR_P() is promoted
   for UNSIGNED type.  In case of SRP_POINTER, SUBREG_PROMOTED_UNSIGNED_P
   returns -1 as this is in most cases handled like unsigned extension,
   except for generating instructions where special code is emitted for
   (ptr_extend insns) on some architectures.  */
#define SUBREG_PROMOTED_UNSIGNED_P(RTX) \
  ((((RTL_FLAG_CHECK1 ("SUBREG_PROMOTED_UNSIGNED_P", (RTX), SUBREG)->volatil) \
     + (RTX)->unchanging) == 0) ? -1 : ((RTX)->volatil == 1))
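
The two-bit encoding can be checked in isolation with a small model. The
sketch below is an assumption-laden stand-in: struct subreg_flags and the
srp_* functions are hypothetical names replacing the real rtx flag
accessors, and the SRP_* values are modelled as plain ints rather than the
unsigned constants above.

```c
/* Stand-in for the two rtx flag bits used by the macros above.  */
struct subreg_flags
{
  unsigned volatil : 1;
  unsigned unchanging : 1;
};

enum
{
  SRP_POINTER = -1,
  SRP_SIGNED = 0,
  SRP_UNSIGNED = 1,
  SRP_SIGNED_AND_UNSIGNED = 2
};

/* Model of SUBREG_PROMOTED_SET: volatil records "unsigned",
   unchanging records "signed".  */
void
srp_set (struct subreg_flags *f, int val)
{
  f->volatil = (val == SRP_UNSIGNED || val == SRP_SIGNED_AND_UNSIGNED);
  f->unchanging = (val == SRP_SIGNED || val == SRP_SIGNED_AND_UNSIGNED);
}

/* Model of SUBREG_PROMOTED_GET: 2 * volatil + unchanging - 1.  */
int
srp_get (const struct subreg_flags *f)
{
  return 2 * f->volatil + f->unchanging - 1;
}

/* Model of SUBREG_PROMOTED_SIGNED_P: a single-bit test.  */
int
srp_signed_p (const struct subreg_flags *f)
{
  return f->unchanging == 1;
}

/* Model of SUBREG_PROMOTED_UNSIGNED_P: -1 for the pointer case,
   otherwise the volatil bit.  */
int
srp_unsigned_p (const struct subreg_flags *f)
{
  return (f->volatil + f->unchanging) == 0 ? -1 : f->volatil == 1;
}
```

With this layout srp_get returns exactly the value passed to srp_set for
all four SRP_* values, so the existing -1/0/1 convention is preserved
while the signed and unsigned predicates each test a single bit.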

Am I missing anything here? Please let me know. I am attaching the patch
based on this with your other review comments addressed.

Thanks,
Kugan

gcc/
2014-06-25  Kugan Vivekanandarajah  

* calls.c (precompute_arguments): Use new SUBREG_PROMOTED_SET
instead of SUBREG_PROMOTED_UNSIGNED_SET
(expand_call): Likewise.
* expr.c (convert_move): Use new SUBREG_CHECK_PROMOTED_SIGN
instead of SUBREG_PROMOTED_UNSIGNED_P.
(convert_modes): Likewise.
(store_expr): Likewise.
(expand_expr_real_1): Use new SUBREG_PROMOTED_SET
instead of SUBREG_PROMOTED_UNSIGNED_SET.
* function.c (assign_parm_setup_reg): Use new SUBREG_PROMOTED_SET
instead of SUBREG_PROMOTED_UNSIGNED_SET.
* ifcvt.c (noce_emit_cmove): Updated to use
SUBREG_PROMOTED_UNSIGNED_P and SUBREG_PROMOTED_SIGNED_P.
* internal-fn.c (ubsan_expand_si_o

Re: [PATCH 2/2] Enable elimination of zext/sext

2014-06-25 Thread Kugan

On 24/06/14 22:21, Jakub Jelinek wrote:
> On Tue, Jun 24, 2014 at 09:53:35PM +1000, Kugan wrote:
>> 2014-06-24  Kugan Vivekanandarajah  
>>
>>  * gcc/calls.c (precompute_arguments: Check is_promoted_for_type
>>  and set the promoted mode.
>>  (is_promoted_for_type) : New function.
>>  (expand_expr_real_1) : Check is_promoted_for_type
>>  and set the promoted mode.
>>  * gcc/expr.h (is_promoted_for_type) : New function definition.
>>  * gcc/cfgexpand.c (expand_gimple_stmt_1) : Call emit_move_insn if
>>  SUBREG is promoted with SRP_SIGNED_AND_UNSIGNED.
> 
> Similarly to the other patch, no gcc/ prefix in ChangeLog, no space before
> :, watch for too long lines, remove useless ()s around conditions.

Changed it.

>> +bool
>> +is_promoted_for_type (tree ssa, enum machine_mode lhs_mode, bool lhs_uns)
>> +{
>> +  wide_int type_min, type_max;
>> +  wide_int min, max, limit;
>> +  unsigned int prec;
>> +  tree lhs_type;
>> +  bool rhs_uns;
>> +
>> +  if (flag_wrapv
> 
> Why?
> 
>> +  || (flag_strict_overflow == false)
> 
> Why?  Also, that would be !flag_strict_overflow instead of
> (flag_strict_overflow == false)

With these flags, the value ranges generated are not usable for extension
elimination. Without this check, some of the regression test cases
fail. For example:

short a;
void
foo (void)
{
  for (a = 0; a >= 0; a++)
;
}
With -Os -fno-strict-overflow, the following range is produced for the
index increment; using it to eliminate the extension turns this into an
infinite loop:
_10: [1, 32768]
_10 = _4 + 1;
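
To make the failure mode concrete, here is a minimal standalone sketch (not GCC code). It assumes a 16-bit short kept in a 32-bit register and models the loop above with and without the sign extension after the increment. The names sext16 and count_iterations are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: sign-extend the low 16 bits of a 32-bit register.
   Written with unsigned arithmetic so the result is well defined.  */
int32_t
sext16 (int32_t reg)
{
  uint32_t low = (uint32_t) reg & 0xFFFFu;
  return (int32_t) (low ^ 0x8000u) - 0x8000;
}

/* Model "for (a = 0; a >= 0; a++)" with A held in a 32-bit register.
   KEEP_SEXT selects whether the increment is followed by a sign
   extension.  LIMIT bounds the number of steps so the model always
   terminates.  Returns the number of iterations executed.  */
long
count_iterations (bool keep_sext, long limit)
{
  int32_t reg = 0;
  long n = 0;

  while (reg >= 0 && n < limit)
    {
      reg = reg + 1;
      if (keep_sext)
        reg = sext16 (reg);
      n++;
    }
  return n;
}
```

With the extension kept, the register wraps to a negative value once it passes 32767 and the loop exits. With the extension dropped, the register keeps counting upwards and only the step limit stops the model.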

> 
>> +  || (ssa == NULL_TREE)
>> +  || (TREE_CODE (ssa) != SSA_NAME)
>> +  || !INTEGRAL_TYPE_P (TREE_TYPE (ssa))
>> +  || POINTER_TYPE_P (TREE_TYPE (ssa)))
> 
> All pointer types are !INTEGRAL_TYPE_P, so the last condition
> doesn't make any sense.

I have changed this. Please see the attached patch.


Thanks,
Kugan

gcc/
2014-06-25  Kugan Vivekanandarajah  

* calls.c (precompute_arguments): Check is_promoted_for_type
and set the promoted mode.
(is_promoted_for_type): New function.
(expand_expr_real_1): Check is_promoted_for_type
and set the promoted mode.
* expr.h (is_promoted_for_type): New function definition.
* cfgexpand.c (expand_gimple_stmt_1): Call emit_move_insn if
SUBREG is promoted with SRP_SIGNED_AND_UNSIGNED.
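
As a rough illustration of the idea behind this check, the sketch below classifies a value range against a narrow type in plain C. It is not the code from the patch: the enum promoted_kind and the functions range_fits and classify_range are hypothetical names, the range is passed as plain int64_t values rather than wide_int, and the precision is assumed to be between 1 and 32 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification, loosely analogous to the promoted
   states discussed in this thread.  */
enum promoted_kind
{
  PROMOTED_NONE,
  PROMOTED_SIGNED,
  PROMOTED_UNSIGNED,
  PROMOTED_BOTH
};

/* Return true if [MIN, MAX] lies within the range of an integer type
   with PREC bits (1 <= PREC <= 32) and signedness UNS.  */
bool
range_fits (int64_t min, int64_t max, unsigned prec, bool uns)
{
  int64_t type_min = uns ? 0 : -((int64_t) 1 << (prec - 1));
  int64_t type_max = uns ? ((int64_t) 1 << prec) - 1
                         : ((int64_t) 1 << (prec - 1)) - 1;

  return min <= max && min >= type_min && max <= type_max;
}

/* Decide for which signedness a value in [MIN, MAX] is already
   correctly extended with respect to a PREC-bit type.  */
enum promoted_kind
classify_range (int64_t min, int64_t max, unsigned prec)
{
  bool fits_signed = range_fits (min, max, prec, false);
  bool fits_unsigned = range_fits (min, max, prec, true);

  if (fits_signed && fits_unsigned)
    return PROMOTED_BOTH;
  if (fits_signed)
    return PROMOTED_SIGNED;
  if (fits_unsigned)
    return PROMOTED_UNSIGNED;
  return PROMOTED_NONE;
}
```

Under these assumptions, a range such as [0, 100] fits an 8-bit type in both signednesses, whereas [1, 32768] does not fit a signed 16-bit type, so a sign extension for it could not be dropped.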


diff --git a/gcc/calls.c b/gcc/calls.c
index a3e6faa..eac512f 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -1484,7 +1484,10 @@ precompute_arguments (int num_actuals, struct arg_data *args)
  args[i].initial_value
= gen_lowpart_SUBREG (mode, args[i].value);
  SUBREG_PROMOTED_VAR_P (args[i].initial_value) = 1;
- SUBREG_PROMOTED_SET (args[i].initial_value, args[i].unsignedp);
+ if (is_promoted_for_type (args[i].tree_value, mode, !args[i].unsignedp))
+   SUBREG_PROMOTED_SET (args[i].initial_value, SRP_SIGNED_AND_UNSIGNED);
+ else
+   SUBREG_PROMOTED_SET (args[i].initial_value, args[i].unsignedp);
}
}
 }
diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index e8cd87f..0540b4d 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -3309,7 +3309,13 @@ expand_gimple_stmt_1 (gimple stmt)
  GET_MODE (target), temp, unsignedp);
  }
 
-   convert_move (SUBREG_REG (target), temp, unsignedp);
+   if ((SUBREG_PROMOTED_GET (target) == SRP_SIGNED_AND_UNSIGNED)
+   && (GET_CODE (temp) == SUBREG)
+   && (GET_MODE (target) == GET_MODE (temp))
+   && (GET_MODE (SUBREG_REG (target)) == GET_MODE (SUBREG_REG (temp))))
+ emit_move_insn (SUBREG_REG (target), SUBREG_REG (temp));
+   else
+ convert_move (SUBREG_REG (target), temp, unsignedp);
  }
else if (nontemporal && emit_storent_insn (target, temp))
  ;
diff --git a/gcc/expr.c b/gcc/expr.c
index f9103a5..15da092 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9210,6 +9210,59 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
 }
 #undef REDUCE_BIT_FIELD
 
+/* Return TRUE if value in SSA is already zero/sign extended for lhs type
+   (type here is the combination of LHS_MODE and LHS_UNS) using value range
+   information stored.  Return FALSE otherwise.  */
+bool
+is_promoted_for_type (tree ssa, enum machine_mode lhs_mode, bool lhs_uns)
+{
+  wide_int type_min, type_max;
+  wide_int min, max, limit;
+  unsigned int prec;
+  tree lhs_type;
+  bool rhs_uns;
+
+  if (flag_wrapv
+  || !flag_strict_overflow
+  || ssa == NULL_TREE
+  || TREE_CODE (ssa) != SSA_NAME
+  || !INTEGRAL_TYPE_P (TREE_TY
