[PATCH, AArch64] support extension option 'crc' in -march and -mcpu

2013-09-04 Thread Yufeng Zhang

Hi,

This patch adds support for the 'crc' extension option to the AArch64 
gcc driver.
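
For example, with this patch the extension can be enabled from the 
command line like so (illustrative invocations):

  $ gcc -march=armv8-a+crc -c foo.c
  $ gcc -march=armv8-a+crc+crypto -c foo.c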


OK for the trunk?

Thanks,
Yufeng

gcc/

* config/aarch64/aarch64-option-extensions.def: Add
AARCH64_OPT_EXTENSION of 'crc'.
* config/aarch64/aarch64.h (AARCH64_FL_CRC): New define.
(AARCH64_ISA_CRC): Ditto.
* doc/invoke.texi (-march and -mcpu feature modifiers): Add
description of the CRC extension.
diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index 58e8154..371e74c 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -35,3 +35,4 @@
 AARCH64_OPT_EXTENSION("fp",AARCH64_FL_FP,  AARCH64_FL_FPSIMD | 
AARCH64_FL_CRYPTO)
 AARCH64_OPT_EXTENSION("simd",  AARCH64_FL_FPSIMD,  AARCH64_FL_SIMD | 
AARCH64_FL_CRYPTO)
 AARCH64_OPT_EXTENSION("crypto",AARCH64_FL_CRYPTO | AARCH64_FL_FPSIMD,  
AARCH64_FL_CRYPTO)
+AARCH64_OPT_EXTENSION("crc",   AARCH64_FL_CRC, AARCH64_FL_CRC)
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 0924269..d8012f8 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -158,6 +158,7 @@
 #define AARCH64_FL_FP         (1 << 1)  /* Has FP.  */
 #define AARCH64_FL_CRYPTO     (1 << 2)  /* Has crypto.  */
 #define AARCH64_FL_SLOWMUL    (1 << 3)  /* A slow multiply core.  */
+#define AARCH64_FL_CRC        (1 << 4)  /* Has CRC.  */
 
 /* Has FP and SIMD.  */
 #define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD)
@@ -170,6 +171,7 @@
 
 /* Macros to test ISA flags.  */
 extern unsigned long aarch64_isa_flags;
+#define AARCH64_ISA_CRC    (aarch64_isa_flags & AARCH64_FL_CRC)
 #define AARCH64_ISA_CRYPTO (aarch64_isa_flags & AARCH64_FL_CRYPTO)
 #define AARCH64_ISA_FP (aarch64_isa_flags & AARCH64_FL_FP)
 #define AARCH64_ISA_SIMD   (aarch64_isa_flags & AARCH64_FL_SIMD)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 14955dd..0843178 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -11138,6 +11138,8 @@ Feature modifiers used with @option{-march} and @option{-mcpu} can be one
 of the following:
 
 @table @samp
+@item crc
+Enable CRC extension.
 @item crypto
 Enable Crypto extension.  This implies Advanced SIMD is enabled.
 @item fp
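
For context, a sketch of how the new AARCH64_ISA_CRC macro might be 
consumed later in the backend (a hypothetical follow-up, not part of 
this patch):

/* Hypothetical gating macro in the style of the existing
   AARCH64_ISA_* users; CRC patterns or builtins would test it.  */
#define TARGET_CRC32 (AARCH64_ISA_CRC)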

[PATCH GCC] Tweak gimple-ssa-strength-reduction.c:backtrace_base_for_ref () to cover different cases as seen on AArch64

2013-09-10 Thread Yufeng Zhang

Hi,

Following Bin's patch in 
http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00695.html, this patch 
tweaks backtrace_base_for_ref () to strip off any widening conversion 
after the first TREE_CODE check fails.  Without this patch, the test 
(gcc.dg/tree-ssa/slsr-39.c) in Bin's patch will fail on AArch64, as 
backtrace_base_for_ref () will stop if it does not see an SSA_NAME, 
since the tree code can be NOP_EXPR instead.
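
For illustration, a minimal sketch (assuming a 64-bit target with 
32-bit int) of the kind of source that produces such a widening 
conversion in the address computation:

/* The offset for a[i + 1] is computed in 64 bits, so (i + 1) is
   widened from 'int' first; the base seen by backtrace_base_for_ref
   is then a NOP_EXPR wrapping the computation, not an SSA_NAME.  */
long
f (long *a, int i)
{
  return a[i + 1];    /* offset = (long unsigned int) (i + 1) * 8 */
}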


Regtested on arm and aarch64; still bootstrapping x86_64.

OK for the trunk if the x86_64 bootstrap succeeds?

Thanks,
Yufeng

gcc/

* gimple-ssa-strength-reduction.c (backtrace_base_for_ref): Call
get_unwidened and check 'base_in' again.
diff --git a/gcc/gimple-ssa-strength-reduction.c b/gcc/gimple-ssa-strength-reduction.c
index fea5741..7585164 100644
--- a/gcc/gimple-ssa-strength-reduction.c
+++ b/gcc/gimple-ssa-strength-reduction.c
@@ -769,7 +769,14 @@ backtrace_base_for_ref (tree *pbase)
 
   STRIP_NOPS (base_in);
   if (TREE_CODE (base_in) != SSA_NAME)
-    return tree_to_double_int (integer_zero_node);
+    {
+      /* Strip of widening conversion(s) to handle cases where
+         e.g. 'B' is widened from an 'int' in order to calculate
+         a 64-bit address.  */
+      base_in = get_unwidened (base_in, NULL_TREE);
+      if (TREE_CODE (base_in) != SSA_NAME)
+        return tree_to_double_int (integer_zero_node);
+    }
 
   base_cand = base_cand_from_table (base_in);
 

Re: [PATCH, AArch64] Fix the pointer-typed function argument expansion in aarch64_simd_expand_args

2013-09-10 Thread Yufeng Zhang

Oops, now attach the correct patch and change log.

Thanks,
Yufeng

gcc/

* config/aarch64/aarch64-builtins.c (aarch64_simd_expand_args):
Call convert_memory_address to update op[argc].


On 09/10/13 18:08, Yufeng Zhang wrote:

This patch fixes a number of test failures in gcc.target/aarch64/v*.c in
ILP32.

The corresponding RTL patterns for some load/store builtins have Pmode
(i.e. DImode) specified for their address operands.  However, coming
from a pointer-typed function argument, op[argc] will have SImode in
ILP32.  Instead of duplicating these RTL patterns to cope with SImode
operand (which e.g. would complicate arm_neon.h), we explicitly convert
the operand to Pmode here; an address operand in RTL should have Pmode
anyway.  Note that if op[argc] already has DImode,
convert_memory_address will simply return it.
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 6816b9c..0df5b3b 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -989,6 +989,8 @@ aarch64_simd_expand_args (rtx target, int icode, int have_retval,
 	  switch (thisarg)
 	    {
 	    case SIMD_ARG_COPY_TO_REG:
+	      if (POINTER_TYPE_P (TREE_TYPE (arg[argc])))
+	        op[argc] = convert_memory_address (Pmode, op[argc]);
 	      /*gcc_assert (GET_MODE (op[argc]) == mode[argc]); */
 	      if (!(*insn_data[icode].operand[argc + have_retval].predicate)
 	            (op[argc], mode[argc]))

[PATCH, AArch64] Fix the pointer-typed function argument expansion in aarch64_simd_expand_args

2013-09-10 Thread Yufeng Zhang
This patch fixes a number of test failures in gcc.target/aarch64/v*.c in 
ILP32.


The corresponding RTL patterns for some load/store builtins have Pmode 
(i.e. DImode) specified for their address operands.  However, coming 
from a pointer-typed function argument, op[argc] will have SImode in 
ILP32.  Instead of duplicating these RTL patterns to cope with SImode 
operand (which e.g. would complicate arm_neon.h), we explicitly convert 
the operand to Pmode here; an address operand in RTL should have Pmode 
anyway.  Note that if op[argc] already has DImode, 
convert_memory_address will simply return it.
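
As a concrete illustration (a sketch assuming an ILP32 multilib and 
the usual arm_neon.h intrinsics), the pointer argument 'p' below 
arrives in SImode while the vld1 RTL pattern wants a Pmode (DImode) 
address operand:

#include <arm_neon.h>

/* Under ILP32, 'p' is a 32-bit value; with the patch,
   convert_memory_address widens op[argc] to DImode before the
   insn predicate is checked.  */
int32x4_t
load4 (int32_t *p)
{
  return vld1q_s32 (p);
}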


OK for the trunk?

Thanks,
Yufeng

gcc/

* gimple-ssa-strength-reduction.c (backtrace_base_for_ref): Call
get_unwidened and check 'base_in' again.
diff --git a/gcc/gimple-ssa-strength-reduction.c b/gcc/gimple-ssa-strength-reduction.c
index fea5741..7585164 100644
--- a/gcc/gimple-ssa-strength-reduction.c
+++ b/gcc/gimple-ssa-strength-reduction.c
@@ -769,7 +769,14 @@ backtrace_base_for_ref (tree *pbase)
 
   STRIP_NOPS (base_in);
   if (TREE_CODE (base_in) != SSA_NAME)
-    return tree_to_double_int (integer_zero_node);
+    {
+      /* Strip of widening conversion(s) to handle cases where
+         e.g. 'B' is widened from an 'int' in order to calculate
+         a 64-bit address.  */
+      base_in = get_unwidened (base_in, NULL_TREE);
+      if (TREE_CODE (base_in) != SSA_NAME)
+        return tree_to_double_int (integer_zero_node);
+    }
 
   base_cand = base_cand_from_table (base_in);
 

Re: [Patch, AArch64, ILP32] 1/5 Initial support - configury changes

2013-09-18 Thread Yufeng Zhang

On 09/18/13 11:21, Andreas Schwab wrote:

Yufeng Zhang  writes:


 (ASM_SPEC): Update to also substitute -mabi.


You should check that the assembler actually understands that option.
Currently it is impossible to build an aarch64-linux compiler with
binutils from the binutils-2_23 branch.


Ah, I didn't think too much back then about the backward compatibility 
issue with binutils-2_23.  One potential solution could be to backport 
part of the configury change from binutils trunk to 2_23.  May I ask if 
there is any particular reason that you wouldn't like to build the 
trunk compiler with the trunk binutils?  Also, since the binutils 2.24 
release should not be far away, would you be happy to wait for that instead?


Thanks,
Yufeng



Re: [PATCH, AArch64] Fix the pointer-typed function argument expansion in aarch64_simd_expand_args

2013-09-19 Thread Yufeng Zhang

Ping~

Thanks,
Yufeng

http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00774.html


On 09/10/13 18:12, Yufeng Zhang wrote:

Oops, now attach the correct patch and change log.

Thanks,
Yufeng

gcc/

* config/aarch64/aarch64-builtins.c (aarch64_simd_expand_args):
Call convert_memory_address to update op[argc].


On 09/10/13 18:08, Yufeng Zhang wrote:

This patch fixes a number of test failures in gcc.target/aarch64/v*.c in
ILP32.

The corresponding RTL patterns for some load/store builtins have Pmode
(i.e. DImode) specified for their address operands.  However, coming
from a pointer-typed function argument, op[argc] will have SImode in
ILP32.  Instead of duplicating these RTL patterns to cope with SImode
operand (which e.g. would complicate arm_neon.h), we explicitly convert
the operand to Pmode here; an address operand in RTL should have Pmode
anyway.  Note that if op[argc] already has DImode,
convert_memory_address will simply return it.





Re: [PATCH GCC]Catch more MEM_REFs sharing common addressing part in gimple strength reduction

2013-09-23 Thread Yufeng Zhang

On 09/18/13 02:26, bin.cheng wrote:




-Original Message-
From: Dominique Dhumieres [mailto:domi...@lps.ens.fr]
Sent: Wednesday, September 18, 2013 1:47 AM
To: gcc-patches@gcc.gnu.org
Cc: hjl.to...@gmail.com; Bin Cheng
Subject: Re: [PATCH GCC]Catch more MEM_REFs sharing common
addressing part in gimple strength reduction

The new test gcc.dg/tree-ssa/slsr-39.c fails in 64 bit mode (see
http://gcc.gnu.org/ml/gcc-regression/2013-09/msg00455.html ).
Looking for MEM in the dump returns

   _12 = MEM[(int[50] *)_17];
   MEM[(int[50] *)_20] = _13;



Thanks for reporting, I think this can be fixed by patch:
http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00761.html


Just a quick update on the patch.  The proposed patch didn't pass the 
x86_64 bootstrap, and I'm working on a better fix.


Thanks,
Yufeng



Re: [PATCH, ARM] Fix PR target/58423

2013-09-23 Thread Yufeng Zhang

On 09/23/13 07:58, Zhenqiang Chen wrote:

--- clean-trunk/gcc/config/arm/arm.c2013-09-17 14:29:45.632457018 +0800
+++ pr58423/gcc/config/arm/arm.c2013-09-18 14:34:24.708892318 +0800
@@ -17645,8 +17645,8 @@
mem = gen_frame_mem (DImode, stack_pointer_rtx);

  tmp = gen_rtx_SET (DImode, gen_rtx_REG (DImode, j), mem);
-RTX_FRAME_RELATED_P (tmp) = 1;
  tmp = emit_insn (tmp);
+RTX_FRAME_RELATED_P (tmp) = 1;


The indent doesn't seem right.

Yufeng



Re: [PATCH GCC] Tweak gimple-ssa-strength-reduction.c:backtrace_base_for_ref () to cover different cases as seen on AArch64

2013-09-25 Thread Yufeng Zhang

Hello,

Please find the updated version of the patch in the attachment.  It has 
addressed the previous comments and also included some changes in order 
to pass the bootstrapping on x86_64.


It's also passed the regtest on arm-none-eabi and aarch64-none-elf.

It will also fix the test failure as reported here:
http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01317.html

OK for the trunk?

Thanks,
Yufeng


gcc/

* gimple-ssa-strength-reduction.c (safe_to_multiply_p): New
function.
(backtrace_base_for_ref): Call get_unwidened, check 'base_in'
again and set unwidened_p to true; call safe_to_multiply_p to
avoid unsafe unwidened cases.

gcc/testsuite/

* gcc.dg/tree-ssa/slsr-40.c: New test.



On 09/11/13 13:39, Bill Schmidt wrote:

On Wed, 2013-09-11 at 10:32 +0200, Richard Biener wrote:

On Tue, Sep 10, 2013 at 5:53 PM, Yufeng Zhang  wrote:

Hi,

Following Bin's patch in
http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00695.html, this patch tweaks
backtrace_base_for_ref () to strip of any widening conversion after the
first TREE_CODE check fails.  Without this patch, the test
(gcc.dg/tree-ssa/slsr-39.c) in Bin's patch will fail on AArch64, as
backtrace_base_for_ref () will stop if not seeing an ssa_name since the tree
code can be nop_expr instead.

Regtested on arm and aarch64; still bootstrapping x86_64.

OK for the trunk if the x86_64 bootstrap succeeds?


Please add a testcase.


Also, the comment "Strip of" should read "Strip off".  Otherwise I have
no comments.

Thanks,
Bill



Richard.
diff --git a/gcc/gimple-ssa-strength-reduction.c b/gcc/gimple-ssa-strength-reduction.c
index 8d48add..1c04382 100644
--- a/gcc/gimple-ssa-strength-reduction.c
+++ b/gcc/gimple-ssa-strength-reduction.c
@@ -750,6 +750,40 @@ slsr_process_phi (gimple phi, bool speed)
   add_cand_for_stmt (phi, c);
 }
 
+/* Utility function for backtrace_base_for_ref.
+
+   Given
+
+ T2 = T2' + CST
+ RES = (wider_type) T2 * C3
+
+   where TYPE is TREE_TYPE (T2), this function returns false when it is
+   _not_ safe to carry out the following transformation.
+
+ RES = (wider_type) T2' * C3 + (wider_type) CST * C3
+
+   One example unsafe case is:
+
+ int array[40];
+ array[n - 1]
+
+   where n is a 32-bit unsigned int and pointers are 64 bits long.  In this
+   case, the gimple for (n - 1) is:
+
+ _2 = n_1(D) + 4294967295; // 0xffffffff
+
+   and it is wrong to multiply the large constant by 4 in the 64-bit space.  */
+
+static bool
+safe_to_multiply_p (tree type, double_int cst)
+{
+  if (TYPE_UNSIGNED (type)
+  && ! double_int_fits_to_tree_p (signed_type_for (type), cst))
+return false;
+
+  return true;
+}
+
 /* Given PBASE which is a pointer to tree, look up the defining
statement for it and check whether the candidate is in the
form of:
@@ -766,10 +800,19 @@ backtrace_base_for_ref (tree *pbase)
 {
   tree base_in = *pbase;
   slsr_cand_t base_cand;
+  bool unwidened_p = false;
 
   STRIP_NOPS (base_in);
   if (TREE_CODE (base_in) != SSA_NAME)
-    return tree_to_double_int (integer_zero_node);
+    {
+      /* Strip off widening conversion(s) to handle cases where
+         e.g. 'B' is widened from an 'int' in order to calculate
+         a 64-bit address.  */
+      base_in = get_unwidened (base_in, NULL_TREE);
+      if (TREE_CODE (base_in) != SSA_NAME)
+        return tree_to_double_int (integer_zero_node);
+      unwidened_p = true;
+    }
 
   base_cand = base_cand_from_table (base_in);
 
@@ -777,7 +820,10 @@ backtrace_base_for_ref (tree *pbase)
 {
   if (base_cand->kind == CAND_ADD
  && base_cand->index.is_one ()
- && TREE_CODE (base_cand->stride) == INTEGER_CST)
+ && TREE_CODE (base_cand->stride) == INTEGER_CST
+	  && (! unwidened_p
+	      || safe_to_multiply_p (TREE_TYPE (base_cand->stride),
+				     tree_to_double_int (base_cand->stride))))
{
  /* X = B + (1 * S), S is integer constant.  */
  *pbase = base_cand->base_expr;
@@ -785,8 +831,11 @@ backtrace_base_for_ref (tree *pbase)
}
   else if (base_cand->kind == CAND_ADD
   && TREE_CODE (base_cand->stride) == INTEGER_CST
-  && integer_onep (base_cand->stride))
-{
+  && integer_onep (base_cand->stride)
+  && (! unwidened_p
+  || safe_to_multiply_p (TREE_TYPE (base_cand->base_expr),
+ base_cand->index)))
+   {
  /* X = B + (i * S), S is integer one.  */
  *pbase = base_cand->base_expr;
  return base_cand->index;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/slsr-40.c 
b/gcc/testsuite/gcc.dg/tree-ssa/slsr-40.c
new file mode 100644
index 000..72726a3
--- /dev/null
+++ b/gcc/

Re: [Ping] [PATCH GCC] Tweak gimple-ssa-strength-reduction.c:backtrace_base_for_ref () to cover different cases as seen on AArch64

2013-10-01 Thread Yufeng Zhang

Ping~

Thanks,
Yufeng

On 09/25/13 12:37, Yufeng Zhang wrote:

Hello,

Please find the updated version of the patch in the attachment.  It has
addressed the previous comments and also included some changes in order
to pass the bootstrapping on x86_64.

It's also passed the regtest on arm-none-eabi and aarch64-none-elf.

It will also fix the test failure as reported here:
http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01317.html

OK for the trunk?

Thanks,
Yufeng


gcc/

  * gimple-ssa-strength-reduction.c (safe_to_multiply_p): New
  function.
  (backtrace_base_for_ref): Call get_unwidened, check 'base_in'
  again and set unwidened_p to true; call safe_to_multiply_p to
  avoid unsafe unwidened cases.

gcc/testsuite/

  * gcc.dg/tree-ssa/slsr-40.c: New test.



On 09/11/13 13:39, Bill Schmidt wrote:

On Wed, 2013-09-11 at 10:32 +0200, Richard Biener wrote:

On Tue, Sep 10, 2013 at 5:53 PM, Yufeng Zhang   wrote:

Hi,

Following Bin's patch in
http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00695.html, this patch tweaks
backtrace_base_for_ref () to strip of any widening conversion after the
first TREE_CODE check fails.  Without this patch, the test
(gcc.dg/tree-ssa/slsr-39.c) in Bin's patch will fail on AArch64, as
backtrace_base_for_ref () will stop if not seeing an ssa_name since the tree
code can be nop_expr instead.

Regtested on arm and aarch64; still bootstrapping x86_64.

OK for the trunk if the x86_64 bootstrap succeeds?


Please add a testcase.


Also, the comment "Strip of" should read "Strip off".  Otherwise I have
no comments.

Thanks,
Bill



Richard.





Re: [PATCH GCC] Tweak gimple-ssa-strength-reduction.c:backtrace_base_for_ref () to cover different cases as seen on AArch64

2013-10-01 Thread Yufeng Zhang

Hi Bill,

Thank you for the review and the offer to help.

On 10/01/13 15:36, Bill Schmidt wrote:

On Tue, 2013-10-01 at 08:17 -0500, Bill Schmidt wrote:

On Tue, 2013-10-01 at 12:19 +0200, Richard Biener wrote:

On Wed, Sep 25, 2013 at 1:37 PM, Yufeng Zhang  wrote:

Hello,

Please find the updated version of the patch in the attachment.  It has
addressed the previous comments and also included some changes in order to
pass the bootstrapping on x86_64.

It's also passed the regtest on arm-none-eabi and aarch64-none-elf.

It will also fix the test failure as reported here:
http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01317.html

OK for the trunk?


+   where n is a 32-bit unsigned int and pointers are 64 bits long.  In this
+   case, the gimple for (n - 1) is:
+
+ _2 = n_1(D) + 4294967295; // 0xffffffff
+
+   and it is wrong to multiply the large constant by 4 in the 64-bit space.  */
+
+static bool
+safe_to_multiply_p (tree type, double_int cst)
+{
+  if (TYPE_UNSIGNED (type)
+&&  ! double_int_fits_to_tree_p (signed_type_for (type), cst))
+return false;
+
+  return true;
+}

This looks wrong.  The only relevant check is whether the
multiplication overflows the original type, as you miss the implicit
truncation that happens.  Which is something you don't know
unless you know the value.  It definitely isn't a property of a type
and a constant but the property of two constants and a type.
Or the predicate has a wrong name.

The use of get_unwidened in this core routine looks like this is
all happening in the wrong place and we should have picked up
another candidate for this instead?  I'm sure Bill will know more here.


I'm not happy with how this patch is progressing.  Without having looked
too deeply, this might be better handled earlier when determining which
casts are safe to use in building candidates.  What you have here seems
more like closing the barn door after the horse got out.  Maybe that's
the only solution, but it doesn't seem likely.

Another problem is that your test case isn't testing anything except
that the compiler doesn't crash.  That isn't sufficient as a regression
test.

I'll spend some time looking at this to see if I can find a better
approach.  It might be a day or two before I can get to it.  In addition
to the included test case, are there any other cases you've found that I
should be concerned with?


To help me investigate this without having to build a cross compiler,
could you please compile your test case (without the patch applied)
using -fdump-tree-reassoc2 -fdump-tree-slsr-details and send me the
generated dump files?


The issue is not specific to AArch64; please find the attached dumps 
generated from the x86-64 gcc by compiling gcc.dg/tree-ssa/slsr-39.c.


W.r.t. your comment in the other email about adding a test to verify the 
expected gimple, I think the existing test gcc.dg/tree-ssa/slsr-39.c is 
sufficient.  The test currently fails on both AArch64 and x86-64, and 
presumably also fails on any other 64-bit target where pointers are 
64-bit and ints are 32-bit.  The patch I proposed fixes this issue, and 
gcc.dg/tree-ssa/slsr-39.c itself should be a good regression test (with 
specific verification on the gimple IR).


The new test proposed in this patch is to catch the issue my original 
patch had: a runtime failure due to incorrect optimization.
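
To make the failure mode concrete, here is a worked example (assuming 
64-bit pointers, 8-byte elements and n == 1, as in the new test):

  gimple expresses n - 1 as n + 4294967295 (unsigned 32-bit wrap), so

  correct offset:      ((1 + 0xffffffff) mod 2^32) * 8 = 0
  after bad transform: 1 * 8 + 0xffffffff * 8
                     = 8 + 0x7fffffff8 = 0x800000000

i.e. the access would land far outside gData instead of at elms[0].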


I'll address other comments in separate emails.

Thanks,
Yufeng
;; Function foo (foo, funcdef_no=0, decl_uid=1722, symbol_order=0)

;; 1 loops found
;;
;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2
;; 2 succs { 1 }
foo (int[50] * a2, int v1)
{
  int j;
  long unsigned int _3;
  long unsigned int _4;
  int[50] * _6;
  int _11;
  int _12;
  int _13;

  :
  j_2 = v1_1(D) + 5;
  _3 = (long unsigned int) j_2;
  _4 = _3 * 200;
  _6 = a2_5(D) + _4;
  j_7 = v1_1(D) + 6;
  *_6[j_2] = j_2;
  *_6[j_7] = j_2;
  _11 = v1_1(D) + 4;
  _12 = *_6[_11];
  _13 = _12 + 1;
  *_6[_11] = _13;
  return;

}



;; Function foo (foo, funcdef_no=0, decl_uid=1722, symbol_order=0)

;; 1 loops found
;;
;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2
;; 2 succs { 1 }

Strength reduction candidate vector:

  1  [2] j_2 = v1_1(D) + 5;
 ADD  : v1_1(D) + (5 * 1) : int
 basis: 0  dependent: 5  sibling: 0
 next-interp: 0  dead-savings: 0

  2  [2] _3 = (long unsigned int) j_2;
 ADD  : v1_1(D) + (5 * 1) : long unsigned int
 basis: 0  dependent: 0  sibling: 0
 next-interp: 0  dead-savings: 0

  3  [2] _4 = _3 * 200;
 MULT : (v1_1(D) + 5) * 200 : long unsigned int
 basis: 0  dependent: 0  sibling: 0
 next-interp: 0  dead-savings: 1

  4  [2] _6 = a2_5(D) + _4;
 ADD  : a2_5(D) + (1 * _4) : int[50] *
 basis: 0  dependent: 0  sibling: 0
 next-interp: 0  dead-savings: 0

  5  [2] j_7 = v1_1(D) + 6;
 ADD  : v1_1(D) + (6 * 1) : int
 basis: 1  dependent: 8  sibling: 0

Re: [PATCH GCC] Tweak gimple-ssa-strength-reduction.c:backtrace_base_for_ref () to cover different cases as seen on AArch64

2013-10-01 Thread Yufeng Zhang

On 10/01/13 20:55, Bill Schmidt wrote:



On Tue, 2013-10-01 at 11:56 -0500, Bill Schmidt wrote:

OK, thanks.  The problem that you've encountered is that you are
attempting to do something illegal. ;)  (Bin's original patch is
actually to blame for that, as well as me for not catching it then.)

As your new test shows, it is unsafe to do the transformation in
backtrace_base_for_ref when widening from an unsigned type, because the
unsigned type has wrap semantics by default.  (The actual test must be
done on TYPE_OVERFLOW_WRAPS since this wrap semantics can be added or
removed by compile option -- see the comments with legal_cast_p and
legal_cast_p_1 later in the module.)

You cannot in general prove that the transformation is allowable for a
specific constant, because you don't know that what you're adding it to
won't cause an overflow that's handled incorrectly.

I believe the correct fix for the unsigned-overflow case is to fail
backtrace_base_for_ref if legal_cast_p (in_type, out_type) returns
false, where in_type is the type of the new *PBASE, and out_type is the
widening type that you're looking through.  So you can't just
STRIP_NOPS, you have to check the cast for legitimacy for this
transformation.

This does not explain why backtrace_base_for_ref does not find all the
opportunities on slsr-39.c.  I don't immediately see what's preventing
that.  Note that the transformation is legal in that case because you
are widening from a signed int to an unsigned int, which won't cause
problems.  You guys need to dig deeper into why those opportunities are
missed when sizetype is larger than int.  Let me know if you need help
figuring it out.


Sorry, I had to leave before and wanted to get this response back to you
in case I didn't get back soon.  I've looked at this some more, and your
general approach should work ok once you get the legal_cast_p check in
place where you do the get_unwidened call now.  Once you know you have a
legal widening, you don't have to worry about the safe_to_multiply_p
stuff.  I.e., you don't need the last two chunks in the patch to
backtrace_base_for_ref, and you don't need the unwidened_p variable.  It
should all fall out properly by just restricting your unwidening to
legal casts.


Many thanks for looking into the issue so promptly.  I've updated the 
patch; I have to use legal_cast_p_1 instead, as the gimple node is no 
longer available at that point.


Does the new patch look sane?

The regtest on aarch64 and bootstrapping on x86-64 are still running.

Thanks,
Yufeng


gcc/

* gimple-ssa-strength-reduction.c (legal_cast_p_1): Forward
declaration.
(backtrace_base_for_ref): Call get_unwidened with 'base_in' if
'base_in' represents a conversion and legal_cast_p_1 holds; set
'base_in' to the value returned from get_unwidened.

gcc/testsuite/

* gcc.dg/tree-ssa/slsr-40.c: New test.
diff --git a/gcc/gimple-ssa-strength-reduction.c b/gcc/gimple-ssa-strength-reduction.c
index 139a4a1..a558f34 100644
--- a/gcc/gimple-ssa-strength-reduction.c
+++ b/gcc/gimple-ssa-strength-reduction.c
@@ -379,6 +379,7 @@ static bool address_arithmetic_p;
 /* Forward function declarations.  */
 static slsr_cand_t base_cand_from_table (tree);
 static tree introduce_cast_before_cand (slsr_cand_t, tree, tree);
+static bool legal_cast_p_1 (tree, tree);
 
 /* Produce a pointer to the IDX'th candidate in the candidate vector.  */
 
@@ -768,6 +769,14 @@ backtrace_base_for_ref (tree *pbase)
   slsr_cand_t base_cand;
 
   STRIP_NOPS (base_in);
+
+  /* Strip off widening conversion(s) to handle cases where
+     e.g. 'B' is widened from an 'int' in order to calculate
+     a 64-bit address.  */
+  if (CONVERT_EXPR_P (base_in)
+      && legal_cast_p_1 (base_in, TREE_OPERAND (base_in, 0)))
+    base_in = get_unwidened (base_in, NULL_TREE);
+
   if (TREE_CODE (base_in) != SSA_NAME)
 return tree_to_double_int (integer_zero_node);
 
@@ -786,7 +795,7 @@ backtrace_base_for_ref (tree *pbase)
   else if (base_cand->kind == CAND_ADD
   && TREE_CODE (base_cand->stride) == INTEGER_CST
   && integer_onep (base_cand->stride))
-{
+   {
  /* X = B + (i * S), S is integer one.  */
  *pbase = base_cand->base_expr;
  return base_cand->index;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/slsr-40.c 
b/gcc/testsuite/gcc.dg/tree-ssa/slsr-40.c
new file mode 100644
index 000..72726a3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/slsr-40.c
@@ -0,0 +1,27 @@
+/* Verify straight-line strength reduction for array
+   subscripting.
+
+   elems[n-1] is reduced to elems + n * 4 + 0xffffffff * 4, only when
+   pointers are of the same size as that of int (assuming 4 bytes).  */
+
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+struct data
+{
+  unsigned long elms[1];
+} gData;
+
+void __attribute__((noinline))
+foo (struct data *dst, unsigned int n)
+{
+  dst->elms[n - 1] &= 1;
+}
+
+int
+main ()
+{
+  foo (&gData, 1);
+  return

Re: [PATCH GCC] Tweak gimple-ssa-strength-reduction.c:backtrace_base_for_ref () to cover different cases as seen on AArch64

2013-10-02 Thread Yufeng Zhang

On 10/02/13 02:21, Bill Schmidt wrote:

On Tue, 2013-10-01 at 23:57 +0100, Yufeng Zhang wrote:

On 10/01/13 20:55, Bill Schmidt wrote:



On Tue, 2013-10-01 at 11:56 -0500, Bill Schmidt wrote:

OK, thanks.  The problem that you've encountered is that you are
attempting to do something illegal. ;)  (Bin's original patch is
actually to blame for that, as well as me for not catching it then.)

As your new test shows, it is unsafe to do the transformation in
backtrace_base_for_ref when widening from an unsigned type, because the
unsigned type has wrap semantics by default.  (The actual test must be
done on TYPE_OVERFLOW_WRAPS since this wrap semantics can be added or
removed by compile option -- see the comments with legal_cast_p and
legal_cast_p_1 later in the module.)

You cannot in general prove that the transformation is allowable for a
specific constant, because you don't know that what you're adding it to
won't cause an overflow that's handled incorrectly.

I believe the correct fix for the unsigned-overflow case is to fail
backtrace_base_for_ref if legal_cast_p (in_type, out_type) returns
false, where in_type is the type of the new *PBASE, and out_type is the
widening type that you're looking through.  So you can't just
STRIP_NOPS, you have to check the cast for legitimacy for this
transformation.

This does not explain why backtrace_base_for_ref does not find all the
opportunities on slsr-39.c.  I don't immediately see what's preventing
that.  Note that the transformation is legal in that case because you
are widening from a signed int to an unsigned int, which won't cause
problems.  You guys need to dig deeper into why those opportunities are
missed when sizetype is larger than int.  Let me know if you need help
figuring it out.


Sorry, I had to leave before and wanted to get this response back to you
in case I didn't get back soon.  I've looked at this some more, and your
general approach should work ok once you get the legal_cast_p check in
place where you do the get_unwidened call now.  Once you know you have a
legal widening, you don't have to worry about the safe_to_multiply_p
stuff.  I.e., you don't need the last two chunks in the patch to
backtrace_base_for_ref, and you don't need the unwidened_p variable.  It
should all fall out properly by just restricting your unwidening to
legal casts.


Many thanks for looking into the issue so promptly.  I've updated the
patch; I have to use legal_cast_p_1 instead as the gimple node is no
longer available by then.

Does the new patch look sane?


Yes, much better.  I'm happy with this approach.


Great!  The regtest and bootstrap all passed so I've committed the patch.


However, please
restore the correct whitespace before the { at -786,7 +795,7.


This is actually a correction to the whitespace.  I've split the patch 
and committed it separately.


Thanks again for helping out!

Regards,
Yufeng



Re: [PATCH GCC] Tweak gimple-ssa-strength-reduction.c:backtrace_base_for_ref () to cover different cases as seen on AArch64

2013-10-02 Thread Yufeng Zhang

On 10/02/13 13:40, Bill Schmidt wrote:

On Tue, 2013-10-01 at 20:21 -0500, Bill Schmidt wrote:

On Tue, 2013-10-01 at 23:57 +0100, Yufeng Zhang wrote:

On 10/01/13 20:55, Bill Schmidt wrote:



On Tue, 2013-10-01 at 11:56 -0500, Bill Schmidt wrote:

OK, thanks.  The problem that you've encountered is that you are
attempting to do something illegal. ;)  (Bin's original patch is
actually to blame for that, as well as me for not catching it then.)

As your new test shows, it is unsafe to do the transformation in
backtrace_base_for_ref when widening from an unsigned type, because the
unsigned type has wrap semantics by default.  (The actual test must be
done on TYPE_OVERFLOW_WRAPS since this wrap semantics can be added or
removed by compile option -- see the comments with legal_cast_p and
legal_cast_p_1 later in the module.)

You cannot in general prove that the transformation is allowable for a
specific constant, because you don't know that what you're adding it to
won't cause an overflow that's handled incorrectly.

I believe the correct fix for the unsigned-overflow case is to fail
backtrace_base_for_ref if legal_cast_p (in_type, out_type) returns
false, where in_type is the type of the new *PBASE, and out_type is the
widening type that you're looking through.  So you can't just
STRIP_NOPS, you have to check the cast for legitimacy for this
transformation.

This does not explain why backtrace_base_for_ref does not find all the
opportunities on slsr-39.c.  I don't immediately see what's preventing
that.  Note that the transformation is legal in that case because you
are widening from a signed int to an unsigned int, which won't cause
problems.  You guys need to dig deeper into why those opportunities are
missed when sizetype is larger than int.  Let me know if you need help
figuring it out.


Sorry, I had to leave before and wanted to get this response back to you
in case I didn't get back soon.  I've looked at this some more, and your
general approach should work ok once you get the legal_cast_p check in
place where you do the get_unwidened call now.  Once you know you have a
legal widening, you don't have to worry about the safe_to_multiply_p
stuff.  I.e., you don't need the last two chunks in the patch to
backtrace_base_for_ref, and you don't need the unwidened_p variable.  It
should all fall out properly by just restricting your unwidening to
legal casts.


Many thanks for looking into the issue so promptly.  I've updated the
patch; I have to use legal_cast_p_1 instead as the gimple node is no
longer available by then.

Does the new patch look sane?


Yes, much better.  I'm happy with this approach.  However, please
restore the correct whitespace before the { at -786,7 +795,7.

Thanks for fixing this up!

Bill


(Just a reminder that I can't approve your patch; you need a maintainer
for that.  But it looks good to me.)


Oops.  I didn't realise that and I just saw your email. :( Sorry...

Can Richard please do a retro-approval?


Sometime when I get a moment I'm probably going to change this to handle
the casting when the candidates are added to the table.


Indeed, that will be a cleaner approach.

Thanks,
Yufeng



[PATCH] Generate fused widening multiply-and-accumulate operations only when the widening multiply has single use

2013-10-21 Thread Yufeng Zhang

Hi,

This patch changes the widening_mul pass to fuse the widening multiply 
with accumulate only when the multiply has a single use.  The widening_mul 
pass currently does the conversion regardless of the number of uses, 
which can cause poor code-gen in cases like the following:


typedef int ArrT [10][10];

void
foo (ArrT Arr, int Idx)
{
  Arr[Idx][Idx] = 1;
  Arr[Idx + 10][Idx] = 2;
}

On AArch64, after widening_mul, the IR is like

  _2 = (long unsigned int) Idx_1(D);
  _3 = Idx_1(D) w* 40;                            <----
  _5 = Arr_4(D) + _3;
  *_5[Idx_1(D)] = 1;
  _8 = WIDEN_MULT_PLUS_EXPR <Idx_1(D), 40, 400>;  <----
  _9 = Arr_4(D) + _8;
  *_9[Idx_1(D)] = 2;

Where the arrows point, there are redundant widening multiplies.
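
For comparison, a sketch of the IR shape one would prefer here 
(hypothetical: the single widening multiply is kept and reused, since 
(Idx + 10) * 40 == Idx * 40 + 400):

  _2 = (long unsigned int) Idx_1(D);
  _3 = Idx_1(D) w* 40;
  _5 = Arr_4(D) + _3;
  *_5[Idx_1(D)] = 1;
  _8 = _3 + 400;
  _9 = Arr_4(D) + _8;
  *_9[Idx_1(D)] = 2;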

Bootstrap successfully on x86_64.

The patch passes the regtest on aarch64, arm and x86_64.

OK for the trunk?

Thanks,
Yufeng

p.s. Note that x86_64 doesn't suffer from this issue as the 
corresponding widening multiply accumulate op is not available on the 
target.


gcc/

* tree-ssa-math-opts.c (convert_plusminus_to_widen): Call
has_single_use () and do not perform the conversion if it
returns false for the multiplication result.
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index f7f8ec9..d316990 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2425,12 +2425,16 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
 
  It might also appear that it would be sufficient to use the existing
  operands of the widening multiply, but that would limit the choice of
- multiply-and-accumulate instructions.  */
+ multiply-and-accumulate instructions.
+
+ If the widened-multiplication result has more than one use, it is
+ probably wiser not to do the conversion.  */
   if (code == PLUS_EXPR
   && (rhs1_code == MULT_EXPR || rhs1_code == WIDEN_MULT_EXPR))
 {
       if (!is_widening_mult_p (rhs1_stmt, &type1, &mult_rhs1,
-                               &type2, &mult_rhs2)
+                               &type2, &mult_rhs2)
+          || !has_single_use (rhs1))
         return false;
   add_rhs = rhs2;
   conv_stmt = conv1_stmt;
@@ -2438,7 +2442,8 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   else if (rhs2_code == MULT_EXPR || rhs2_code == WIDEN_MULT_EXPR)
 {
       if (!is_widening_mult_p (rhs2_stmt, &type1, &mult_rhs1,
-                               &type2, &mult_rhs2)
+                               &type2, &mult_rhs2)
+          || !has_single_use (rhs2))
         return false;
   add_rhs = rhs1;
   conv_stmt = conv2_stmt;

Re: [PATCH] Generate fused widening multiply-and-accumulate operations only when the widening multiply has single use

2013-10-23 Thread Yufeng Zhang

Hi,

Thank you both for the reviewing.  I've updated the patch and also added 
a test (to the gcc.dg to avoid duplication).  I'll commit the patch shortly.


Thanks,
Yufeng

gcc/

* tree-ssa-math-opts.c (convert_plusminus_to_widen): Call
has_single_use () and do not perform the conversion if it
returns false for the multiplication result.

gcc/testsuite/

* gcc.dg/wmul-1.c: New test.


On 10/23/13 10:42, Richard Biener wrote:

On Tue, Oct 22, 2013 at 12:01 AM, Yufeng Zhang  wrote:

Hi,

This patch changes the widening_mul pass to fuse the widening multiply with
accumulate only when the multiply has a single use.  The widening_mul pass
currently does the conversion regardless of the number of uses, which
can cause poor code-gen in cases like the following:

typedef int ArrT [10][10];

void
foo (ArrT Arr, int Idx)
{
   Arr[Idx][Idx] = 1;
   Arr[Idx + 10][Idx] = 2;
}

On AArch64, after widening_mul, the IR is like

    _2 = (long unsigned int) Idx_1(D);
    _3 = Idx_1(D) w* 40;                            <----
    _5 = Arr_4(D) + _3;
    *_5[Idx_1(D)] = 1;
    _8 = WIDEN_MULT_PLUS_EXPR <Idx_1(D), 40, 400>;  <----
    _9 = Arr_4(D) + _8;
    *_9[Idx_1(D)] = 2;

Where the arrows point, there are redundant widening multiplies.

Bootstrap successfully on x86_64.

The patch passes the regtest on aarch64, arm and x86_64.

OK for the trunk?


if (!is_widening_mult_p (rhs1_stmt,&type1,&mult_rhs1,
-&type2,&mult_rhs2))
+&type2,&mult_rhs2)
+  || !has_single_use (rhs1))

please check has_single_use first, it's the cheaper check.

Ok with that change (and possibly a testcase).

Thanks,
Richard.




Thanks,
Yufeng

p.s. Note that x86_64 doesn't suffer from this issue as the corresponding
widening multiply accumulate op is not available on the target.

gcc/

 * tree-ssa-math-opts.c (convert_plusminus_to_widen): Call
 has_single_use () and do not perform the conversion if it
 returns false for the multiplication result.


diff --git a/gcc/testsuite/gcc.dg/wmul-1.c b/gcc/testsuite/gcc.dg/wmul-1.c
new file mode 100644
index 000..3e762f4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/wmul-1.c
@@ -0,0 +1,19 @@
+/* Do not fuse the widening multiply with accumulate if the multiply has
+   more than one use.
+   Note that for targets where pointer and int are of the same size or
+   widening multiply-and-accumulate is not available, this test just
+   passes.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-widening_mul" } */
+
+typedef int ArrT [10][10];
+
+void
+foo (ArrT Arr, int Idx)
+{
+  Arr[Idx][Idx] = 1;
+  Arr[Idx + 10][Idx] = 2;
+}
+
+/* { dg-final { scan-tree-dump-not "WIDEN_MULT_PLUS_EXPR" "widening_mul" } } */
+/* { dg-final { cleanup-tree-dump "widening_mul" } } */
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index f7f8ec9..77701ae 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2425,20 +2425,25 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
 
  It might also appear that it would be sufficient to use the existing
  operands of the widening multiply, but that would limit the choice of
- multiply-and-accumulate instructions.  */
+ multiply-and-accumulate instructions.
+
+ If the widened-multiplication result has more than one use, it is
+ probably wiser not to do the conversion.  */
   if (code == PLUS_EXPR
   && (rhs1_code == MULT_EXPR || rhs1_code == WIDEN_MULT_EXPR))
 {
-  if (!is_widening_mult_p (rhs1_stmt, &type1, &mult_rhs1,
-                           &type2, &mult_rhs2))
+  if (!has_single_use (rhs1)
+      || !is_widening_mult_p (rhs1_stmt, &type1, &mult_rhs1,
+                              &type2, &mult_rhs2))
return false;
   add_rhs = rhs2;
   conv_stmt = conv1_stmt;
 }
   else if (rhs2_code == MULT_EXPR || rhs2_code == WIDEN_MULT_EXPR)
 {
-  if (!is_widening_mult_p (rhs2_stmt, &type1, &mult_rhs1,
-                           &type2, &mult_rhs2))
+  if (!has_single_use (rhs2)
+      || !is_widening_mult_p (rhs2_stmt, &type1, &mult_rhs1,
+                              &type2, &mult_rhs2))
return false;
   add_rhs = rhs1;
   conv_stmt = conv2_stmt;

Re: [PATCH] Generate fused widening multiply-and-accumulate operations only when the widening multiply has single use

2013-10-24 Thread Yufeng Zhang

On 10/24/13 01:29, Richard Henderson wrote:

On 10/21/2013 03:01 PM, Yufeng Zhang wrote:


This patch changes the widening_mul pass to fuse the widening multiply with
accumulate only when the multiply has a single use.  The widening_mul pass
currently does the conversion regardless of the number of uses, which can
cause poor code-gen in cases like the following:

typedef int ArrT [10][10];

void
foo (ArrT Arr, int Idx)
{
   Arr[Idx][Idx] = 1;
   Arr[Idx + 10][Idx] = 2;
}

On AArch64, after widening_mul, the IR is like

    _2 = (long unsigned int) Idx_1(D);
    _3 = Idx_1(D) w* 40;                            <----
    _5 = Arr_4(D) + _3;
    *_5[Idx_1(D)] = 1;
    _8 = WIDEN_MULT_PLUS_EXPR <Idx_1(D), 40, 400>;  <----
    _9 = Arr_4(D) + _8;
    *_9[Idx_1(D)] = 2;

Where the arrows point, there are redundant widening multiplies.


So they're redundant.  Why does this imply poor code-gen?

If a target has more than one FMA unit, then the target might
be able to issue the computation for _3 and _8 in parallel.

Even if the target only has one FMA unit, but the unit is
pipelined, the computations could overlap.


Thanks for the review.

I think it is a fair point that redundancy doesn't always indicate poor 
code-gen, but there are a few reasons why I think this patch makes sense.


Firstly, the generated WIDEN_MULT_PLUS_EXPR can prevent other 
optimization passes from analyzing the IR sequence effectively.  As in 
the above example, the widening multiply can be part of a larger common 
sub-expression (Arr_4(D) + Idx_1(D) w* 40 + Idx_1(D) * 4); blindly 
merging the multiply with the accumulate makes the recognition of the 
common sub-expression rather difficult.
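
Concretely, a worked expansion of the two addresses in the example 
(row size 40 bytes, element size 4 bytes):

  &Arr[Idx][Idx]      = Arr + Idx * 40 + Idx * 4
  &Arr[Idx + 10][Idx] = Arr + Idx * 40 + Idx * 4 + 400

Both share Arr + Idx * 40 + Idx * 4; folding the second multiply into 
a WIDEN_MULT_PLUS_EXPR hides that common part from later passes.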


Secondly, it is generally more expensive (in terms of both latency and 
energy) to multiply than to accumulate.  Even with multiple MAC units* 
or a well-pipelined unit, it is not always the case that multiple 
widening multiply-and-accumulate instructions can be scheduled 
(statically or dynamically) together.  A merged multiply-and-accumulate 
can add to the register pressure as well.  So maybe it is better to let 
the backend do the conversion (when the multiply has more uses).


Also, isn't it a general principle that new common sub-expressions 
(the widening multiply in this case) shall not be created in the gimple 
IR when there is no obvious benefit?  I can see that it may be a 
different case for floating-point multiply-and-accumulate: on one hand 
the arithmetic is usually pure data-processing rather than serving 
other purposes like address calculation (as its integer peers may do), 
and on the other hand, on micro-architectures with more FMA units than 
FADD units, it probably makes more sense to generate more FMA 
instructions in order to take advantage of the throughput capacity.


The area this patch tackles is only the integer widening 
multiply-and-accumulate, and it doesn't seem beneficial to me to merge 
the widening multiply with the accumulate so aggressively; you could 
argue that other optimization passes should be extended to handle 
WIDEN_MULT_PLUS_EXPR and its friends; while that is an option I'm 
considering, it is more likely to be a longer-term solution.


Regards,
Yufeng

*) I think I had abused the word 'fused' in my previous emails.  It 
seems like 'fused' is more often used to refer to the floating-point 
multiply-and-accumulate with a single rounding.




Re: [PATCH GCC]Simplify address expression in IVOPT

2013-10-31 Thread Yufeng Zhang

On 10/30/13 14:46, Richard Biener wrote:

On Tue, Oct 29, 2013 at 10:18 AM, bin.cheng  wrote:

Hi,
I noticed that IVOPT generates complex address expressions like below for iv
base.
 &arr_base[0].y
 &arr[0]
 &MEM[p+o]
It's even worse for targets support auto-increment addressing mode because
IVOPT adjusts such base expression with +/- step, then creates below:
 &arr_base[0].y +/- step
 &arr[0] +/- step
 &MEM[p+o] +/- step
It has two disadvantages:
1) Cost computation in IVOPT can't handle complex address expressions and
generally returns spill_cost for them, which is bad since address IVs are
important to IVOPT.
2) IVOPT creates duplicate candidates for IVs which have the same value in
different forms, for example, two candidates are generated, one each for
"&a[0]" and "&a".  Again, it's even worse for auto-increment addressing
mode.

This patch fixes the issue by simplifying address expression at the entry of
allocating IV struct.  Maybe the simplification can be put in various fold*
functions but I think it might be better in this way, because:
1) fold* functions are used from front-end to various tree optimizations,
the simplified address expressions may not be what each optimizer wanted.
Think about parallelism related passes, they might want the array index
information kept for further analysis.
2) In some ways, the simplification conflicts with the current implementation
of the fold* functions.  Take fold_binary_loc as an example: it tries to simplify
"&a[i1] +p c* i2" into "&a[i1+i2]".  Of course we can simplify in this way
for IVOPT too, but that will cause new problems like: a) we have to add code
in IVOPT to cope with complex ARRAY_REF which is the exactly thing we want
to avoid; b) the simplification can't always be done because of the
sign/unsigned offset problem (especially for auto-increment addressing
mode).
3) There are many entry point for fold* functions, the change will be
non-trivial.
4) The simplification is only done in alloc_iv for true (not duplicate ones)
iv struct, the number of such iv should be moderate.

With these points, I think it might be a win to do the simplification in
IVOPT and create a kind of sand box to let IVOPT play.  Any suggestions?

Bootstrap and tested on x86/x86_64/arm.
The patch causes three cases failed on some target, but all of them are
false alarm, which can be resolved by refining test cases to check more
accurate information.

Is it OK?


Hmm.  I think you want what get_inner_reference_aff does, using
the return value of get_inner_reference as starting point for
determine_base_object.  And you don't want to restrict yourselves
so much on what exprs to process, but only exclude DECL_Ps.
Just amend get_inner_reference_aff to return the tree base object.


Or, update determine_base_object to handle MEM_REF, ARRAY_REF, 
COMPONENT_REF, etc. by calling get_inner_reference to get the base and 
continuing the recursive determine_base_object on the return value 
(TREE_OPERAND (base, 0)).


Calling an amended get_inner_reference_aff can be expensive, as the 
function will also spend time transforming the reference from tree to 
aff_tree.


Yufeng


[PATCH] Optional alternative base_expr in finding basis for CAND_REFs

2013-11-04 Thread Yufeng Zhang

Hi,

This patch extends the slsr pass to optionally use an alternative base 
expression in finding the basis for CAND_REFs.  Currently the pass uses a 
hash-based algorithm to match the base_expr in a candidate.  Given a 
test case like the following, slsr will not be able to recognize that 
the two CAND_REFs have the same basis, as their base_exprs are 
different SSA_NAMEs:


typedef int arr_2[20][20];

void foo (arr_2 a2, int i, int j)
{
  a2[i][j] = 1;
  a2[i + 10][j] = 2;
}

The gimple dump before slsr is like the following (using an 
arm-none-eabi gcc):


  i.0_2 = (unsigned int) i_1(D);
  _3 = i.0_2 * 80;
  _5 = a2_4(D) + _3;
  *_5[j_7(D)] = 1;      <----
  _9 = _3 + 800;
  _10 = a2_4(D) + _9;
  *_10[j_7(D)] = 2;     <----

Here are the dumps for the two CAND_REFs generated for the two 
statements pointed to by the arrows:



  4  [2] _5 = a2_4(D) + _3;
 ADD  : a2_4(D) + (80 * i_1(D)) : int[20] *
 basis: 0  dependent: 0  sibling: 0
 next-interp: 0  dead-savings: 0

  8  [2] *_10[j_7(D)] = 2;
 REF  : _10 + ((sizetype) j_7(D) * 4) + 0 : int[20] *
 basis: 5  dependent: 0  sibling: 0
 next-interp: 0  dead-savings: 0

As mentioned previously, slsr cannot establish that candidate 4 is the 
basis for candidate 8, as they have different base_exprs: a2_4(D) 
and _10, respectively.  However, the two references actually differ 
only by an immediate offset (800).


This patch uses the tree affine combination facilities to create an 
optional alternative base expression to be used in finding (as well as 
recording) the basis.  It calls tree_to_aff_combination_expand on 
base_expr, resets the offset field of the generated aff_tree to 0 and 
generates a tree from it by calling aff_combination_to_tree.
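
In outline, the new helper looks roughly like this (a sketch 
reconstructed from the description above; the committed code may 
differ in details such as caching):

/* Sketch: compute an offset-free variant of BASE for use as an
   alternative key when looking up and recording a basis.  */
static tree
get_alternative_base (tree base)
{
  aff_tree aff;

  tree_to_aff_combination_expand (base, TREE_TYPE (base),
                                  &aff, &name_expansions);
  aff.offset = tree_to_double_int (integer_zero_node);
  return aff_combination_to_tree (&aff);
}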


The new tree is recorded as a potential basis, and when 
find_basis_for_candidate fails to find a basis for a CAND_REF in its 
normal approach, it searches again using a tree expanded in this way. 
Such an expanded tree usually discloses the expression behind an 
SSA_NAME.  In the example above, instead of seeing the strength 
reduction candidate chains like this:


  _5 -> 5
  _10 -> 8

we are now having:

  _5 -> 5
  _10 -> 8
  a2_4(D) + (sizetype) i_1(D) * 80 -> 5 -> 8

With the candidates 5 and 8 linked to the same tree expression (a2_4(D) 
+ (sizetype) i_1(D) * 80), slsr is now able to establish that 5 is the 
basis of 8.


The patch doesn't attempt to change the content of any CAND_REF though. 
It only enables CAND_REFs which (1) have the same stride and (2) have 
underlying expressions behind their base_exprs that differ only in 
immediate offsets, to be recognized as having the same basis.  The 
statements with such CAND_REFs will be lowered to MEM_REFs, and later on 
the RTL expander shall be able to fold and re-associate the immediate 
offsets to the rightmost side of the addressing expression, and 
therefore expose the common sub-expression successfully.


The code-gen difference of the example code on arm with -O2 
-mcpu=cortex-a15 is:


mov r3, r1, asl #6
-   add ip, r0, r2, asl #2
str lr, [sp, #-4]!
+   mov ip, #1
+   mov lr, #2
add r1, r3, r1, asl #4
-   mov lr, #1
-   mov r3, #2
add r0, r0, r1
-   add r0, r0, #800
-   str lr, [ip, r1]
-   str r3, [r0, r2, asl #2]
+   add r3, r0, r2, asl #2
+   str ip, [r0, r2, asl #2]
+   str lr, [r3, #800]
ldr pc, [sp], #4

One fewer instruction in this simple case.

The example used for illustration is too simple to show a code-gen 
difference on x86_64, but the included test case will show the benefit 
of the patch quite clearly.


The patch has passed

* bootstrapping on arm and x86_64
* regtest on arm-none-eabi,  aarch64-none-elf and x86_64

There is no regression in SPEC2K on arm or x86_64.

OK to commit to the trunk?

Any comments are welcome!

Thanks,
Yufeng


gcc/

* gimple-ssa-strength-reduction.c: Include tree-affine.h.
(find_basis_for_base_expr): Update comment.
(find_basis_for_candidate): Add new parameter 'alt_base_expr' of
type 'tree'.  Optionally call find_basis_for_base_expr with
'alt_base_expr'.
(record_potential_basis): Add new parameter 'alt_base_expr' of
type 'tree'; set node->base_expr with 'alt_base_expr' if it is
not NULL.
(name_expansions): New static variable.
(get_alternative_base): New function.
(alloc_cand_and_find_basis): Call get_alternative_base for
CAND_REF.  Update calls to find_basis_for_candidate and
record_potential_basis.
(execute_strength_reduction): Call free_affine_expand_cache with
&name_expansions.

gcc/testsuite/

* gcc.dg/tree-ssa/slsr-41.c: New test.
diff --git a/gcc/gimple-ssa-strength-reduction.c b/gcc/gimple-ssa-strength-reduction.c
index 9a5072c..3150046 100644
--- a/gcc/gimple-ssa-strength-reduction.c
+++ b/gcc/gimple-ssa-strength-reduction.c
@@

Re: [PATCH GCC]Simplify address expression in IVOPT

2013-11-05 Thread Yufeng Zhang

On 11/05/13 10:13, bin.cheng wrote:

Index: gcc/tree-affine.c
===
--- gcc/tree-affine.c   (revision 204117)
+++ gcc/tree-affine.c   (working copy)
@@ -874,10 +874,11 @@ debug_aff (aff_tree *val)
fprintf (stderr, "\n");
  }

-/* Returns address of the reference REF in ADDR.  The size of the accessed
-   location is stored to SIZE.  */
+/* Computes address of the reference REF in ADDR.  The size of the accessed
+   location is stored to SIZE.  Returns pointer to the ultimate containing
+   object to which REF refers.  */

-void
+tree
  get_inner_reference_aff (tree ref, aff_tree *addr, double_int *size)
  {
HOST_WIDE_INT bitsize, bitpos;
@@ -904,6 +905,8 @@ get_inner_reference_aff (tree ref, aff_tree *addr,
aff_combination_add (addr,&tmp);

*size = double_int::from_shwi ((bitsize + BITS_PER_UNIT - 1) / 
BITS_PER_UNIT);
+
+  return base_addr;
  }



I think what Richard suggests is to return the base object rather than 
the address of the base object, i.e.

  return base;

This keeps the return values of get_inner_reference_aff and 
get_inner_reference consistent.


Yufeng



Re: [PATCH GCC]Simplify address expression in IVOPT

2013-11-05 Thread Yufeng Zhang

On 11/05/13 11:45, Bin.Cheng wrote:

On Tue, Nov 5, 2013 at 7:19 PM, Yufeng Zhang  wrote:

>  On 11/05/13 10:13, bin.cheng wrote:

>>
>>  Index: gcc/tree-affine.c
>>  ===
>>  --- gcc/tree-affine.c   (revision 204117)
>>  +++ gcc/tree-affine.c   (working copy)
>>  @@ -874,10 +874,11 @@ debug_aff (aff_tree *val)
>>   fprintf (stderr, "\n");
>> }
>>
>>  -/* Returns address of the reference REF in ADDR.  The size of the
>>  accessed
>>  -   location is stored to SIZE.  */
>>  +/* Computes address of the reference REF in ADDR.  The size of the
>>  accessed
>>  +   location is stored to SIZE.  Returns pointer to the ultimate
>>  containing
>>  +   object to which REF refers.  */
>>
>>  -void
>>  +tree
>> get_inner_reference_aff (tree ref, aff_tree *addr, double_int *size)
>> {
>>   HOST_WIDE_INT bitsize, bitpos;
>>  @@ -904,6 +905,8 @@ get_inner_reference_aff (tree ref, aff_tree *addr,
>>   aff_combination_add (addr,&tmp);
>>
>>   *size = double_int::from_shwi ((bitsize + BITS_PER_UNIT - 1) /
>>  BITS_PER_UNIT);
>>  +
>>  +  return base_addr;
>> }
>>

>
>  I think what Richard suggests is to return the base object rather the
>  address of the base object, i.e.

I am not sure about that.  We have to pass pointer_type expression to
function determine_base_object for address expressions, because there
is no way to tell pointer from object once we are in
determine_base_object.


I'm just concerned with the consistency in what is returned between 
get_inner_reference and get_inner_reference_aff.  If 
determine_base_object expects a reference only, you can probably work 
around it with something like:


  base_object = build_fold_addr_expr (base_object);

after the get_inner_reference_aff call.

Yufeng



Re: [PATCH ARM]Refine scaled address expression on ARM

2013-11-29 Thread Yufeng Zhang

On 11/29/13 07:52, Bin.Cheng wrote:

On Thu, Nov 28, 2013 at 8:06 PM, Bin.Cheng  wrote:

On Thu, Nov 28, 2013 at 6:48 PM, Richard Earnshaw  wrote:

On 18/09/13 10:15, bin.cheng wrote:




-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
ow...@gcc.gnu.org] On Behalf Of bin.cheng
Sent: Monday, September 02, 2013 3:09 PM
To: Richard Earnshaw
Cc: gcc-patches@gcc.gnu.org
Subject: RE: [PATCH ARM]Refine scaled address expression on ARM




-Original Message-
From: Richard Earnshaw
Sent: Thursday, August 29, 2013 9:06 PM
To: Bin Cheng
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH ARM]Refine scaled address expression on ARM

On 28/08/13 08:00, bin.cheng wrote:

Hi,

This patch refines scaled address expression on ARM.  It supports
"base+index*scale" in arm_legitimate_address_outer_p.  It also tries
to legitimize "base + index * scale + offset" with "reg<- base +
offset;  reg
+ index * scale" by introducing thumb2_legitimize_address.  For now
+ function
thumb2_legitimize_address is a kind of placeholder and just does the
mentioned transformation by calling to try_multiplier_address.
Hoping we can improve it in the future.

With this patch:
1) "base+index*scale" is recognized.


That's because (PLUS (REG) (MULT (REG) (CONST))) is not canonical form.
  So this shouldn't be necessary.  Can you identify where this

non-canoncial form is being generated?




Oh, for now ivopt constructs "index*scale" to test whether backend
supports scaled addressing mode, which is not valid on ARM, so I was going
to construct "base + index*scale" instead.  Since "base + index * scale"

is not

canonical form, I will construct the canonical form and drop this part of

the

patch.

Is rest of this patch OK?


Hi Richard, I removed the part over which you concerned and created this
updated patch.

Is it OK?

Thanks.
bin

2013-09-18  Bin Cheng

   * config/arm/arm.c (try_multiplier_address): New function.
   (thumb2_legitimize_address): New function.
   (arm_legitimize_address): Call try_multiplier_address and
   thumb2_legitimize_address.


6-arm-scaled_address-20130918.txt


Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c  (revision 200774)
+++ gcc/config/arm/arm.c  (working copy)
@@ -6652,6 +6654,106 @@ legitimize_tls_address (rtx x, rtx reg)
  }
  }

+/* Try to find address expression like base + index * scale + offset
+   in X.  If we find one, force base + offset into register and
+   construct new expression reg + index * scale; return the new
+   address expression if it's valid.  Otherwise return X.  */
+static rtx
+try_multiplier_address (rtx x, enum machine_mode mode ATTRIBUTE_UNUSED)
+{
+  rtx tmp, base_reg, new_rtx;
+  rtx base = NULL_RTX, index = NULL_RTX, scale = NULL_RTX, offset = NULL_RTX;
+
+  gcc_assert (GET_CODE (x) == PLUS);
+
+  /* Try to find and record base/index/scale/offset in X. */
+  if (GET_CODE (XEXP (x, 1)) == MULT)
+{
+  tmp = XEXP (x, 0);
+  index = XEXP (XEXP (x, 1), 0);
+  scale = XEXP (XEXP (x, 1), 1);
+  if (GET_CODE (tmp) != PLUS)
+ return x;
+
+  base = XEXP (tmp, 0);
+  offset = XEXP (tmp, 1);
+}
+  else
+{
+  tmp = XEXP (x, 0);
+  offset = XEXP (x, 1);
+  if (GET_CODE (tmp) != PLUS)
+ return x;
+
+  base = XEXP (tmp, 0);
+  scale = XEXP (tmp, 1);
+  if (GET_CODE (base) == MULT)
+ {
+   tmp = base;
+   base = scale;
+   scale = tmp;
+ }
+  if (GET_CODE (scale) != MULT)
+ return x;
+
+  index = XEXP (scale, 0);
+  scale = XEXP (scale, 1);
+}
+
+  if (CONST_INT_P (base))
+{
+  tmp = base;
+  base = offset;
+  offset = tmp;
+}
+
+  if (CONST_INT_P (index))
+{
+  tmp = index;
+  index = scale;
+  scale = tmp;
+}
+
+  /* ARM only supports constant scale in address.  */
+  if (!CONST_INT_P (scale))
+return x;
+
+  if (GET_MODE (base) != SImode || GET_MODE (index) != SImode)
+return x;
+
+  /* Only register/constant are allowed in each part.  */
+  if (!symbol_mentioned_p (base)
+&&  !symbol_mentioned_p (offset)
+&&  !symbol_mentioned_p (index)
+&&  !symbol_mentioned_p (scale))
+{


It would be easier to do this at the top of the function --
   if (symbol_mentioned_p (x))
 return x;



+  /* Force "base+offset" into register and construct
+  "register+index*scale".  Return the new expression
+  only if it's valid.  */
+  tmp = gen_rtx_PLUS (SImode, base, offset);
+  base_reg = force_reg (SImode, tmp);
+  tmp = gen_rtx_fmt_ee (MULT, SImode, index, scale);
+  new_rtx = gen_rtx_PLUS (SImode, base_reg, tmp);
+  return new_rtx;


I can't help thinking that this is backwards.  That is, you want to
split out the mult expression and use offset addressing in the addresses
itself.  That's likely to lead to either better CSE, or more induction

Thanks to your review.

Actually base+

Re: [PATCH ARM]Refine scaled address expression on ARM

2013-11-29 Thread Yufeng Zhang

On 11/29/13 12:02, Richard Biener wrote:

On Fri, Nov 29, 2013 at 12:46 PM, Yufeng Zhang  wrote:

On 11/29/13 07:52, Bin.Cheng wrote:


On Thu, Nov 28, 2013 at 8:06 PM, Bin.Cheng   wrote:


On Thu, Nov 28, 2013 at 6:48 PM, Richard Earnshaw
wrote:


On 18/09/13 10:15, bin.cheng wrote:





-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
ow...@gcc.gnu.org] On Behalf Of bin.cheng
Sent: Monday, September 02, 2013 3:09 PM
To: Richard Earnshaw
Cc: gcc-patches@gcc.gnu.org
Subject: RE: [PATCH ARM]Refine scaled address expression on ARM




-Original Message-
From: Richard Earnshaw
Sent: Thursday, August 29, 2013 9:06 PM
To: Bin Cheng
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH ARM]Refine scaled address expression on ARM

On 28/08/13 08:00, bin.cheng wrote:


Hi,

This patch refines scaled address expression on ARM.  It supports
"base+index*scale" in arm_legitimate_address_outer_p.  It also tries
to legitimize "base + index * scale + offset" with "reg<- base +
offset;  reg
+ index * scale" by introducing thumb2_legitimize_address.  For now
+ function
thumb2_legitimize_address is a kind of placeholder and just does the
mentioned transformation by calling to try_multiplier_address.
Hoping we can improve it in the future.

With this patch:
1) "base+index*scale" is recognized.



That's because (PLUS (REG) (MULT (REG) (CONST))) is not canonical
form.
   So this shouldn't be necessary.  Can you identify where this


non-canoncial form is being generated?





Oh, for now ivopt constructs "index*scale" to test whether backend
supports scaled addressing mode, which is not valid on ARM, so I was
going
to construct "base + index*scale" instead.  Since "base + index *
scale"


is not


canonical form, I will construct the canonical form and drop this part
of


the


patch.

Is rest of this patch OK?


Hi Richard, I removed the part over which you concerned and created
this
updated patch.

Is it OK?

Thanks.
bin

2013-09-18  Bin Cheng

* config/arm/arm.c (try_multiplier_address): New function.
(thumb2_legitimize_address): New function.
(arm_legitimize_address): Call try_multiplier_address and
thumb2_legitimize_address.


6-arm-scaled_address-20130918.txt


Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c  (revision 200774)
+++ gcc/config/arm/arm.c  (working copy)
@@ -6652,6 +6654,106 @@ legitimize_tls_address (rtx x, rtx reg)
   }
   }

+/* Try to find address expression like base + index * scale + offset
+   in X.  If we find one, force base + offset into register and
+   construct new expression reg + index * scale; return the new
+   address expression if it's valid.  Otherwise return X.  */
+static rtx
+try_multiplier_address (rtx x, enum machine_mode mode
ATTRIBUTE_UNUSED)
+{
+  rtx tmp, base_reg, new_rtx;
+  rtx base = NULL_RTX, index = NULL_RTX, scale = NULL_RTX, offset =
NULL_RTX;
+
+  gcc_assert (GET_CODE (x) == PLUS);
+
+  /* Try to find and record base/index/scale/offset in X. */
+  if (GET_CODE (XEXP (x, 1)) == MULT)
+{
+  tmp = XEXP (x, 0);
+  index = XEXP (XEXP (x, 1), 0);
+  scale = XEXP (XEXP (x, 1), 1);
+  if (GET_CODE (tmp) != PLUS)
+ return x;
+
+  base = XEXP (tmp, 0);
+  offset = XEXP (tmp, 1);
+}
+  else
+{
+  tmp = XEXP (x, 0);
+  offset = XEXP (x, 1);
+  if (GET_CODE (tmp) != PLUS)
+ return x;
+
+  base = XEXP (tmp, 0);
+  scale = XEXP (tmp, 1);
+  if (GET_CODE (base) == MULT)
+ {
+   tmp = base;
+   base = scale;
+   scale = tmp;
+ }
+  if (GET_CODE (scale) != MULT)
+ return x;
+
+  index = XEXP (scale, 0);
+  scale = XEXP (scale, 1);
+}
+
+  if (CONST_INT_P (base))
+{
+  tmp = base;
+  base = offset;
+  offset = tmp;
+}
+
+  if (CONST_INT_P (index))
+{
+  tmp = index;
+  index = scale;
+  scale = tmp;
+}
+
+  /* ARM only supports constant scale in address.  */
+  if (!CONST_INT_P (scale))
+return x;
+
+  if (GET_MODE (base) != SImode || GET_MODE (index) != SImode)
+return x;
+
+  /* Only register/constant are allowed in each part.  */
+  if (!symbol_mentioned_p (base)
+&&   !symbol_mentioned_p (offset)
+&&   !symbol_mentioned_p (index)
+&&   !symbol_mentioned_p (scale))
+{



It would be easier to do this at the top of the function --
if (symbol_mentioned_p (x))
  return x;



+  /* Force "base+offset" into register and construct
+  "register+index*scale".  Return the new expression
+  only if it's valid.  */
+  tmp = gen_rtx_PLUS (SImode, base, offset);
+  base_reg = force_reg (SImode, tmp);
+  tmp = gen_rtx_fmt_ee (MULT, SImode, index, scale);
+  new_rtx = gen_rtx_PLUS (SImode, base_reg, tmp);
+  return new_rtx;

Re: [PATCH ARM]Refine scaled address expression on ARM

2013-11-29 Thread Yufeng Zhang

On 11/29/13 10:44, Richard Biener wrote:

On Fri, Nov 29, 2013 at 8:52 AM, Bin.Cheng  wrote:

On Thu, Nov 28, 2013 at 8:06 PM, Bin.Cheng  wrote:

On Thu, Nov 28, 2013 at 6:48 PM, Richard Earnshaw  wrote:

On 18/09/13 10:15, bin.cheng wrote:




-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
ow...@gcc.gnu.org] On Behalf Of bin.cheng
Sent: Monday, September 02, 2013 3:09 PM
To: Richard Earnshaw
Cc: gcc-patches@gcc.gnu.org
Subject: RE: [PATCH ARM]Refine scaled address expression on ARM




-Original Message-
From: Richard Earnshaw
Sent: Thursday, August 29, 2013 9:06 PM
To: Bin Cheng
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH ARM]Refine scaled address expression on ARM

On 28/08/13 08:00, bin.cheng wrote:

Hi,

This patch refines scaled address expression on ARM.  It supports
"base+index*scale" in arm_legitimate_address_outer_p.  It also tries
to legitimize "base + index * scale + offset" with "reg<- base +
offset;  reg
+ index * scale" by introducing thumb2_legitimize_address.  For now
+ function
thumb2_legitimize_address is a kind of placeholder and just does the
mentioned transformation by calling to try_multiplier_address.
Hoping we can improve it in the future.

With this patch:
1) "base+index*scale" is recognized.


That's because (PLUS (REG) (MULT (REG) (CONST))) is not canonical form.
  So this shouldn't be necessary.  Can you identify where this

non-canoncial form is being generated?




Oh, for now ivopt constructs "index*scale" to test whether backend
supports scaled addressing mode, which is not valid on ARM, so I was going
to construct "base + index*scale" instead.  Since "base + index * scale"

is not

canonical form, I will construct the canonical form and drop this part of

the

patch.

Is rest of this patch OK?


Hi Richard, I removed the part over which you concerned and created this
updated patch.

Is it OK?

Thanks.
bin

2013-09-18  Bin Cheng

   * config/arm/arm.c (try_multiplier_address): New function.
   (thumb2_legitimize_address): New function.
   (arm_legitimize_address): Call try_multiplier_address and
   thumb2_legitimize_address.


6-arm-scaled_address-20130918.txt


Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c  (revision 200774)
+++ gcc/config/arm/arm.c  (working copy)
@@ -6652,6 +6654,106 @@ legitimize_tls_address (rtx x, rtx reg)
  }
  }

+/* Try to find address expression like base + index * scale + offset
+   in X.  If we find one, force base + offset into register and
+   construct new expression reg + index * scale; return the new
+   address expression if it's valid.  Otherwise return X.  */
+static rtx
+try_multiplier_address (rtx x, enum machine_mode mode ATTRIBUTE_UNUSED)
+{
+  rtx tmp, base_reg, new_rtx;
+  rtx base = NULL_RTX, index = NULL_RTX, scale = NULL_RTX, offset = NULL_RTX;
+
+  gcc_assert (GET_CODE (x) == PLUS);
+
+  /* Try to find and record base/index/scale/offset in X. */
+  if (GET_CODE (XEXP (x, 1)) == MULT)
+{
+  tmp = XEXP (x, 0);
+  index = XEXP (XEXP (x, 1), 0);
+  scale = XEXP (XEXP (x, 1), 1);
+  if (GET_CODE (tmp) != PLUS)
+ return x;
+
+  base = XEXP (tmp, 0);
+  offset = XEXP (tmp, 1);
+}
+  else
+{
+  tmp = XEXP (x, 0);
+  offset = XEXP (x, 1);
+  if (GET_CODE (tmp) != PLUS)
+ return x;
+
+  base = XEXP (tmp, 0);
+  scale = XEXP (tmp, 1);
+  if (GET_CODE (base) == MULT)
+ {
+   tmp = base;
+   base = scale;
+   scale = tmp;
+ }
+  if (GET_CODE (scale) != MULT)
+ return x;
+
+  index = XEXP (scale, 0);
+  scale = XEXP (scale, 1);
+}
+
+  if (CONST_INT_P (base))
+{
+  tmp = base;
+  base = offset;
+  offset = tmp;
+}
+
+  if (CONST_INT_P (index))
+{
+  tmp = index;
+  index = scale;
+  scale = tmp;
+}
+
+  /* ARM only supports constant scale in address.  */
+  if (!CONST_INT_P (scale))
+return x;
+
+  if (GET_MODE (base) != SImode || GET_MODE (index) != SImode)
+return x;
+
+  /* Only register/constant are allowed in each part.  */
+  if (!symbol_mentioned_p (base)
+&&  !symbol_mentioned_p (offset)
+&&  !symbol_mentioned_p (index)
+&&  !symbol_mentioned_p (scale))
+{


It would be easier to do this at the top of the function --
   if (symbol_mentioned_p (x))
 return x;



+  /* Force "base+offset" into register and construct
+  "register+index*scale".  Return the new expression
+  only if it's valid.  */
+  tmp = gen_rtx_PLUS (SImode, base, offset);
+  base_reg = force_reg (SImode, tmp);
+  tmp = gen_rtx_fmt_ee (MULT, SImode, index, scale);
+  new_rtx = gen_rtx_PLUS (SImode, base_reg, tmp);
+  return new_rtx;


I can't help thinking that this is backwards.  That is, you want to
split out the mult expression and use offset addressing in the addresses
itself.  That's likely to lead to either better CSE, 

Re: [PING] [PATCH] Optional alternative base_expr in finding basis for CAND_REFs

2013-12-02 Thread Yufeng Zhang

Ping~

http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03360.html

Thanks,
Yufeng

On 11/26/13 15:02, Yufeng Zhang wrote:

On 11/26/13 12:45, Richard Biener wrote:

On Thu, Nov 14, 2013 at 12:25 AM, Yufeng Zhang   wrote:

On 11/13/13 20:54, Bill Schmidt wrote:

The second version of your original patch is ok with me with the
following changes.  Sorry for the little side adventure into the
next-interp logic; in the end that's going to hurt more than it helps in
this case.  Thanks for having a look at it, anyway.  Thanks also for
cleaning up this version to be less intrusive to common interfaces; I
appreciate it.



Thanks a lot for the review.  I've attached an updated patch with the
suggested changes incorporated.

For the next-interp adventure, I was quite happy to do the experiment; it's
a good chance of gaining insight into the pass.  Many thanks for your prompt
replies and patience in guiding!



Everything else looks OK to me.  Please ask Richard for final approval,
as I'm not a maintainer.



Hi Richard, would you be happy to OK the patch?


Hmm,

+static tree
+get_alternative_base (tree base)
+{
+  tree *result = (tree *) pointer_map_contains (alt_base_map, base);
+
+  if (result == NULL)
+{
+  tree expr;
+  aff_tree aff;
+
+  tree_to_aff_combination_expand (base, TREE_TYPE (base),
+&aff,&name_expansions);
+  aff.offset = tree_to_double_int (integer_zero_node);
+  expr = aff_combination_to_tree (&aff);
+
+  result = (tree *) pointer_map_insert (alt_base_map, base);
+  gcc_assert (!*result);

I believe this cache will never hit (unless you repeatedly ask for
the exact same statement?) - any non-trivial 'base' trees are
not shared and thus not pointer equivalent.


Yes, you are right that non-trivial 'base' trees are rarely shared.
   The cache is introduced mainly because get_alternative_base () may be
called twice on the same 'base' tree, once in
find_basis_for_candidate () for look-up and the other time in
alloc_cand_and_find_basis () for record_potential_basis ().  I'm happy
to leave out the cache if you think the benefit is trivial.


Also using tree_to_aff_combination_expand to get at - what
exactly? The address with any constant offset stripped?
Where do you re-construct that offset?  That is, aff.offset,
which you definitely need to get at a candidate?


As explained in the previous RFC emails, the expanded and
constant-offset-stripped base expr is only used for the purpose of basis
look-up.  The corresponding candidate still has the unexpanded base expr
as its 'base_expr'; therefore the info on the constant offset is not
lost and doesn't need to be re-constructed.
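As an illustration (a made-up fragment in the spirit of the test below,
not taken from the patch):

  /* Both stores share the same offset-stripped base, so the second
     can find the first as a basis even though their syntactic bases
     differ by a constant offset.  */
  a2[i + 10][j] = 1;
  a2[i + 20][j] = 1;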


+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-slsr" } */
+
+typedef int arr_2[50][50];
+
+void foo (arr_2 a2, int v1)
+{
+  int i, j;
+
+  i = v1 + 5;
+  j = i;
+  a2 [i-10] [j] = 2;
+  a2 [i] [j++] = i;
+  a2 [i+20] [j++] = i;
+  a2 [i-3] [i-1] += 1;
+  return;
+}
+
+/* { dg-final { scan-tree-dump-times "MEM" 5 "slsr" } } */
+/* { dg-final { cleanup-tree-dump "slsr" } } */

scanning for 5 MEMs looks non-sensical.  What transform do
you expect?  I see other slsr testcases do similar non-sensical
checking which is bad, too.


As the slsr optimizes CAND_REF candidates by simply lowering them to
MEM_REF from e.g. ARRAY_REF, I think scanning for the number of MEM_REFs
is an effective check.  Alternatively, I can add a follow-up patch to
add some dumping facility in replace_ref () to print out the replacing
actions when -fdump-tree-slsr-details is on.

I hope these can address your concerns.


Regards,
Yufeng





Richard.


Regards,

Yufeng

gcc/

  * gimple-ssa-strength-reduction.c: Include tree-affine.h.
  (name_expansions): New static variable.
  (alt_base_map): Ditto.
  (get_alternative_base): New function.
  (find_basis_for_candidate): For CAND_REF, optionally call
  find_basis_for_base_expr with the returned value from
  get_alternative_base.
  (record_potential_basis): Add new parameter 'base' of type 'tree';
  add an assertion of non-NULL base; use base to set node->base_expr.

  (alloc_cand_and_find_basis): Update; call record_potential_basis
  for CAND_REF with the returned value from get_alternative_base.
  (execute_strength_reduction): Call pointer_map_create for
	alt_base_map; call free_affine_expand_cache with &name_expansions.

gcc/testsuite/

  * gcc.dg/tree-ssa/slsr-41.c: New test.






diff --git a/gcc/gimple-ssa-strength-reduction.c 
b/gcc/gimple-ssa-strength-reduction.c
index 88afc91..26502c3 100644
--- a/gcc/gimple-ssa-strength-reduction.c
+++ b/gcc/gimple-ssa-strength-reduction.c
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "hash-t

Re: [PING] [PATCH] Optional alternative base_expr in finding basis for CAND_REFs

2013-12-03 Thread Yufeng Zhang

On 12/03/13 06:48, Jeff Law wrote:

On 12/02/13 08:47, Yufeng Zhang wrote:

Ping~

http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03360.html




Thanks,
Yufeng

On 11/26/13 15:02, Yufeng Zhang wrote:

On 11/26/13 12:45, Richard Biener wrote:

On Thu, Nov 14, 2013 at 12:25 AM, Yufeng
Zhangwrote:

On 11/13/13 20:54, Bill Schmidt wrote:

The second version of your original patch is ok with me with the
following changes.  Sorry for the little side adventure into the
next-interp logic; in the end that's going to hurt more than it
helps in
this case.  Thanks for having a look at it, anyway.  Thanks also for
cleaning up this version to be less intrusive to common interfaces; I
appreciate it.



Thanks a lot for the review.  I've attached an updated patch with the
suggested changes incorporated.

For the next-interp adventure, I was quite happy to do the
experiment; it's
a good chance of gaining insight into the pass.  Many thanks for
your prompt
replies and patience in guiding!



Everything else looks OK to me.  Please ask Richard for final
approval,
as I'm not a maintainer.

First a note, I need to check on voting for Bill as the slsr maintainer
from the steering committee.   Voting was in progress just before the
close of stage1 development so I haven't tallied the results :-)


Looking forward to some good news! :)



Yes, you are right about the non-trivial 'base' tree are rarely shared.
The cached is introduced mainly because get_alternative_base () may be
called twice on the same 'base' tree, once in the
find_basis_for_candidate () for look-up and the other time in
alloc_cand_and_find_basis () for record_potential_basis ().  I'm happy
to leave out the cache if you think the benefit is trivial.

Without some sense of how expensive the lookups are vs how often the
cache hits it's awful hard to know if the cache is worth it.

I'd say take it out unless you have some sense it's really saving time.
   It's a pretty minor implementation detail either way.


I think the affine tree routines are generally expensive; it is worth 
having a cache to avoid calling them too many times.  I ran the slsr-*.c 
tests under gcc.dg/tree-ssa/ and found that the cache hit rates range 
from 55.6% to 90%, with 73.5% as the average.  The samples may not 
represent the real-world scenario well, but they do show that the 'base' 
tree can be shared to some extent.  So I'd like to have the cache 
in the patch.







+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-slsr" } */
+
+typedef int arr_2[50][50];
+
+void foo (arr_2 a2, int v1)
+{
+  int i, j;
+
+  i = v1 + 5;
+  j = i;
+  a2 [i-10] [j] = 2;
+  a2 [i] [j++] = i;
+  a2 [i+20] [j++] = i;
+  a2 [i-3] [i-1] += 1;
+  return;
+}
+
+/* { dg-final { scan-tree-dump-times "MEM" 5 "slsr" } } */
+/* { dg-final { cleanup-tree-dump "slsr" } } */

scanning for 5 MEMs looks non-sensical.  What transform do
you expect?  I see other slsr testcases do similar non-sensical
checking which is bad, too.


As the slsr optimizes CAND_REF candidates by simply lowering them to
MEM_REF from e.g. ARRAY_REF, I think scanning for the number of MEM_REFs
is an effective check.  Alternatively, I can add a follow-up patch to
add some dumping facility in replace_ref () to print out the replacing
actions when -fdump-tree-slsr-details is on.

I think adding some details to the dump and scanning for them would be
better.  That's the only change that is required for this to move forward.


I've updated the patch to dump more details when -fdump-tree-slsr-details 
is on.  The tests have also been updated to scan for these new dumps 
instead of MEMs.
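A sketch of what the updated scans look like (the count and the exact
dump string are illustrative here; they follow whatever replace_ref
prints under -fdump-tree-slsr-details):

/* { dg-final { scan-tree-dump-times "Replacing reference" 5 "slsr" } } */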




I suggest doing it quickly.  We're well past stage1 close at this point.


The bootstrapping on x86_64 is still running.  OK to commit if it succeeds?

Thanks,
Yufeng

gcc/

* gimple-ssa-strength-reduction.c: Include tree-affine.h.
(name_expansions): New static variable.
(alt_base_map): Ditto.
(get_alternative_base): New function.
(find_basis_for_candidate): For CAND_REF, optionally call
find_basis_for_base_expr with the returned value from
get_alternative_base.
(record_potential_basis): Add new parameter 'base' of type 'tree';
add an assertion of non-NULL base; use base to set node->base_expr.
(alloc_cand_and_find_basis): Update; call record_potential_basis
for CAND_REF with the returned value from get_alternative_base.
(replace_refs): Dump details on the replacing.
(execute_strength_reduction): Call pointer_map_create for
alt_base_map; call free_affine_expand_cache with &name_expansions.

gcc/testsuite/

* gcc.dg/tree-ssa/slsr-39.c: Update.
* gcc.dg/tree-ssa/slsr-41.c: New test.

diff --git a/gcc/gimple-ssa-strength-reduction.c b/gcc/gimple-ssa-strength-reduction.c

Re: [PING] [PATCH] Optional alternative base_expr in finding basis for CAND_REFs

2013-12-03 Thread Yufeng Zhang

On 12/03/13 14:20, Richard Biener wrote:

On Tue, Dec 3, 2013 at 1:50 PM, Yufeng Zhang  wrote:

On 12/03/13 06:48, Jeff Law wrote:


On 12/02/13 08:47, Yufeng Zhang wrote:


Ping~

http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03360.html





Thanks,
Yufeng

On 11/26/13 15:02, Yufeng Zhang wrote:


On 11/26/13 12:45, Richard Biener wrote:


On Thu, Nov 14, 2013 at 12:25 AM, Yufeng
Zhang wrote:


On 11/13/13 20:54, Bill Schmidt wrote:


The second version of your original patch is ok with me with the
following changes.  Sorry for the little side adventure into the
next-interp logic; in the end that's going to hurt more than it
helps in
this case.  Thanks for having a look at it, anyway.  Thanks also for
cleaning up this version to be less intrusive to common interfaces; I
appreciate it.




Thanks a lot for the review.  I've attached an updated patch with the
suggested changes incorporated.

For the next-interp adventure, I was quite happy to do the
experiment; it's
a good chance of gaining insight into the pass.  Many thanks for
your prompt
replies and patience in guiding!



Everything else looks OK to me.  Please ask Richard for final
approval,
as I'm not a maintainer.


First a note, I need to check on voting for Bill as the slsr maintainer
from the steering committee.   Voting was in progress just before the
close of stage1 development so I haven't tallied the results :-)



Looking forward to some good news! :)




Yes, you are right that non-trivial 'base' trees are rarely shared.
 The cache is introduced mainly because get_alternative_base () may
be called twice on the same 'base' tree, once in
find_basis_for_candidate () for look-up and the other time in
alloc_cand_and_find_basis () for record_potential_basis ().  I'm happy
to leave out the cache if you think the benefit is trivial.


Without some sense of how expensive the lookups are vs how often the
cache hits it's awful hard to know if the cache is worth it.

I'd say take it out unless you have some sense it's really saving time.
It's a pretty minor implementation detail either way.



I think the affine tree routines are generally expensive; it is worth having
a cache to avoid calling them too many times.  I ran the slsr-*.c tests
under gcc.dg/tree-ssa/ and found that the cache hit rates range from
55.6% to 90%, with 73.5% as the average.  The samples may not well represent
the real world scenario, but they do show the fact that the 'base' tree can
be shared to some extent.  So I'd like to have the cache in the patch.







+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-slsr" } */
+
+typedef int arr_2[50][50];
+
+void foo (arr_2 a2, int v1)
+{
+  int i, j;
+
+  i = v1 + 5;
+  j = i;
+  a2 [i-10] [j] = 2;
+  a2 [i] [j++] = i;
+  a2 [i+20] [j++] = i;
+  a2 [i-3] [i-1] += 1;
+  return;
+}
+
+/* { dg-final { scan-tree-dump-times "MEM" 5 "slsr" } } */
+/* { dg-final { cleanup-tree-dump "slsr" } } */

scanning for 5 MEMs looks non-sensical.  What transform do
you expect?  I see other slsr testcases do similar non-sensical
checking which is bad, too.



As the slsr optimizes CAND_REF candidates by simply lowering them to
MEM_REF from e.g. ARRAY_REF, I think scanning for the number of MEM_REFs
is an effective check.  Alternatively, I can add a follow-up patch to
add some dumping facility in replace_ref () to print out the replacing
actions when -fdump-tree-slsr-details is on.


I think adding some details to the dump and scanning for them would be
better.  That's the only change that is required for this to move forward.



I've updated the patch to dump more details when -fdump-tree-slsr-details is
on.  The tests have also been updated to scan for these new dumps instead of
MEMs.




I suggest doing it quickly.  We're well past stage1 close at this point.



The bootstrapping on x86_64 is still running.  OK to commit if it succeeds?


I still don't like it.  It's using the wrong and too expensive tools to do
stuff.  What kind of bases are we ultimately interested in?  Browsing
the code it looks like we're having

   /* Base expression for the chain of candidates:  often, but not
  always, an SSA name.  */
   tree base_expr;

which isn't really too informative but I suppose they are all
kind-of-gimple_val()s?  That said, I wonder if you can simply
use get_addr_base_and_unit_offset in place of get_alternative_base (),
ignoring the returned offset.


'base_expr' is essentially the base address of a handled_component_p, 
e.g. ARRAY_REF, COMPONENT_REF, etc.  In most cases, it is the address of 
the object returned by get_inner_reference ().


Given a test case like the following:

typedef int arr_2[20][20];

void foo (arr_2 a2, int i, int j)
{
  a2[i+10][j] = 1;
  a2[i+10][j+1] = 1;
  a2[i+20][j] = 1;
}

The IR before SLSR is (on x86_64):

  _2 = 

Re: [PING] [PATCH] Optional alternative base_expr in finding basis for CAND_REFs

2013-12-03 Thread Yufeng Zhang

On 12/03/13 20:35, Richard Biener wrote:

Yufeng Zhang  wrote:

On 12/03/13 14:20, Richard Biener wrote:

On Tue, Dec 3, 2013 at 1:50 PM, Yufeng Zhang

wrote:

On 12/03/13 06:48, Jeff Law wrote:


On 12/02/13 08:47, Yufeng Zhang wrote:


Ping~

http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03360.html





Thanks,
Yufeng

On 11/26/13 15:02, Yufeng Zhang wrote:


On 11/26/13 12:45, Richard Biener wrote:


On Thu, Nov 14, 2013 at 12:25 AM, Yufeng
Zhang  wrote:


On 11/13/13 20:54, Bill Schmidt wrote:


The second version of your original patch is ok with me with

the

following changes.  Sorry for the little side adventure into

the

next-interp logic; in the end that's going to hurt more than

it

helps in
this case.  Thanks for having a look at it, anyway.  Thanks

also for

cleaning up this version to be less intrusive to common

interfaces; I

appreciate it.




Thanks a lot for the review.  I've attached an updated patch

with the

suggested changes incorporated.

For the next-interp adventure, I was quite happy to do the
experiment; it's
a good chance of gaining insight into the pass.  Many thanks

for

your prompt
replies and patience in guiding!



Everything else looks OK to me.  Please ask Richard for final
approval,
as I'm not a maintainer.


First a note, I need to check on voting for Bill as the slsr

maintainer

from the steering committee.   Voting was in progress just before

the

close of stage1 development so I haven't tallied the results :-)



Looking forward to some good news! :)




Yes, you are right that non-trivial 'base' trees are rarely shared.
 The cache is introduced mainly because get_alternative_base () may be
called twice on the same 'base' tree, once in
find_basis_for_candidate () for look-up and the other time in
alloc_cand_and_find_basis () for record_potential_basis ().  I'm
happy to leave out the cache if you think the benefit is trivial.


Without some sense of how expensive the lookups are vs how often the
cache hits it's awful hard to know if the cache is worth it.

I'd say take it out unless you have some sense it's really saving time.
 It's a pretty minor implementation detail either way.



I think the affine tree routines are generally expensive; it is
worth having a cache to avoid calling them too many times.  I ran
the slsr-*.c tests under gcc.dg/tree-ssa/ and found that the cache
hit rates range from 55.6% to 90%, with 73.5% as the average.  The
samples may not well represent the real world scenario, but they do
show the fact that the 'base' tree can be shared to some extent.  So
I'd like to have the cache in the patch.








+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-slsr" } */
+
+typedef int arr_2[50][50];
+
+void foo (arr_2 a2, int v1)
+{
+  int i, j;
+
+  i = v1 + 5;
+  j = i;
+  a2 [i-10] [j] = 2;
+  a2 [i] [j++] = i;
+  a2 [i+20] [j++] = i;
+  a2 [i-3] [i-1] += 1;
+  return;
+}
+
+/* { dg-final { scan-tree-dump-times "MEM" 5 "slsr" } } */
+/* { dg-final { cleanup-tree-dump "slsr" } } */

scanning for 5 MEMs looks non-sensical.  What transform do
you expect?  I see other slsr testcases do similar non-sensical
checking which is bad, too.



As the slsr optimizes CAND_REF candidates by simply lowering them to
MEM_REF from e.g. ARRAY_REF, I think scanning for the number of
MEM_REFs is an effective check.  Alternatively, I can add a follow-up
patch to add some dumping facility in replace_ref () to print out the
replacing actions when -fdump-tree-slsr-details is on.


I think adding some details to the dump and scanning for them would be
better.  That's the only change that is required for this to move
forward.


I've updated the patch to dump more details when
-fdump-tree-slsr-details is on.  The tests have also been updated to
scan for these new dumps instead of MEMs.




I suggest doing it quickly.  We're well past stage1 close at this
point.


The bootstrapping on x86_64 is still running.  OK to commit if it
succeeds?


I still don't like it.  It's using the wrong and too expensive tools
to do stuff.  What kind of bases are we ultimately interested in?
Browsing the code it looks like we're having

    /* Base expression for the chain of candidates:  often, but not
       always, an SSA name.  */
    tree base_expr;

which isn't really too informative but I suppose they are all
kind-of-gimple_val()s?  That said, I wonder if you can simply
use get_addr_base_and_unit_offset in place of get_alternative_base (),
ignoring the returned offset.


'base_expr' is essentially the base address of a handled_component_p,
e.g. ARRAY_REF, COMPONENT_REF, etc.  In most cases, it is the address
of the object returned by get_inner_reference ().

Given a test case like the following:

typedef int arr_2[20][20];

void foo (arr

Re: [PING] [PATCH] Optional alternative base_expr in finding basis for CAND_REFs

2013-12-04 Thread Yufeng Zhang

On 12/04/13 10:30, Richard Biener wrote:

On Wed, Dec 4, 2013 at 11:26 AM, Richard Biener
  wrote:

On Tue, Dec 3, 2013 at 11:04 PM, Bill Schmidt
  wrote:

On Tue, 2013-12-03 at 21:35 +0100, Richard Biener wrote:

Yufeng Zhang  wrote:

On 12/03/13 14:20, Richard Biener wrote:

On Tue, Dec 3, 2013 at 1:50 PM, Yufeng Zhang

wrote:

On 12/03/13 06:48, Jeff Law wrote:


On 12/02/13 08:47, Yufeng Zhang wrote:


Ping~

http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03360.html





Thanks,
Yufeng

On 11/26/13 15:02, Yufeng Zhang wrote:


On 11/26/13 12:45, Richard Biener wrote:


On Thu, Nov 14, 2013 at 12:25 AM, Yufeng
Zhang  wrote:


On 11/13/13 20:54, Bill Schmidt wrote:


The second version of your original patch is ok with me with

the

following changes.  Sorry for the little side adventure into

the

next-interp logic; in the end that's going to hurt more than

it

helps in
this case.  Thanks for having a look at it, anyway.  Thanks

also for

cleaning up this version to be less intrusive to common

interfaces; I

appreciate it.




Thanks a lot for the review.  I've attached an updated patch

with the

suggested changes incorporated.

For the next-interp adventure, I was quite happy to do the
experiment; it's
a good chance of gaining insight into the pass.  Many thanks

for

your prompt
replies and patience in guiding!



Everything else looks OK to me.  Please ask Richard for final
approval,
as I'm not a maintainer.


First a note, I need to check on voting for Bill as the slsr

maintainer

from the steering committee.   Voting was in progress just before

the

close of stage1 development so I haven't tallied the results :-)



Looking forward to some good news! :)




Yes, you are right that non-trivial 'base' trees are rarely shared.
 The cache is introduced mainly because get_alternative_base () may be
called twice on the same 'base' tree, once in
find_basis_for_candidate () for look-up and the other time in
alloc_cand_and_find_basis () for record_potential_basis ().  I'm
happy to leave out the cache if you think the benefit is trivial.


Without some sense of how expensive the lookups are vs how often the
cache hits it's awful hard to know if the cache is worth it.

I'd say take it out unless you have some sense it's really saving time.
 It's a pretty minor implementation detail either way.



I think the affine tree routines are generally expensive; it is
worth having a cache to avoid calling them too many times.  I ran
the slsr-*.c tests under gcc.dg/tree-ssa/ and found that the cache
hit rates range from 55.6% to 90%, with 73.5% as the average.  The
samples may not well represent the real world scenario, but they do
show the fact that the 'base' tree can be shared to some extent.  So
I'd like to have the cache in the patch.








+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-slsr" } */
+
+typedef int arr_2[50][50];
+
+void foo (arr_2 a2, int v1)
+{
+  int i, j;
+
+  i = v1 + 5;
+  j = i;
+  a2 [i-10] [j] = 2;
+  a2 [i] [j++] = i;
+  a2 [i+20] [j++] = i;
+  a2 [i-3] [i-1] += 1;
+  return;
+}
+
+/* { dg-final { scan-tree-dump-times "MEM" 5 "slsr" } } */
+/* { dg-final { cleanup-tree-dump "slsr" } } */

scanning for 5 MEMs looks non-sensical.  What transform do
you expect?  I see other slsr testcases do similar non-sensical
checking which is bad, too.



As the slsr optimizes CAND_REF candidates by simply lowering them to
MEM_REF from e.g. ARRAY_REF, I think scanning for the number of
MEM_REFs is an effective check.  Alternatively, I can add a follow-up
patch to add some dumping facility in replace_ref () to print out the
replacing actions when -fdump-tree-slsr-details is on.


I think adding some details to the dump and scanning for them would be
better.  That's the only change that is required for this to move
forward.


I've updated the patch to dump more details when
-fdump-tree-slsr-details is on.  The tests have also been updated to
scan for these new dumps instead of MEMs.




I suggest doing it quickly.  We're well past stage1 close at this
point.


The bootstrapping on x86_64 is still running.  OK to commit if it
succeeds?


I still don't like it.  It's using the wrong and too expensive tools
to do stuff.  What kind of bases are we ultimately interested in?
Browsing the code it looks like we're having

    /* Base expression for the chain of candidates:  often, but not
       always, an SSA name.  */
    tree base_expr;

which isn't really too informative but I suppose they are all
kind-of-gimple_val()s?  That said, I wonder if you can simply
use get_addr_base_and_unit_offset in place of get_alternative_base (),
ignoring the returned offset.


'base_expr' is essentially the base address of a handled_component_p,
e.g. ARRAY_REF, COMPONENT_REF, etc.

Re: [PATCH/AARCH64 3/6] Fix up multi-lib options

2013-12-04 Thread Yufeng Zhang

Looks good to me, but I cannot approve it.

Yufeng

On 12/03/13 21:24, Andrew Pinski wrote:


Hi,
   The arguments to --with-multilib-list for AARCH64 are exclusive, but 
currently they are treated as if they were not.  This causes problems in that 
we get four library sets with --with-multilib-list=lp64,ilp32: empty, lp64, 
ilp32, and lp64/ilp32.  The first and last ones do not make sense and should 
not be there.

This patch changes the definition of MULTILIB_OPTIONS so we have a '/' in 
between the options rather than a space.

OK?  Build and tested on aarch64-elf with both --with-multilib-list=lp64,ilp32 
and without it.

Thanks,
Andrew Pinski

* config/aarch64/t-aarch64 (MULTILIB_OPTIONS): Fix definition so
that options are conflicting ones.
---
  gcc/ChangeLog|2 +-
  gcc/config/aarch64/t-aarch64 |2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/t-aarch64 b/gcc/config/aarch64/t-aarch64
index 9f8d8cd..98a30d8 100644
--- a/gcc/config/aarch64/t-aarch64
+++ b/gcc/config/aarch64/t-aarch64
@@ -41,5 +41,5 @@ aarch-common.o: $(srcdir)/config/arm/aarch-common.c 
$(CONFIG_H) $(SYSTEM_H) \
$(srcdir)/config/arm/aarch-common.c

  comma=,
-MULTILIB_OPTIONS= $(patsubst %, mabi=%, $(subst $(comma), ,$(TM_MULTILIB_CONFIG)))
+MULTILIB_OPTIONS= $(subst $(comma),/, $(patsubst %, mabi=%, $(subst $(comma),$(comma)mabi=,$(TM_MULTILIB_CONFIG))))
  MULTILIB_DIRNAMES   = $(subst $(comma), ,$(TM_MULTILIB_CONFIG))
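As a worked illustration (my own note, assuming TM_MULTILIB_CONFIG = lp64,ilp32):

# before: MULTILIB_OPTIONS = mabi=lp64 mabi=ilp32    (independent options)
# after:  MULTILIB_OPTIONS = mabi=lp64/mabi=ilp32    (mutually exclusive)
# The '/' form keeps genmultilib from generating the meaningless
# combined lp64+ilp32 library set.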





Re: [PATCH/AARCH64 6/6] Support ILP32 multi-lib

2013-12-04 Thread Yufeng Zhang
I think together with this patch, the default value for 
--with-multilib-list when it is absent can be updated to "lp64,ilp32" 
from "lp64" only.  This will make the multi-lib default setting on 
aarch64*-*-linux* consistent with that on aarch64*-*-elf.  See gcc/config.gcc.


Thanks,
Yufeng

P.S. Copy&paste related configury snippet.

aarch64*-*-linux*)
        tm_file="${tm_file} dbxelf.h elfos.h gnu-user.h linux.h glibc-stdint.h"
        tm_file="${tm_file} aarch64/aarch64-elf.h aarch64/aarch64-linux.h"
        tmake_file="${tmake_file} aarch64/t-aarch64 aarch64/t-aarch64-linux"
        case $target in
        aarch64_be-*)
                tm_defines="${tm_defines} TARGET_BIG_ENDIAN_DEFAULT=1"
                ;;
        esac
        aarch64_multilibs="${with_multilib_list}"
        if test "$aarch64_multilibs" = "default"; then
                # TODO: turn on ILP32 multilib build after its support is mature.
                # aarch64_multilibs="lp64,ilp32"
                aarch64_multilibs="lp64"
        fi
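Concretely, the suggested default change would be along these lines (a
sketch only, to be applied once the ILP32 support is deemed mature):

        aarch64_multilibs="${with_multilib_list}"
        if test "$aarch64_multilibs" = "default"; then
                aarch64_multilibs="lp64,ilp32"
        fi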


On 12/03/13 21:24, Andrew Pinski wrote:

Hi,
   This is the final patch which adds support for the dynamic linker and
multi-lib directories for ILP32.  I did not change multi-arch support as
I did not know what it should be changed to and internally here at Cavium,
we don't use multi-arch.


OK?  Build and tested for aarch64-linux-gnu with and without 
--with-multilib-list=lp64,ilp32.

Thanks,
Andrew Pinski



* config/aarch64/aarch64-linux.h (GLIBC_DYNAMIC_LINKER): 
/lib/ld-linux32-aarch64.so.1
is used for ILP32.
(LINUX_TARGET_LINK_SPEC): Add linker script
 file whose name depends on -mabi= and -mbig-endian.
* config/aarch64/t-aarch64-linux (MULTILIB_OSDIRNAMES): Handle LP64 
better
and handle ilp32 too.
(MULTILIB_OPTIONS): Delete.
(MULTILIB_DIRNAMES): Delete.
---
  gcc/ChangeLog  |   11 +++
  gcc/config/aarch64/aarch64-linux.h |5 +++--
  gcc/config/aarch64/t-aarch64-linux |7 ++-
  3 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-linux.h 
b/gcc/config/aarch64/aarch64-linux.h
index 83efad4..408297a 100644
--- a/gcc/config/aarch64/aarch64-linux.h
+++ b/gcc/config/aarch64/aarch64-linux.h
@@ -21,7 +21,7 @@
  #ifndef GCC_AARCH64_LINUX_H
  #define GCC_AARCH64_LINUX_H

-#define GLIBC_DYNAMIC_LINKER "/lib/ld-linux-aarch64.so.1"
+#define GLIBC_DYNAMIC_LINKER "/lib/ld-linux%{mabi=ilp32:32}-aarch64.so.1"

  #define CPP_SPEC "%{pthread:-D_REENTRANT}"

@@ -32,7 +32,8 @@
 %{rdynamic:-export-dynamic}\
 -dynamic-linker " GNU_USER_DYNAMIC_LINKER "  \
 -X \
-   %{mbig-endian:-EB} %{mlittle-endian:-EL}"
+   %{mbig-endian:-EB} %{mlittle-endian:-EL}\
+   -maarch64linux%{mabi=ilp32:32}%{mbig-endian:b}"

  #define LINK_SPEC LINUX_TARGET_LINK_SPEC

diff --git a/gcc/config/aarch64/t-aarch64-linux 
b/gcc/config/aarch64/t-aarch64-linux
index ca1525e..5032ea9 100644
--- a/gcc/config/aarch64/t-aarch64-linux
+++ b/gcc/config/aarch64/t-aarch64-linux
@@ -22,10 +22,7 @@ LIB1ASMSRC   = aarch64/lib1funcs.asm
  LIB1ASMFUNCS = _aarch64_sync_cache_range

  AARCH_BE = $(if $(findstring TARGET_BIG_ENDIAN_DEFAULT=1, $(tm_defines)),_be)
-MULTILIB_OSDIRNAMES = .=../lib64$(call 
if_multiarch,:aarch64$(AARCH_BE)-linux-gnu)
+MULTILIB_OSDIRNAMES = mabi.lp64=../lib64$(call 
if_multiarch,:aarch64$(AARCH_BE)-linux-gnu)
  MULTIARCH_DIRNAME = $(call if_multiarch,aarch64$(AARCH_BE)-linux-gnu)

-# Disable the multilib for linux-gnu targets for the time being; focus
-# on the baremetal targets.
-MULTILIB_OPTIONS=
-MULTILIB_DIRNAMES   =
+MULTILIB_OSDIRNAMES += mabi.ilp32=../lib32
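As a worked illustration of the GLIBC_DYNAMIC_LINKER spec above (my own
note, not part of the patch): %{mabi=ilp32:32} expands to nothing by
default and to "32" under -mabi=ilp32, so the selected dynamic linker is:

  default / -mabi=lp64 : /lib/ld-linux-aarch64.so.1
  -mabi=ilp32          : /lib/ld-linux32-aarch64.so.1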




Re: [PATCH/middle-end 2/6] __builtin_thread_pointer and AARCH64 ILP32

2013-12-04 Thread Yufeng Zhang

On 12/03/13 21:24, Andrew Pinski wrote:

Hi,
   With ILP32 AARCH64, Pmode (DImode) != ptr_mode (SImode), so the variable decl
has a mode of SImode while the register is DImode.  So the target that gets
passed down to expand_builtin_thread_pointer is NULL, as expand does not
know how to get a subreg for a pointer type.

This fixes the problem by handling a NULL target like we are able to handle
for a non register/correct mode target inside expand_builtin_thread_pointer.

OK?  Build and tested for aarch64-elf with no regressions.

Thanks,
Andrew Pinski

* builtins.c (expand_builtin_thread_pointer): Create a new target
when the target is NULL.
---
  gcc/ChangeLog  |5 +
  gcc/builtins.c |2 +-
  2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 4f1c818..66797fa 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -5699,7 +5699,7 @@ expand_builtin_thread_pointer (tree exp, rtx target)
if (icode != CODE_FOR_nothing)
  {
struct expand_operand op;
-  if (!REG_P (target) || GET_MODE (target) != Pmode)
+  if (target == NULL_RTX || !REG_P (target) || GET_MODE (target) != Pmode)
target = gen_reg_rtx (Pmode);
create_output_operand (&op, target, Pmode);
expand_insn (icode, 1,&op);


Shouldn't the thread pointer have ptr_mode instead?  I'm aware that on 
AArch64 the thread pointer system register tpidr_el0 is 64-bit wide 
regardless of the ABI, but in the abstracted view of the AArch64 ILP32 
world, the thread pointer shall be a 32-bit pointer; the OS should have 
taken care of the hardware register tpidr_el0 by having its higher 32 
bits cleared.  I think expand_builtin_thread_pointer and 
expand_builtin_set_thread_pointer should use ptr_mode instead.  Correct 
me if I missed anything.
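For reference, a minimal example that exercises this path (my own
sketch, not from the patch):

/* Under ILP32 the pointer below has SImode while the hard register
   behind the thread pointer is DImode, which is what exposes the
   NULL target in expand_builtin_thread_pointer.  */
void *
get_tp (void)
{
  return __builtin_thread_pointer ();
}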


Add Chung-Lin Tang to the CC list; Chung-Lin wrote these builtins in r192364

Yufeng



Re: [PATCH/AARCH64 5/6] Fix TLS for ILP32.

2013-12-04 Thread Yufeng Zhang

On 12/03/13 21:24, Andrew Pinski wrote:

Hi,
   With ILP32, some simple usage of TLS variables causes an unrecognizable
instruction due to needing to use SImode for loading pointers from memory.
This fixes the three (tlsie_small, tlsle_small, tlsdesc_small) patterns to
support SImode for pointers.

OK?  Build and tested on aarch64-elf with no regressions.

Thanks,
Andrew Pinski

* config/aarch64/aarch64.c (aarch64_load_symref_appropriately):
Handle TLS for ILP32.
* config/aarch64/aarch64.md (tlsie_small): Change to an expand to
handle ILP32.
(tlsie_small_<mode>): New pattern.
(tlsle_small): Change to an expand to handle ILP32.
(tlsle_small_<mode>): New pattern.
(tlsdesc_small): Change to an expand to handle ILP32.
(tlsdesc_small_<mode>): New pattern.
---
  gcc/ChangeLog |   12 ++
  gcc/config/aarch64/aarch64.c  |   23 ++--
  gcc/config/aarch64/aarch64.md |   76 ++---
  3 files changed, 94 insertions(+), 17 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b1b4eef..a3e4532 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -628,22 +628,37 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,

  case SYMBOL_SMALL_TLSDESC:
{
-   rtx x0 = gen_rtx_REG (Pmode, R0_REGNUM);
+   enum machine_mode mode = GET_MODE (dest);
+   rtx x0 = gen_rtx_REG (mode, R0_REGNUM);
rtx tp;

+   gcc_assert (mode == Pmode || mode == ptr_mode);
+
emit_insn (gen_tlsdesc_small (imm));
tp = aarch64_load_tp (NULL);
-   emit_insn (gen_rtx_SET (Pmode, dest, gen_rtx_PLUS (Pmode, tp, x0)));
+
+   if (mode != Pmode)
+ tp = gen_lowpart (mode, tp);
+
+   emit_insn (gen_rtx_SET (mode, dest, gen_rtx_PLUS (mode, tp, x0)));
set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
return;
}

  case SYMBOL_SMALL_GOTTPREL:
{
-   rtx tmp_reg = gen_reg_rtx (Pmode);
+   enum machine_mode mode = GET_MODE (dest);
+   rtx tmp_reg = gen_reg_rtx (mode);
rtx tp = aarch64_load_tp (NULL);
+
+   gcc_assert (mode == Pmode || mode == ptr_mode);
+
emit_insn (gen_tlsie_small (tmp_reg, imm));
-   emit_insn (gen_rtx_SET (Pmode, dest, gen_rtx_PLUS (Pmode, tp, 
tmp_reg)));
+
+   if (mode != Pmode)
+ tp = gen_lowpart (mode, tp);
+
+   emit_insn (gen_rtx_SET (mode, dest, gen_rtx_PLUS (mode, tp, tmp_reg)));
set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
return;
}
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 313517f..08fcc94 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3577,35 +3577,85 @@
[(set_attr "type" "call")
 (set_attr "length" "16")])

-(define_insn "tlsie_small"
-  [(set (match_operand:DI 0 "register_operand" "=r")
-(unspec:DI [(match_operand:DI 1 "aarch64_tls_ie_symref" "S")]
+(define_expand "tlsie_small"
+  [(set (match_operand 0 "register_operand" "=r")
+(unspec [(match_operand 1 "aarch64_tls_ie_symref" "S")]
+  UNSPEC_GOTSMALLTLS))]
+  ""
+{
+  if (TARGET_ILP32)
+{
+  operands[0] = gen_lowpart (ptr_mode, operands[0]);
+  emit_insn (gen_tlsie_small_si (operands[0], operands[1]));
+}
+  else
+emit_insn (gen_tlsie_small_di (operands[0], operands[1]));
+  DONE;
+})
+
+(define_insn "tlsie_small_"
+  [(set (match_operand:PTR 0 "register_operand" "=r")
+(unspec:PTR [(match_operand 1 "aarch64_tls_ie_symref" "S")]
   UNSPEC_GOTSMALLTLS))]
""
-  "adrp\\t%0, %A1\;ldr\\t%0, [%0, #%L1]"
+  "adrp\\t%0, %A1\;ldr\\t%0, [%0, #%L1]"
[(set_attr "type" "load1")
 (set_attr "length" "8")]
  )

-(define_insn "tlsle_small"
-  [(set (match_operand:DI 0 "register_operand" "=r")
-(unspec:DI [(match_operand:DI 1 "register_operand" "r")
-   (match_operand:DI 2 "aarch64_tls_le_symref" "S")]
+
+(define_expand "tlsle_small"
+  [(set (match_operand 0 "register_operand" "=r")
+(unspec [(match_operand 1 "register_operand" "r")
+   (match_operand 2 "aarch64_tls_le_symref" "S")]
+   UNSPEC_GOTSMALLTLS))]
+  ""
+{
+  if (TARGET_ILP32)
+{
+  rtx temp = gen_reg_rtx (ptr_mode);
+  operands[1] = gen_lowpart (ptr_mode, operands[1]);
+  emit_insn (gen_tlsle_small_si (temp, operands[1], operands[2]));
+  emit_move_insn (operands[0], gen_lowpart (GET_MODE (operands[0]), temp));
+}


Looks like you hit a similar issue where the matched RTX can have 
either SImode or DImode in ILP32.  The mechanism looks OK, but I think 
the approach that 'add_losym' adopts is neater: it checks on the mode 
instead of TARGET_ILP32 and calls gen_add_losym_di or gen_add_losym_si 
accordingly.  Note that the iterator used in add_losym_<mode> is P 
instead of PTR.


Same for tlsie_small above.
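A sketch of what I have in mind, modelled on add_losym (illustrative
only; the mode-based dispatch and the gen_tlsie_small_si/di names follow
the conventions the _<mode> patterns already establish):

(define_expand "tlsie_small"
  [(set (match_operand 0 "register_operand")
	(unspec [(match_operand 1 "aarch64_tls_ie_symref")]
		UNSPEC_GOTSMALLTLS))]
  ""
{
  /* Dispatch on the mode of the operand, not on TARGET_ILP32.  */
  if (GET_MODE (operands[0]) == DImode)
    emit_insn (gen_tlsie_small_di (operands[0], operands[1]));
  else
    emit_insn (gen_tlsie_small_si (operands[0], operands[1]));
  DONE;
})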



Re: [PING] [PATCH] Optional alternative base_expr in finding basis for CAND_REFs

2013-12-05 Thread Yufeng Zhang

On 12/04/13 13:08, Bill Schmidt wrote:

On Wed, 2013-12-04 at 11:26 +0100, Richard Biener wrote:

On Tue, Dec 3, 2013 at 11:04 PM, Bill Schmidt
  wrote:

On Tue, 2013-12-03 at 21:35 +0100, Richard Biener wrote:

Yufeng Zhang  wrote:

On 12/03/13 14:20, Richard Biener wrote:

On Tue, Dec 3, 2013 at 1:50 PM, Yufeng Zhang

wrote:

On 12/03/13 06:48, Jeff Law wrote:


On 12/02/13 08:47, Yufeng Zhang wrote:


Ping~

http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03360.html





Thanks,
Yufeng

On 11/26/13 15:02, Yufeng Zhang wrote:


On 11/26/13 12:45, Richard Biener wrote:


On Thu, Nov 14, 2013 at 12:25 AM, Yufeng
Zhang  wrote:


On 11/13/13 20:54, Bill Schmidt wrote:


The second version of your original patch is ok with me with

the

following changes.  Sorry for the little side adventure into

the

next-interp logic; in the end that's going to hurt more than

it

helps in
this case.  Thanks for having a look at it, anyway.  Thanks

also for

cleaning up this version to be less intrusive to common

interfaces; I

appreciate it.




Thanks a lot for the review.  I've attached an updated patch

with the

suggested changes incorporated.

For the next-interp adventure, I was quite happy to do the
experiment; it's
a good chance of gaining insight into the pass.  Many thanks

for

your prompt
replies and patience in guiding!



Everything else looks OK to me.  Please ask Richard for final
approval,
as I'm not a maintainer.


First a note, I need to check on voting for Bill as the slsr

maintainer

from the steering committee.   Voting was in progress just before

the

close of stage1 development so I haven't tallied the results :-)



Looking forward to some good news! :)




Yes, you are right that non-trivial 'base' trees are rarely shared.
 The cache is introduced mainly because get_alternative_base () may be
called twice on the same 'base' tree, once in
find_basis_for_candidate () for look-up and the other time in
alloc_cand_and_find_basis () for record_potential_basis ().  I'm
happy to leave out the cache if you think the benefit is trivial.


Without some sense of how expensive the lookups are vs how often the
cache hits it's awful hard to know if the cache is worth it.

I'd say take it out unless you have some sense it's really saving time.
 It's a pretty minor implementation detail either way.



I think the affine tree routines are generally expensive; it is
worth having a cache to avoid calling them too many times.  I ran
the slsr-*.c tests under gcc.dg/tree-ssa/ and found that the cache
hit rates range from 55.6% to 90%, with 73.5% as the average.  The
samples may not well represent the real world scenario, but they do
show the fact that the 'base' tree can be shared to some extent.  So
I'd like to have the cache in the patch.








+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-slsr" } */
+
+typedef int arr_2[50][50];
+
+void foo (arr_2 a2, int v1)
+{
+  int i, j;
+
+  i = v1 + 5;
+  j = i;
+  a2 [i-10] [j] = 2;
+  a2 [i] [j++] = i;
+  a2 [i+20] [j++] = i;
+  a2 [i-3] [i-1] += 1;
+  return;
+}
+
+/* { dg-final { scan-tree-dump-times "MEM" 5 "slsr" } } */
+/* { dg-final { cleanup-tree-dump "slsr" } } */

scanning for 5 MEMs looks non-sensical.  What transform do
you expect?  I see other slsr testcases do similar non-sensical
checking which is bad, too.



As the slsr optimizes CAND_REF candidates by simply lowering them to
MEM_REF from e.g. ARRAY_REF, I think scanning for the number of
MEM_REFs is an effective check.  Alternatively, I can add a follow-up
patch to add some dumping facility in replace_ref () to print out the
replacing actions when -fdump-tree-slsr-details is on.


I think adding some details to the dump and scanning for them would be
better.  That's the only change that is required for this to move
forward.


I've updated the patch to dump more details when
-fdump-tree-slsr-details is on.  The tests have also been updated to
scan for these new dumps instead of MEMs.




I suggest doing it quickly.  We're well past stage1 close at this
point.


The bootstrapping on x86_64 is still running.  OK to commit if it
succeeds?


I still don't like it.  It's using the wrong and too expensive tools
to do stuff.  What kind of bases are we ultimately interested in?
Browsing the code it looks like we're having

    /* Base expression for the chain of candidates:  often, but not
       always, an SSA name.  */
    tree base_expr;

which isn't really too informative but I suppose they are all
kind-of-gimple_val()s?  That said, I wonder if you can simply
use get_addr_base_and_unit_offset in place of get_alternative_base (),
ignoring the returned offset.


'base_expr' is essentially the base address of a handled_component_p,
e.g. ARRAY_REF, COMPONENT_REF, etc.

Re: [PING] [PATCH] Optional alternative base_expr in finding basis for CAND_REFs

2013-12-05 Thread Yufeng Zhang

On 12/05/13 13:21, Bill Schmidt wrote:

On Thu, 2013-12-05 at 12:02 +, Yufeng Zhang wrote:

On 12/04/13 13:08, Bill Schmidt wrote:

On Wed, 2013-12-04 at 11:26 +0100, Richard Biener wrote:

[snip]


I'm not sure what you're suggesting that he use get_inner_reference on
at this point.  At the point where the affine machinery is invoked, the
memory reference was already expanded with get_inner_reference, and
there was no basis involving the SSA name produced as the base.  The
affine machinery is invoked on that SSA name to see if it is hiding
another base.  There's no additional memory reference to use
get_inner_reference on, just potentially some pointer arithmetic.

That said, if we have real compile-time issues, we should hold off on
this patch for this release.

Yufeng, please time some reasonably large benchmarks (some version of
SPECint or similar) and report back here before the patch goes in.


I've got some build time data for SPEC2Kint.

On x86_64 the -O3 builds take about 4m7.5s with or without the patch
(consistent over 3 samples).  The difference of the -O3 build time on
arm cortex-a15 is also within 2 seconds.

The bootstrapping time on x86_64 is 134m48.040s without the patch and
134m46.889s with the patch; this data is preliminary as I only sampled
once, but the difference of the bootstrapping time on arm cortex-a15 is
also within 5 seconds.

I can further time SPEC2006int if necessary.

I've also prepared a patch to further reduce the number of calls to
tree-affine expansion, by checking whether or not the passed-in BASE in
get_alternative_base () is simply an SSA_NAME of a declared variable.
Please see the inlined patch below.

Thanks,
Yufeng


diff --git a/gcc/gimple-ssa-strength-reduction.c
b/gcc/gimple-ssa-strength-reduction.c
index 26502c3..2984f06 100644
--- a/gcc/gimple-ssa-strength-reduction.c
+++ b/gcc/gimple-ssa-strength-reduction.c
@@ -437,13 +437,22 @@ get_alternative_base (tree base)

 if (result == NULL)
   {
-  tree expr;
-  aff_tree aff;
+  tree expr = NULL;
+  gimple def = NULL;

-  tree_to_aff_combination_expand (base, TREE_TYPE (base),
-&aff,&name_expansions);
-  aff.offset = tree_to_double_int (integer_zero_node);
-  expr = aff_combination_to_tree (&aff);
+  if (TREE_CODE (base) == SSA_NAME)
+ def = SSA_NAME_DEF_STMT (base);
+
+  /* Avoid calling expensive tree-affine expansion if BASE
+ is just an SSA_NAME of, e.g. a PARM_DECL.  */
+  if (!def || (is_gimple_assign (def) && gimple_assign_lhs (def) == base))


Well, that just isn't right.  !def indicates you have a parameter, so
why call tree_to_aff_combination_expand in that case?  Just forget this
addition and check for flag_expensive_optimizations as Richard suggested
in another branch of this thread.


I thought every SSA_NAME has its DEF_STMT; at least in the cases which I 
checked they are GIMPLE_NOPs.  That's why I used !def to check for cases 
where BASE is not an SSA_NAME (a bad programming habit, I guess).
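For the record, a default definition (e.g. of a PARM_DECL) does have a
defining statement, namely a GIMPLE_NOP; a sketch of checking for it
explicitly (gimple_nop_p is the existing predicate):

  if (TREE_CODE (base) == SSA_NAME
      && gimple_nop_p (SSA_NAME_DEF_STMT (base)))
    /* BASE is a default definition; there is nothing to expand.  */
    return NULL_TREE;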


Anyway, I'll leave out this addition.


Previous version of the patch is ok with this change, and with a comment
added that we should eliminate this backtracking with better forward
analysis in a future release.


Thanks.  The following inlined diff is the incremental change.

Thanks again for your review and help.

Regards,
Yufeng


diff --git a/gcc/gimple-ssa-strength-reduction.c 
b/gcc/gimple-ssa-strength-reduction.c

index 26502c3..f406794 100644
--- a/gcc/gimple-ssa-strength-reduction.c
+++ b/gcc/gimple-ssa-strength-reduction.c
@@ -428,7 +428,10 @@ static struct pointer_map_t *alt_base_map;

 /* Given BASE, use the tree affine combiniation facilities to
find the underlying tree expression for BASE, with any
-   immediate offset excluded.  */
+   immediate offset excluded.
+
+   N.B. we should eliminate this backtracking with better forward
+   analysis in a future release.  */

 static tree
 get_alternative_base (tree base)
@@ -556,7 +559,7 @@ find_basis_for_candidate (slsr_cand_t c)
}
 }

-  if (!basis && c->kind == CAND_REF)
+  if (flag_expensive_optimizations && !basis && c->kind == CAND_REF)
 {
   tree alt_base_expr = get_alternative_base (c->base_expr);
   if (alt_base_expr)
@@ -641,7 +644,7 @@ alloc_cand_and_find_basis (enum cand_kind kind, 
gimple gs, tree base,

 c->basis = find_basis_for_candidate (c);

   record_potential_basis (c, base);
-  if (kind == CAND_REF)
+  if (flag_expensive_optimizations && kind == CAND_REF)
 {
   tree alt_base = get_alternative_base (base);
   if (alt_base)



Re: AARCH64 configure check for gas -mabi support

2013-12-06 Thread Yufeng Zhang

Hi Kugan,

Thanks for working on this issue.

On 12/04/13 21:03, Kugan wrote:

Hi,

gcc trunk aarch64 bootstrapping fails with gas version 2.23.2 (with
error message similar to cannot compute suffix of object files) as this
particular version does not support -mabi=lp64. It succeeds with later
versions of gas that supports -mabi.


The -mabi option was introduced to gas when the support for ILP32 was 
added.  Initially the options were named -milp32 and -mlp64:


  http://sourceware.org/ml/binutils/2013-06/msg00178.html

and later on they were changed to -mabi=ilp32 and -mabi=lp64 for 
consistency with those in the aarch64 gcc:


  http://sourceware.org/ml/binutils/2013-07/msg00180.html

The following gcc patch made the driver use the explicit option to drive 
gas:


  http://gcc.gnu.org/ml/gcc-patches/2013-07/msg00083.html

That patch neglected backward compatibility with binutils 2.23.



The attached patch adds a check for -mabi=lp64 and prompts for an
upgrade.  Is this OK?


I think instead of mandating support for the -mabi option, the 
compiler should be changed to work with binutils 2.23.  The 2.23 
binutils have good support for aarch64, and the main difference from 
2.24 is the ILP32 support.  I think it is necessary to maintain 
backward compatibility, and it should be achieved by suppressing the 
compiler's support for ILP32 when the -mabi option is found 
unavailable in gas at configuration time.


I had a quick look at the areas that need to be updated:

* multilib support

In gcc/config.gcc, the default and the only accepted value for 
--with-multilib-list and --with-abi shall be lp64 when -mabi is not 
available.


* -mabi option

I suggest we keep the -mabi option, but reject -mabi=ilp32 in 
gcc/config/aarch64/aarch64.c:aarch64_override_options ()


* driver spec

In gcc/config/aarch64/aarch64-elf.h, the DRIVER_SELF_SPECS and ASM_SPEC 
shall be updated to not pass/specify -mabi for gas.


* documentation

I think it needs to be mentioned in gcc/doc/install.texi the constraint 
of using pre-2.24 binutils with aarch64 gcc that is 4.9 or later.


This is only quick scouting, but hopefully it provides some guidance 
(a rough sketch follows).  If you need more help, just let me know.
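
To illustrate the shape of what I have in mind (a rough sketch only,
assuming configure defines a macro HAVE_AS_MABI_OPTION when the gas
-mabi check succeeds; the exact names and spec strings are up to the
patch author):

   /* gcc/config/aarch64/aarch64-elf.h: pass -mabi down to gas only
      when configure has found the option to be available.  */
   #ifdef HAVE_AS_MABI_OPTION
   #define ASM_MABI_SPEC "%{mabi=*:-mabi=%*}"
   #else
   #define ASM_MABI_SPEC ""
   #endif

   /* gcc/config/aarch64/aarch64.c:aarch64_override_options: reject
      ILP32 when the assembler cannot support it.  */
   #ifndef HAVE_AS_MABI_OPTION
     if (TARGET_ILP32)
       error ("assembler does not support -mabi=ilp32");
   #endif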



Yufeng

P.s. some minor comments on the attached patch.



diff --git a/gcc/configure b/gcc/configure
index fdf0cd0..17b6e85 100755
--- a/gcc/configure
+++ b/gcc/configure


The diff of auto-generated files (such as configure) is usually excluded from a posted patch.


diff --git a/gcc/configure.ac b/gcc/configure.ac
index 91a22d5..730ada0 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -3532,6 +3532,15 @@ case "$target" in
[Define if your assembler supports the -no-mul-bug-abort 
option.])])
  ;;

+ aarch64-*-*)


aarch64*-*-*


+gcc_GAS_CHECK_FEATURE([-mabi option],
+  gcc_cv_as_aarch64_mabi,,
+  [-mabi=lp64], [.text],,,)
+if test x$gcc_cv_as_aarch64_mabi = xno; then
+   AC_MSG_ERROR([Assembler support for -mabi=lp64 is required.  Upgrade the Assembler.])
+fi
+;;
+
sparc*-*-*)
  gcc_GAS_CHECK_FEATURE([.register], gcc_cv_as_sparc_register_op,,,
[.register %g2, #scratch],,






Re: [PATCH/AARCH64 6/6] Support ILP32 multi-lib

2013-12-09 Thread Yufeng Zhang

On 12/03/13 21:24, Andrew Pinski wrote:

Hi,
   This is the final patch which adds support for the dynamic linker and
multi-lib directories for ILP32.  I did not change multi-arch support as
I did not know what it should be changed to and internally here at Cavium,
we don't use multi-arch.


OK?  Build and tested for aarch64-linux-gnu with and without 
--with-multilib-list=lp64,ilp32.

Thanks,
Andrew Pinski



* config/aarch64/aarch64-linux.h (GLIBC_DYNAMIC_LINKER):
/lib/ld-linux32-aarch64.so.1 is used for ILP32.
(LINUX_TARGET_LINK_SPEC): Add linker script file whose name depends
on -mabi= and -mbig-endian.
* config/aarch64/t-aarch64-linux (MULTILIB_OSDIRNAMES): Handle LP64
better and handle ilp32 too.
(MULTILIB_OPTIONS): Delete.
(MULTILIB_DIRNAMES): Delete.
---
  gcc/ChangeLog  |   11 +++
  gcc/config/aarch64/aarch64-linux.h |5 +++--
  gcc/config/aarch64/t-aarch64-linux |7 ++-
  3 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-linux.h b/gcc/config/aarch64/aarch64-linux.h
index 83efad4..408297a 100644
--- a/gcc/config/aarch64/aarch64-linux.h
+++ b/gcc/config/aarch64/aarch64-linux.h
@@ -21,7 +21,7 @@
  #ifndef GCC_AARCH64_LINUX_H
  #define GCC_AARCH64_LINUX_H

-#define GLIBC_DYNAMIC_LINKER "/lib/ld-linux-aarch64.so.1"
+#define GLIBC_DYNAMIC_LINKER "/lib/ld-linux%{mabi=ilp32:32}-aarch64.so.1"


To be more explicit and consistent, the name of the ILP32 loader shall 
have 'ilp32' instead of '32'.  The extension field shall be appended to 
'aarch64', separated by '_', and we should probably add the big-endian 
name at the same time.  With the extension fields sorted alphabetically, 
GLIBC_DYNAMIC_LINKER can be defined as:


"/lib/ld-linux-aarch64%{mbig-endian:_be}%{mabi=ilp32:_ilp32}.so.1"

The multi-arch names shall follow the same naming convention (although 
we don't have to add the multi-arch support right now).




  #define CPP_SPEC "%{pthread:-D_REENTRANT}"

@@ -32,7 +32,8 @@
 %{rdynamic:-export-dynamic}\
 -dynamic-linker " GNU_USER_DYNAMIC_LINKER "  \
 -X \
-   %{mbig-endian:-EB} %{mlittle-endian:-EL}"
+   %{mbig-endian:-EB} %{mlittle-endian:-EL}\
+   -maarch64linux%{mabi=ilp32:32}%{mbig-endian:b}"

  #define LINK_SPEC LINUX_TARGET_LINK_SPEC

diff --git a/gcc/config/aarch64/t-aarch64-linux b/gcc/config/aarch64/t-aarch64-linux
index ca1525e..5032ea9 100644
--- a/gcc/config/aarch64/t-aarch64-linux
+++ b/gcc/config/aarch64/t-aarch64-linux
@@ -22,10 +22,7 @@ LIB1ASMSRC   = aarch64/lib1funcs.asm
  LIB1ASMFUNCS = _aarch64_sync_cache_range

  AARCH_BE = $(if $(findstring TARGET_BIG_ENDIAN_DEFAULT=1, $(tm_defines)),_be)
-MULTILIB_OSDIRNAMES = .=../lib64$(call if_multiarch,:aarch64$(AARCH_BE)-linux-gnu)
+MULTILIB_OSDIRNAMES = mabi.lp64=../lib64$(call if_multiarch,:aarch64$(AARCH_BE)-linux-gnu)
  MULTIARCH_DIRNAME = $(call if_multiarch,aarch64$(AARCH_BE)-linux-gnu)

-# Disable the multilib for linux-gnu targets for the time being; focus
-# on the baremetal targets.
-MULTILIB_OPTIONS=
-MULTILIB_DIRNAMES   =
+MULTILIB_OSDIRNAMES += mabi.ilp32=../lib32


Similarly, we shall use libilp32 for the OSDIRNAME.  Although a bit 
ugly, libilp32 is much less ambiguous than lib32; the latter is easily 
confused with a directory for aarch32 libs.
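
Concretely, the last line above would then read something like:

   MULTILIB_OSDIRNAMES += mabi.ilp32=../libilp32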


Thanks,
Yufeng



Re: AARCH64 configure check for gas -mabi support

2013-12-09 Thread Yufeng Zhang

Hi Kugan,

Thanks for the quick action.

On 12/09/13 11:20, Kugan wrote:

Thanks Yufeng for the review.

On 07/12/13 03:18, Yufeng Zhang wrote:


[snip]

Here is an attempt to do it the way you have suggested.

Thanks,
Kugan

gcc/

+2013-12-09  Kugan Vivekanandarajah
+   * configure.ac: Add check for aarch64 assembler -mabi support.
+   * configure: Regenerate.
+   * config.in: Regenerate.
+   * config/aarch64/aarch64-elf.h (ASM_MABI_SPEC): New define.
+   (ASM_SPEC): Update to substitute -mabi with ASM_MABI_SPEC.
+   * config/aarch64/aarch64.c (aarch64_override_options): Issue error if
+   Assembler does not support -mabi and option ilp32 is selected.


Assembler/assembler


+   * doc/install.texi: Add note that building gcc 4.9 and later with
+   pre-2.24 binutils will not support -mabi=ilp32.
+



p.txt


diff --git a/gcc/config/aarch64/aarch64-elf.h b/gcc/config/aarch64/aarch64-elf.h
index 4757d22..b260b7c 100644
--- a/gcc/config/aarch64/aarch64-elf.h
+++ b/gcc/config/aarch64/aarch64-elf.h
@@ -134,13 +134,19 @@
" %{!mbig-endian:%{!mlittle-endian:" ENDIAN_SPEC "}}" \
" %{!mabi=*:" ABI_SPEC "}"

+#ifdef HAVE_AS_MABI_OPTION
+#define ASM_MABI_SPEC  "%{mabi=*:-mabi=%*}"

Re: AARCH64 configure check for gas -mabi support

2013-12-10 Thread Yufeng Zhang

Hi Kugan,

The latest patch looks good to me; I only have a couple of minor 
comments inlined below.  Please ask Marcus to review and approve it. 
Thanks again for fixing this issue!


On 12/10/13 06:21, Kugan wrote:
[snip]


Updated it and tested with

1. binutils 2.23.2
   a. bootstrapped with defaults and tested gcc for -mabi=lp64
(compiles) and -mabi=ilp32 gives error
   b. Trying to bootstrap with --with-multilib-list=lp64,ilp32 fails
with error msg
   c. Trying to bootstrap with --with-multilib-list=ilp32 fails with
error msg
   d. Bootstrap with --with-multilib-list=lp64 works.

2. binutils 2.24.51
a. bootstrapped with defaults and tested gcc for -mabi=lp64
(compiles) and -mabi=ilp32 (compiles)
   b. Bootstrap with --with-multilibs-list=lp64,ilp32 works and tested
gcc for -mabi=lp64
compiles and -mabi=ilp32  compiles(* gives linker error in my setup -
aarch64:ilp32 architecture of input file `/tmp/ccIFqSxU.o' is
incompatible with aarch64 output; I believe this is not related to what
I am testing)
   c. Bootstrap with default works


Thanks for the comprehensive testing.  The linker error you see is 
because the ILP32 support for aarch64*-*-linux* has not been added 
(Andrew Pinski has sent the patch series to enable the support here 
http://gcc.gnu.org/ml/gcc-patches/2013-12/msg00282.html)


I also tested the patch by building aarch64-none-elf cross compilers 
with binutils 2.23.2 and mainline binutils, with the default 
--with-multilib-list.  It works well.


[snip]


diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b1b4eef..a53febc 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5187,6 +5187,13 @@ aarch64_override_options (void)
aarch64_parse_tune ();
  }

+/* Issue error if assembler does not support -mabi and option ilp32
+  is selected.  */


I'd prefer the comment to be "The compiler may have been configured with 
2.23.* binutils, which does not have support for ILP32."



+#ifndef HAVE_AS_MABI_OPTION
+  if (TARGET_ILP32)
+error ("Assembler does not supprt -mabi=ilp32");
+#endif


supprt/support


+
initialize_aarch64_code_model ();

aarch64_build_bitmask_table ();
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 91a22d5..a951b82 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -3495,6 +3495,35 @@ AC_DEFINE_UNQUOTED(HAVE_LTO_PLUGIN, $gcc_cv_lto_plugin,
  AC_MSG_RESULT($gcc_cv_lto_plugin)

  case "$target" in
+
+  aarch64*-*-*)
+gcc_GAS_CHECK_FEATURE([-mabi option],
+  gcc_cv_as_aarch64_mabi,,
+  [-mabi=lp64], [.text],,,)
+if test x$gcc_cv_as_aarch64_mabi = xyes; then
+   AC_DEFINE(HAVE_AS_MABI_OPTION, 1,
+ [Define if your assembler supports the -mabi option.])
+else
+   if test x$with_abi = xilp32; then
+   AC_MSG_ERROR([Assembler does not support -mabi=ilp32.  Upgrade the Assembler.])
+   fi
+if test x"$with_multilib_list" = xdefault; then
+   TM_MULTILIB_CONFIG=lp64
+else
+   aarch64_multilibs=`echo $with_multilib_list | sed -e 's/,/ /g'`
+   for aarch64_multilib in ${aarch64_multilibs}; do
+   case ${aarch64_multilib} in
+   ilp32)
+   AC_MSG_ERROR([Assembler does not support -mabi=ilp32.  Upgrade the Assembler.])
+   ;;
+   *)
+   ;;
+   esac
+   done
+   fi
+fi
+;;
+


I'm not very sure about the indent rules for configury files, but 
other areas of configure.ac seem to use a similar indent convention 
to that of .c files.



Thanks,
Yufeng



# All TARGET_ABI_OSF targets.
alpha*-*-linux* | alpha*-*-*bsd*)
  gcc_GAS_CHECK_FEATURE([explicit relocation support],
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index a8f9f8a..00c4f0d 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -3735,6 +3735,15 @@ removed and the system libunwind library will always be used.

  @html
  <hr />
+@end html
+@anchor{aarch64-x-x}
+@heading aarch64*-*-*
+Pre 2.24 binutils does not have support for selecting -mabi and does not
+support ILP32.  If GCC 4.9 or later is built with pre 2.24, GCC will not
+support option -mabi=ilp32.
+
+@html
+<hr />
  
  @end html
  @anchor{x-ibm-aix}






Re: RFA: revert libstdc++ r205810: simulator workload increase caused regression

2013-12-16 Thread Yufeng Zhang

On 12/16/13 04:22, Hans-Peter Nilsson wrote:

From: Hans-Peter Nilsson
Date: Sun, 15 Dec 2013 15:20:48 +0100



+// { dg-options "-std=gnu++0x -DSAMPLES=3" { target { { arm*-* }&&  
simulator } } }
+// { dg-options "-std=gnu++0x -DSAMPLES=1" { target simulator } }


JFTR, I managed to have two bugs here:
1 - the target tuple (unless being an "effective target") must match "*-*-*".
2 - the *last* matching line is used.

But as mentioned, I'd prefer to split chi2_quality.cc into
(five) separate tests, if a maintainer would be ok with that.


Sorry that the commit caused the timeout on cris-elf.  I agree that it 
would be nicer to split chi2_quality.cc.


Thanks,
Yufeng




Re: RFA: fix libstdc++ regression, simulator timeout from r205810

2013-12-27 Thread Yufeng Zhang
Many thanks for your effort in fixing the issue.  I can confirm that the 
new tests pass on arm-eabi using qemu as the simulator.


Thanks,
Yufeng

P.s. Wishing you a nice holiday break and a happy new year!

On 12/20/13 01:12, Hans-Peter Nilsson wrote:

Here's a patch that splits up 20_util/hash/chi2_quality.cc *and*
increases some of the iteration numbers for simulator targets to
something that passes for all working targets mentioned below.
I am a bit worried about the stability of these tests and the
implementation, seeing this amount of target-specific
differences in results.  Maybe a person interested in the
quality of the implementation and knowing statistics should have
a look.

I originally naively thought just splitting up the test would
help, allowing the simulator target-test-constraints to be removed
completely, but that only produced timeouts for cris-elf.

The effect of this patch is therefore to split it up and
increase some of the SAMPLES values compared to the current
commit, assuming there's a useful linear dependence.  Don't be
fooled by test_document_words now being excluded at the
test-top: it already was (for all SAMPLES < 100000), except for
compiling and running an empty function; see the original.

I've tested this on two different x86_64-linux-gnu hosts, Fedora
17 and Debian 7 aka. "wheezy" (to eliminate my suspicion of
distro differences) and three simulator targets: cris-elf,
powerpc-eabi and mipsisa32r2el-elf.  I also tried running these
tests for arm-eabi / arm-sim.exp but they all failed apparently
because of memory resource constraints within the simulator
setup: in libstdc++.log I see "terminate called after throwing
an instance of 'std::bad_alloc'".

I felt I honestly had to increase some of the SAMPLES numbers
from the current number of 30000; the final number I came up
with passes for all the mentioned working targets.  I didn't
want to *decrease* any of the numbers (e.g. for simulator only)
to exploit some local minima of the "k" values (for example
there's one such for 10000 for all targets above except
x86_64/32 and the ones the ARM people mentioned).  So, I
increased the respective number to be higher than where at least
one target showed a test-suite failure.  When checking the
SAMPLES numbers for the hosts, I of course just hacked the
default SAMPLES temporarily; they were not tested as simulator
targets.  The SAMPLES number for all but test_bit_flip_set changes
to 35000, a number somewhat arbitrarily chosen as higher than
30000.

Curiously, I never saw test_bit_flip_set fail as mentioned in
.  Maybe
that was a mistaken observation and what was observed was
test_bit_string_set failing.  (That's a good reason to always
copy-paste instead of *typing* from memory or from reading.)
That test certainly failed for most targets, but also for
SAMPLES=30000, not mentioned in the referenced message, which
made me suspect a distribution- or glibc-version-related
difference.

A higher SAMPLES number definitely has a noticeable cost in
terms of test-time, challenging the statement "it doesn't take
that much time either".  For both cris-elf-sim and
powerpc-eabi-sim running on an x86_64 host of yesteryear, the
time for the affected tests went from about 30 seconds to 4 min
28 seconds and 6 min 20 seconds respectively, going from 10000
to these numbers.  For the original r205810 change compared to
r205803, test-time for cris-elf for a *complete "make
check"-run* (C, C++, ObjC, Fortran including libraries) went
from 3h56min to 4h5min (when the test timed out).

Ok to commit?

libstdc++-v3:
 * testsuite/20_util/hash/chi2_quality.h: Break out from
 chi2_quality.cc.
 * testsuite/20_util/hash/chi2_q_bit_flip_set.cc: Ditto.
 * testsuite/20_util/hash/chi2_q_document_words.cc: Ditto.
 * testsuite/20_util/hash/chi2_q_bit_string_set.cc: Ditto.  Increase
 SAMPLES to 35000 for simulator targets.
 * testsuite/20_util/hash/chi2_q_numeric_pattern_set.cc: Ditto.
 * testsuite/20_util/hash/chi2_q_uniform_random.cc: Ditto.
 * testsuite/20_util/hash/chi2_quality.cc: Remove.

--- libstdc++-v3/testsuite/20_util/hash/chi2_quality.cc Sun Dec 15 15:01:43 2013
+++ /dev/null   Thu Jan 01 00:00:00 1970 +0000
@@ -1,218 +0,0 @@
-// { dg-options "-std=gnu++0x" }
-
-// Use smaller statistics when running on simulators, so it takes less time.
-// { dg-options "-std=gnu++0x -DSAMPLES=3" { target simulator } }
-
-// Copyright (C) 2010-2013 Free Software Foundation, Inc.
-//
-// This file is part of the GNU ISO C++ Library.  This library is free
-// software; you can redistribute it and/or modify it under the
-// terms of the GNU General Public License as published by the
-// Free Software Foundation; either version 3, or (at your option)
-// any later version.
-//
-// This library is distributed in the hope that it will be useful,
-// but WITHOUT ANY WARRANTY; without even the implied warr

[PATCH, AArch64] Use llfloor and llceil for vcvtmd_s64_f64 and vcvtpd_s64_f64 in arm_neon.h

2014-01-06 Thread Yufeng Zhang
This patch fixes the implementation of vcvtmd_s64_f64 and vcvtpd_s64_f64 
in arm_neon.h to use llfloor and llceil instead, which are ILP32-friendly.
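
The reason is that __builtin_lfloor and __builtin_lceil return 'long',
which is only 32 bits wide under ILP32, while these intrinsics must
return int64_t; 'long long' is 64 bits wide under both LP64 and ILP32.
A minimal illustration (my example, not part of the patch):

   /* ILP32: sizeof (long) == 4, sizeof (long long) == 8.  */
   int64_t r1 = __builtin_lfloor (__a);   /* 32-bit result under ILP32.  */
   int64_t r2 = __builtin_llfloor (__a);  /* 64-bit result under both ABIs.  */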


This patch will fix the following test failure in the ILP32 mode:

FAIL: gcc.target/aarch64/vect-vcvt.c scan-assembler fcvtms\\tx[0-9]+, 
d[0-9]+


OK for the trunk?

Thanks,
Yufeng

gcc/

* config/aarch64/aarch64-builtins.c
(aarch64_builtin_vectorized_function): Add BUILT_IN_LFLOORF,
BUILT_IN_LLFLOOR, BUILT_IN_LCEILF and BUILT_IN_LLCEIL.
* config/aarch64/arm_neon.h (vcvtmd_s64_f64): Call __builtin_llfloor
instead of __builtin_lfloor.
(vcvtpd_s64_f64): Call __builtin_llceil instead of __builtin_lceil.

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 439c3f4..27af30f 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -1034,6 +1034,8 @@ aarch64_builtin_vectorized_function (tree fndecl, tree type_out, tree type_in)
   (out_mode == N##Imode && out_n == C \
&& in_mode == N##Fmode && in_n == C)
case BUILT_IN_LFLOOR:
+   case BUILT_IN_LFLOORF:
+   case BUILT_IN_LLFLOOR:
case BUILT_IN_IFLOORF:
  {
enum aarch64_builtins builtin;
@@ -1049,6 +1051,8 @@ aarch64_builtin_vectorized_function (tree fndecl, tree type_out, tree type_in)
return aarch64_builtin_decls[builtin];
  }
case BUILT_IN_LCEIL:
+   case BUILT_IN_LCEILF:
+   case BUILT_IN_LLCEIL:
case BUILT_IN_ICEILF:
  {
enum aarch64_builtins builtin;
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index e33a684..c855b0f 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -17699,7 +17699,7 @@ vcvtaq_u64_f64 (float64x2_t __a)
 __extension__ static __inline int64_t __attribute__ ((__always_inline__))
 vcvtmd_s64_f64 (float64_t __a)
 {
-  return __builtin_lfloor (__a);
+  return __builtin_llfloor (__a);
 }
 
 __extension__ static __inline uint64_t __attribute__ ((__always_inline__))
@@ -17835,7 +17835,7 @@ vcvtnq_u64_f64 (float64x2_t __a)
 __extension__ static __inline int64_t __attribute__ ((__always_inline__))
 vcvtpd_s64_f64 (float64_t __a)
 {
-  return __builtin_lceil (__a);
+  return __builtin_llceil (__a);
 }
 
 __extension__ static __inline uint64_t __attribute__ ((__always_inline__))

[PATCH, ARM] Fix ICE in arm_expand_neon_args

2014-01-07 Thread Yufeng Zhang

Hi,

The patch fixes an ICE in gcc/config/arm/arm.c:arm_expand_neon_args (). 
 When the destination address for vst1q_lane_u64 is not aligned, 
calling expand_normal will get a REG, which is not expected by 
arm_expand_neon_args, resulting in an assertion failure.  Now, call 
expand_expr with EXPAND_MEMORY to tell the expand that we really want a 
MEM in the case of NEON_ARG_MEMORY.


OK for the trunk and 4.8 branch?

Thanks,
Yufeng

gcc/

* config/arm/arm.c (arm_expand_neon_args): Call expand_expr
with EXPAND_MEMORY for NEON_ARG_MEMORY; check if the returned
rtx is const0_rtx or not.

gcc/testsuite/

* gcc.target/arm/neon/vst1Q_laneu64-1.c: New test.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 8fea2a6..a3b2796 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -24526,7 +24526,11 @@ arm_expand_neon_args (rtx target, int icode, int have_retval,
type_mode);
 }
 
-  op[argc] = expand_normal (arg[argc]);
+	  /* Use EXPAND_MEMORY for NEON_ARG_MEMORY to ensure a MEM_P
+	     rtx is returned.  */
+	  op[argc] = expand_expr (arg[argc], NULL_RTX, VOIDmode,
+				  (thisarg == NEON_ARG_MEMORY
+				   ? EXPAND_MEMORY : EXPAND_NORMAL));
 
   switch (thisarg)
 {
@@ -24545,6 +24549,9 @@ arm_expand_neon_args (rtx target, int icode, int have_retval,
   break;
 
 case NEON_ARG_MEMORY:
+ /* Check if expand failed.  */
+ if (op[argc] == const0_rtx)
+   return 0;
  gcc_assert (MEM_P (op[argc]));
  PUT_MODE (op[argc], mode[argc]);
  /* ??? arm_neon.h uses the same built-in functions for signed
diff --git a/gcc/testsuite/gcc.target/arm/neon/vst1Q_laneu64-1.c b/gcc/testsuite/gcc.target/arm/neon/vst1Q_laneu64-1.c
new file mode 100644
index 000..5f4c927
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon/vst1Q_laneu64-1.c
@@ -0,0 +1,25 @@
+/* Test the `vst1Q_laneu64' ARM Neon intrinsic.  */
+
+/* Detect ICE in the case of unaligned memory address.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-add-options arm_neon } */
+
+#include "arm_neon.h"
+
+unsigned char dummy_store[1000];
+
+void
+foo (char* addr)
+{
+  uint8x16_t vdata = vld1q_u8 (addr);
+  vst1q_lane_u64 ((uint64_t*) &dummy_store, vreinterpretq_u64_u8 (vdata), 0);
+}
+
+uint64_t
+bar (uint64x2_t vdata)
+{
+  vdata = vld1q_lane_u64 ((uint64_t*) &dummy_store, vdata, 0);
+  return vgetq_lane_u64 (vdata, 0);
+}

[PATCH, AArch64] Use GCC builtins to count leading/tailing zeros

2014-01-07 Thread Yufeng Zhang

Hi,

This patch is to sync up include/longlong.h with its glibc peer after 
the proposed change here:


http://sourceware.org/ml/libc-alpha/2014-01/msg00114.html

The patch defines a number of macros in stdlib/longlong.h to use GCC 
builtins __builtin_clz* to implement the __clz* and __ctz* functions on 
AArch64.


OK for the mainline?

Thanks,
Yufeng

include/

* longlong.h (count_leading_zeros, count_trailing_zeros)
(COUNT_LEADING_ZEROS_0): Define for aarch64.

diff --git a/include/longlong.h b/include/longlong.h
index 5f00e54..b4c1f400 100644
--- a/include/longlong.h
+++ b/include/longlong.h
@@ -122,6 +122,22 @@ extern const UQItype __clz_tab[256] attribute_hidden;
 #define __AND_CLOBBER_CC , "cc"
 #endif /* __GNUC__ < 2 */
 
+#if defined (__aarch64__)
+
+#if W_TYPE_SIZE == 32
+#define count_leading_zeros(COUNT, X)  ((COUNT) = __builtin_clz (X))
+#define count_trailing_zeros(COUNT, X)   ((COUNT) = __builtin_ctz (X))
+#define COUNT_LEADING_ZEROS_0 32
+#endif /* W_TYPE_SIZE == 32 */
+
+#if W_TYPE_SIZE == 64
+#define count_leading_zeros(COUNT, X)  ((COUNT) = __builtin_clzll (X))
+#define count_trailing_zeros(COUNT, X)   ((COUNT) = __builtin_ctzll (X))
+#define COUNT_LEADING_ZEROS_0 64
+#endif /* W_TYPE_SIZE == 64 */
+
+#endif /* __aarch64__ */
+
 #if defined (__alpha) && W_TYPE_SIZE == 64
 #define umul_ppmm(ph, pl, m0, m1) \
   do { \

Re: [PATCH, AArch64 6/6] aarch64: Define add_ssaaaa, sub_ddmmss, umul_ppmm

2014-01-09 Thread Yufeng Zhang

Hi,

This patch and the preceding aarch64.md patches all look good to me, but 
I cannot approve it.


Thanks for adding the support for these missing patterns and defines!

Yufeng

On 01/08/14 18:13, Richard Henderson wrote:

We have good support for TImode arithmetic, so no need to do anything
with inline assembly.

include/
* longlong.h [__aarch64__] (add_ssaaaa, sub_ddmmss, umul_ppmm): New.
[__aarch64__] (COUNT_LEADING_ZEROS_0): Define in terms of W_TYPE_SIZE.
---
  include/longlong.h | 28 ++--
  1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/include/longlong.h b/include/longlong.h
index b4c1f400..1b11fc7 100644
--- a/include/longlong.h
+++ b/include/longlong.h
@@ -123,19 +123,35 @@ extern const UQItype __clz_tab[256] attribute_hidden;
  #endif /* __GNUC__ < 2 */

  #if defined (__aarch64__)
+#define add_ssaaaa(sh, sl, ah, al, bh, bl)                    \
+  do {                                                        \
+    UDWtype __x = (UDWtype)(UWtype)(ah) << 64 | (UWtype)(al); \
+    __x += (UDWtype)(UWtype)(bh) << 64 | (UWtype)(bl);        \
+    (sh) = __x >> W_TYPE_SIZE;                                \
+    (sl) = __x;                                               \
+  } while (0)
+#define sub_ddmmss(sh, sl, ah, al, bh, bl)                    \
+  do {                                                        \
+    UDWtype __x = (UDWtype)(UWtype)(ah) << 64 | (UWtype)(al); \
+    __x -= (UDWtype)(UWtype)(bh) << 64 | (UWtype)(bl);        \
+    (sh) = __x >> W_TYPE_SIZE;                                \
+    (sl) = __x;                                               \
+  } while (0)
+#define umul_ppmm(ph, pl, m0, m1)                             \
+  do {                                                        \
+    UDWtype __x = (UDWtype)(UWtype)(m0) * (UWtype)(m1);       \
+    (ph) = __x >> W_TYPE_SIZE;                                \
+    (pl) = __x;                                               \
+  } while (0)

+#define COUNT_LEADING_ZEROS_0   W_TYPE_SIZE
  #if W_TYPE_SIZE == 32
  #define count_leading_zeros(COUNT, X) ((COUNT) = __builtin_clz (X))
  #define count_trailing_zeros(COUNT, X)   ((COUNT) = __builtin_ctz (X))
-#define COUNT_LEADING_ZEROS_0 32
-#endif /* W_TYPE_SIZE == 32 */
-
-#if W_TYPE_SIZE == 64
+#elif W_TYPE_SIZE == 64
  #define count_leading_zeros(COUNT, X) ((COUNT) = __builtin_clzll (X))
  #define count_trailing_zeros(COUNT, X)   ((COUNT) = __builtin_ctzll (X))
-#define COUNT_LEADING_ZEROS_0 64
  #endif /* W_TYPE_SIZE == 64 */
-
  #endif /* __aarch64__ */

  #if defined (__alpha) && W_TYPE_SIZE == 64





[PING] Re: [PATCH, AArch64] Use llfloor and llceil for vcvtmd_s64_f64 and vcvtpd_s64_f64 in arm_neon.h

2014-01-14 Thread Yufeng Zhang

Ping~

Originally posted here:
http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00185.html

Thanks,
Yufeng

On 01/06/14 12:30, Yufeng Zhang wrote:

This patch fixes the implementation of vcvtmd_s64_f64 and vcvtpd_s64_f64
in arm_neon.h to use llfloor and llceil instead, which are ILP32-friendly.

This patch will fix the following test failure in the ILP32 mode:

FAIL: gcc.target/aarch64/vect-vcvt.c scan-assembler fcvtms\\tx[0-9]+,
d[0-9]+

OK for the trunk?

Thanks,
Yufeng

gcc/

* config/aarch64/aarch64-builtins.c
(aarch64_builtin_vectorized_function): Add BUILT_IN_LFLOORF,
BUILT_IN_LLFLOOR, BUILT_IN_LCEILF and BUILT_IN_LLCEIL.
* config/aarch64/arm_neon.h (vcvtmd_s64_f64): Call __builtin_llfloor
instead of __builtin_lfloor.
(vcvtpd_s64_f64): Call __builtin_llceil instead of __builtin_lceil.





Re: [PATCH] Optional alternative base_expr in finding basis for CAND_REFs

2013-11-12 Thread Yufeng Zhang
The second patch improves the original one 
by reducing the number of changes to the existing framework, e.g. 
leaving find_basis_for_base_expr unchanged.  While it still slightly 
modifies the interfaces (find_basis_for_candidate and 
record_potential_basis), it has an advantage over the 1st patch attached 
here: its impact on the code-gen is much smaller, as it enables more 
ARRAY_REFs to be lowered without handing over the underlying tree 
expression to replace_ref.  It creates the following dependency chains 
for the aforementioned example:


  i1 --> i2  (base expr is an SSA_NAME defined as (a2 + i * 200))
  i1 --> i2 --> i3  (base expr is a tree expression of (a2 + i * 200))

While they look the same as what the 1st patch does, only one candidate 
is generated for each memory accessing gimple statement; some candidates 
are chained twice, once to a cand_chain with a base_expr of an SSA_NAME 
and the other to a cand_chain with the underlying tree expression as its 
base_expr.  In other words, it produces two different dependency graphs 
without creating different interpretations, by utilizing the existing 
framework of cand_chain and find_basis_for_base_expr.


The patch passes the bootstrapping on arm and x86_64, as well as regtest 
on x86_64.  The following is the changelog entry:


gcc/

* gimple-ssa-strength-reduction.c: Include tree-affine.h.
(name_expansions): New static variable.
(alt_base_map): Ditto.
(get_alternative_base): New function.
(find_basis_for_candidate): For CAND_REF, optionally call
find_basis_for_base_expr with the returned value from
get_alternative_base.
(record_potential_basis): Add new parameter 'base' of type 'tree';
return if base == NULL; use base to set node->base_expr.
(alloc_cand_and_find_basis): Update; call record_potential_basis
for CAND_REF with the returned value from get_alternative_base.
(execute_strength_reduction): Call pointer_map_create for
alt_base_map; call free_affine_expand_cache with &name_expansions.

gcc/testsuite/

* gcc.dg/tree-ssa/slsr-41.c: New test.


Which patch do you like more?

If you have any question on either of the patch, please let me know.

Regards,
Yufeng


On 11/11/13 17:09, Bill Schmidt wrote:

Hi Yufeng,

The idea is a good one but I don't like your implementation of adding an
extra expression parameter to look at on the find_basis_for_candidate
lookup.  This goes against the design of the pass and may not be
sufficiently general (might there be situations where a third possible
basis could exist?).

The overall design is set up to have alternate interpretations of
candidates in the candidate table to handle this sort of ambiguity.  The
goal for your example is create a second candidate (chained to the first
one by way of the next_interp field) so that the candidate table looks
like this:

8  [2] *_10[j_7(D)] = 2;
   REF  : _10 + ((sizetype) j_7(D) * 4) + 0 : int[20] *
   basis: 0  dependent: 0  sibling: 0
   next-interp: 9  dead-savings: 0

9  [2] *_10[j_7(D)] = 2;
   REF  : _5 + ((sizetype) j_7(D) * 4) + 800 : int[20] *
   basis: 5  dependent: 0  sibling: 0
   next-interp: 0  dead-savings: 0

This will in turn allow subsequent candidates to be seen in terms of
either _5 or _10, which may be necessary to avoid missed opportunities.
There may be a subsequent REF _15 +... that can be an affine expression
of either of these, for example.

If you fail to find a basis for a candidate with its first
interpretation, you can then follow the next-interp chain to look for a
basis for the next one, without the messy passing of extra possibilities
to the find-basis routine.

I haven't read the patch in detail, but I think this should give you
enough to work with to re-design the idea to fit better with the
existing framework.  Please let me know if you need more information, or
if you feel I've misunderstood something.

Thanks,
Bill

On Mon, 2013-11-04 at 18:41 +, Yufeng Zhang wrote:

Hi,

This patch extends the slsr pass to optionally use an alternative base
expression in finding basis for CAND_REFs.  Currently the pass uses
hash-based algorithm to match the base_expr in a candidate.  Given a
test case like the following, slsr will not be able to recognize the two
CAND_REFs have the same basis, as their base_expr are of different
SSA_NAMEs:

typedef int arr_2[20][20];

void foo (arr_2 a2, int i, int j)
{
a2[i][j] = 1;
a2[i + 10][j] = 2;
}

The gimple dump before slsr is like the following (using an
arm-none-eabi gcc):

i.0_2 = (unsigned int) i_1(D);
_3 = i.0_2 * 80;
_5 = a2_4(D) + _3;
*_5[j_7(D)] = 1;    <--
_9 = _3 + 800;
_10 = a2_4(D) + _9;
*_10[j_7(D)] = 2;   <--

Here are the dumps for the two CAND_REFs generated for the two
statements pointed by the arrows:


4  [2] _5 = a2_4(D) + _3;
   ADD  : a2_4(D) + (80 * i

Re: [PATCH] Optional alternative base_expr in finding basis for CAND_REFs

2013-11-13 Thread Yufeng Zhang

Hi Bill,

On 11/13/13 18:04, Bill Schmidt wrote:

Hi Yufeng,

On Tue, 2013-11-12 at 22:34 +, Yufeng Zhang wrote:

Hi Bill,

Many thanks for the review.

I find your suggestion on using the next_interp field quite
enlightening.  I prepared a patch which adds changes without modifying
the framework.  With the patch, the slsr pass now tries to create a
second candidate for each memory accessing gimple statement, and chain
it to the first one via the next_interp field.

There are two implications in this approach though:

1) For each memory accessing gimple statement, there can be two
candidates, and these two candidates can be part of different dependency
graphs respectively (based on different base expr).  Only one of the
dependency graph should be traversed to do replace_refs.  Most of the
changes in the patch is to handle this implication.

I am aware that you suggest to follow the next-interp chain only when
the searching fails for the first interpretation.  However, that doesn't
work very well, as it can result in worse code-gen.  Taking a varied
form of the added test slsr-41.c for example:

i1:  a2 [i] [j] = 1;
i2:  a2 [i] [j+1] = 2;
i3:  a2 [i+20] [j] = i;

With the 2nd interpretation created conditionally, the following two
dependency chains will be established:

i1 -->  i2  (base expr is an SSA_NAME defined as (a2 + i * 200))
i1 -->  i3  (base expr is a tree expression of (a2 + i * 200))


So it seems to me that really what needs to happen is to unify those two
base_exprs.  We don't currently have logic in this pass to look up an
SSA name based on {base, index, stride, cand_type}, but that could be
done with a hash table.  For now to save processing time it would make
sense to only do that for MEM candidates, though the cand_type should be
included in the hash to allow this to be used for other candidate types
if necessary.  Of course, the SSA name definition must dominate the
candidate to be eligible as a basis, and that should be checked, but
this should generally be the case.


I'm not quite sure if the SSA_NAME look-up works; maybe I haven't fully 
understood what you suggest.


For i1 --> i3, the base_expr is the tree expression (a2 + i * 200), 
which is the result of a sequence of operations (conversion to affine, 
immediate offset removal and conversion to tree), with another SSA_NAME 
as the input.  In other words, there are two SSA_NAMEs involved in the 
example:


  _s1: (a2 + i * 200).
  _s2: (a2 + (i * 200 + 4000))

their strides and indexes are different.

I guess what you suggest is that, given the tree expression (a2 + i * 
200), we look up an SSA_NAME and return _s1.  If that is the case, the 
challenge will be how to analyze the tree expression and get the 
information on its {base, index, stride, cand_type}.  While it would be 
too specific and narrow to check for a POINTER_PLUS_EXPR expression, 
the existing framework (e.g. create_add_ssa_cand) seems to assume that 
the analyzed tree represents a genuine gimple statement.


Moreover, such an SSA_NAME may not exist at all, for example in the 
following case:


  i1:  a2 [i+1] [j] = 1;
  i2:  a2 [i+1] [j+1] = 2;
  i3:  a2 [i+20] [j] = i;

you wouldn't be able to find an SSA_NAME for (a2 + i * 200).

[snip]

A couple of quick comments on the next_interp patch:

  * You don't need num_of_dependents ().  You should be able to add a
forward declaration for count_candidates () and use it.


Missed count_candidates (); thanks!


  * Your new test case is missing a final newline, so your patch doesn't
apply cleanly.


I'll fix it.


Please look into unifying the base expressions, as I believe you should
not need the preferred_ref_cand logic if you do that.


I would also like to live without preferred_ref_cand if feasible. :)


I still prefer the approach of using next_interp for its generality and
expandability.


Sure; this approach indeed fits the framework better.


Regards,
Yufeng



Re: [PATCH] Optional alternative base_expr in finding basis for CAND_REFs

2013-11-13 Thread Yufeng Zhang

Hi Bill,

On 11/13/13 20:54, Bill Schmidt wrote:

Hi Yufeng,

The second version of your original patch is ok with me with the
following changes.  Sorry for the little side adventure into the
next-interp logic; in the end that's going to hurt more than it helps in
this case.  Thanks for having a look at it, anyway.  Thanks also for
cleaning up this version to be less intrusive to common interfaces; I
appreciate it.


Thanks a lot for the review.  I've attached an updated patch with the 
suggested changes incorporated.


For the next-interp adventure, I was quite happy to do the experiment; 
it was a good chance to gain insight into the pass.  Many thanks for 
your prompt replies and patience in guiding me!



Everything else looks OK to me.  Please ask Richard for final approval,
as I'm not a maintainer.


Hi Richard, would you be happy to OK the patch?

Regards,
Yufeng

gcc/

* gimple-ssa-strength-reduction.c: Include tree-affine.h.
(name_expansions): New static variable.
(alt_base_map): Ditto.
(get_alternative_base): New function.
(find_basis_for_candidate): For CAND_REF, optionally call
find_basis_for_base_expr with the returned value from
get_alternative_base.
(record_potential_basis): Add new parameter 'base' of type 'tree';
add an assertion of non-NULL base; use base to set node->base_expr.
(alloc_cand_and_find_basis): Update; call record_potential_basis
for CAND_REF with the returned value from get_alternative_base.
(execute_strength_reduction): Call pointer_map_create for
alt_base_map; call free_affine_expand_cache with &name_expansions.

gcc/testsuite/

* gcc.dg/tree-ssa/slsr-41.c: New test.diff --git a/gcc/gimple-ssa-strength-reduction.c 
b/gcc/gimple-ssa-strength-reduction.c
index 88afc91..26502c3 100644
--- a/gcc/gimple-ssa-strength-reduction.c
+++ b/gcc/gimple-ssa-strength-reduction.c
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "hash-table.h"
 #include "tree-ssa-address.h"
+#include "tree-affine.h"
 
 /* Information about a strength reduction candidate.  Each statement
in the candidate table represents an expression of one of the
@@ -420,6 +421,42 @@ cand_chain_hasher::equal (const value_type *chain1, const 
compare_type *chain2)
 /* Hash table embodying a mapping from base exprs to chains of candidates.  */
 static hash_table  base_cand_map;
 
+/* Pointer map used by tree_to_aff_combination_expand.  */
+static struct pointer_map_t *name_expansions;
+/* Pointer map embodying a mapping from bases to alternative bases.  */
+static struct pointer_map_t *alt_base_map;
+
+/* Given BASE, use the tree affine combination facilities to
+   find the underlying tree expression for BASE, with any
+   immediate offset excluded.  */
+
+static tree
+get_alternative_base (tree base)
+{
+  tree *result = (tree *) pointer_map_contains (alt_base_map, base);
+
+  if (result == NULL)
+{
+  tree expr;
+  aff_tree aff;
+
+  tree_to_aff_combination_expand (base, TREE_TYPE (base),
+ &aff, &name_expansions);
+  aff.offset = tree_to_double_int (integer_zero_node);
+  expr = aff_combination_to_tree (&aff);
+
+  result = (tree *) pointer_map_insert (alt_base_map, base);
+  gcc_assert (!*result);
+
+  if (expr == base)
+   *result = NULL;
+  else
+   *result = expr;
+}
+
+  return *result;
+}
+
 /* Look in the candidate table for a CAND_PHI that defines BASE and
return it if found; otherwise return NULL.  */
 
@@ -440,8 +477,9 @@ find_phi_def (tree base)
 }
 
 /* Helper routine for find_basis_for_candidate.  May be called twice:
-   once for the candidate's base expr, and optionally again for the
-   candidate's phi definition.  */
+   once for the candidate's base expr, and optionally again either for
+   the candidate's phi definition or for a CAND_REF's alternative base
+   expression.  */
 
 static slsr_cand_t
 find_basis_for_base_expr (slsr_cand_t c, tree base_expr)
@@ -518,6 +556,13 @@ find_basis_for_candidate (slsr_cand_t c)
}
 }
 
+  if (!basis && c->kind == CAND_REF)
+{
+  tree alt_base_expr = get_alternative_base (c->base_expr);
+  if (alt_base_expr)
+   basis = find_basis_for_base_expr (c, alt_base_expr);
+}
+
   if (basis)
 {
   c->sibling = basis->dependent;
@@ -528,17 +573,21 @@ find_basis_for_candidate (slsr_cand_t c)
   return 0;
 }
 
-/* Record a mapping from the base expression of C to C itself, indicating that
-   C may potentially serve as a basis using that base expression.  */
+/* Record a mapping from BASE to C, indicating that C may potentially serve
+   as a basis using that base expression.  BASE may be the same as
+   C->BASE_EXPR; alternatively BASE can be a different tree that shares the
+   underlying expression of C->BASE_EXPR.  */
 
 static void
-record_potential_basis (slsr_cand_t c)
+recor

Re: [PING] [PATCH] Optional alternative base_expr in finding basis for CAND_REFs

2013-11-19 Thread Yufeng Zhang

Hi Richard,

Can I get an approval or some feedback from you about the patch?

Regards,
Yufeng

On 11/13/13 23:25, Yufeng Zhang wrote:

On 11/13/13 20:54, Bill Schmidt wrote:

Hi Yufeng,

The second version of your original patch is ok with me with the
following changes.


Thanks a lot for the review.  I've attached an updated patch with the
suggested changes incorporated.


Everything else looks OK to me.  Please ask Richard for final approval,
as I'm not a maintainer.


Hi Richard, would you be happy to OK the patch?

Regards,
Yufeng

gcc/

* gimple-ssa-strength-reduction.c: Include tree-affine.h.
(name_expansions): New static variable.
(alt_base_map): Ditto.
(get_alternative_base): New function.
(find_basis_for_candidate): For CAND_REF, optionally call
find_basis_for_base_expr with the returned value from
get_alternative_base.
(record_potential_basis): Add new parameter 'base' of type 'tree';
add an assertion of non-NULL base; use base to set node->base_expr.
(alloc_cand_and_find_basis): Update; call record_potential_basis
for CAND_REF with the returned value from get_alternative_base.
(execute_strength_reduction): Call pointer_map_create for
alt_base_map; call free_affine_expand_cache with &name_expansions.

gcc/testsuite/

* gcc.dg/tree-ssa/slsr-41.c: New test.





[PATCH] Defer address legitimization for expanded ARRAY_REF, COMPONENT_REF, etc. til the final address is computed

2013-11-22 Thread Yufeng Zhang

Hi,

Currently the address legitimization (by calling 
memory_address_addr_space) is carried out twice during the RTL expansion 
of ARRAY_REF, COMPONENT_REF, etc. when their OFFSET is not NULL.  It is 
done once for the BASE and once for the summed address in 
offset_address. This may cause part, if not all, of the generated BASE 
RTL to be forced into reg(s), preventing the RTL generator from carrying 
out effective re-association across BASE and OFFSET (via 
simplify_gen_binary).


For example, given the following test case:

typedef int arr_2[20][20];
void foo (arr_2 a2, int i, int j)
{
  a2[i+10][j] = 1;
}

the RTL code for the BASE (i.e. a2[i+10]) on arm (-mcpu=cortex-a15, 
-mthumb) is


* before the legitimization of BASE:

(plus:SI (plus:SI (mult:SI (reg/v:SI 115 [ i ])
(const_int 80 [0x50]))
(reg/v/f:SI 114 [ a2 ]))
(const_int 800 [0x320]))
(reg/f:SI 122)

* after the legitimization of BASE:

(reg/f:SI 122)

"Thanks to" the initial legitimization, the RTL for the final address is 
turned into:


(plus:SI (mult:SI (reg/v:SI 116 [ j ])
(const_int 4 [0x4]))
(reg/f:SI 122))

while with the legitimization deferred, the RTL for the final address 
could be:


(plus:SI (plus:SI (plus:SI (mult:SI (reg/v:SI 115 [ i ])
(const_int 80 [0x50]))
(mult:SI (reg/v:SI 116 [ j ])
(const_int 4 [0x4])))
(reg/v/f:SI 114 [ a2 ]))
(const_int 800 [0x320]))

which has more complete info in the RTL and is much more canonicalized; 
later on it could open up more opportunities for CSE.


The effect of this duplicated legitimization effort varies across 
different targets, as it is strongly related to the available addressing 
modes on a target.  On RISC machines where in general there are fewer 
addressing modes (which are in general less complicated as well), the 
RTL code quality can be affected more adversely.


The patch passes bootstrapping on arm and x86_64 and regtest on 
arm-none-eabi, aarch64-none-elf and x86_64.  There is no regression in 
spec2000 on arm or x86_64.


OK for the mainline?

Thanks,
Yufeng


gcc/

* cfgexpand.c (expand_call_stmt): Update the call to expand_expr_real_1.
* expr.c (expand_assignment): Add new local variable validate_p and
set it; call expand_expr or expand_expr_nv depending on validate_p for
to_rtx; call adjust_address_1 instead of adjust_address.
(store_expr): Update the call to expand_expr_real.
(expand_expr_real): Add new parameter 'validate_p'; update the call to
expand_expr_real_1.
(expand_expr_real_1): Add new parameter 'validate_p'; update the call
to expand_expr_real; depending on validate_p, call
memory_address_addr_space or convert_memory_address_addr_space;
likewise for expand_expr or expand_expr_nv; call adjust_address_1
instead of adjust_address.
* expr.h (expand_expr_real): Update.
(expand_expr_real_1): Update.
(expand_expr): Update.
(expand_expr_nv): New function.
(expand_normal): Update.

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index c312c37..e01c7ff 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -2189,7 +2189,7 @@ expand_call_stmt (gimple stmt)
   if (lhs)
 expand_assignment (lhs, exp, false);
   else
-expand_expr_real_1 (exp, const0_rtx, VOIDmode, EXPAND_NORMAL, NULL);
+expand_expr_real_1 (exp, const0_rtx, VOIDmode, EXPAND_NORMAL, NULL, true);
 
   mark_transaction_restart_calls (stmt);
 }
diff --git a/gcc/expr.c b/gcc/expr.c
index 89e3979..90a2405 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -4714,6 +4714,7 @@ expand_assignment (tree to, tree from, bool nontemporal)
   int unsignedp;
   int volatilep = 0;
   tree tem;
+  bool validate_p;
 
   push_temp_slots ();
   tem = get_inner_reference (to, &bitsize, &bitpos, &offset, &mode1,
@@ -4723,7 +4724,12 @@ expand_assignment (tree to, tree from, bool nontemporal)
  && DECL_BIT_FIELD_TYPE (TREE_OPERAND (to, 1)))
get_bit_range (&bitregion_start, &bitregion_end, to, &bitpos, &offset);
 
-  to_rtx = expand_expr (tem, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  /* If OFFSET is not NULL, we defer the address legitimization til
+     the moment that the base and offset have been both expanded and
+     summed up.  */
+  validate_p = offset == NULL;
+  to_rtx = ((validate_p ? expand_expr : expand_expr_nv)
+   (tem, NULL_RTX, VOIDmode, EXPAND_WRITE));
 
   /* If the bitfield is volatile, we want to access it in the
 field's mode, not the computed mode.
@@ -4734,9 +4740,9 @@ expand_assignment (tree to, tree from, bool nontemporal)
  if (volatilep && flag_strict_volatile_bitfields > 0)
to_rtx = adjust_address (to_rtx, mode1, 0);
  else if (GET_MODE (to_rtx) == VOIDmode)
-   to_rtx = adjust_address (to_rtx, BLKmode, 0);
+   to_rtx = adjust_address_1 (to_rtx, BL

Re: [PATCH] Defer address legitimization for expanded ARRAY_REF, COMPONENT_REF, etc. til the final address is computed

2013-11-22 Thread Yufeng Zhang
Thanks for the feedback, Richard.  I'll do some experiments to see if I 
can make the post-expansion validation work.


Regards,
Yufeng

On 11/22/13 13:48, Richard Biener wrote:

On Fri, Nov 22, 2013 at 2:00 PM, Yufeng Zhang  wrote:

Hi,

Currently the address legitimization (by calling memory_address_addr_space)
is carried out twice during the RTL expansion of ARRAY_REF, COMPONENT_REF,
etc. when their OFFSET is not NULL.  It is done once for the BASE and once
for the summed address in offset_address. This may cause part, if not all,
of the generated BASE RTL to be forced into reg(s), preventing the RTL
generator from carrying out effective re-association across BASE and OFFSET
(via simplify_gen_binary).

For example, given the following test case:

typedef int arr_2[20][20];
void foo (arr_2 a2, int i, int j)
{
   a2[i+10][j] = 1;
}

the RTL code for the BASE (i.e. a2[i+10]) on arm (-mcpu=cortex-a15, -mthumb)
is


TER makes us see *(a2_5(D) + i * 80 + 800)[j_8] here


* before the legitimization of BASE:

(plus:SI (plus:SI (mult:SI (reg/v:SI 115 [ i ])
 (const_int 80 [0x50]))
 (reg/v/f:SI 114 [ a2 ]))
 (const_int 800 [0x320]))
(reg/f:SI 122)

* after the legitimization of BASE:

(reg/f:SI 122)

"Thanks to" the initial legitimization, the RTL for the final address is
turned into:

(plus:SI (mult:SI (reg/v:SI 116 [ j ])
 (const_int 4 [0x4]))
 (reg/f:SI 122))

while with the legitimization deferred, the RTL for the final address could
be:

(plus:SI (plus:SI (plus:SI (mult:SI (reg/v:SI 115 [ i ])
 (const_int 80 [0x50]))
 (mult:SI (reg/v:SI 116 [ j ])
 (const_int 4 [0x4])))
 (reg/v/f:SI 114 [ a2 ]))
 (const_int 800 [0x320]))

which has more complete info in the RTL and is much more canonicalized;
later on it could open up more opportunities for CSE.

The effect of this duplicated legitimization effort varies across different
targets, as it is strongly related to the available addressing modes on a
target.  On RISC machines where in general there are fewer addressing modes
(which are in general less complicated as well), the RTL code quality can be
affected more adversely.

The patch passes bootstrapping on arm and x86_64 and regtest on
arm-none-eabi, aarch64-none-elf and x86_64.  There is no regression in
spec2000 on arm or x86_64.

OK for the mainline?


But this patch makes the path through expansion even harder
to follow ... :/

I wonder if something along a "validate pass" after expansion
would be a better solution (and thus never validate during expansion
itself).

That is, expand is a beast - don't add to it, try to simplify it instead ;)

Thanks,
Richard.









Re: [PING^2] [PATCH] Optional alternative base_expr in finding basis for CAND_REFs

2013-11-26 Thread Yufeng Zhang

Ping^2

The patch was posted here:

http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01523.html

Thanks,
Yufeng

On 11/19/13 11:45, Yufeng Zhang wrote:

Hi Richard,

Can I get an approval or some feedback from you about the patch?

Regards,
Yufeng

On 11/13/13 23:25, Yufeng Zhang wrote:

On 11/13/13 20:54, Bill Schmidt wrote:

Hi Yufeng,

The second version of your original patch is ok with me with the
following changes.


Thanks a lot for the review.  I've attached an updated patch with the
suggested changes incorporated.


Everything else looks OK to me.  Please ask Richard for final approval,
as I'm not a maintainer.


Hi Richard, would you be happy to OK the patch?

Regards,
Yufeng

gcc/

* gimple-ssa-strength-reduction.c: Include tree-affine.h.
(name_expansions): New static variable.
(alt_base_map): Ditto.
(get_alternative_base): New function.
(find_basis_for_candidate): For CAND_REF, optionally call
find_basis_for_base_expr with the returned value from
get_alternative_base.
(record_potential_basis): Add new parameter 'base' of type 'tree';
add an assertion of non-NULL base; use base to set node->base_expr.
(alloc_cand_and_find_basis): Update; call record_potential_basis
for CAND_REF with the returned value from get_alternative_base.
(execute_strength_reduction): Call pointer_map_create for
alt_base_map; call free_affine_expand_cache with &name_expansions.

gcc/testsuite/

* gcc.dg/tree-ssa/slsr-41.c: New test.









[PATCH, ARM] Change arm_legitimize_address not to force an addend CONST_INT into REG

2013-11-26 Thread Yufeng Zhang

Hi,

arm_legitimize_address forces immediates in PLUS to be in REG for no 
good reason.  This patch changes it not to do this.


With the immediate constants directly available in the RTL, the 
expander can fold and re-associate the immediates more effectively.


The change also gives the following if condition in 
arm_legitimize_address a chance to be true:


  if (ARM_BASE_REGISTER_RTX_P (xop0)
  && CONST_INT_P (xop1))
{

The -fdump-rtl-expand diff for the added test case is:

 ;; Generating RTL for gimple basic block 2
@@ -32,53 +32,50 @@
  (nil))
 (note 4 3 7 2 NOTE_INSN_FUNCTION_BEG)
 (insn 7 4 8 2 (set (reg:SI 116)
-(const_int 4000 [0xfa0]))
- (nil))
-(insn 8 7 9 2 (set (reg:SI 117)
 (reg/v:SI 115 [ i ]))
  (nil))
[snip]
-(insn 16 15 17 2 (set (reg/f:SI 124)
-(plus:SI (reg:SI 123)
-(reg:SI 116)))
+(insn 15 14 16 2 (set (reg/f:SI 123)
+(plus:SI (reg:SI 122)
+(const_int 4000 [0xfa0])))
  (nil))
-(insn 17 16 18 2 (set (reg:SI 125)
+(insn 16 15 17 2 (set (reg:SI 124)
 (const_int 1 [0x1]))
  (nil))
-(insn 18 17 0 2 (set (mem:SI (plus:SI (mult:SI (reg/v:SI 115 [ i ])
+(insn 17 16 0 2 (set (mem:SI (plus:SI (mult:SI (reg/v:SI 115 [ i ])
 (const_int 4 [0x4]))
-(reg/f:SI 124)) [2 *_6 S4 A32])
-(reg:SI 125))
+(reg/f:SI 123)) [2 *_6 S4 A32])
+(reg:SI 124))
  (nil))

Passed bootstrapping on cortex-a15 and regtest on arm-none-eabi.

OK for the trunk?

Thanks,
Yufeng


gcc/

* config/arm/arm.c (arm_legitimize_address): Check xop1 is not
a constant immediate before force_reg.

gcc/testsuite/

* gcc.target/arm/20131120.c: New test.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 212a4bc..d5d14e5 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7120,7 +7120,8 @@ arm_legitimize_address (rtx x, rtx orig_x, enum 
machine_mode mode)
   if (CONSTANT_P (xop0) && !symbol_mentioned_p (xop0))
xop0 = force_reg (SImode, xop0);
 
-  if (CONSTANT_P (xop1) && !symbol_mentioned_p (xop1))
+  if (CONSTANT_P (xop1) && !CONST_INT_P (xop1)
+ && !symbol_mentioned_p (xop1))
xop1 = force_reg (SImode, xop1);
 
   if (ARM_BASE_REGISTER_RTX_P (xop0)
diff --git a/gcc/testsuite/gcc.target/arm/20131120.c 
b/gcc/testsuite/gcc.target/arm/20131120.c
new file mode 100644
index 000..c370ae6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/20131120.c
@@ -0,0 +1,14 @@
+/* Check that CONST_INT is not forced into REG before PLUS.  */
+/* { dg-do compile { target { arm_arm_ok || arm_thumb2_ok} } } */
+/* { dg-options "-O2 -fdump-rtl-expand" } */
+
+typedef int Arr2[50][50];
+
+void
+foo (Arr2 a2, int i)
+{
+  a2[i+20][i] = 1;
+}
+
+/* { dg-final { scan-rtl-dump-not "\\\(set \\\(reg:SI \[0-9\]*\\\)\[\n\r\]+\[ 
\t]*\\\(const_int 4000" "expand" } } */
+/* { dg-final { cleanup-rtl-dump "expand" } } */

Re: [PATCH] Optional alternative base_expr in finding basis for CAND_REFs

2013-11-26 Thread Yufeng Zhang

On 11/26/13 12:45, Richard Biener wrote:

On Thu, Nov 14, 2013 at 12:25 AM, Yufeng Zhang  wrote:

Hi Bill,


On 11/13/13 20:54, Bill Schmidt wrote:


Hi Yufeng,

The second version of your original patch is ok with me with the
following changes.  Sorry for the little side adventure into the
next-interp logic; in the end that's going to hurt more than it helps in
this case.  Thanks for having a look at it, anyway.  Thanks also for
cleaning up this version to be less intrusive to common interfaces; I
appreciate it.



Thanks a lot for the review.  I've attached an updated patch with the
suggested changes incorporated.

For the next-interp adventure, I was quite happy to do the experiment; it's
a good chance of gaining insight into the pass.  Many thanks for your prompt
replies and patience in guiding!



Everything else looks OK to me.  Please ask Richard for final approval,
as I'm not a maintainer.



Hi Richard, would you be happy to OK the patch?


Hmm,

+static tree
+get_alternative_base (tree base)
+{
+  tree *result = (tree *) pointer_map_contains (alt_base_map, base);
+
+  if (result == NULL)
+{
+  tree expr;
+  aff_tree aff;
+
+  tree_to_aff_combination_expand (base, TREE_TYPE (base),
+  &aff, &name_expansions);
+  aff.offset = tree_to_double_int (integer_zero_node);
+  expr = aff_combination_to_tree (&aff);
+
+  result = (tree *) pointer_map_insert (alt_base_map, base);
+  gcc_assert (!*result);

I believe this cache will never hit (unless you repeatedly ask for
the exact same statement?) - any non-trivial 'base' trees are
not shared and thus not pointer equivalent.


Yes, you are right that non-trivial 'base' trees are rarely shared.
The cache is introduced mainly because get_alternative_base () may be 
called twice on the same 'base' tree, once in 
find_basis_for_candidate () for the look-up and a second time in 
alloc_cand_and_find_basis () for record_potential_basis ().  I'm happy 
to leave out the cache if you think the benefit is trivial.



Also using tree_to_aff_combination_expand to get at - what
exactly? The address with any constant offset stripped?
Where do you re-construct that offset?  That is, aff.offset,
which you definitely need to get at a candidate?


As explained in the previous RFC emails, the expanded and 
constant-offset-stripped base expr is only used for the purpose of basis 
look-up.  The corresponding candidate still has the unexpanded base expr 
as its 'base_expr', therefore the info on the constant offset is not 
lost and doesn't need to be re-constructed.
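
To make that concrete (an illustrative sketch using the types from 
slsr-41.c, with the arithmetic worked out by hand):

  typedef int arr_2[50][50];
  /* a2[i-10][j]  ->  &a2 + i*200 + j*4 - 2000
     a2[i][j]     ->  &a2 + i*200 + j*4           */

Once the constant offsets (-2000 and 0) are zeroed out, both accesses 
expand to the same affine base, so the first reference can serve as the 
basis of the second even though their unexpanded base_exprs differ.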



+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-slsr" } */
+
+typedef int arr_2[50][50];
+
+void foo (arr_2 a2, int v1)
+{
+  int i, j;
+
+  i = v1 + 5;
+  j = i;
+  a2 [i-10] [j] = 2;
+  a2 [i] [j++] = i;
+  a2 [i+20] [j++] = i;
+  a2 [i-3] [i-1] += 1;
+  return;
+}
+
+/* { dg-final { scan-tree-dump-times "MEM" 5 "slsr" } } */
+/* { dg-final { cleanup-tree-dump "slsr" } } */

scanning for 5 MEMs looks non-sensical.  What transform do
you expect?  I see other slsr testcases do similar non-sensical
checking which is bad, too.


As the slsr optimizes CAND_REF candidates by simply lowering them to 
MEM_REF from e.g. ARRAY_REF, I think scanning for the number of MEM_REFs 
is an effective check.  Alternatively, I can add a follow-up patch to 
add some dumping facility in replace_ref () to print out the replacing 
actions when -fdump-tree-slsr-details is on.


I hope these can address your concerns.


Regards,
Yufeng





Richard.


Regards,

Yufeng

gcc/

 * gimple-ssa-strength-reduction.c: Include tree-affine.h.
 (name_expansions): New static variable.
 (alt_base_map): Ditto.
 (get_alternative_base): New function.
 (find_basis_for_candidate): For CAND_REF, optionally call
 find_basis_for_base_expr with the returned value from
 get_alternative_base.
 (record_potential_basis): Add new parameter 'base' of type 'tree';
 add an assertion of non-NULL base; use base to set node->base_expr.

 (alloc_cand_and_find_basis): Update; call record_potential_basis
 for CAND_REF with the returned value from get_alternative_base.
 (execute_strength_reduction): Call pointer_map_create for
 alt_base_map; call free_affine_expand_cache with &name_expansions.

gcc/testsuite/

 * gcc.dg/tree-ssa/slsr-41.c: New test.







[PATCH, AArch64, PR 61483] builtin va_start incorrectly initializes the field of va_list for incoming unnamed arguments on the stack

2014-06-12 Thread Yufeng Zhang

Hi,

The patch fixes a bug in the AArch64 backend in calculating the 
beginning address of the unnamed incoming arguments on the stack, i.e. 
the initial value of __va_list->__stack.  aarch64_layout_arg 
incorrectly calculates the size that named arguments occupy on the 
stack, using the number of registers they would need as if enough 
registers were available.  This is wrong: when passed in registers, an 
HFA/HVA* argument, for instance, takes as many SIMD registers as it 
has fields; when passed on the stack, however, it occupies only the 
size of its storage layout (rounded up to the nearest multiple of 8 
bytes).
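
To make the distinction concrete (a sketch based on the hfa_fx3_t type 
the patch adds to type-def.h):

  /* An HFA with three float fields.  Passed in registers, it takes
     three SIMD registers (s0-s2); passed on the stack, it occupies
     int_size_in_bytes = 12 bytes, rounded up to 16 (two 8-byte
     words) -- not the three words a register-based count implies.  */
  struct hfa_fx3_t
  {
    float a;
    float b;
    float c;
  };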


The bug only affects builtin va_start, as it is other routines like 
aarch64_pad_arg_upward rather than aarch64_layout_arg which take care of 
the positioning of outgoing arguments on stack and the fetching of the 
incoming named arguments from stack.


The patch has passed bootstrapping.

OK for the trunk and 4.9.1 branch once the regtest passes as well?

Thanks,
Yufeng

* HFA: Homogeneous Floating-point Aggregate
  HVA: Homogeneous Short-Vector Aggregate


gcc/

PR target/61483
* config/aarch64/aarch64.c (aarch64_layout_arg): Add new local
variable 'size'; calculate 'size' right in the front; use
'size' to compute 'nregs' (when 'allocate_ncrn != 0') and
pcum->aapcs_stack_words.

gcc/testsuite/

PR target/61483
* gcc.target/aarch64/aapcs64/type-def.h (struct hfa_fx3_t): New type.
* gcc.target/aarch64/aapcs64/va_arg-13.c: New test.
* gcc.target/aarch64/aapcs64/va_arg-14.c: Ditto.
* gcc.target/aarch64/aapcs64/va_arg-15.c: Ditto.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index fabd6a9..56a5a5d 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1459,6 +1459,7 @@ aarch64_layout_arg (cumulative_args_t pcum_v, enum 
machine_mode mode,
   CUMULATIVE_ARGS *pcum = get_cumulative_args (pcum_v);
   int ncrn, nvrn, nregs;
   bool allocate_ncrn, allocate_nvrn;
+  HOST_WIDE_INT size;
 
   /* We need to do this once per argument.  */
   if (pcum->aapcs_arg_processed)
@@ -1466,6 +1467,11 @@ aarch64_layout_arg (cumulative_args_t pcum_v, enum 
machine_mode mode,
 
   pcum->aapcs_arg_processed = true;
 
+  /* Size in bytes, rounded to the nearest multiple of 8 bytes.  */
+  size
+= AARCH64_ROUND_UP (type ? int_size_in_bytes (type) : GET_MODE_SIZE (mode),
+   UNITS_PER_WORD);
+
   allocate_ncrn = (type) ? !(FLOAT_TYPE_P (type)) : !FLOAT_MODE_P (mode);
   allocate_nvrn = aarch64_vfp_is_call_candidate (pcum_v,
 mode,
@@ -1516,9 +1522,7 @@ aarch64_layout_arg (cumulative_args_t pcum_v, enum 
machine_mode mode,
 }
 
   ncrn = pcum->aapcs_ncrn;
-  nregs = ((type ? int_size_in_bytes (type) : GET_MODE_SIZE (mode))
-  + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
-
+  nregs = size / UNITS_PER_WORD;
 
   /* C6 - C9.  though the sign and zero extension semantics are
  handled elsewhere.  This is the case where the argument fits
@@ -1567,13 +1571,12 @@ aarch64_layout_arg (cumulative_args_t pcum_v, enum 
machine_mode mode,
   pcum->aapcs_nextncrn = NUM_ARG_REGS;
 
   /* The argument is passed on stack; record the needed number of words for
- this argument (we can re-use NREGS) and align the total size if
- necessary.  */
+ this argument and align the total size if necessary.  */
 on_stack:
-  pcum->aapcs_stack_words = nregs;
+  pcum->aapcs_stack_words = size / UNITS_PER_WORD;
   if (aarch64_function_arg_alignment (mode, type) == 16 * BITS_PER_UNIT)
 pcum->aapcs_stack_size = AARCH64_ROUND_UP (pcum->aapcs_stack_size,
-  16 / UNITS_PER_WORD) + 1;
+  16 / UNITS_PER_WORD);
   return;
 }
 
diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/type-def.h 
b/gcc/testsuite/gcc.target/aarch64/aapcs64/type-def.h
index a95d06a..07e56ff 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/type-def.h
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/type-def.h
@@ -34,6 +34,13 @@ struct hfa_fx2_t
   float b;
 };
 
+struct hfa_fx3_t
+{
+  float a;
+  float b;
+  float c;
+};
+
 struct hfa_dx2_t
 {
   double a;
diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/va_arg-13.c 
b/gcc/testsuite/gcc.target/aarch64/aapcs64/va_arg-13.c
new file mode 100644
index 000..27c4099
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/va_arg-13.c
@@ -0,0 +1,53 @@
+/* Test AAPCS64 layout and __builtin_va_start.
+
+   Pass named HFA/HVA argument on stack.  */
+
+/* { dg-do run { target aarch64*-*-* } } */
+
+#ifndef IN_FRAMEWORK
+#define AAPCS64_TEST_STDARG
+#define TESTFILE "va_arg-13.c"
+
+struct float_float_t
+{
+  float a;
+  float b;
+} float_float;
+
+union float_int_t
+{
+  float b8;
+  int b5;
+} float_int;
+
+#define HAS_DATA_INIT_FUNC
+void
+init_data ()
+{
+  float_float.a = 1.2f;
+  float_float.b = 2.2f;
+
+  float_int.

[PATCH, Testsuite, AArch64] Make Function Return Value Test More Robust

2014-06-18 Thread Yufeng Zhang

Hi,

This improves the robustness of the aapcs64 test framework for testing 
function return ABI rules.  It ensures that the test facility functions 
are now able to see the exact content of the return registers right at 
the moment a function returns.
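
In outline, the control flow the patch sets up looks like this (an 
illustrative sketch):

  Before:  caller -> FUNC_NAME (id) -> caller -> myfunc ()
           (the compiler may clobber the return registers between
           the two calls, especially at -O0)

  After:   caller -> FUNC_NAME (id) -> myfunc () -> caller
           (FUNC_NAME (id) rewrites LR before returning, so the
           return registers reach myfunc () untouched; myfunc ()
           resumes the caller via saved_return_address)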


OK for trunk?

Thanks,
Yufeng

gcc/testsuite

Make the AAPCS64 function return tests more robust.

* gcc.target/aarch64/aapcs64/abitest-2.h (saved_return_address): New
global variable.
(FUNC_VAL_CHECK): Update to call myfunc via the 'ret' instruction,
instead of calling sequentially in the C code.
* gcc.target/aarch64/aapcs64/abitest.S (LABEL_TEST_FUNC_RETURN): Store
saved_return_address to the stack frame where LR register was stored.
(saved_return_address): Declare weak.

diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest-2.h
b/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest-2.h
index c56e7cc..c87fe9b 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest-2.h
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest-2.h
@@ -5,6 +5,7 @@
 #include "validate_memory.h"
 
 void (*testfunc_ptr)(char* stack);
+unsigned long long saved_return_address;
 
 /* Helper macros to generate function name.  Example of the function name:
func_return_val_1.  */
@@ -71,6 +72,17 @@ __attribute__ ((noinline)) type FUNC_NAME (id) (int i, 
double d, type t)  \
  optimized away.  Using i and d prevents \
  warnings about unused parameters.   \
   */ \
+/* We save and set up the LR register in a way that essentially   \
+   inserts myfunc () between the returning of this function and   \
+   the continued execution of its caller.  By doing this,         \
+   myfunc () can save and check the exact content of the          \
+   registers that are used for the function return value.         \
+   The previous approach of sequentially calling myfunc right     \
+   after this function does not guarantee myfunc sees the exact   \
+   register content, as the compiler may emit code in between     \
+   the two calls, especially during the -O0 codegen.  */          \
+    asm volatile ("mov %0, x30" : "=r" (saved_return_address));   \
+    asm volatile ("mov x30, %0" : : "r" ((unsigned long long) myfunc)); \
+    return t;                                                     \
   }
 #include TESTFILE
@@ -84,7 +96,8 @@ __attribute__ ((noinline)) type FUNC_NAME (id) (int i, double 
d, type t)  \
   {\
 testfunc_ptr = TEST_FUNC_NAME(id); \
 FUNC_NAME(id) (0, 0.0, var);   \
-myfunc (); \
+/* The above function implicitly calls myfunc () on its return,\
+   and the execution resumes from here after myfunc () finishes.  */\
   }
 
 int main()
diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest.S 
b/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest.S
index 86ce7be..68845fb 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest.S
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest.S
@@ -50,6 +50,10 @@ LABEL_TEST_FUNC_RETURN:
   add  x9, x9, :lo12:testfunc_ptr
   ldr  x9, [x9, #0]
   blr  x9  // function return value test
+  adrp x9, saved_return_address
+  add  x9, x9, :lo12:saved_return_address
+  ldr  x9, [x9, #0]
+  str  x9, [sp, #8]  // Update the copy of LR reg saved on stack
 LABEL_RET:
   ldp  x0, x30, [sp]
   mov  sp, x0
@@ -57,3 +61,4 @@ LABEL_RET:
 
 .weak  testfunc
 .weak  testfunc_ptr
+.weak  saved_return_address

[PATCH, ARM] Improve code-gen for multiple shifted accumulations in array indexing

2014-06-18 Thread Yufeng Zhang

Hi,

This patch improves the code-gen of -marm in the case of two-dimensional 
array access.


Given the following code:

typedef struct { int x,y,a,b; } X;

int
f7a(X p[][4], int x, int y)
{
  return p[x][y].a;
}

The code-gen on -O2 -marm -mcpu=cortex-a15 is currently

mov r2, r2, asl #4
add r1, r2, r1, asl #6
add r0, r0, r1
ldr r0, [r0, #8]
bx  lr

With the patch, we'll get:

add r1, r0, r1, lsl #6
add r2, r1, r2, lsl #4
ldr r0, [r2, #8]
bx  lr

The -mthumb code-gen had been OK.

The patch has passed the bootstrapping on cortex-a15 and the 
arm-none-eabi regtest, with no code-gen difference in spec2k 
(unfortunately).


OK for the trunk?

Thanks,
Yufeng

gcc/

* config/arm/arm.c (arm_reassoc_shifts_in_address): New declaration
and new function.
(arm_legitimize_address): Call the new functions.
(thumb_legitimize_address): Prefix the declaration with static.

gcc/testsuite/

* gcc.target/arm/shifted-add-1.c: New test.
* gcc.target/arm/shifted-add-2.c: Ditto.



Re: [PATCH, ARM] Improve code-gen for multiple shifted accumulations in array indexing

2014-06-18 Thread Yufeng Zhang

This time with patch... Apologize.

Yufeng

On 06/18/14 17:31, Yufeng Zhang wrote:

Hi,

This patch improves the code-gen of -marm in the case of two-dimensional
array access.

Given the following code:

typedef struct { int x,y,a,b; } X;

int
f7a(X p[][4], int x, int y)
{
return p[x][y].a;
}

The code-gen on -O2 -marm -mcpu=cortex-a15 is currently

  mov r2, r2, asl #4
  add r1, r2, r1, asl #6
  add r0, r0, r1
  ldr r0, [r0, #8]
  bx  lr

With the patch, we'll get:

  add r1, r0, r1, lsl #6
  add r2, r1, r2, lsl #4
  ldr r0, [r2, #8]
  bx  lr

The -mthumb code-gen had been OK.

The patch has passed the bootstrapping on cortex-a15 and the
arm-none-eabi regtest, with no code-gen difference in spec2k
(unfortunately).

OK for the trunk?

Thanks,
Yufeng

gcc/

* config/arm/arm.c (arm_reassoc_shifts_in_address): New declaration
and new function.
(arm_legitimize_address): Call the new functions.
(thumb_legitimize_address): Prefix the declaration with static.

gcc/testsuite/

* gcc.target/arm/shifted-add-1.c: New test.
* gcc.target/arm/shifted-add-2.c: Ditto.


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 16fc7ed..281c96a 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -88,6 +88,7 @@ static int thumb1_base_register_rtx_p (rtx, enum 
machine_mode, int);
 static rtx arm_legitimize_address (rtx, rtx, enum machine_mode);
 static reg_class_t arm_preferred_reload_class (rtx, reg_class_t);
 static rtx thumb_legitimize_address (rtx, rtx, enum machine_mode);
+static void arm_reassoc_shifts_in_address (rtx);
 inline static int thumb1_index_register_rtx_p (rtx, int);
 static bool arm_legitimate_address_p (enum machine_mode, rtx, bool);
 static int thumb_far_jump_used_p (void);
@@ -7501,7 +7502,8 @@ arm_legitimize_address (rtx x, rtx orig_x, enum 
machine_mode mode)
 {
   /* TODO: legitimize_address for Thumb2.  */
   if (TARGET_THUMB2)
-return x;
+   return x;
+
   return thumb_legitimize_address (x, orig_x, mode);
 }
 
@@ -7551,6 +7553,9 @@ arm_legitimize_address (rtx x, rtx orig_x, enum 
machine_mode mode)
}
   else if (xop0 != XEXP (x, 0) || xop1 != XEXP (x, 1))
x = gen_rtx_PLUS (SImode, xop0, xop1);
+
+  if (GET_CODE (xop0) == PLUS)
+   arm_reassoc_shifts_in_address (xop0);
 }
 
   /* XXX We don't allow MINUS any more -- see comment in
@@ -7614,7 +7619,8 @@ arm_legitimize_address (rtx x, rtx orig_x, enum 
machine_mode mode)
 
 /* Try machine-dependent ways of modifying an illegitimate Thumb address
to be legitimate.  If we find one, return the new, valid address.  */
-rtx
+
+static rtx
 thumb_legitimize_address (rtx x, rtx orig_x, enum machine_mode mode)
 {
   if (GET_CODE (x) == PLUS
@@ -7679,6 +7685,47 @@ thumb_legitimize_address (rtx x, rtx orig_x, enum 
machine_mode mode)
   return x;
 }
 
+/* Transform
+ PLUS (PLUS (MULT1, MULT2), REG)
+   to
+ PLUS (PLUS (MULT1, REG), MULT2)
+   so that we can use two add (shifted register) instructions
+   to compute the expression.  Note that SHIFTs have already
+   been replaced with MULTs as a result of canonicalization.
+
+   This routine is to help undo the undesired canonicalization
+   that is done by simplify_gen_binary on addresses with
+   multiple shifts.  For example, it will help transform
+  (x << 6) + (y << 4) + p + 8
+   back to:
+  (x << 6) + p + (y << 4) + 8
+   where p is the start address of a two-dimensional array and
+   x and y are the indexes.  */
+
+static void
+arm_reassoc_shifts_in_address (rtx x)
+{
+  if (GET_CODE (x) == PLUS)
+{
+  rtx op0 = XEXP (x, 0);
+  rtx op1 = XEXP (x, 1);
+
+  if (GET_CODE (op0) == PLUS && REG_P (op1))
+   {
+ rtx xop0 = XEXP (op0, 0);
+ rtx xop1 = XEXP (op0, 1);
+
+ if (GET_CODE (xop0) == MULT && GET_CODE (xop1) == MULT
+ && power_of_two_operand (XEXP (xop0, 1), GET_MODE (xop0))
+ && power_of_two_operand (XEXP (xop1, 1), GET_MODE (xop1)))
+   {
+ XEXP (op0, 1) = op1;
+ XEXP (x, 1) = xop1;
+   }
+   }
+}
+}
+
 bool
 arm_legitimize_reload_address (rtx *p,
   enum machine_mode mode,
diff --git a/gcc/testsuite/gcc.target/arm/shifted-add-1.c 
b/gcc/testsuite/gcc.target/arm/shifted-add-1.c
new file mode 100644
index 000..8777fe4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/shifted-add-1.c
@@ -0,0 +1,47 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2" } */
+
+typedef struct { int x,y,a,b; } x;
+
+int
+f7a(x p[][4], int x, int y)
+{
+  return p[x][y].a;
+}
+
+/* { dg-final { object-size text <= 16 { target { { ! arm_thumb1 } && { ! 
arm_thumb2 } } } } } */
+/* { dg-final { object-size text <= 12 { target arm_thumb2

[PATCH, Testsuite, AArch64] Make aapcs64.exp Tests Big-Endian Friendly

2014-06-19 Thread Yufeng Zhang

Hi,

This patch updates a number of aapcs64 tests to make them big-endian 
friendly.  Changes are mainly:


* checking the W regs instead of X regs for integral arguments less than 
8 bytes

* correcting the corresponding stack location checks in big-endian mode 
(see the sketch below)
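
To illustrate the second point (a sketch under AAPCS64 assumptions, not 
part of the patch): a stack-passed int occupies an 8-byte slot, and on 
big-endian the 4 meaningful bytes sit in the high-addressed half of 
that slot, which is why a check such as STACK+32 in little-endian mode 
becomes STACK+36 in big-endian mode in va_arg-1.c below.

  /* Sketch: where a 4-byte value lands inside its 8-byte stack slot.  */
  union slot { unsigned long long raw; unsigned char b[8]; };

  int offset_in_slot (void)
  {
    union slot s = { .raw = 0xdeadbeefULL };  /* the int, zero-extended */
    /* Little-endian: value in bytes 0-3 (offset +0);
       big-endian: value in bytes 4-7 (offset +4).  */
    return s.b[0] ? 0 : 4;
  }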

With this patch, make check-gcc RUNTESTFLAGS="aapcs64.exp" gives a clean 
result on aarch64_be-none-elf.


OK for trunk?

Thanks,
Yufeng

gcc/testsuite/

Make the tests big-endian friendly.
* gcc.target/aarch64/aapcs64/test_25.c: Update.
* gcc.target/aarch64/aapcs64/va_arg-1.c: Ditto.
* gcc.target/aarch64/aapcs64/va_arg-12.c: Ditto.
* gcc.target/aarch64/aapcs64/va_arg-2.c: Ditto.
* gcc.target/aarch64/aapcs64/va_arg-3.c: Ditto.
* gcc.target/aarch64/aapcs64/va_arg-4.c: Ditto.
* gcc.target/aarch64/aapcs64/va_arg-5.c: Ditto.
* gcc.target/aarch64/aapcs64/va_arg-6.c: Ditto.
* gcc.target/aarch64/aapcs64/va_arg-7.c: Ditto.

diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/test_25.c
b/gcc/testsuite/gcc.target/aarch64/aapcs64/test_25.c
index 2f942ff..2febb79 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/test_25.c
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/test_25.c
@@ -42,20 +42,20 @@ void init_data ()
   s2.df[0] = 123.456;
   s2.df[1] = 234.567;
   s2.df[2] = 345.678;
-  s3.v[0] = (vf2_t){ 19.f, 20.f, 21.f, 22.f };
-  s3.v[1] = (vf2_t){ 23.f, 24.f, 25.f, 26.f };
-  s4.v[0] = (vf2_t){ 27.f, 28.f, 29.f, 30.f };
-  s4.v[1] = (vf2_t){ 31.f, 32.f, 33.f, 34.f };
-  s4.v[2] = (vf2_t){ 35.f, 36.f, 37.f, 38.f };
+  s3.v[0] = (vf2_t){ 19.f, 20.f };
+  s3.v[1] = (vf2_t){ 23.f, 24.f };
+  s4.v[0] = (vf2_t){ 27.f, 28.f };
+  s4.v[1] = (vf2_t){ 31.f, 32.f };
+  s4.v[2] = (vf2_t){ 35.f, 36.f };
 }
 
 #include "abitest.h"
 #else
-ARG_NONFLAT (struct x0, s0, Q0, f32in64)
+ARG (struct x0, s0, D0)
 ARG (struct x2, s2, D1)
 ARG (struct x1, s1, Q4)
 ARG (struct x3, s3, D5)
 ARG (struct x4, s4, STACK)
-ARG_NONFLAT (int, 0xdeadbeef, X0, i32in64)
+ARG (int, 0xdeadbeef, W0)
 LAST_ARG (double, 456.789, STACK+24)
 #endif
diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/va_arg-1.c 
b/gcc/testsuite/gcc.target/aarch64/aapcs64/va_arg-1.c
index 4eb569e..4fb9a03 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/va_arg-1.c
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/va_arg-1.c
@@ -30,14 +30,14 @@ void init_data ()
 
 #include "abitest.h"
 #else
-  ARG  ( int  , 0xff  ,X0, 
LAST_NAMED_ARG_ID)
+  ARG  ( int  , 0xff  ,W0, 
LAST_NAMED_ARG_ID)
   DOTS
-  ANON_PROMOTED(unsigned char , 0xfe  , unsigned int, 0xfe   , X1, 
  1)
-  ANON_PROMOTED(  signed char , sc,   signed int, sc_promoted, X2, 
  2)
-  ANON_PROMOTED(unsigned short, 0xdcba, unsigned int, 0xdcba , X3, 
  3)
-  ANON_PROMOTED(  signed short, ss,   signed int, ss_promoted, X4, 
  4)
-  ANON (unsigned int  , 0xdeadbeef,X5, 
  5)
-  ANON (  signed int  , 0xcafebabe,X6, 
  6)
+  ANON_PROMOTED(unsigned char , 0xfe  , unsigned int, 0xfe   , W1, 
  1)
+  ANON_PROMOTED(  signed char , sc,   signed int, sc_promoted, W2, 
  2)
+  ANON_PROMOTED(unsigned short, 0xdcba, unsigned int, 0xdcba , W3, 
  3)
+  ANON_PROMOTED(  signed short, ss,   signed int, ss_promoted, W4, 
  4)
+  ANON (unsigned int  , 0xdeadbeef,W5, 
  5)
+  ANON (  signed int  , 0xcafebabe,W6, 
  6)
   ANON (unsigned long long, 0xba98765432101234ULL, X7, 
  7)
   ANON (  signed long long, 0xa987654321012345LL , STACK,  
  8)
   ANON (  __int128, qword.i  , 
STACK+16, 9)
@@ -46,5 +46,9 @@ void init_data ()
   ANON (long double   , 98765432123456789.987654321L,  Q2, 
 12)
   ANON ( vf2_t, vf2   ,D3, 
 13)
   ANON ( vi4_t, vi4   ,Q4, 
 14)
+#ifndef __AAPCS64_BIG_ENDIAN__
   LAST_ANON( int  , 0x,
STACK+32,15)
+#else
+  LAST_ANON( int  , 0x,
STACK+36,15)
+#endif
 #endif
diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/va_arg-12.c 
b/gcc/testsuite/gcc.target/aarch64/aapcs64/va_arg-12.c
index a12ccfd..3eddaa2 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/va_arg-12.c
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/va_arg-12.c
@@ -45,16 +45,20 @@ void init_data ()
 #include "abitest.h"
 #else
   PTR(struct z, a, X0, 0)
-  ARG(int, 0xdeadbeef, X1, 1)
-  ARG(int, 0xcafebabe, X2, 2)
-  ARG(int, 0xdeadbabe, X3, 3)
-  ARG(int, 0xcafebeef, X4, 4)
-  ARG(int, 0xbeefdead, X5, 5)
-  ARG(int, 0xbabecafe, X6, LAST_NAMED_ARG_ID)
+  ARG(int

[PING] [PATCH, ARM] Improve code-gen for multiple shifted accumulations in array indexing

2014-06-24 Thread Yufeng Zhang

Ping~

Original posted here:

https://gcc.gnu.org/ml/gcc-patches/2014-06/msg01492.html

Thanks,
Yufeng

On 06/18/14 17:35, Yufeng Zhang wrote:

This time with patch... Apologize.

Yufeng

On 06/18/14 17:31, Yufeng Zhang wrote:

Hi,

This patch improves the code-gen of -marm in the case of two-dimensional
array access.

Given the following code:

typedef struct { int x,y,a,b; } X;

int
f7a(X p[][4], int x, int y)
{
 return p[x][y].a;
}

The code-gen on -O2 -marm -mcpu=cortex-a15 is currently

   mov r2, r2, asl #4
   add r1, r2, r1, asl #6
   add r0, r0, r1
   ldr r0, [r0, #8]
   bx  lr

With the patch, we'll get:

   add r1, r0, r1, lsl #6
   add r2, r1, r2, lsl #4
   ldr r0, [r2, #8]
   bx  lr

The -mthumb code-gen had been OK.

The patch has passed the bootstrapping on cortex-a15 and the
arm-none-eabi regtest, with no code-gen difference in spec2k
(unfortunately).

OK for the trunk?

Thanks,
Yufeng

gcc/

* config/arm/arm.c (arm_reassoc_shifts_in_address): New declaration
and new function.
(arm_legitimize_address): Call the new functions.
(thumb_legitimize_address): Prefix the declaration with static.

gcc/testsuite/

* gcc.target/arm/shifted-add-1.c: New test.
* gcc.target/arm/shifted-add-2.c: Ditto.






Re: [Patch, AArch64] Restructure arm_neon.h vector types' implementation.

2014-06-25 Thread Yufeng Zhang
On 23 June 2014 16:47, Tejas Belagod  wrote:
>
> Hi,
>
> Here is a patch that restructures neon builtins to use vector types based on
> standard base types. We previously defined arm_neon.h's neon vector
> types(int8x8_t) using gcc's front-end vector extensions. We now move away
> from that and use types built internally(e.g. __Int8x8_t). These internal
> types names are defined by the AAPCS64 and we build arm_neon.h's public
> vector types over these internal types. e.g.
>
>   typedef __Int8x8_t int8x8_t;
>
> as opposed to
>
>   typedef __builtin_aarch64_simd_qi int8x8_t
> __attribute__ ((__vector_size__ (8)));
>
> Impact on mangling:
>
> This patch does away with these builtin scalar types that the vector types
> were based on. These were previously used to look up mangling names. We now
> use the internal vector type names(e.g. __Int8x8_t) to lookup mangling for
> the arm_neon.h-exported vector types. There are a few internal scalar
> types(__builtin_aarch64_simd_oi etc.) that is needed to efficiently
> implement some NEON Intrinsics. These will be declared in the back-end and
> registered in the front-end and aarch64-specific builtin types, but are not
> user-visible. These, along with a few scalar __builtin types that aren't
> user-visible will have implementation-defined mangling. Because we don't
> have strong-typing across all builtins yet, we still have to maintain the
> old builtin scalar types - they will be removed once we move over to a
> strongly-typed builtin system implemented by the qualifier infrastructure.
>
> Marc Glisse's patch in this thread exposed this issue
> https://gcc.gnu.org/ml/gcc-patches/2014-04/msg00618.html. I've tested my
> patch with the change that his patch introduced, and it seems to work fine -
> specifically these two lines:
>
> +  for (tree t = registered_builtin_types; t; t = TREE_CHAIN (t))
> +emit_support_tinfo_1 (TREE_VALUE (t));
>
> Regressed on aarch64-none-elf. OK for trunk?
>
> Thanks,
> Tejas.
>
> gcc/Changelog
>
> 2014-06-23  Tejas Belagod  
>
> * config/aarch64/aarch64-builtins.c (aarch64_build_scalar_type):
> Remove.
> (aarch64_scalar_builtin_types, aarch64_simd_type,
> aarch64_simd_types,
>  aarch64_mangle_builtin_scalar_type,
> aarch64_mangle_builtin_vector_type,
>  aarch64_mangle_builtin_type, aarch64_simd_builtin_std_type,
>  aarch64_lookup_simd_builtin_type, aarch64_simd_builtin_type,
>  aarch64_init_simd_builtin_types,
>  aarch64_init_simd_builtin_scalar_types): New.
> (aarch64_init_simd_builtins): Refactor.
> (aarch64_fold_builtin): Remove redundant defn.
> (aarch64_init_crc32_builtins): Use aarch64_simd_builtin_std_type.
> * config/aarch64/aarch64-simd-builtin-types.def: New.

Has the content of this new file been included in the patch?

Yufeng


Re: [Patch, AArch64] Restructure arm_neon.h vector types' implementation.

2014-06-27 Thread Yufeng Zhang
On 27 June 2014 16:32, Tejas Belagod  wrote:
>>>
>>> 2014-06-23  Tejas Belagod  
> diff --git a/gcc/config/aarch64/aarch64-simd-builtin-types.def
> b/gcc/config/aarch64/aarch64-simd-builtin-types.def
> new file mode 100644
> index 000..aa6a84e
> --- /dev/null
> +++ b/gcc/config/aarch64/aarch64-simd-builtin-types.def
> @@ -0,0 +1,50 @@
> +/* Builtin AdvSIMD types.
> +   Copyright (C) 2014 Free Software Foundation, Inc.
> +   Contributed by ARM Ltd.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   .  */
> +
> +  ENTRY (Int8x8_t, V8QI, none, 10)
> +  ENTRY (Int8x16_t, V16QI, none, 11)
> +  ENTRY (Int16x4_t, V4HI, none, 11)
> +  ENTRY (Int16x8_t, V8HI, none, 11)
> +  ENTRY (Int32x2_t, V2SI, none, 11)
> +  ENTRY (Int32x4_t, V4SI, none, 11)
> +  ENTRY (Int64x1_t, DI, none, 11)
> +  ENTRY (Int64x2_t, V2DI, none, 11)
> +  ENTRY (Uint8x8_t, V8QI, unsigned, 11)
> +  ENTRY (Uint8x16_t, V16QI, unsigned, 12)
> +  ENTRY (Uint16x4_t, V4HI, unsigned, 12)
> +  ENTRY (Uint16x8_t, V8HI, unsigned, 12)
> +  ENTRY (Uint32x2_t, V2SI, unsigned, 12)
> +  ENTRY (Uint32x4_t, V4SI, unsigned, 12)
> +  ENTRY (Uint64x1_t, DI, unsigned, 12)
> +  ENTRY (Uint64x2_t, V2DI, unsigned, 12)
> +  ENTRY (Poly8_t, QI, poly, 9)
> +  ENTRY (Poly16_t, HI, poly, 10)
> +  ENTRY (Poly64_t, DI, poly, 10)
> +  ENTRY (Poly128_t, TI, poly, 11)
> +  ENTRY (Poly8x8_t, V8QI, poly, 11)
> +  ENTRY (Poly8x16_t, V16QI, poly, 12)
> +  ENTRY (Poly16x4_t, V4HI, poly, 12)
> +  ENTRY (Poly16x8_t, V8HI, poly, 12)
> +  ENTRY (Poly64x1_t, DI, poly, 12)
> +  ENTRY (Poly64x2_t, V2DI, poly, 12)
> +  ENTRY (Float32x2_t, V2SF, none, 13)
> +  ENTRY (Float32x4_t, V4SF, none, 13)
> +  ENTRY (Float64x1_t, DF, none, 13)

Will this revert Alan Lawrence's commit in r211892, which defines
Float64x1_t to have V1DF mode?

Thanks,
Yufeng

commit cffa849a621eb949bbdc4ce8468c932889450e6d
Author: alalaw01 
Date:   Mon Jun 23 12:46:52 2014 +

PR/60825 Make float64x1_t in arm_neon.h a proper vector type


Re: AARCH64 configure check for gas -mabi support

2014-06-30 Thread Yufeng Zhang

Looks good to me.  Thanks for the fix.

Yufeng

On 06/30/14 10:44, Gerald Pfeifer wrote:

I applied the small patch on top of this, mostly triggered by the
markup issue.

Let me know if there is anything you'd like to see differently; I
am thinking to push back to GCC 4.9 as well later.

Gerald


2014-06-30  Gerald Pfeifer

* doc/install.texi (Specific, aarch64*-*-*): Fix markup.  Reword a bit.

Index: doc/install.texi
===
--- doc/install.texi(revision 212139)
+++ doc/install.texi(working copy)
@@ -3760,9 +3760,9 @@
  @end html
  @anchor{aarch64-x-x}
  @heading aarch64*-*-*
-Pre 2.24 binutils does not have support for selecting -mabi and does not
-support ILP32.  If GCC 4.9 or later is built with pre 2.24, GCC will not
-support option -mabi=ilp32.
+Binutils pre 2.24 does not have support for selecting @option{-mabi} and
+does not support ILP32.  If it is used to build GCC 4.9 or later, GCC will
+not support option @option{-mabi=ilp32}.

  @html
  






[PATCH, AArch64, Testsuite] Specify -fno-use-caller-save for func-ret* tests

2014-07-01 Thread Yufeng Zhang

Hi,

This patch resolves a conflict between the aapcs64 test framework for 
func-ret tests and the optimization option -fuse-caller-save, which was 
enabled by default at -O1 or above recently.


Basically, the test framework has an inline-assembly based mechanism in 
place which invokes the test facility function right on the return of a 
tested function.  The compiler with -fuse-caller-save is unable to 
identify the unconventional call graph and carries out the optimization 
regardless.


Adding an explicit LR clobber to the inline assembly doesn't solve the 
issue, as the compiler would then simply generate an extra save/restore 
of LR in the prologue/epilogue.
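
The nature of the conflict can be sketched as follows (a hypothetical 
example, not from the test suite):

  /* A hypothetical illustration of what -fuse-caller-save permits:
     the compiler records exactly which registers leaf () clobbers,
     so use () may keep 'v' live in a call-clobbered register that
     leaf () is known not to touch, instead of spilling it around
     the call.  */

  static int counter;

  static void leaf (void)
  {
    counter++;
  }

  int use (int v)
  {
    leaf ();   /* 'v' may legitimately stay in a caller-saved register
                  here -- unless leaf () transfers control behind the
                  compiler's back, as the abitest functions do by
                  rewriting LR.  */
    return v + counter;
  }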


OK for the trunk?

Thanks,
Yufeng

gcc/testsuite/

* gcc.target/aarch64/aapcs64/aapcs64.exp:
(additional_flags_for_func_ret): New variable based on $additional_flags
plus -fno-use-caller-save.
(func-ret-*.c): Use the new variable.

diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/aapcs64.exp
b/gcc/testsuite/gcc.target/aarch64/aapcs64/aapcs64.exp
index 195f977..fdfbff1 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/aapcs64.exp
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/aapcs64.exp
@@ -48,11 +48,15 @@ foreach src [lsort [glob -nocomplain 
$srcdir/$subdir/va_arg-*.c]] {
 }
 
 # Test function return value.
+#   Disable -fuse-caller-save to prevent the compiler from generating
+#   conflicting code.
+set additional_flags_for_func_ret $additional_flags
+append additional_flags_for_func_ret " -fno-use-caller-save"
 foreach src [lsort [glob -nocomplain $srcdir/$subdir/func-ret-*.c]] {
 if {[runtest_file_p $runtests $src]} {
c-torture-execute [list $src \
$srcdir/$subdir/abitest.S] \
-   $additional_flags
+   $additional_flags_for_func_ret
 }
 }
 

Re: [AArch64] fix missing Dwarf call frame information in the epilogue

2012-11-06 Thread Yufeng Zhang

Hi,

Many thanks for reviewing.  Please find the updated patch.  The explicit 
calls to gen_rtx_PLUS and GEN_INT have been replaced by plus_constant, 
and the call to aarch64_set_frame_expr has been replaced with 
add_reg_note (REG_CFA_ADJUST_CFA).


I'll clean up other cases in aarch64.c in a separate patch.

OK to commit?

Thanks,
Yufeng


gcc/ChangeLog

2012-11-06  Yufeng Zhang  

 * config/aarch64/aarch64.c (aarch64_expand_prologue): For the
 load-pair with writeback instruction, replace
 aarch64_set_frame_expr with add_reg_note (REG_CFA_ADJUST_CFA);
 add new local variable 'cfa_reg' and use it.

gcc/testsuite/ChangeLog

2012-11-06  Yufeng Zhang  

 * gcc.target/aarch64/dwarf-cfa-reg.c: New file.


On 09/12/12 19:37, Richard Henderson wrote:

On 09/12/2012 09:10 AM, Yufeng Zhang wrote:

aarch64_set_frame_expr (gen_rtx_SET
  (Pmode,
   stack_pointer_rtx,
-  gen_rtx_PLUS (Pmode, stack_pointer_rtx,
+  gen_rtx_PLUS (Pmode, cfa_reg,
 GEN_INT (offset;


We'd prefer to use

   plus_constant (Pmode, cfa_reg, offset)

instead of the explicit call to gen_rtx_PLUS and GEN_INT.
It would appear that the entire aarch64.c file ought to
be audited for that.

Also, use of the REG_CFA_* notes is strongly encouraged over
use of REG_FRAME_RELATED_EXPR.

There's all sorts of work involved in turning R_F_R_E into
R_CFA_* notes, depending on a rather large state machine.
This state machine was developed when only prologues were
annotated for unwinding, and therefore one cannot expect it
to work reliably for epilogues.

A long-term goal is to convert all targets to use R_CFA_*
exclusively, as that preserves much more information present
in the structure of the code of the prologue generator.  It
means less work within the compiler, and eventually being able
to remove a rather large hunk of state-machine code.


r~

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b36be90..8a2d7ba 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1777,7 +1777,7 @@ aarch64_expand_prologue (void)
   - original_frame_size
   - cfun->machine->frame.saved_regs_size);
 
-  /* Store pairs and load pairs have a range only of +/- 512.  */
+  /* Store pairs and load pairs have a range only -512 to 504.  */
   if (offset >= 512)
 {
   /* When the frame has a large size, an initial decrease is done on
@@ -1923,6 +1923,7 @@ aarch64_expand_epilogue (bool for_sibcall)
   HOST_WIDE_INT original_frame_size, frame_size, offset;
   HOST_WIDE_INT fp_offset;
   rtx insn;
+  rtx cfa_reg;
 
   aarch64_layout_frame ();
   original_frame_size = get_frame_size () + cfun->machine->saved_varargs_size;
@@ -1935,7 +1936,9 @@ aarch64_expand_epilogue (bool for_sibcall)
   - original_frame_size
   - cfun->machine->frame.saved_regs_size);
 
-  /* Store pairs and load pairs have a range only of +/- 512.  */
+  cfa_reg = frame_pointer_needed ? hard_frame_pointer_rtx : stack_pointer_rtx;
+
+  /* Store pairs and load pairs have a range only -512 to 504.  */
   if (offset >= 512)
 {
   offset = original_frame_size + cfun->machine->frame.saved_regs_size;
@@ -1966,6 +1969,10 @@ aarch64_expand_epilogue (bool for_sibcall)
   hard_frame_pointer_rtx,
   GEN_INT (- fp_offset)));
   RTX_FRAME_RELATED_P (insn) = 1;
+  /* As SP is set to (FP - fp_offset), according to the rules in
+dwarf2cfi.c:dwarf2out_frame_debug_expr, CFA should be calculated
+from the value of SP from now on.  */
+  cfa_reg = stack_pointer_rtx;
 }
 
   aarch64_save_or_restore_callee_save_registers
@@ -2003,11 +2010,9 @@ aarch64_expand_epilogue (bool for_sibcall)
 GEN_INT (offset),
 GEN_INT (GET_MODE_SIZE (DImode) + offset)));
  RTX_FRAME_RELATED_P (XVECEXP (PATTERN (insn), 0, 2)) = 1;
- aarch64_set_frame_expr (gen_rtx_SET
- (Pmode,
-  stack_pointer_rtx,
-  gen_rtx_PLUS (Pmode, stack_pointer_rtx,
-GEN_INT (offset;
+ add_reg_note (insn, REG_CFA_ADJUST_CFA,
+   (gen_rtx_SET (Pmode, stack_pointer_rtx,
+ plus_constant (cfa_reg, offset;
}
 
  /* The first part of a frame-related parallel insn
@@ -2027,7 +2032,6 @@ aarch64_expand_epilogue (bool for_sibcall)
  RTX_FRAME_RELATED_P (insn) = 1;
}
}
-
   else
{
 

[PATCH][AARCH64] Fix the name mangling of va_list

2012-11-21 Thread Yufeng Zhang

Hi,

This patch updates the AArch64 port to mangle __va_list as if it were
in namespace std in C++.  This is specified in the AArch64 AAPCS64 ABI doc.
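
For instance (an illustrative example, not part of the patch), a C++ 
function taking a va_list parameter then mangles like this on AArch64:

  #include <cstdarg>
  void f (va_list ap) { }   // mangled as _Z1fSt9__va_list

Without the hook, the parameter would be mangled as an ordinary 
global-scope struct instead of as std::__va_list.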

OK for the trunk?

Thanks,
Yufeng

gcc/ChangeLog

2012-11-21  Yufeng Zhang  

* config/aarch64/aarch64.c (aarch64_mangle_type): New function.
(TARGET_MANGLE_TYPE): Define.

gcc/testsuite/ChangeLog

2012-11-21  Yufeng Zhang  

* g++.dg/abi/arm_va_list.C: Also test on aarch64*-*-*.diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 4437fef..792b086 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5859,6 +5859,20 @@ aarch64_preferred_simd_mode (enum machine_mode mode)
   return word_mode;
 }
 
+/* Implement TARGET_MANGLE_TYPE.  */
+
+const char *
+aarch64_mangle_type (const_tree type)
+{
+  /* The AArch64 ABI documents say that "__va_list" has to be
+ mangled as if it is in the "std" namespace.  */
+  if (lang_hooks.types_compatible_p (CONST_CAST_TREE (type), va_list_type))
+return "St9__va_list";
+
+  /* Use the default mangling.  */
+  return NULL;
+}
+
 /* Legitimize a memory reference for sync primitive implemented using
LDXR/STXR instructions.  We currently force the form of the reference
to be indirect without offset.  */
@@ -6923,6 +6937,9 @@ aarch64_c_mode_for_suffix (char suffix)
 #undef TARGET_LIBGCC_CMP_RETURN_MODE
 #define TARGET_LIBGCC_CMP_RETURN_MODE aarch64_libgcc_cmp_return_mode
 
+#undef TARGET_MANGLE_TYPE
+#define TARGET_MANGLE_TYPE aarch64_mangle_type
+
 #undef TARGET_MEMORY_MOVE_COST
 #define TARGET_MEMORY_MOVE_COST aarch64_memory_move_cost
 
diff --git a/gcc/testsuite/g++.dg/abi/arm_va_list.C b/gcc/testsuite/g++.dg/abi/arm_va_list.C
index 45a426a..d983ee1 100644
--- a/gcc/testsuite/g++.dg/abi/arm_va_list.C
+++ b/gcc/testsuite/g++.dg/abi/arm_va_list.C
@@ -1,9 +1,10 @@
-// { dg-do compile }
+// { dg-do compile { target { aarch64*-*-* arm*-*-* } } }
 // { dg-options "-Wno-abi" }
-// { dg-require-effective-target arm_eabi }
+// { dg-require-effective-target arm_eabi { target arm*-*-* } }
 
 // AAPCS \S 7.1.4 requires that va_list be a typedef for "struct
 // __va_list".  The mangling is as if it were "std::__va_list".
+// So is required for the AArch64 target.
 // #include 
 typedef __builtin_va_list va_list;
 

[PATCH][AARCH64] Fix the name mangling of AdvSIMD vector types

2012-11-22 Thread Yufeng Zhang

Hi,

This patch implements the correct name mangling of AArch64 AdvSIMD 
vector types in conformance to the AAPCS64 doc (Procedure Call Standard 
for the ARM 64-bit Architecture, Appendix A).


OK for the trunk?

Thanks,
Yufeng

gcc/ChangeLog

2012-11-22  Yufeng Zhang  

* config/aarch64/aarch64.c (aarch64_simd_mangle_map_entry): New
typedef.
(aarch64_simd_mangle_map): New table.
(aarch64_mangle_type): Locate and return the mangled name for
a given AdvSIMD vector type.

gcc/testsuite/ChangeLog

2012-11-22  Yufeng Zhang  

* g++.dg/abi/mangle-neon-aarch64.C: New test.diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 42f3a40..ba84a39 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5853,6 +5853,50 @@ aarch64_preferred_simd_mode (enum machine_mode mode)
   return word_mode;
 }
 
+/* A table to help perform AArch64-specific name mangling for AdvSIMD
+   vector types in order to conform to the AAPCS64 (see "Procedure
+   Call Standard for the ARM 64-bit Architecture", Appendix A).  To
+   qualify for emission with the mangled names defined in that document,
+   a vector type must not only be of the correct mode but also be
+   composed of AdvSIMD vector element types (e.g.
+   __builtin_aarch64_simd_qi); these types are registered by
+   aarch64_init_simd_builtins ().  In other words, vector types defined
+   in other ways e.g. via vector_size attribute will get default
+   mangled names.  */
+typedef struct
+{
+  enum machine_mode mode;
+  const char *element_type_name;
+  const char *mangled_name;
+} aarch64_simd_mangle_map_entry;
+
+static aarch64_simd_mangle_map_entry aarch64_simd_mangle_map[] = {
+  /* 64-bit containerized types.  */
+  { V8QImode,  "__builtin_aarch64_simd_qi", "10__Int8x8_t" },
+  { V8QImode,  "__builtin_aarch64_simd_uqi","11__Uint8x8_t" },
+  { V4HImode,  "__builtin_aarch64_simd_hi", "11__Int16x4_t" },
+  { V4HImode,  "__builtin_aarch64_simd_uhi","12__Uint16x4_t" },
+  { V2SImode,  "__builtin_aarch64_simd_si", "11__Int32x2_t" },
+  { V2SImode,  "__builtin_aarch64_simd_usi","12__Uint32x2_t" },
+  { V2SFmode,  "__builtin_aarch64_simd_sf", "13__Float32x2_t" },
+  { V8QImode,  "__builtin_aarch64_simd_poly8",  "11__Poly8x8_t" },
+  { V4HImode,  "__builtin_aarch64_simd_poly16", "12__Poly16x4_t" },
+  /* 128-bit containerized types.  */
+  { V16QImode, "__builtin_aarch64_simd_qi", "11__Int8x16_t" },
+  { V16QImode, "__builtin_aarch64_simd_uqi","12__Uint8x16_t" },
+  { V8HImode,  "__builtin_aarch64_simd_hi", "11__Int16x8_t" },
+  { V8HImode,  "__builtin_aarch64_simd_uhi","12__Uint16x8_t" },
+  { V4SImode,  "__builtin_aarch64_simd_si", "11__Int32x4_t" },
+  { V4SImode,  "__builtin_aarch64_simd_usi","12__Uint32x4_t" },
+  { V2DImode,  "__builtin_aarch64_simd_di", "11__Int64x2_t" },
+  { V2DImode,  "__builtin_aarch64_simd_udi","12__Uint64x2_t" },
+  { V4SFmode,  "__builtin_aarch64_simd_sf", "13__Float32x4_t" },
+  { V2DFmode,  "__builtin_aarch64_simd_df", "13__Float64x2_t" },
+  { V16QImode, "__builtin_aarch64_simd_poly8",  "12__Poly8x16_t" },
+  { V8HImode,  "__builtin_aarch64_simd_poly16", "12__Poly16x8_t" },
+  { VOIDmode, NULL, NULL }
+};
+
 /* Implement TARGET_MANGLE_TYPE.  */
 
 const char *
@@ -5863,6 +5907,26 @@ aarch64_mangle_type (const_tree type)
   if (lang_hooks.types_compatible_p (CONST_CAST_TREE (type), va_list_type))
 return "St9__va_list";
 
+  /* Check the mode of the vector type, and the name of the vector
+ element type, against the table.  */
+  if (TREE_CODE (type) == VECTOR_TYPE)
+{
+  aarch64_simd_mangle_map_entry *pos = aarch64_simd_mangle_map;
+
+  while (pos->mode != VOIDmode)
+	{
+	  tree elt_type = TREE_TYPE (type);
+
+	  if (pos->mode == TYPE_MODE (type)
+	  && TREE_CODE (TYPE_NAME (elt_type)) == TYPE_DECL
+	  && !strcmp (IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (elt_type))),
+			  pos->element_type_name))
+	return pos->mangled_name;
+
+	  pos++;
+	}
+}
+
   /* Use the default mangling.  */
   return NULL;
 }
diff --git a/gcc/testsuite/g++.dg/abi/mangle-neon-aarch64.C b/gcc/testsuite/g++.dg/abi/mangle-neon-aarch64.C
new file mode 100644
index 000..09540e8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/abi/mangle-neon-aarch64.C
@@ -0,0 +1,55 @@
+// Test that AArch64 AdvSIMD (NEON) vector types have their names mangled
+// correctly.
+
+// { dg-do compile { target { aarch64*-*-* } } }
+
+#include <arm_neon.h>
+

Re: [PATCH][AARCH64] Fix the name mangling of va_list

2012-11-29 Thread Yufeng Zhang
Please find the updated patch that improves the comment added to the 
test in the generic part of the testsuite.


Thanks,
Yufeng

On 11/26/12 09:55, Marcus Shawcroft wrote:

On 21/11/12 14:31, Yufeng Zhang wrote:

Hi,

This patch updates the AArch64 port to mangle __va_list as it is in
namespace std in C++.  This is specified in the AArch64 AAPCS64 ABI doc.

OK for the trunk?

Thanks,
Yufeng

gcc/ChangeLog

2012-11-21  Yufeng Zhang

   * config/aarch64/aarch64.c (aarch64_mangle_type): New function.
   (TARGET_MANGLE_TYPE): Define.



The change to the AArch64 port itself is OK.


gcc/testsuite/ChangeLog

2012-11-21  Yufeng Zhang

   * g++.dg/abi/arm_va_list.C: Also test on aarch64*-*-*.



   // AAPCS \S 7.1.4 requires that va_list be a typedef for "struct
   // __va_list".  The mangling is as if it were "std::__va_list".
+// So is required for the AArch64 target.


The functional change in this test makes sense however the comment
change is slightly confusing.  The original comment refers to the
procedure call standard for AArch32: IHI0042D_aapcs.pdf.

The procedure call standard for AArch64 is defined in
IHI0055A_aapcs.pdf.  This document also discusses va_list in chapter
7.1.4.  Perhaps a different form of words distinguishes between the two
different PCS documents would be better? Perhaps something along these
lines:

// AAPCS \S 7.1.4 requires that va_list be a typedef for "struct
// __va_list".  The mangling is as if it were "std::__va_list".
// AArch64 PCS IHI0055A_aapcs64.pdf \S 7.1.4 requires that va_list
// be a typedef for "struct __va_list".  The mangling is as if it
// were "std::__va_list".

In any case I don't believe I can OK this change to the generic part of
the test suite. Suggest you CC Mike Stump or Janis Johnson.


Cheers

/Marcus

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d4708bf..42f3a40 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5853,6 +5853,20 @@ aarch64_preferred_simd_mode (enum machine_mode mode)
   return word_mode;
 }
 
+/* Implement TARGET_MANGLE_TYPE.  */
+
+const char *
+aarch64_mangle_type (const_tree type)
+{
+  /* The AArch64 ABI documents say that "__va_list" has to be
+ mangled as if it is in the "std" namespace.  */
+  if (lang_hooks.types_compatible_p (CONST_CAST_TREE (type), va_list_type))
+return "St9__va_list";
+
+  /* Use the default mangling.  */
+  return NULL;
+}
+
 /* Return the equivalent letter for size.  */
 static unsigned char
 sizetochar (int size)
@@ -6778,6 +6792,9 @@ aarch64_c_mode_for_suffix (char suffix)
 #undef TARGET_LIBGCC_CMP_RETURN_MODE
 #define TARGET_LIBGCC_CMP_RETURN_MODE aarch64_libgcc_cmp_return_mode
 
+#undef TARGET_MANGLE_TYPE
+#define TARGET_MANGLE_TYPE aarch64_mangle_type
+
 #undef TARGET_MEMORY_MOVE_COST
 #define TARGET_MEMORY_MOVE_COST aarch64_memory_move_cost
 
diff --git a/gcc/testsuite/g++.dg/abi/arm_va_list.C b/gcc/testsuite/g++.dg/abi/arm_va_list.C
index 45a426a..4f6f3a4 100644
--- a/gcc/testsuite/g++.dg/abi/arm_va_list.C
+++ b/gcc/testsuite/g++.dg/abi/arm_va_list.C
@@ -1,9 +1,10 @@
-// { dg-do compile }
+// { dg-do compile { target { aarch64*-*-* arm*-*-* } } }
 // { dg-options "-Wno-abi" }
-// { dg-require-effective-target arm_eabi }
+// { dg-require-effective-target arm_eabi { target arm*-*-* } }
 
 // AAPCS \S 7.1.4 requires that va_list be a typedef for "struct
 // __va_list".  The mangling is as if it were "std::__va_list".
+// AAPCS64 \S 7.1.4 has the same requirement for AArch64 targets.
 // #include 
 typedef __builtin_va_list va_list;
 

Re: [PATCH][AARCH64][PING] Fix the name mangling of AdvSIMD vector types

2012-12-05 Thread Yufeng Zhang

Ping~

On 22/11/12 16:49, Yufeng Zhang wrote:

Hi,

This patch implements the correct name mangling of AArch64 AdvSIMD
vector types in conformance to the AAPCS64 doc (Procedure Call Standard
for the ARM 64-bit Architecture, Appendix A).

OK for the trunk?

Thanks,
Yufeng

gcc/ChangeLog

2012-11-22  Yufeng Zhang

  * config/aarch64/aarch64.c (aarch64_simd_mangle_map_entry): New
  typedef.
  (aarch64_simd_mangle_map): New table.
  (aarch64_mangle_type): Locate and return the mangled name for
  a given AdvSIMD vector type.

gcc/testsuite/ChangeLog

2012-11-22  Yufeng Zhang

  * g++.dg/abi/mangle-neon-aarch64.C: New test.





[PATCH, AArch64] Skip aarch64*-*-* for g++.dg/cpp0x/alias-decl-debug-0.C

2013-05-28 Thread Yufeng Zhang

Hi,

The attached patch updates the g++ test 
g++.dg/cpp0x/alias-decl-debug-0.C to skip aarch64*-*-*, on which there 
is no support for STABS.


OK for the trunk?

Thanks,
Yufeng

gcc/testsuite/

* g++.dg/cpp0x/alias-decl-debug-0.C: Add aarch64*-*-* to the
dg-skip-if "No stabs".

diff --git a/gcc/testsuite/g++.dg/cpp0x/alias-decl-debug-0.C
b/gcc/testsuite/g++.dg/cpp0x/alias-decl-debug-0.C
index 6365528..a9aae37 100644
--- a/gcc/testsuite/g++.dg/cpp0x/alias-decl-debug-0.C
+++ b/gcc/testsuite/g++.dg/cpp0x/alias-decl-debug-0.C
@@ -1,5 +1,5 @@
 // Origin: PR c++/51032
-// { dg-skip-if "No stabs" { mmix-*-* *-*-aix* alpha*-*-* hppa*64*-*-* 
ia64-*-* *-*-vxworks* } { "*" } { "" } }
+// { dg-skip-if "No stabs" { aarch64*-*-* mmix-*-* *-*-aix* alpha*-*-* 
hppa*64*-*-* ia64-*-* *-*-vxworks* } { "*" } { "" } }
 // { dg-options "-std=c++0x -gstabs+" }
 
 template 

[Patch, AArch64] Adjust gcc.dg/torture/stackalign/builtin-apply-2.c

2013-06-17 Thread Yufeng Zhang

Hi,

This patch sets STACK_ARGUMENTS_SIZE to 0 for AArch64, as variadic 
arguments to 'bar' are passed in registers on this target.


OK for the trunk?

Thanks,
Yufeng

gcc/testsuite/

* gcc.dg/torture/stackalign/builtin-apply-2.c: Set
STACK_ARGUMENTS_SIZE to 0 if __aarch64__ is defined.

diff --git a/gcc/testsuite/gcc.dg/torture/stackalign/builtin-apply-2.c
b/gcc/testsuite/gcc.dg/torture/stackalign/builtin-apply-2.c
index cbb38ef..7982210 100644
--- a/gcc/testsuite/gcc.dg/torture/stackalign/builtin-apply-2.c
+++ b/gcc/testsuite/gcc.dg/torture/stackalign/builtin-apply-2.c
@@ -16,7 +16,7 @@
E, F and G are passed on stack.  So the size of the stack argument
data is 20.  */
 #define STACK_ARGUMENTS_SIZE  20
-#elif defined __MMIX__
+#elif defined __aarch64__ || defined __MMIX__
 /* No parameters on stack for bar.  */
 #define STACK_ARGUMENTS_SIZE 0
 #else

[PATCH, AArch64] Minor refactoring of aarch64_add_offset

2013-06-25 Thread Yufeng Zhang
This patch carries out minor refactoring of aarch64_add_offset; it 
replaces 'DImode' and 'Pmode' with 'mode'.


OK for the trunk?

Thanks,
Yufeng


gcc/

* config/aarch64/aarch64.c (aarch64_add_offset): Change to pass
'mode' to aarch64_plus_immediate and gen_rtx_PLUS.diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 527b00d..81c6fd9 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -804,7 +804,7 @@ aarch64_force_temporary (rtx x, rtx value)
 static rtx
 aarch64_add_offset (enum machine_mode mode, rtx temp, rtx reg, HOST_WIDE_INT offset)
 {
-  if (!aarch64_plus_immediate (GEN_INT (offset), DImode))
+  if (!aarch64_plus_immediate (GEN_INT (offset), mode))
 {
   rtx high;
   /* Load the full offset into a register.  This
@@ -812,7 +812,7 @@ aarch64_add_offset (enum machine_mode mode, rtx temp, rtx reg, HOST_WIDE_INT off
   high = GEN_INT (offset);
   offset = 0;
   high = aarch64_force_temporary (temp, high);
-  reg = aarch64_force_temporary (temp, gen_rtx_PLUS (Pmode, high, reg));
+  reg = aarch64_force_temporary (temp, gen_rtx_PLUS (mode, high, reg));
 }
   return plus_constant (mode, reg, offset);
 }

[PATCH, AArch64] Minor refactoring of aarch64_force_temporary

2013-06-25 Thread Yufeng Zhang
This patch adds a new parameter 'mode' of type 'enum machine_mode' 
to aarch64_force_temporary, and updates the calls to it.


OK for the trunk?

Thanks,
Yufeng

gcc/

* config/aarch64/aarch64.c (aarch64_force_temporary): Add an extra
parameter 'mode' of type 'enum machine_mode'; change to pass
'mode' to force_reg.
(aarch64_add_offset): Update calls to aarch64_force_temporary.
(aarch64_expand_mov_immediate): Likewise.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 81c6fd9..77591c1 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -789,10 +789,10 @@ aarch64_split_simd_move (rtx dst, rtx src)
 }
 
 static rtx
-aarch64_force_temporary (rtx x, rtx value)
+aarch64_force_temporary (enum machine_mode mode, rtx x, rtx value)
 {
   if (can_create_pseudo_p ())
-return force_reg (Pmode, value);
+return force_reg (mode, value);
   else
 {
   x = aarch64_emit_move (x, value);
@@ -811,8 +811,9 @@ aarch64_add_offset (enum machine_mode mode, rtx temp, rtx reg, HOST_WIDE_INT off
  might be improvable in the future.  */
   high = GEN_INT (offset);
   offset = 0;
-  high = aarch64_force_temporary (temp, high);
-  reg = aarch64_force_temporary (temp, gen_rtx_PLUS (mode, high, reg));
+  high = aarch64_force_temporary (mode, temp, high);
+  reg = aarch64_force_temporary (mode, temp,
+ gen_rtx_PLUS (mode, high, reg));
 }
   return plus_constant (mode, reg, offset);
 }
@@ -851,7 +852,7 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 	  && targetm.cannot_force_const_mem (mode, imm))
 	{
 	  gcc_assert(can_create_pseudo_p ());
-	  base = aarch64_force_temporary (dest, base);
+	  base = aarch64_force_temporary (mode, dest, base);
 	  base = aarch64_add_offset (mode, NULL, base, INTVAL (offset));
 	  aarch64_emit_move (dest, base);
 	  return;
@@ -868,7 +869,7 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 	  if (offset != const0_rtx)
 	{
 	  gcc_assert(can_create_pseudo_p ());
-	  base = aarch64_force_temporary (dest, base);
+	  base = aarch64_force_temporary (mode, dest, base);
 	  base = aarch64_add_offset (mode, NULL, base, INTVAL (offset));
 	  aarch64_emit_move (dest, base);
 	  return;

[Patch, AArch64, ILP32] 0/5 Add support for ILP32

2013-06-26 Thread Yufeng Zhang

Hi,

A set of five patches will be sent shortly as the gcc part of changes 
that add support for ILP32 in the AArch64 baremetal toolchain.


The five patches will be organized as the following:

1. Configury changes;

2. AArch64 backend changes that add necessary instruction patterns and 
update the backend macros and hooks to support ILP32;


3. Minor change to the generic part of the compiler to enable correct 
pass-by-reference parameter passing;


4. Changes to a number of tests for them to be ILP32-friendly;

5. Define _ILP32 and __ILP32__.

The patch set will enable the basic ILP32 support in the baremetal 
environment, with small absolute and small PIC as the supported 
addressing models.


Patches for binutils changes have been committed to the binutils trunk; 
they were previously posted here: 
http://sourceware.org/ml/binutils/2013-06/msg00176.html



Thanks,
Yufeng




[Patch, AArch64, ILP32] 1/5 Initial support - configury changes

2013-06-26 Thread Yufeng Zhang

This patch adds the configuration changes to the AArch64 GCC to support:

* -milp32 and -mlp64 options in the compiler and the driver
* multilib of ilp32 and/or lp64 libraries
* differentiation of basic types in the compiler backend

The patch enables the --with-multilib-list configuration option for 
specifying the list of library flavors to enable; the default value is 
"mlp64", and it can be overridden to "milp32" by --with-abi.


It also enables --with-abi for setting the default model in the 
compiler.  Its default value is "mlp64" unless --with-multilib-list is 
explicitly specified with "milp32", in which case it defaults to "milp32".
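
For example (hypothetical invocations, assuming a bare-metal build):

  .../configure --target=aarch64-none-elf --with-multilib-list=milp32,mlp64
  .../configure --target=aarch64-none-elf --with-abi=ilp32

The first builds both library flavors while lp64 remains the compiler 
default; the second makes ilp32 the default ABI and builds only the 
ilp32 multilib.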


In the backend, two target flags are introduced: TARGET_ILP32 and 
TARGET_LP64.  They are set by -milp32 and -mlp64 respectively, exclusive 
to each other.  The default setting is via the option variable 
aarch64_pmodel_flags, which defaults to TARGET_DEFAULT_PMODEL, which is 
further defined in biarchlp64.h or biarchilp32.h depending which header 
file is included.


                        biarchlp64.h       biarchilp32.h
TARGET_DEFAULT_PMODEL   OPTION_MASK_LP64   OPTION_MASK_ILP32
TARGET_PMODEL           1                  2

TARGET_ILP32 and TARGET_LP64 are implicitly defined as:

#define TARGET_ILP32 ((aarch64_pmodel_flags & OPTION_MASK_ILP32) != 0)
#define TARGET_LP64 ((aarch64_pmodel_flags & OPTION_MASK_LP64) != 0)

Note that the multilib support in the Linux toolchain is suppressed 
deliberately.


OK for the trunk?

Thanks,
Yufeng


gcc/
* config.gcc (aarch64*-*-*): Support --with-abi.
(aarch64*-*-elf): Support --with-multilib-list.
(aarch64*-*-linux*): Likewise.
(supported_defaults): Add abi to aarch64*-*-*.
* configure.ac: Mention AArch64 for --with-multilib-list.
* configure: Re-generated.
* config/aarch64/biarchilp32.h: New file.
* config/aarch64/biarchlp64.h: New file.
* config/aarch64/aarch64-elf.h (SPEC_LP64): New define.
(SPEC_ILP32): Ditto.
(ASM_SPEC): Update to SPEC_LP64 and SPEC_ILP32.
(MULTILIB_DEFAULTS): New define.
* config/aarch64/aarch64-elf-raw.h (EMUL_SUFFIX): New define.
(LINK_SPEC): Change to depend on SPEC_LP64 and SPEC_ILP32 and also
to use EMUL_SUFFIX.
* config/aarch64/aarch64.h (LONG_TYPE_SIZE): Change to depend on
TARGET_ILP32.
(POINTER_SIZE): New define.
(POINTERS_EXTEND_UNSIGNED): Ditto.
* config/aarch64/aarch64.c (initialize_aarch64_programming_model):
New declaration and definition.
(aarch64_override_options): Call the new function.
* config/aarch64/aarch64.opt (aarch64_pmodel_flags): New.
(milp32, mlp64): New.
* config/aarch64/t-aarch64 (comma): New define.
(MULTILIB_OPTIONS): Ditto.
(MULTILIB_DIRNAMES): Ditto.
* config/aarch64/t-aarch64-linux (MULTIARCH_DIRNAME): New define.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 0ad7217..c8af44e 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -497,6 +497,26 @@ then
 fi
 
 case ${target} in
+aarch64*-*-*)
+	case ${with_abi} in
+	"")
+		if test "x$with_multilib_list" = xmilp32; then
+			tm_file="aarch64/biarchilp32.h ${tm_file}"
+		else
+			tm_file="aarch64/biarchlp64.h ${tm_file}"
+		fi
+		;;
+	lp64 | mlp64)
+		tm_file="aarch64/biarchlp64.h ${tm_file}"
+		;;
+	ilp32 | milp32)
+		tm_file="aarch64/biarchilp32.h ${tm_file}"
+		;;
+	*)
+		echo "Unknown ABI used in --with-abi=$with_abi"
+		exit 1
+	esac
+	;;
 i[34567]86-*-*)
 	if test "x$with_abi" != x; then
 		echo "This target does not support --with-abi."
@@ -827,6 +847,32 @@ aarch64*-*-elf)
 		tm_defines="${tm_defines} TARGET_BIG_ENDIAN_DEFAULT=1"
 		;;
 	esac
+	aarch64_multilibs="${with_multilib_list}"
+	if test "$aarch64_multilibs" = "default"; then
+		case ${with_abi} in
+		ilp32 | milp32)
+			aarch64_multilibs="milp32"
+			;;
+		*)
+			# TODO: Change to build both flavours by default when
+			# the ILP32 support is mature enough.
+			# aarch64_multilibs="mlp64,milp32"
+			aarch64_multilibs="mlp64"
+			;;
+		esac
+	fi
+	aarch64_multilibs=`echo $aarch64_multilibs | sed -e 's/,/ /g'`
+	for aarch64_multilib in ${aarch64_multilibs}; do
+		case ${aarch64_multilib} in
+		milp32 | mlp64 )
+			TM_MULTILIB_CONFIG="${TM_MULTILIB_CONFIG},${aarch64_multilib}"
+			;;
+		*)
+			echo "--with-multilib-list=${aarch64_multilib} not supported."
+			exit 1
+		esac
+	done
+	TM_MULTILIB_CONFIG=`echo $TM_MULTILIB_CONFIG | sed 's/^,//'`
 	;;
 aarch64*-*-linux*)
 	tm_file="${tm_file} dbxelf.h elfos.h gnu-user.h linux.h glibc-stdint.h"
@@ -837,6 +883,32 @@ aarch64*-*-linux*)
 		tm_defines="${tm_defines} TARGET_BIG_ENDIAN_DEFAULT=1"
 		;;
 	esac
+	aarch64_multilibs="${with_multilib_list}"
+	if test "$aarch64_multilibs" = "default"; then
+		case ${with_abi} in
+		ilp32 | milp32)
+			aarch64_multilibs="milp32"
+			;;
+		*)
+			# TODO: Change to build both flavours by default when
+			# the ILP32 support is mature enough.
+			# aa

[Patch, AArch64, ILP32] 2/5 More backend changes and support for small absolute and small PIC addressing models

2013-06-26 Thread Yufeng Zhang
This patch updates the AArch64 backend to support the small absolute and 
small PIC addressing models for ILP32; it also updates a number of other 
backend macros and hooks in order to support ILP32.


OK for the trunk?

Thanks,
Yufeng


gcc/

* config/aarch64/aarch64.c (POINTER_BYTES): New define.
(aarch64_load_symref_appropriately): In the case of
SYMBOL_SMALL_ABSOLUTE, use the mode of 'dest' instead of Pmode
to generate new rtx; likewise to the case of SYMBOL_SMALL_GOT.
(aarch64_expand_mov_immediate): In the case of SYMBOL_FORCE_TO_MEM,
change to pass 'ptr_mode' to force_const_mem and zero-extend 'mem'
if 'mode' doesn't equal 'ptr_mode'.
(aarch64_output_mi_thunk): Add an assertion on the alignment of
'vcall_offset'; change to call aarch64_emit_move differently depending
on whether 'Pmode' equals 'ptr_mode' or not; use 'POINTER_BYTES'
to calculate the upper bound of 'vcall_offset'.
(aarch64_cannot_force_const_mem): Change to also return true if
mode != ptr_mode.
(aarch64_legitimize_reload_address): In the case of large
displacements, add new local variable 'xmode' and an assertion
based on it; change to use 'xmode' to generate the new rtx and
reload.
(aarch64_asm_trampoline_template): Change to generate the template
differently depending on TARGET_ILP32 or not; change to use
'POINTER_BYTES' in the argument passed to assemble_aligned_integer.
(aarch64_trampoline_size): Removed.
(aarch64_trampoline_init): Add new local constant 'tramp_code_sz'
and replace immediate literals with it.  Change to use 'ptr_mode'
instead of 'DImode' and call convert_memory_address if the mode
of 'fnaddr' doesn't equal 'ptr_mode'.
(aarch64_elf_asm_constructor): Change to use assemble_aligned_integer
to output symbol.
(aarch64_elf_asm_destructor): Likewise.
* config/aarch64/aarch64.h (TRAMPOLINE_SIZE): Change to be dependent
on TARGET_ILP32 instead of aarch64_trampoline_size.
* config/aarch64/aarch64.md (movsi_aarch64): Add new alternatives
of 'mov' between WSP and W registers as well as 'adr' and 'adrp'.
(loadwb_pair_<GPI:mode>_<PTR:mode>): Rename to ...
(loadwb_pair_<GPI:mode>_<P:mode>): ... this.  Replace PTR with P.
(storewb_pair_<GPI:mode>_<PTR:mode>): Likewise; rename to ...
(storewb_pair_<GPI:mode>_<P:mode>): ... this.
(add_losym): Change to 'define_expand' and call gen_add_losym_<mode>
depending on the value of 'mode'.
(add_losym_<mode>): New.
(ldr_got_small_<mode>): New, based on ldr_got_small.
(ldr_got_small): Remove.
(ldr_got_small_sidi): New.
* config/aarch64/iterators.md (P): New.
(PTR): Change to 'ptr_mode' in the condition.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c16d55f..1117515 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -46,6 +46,9 @@
 #include "optabs.h"
 #include "dwarf2.h"
 
+/* Defined for convenience.  */
+#define POINTER_BYTES (POINTER_SIZE / BITS_PER_UNIT)
+
 /* Classifies an address.
 
ADDRESS_REG_IMM
@@ -519,13 +522,16 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
 {
 case SYMBOL_SMALL_ABSOLUTE:
   {
+	/* In ILP32, the mode of dest can be either SImode or DImode.  */
 	rtx tmp_reg = dest;
+	enum machine_mode mode = GET_MODE (dest);
+
+	gcc_assert (mode == Pmode || mode == ptr_mode);
+
 	if (can_create_pseudo_p ())
-	  {
-	tmp_reg =  gen_reg_rtx (Pmode);
-	  }
+	  tmp_reg = gen_reg_rtx (mode);
 
-	emit_move_insn (tmp_reg, gen_rtx_HIGH (Pmode, imm));
+	emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, imm));
 	emit_insn (gen_add_losym (dest, tmp_reg, imm));
 	return;
   }
@@ -536,11 +542,33 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
 
 case SYMBOL_SMALL_GOT:
   {
+	/* In ILP32, the mode of dest can be either SImode or DImode,
+	   while the got entry is always of SImode size.  The mode of
+	   dest depends on how dest is used: if dest is assigned to a
+	   pointer (e.g. in the memory), it has SImode; it may have
+	   DImode if dest is dereferenced to access the memory.
+	   This is why we have to handle three different ldr_got_small
+	   patterns here (two patterns for ILP32).  */
 	rtx tmp_reg = dest;
+	enum machine_mode mode = GET_MODE (dest);
+
 	if (can_create_pseudo_p ())
-	  tmp_reg =  gen_reg_rtx (Pmode);
-	emit_move_insn (tmp_reg, gen_rtx_HIGH (Pmode, imm));
-	emit_insn (gen_ldr_got_small (dest, tmp_reg, imm));
+	  tmp_reg = gen_reg_rtx (mode);
+
+	emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, imm));
+	if (mode == ptr_mode)
+	  {
+	if (mode == DImode)
+	  emit_insn (gen_ldr_got_small_di (dest, tmp_reg, imm));
+	else
+	  emit_insn (gen_ldr_got_small_si (dest, tmp_reg, imm));
+	  }
+	else
+	  {
+	gcc_assert (mode == Pmode);
+	emit_insn (gen_ldr_got_small_sidi (dest, tmp_reg, imm));
+	

[Patch, AArch64, ILP32] 3/5 Minor change in function.c:assign_parm_find_data_types()

2013-06-26 Thread Yufeng Zhang
This patch updates assign_parm_find_data_types to assign passed_mode and 
nominal_mode with the mode of the built pointer type instead of the 
hard-coded Pmode in the case of pass-by-reference.  This is in line with 
the assignment to passed_mode and nominal_mode in other cases inside the 
function.


assign_parm_find_data_types generally uses TYPE_MODE to calculate 
passed_mode and nominal_mode:


  /* Find mode of arg as it is passed, and mode of arg as it should be
 during execution of this function.  */
  passed_mode = TYPE_MODE (passed_type);
  nominal_mode = TYPE_MODE (nominal_type);

this includes the case when the passed argument is a pointer by itself.

However there is a discrepancy when it deals with argument passed by 
invisible reference; it builds the argument's corresponding pointer 
type, but sets passed_mode and nominal_mode with Pmode directly.


This is OK for targets where Pmode == ptr_mode, but on AArch64 with 
ILP32 they are different with Pmode as DImode and ptr_mode as SImode. 
When such a reference is passed on stack, the reference is prepared by 
the caller in the lower 4 bytes of an 8-byte slot but is fetched by the 
callee as an 8-byte datum, of which the higher 4 bytes may contain junk. 
 It is probably the combination of Pmode != ptr_mode and the particular 
ABI specification that make the AArch64 ILP32 the first target on which 
the issue manifests itself.
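
Schematically (a little-endian sketch; the ldr forms are the ones shown
in the follow-up discussion below):

  8-byte stack slot holding a 32-bit reference:

                  sp+0 .. sp+3    sp+4 .. sp+7
  caller writes   [ 32-bit ref ]  [ junk       ]
  callee (bug)    ldr x0, [sp]    ; fetches all 8 bytes, junk included
  callee (fixed)  ldr w0, [sp]    ; fetches only the 32-bit reference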


Bootstrapped on x86_64-none-linux-gnu.

OK for the trunk?

Thanks,
Yufeng


gcc/
* function.c (assign_parm_find_data_types): Set passed_mode and
nominal_mode to the TYPE_MODE of nominal_type for the built
pointer type in case of the struct-pass-by-reference.

diff --git a/gcc/function.c b/gcc/function.c
index 3e33fc7..6a0aaaf 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -2369,7 +2369,7 @@ assign_parm_find_data_types (struct assign_parm_data_all *all, tree parm,
 {
   passed_type = nominal_type = build_pointer_type (passed_type);
   data->passed_pointer = true;
-  passed_mode = nominal_mode = Pmode;
+  passed_mode = nominal_mode = TYPE_MODE (nominal_type);
 }
 
   /* Find mode as it is passed by the ABI.  */

[Patch, AArch64, ILP32] 4/5 Change tests to be ILP32-friendly

2013-06-26 Thread Yufeng Zhang

The attached patch fixes a few gcc test cases.


Thanks,
Yufeng


gcc/testsuite/

* gcc.dg/20020219-1.c: Skip the test on aarch64*-*-* in ilp32.
* gcc.target/aarch64/aapcs64/test_18.c (struct y): Change the field
type from long to long long.
* gcc.target/aarch64/atomic-op-long.c: Update dg-final directives
to have effective-target keywords of lp64 and ilp32.
* gcc.target/aarch64/fcvt_double_int.c: Likewise.
* gcc.target/aarch64/fcvt_double_long.c: Likewise.
* gcc.target/aarch64/fcvt_double_uint.c: Likewise.
* gcc.target/aarch64/fcvt_double_ulong.c: Likewise.
* gcc.target/aarch64/fcvt_float_int.c: Likewise.
* gcc.target/aarch64/fcvt_float_long.c: Likewise.
* gcc.target/aarch64/fcvt_float_uint.c: Likewise.
* gcc.target/aarch64/fcvt_float_ulong.c: Likewise.
* gcc.target/aarch64/vect_smlal_1.c: Replace 'long' with 'long long'.
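
The reason for the 'long' to 'long long' changes, as a minimal
illustration (the compile-time size checks below are mine, not part of
the patch):

/* Under ILP32, int, long and pointers are all 32 bits wide, so a test
   that needs a 64-bit integer type must spell it 'long long'.  */
typedef char long_is_32_bits_in_ilp32 [sizeof (long) == 4 ? 1 : -1];
typedef char long_long_is_64_bits [sizeof (long long) == 8 ? 1 : -1];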

diff --git a/gcc/testsuite/gcc.dg/20020219-1.c b/gcc/testsuite/gcc.dg/20020219-1.c
index ffdf19a..d2ba755 100644
--- a/gcc/testsuite/gcc.dg/20020219-1.c
+++ b/gcc/testsuite/gcc.dg/20020219-1.c
@@ -13,6 +13,7 @@
 /* { dg-do run } */
 /* { dg-options "-O2" } */
 /* { dg-options "-O2 -mdisable-indexing" { target hppa*-*-hpux* } } */
+/* { dg-skip-if "" { aarch64*-*-* && ilp32 } { "*" } { "" } } */
 /* { dg-skip-if "" { "ia64-*-hpux*" } "*" "-mlp64" } */
 /* { dg-skip-if "" { { i?86-*-* x86_64-*-* } && x32 } { "*" } { "" } } */
 
diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/test_18.c b/gcc/testsuite/gcc.target/aarch64/aapcs64/test_18.c
index b611e9b..2ebecee 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/test_18.c
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/test_18.c
@@ -9,10 +9,10 @@
 
 struct y
 {
-  long p;
-  long q;
-  long r;
-  long s;
+  long long p;
+  long long q;
+  long long r;
+  long long s;
 } v = { 1, 2, 3, 4 };
 
 struct z
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-long.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-long.c
index 9468ef4..0672d48 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic-op-long.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-long.c
@@ -39,5 +39,7 @@ atomic_fetch_or_RELAXED (long a)
   return __atomic_fetch_or (&v, a, __ATOMIC_RELAXED);
 }
 
-/* { dg-final { scan-assembler-times "ldxr\tx\[0-9\]+, \\\[x\[0-9\]+\\\]" 6 } } */
-/* { dg-final { scan-assembler-times "stxr\tw\[0-9\]+, x\[0-9\]+, \\\[x\[0-9\]+\\\]" 6 } } */
+/* { dg-final { scan-assembler-times "ldxr\tx\[0-9\]+, \\\[x\[0-9\]+\\\]" 6 {target lp64} } } */
+/* { dg-final { scan-assembler-times "ldxr\tw\[0-9\]+, \\\[x\[0-9\]+\\\]" 6 {target ilp32} } } */
+/* { dg-final { scan-assembler-times "stxr\tw\[0-9\]+, x\[0-9\]+, \\\[x\[0-9\]+\\\]" 6 {target lp64} } } */
+/* { dg-final { scan-assembler-times "stxr\tw\[0-9\]+, w\[0-9\]+, \\\[x\[0-9\]+\\\]" 6 {target ilp32} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/fcvt_double_int.c b/gcc/testsuite/gcc.target/aarch64/fcvt_double_int.c
index 697aab1..e539909 100644
--- a/gcc/testsuite/gcc.target/aarch64/fcvt_double_int.c
+++ b/gcc/testsuite/gcc.target/aarch64/fcvt_double_int.c
@@ -8,8 +8,10 @@
 #include "fcvt.x"
 
 /* { dg-final { scan-assembler-times "fcvtzs\tw\[0-9\]+, *d\[0-9\]" 2 } } */
-/* { dg-final { scan-assembler-times "fcvtps\tx\[0-9\]+, *d\[0-9\]" 1 } } */
-/* { dg-final { scan-assembler-times "fcvtps\tw\[0-9\]+, *d\[0-9\]" 2 } } */
-/* { dg-final { scan-assembler-times "fcvtms\tx\[0-9\]+, *d\[0-9\]" 1 } } */
-/* { dg-final { scan-assembler-times "fcvtms\tw\[0-9\]+, *d\[0-9\]" 2 } } */
+/* { dg-final { scan-assembler-times "fcvtps\tx\[0-9\]+, *d\[0-9\]" 1 {target lp64} } } */
+/* { dg-final { scan-assembler-times "fcvtps\tw\[0-9\]+, *d\[0-9\]" 2 {target lp64} } } */
+/* { dg-final { scan-assembler-times "fcvtps\tw\[0-9\]+, *d\[0-9\]" 3 {target ilp32} } } */
+/* { dg-final { scan-assembler-times "fcvtms\tx\[0-9\]+, *d\[0-9\]" 1 {target lp64} } } */
+/* { dg-final { scan-assembler-times "fcvtms\tw\[0-9\]+, *d\[0-9\]" 2 {target lp64} } } */
+/* { dg-final { scan-assembler-times "fcvtms\tw\[0-9\]+, *d\[0-9\]" 3 {target ilp32} } } */
 /* { dg-final { scan-assembler-times "fcvtas\tw\[0-9\]+, *d\[0-9\]" 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/fcvt_double_long.c b/gcc/testsuite/gcc.target/aarch64/fcvt_double_long.c
index edf640b..5eb36ff 100644
--- a/gcc/testsuite/gcc.target/aarch64/fcvt_double_long.c
+++ b/gcc/testsuite/gcc.target/aarch64/fcvt_double_long.c
@@ -7,7 +7,11 @@
 
 #include "fcvt.x"
 
-/* { dg-final { scan-assembler-times "fcvtzs\tx\[0-9\]+, *d\[0-9\]" 2 } } */
-/* { dg-final { scan-assembler-times "fcvtps\tx\[0-9\]+, *d\[0-9\]" 3 } } */
-/* { dg-final { scan-assembler-times "fcvtms\tx\[0-9\]+, *d\[0-9\]" 3 } } */
-/* { dg-final { scan-assembler-times "fcvtas\tx\[0-9\]+, *d\[0-9\]" 2 } } */
+/* { dg-final { scan-assembler-times "fcvtzs\tx\[0-9\]+, *d\[0-9\]" 2 {target lp64} } } */
+/* { dg-final { scan-assembler-times "fcvtzs\tw\[0-9\]+, *d\[0-9\]" 2 {target ilp32} } 

[Patch, AArch64, ILP32] 5/5 Define _ILP32 and __ILP32__

2013-06-26 Thread Yufeng Zhang
This patch defines _ILP32 and __ILP32__ for the AArch64 port when the 
ILP32 ABI is in use.


This helps libraries, e.g. libgloss and glibc, recognize which model is 
being compiled.
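
A minimal usage sketch (hypothetical; the typedef is only an example of
the kind of decision a library might make with these macros):

#if defined (__ILP32__)
typedef unsigned long long library_uint64;  /* 'long' is 32-bit here.  */
#else
typedef unsigned long library_uint64;       /* LP64: 'long' is 64-bit.  */
#endif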


OK for the trunk?

Thanks,
Yufeng


gcc/
* config/aarch64/aarch64.h (TARGET_CPU_CPP_BUILTINS): Define _ILP32
and __ILP32__ when the ILP32 model is in use.

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index d468dd8..e5dadb3 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -49,6 +49,11 @@
 	break;	\
 	}		\
 			\
+  if (TARGET_ILP32)	\
+	{		\
+	  cpp_define (parse_in, "_ILP32");		\
+	  cpp_define (parse_in, "__ILP32__");		\
+	}		\
 } while (0)
 
 

Re: [Patch, AArch64, ILP32] 3/5 Minor change in function.c:assign_parm_find_data_types()

2013-06-26 Thread Yufeng Zhang

On 06/27/13 00:04, Andrew Pinski wrote:

On Wed, Jun 26, 2013 at 3:39 PM, Yufeng Zhang  wrote:

This patch updates assign_parm_find_data_types to assign passed_mode and
nominal_mode with the mode of the built pointer type instead of the
hard-coded Pmode in the case of pass-by-reference.  This is in line with the
assignment to passed_mode and nominal_mode in other cases inside the
function.

assign_parm_find_data_types generally uses TYPE_MODE to calculate
passed_mode and nominal_mode:

   /* Find mode of arg as it is passed, and mode of arg as it should be
  during execution of this function.  */
   passed_mode = TYPE_MODE (passed_type);
   nominal_mode = TYPE_MODE (nominal_type);

this includes the case when the passed argument is a pointer by itself.

However there is a discrepancy when it deals with argument passed by
invisible reference; it builds the argument's corresponding pointer type,
but sets passed_mode and nominal_mode with Pmode directly.

This is OK for targets where Pmode == ptr_mode, but on AArch64 with ILP32
they are different with Pmode as DImode and ptr_mode as SImode. When such a
reference is passed on stack, the reference is prepared by the caller in the
lower 4 bytes of an 8-byte slot but is fetched by the callee as an 8-byte
datum, of which the higher 4 bytes may contain junk.  It is probably the
combination of Pmode != ptr_mode and the particular ABI specification that
make the AArch64 ILP32 the first target on which the issue manifests itself.

Bootstrapped on x86_64-none-linux-gnu.

OK for the trunk?



IA64-hpux also uses Pmode != ptr_mode, can you provide the testcase
which fails without this change?
I used a powerpc64 target where Pmode != ptr_mode which did not hit
this bug either.


The issue was first observed in one of the compat tests, which passes a 
large number of non-small structures.  The following is a trimmed-down 
reproducible code snippet (not runnable as-is, but it should be easy to 
make runnable):


struct s5
{
  double a;
  double b;
  double c;
  double d;
  double e;
} gS;

double foo (struct s5 p1, struct s5 p2,struct s5 p3,struct s5 p4,struct 
s5 p5,struct s5 p6,struct s5 p7,struct s5 p8, struct s5 p9)

{
  return p9.c;
}
--- CUT ---

The code-gen (-O2) without the patch is:

.text
.align  2
.global foo
.type   foo, %function
foo:
ldr x0, [sp]<<=== here!
ldr d0, [x0,16]
ret
.size   foo, .-foo

Where the arrow points is the load of the pointer to 'p9' that is passed 
on stack.  The instruction really should be ldr w0, [sp], i.e. the 
pointer mode is SImode rather than DImode.


It needs a number of conditions for the issue to manifest:

1. pass-by-reference; on aarch64 one example is a struct that is larger 
than 16 bytes.
2. the reference is passed on stack; on aarch64, this usually only 
happens after registers x0 - x7 are used.
3. the size of stack slot for passing pointer is larger than the pointer 
size; on aarch64, it is 8-byte vs. 4-byte
4. the unused part of the stack slot is not zeroed out, i.e. undefined 
by the ABI

5. in the runtime, the unused part of such a stack slot contains junk.

The runtime segmentation fault may only be generated when all the above 
conditions are met.  I'm not familiar with IA64-hpux or powerpc64 
procedure call ABIs, but I guess those targets are just being lucky?


Thanks,
Yufeng



Re: [Patch, AArch64, ILP32] 3/5 Minor change in function.c:assign_parm_find_data_types()

2013-06-26 Thread Yufeng Zhang

On 06/27/13 00:51, Andrew Pinski wrote:

On Wed, Jun 26, 2013 at 4:41 PM, Yufeng Zhang  wrote:

On 06/27/13 00:04, Andrew Pinski wrote:


On Wed, Jun 26, 2013 at 3:39 PM, Yufeng Zhang
wrote:


This patch updates assign_parm_find_data_types to assign passed_mode and
nominal_mode with the mode of the built pointer type instead of the
hard-coded Pmode in the case of pass-by-reference.  This is in line with
the
assignment to passed_mode and nominal_mode in other cases inside the
function.

assign_parm_find_data_types generally uses TYPE_MODE to calculate
passed_mode and nominal_mode:

/* Find mode of arg as it is passed, and mode of arg as it should be
   during execution of this function.  */
passed_mode = TYPE_MODE (passed_type);
nominal_mode = TYPE_MODE (nominal_type);

this includes the case when the passed argument is a pointer by itself.

However there is a discrepancy when it deals with argument passed by
invisible reference; it builds the argument's corresponding pointer type,
but sets passed_mode and nominal_mode with Pmode directly.

This is OK for targets where Pmode == ptr_mode, but on AArch64 with ILP32
they are different with Pmode as DImode and ptr_mode as SImode. When such
a
reference is passed on stack, the reference is prepared by the caller in
the
lower 4 bytes of an 8-byte slot but is fetched by the callee as an 8-byte
datum, of which the higher 4 bytes may contain junk.  It is probably the
combination of Pmode != ptr_mode and the particular ABI specification
that
make the AArch64 ILP32 the first target on which the issue manifests
itself.

Bootstrapped on x86_64-none-linux-gnu.

OK for the trunk?




IA64-hpux also uses Pmode != ptr_mode, can you provide the testcase
which fails without this change?
I used a powerpc64 target where Pmode != ptr_mode which did not hit
this bug either.



The issue was first observed in one of the compat tests, which passes a
large number of non-small structures.  The following is a trimmed-down
reproducible code snippet (not runnable as-is, but it should be easy to
make runnable):

struct s5
{
   double a;
   double b;
   double c;
   double d;
   double e;
} gS;

double foo (struct s5 p1, struct s5 p2,struct s5 p3,struct s5 p4,struct s5
p5,struct s5 p6,struct s5 p7,struct s5 p8, struct s5 p9)
{
   return p9.c;
}
--- CUT ---

The code-gen (-O2) without the patch is:

 .text
 .align  2
 .global foo
 .type   foo, %function
foo:
 ldr x0, [sp]<<=== here!
 ldr d0, [x0,16]
 ret
 .size   foo, .-foo

Where the arrow points is the load of the pointer to 'p9' that is passed on
stack.  The instruction really should be ldr w0, [sp], i.e. the pointer mode
is SImode rather than DImode.

It needs a number of conditions for the issue to manifest:

1. pass-by-reference; on aarch64 one example is a struct that is larger than
16 bytes.
2. the reference is passed on stack; on aarch64, this usually only happens
after registers x0 - x7 are used.
3. the size of stack slot for passing pointer is larger than the pointer
size; on aarch64, it is 8-byte vs. 4-byte
4. the unused part of the stack slot is not zeroed out, i.e. undefined by
the ABI


This is the real issue.  I think it is better if we change the ABI to
say they are zero'd.  It really makes things like this a mess.


I don't agree with this.  There is nothing wrong with leaving the unused 
bits filled with unspecified values; AArch64 has a sufficient number of 
load/store instruction variants to load/store smaller-sized data from/to 
memory, while zeroing the unused bits on the stack may require extra 
instructions, which adds cost.


Nevertheless, assign_parm_find_data_types() should not generate different 
modes for a pass-by-reference argument and a straightforward pointer one. 
For instance, in the following code snippet, the passed_mode and 
nominal_mode for '&p9' in foo and 'p9' in bar should be the same; but 
'&p9' in foo gets DImode and 'p9' in bar gets SImode, which is really 
wrong (by '&p9', I mean the pass-by-reference argument).


struct s5
{
  double a;
  double b;
  double c;
  double d;
  double e;
} gS;

double foo (struct s5 p1, struct s5 p2,struct s5 p3,struct s5 p4,struct 
s5 p5,struct s5 p6,struct s5 p7,struct s5 p8, struct s5 p9)

{
  return p9.c;
}

double bar (struct s5 *p1, struct s5 *p2,struct s5 *p3,struct s5 
*p4,struct s5 *p5,struct s5 *p6,struct s5 *p7,struct s5 *p8, struct s5 *p9)

{
  return p9->c;
}

Hope I have demonstrated the issue clearly.

Thanks,
Yufeng



Re: [Patch, AArch64, ILP32] 3/5 Minor change in function.c:assign_parm_find_data_types()

2013-06-26 Thread Yufeng Zhang

On 06/27/13 00:57, Andrew Pinski wrote:

On Wed, Jun 26, 2013 at 4:51 PM, Andrew Pinski  wrote:

On Wed, Jun 26, 2013 at 4:41 PM, Yufeng Zhang  wrote:

On 06/27/13 00:04, Andrew Pinski wrote:


On Wed, Jun 26, 2013 at 3:39 PM, Yufeng Zhang
wrote:


This patch updates assign_parm_find_data_types to assign passed_mode and
nominal_mode with the mode of the built pointer type instead of the
hard-coded Pmode in the case of pass-by-reference.  This is in line with
the
assignment to passed_mode and nominal_mode in other cases inside the
function.

assign_parm_find_data_types generally uses TYPE_MODE to calculate
passed_mode and nominal_mode:

/* Find mode of arg as it is passed, and mode of arg as it should be
   during execution of this function.  */
passed_mode = TYPE_MODE (passed_type);
nominal_mode = TYPE_MODE (nominal_type);

this includes the case when the passed argument is a pointer by itself.

However there is a discrepancy when it deals with argument passed by
invisible reference; it builds the argument's corresponding pointer type,
but sets passed_mode and nominal_mode with Pmode directly.

This is OK for targets where Pmode == ptr_mode, but on AArch64 with ILP32
they are different with Pmode as DImode and ptr_mode as SImode. When such
a
reference is passed on stack, the reference is prepared by the caller in
the
lower 4 bytes of an 8-byte slot but is fetched by the callee as an 8-byte
datum, of which the higher 4 bytes may contain junk.  It is probably the
combination of Pmode != ptr_mode and the particular ABI specification
that
make the AArch64 ILP32 the first target on which the issue manifests
itself.

Bootstrapped on x86_64-none-linux-gnu.

OK for the trunk?




IA64-hpux also uses Pmode != ptr_mode, can you provide the testcase
which fails without this change?
I used a powerpc64 target where Pmode != ptr_mode which did not hit
this bug either.



The issue was first observed in one of the compat tests, which passes a
large number of non-small structures.  The following is a trimmed-down
reproducible code snippet (not runnable as-is, but it should be easy to
make runnable):

struct s5
{
   double a;
   double b;
   double c;
   double d;
   double e;
} gS;

double foo (struct s5 p1, struct s5 p2,struct s5 p3,struct s5 p4,struct s5
p5,struct s5 p6,struct s5 p7,struct s5 p8, struct s5 p9)
{
   return p9.c;
}
--- CUT ---

The code-gen (-O2) without the patch is:

 .text
 .align  2
 .global foo
 .type   foo, %function
foo:
 ldr x0, [sp]<<=== here!
 ldr d0, [x0,16]
 ret
 .size   foo, .-foo

Where the arrow points is the load of the pointer to 'p9' that is passed on
stack.  The instruction really should be ldr w0, [sp], i.e. the pointer mode
is SImode rather than DImode.

It needs a number of conditions for the issue to manifest:

1. pass-by-reference; on aarch64 one example is a struct that is larger than
16 bytes.
2. the reference is passed on stack; on aarch64, this usually only happens
after registers x0 - x7 are used.
3. the size of stack slot for passing pointer is larger than the pointer
size; on aarch64, it is 8-byte vs. 4-byte
4. the unused part of the stack slot is not zeroed out, i.e. undefined by
the ABI


This is the real issue.  I think it is better if we change the ABI to
say they are zero'd.  It really makes things like this a mess.


5. in the runtime, the unused part of such a stack slot contains junk.

The runtime segmentation fault may only be generated when all the above
conditions are met.  I'm not familiar with IA64-hpux or powerpc64 procedure
call ABIs, but I guess those targets are just being lucky?


Or rather their ABIs all say that values less than 8 bytes wide are
zero- or sign-extended.


One more thing: it looks like your change will not work correctly for
big-endian ILP32 AArch64 either, as the least significant word is
offset by 4.


Ah, this is definitely a bug, and I can confirm that it only happens with 
32-bit pointer-typed parameters passed on the stack.  I'll bring up a 
patch today to fix it.



Did you test big-endian ILP32 AARCH64?


I started big-endian testing fairly recently, so only limited testing 
has been done so far, but I am still working on it and will prepare 
patches if more issues are found.


Thanks,
Yufeng



[Patch, AArch64, ILP32] Pad pointer-typed stack argument downward in ILP32

2013-06-27 Thread Yufeng Zhang
This patch fixes the bug that a pointer-typed argument passed on the 
stack is not padded properly in ILP32.
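
Schematically (a big-endian ILP32 sketch of one 8-byte stack slot):

                   sp+0 .. sp+3    sp+4 .. sp+7
  padded upward    [ 32-bit ptr ]  [ padding    ]   <- old, wrong layout
  padded downward  [ padding    ]  [ 32-bit ptr ]   <- correct: LSB at the
                                                       highest byte address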


OK for the trunk?

Thanks,
Yufeng



gcc/

* config/aarch64/aarch64.c (aarch64_pad_arg_upward): In big-endian,
pad pointer-typed argument downward.

gcc/testsuite/

* gcc.target/aarch64/test-ptr-arg-on-stack-1.c: New test.


diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index f78e0d6..79f8761 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1585,11 +1585,12 @@ aarch64_pad_arg_upward (enum machine_mode mode, 
const_tree type)
   if (!BYTES_BIG_ENDIAN)
 return true;
 
-  /* Otherwise, integral types and floating point types are padded downward:
+  /* Otherwise, integral, floating-point and pointer types are padded downward:
  the least significant byte of a stack argument is passed at the highest
  byte address of the stack slot.  */
   if (type
-  ? (INTEGRAL_TYPE_P (type) || SCALAR_FLOAT_TYPE_P (type))
+  ? (INTEGRAL_TYPE_P (type) || SCALAR_FLOAT_TYPE_P (type)
+|| POINTER_TYPE_P (type))
   : (SCALAR_INT_MODE_P (mode) || SCALAR_FLOAT_MODE_P (mode)))
 return false;
 
diff --git a/gcc/testsuite/gcc.target/aarch64/test-ptr-arg-on-stack-1.c 
b/gcc/testsuite/gcc.target/aarch64/test-ptr-arg-on-stack-1.c
new file mode 100644
index 000..bb68e0a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/test-ptr-arg-on-stack-1.c
@@ -0,0 +1,39 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-inline" } */
+
+/* Test pass-by-reference and pointer-typed argument passing on stack.
+   This test shall pass on any of the following four combinations:
+{big-endian, little-endian} {LP64, ILP32}.  */
+
+struct s5
+{
+  double a;
+  double b;
+  double c;
+  double d;
+  double e;
+} gS = {1.0, 2.0, 3.0, 4.0, 5.0};
+
+double  __attribute__ ((noinline))
+foo (struct s5 p1, struct s5 p2, struct s5 p3, struct s5 p4,
+ struct s5 p5, struct s5 p6, struct s5 p7, struct s5 p8,
+ struct s5 p9)
+{
+  asm ("");
+  return p9.c;
+}
+
+void abort (void);
+int printf (const char *, ...);
+
+int main (void)
+{
+  printf ("Here we print out some values and more importantly hope that"
+ " the stack is getting a bit dirty for the bug to manifest itself"
+ "\n\t%f, %f, %f, %f, %f\n", gS.a, gS.b, gS.c, gS.d, gS.e);
+
+  if (foo (gS, gS, gS, gS, gS, gS, gS, gS, gS) != 3.0)
+abort ();
+
+  return 0;
+}

Re: [Patch, AArch64, ILP32] 5/5 Define _ILP32 and __ILP32__

2013-06-27 Thread Yufeng Zhang

On 06/27/13 01:56, Joseph S. Myers wrote:

On Wed, 26 Jun 2013, Yufeng Zhang wrote:


This patch defines _ILP32 and __ILP32__ for the AArch64 port when the ILP32
ABI is in use.

This helps libraries, e.g. libgloss and glibc, recognize which model is being
compiled.


GCC already defines _LP64 and __LP64__ in architecture-independent code
for LP64 systems.  Libraries can use those to distinguish the two models
for AArch64, so I don't see any need to add architecture-specific macros
with the opposite sense.


We need a reliable way to tell we are compiling for ILP32.  On one hand 
LLP64 support may be added in the future; on the other hand, not all 
AArch64 compilers may define _LP64 and __LP64__.


Other ports like x86_64, ia64-hpux and pa-hpux also define one or both.

Thanks,
Yufeng




Re: [Patch, AArch64, ILP32] 1/5 Initial support - configury changes

2013-06-28 Thread Yufeng Zhang

Hi Andrew,

Thank you for your review.  I'm currently testing an updated patch and 
will send it for further review early next week.


Regards,
Yufeng


On 06/26/13 23:59, Andrew Pinski wrote:

On Wed, Jun 26, 2013 at 3:33 PM, Yufeng Zhang  wrote:

This patch adds the configuration changes to the AArch64 GCC to support:

* -milp32 and -mlp64 options in the compiler and the driver
* multilib of ilp32 and/or lp64 libraries
* differentiation of basic types in the compiler backend

The patch enables --with-multilib-list configuration option for specifying
the list of library flavors to enable; the default value is "mlp64" and can
be overridden by --with-abi to "milp32".

It also enables --with-abi for setting the default model in the compiler.
Its default value is "mlp64" unless --with-multilib-list is explicitly
specified with "milp32", in which case it defaults to "milp32".

In the backend, two target flags are introduced: TARGET_ILP32 and
TARGET_LP64.  They are set by -milp32 and -mlp64 respectively, exclusive to
each other.  The default setting is via the option variable
aarch64_pmodel_flags, which defaults to TARGET_DEFAULT_PMODEL, which is
further defined in biarchlp64.h or biarchilp32.h depending on which header file
is included.

                       biarchlp64.h      biarchilp32.h
TARGET_DEFAULT_PMODEL  OPTION_MASK_LP64  OPTION_MASK_ILP32
TARGET_PMODEL          1                 2

TARGET_ILP32 and TARGET_LP64 are implicitly defined as:

#define TARGET_ILP32 ((aarch64_pmodel_flags & OPTION_MASK_ILP32) != 0)
#define TARGET_LP64 ((aarch64_pmodel_flags & OPTION_MASK_LP64) != 0)

Note that the multilib support in the Linux toolchain is suppressed
deliberately.

OK for the trunk?



I think you should not support --with-multilib-list at all.  It should
just include ilp32 multilib no matter what.  Note the linux multilib
has to wait until the glibc/kernel side is done.

Also:
+#if TARGET_BIG_ENDIAN_DEFAULT == 1
+#define EMUL_SUFFIX "b"
+#else
+#define EMUL_SUFFIX ""
+#endif

is broken when you supply the opposite endian option.

Also you really should just use -mabi=ilp32 and -mabi=lp64 which
reduces the number of changes needed to be done to config.gcc.

You should use DRIVER_SELF_SPECS to simplify your LINKS_SPECS.
Something like:
#ifdef TARGET_BIG_ENDIAN_DEFAULT
#define ENDIAN_SPEC "-mbig-endian"
#else
#define ENDIAN_SPEC "-mlittle-endian"
#endif
/* Force the default endianness and ABI flags onto the command line
in order to make the other specs easier to write.  */
#undef DRIVER_SELF_SPECS
#define DRIVER_SELF_SPECS \
   " %{!mbig-endian:%{!mlittle-endian:" ENDIAN_SPEC "}}" \
   " %{!milp32:%{!mlp64:-mlp64}}"

or rather:
" %{!mabi=*: -mabi=lp64}"



And then in aarch64-elf-raw.h:
#ifndef LINK_SPEC
#define LINK_SPEC "%{mbig-endian:-EB} %{mlittle-endian:-EL} -X \
-maarch64elf%{milp32:32}%{mbig-endian:b}"
#endif

Or using the -mabi=* way:
#ifndef LINK_SPEC
#define LINK_SPEC "%{mbig-endian:-EB} %{mlittle-endian:-EL} -X \
-maarch64elf%{mabi=ilp32:32}%{mbig-endian:b}"
#endif



Thanks,
Andrew Pinski




Thanks,
Yufeng


gcc/
 * config.gcc (aarch64*-*-*): Support --with-abi.
 (aarch64*-*-elf): Support --with-multilib-list.
 (aarch64*-*-linux*): Likewise.
 (supported_defaults): Add abi to aarch64*-*-*.
 * configure.ac: Mention AArch64 for --with-multilib-list.
 * configure: Re-generated.
 * config/aarch64/biarchilp32.h: New file.
 * config/aarch64/biarchlp64.h: New file.
 * config/aarch64/aarch64-elf.h (SPEC_LP64): New define.
 (SPEC_ILP32): Ditto.
 (ASM_SPEC): Update to SPEC_LP64 and SPEC_ILP32.
 (MULTILIB_DEFAULTS): New define.
 * config/aarch64/aarch64-elf-raw.h (EMUL_SUFFIX): New define.
 (LINK_SPEC): Change to depend on SPEC_LP64 and SPEC_ILP32 and also
 to use EMUL_SUFFIX.
 * config/aarch64/aarch64.h (LONG_TYPE_SIZE): Change to depend on
 TARGET_ILP32.
 (POINTER_SIZE): New define.
 (POINTERS_EXTEND_UNSIGNED): Ditto.
 * config/aarch64/aarch64.c (initialize_aarch64_programming_model):
 New declaration and definition.
 (aarch64_override_options): Call the new function.
 * config/aarch64/aarch64.opt (aarch64_pmodel_flags): New.
 (milp32, mlp64): New.
 * config/aarch64/t-aarch64 (comma): New define.
 (MULTILIB_OPTIONS): Ditto.
 (MULTILIB_DIRNAMES): Ditto.
 * config/aarch64/t-aarch64-linux (MULTIARCH_DIRNAME): New define.








Re: [Patch, AArch64, ILP32] 5/5 Define _ILP32 and __ILP32__

2013-06-28 Thread Yufeng Zhang

On 06/27/13 20:28, Joseph S. Myers wrote:

On Thu, 27 Jun 2013, Yufeng Zhang wrote:


We need a reliable way to tell we are compiling for ILP32.  On one hand LLP64
support may be added in the future; on the other hand, not all AArch64


If thinking of adding a third ABI, that suggests you should be using
something along the lines of _MIPS_SIM - a macro that's always defined,
with an integer value depending on the ABI in use.


compilers may define _LP64 and __LP64__.


Why should all such compilers define the ILP32 macros, but not all define
the LP64 macros?  Do you have an AArch64 equivalent of the ACLE that
specifies such things?


There will be an ACLE for AArch64, and it is envisaged that both macros 
will be specified.


Since a few other ports that support ILP32 already define these macros, 
defining them for AArch64 as well should help with code porting.



Other ports like x86_64, ia64-hpux and pa-hpux also define one or both.


If multiple ports define something, that might be an indication for
defining it in target-independent code (like _LP64) rather than repeating
it for more targets.


I can propose a patch later to have _ILP32 and __ILP32__ defined in the 
target-independent code instead and see if the community likes it.


Thanks,
Yufeng



[PATCH, AArch64] Remove unused types and variables for abi types

2013-07-02 Thread Yufeng Zhang
This patch removes unused types and variables that were meant to handle 
ABI types in the AArch64 port.


OK for the trunk?

Thanks,
Yufeng

gcc/

* config/aarch64/aarch64.h (enum arm_abi_type): Remove.
(ARM_ABI_AAPCS64): Ditto.
(arm_abi): Ditto.
(ARM_DEFAULT_ABI): Ditto.diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index a08797b..7bdb1e2 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -521,12 +521,6 @@ typedef struct GTY (()) machine_function
 #endif
 
 
-/* Which ABI to use.  */
-enum arm_abi_type
-{
-  ARM_ABI_AAPCS64
-};
-
 enum arm_pcs
 {
   ARM_PCS_AAPCS64, /* Base standard AAPCS for 64 bit.  */
@@ -534,11 +528,7 @@ enum arm_pcs
 };
 
 
-extern enum arm_abi_type arm_abi;
 extern enum arm_pcs arm_pcs_variant;
-#ifndef ARM_DEFAULT_ABI
-#define ARM_DEFAULT_ABI ARM_ABI_AAPCS64
-#endif
 
 #ifndef ARM_DEFAULT_PCS
 #define ARM_DEFAULT_PCS ARM_PCS_AAPCS64

Re: [Patch, AArch64, ILP32] 1/5 Initial support - configury changes

2013-07-02 Thread Yufeng Zhang

Hi Andrew,

Please find the updated patch in the attachment that addresses your 
comments.


It now builds both ilp32 and lp64 multilibs by default; the 
--with-multilib-list support remains, providing options to turn off 
one of them.


-mabi=ilp32 and -mabi=lp64 are now the command line options to use.  The 
SPECs have been updated as well.
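
For example (hypothetical compile commands):

  aarch64-none-elf-gcc -mabi=ilp32 -c foo.c
  aarch64-none-elf-gcc -mabi=lp64 -c foo.c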


Thanks,
Yufeng


gcc/
* config.gcc (aarch64*-*-*): Support --with-abi.
(aarch64*-*-elf): Support --with-multilib-list.
(aarch64*-*-linux*): Likewise.
(supported_defaults): Add abi to aarch64*-*-*.
* configure.ac: Mention AArch64 for --with-multilib-list.
* configure: Re-generated.
* config/aarch64/biarchilp32.h: New file.
* config/aarch64/biarchlp64.h: New file.
* config/aarch64/aarch64-elf.h (ENDIAN_SPEC): New define.
(ABI_SPEC): Ditto.
(MULTILIB_DEFAULTS): Ditto.
(DRIVER_SELF_SPECS): Ditto.
(ASM_SPEC): Update to also substitute -mabi.
* config/aarch64/aarch64-elf-raw.h (LINK_SPEC): Add linker script
file whose name depends on -mabi= and -mbig-endian.
* config/aarch64/aarch64.h (LONG_TYPE_SIZE): Change to depend on
TARGET_ILP32.
(POINTER_SIZE): New define.
(POINTERS_EXTEND_UNSIGNED): Ditto.
(enum aarch64_abi_type): New enumeration tag.
(AARCH64_ABI_LP64, AARCH64_ABI_ILP32): New enumerators.
(AARCH64_ABI_DEFAULT): Define to AARCH64_ABI_LP64 if undefined.
(TARGET_ILP32): New define.
* config/aarch64/aarch64.opt (mabi): New.
(aarch64_abi): New.
(ilp32, lp64): New values for -mabi.
* config/aarch64/t-aarch64 (comma): New define.
(MULTILIB_OPTIONS): Ditto.
(MULTILIB_DIRNAMES): Ditto.
* config/aarch64/t-aarch64-linux (MULTIARCH_DIRNAME): New define.
* doc/invoke.texi: Document -mabi for AArch64.



On 06/26/13 23:59, Andrew Pinski wrote:

On Wed, Jun 26, 2013 at 3:33 PM, Yufeng Zhang  wrote:

This patch adds the configuration changes to the AArch64 GCC to support:

* -milp32 and -mlp64 options in the compiler and the driver
* multilib of ilp32 and/or lp64 libraries
* differentiation of basic types in the compiler backend

The patch enables --with-multilib-list configuration option for specifying
the list of library flavors to enable; the default value is "mlp64" and can
be overridden by --with-abi to "milp32".

It also enables --with-abi for setting the default model in the compiler.
Its default value is "mlp64" unless --with-multilib-list is explicitly
specified with "milp32", in which case it defaults to "milp32".

In the backend, two target flags are introduced: TARGET_ILP32 and
TARGET_LP64.  They are set by -milp32 and -mlp64 respectively, exclusive to
each other.  The default setting is via the option variable
aarch64_pmodel_flags, which defaults to TARGET_DEFAULT_PMODEL, which is
further defined in biarchlp64.h or biarchilp32.h depending on which header file
is included.

                       biarchlp64.h      biarchilp32.h
TARGET_DEFAULT_PMODEL  OPTION_MASK_LP64  OPTION_MASK_ILP32
TARGET_PMODEL          1                 2

TARGET_ILP32 and TARGET_LP64 are implicitly defined as:

#define TARGET_ILP32 ((aarch64_pmodel_flags & OPTION_MASK_ILP32) != 0)
#define TARGET_LP64 ((aarch64_pmodel_flags & OPTION_MASK_LP64) != 0)

Note that the multilib support in the Linux toolchain is suppressed
deliberately.

OK for the trunk?



I think you should not support --with-multilib-list at all.  It should
just include ilp32 multilib no matter what.  Note the linux multilib
has to wait until the glibc/kernel side is done.

Also:
+#if TARGET_BIG_ENDIAN_DEFAULT == 1
+#define EMUL_SUFFIX "b"
+#else
+#define EMUL_SUFFIX ""
+#endif

is broken when you supply the opposite endian option.

Also you really should just use -mabi=ilp32 and -mabi=lp64 which
reduces the number of changes needed to be done to config.gcc.

You should use DRIVER_SELF_SPECS to simplify your LINKS_SPECS.
Something like:
#ifdef TARGET_BIG_ENDIAN_DEFAULT
#define ENDIAN_SPEC "-mbig-endian"
#else
#define ENDIAN_SPEC "-mlittle-endian"
#endif
/* Force the default endianness and ABI flags onto the command line
in order to make the other specs easier to write.  */
#undef DRIVER_SELF_SPECS
#define DRIVER_SELF_SPECS \
   " %{!mbig-endian:%{!mlittle-endian:" ENDIAN_SPEC "}}" \
   " %{!milp32:%{!mlp64:-mlp64}}"

or rather:
" %{!mabi=*: -mabi=lp64}"



And then in aarch64-elf-raw.h:
#ifndef LINK_SPEC
#define LINK_SPEC "%{mbig-endian:-EB} %{mlittle-endian:-EL} -X \
-maarch64elf%{milp32:32}%{mbig-endian:b}"
#endif

Or using the -mabi=* way:
#ifndef LINK_SPEC
#define LINK_SPEC "%{mbig-endian:-EB} %{mlittle-endian:-EL} -X \
-maarch64elf%{mabi=ilp32:32}%{mbig-endian:b}"
#endif



Thanks,
Andrew Pinski

Re: [Ping] [Patch, AArch64, ILP32] 3/5 Minor change in function.c:assign_parm_find_data_types()

2013-07-02 Thread Yufeng Zhang

Ping~

Can I get an OK please if there is no objection?

Regards,
Yufeng

On 06/26/13 23:39, Yufeng Zhang wrote:

This patch updates assign_parm_find_data_types to assign passed_mode and
nominal_mode with the mode of the built pointer type instead of the
hard-coded Pmode in the case of pass-by-reference.  This is in line with
the assignment to passed_mode and nominal_mode in other cases inside the
function.

assign_parm_find_data_types generally uses TYPE_MODE to calculate
passed_mode and nominal_mode:

/* Find mode of arg as it is passed, and mode of arg as it should be
   during execution of this function.  */
passed_mode = TYPE_MODE (passed_type);
nominal_mode = TYPE_MODE (nominal_type);

this includes the case when the passed argument is a pointer by itself.

However there is a discrepancy when it deals with argument passed by
invisible reference; it builds the argument's corresponding pointer
type, but sets passed_mode and nominal_mode with Pmode directly.

This is OK for targets where Pmode == ptr_mode, but on AArch64 with
ILP32 they are different with Pmode as DImode and ptr_mode as SImode.
When such a reference is passed on stack, the reference is prepared by
the caller in the lower 4 bytes of an 8-byte slot but is fetched by the
callee as an 8-byte datum, of which the higher 4 bytes may contain junk.
   It is probably the combination of Pmode != ptr_mode and the particular
ABI specification that make the AArch64 ILP32 the first target on which
the issue manifests itself.

Bootstrapped on x86_64-none-linux-gnu.

OK for the trunk?

Thanks,
Yufeng


gcc/
* function.c (assign_parm_find_data_types): Set passed_mode and
nominal_mode to the TYPE_MODE of nominal_type for the built
pointer type in case of the struct-pass-by-reference.





[PATCH, AArch64] Add support for "wsp" register

2013-07-04 Thread Yufeng Zhang

Hi,

This patch adds support for the register "wsp"; in ILP32, this is 
necessary in order to support a global register variable associated with 
the stack pointer via the syntax asm ("wsp"); this idiom is used in 
libgloss to get the stack pointer.
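
A minimal usage sketch (hypothetical, modeled on the libgloss idiom
mentioned above; the variable name is only illustrative):

/* Bind a global register variable to the 32-bit view of the stack
   pointer under ILP32 (GNU C extension).  */
register char *stack_ptr asm ("wsp");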


OK for the trunk?

Thanks,
Yufeng


gcc/

* config/aarch64/aarch64.c (aarch64_hard_regno_mode_ok): Also 
return

true for SP_REGNUM if mode == ptr_mode.
* config/aarch64/aarch64.h (ADDITIONAL_REGISTER_NAMES): Add "wsp"
with value R0_REGNUM + 31.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 7049651..46c11bc 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -358,8 +358,13 @@ aarch64_hard_regno_mode_ok (unsigned regno, enum machine_mode mode)
   if (GET_MODE_CLASS (mode) == MODE_CC)
 return regno == CC_REGNUM;
 
-  if (regno == SP_REGNUM || regno == FRAME_POINTER_REGNUM
-  || regno == ARG_POINTER_REGNUM)
+  if (regno == SP_REGNUM)
+/* The purpose of comparing with ptr_mode is to support the
+   global register variable associated with the stack pointer
+   register via the syntax of asm ("wsp") in ILP32.  */
+return mode == Pmode || mode == ptr_mode;
+
+  if (regno == FRAME_POINTER_REGNUM || regno == ARG_POINTER_REGNUM)
 return mode == Pmode;
 
   if (GP_REGNUM_P (regno) && ! aarch64_vect_struct_mode_p (mode))
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index e1fa413..0924269 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -278,7 +278,7 @@ extern unsigned long aarch64_tune_flags;
 R_ALIASES(16), R_ALIASES(17), R_ALIASES(18), R_ALIASES(19), \
 R_ALIASES(20), R_ALIASES(21), R_ALIASES(22), R_ALIASES(23), \
 R_ALIASES(24), R_ALIASES(25), R_ALIASES(26), R_ALIASES(27), \
-R_ALIASES(28), R_ALIASES(29), R_ALIASES(30), /* 31 omitted  */ \
+R_ALIASES(28), R_ALIASES(29), R_ALIASES(30), {"wsp", R0_REGNUM + 31}, \
 V_ALIASES(0),  V_ALIASES(1),  V_ALIASES(2),  V_ALIASES(3),  \
 V_ALIASES(4),  V_ALIASES(5),  V_ALIASES(6),  V_ALIASES(7),  \
 V_ALIASES(8),  V_ALIASES(9),  V_ALIASES(10), V_ALIASES(11), \

Re: [Ping^2] [Patch, AArch64, ILP32] 3/5 Minor change in function.c:assign_parm_find_data_types()

2013-07-08 Thread Yufeng Zhang

Ping^2~

Thanks,
Yufeng


On 07/02/13 23:44, Yufeng Zhang wrote:

Ping~

Can I get an OK please if there is no objection?

Regards,
Yufeng

On 06/26/13 23:39, Yufeng Zhang wrote:

This patch updates assign_parm_find_data_types to assign passed_mode and
nominal_mode with the mode of the built pointer type instead of the
hard-coded Pmode in the case of pass-by-reference.  This is in line with
the assignment to passed_mode and nominal_mode in other cases inside the
function.

assign_parm_find_data_types generally uses TYPE_MODE to calculate
passed_mode and nominal_mode:

 /* Find mode of arg as it is passed, and mode of arg as it should be
during execution of this function.  */
 passed_mode = TYPE_MODE (passed_type);
 nominal_mode = TYPE_MODE (nominal_type);

this includes the case when the passed argument is a pointer by itself.

However there is a discrepancy when it deals with argument passed by
invisible reference; it builds the argument's corresponding pointer
type, but sets passed_mode and nominal_mode with Pmode directly.

This is OK for targets where Pmode == ptr_mode, but on AArch64 with
ILP32 they are different with Pmode as DImode and ptr_mode as SImode.
When such a reference is passed on stack, the reference is prepared by
the caller in the lower 4 bytes of an 8-byte slot but is fetched by the
callee as an 8-byte datum, of which the higher 4 bytes may contain junk.
It is probably the combination of Pmode != ptr_mode and the particular
ABI specification that make the AArch64 ILP32 the first target on which
the issue manifests itself.

Bootstrapped on x86_64-none-linux-gnu.

OK for the trunk?

Thanks,
Yufeng


gcc/
* function.c (assign_parm_find_data_types): Set passed_mode and
nominal_mode to the TYPE_MODE of nominal_type for the built
pointer type in case of the struct-pass-by-reference.









Re: [Ping] [Patch, AArch64, ILP32] 1/5 Initial support - configury changes

2013-07-18 Thread Yufeng Zhang

Ping~

Thanks,
Yufeng

On 07/02/13 19:53, Yufeng Zhang wrote:

Hi Andrew,

Please find the updated patch in the attachment that addresses your
comments.

It now builds both ilp32 and lp64 multilibs by default; the
--with-multilib-list support remains, providing options to turn off
one of them.

-mabi=ilp32 and -mabi=lp64 are now the command line options to use.  The
SPECs have been updated as well.

Thanks,
Yufeng


gcc/
  * config.gcc (aarch64*-*-*): Support --with-abi.
  (aarch64*-*-elf): Support --with-multilib-list.
  (aarch64*-*-linux*): Likewise.
  (supported_defaults): Add abi to aarch64*-*-*.
  * configure.ac: Mention AArch64 for --with-multilib-list.
  * configure: Re-generated.
  * config/aarch64/biarchilp32.h: New file.
  * config/aarch64/biarchlp64.h: New file.
  * config/aarch64/aarch64-elf.h (ENDIAN_SPEC): New define.
  (ABI_SPEC): Ditto.
  (MULTILIB_DEFAULTS): Ditto.
  (DRIVER_SELF_SPECS): Ditto.
  (ASM_SPEC): Update to also substitute -mabi.
  * config/aarch64/aarch64-elf-raw.h (LINK_SPEC): Add linker script
  file whose name depends on -mabi= and -mbig-endian.
  * config/aarch64/aarch64.h (LONG_TYPE_SIZE): Change to depend on
  TARGET_ILP32.
  (POINTER_SIZE): New define.
  (POINTERS_EXTEND_UNSIGNED): Ditto.
  (enum aarch64_abi_type): New enumeration tag.
  (AARCH64_ABI_LP64, AARCH64_ABI_ILP32): New enumerators.
  (AARCH64_ABI_DEFAULT): Define to AARCH64_ABI_LP64 if undefined.
  (TARGET_ILP32): New define.
  * config/aarch64/aarch64.opt (mabi): New.
  (aarch64_abi): New.
  (ilp32, lp64): New values for -mabi.
  * config/aarch64/t-aarch64 (comma): New define.
  (MULTILIB_OPTIONS): Ditto.
  (MULTILIB_DIRNAMES): Ditto.
  * config/aarch64/t-aarch64-linux (MULTIARCH_DIRNAME): New define.
  * doc/invoke.texi: Document -mabi for AArch64.



On 06/26/13 23:59, Andrew Pinski wrote:

On Wed, Jun 26, 2013 at 3:33 PM, Yufeng Zhang   wrote:

This patch adds the configuration changes to the AArch64 GCC to support:

* -milp32 and -mlp64 options in the compiler and the driver
* multilib of ilp32 and/or lp64 libraries
* differentiation of basic types in the compiler backend

The patch enables --with-multilib-list configuration option for specifying
the list of library flavors to enable; the default value is "mlp64" and can
be overridden by --with-abi to "milp32".

It also enables --with-abi for setting the default model in the compiler.
Its default value is "mlp64" unless --with-multilib-list is explicitly
specified with "milp32", in which case it defaults to "milp32".

In the backend, two target flags are introduced: TARGET_ILP32 and
TARGET_LP64.  They are set by -milp32 and -mlp64 respectively, exclusive to
each other.  The default setting is via the option variable
aarch64_pmodel_flags, which defaults to TARGET_DEFAULT_PMODEL, which is
further defined in biarchlp64.h or biarchilp32.h depending on which header file
is included.

                       biarchlp64.h      biarchilp32.h
TARGET_DEFAULT_PMODEL  OPTION_MASK_LP64  OPTION_MASK_ILP32
TARGET_PMODEL          1                 2

TARGET_ILP32 and TARGET_LP64 are implicitly defined as:

#define TARGET_ILP32 ((aarch64_pmodel_flags & OPTION_MASK_ILP32) != 0)
#define TARGET_LP64 ((aarch64_pmodel_flags & OPTION_MASK_LP64) != 0)

Note that the multilib support in the Linux toolchain is suppressed
deliberately.

OK for the trunk?



I think you should not support --with-multilib-list at all.  It should
just include ilp32 multilib no matter what.  Note the linux multilib
has to wait until the glibc/kernel side is done.

Also:
+#if TARGET_BIG_ENDIAN_DEFAULT == 1
+#define EMUL_SUFFIX "b"
+#else
+#define EMUL_SUFFIX ""
+#endif

is broken when you supply the opposite endian option.

Also you really should just use -mabi=ilp32 and -mabi=lp64 which
reduces the number of changes needed to be done to config.gcc.

You should use DRIVER_SELF_SPECS to simplify your LINKS_SPECS.
Something like:
#ifdef TARGET_BIG_ENDIAN_DEFAULT
#define ENDIAN_SPEC "-mbig-endian"
#else
#define ENDIAN_SPEC "-mlittle-endian"
#endif
/* Force the default endianness and ABI flags onto the command line
 in order to make the other specs easier to write.  */
#undef DRIVER_SELF_SPECS
#define DRIVER_SELF_SPECS \
" %{!mbig-endian:%{!mlittle-endian:" ENDIAN_SPEC "}}" \
" %{!milp32:%{!mlp64:-mlp64}}"

or rather:
" %{!mabi=*: -mabi=lp64}"



And then in aarch64-elf-raw.h:
#ifndef LINK_SPEC
#define LINK_SPEC "%{mbig-endian:-EB} %{mlittle-endian:-EL} -X \
-maarch64elf%{milp32:32}%{mbig-endian:b}"
#endif

Or using the -mabi=* way:
#ifndef LINK_SPEC
#define LINK_SPEC "%{mbig-endian:-EB} %{mlittle-endian:-EL} -X \
-maarch64elf%{mabi=ilp32:32}%{mbig-endian:b}"
#endif



Thanks,
Andrew Pinski





Re: [Ping] [Patch, AArch64, ILP32] 2/5 More backend changes and support for small absolute and small PIC addressing models

2013-07-18 Thread Yufeng Zhang

Ping~

Thanks,
Yufeng

On 06/26/13 23:35, Yufeng Zhang wrote:

This patch updates the AArch64 backend to support the small absolute and
small PIC addressing models for ILP32; it also updates a number of other
backend macros and hooks in order to support ILP32.

OK for the trunk?

Thanks,
Yufeng


gcc/

  * config/aarch64/aarch64.c (POINTER_BYTES): New define.
  (aarch64_load_symref_appropriately): In the case of
  SYMBOL_SMALL_ABSOLUTE, use the mode of 'dest' instead of Pmode
  to generate new rtx; likewise to the case of SYMBOL_SMALL_GOT.
  (aarch64_expand_mov_immediate): In the case of SYMBOL_FORCE_TO_MEM,
  change to pass 'ptr_mode' to force_const_mem and zero-extend 'mem'
  if 'mode' doesn't equal 'ptr_mode'.
  (aarch64_output_mi_thunk): Add an assertion on the alignment of
  'vcall_offset'; change to call aarch64_emit_move differently depending
  on whether 'Pmode' equals 'ptr_mode' or not; use 'POINTER_BYTES'
  to calculate the upper bound of 'vcall_offset'.
  (aarch64_cannot_force_const_mem): Change to also return true if
  mode != ptr_mode.
  (aarch64_legitimize_reload_address): In the case of large
  displacements, add new local variable 'xmode' and an assertion
  based on it; change to use 'xmode' to generate the new rtx and
  reload.
  (aarch64_asm_trampoline_template): Change to generate the template
  differently depending on TARGET_ILP32 or not; change to use
  'POINTER_BYTES' in the argument passed to assemble_aligned_integer.
  (aarch64_trampoline_size): Removed.
  (aarch64_trampoline_init): Add new local constant 'tramp_code_sz'
  and replace immediate literals with it.  Change to use 'ptr_mode'
  instead of 'DImode' and call convert_memory_address if the mode
  of 'fnaddr' doesn't equal 'ptr_mode'.
  (aarch64_elf_asm_constructor): Change to use assemble_aligned_integer
  to output symbol.
  (aarch64_elf_asm_destructor): Likewise.
  * config/aarch64/aarch64.h (TRAMPOLINE_SIZE): Change to be dependent
  on TARGET_ILP32 instead of aarch64_trampoline_size.
  * config/aarch64/aarch64.md (movsi_aarch64): Add new alternatives
  of 'mov' between WSP and W registers as well as 'adr' and 'adrp'.
  (loadwb_pair_<GPI:mode>_<PTR:mode>): Rename to ...
  (loadwb_pair_<GPI:mode>_<P:mode>): ... this.  Replace PTR with P.
  (storewb_pair_<GPI:mode>_<PTR:mode>): Likewise; rename to ...
  (storewb_pair_<GPI:mode>_<P:mode>): ... this.
  (add_losym): Change to 'define_expand' and call gen_add_losym_<mode>
  depending on the value of 'mode'.
  (add_losym_<mode>): New.
  (ldr_got_small_<mode>): New, based on ldr_got_small.
  (ldr_got_small): Remove.
  (ldr_got_small_sidi): New.
  * config/aarch64/iterators.md (P): New.
  (PTR): Change to 'ptr_mode' in the condition.





Re: [Ping^3] [Patch, AArch64, ILP32] 3/5 Minor change in function.c:assign_parm_find_data_types()

2013-07-18 Thread Yufeng Zhang

Ping^3~

Thanks,
Yufeng

On 07/08/13 11:11, Yufeng Zhang wrote:

Ping^2~

Thanks,
Yufeng


On 07/02/13 23:44, Yufeng Zhang wrote:

Ping~

Can I get an OK please if there is no objection?

Regards,
Yufeng

On 06/26/13 23:39, Yufeng Zhang wrote:

This patch updates assign_parm_find_data_types to set passed_mode and
nominal_mode to the mode of the built pointer type instead of the
hard-coded Pmode in the case of pass-by-reference.  This is in line
with how passed_mode and nominal_mode are assigned in the other cases
inside the function.

assign_parm_find_data_types generally uses TYPE_MODE to calculate
passed_mode and nominal_mode:

  /* Find mode of arg as it is passed, and mode of arg as it should be
 during execution of this function.  */
  passed_mode = TYPE_MODE (passed_type);
  nominal_mode = TYPE_MODE (nominal_type);

this includes the case where the passed argument is itself a pointer.

However, there is a discrepancy in how it deals with an argument passed
by invisible reference; it builds the argument's corresponding pointer
type, but sets passed_mode and nominal_mode to Pmode directly.
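
In other words, the change amounts to something like the following
sketch (modulo the exact surrounding code in
assign_parm_find_data_types):

  /* See if this arg was passed by invisible reference.  */
  if (pass_by_reference (&all->args_so_far_v, passed_mode,
			 passed_type, data->named_arg))
    {
      passed_type = nominal_type = build_pointer_type (passed_type);
      data->passed_pointer = true;
      /* Previously: passed_mode = nominal_mode = Pmode;  */
      passed_mode = nominal_mode = TYPE_MODE (nominal_type);
    }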

This is OK for targets where Pmode == ptr_mode, but on AArch64 with
ILP32 they differ: Pmode is DImode while ptr_mode is SImode.  When such
a reference is passed on the stack, the caller prepares it in the lower
4 bytes of an 8-byte slot, but the callee fetches it as an 8-byte
datum, of which the upper 4 bytes may contain junk.  It is probably the
combination of Pmode != ptr_mode and the particular ABI specification
that makes AArch64 ILP32 the first target on which the issue manifests
itself.
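
To make it concrete, consider a hypothetical example along these lines
(illustrative only, not taken from the patch):

/* 'struct big' is larger than 16 bytes, so the AAPCS64 passes it by
   invisible reference.  With x0-x7 consumed by a0-a7, the reference
   to 'b' is passed on the stack.  */
struct big { long long x[4]; };

long long
f (long long a0, long long a1, long long a2, long long a3,
   long long a4, long long a5, long long a6, long long a7,
   struct big b)
{
  /* Under -mabi=ilp32 the caller stores a 4-byte pointer into an
     8-byte stack slot; if the callee fetches the slot as an 8-byte
     Pmode datum, the upper half may be junk.  */
  return b.x[0];
}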

Bootstrapped on x86_64-none-linux-gnu.

OK for the trunk?

Thanks,
Yufeng


gcc/
* function.c (assign_parm_find_data_types): Set passed_mode and
nominal_mode to the TYPE_MODE of nominal_type for the built
pointer type in the case of struct pass-by-reference.




Re: [Ping] [Patch, AArch64, ILP32] 4/5 Change tests to be ILP32-friendly

2013-07-18 Thread Yufeng Zhang

Ping~

Thanks,
Yufeng

On 06/26/13 23:41, Yufeng Zhang wrote:

The attached patch updates a few GCC test cases to be ILP32-friendly.


Thanks,
Yufeng


gcc/testsuite/

* gcc.dg/20020219-1.c: Skip the test on aarch64*-*-* in ilp32.
* gcc.target/aarch64/aapcs64/test_18.c (struct y): Change the field
type from long to long long.
* gcc.target/aarch64/atomic-op-long.c: Update dg-final directives
to have effective-target keywords of lp64 and ilp32.
* gcc.target/aarch64/fcvt_double_int.c: Likewise.
* gcc.target/aarch64/fcvt_double_long.c: Likewise.
* gcc.target/aarch64/fcvt_double_uint.c: Likewise.
* gcc.target/aarch64/fcvt_double_ulong.c: Likewise.
* gcc.target/aarch64/fcvt_float_int.c: Likewise.
* gcc.target/aarch64/fcvt_float_long.c: Likewise.
* gcc.target/aarch64/fcvt_float_uint.c: Likewise.
* gcc.target/aarch64/fcvt_float_ulong.c: Likewise.
* gcc.target/aarch64/vect_smlal_1.c: Replace 'long' with 'long long'.





Re: [Ping] [Patch, AArch64, ILP32] 5/5 Define _ILP32 and __ILP32__

2013-07-18 Thread Yufeng Zhang

Ping~

Thanks,
Yufeng

On 06/26/13 23:42, Yufeng Zhang wrote:

This patch defines _ILP32 and __ILP32__ for the AArch64 port when the
ILP32 ABI is in use.

This helps libraries, e.g. libgloss and glibc, recognize which data
model is being compiled for.
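
For example, a library header could then select types like this
(illustrative; 'reg64_t' is a made-up name):

#if defined (__aarch64__) && defined (__ILP32__)
/* Under ILP32, 'long' and pointers are 32-bit; use 'long long' where
   a 64-bit type is needed.  */
typedef unsigned long long reg64_t;
#else
typedef unsigned long reg64_t;
#endif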

OK for the trunk?

Thanks,
Yufeng


gcc/
* config/aarch64/aarch64.h (TARGET_CPU_CPP_BUILTINS): Define _ILP32
and __ILP32__ when the ILP32 model is in use.





Re: [Ping] [Patch, AArch64, ILP32] Pad pointer-typed stack argument downward in ILP32

2013-07-18 Thread Yufeng Zhang

Ping~

Thanks,
Yufeng

On 06/27/13 17:00, Yufeng Zhang wrote:

This patch fixes a bug where a pointer-typed argument passed on the
stack is not padded properly in ILP32.

OK for the trunk?

Thanks,
Yufeng



gcc/

  * config/aarch64/aarch64.c (aarch64_pad_arg_upward): In big-endian,
  pad pointer-typed argument downward.

gcc/testsuite/

  * gcc.target/aarch64/test-ptr-arg-on-stack-1.c: New test.
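
Something in the spirit of the new test (a hypothetical reduction, not
the actual file):

/* With x0-x7 consumed by a0-a7, 'p' is passed on the stack in an
   8-byte slot.  On big-endian ILP32 the 4-byte pointer must be padded
   downward within the slot so that the callee reads the pointer
   itself rather than the padding.  */
int
load (long long a0, long long a1, long long a2, long long a3,
      long long a4, long long a5, long long a6, long long a7,
      int *p)
{
  return *p;
}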






[PATCH, AArch64] Change to pass -mabi=* directly to the assembler

2013-07-19 Thread Yufeng Zhang

Hi,

Following the work in AArch64 GAS to unify the ABI command line 
interface, this patch updates the compiler driver to pass -mabi=* 
directly to the assembler.


The related GAS patch is here:
http://www.sourceware.org/ml/binutils/2013-07/msg00180.html

OK for the trunk (after the initial ILP32 patch set are committed)?

Thanks,
Yufeng


gcc/

* config/aarch64/aarch64-elf.h (ASM_SPEC): Pass on -mabi=*.

diff --git a/gcc/config/aarch64/aarch64-elf.h b/gcc/config/aarch64/aarch64-elf.h
index 315a510..4757d22 100644
--- a/gcc/config/aarch64/aarch64-elf.h
+++ b/gcc/config/aarch64/aarch64-elf.h
@@ -140,8 +140,7 @@
 %{mlittle-endian:-EL} \
 %{mcpu=*:-mcpu=%*} \
 %{march=*:-march=%*} \
-%{mabi=ilp32*:-milp32} \
-%{mabi=lp64*:-mlp64}"
+%{mabi=*:-mabi=%*}"
 #endif
 
 #undef TYPE_OPERAND_FMT

[PATCH, AArch64] Skip gcc.dg/lower-subreg-1.c

2013-07-26 Thread Yufeng Zhang

Hi,

This patch skips gcc.dg/lower-subreg-1.c on aarch64*-*-*.  The word
mode on aarch64 is 64-bit, so the lower-subreg pass won't split
anything in this test case.  The test is currently skipped on aarch64
with lp64 because of the "dg-require-effective-target ilp32"
directive, but fails when -mabi=ilp32 is in use.
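
Roughly, the test exercises something like this (a paraphrase, not the
actual file):

/* On a target with a 32-bit word mode, this 64-bit operation gets
   decomposed into word-sized pieces by the subreg1 pass, and the test
   scans the -fdump-rtl-subreg1 dump for the split.  On aarch64 the
   word mode is 64 bits even with -mabi=ilp32, so no decomposition
   happens and the scan fails.  */
long long
test (long long x, long long y)
{
  return x | y;
}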


OK to commit?

Thanks,
Yufeng

gcc/testsuite/

* gcc.dg/lower-subreg-1.c: Skip aarch64*-*-*.

diff --git a/gcc/testsuite/gcc.dg/lower-subreg-1.c b/gcc/testsuite/gcc.dg/lower-subreg-1.c
index f5827e1..102ba22 100644
--- a/gcc/testsuite/gcc.dg/lower-subreg-1.c
+++ b/gcc/testsuite/gcc.dg/lower-subreg-1.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ! { mips64 || { arm*-*-* ia64-*-* sparc*-*-* spu-*-* tilegx-*-* } } } } } */
+/* { dg-do compile { target { ! { mips64 || { aarch64*-*-* arm*-*-* ia64-*-* sparc*-*-* spu-*-* tilegx-*-* } } } } } */
 /* { dg-options "-O -fdump-rtl-subreg1" } */
 /* { dg-skip-if "" { { i?86-*-* x86_64-*-* } && x32 } { "*" } { "" } } */
 /* { dg-require-effective-target ilp32 } */
