date:20110420

[0/5] Improvements to vldN and vstN intrinsics

2011-04-20 Thread Richard Sandiford

I've just submitted a merge request for the vldN and vstN intrinsic
improvements.  There are five related patches, so I thought it might
be easier to review the merge if I posted the individual changes here.

See:

http://www.mail-archive.com/linaro-toolchain@lists.linaro.org/msg00969.html

for an example of how this helps.

Richard

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain

[1/5] Improve output of vld3q and vld4q

2011-04-20 Thread Richard Sandiford

This first patch optimises the output for vld3q and vld4q functions.
These functions expand into two individual vld3 and vld4 instructions,
with each instruction setting one (interleaved) half of the output
register.  The problem was that both instructions treated the
output register as an input, whereas only the second one needs to.
We therefore treated the output register as being live before the
vldNq and generated unnecessary spill code.

E.g.:

#include <arm_neon.h>

void
foo (uint32_t *a, uint32_t *b, uint32_t *c)
{
  uint32x4x3_t x, y;

  x = vld3q_u32 (a);
  y = vld3q_u32 (b);
  x.val[0] = vaddq_u32 (x.val[0], y.val[0]);
  x.val[1] = vaddq_u32 (x.val[1], y.val[1]);
  x.val[2] = vaddq_u32 (x.val[2], y.val[2]);
  vst3q_u32 (a, x);
}

gave:

stmfd   sp!, {r3, fp}
ldr     r2, .L2
add     fp, sp, #4
vldmia  r2, {d16-d21}
sub     sp, sp, #112
vmov    q11, q8  @ ti
vmov    q12, q9  @ ti
vmov    q13, q10  @ ti
...

where the vldmia is loading the x and y "inputs" to the two vld3q_u32s
from the corresponding stack slots.
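
For anyone reading along without Neon hardware, the data movement in the
example can be modelled in portable C.  This is only a rough sketch: the
model_* names and the struct are hypothetical stand-ins for the
arm_neon.h types and intrinsics, and it says nothing about register
allocation, which is what the patch actually changes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for uint32x4x3_t: three vectors of four lanes.  */
typedef struct
{
  uint32_t val[3][4];
} model_u32x4x3;

/* Model of vld3q_u32: read 12 consecutive words and de-interleave them,
   so that val[k][i] = src[3 * i + k].  */
static model_u32x4x3
model_vld3q_u32 (const uint32_t *src)
{
  model_u32x4x3 r;
  assert (src != NULL);
  for (int i = 0; i < 4; i++)
    for (int k = 0; k < 3; k++)
      r.val[k][i] = src[3 * i + k];
  return r;
}

/* Model of vst3q_u32: the inverse, re-interleaving on the way out.  */
static void
model_vst3q_u32 (uint32_t *dst, model_u32x4x3 x)
{
  assert (dst != NULL);
  for (int i = 0; i < 4; i++)
    for (int k = 0; k < 3; k++)
      dst[3 * i + k] = x.val[k][i];
}

/* Same shape as foo () above, minus the unused third argument.  */
static void
model_foo (uint32_t *a, const uint32_t *b)
{
  model_u32x4x3 x = model_vld3q_u32 (a);
  model_u32x4x3 y = model_vld3q_u32 (b);
  for (int k = 0; k < 3; k++)
    for (int i = 0; i < 4; i++)
      x.val[k][i] += y.val[k][i];
  model_vst3q_u32 (a, x);
}
```

Neither load reads the previous contents of x or y, which is why treating
the destination as an input to the first half of the load is unnecessary.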

The patch is a backport of:

http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01634.html

which has been applied to 4.7.  No changes were needed for 4.5.

Richard


gcc/
Backport from mainline:

2011-03-30  Richard Sandiford  
Ramana Radhakrishnan  

PR target/43590
* config/arm/neon.md (neon_vld3qa, neon_vld4qa): Remove
operand 1 and reshuffle the operands to match.
(neon_vld3, neon_vld4): Update accordingly.

Index: gcc/config/arm/neon.md
===
--- gcc/config/arm/neon.md  2011-04-19 13:55:04.0 +
+++ gcc/config/arm/neon.md  2011-04-19 13:55:04.0 +
@@ -4925,8 +4925,7 @@ (define_expand "neon_vld3"
(unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
   "TARGET_NEON"
 {
-  emit_insn (gen_neon_vld3qa (operands[0], operands[0],
-operands[1], operands[1]));
+  emit_insn (gen_neon_vld3qa (operands[0], operands[1], operands[1]));
   emit_insn (gen_neon_vld3qb (operands[0], operands[0],
 operands[1], operands[1]));
   DONE;
@@ -4934,12 +4933,11 @@ (define_expand "neon_vld3"
 
 (define_insn "neon_vld3qa"
   [(set (match_operand:CI 0 "s_register_operand" "=w")
-(unspec:CI [(mem:CI (match_operand:SI 3 "s_register_operand" "2"))
-(match_operand:CI 1 "s_register_operand" "0")
+(unspec:CI [(mem:CI (match_operand:SI 2 "s_register_operand" "1"))
 (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
UNSPEC_VLD3A))
-   (set (match_operand:SI 2 "s_register_operand" "=r")
-(plus:SI (match_dup 3)
+   (set (match_operand:SI 1 "s_register_operand" "=r")
+(plus:SI (match_dup 2)
 (const_int 24)))]
   "TARGET_NEON"
 {
@@ -4948,7 +4946,7 @@ (define_insn "neon_vld3qa"
   ops[0] = gen_rtx_REG (DImode, regno);
   ops[1] = gen_rtx_REG (DImode, regno + 4);
   ops[2] = gen_rtx_REG (DImode, regno + 8);
-  ops[3] = operands[2];
+  ops[3] = operands[1];
   output_asm_insn ("vld3.\t{%P0, %P1, %P2}, [%3]!", ops);
   return "";
 }
@@ -5217,8 +5215,7 @@ (define_expand "neon_vld4"
(unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
   "TARGET_NEON"
 {
-  emit_insn (gen_neon_vld4qa (operands[0], operands[0],
-operands[1], operands[1]));
+  emit_insn (gen_neon_vld4qa (operands[0], operands[1], operands[1]));
   emit_insn (gen_neon_vld4qb (operands[0], operands[0],
 operands[1], operands[1]));
   DONE;
@@ -5226,12 +5223,11 @@ (define_expand "neon_vld4"
 
 (define_insn "neon_vld4qa"
   [(set (match_operand:XI 0 "s_register_operand" "=w")
-(unspec:XI [(mem:XI (match_operand:SI 3 "s_register_operand" "2"))
-(match_operand:XI 1 "s_register_operand" "0")
+(unspec:XI [(mem:XI (match_operand:SI 2 "s_register_operand" "1"))
 (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
UNSPEC_VLD4A))
-   (set (match_operand:SI 2 "s_register_operand" "=r")
-(plus:SI (match_dup 3)
+   (set (match_operand:SI 1 "s_register_operand" "=r")
+(plus:SI (match_dup 2)
 (const_int 32)))]
   "TARGET_NEON"
 {
@@ -5241,7 +5237,7 @@ (define_insn "neon_vld4qa"
   ops[1] = gen_rtx_REG (DImode, regno + 4);
   ops[2] = gen_rtx_REG (DImode, regno + 8);
   ops[3] = gen_rtx_REG (DImode, regno + 12);
-  ops[4] = operands[2];
+  ops[4] = operands[1];
   output_asm_insn ("vld4.\t{%P0, %P1, %P2, %P3}, [%4]!", ops);
   return "";
 }


[2/5] Remodel the vldN and vstN patterns

2011-04-20 Thread Richard Sandiford

The patterns for the Neon vld and vst intrinsics used the following sort
of construct to refer to memory:

(mem:FOO (match_operand:SI X "register_operand" "r"))

This patch changes them to use:

(match_operand:FOO' X "neon_struct_operand" "(=)Um")

instead.  This allows the loads to use post-increment addresses as well
as bare registers, and also matches the form that the vec_load_lanes
and vec_store_lanes optabs need.  (Those optabs will be in a later
autovectorisation merge.)

The patch is a backport of:

http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01996.html

which has been applied to 4.7.  There are three differences in the
4.5 version:

* Our 4.5 code prints alignments as "[rN, :ALIGN]" rather than
  "[rN:ALIGN]".  I've fixed that here.  The initial commit to FSF trunk
  used the correct form, so there isn't a separate fix that could be
  backported.

* 4.5 doesn't have MEM_REF, so neon_dereference_pointer uses an
  INDIRECT_REF instead.

* 4.5 defines the mode attributes in neon.md rather than in a
  separate iterators.md.
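
To illustrate the first point, here is a small standalone model of how the
alignment hint is chosen and printed.  It mirrors the selection logic in
the arm_print_operand hunk of this patch, but the function names are
made up, the post-increment suffix is left out, and it should be read as
a sketch rather than as the GCC code itself.

```c
#include <assert.h>
#include <stdio.h>

/* Pick the alignment hint (in bits) for a Neon structure access.
   MEMSIZE is the access size in bytes and ALIGN the known alignment in
   bytes.  Returns 0 if no hint should be printed.  */
static unsigned
model_align_bits (unsigned memsize, unsigned align)
{
  if (memsize == 16 && (align % 32) == 0)
    return 256;
  else if ((memsize == 8 || memsize == 16) && (align % 16) == 0)
    return 128;
  else if ((align % 8) == 0)
    return 64;
  return 0;
}

/* Write the address operand for base register rN into BUF, using the
   "[rN:ALIGN]" form rather than "[rN, :ALIGN]".  Returns the number of
   characters that the full operand needs, as snprintf does.  */
static int
model_print_address (char *buf, size_t size, int regno,
                     unsigned memsize, unsigned align)
{
  unsigned bits;

  assert (buf != NULL && align > 0);
  bits = model_align_bits (memsize, align);
  if (bits != 0)
    return snprintf (buf, size, "[r%d:%u]", regno, bits);
  return snprintf (buf, size, "[r%d]", regno);
}
```

For example, an 8-byte access through r2 that is known to be 16-byte
aligned would be printed as "[r2:128]" by this model.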

Richard


gcc/
Backport from mainline:

2011-04-12  Richard Sandiford  

* config/arm/arm.c (arm_print_operand): Use MEM_SIZE to get the
size of a '%A' memory reference.
(T_DREG, T_QREG): New neon_builtin_type_bits.
(arm_init_neon_builtins): Assert that the load and store operands
are neon_struct_operands.
(locate_neon_builtin_icode): Provide the neon_builtin_type_bits.
(NEON_ARG_MEMORY): New builtin_arg.
(neon_dereference_pointer): New function.
(arm_expand_neon_args): Add a neon_builtin_type_bits argument.
Handle NEON_ARG_MEMORY.
(arm_expand_neon_builtin): Update after above interface changes.
Use NEON_ARG_MEMORY for loads and stores.
* config/arm/predicates.md (neon_struct_operand): New predicate.
* config/arm/neon.md (V_two_elem): Tweak formatting.
(V_three_elem): Use BLKmode for accesses that have no associated mode.
(neon_vld1, neon_vld1_dup)
(neon_vst1_lane, neon_vst1, neon_vld2)
(neon_vld2_lane, neon_vld2_dup, neon_vst2)
(neon_vst2_lane, neon_vld3, neon_vld3_lane)
(neon_vld3_dup, neon_vst3, neon_vst3_lane)
(neon_vld4, neon_vld4_lane, neon_vld4_dup)
(neon_vst4): Replace pointer operand with a memory operand.
Use %A in the output template.
(neon_vld3qa, neon_vld3qb, neon_vst3qa)
(neon_vst3qb, neon_vld4qa, neon_vld4qb)
(neon_vst4qa, neon_vst4qb): Likewise, but halve
the width of the memory access.  Remove post-increment.
* config/arm/neon-testgen.ml: Allow addresses to have an alignment.

gcc/testsuite/
Backport from mainline:

2011-04-12  Richard Sandiford  

* gcc.target/arm/neon-vld3-1.c: New test.
* gcc.target/arm/neon-vst3-1.c: New test.
* gcc.target/arm/neon/v*.c: Regenerate.

Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c  2011-04-20 08:29:44.0 +
+++ gcc/config/arm/arm.c  2011-04-20 09:32:44.0 +
@@ -16847,7 +16847,7 @@ arm_print_operand (FILE *stream, rtx x,
   {
rtx addr;
bool postinc = FALSE;
-   unsigned align, modesize, align_bits;
+   unsigned align, memsize, align_bits;
 
gcc_assert (GET_CODE (x) == MEM);
addr = XEXP (x, 0);
@@ -16862,12 +16862,12 @@ arm_print_operand (FILE *stream, rtx x,
   instruction (for some alignments) as an aid to the memory subsystem
   of the target.  */
align = MEM_ALIGN (x) >> 3;
-   modesize = GET_MODE_SIZE (GET_MODE (x));
+   memsize = INTVAL (MEM_SIZE (x));

/* Only certain alignment specifiers are supported by the hardware.  */
-   if (modesize == 16 && (align % 32) == 0)
+   if (memsize == 16 && (align % 32) == 0)
  align_bits = 256;
-   else if ((modesize == 8 || modesize == 16) && (align % 16) == 0)
+   else if ((memsize == 8 || memsize == 16) && (align % 16) == 0)
  align_bits = 128;
else if ((align % 8) == 0)
  align_bits = 64;
@@ -16875,7 +16875,7 @@ arm_print_operand (FILE *stream, rtx x,
  align_bits = 0;

if (align_bits != 0)
- asm_fprintf (stream, ", :%d", align_bits);
+ asm_fprintf (stream, ":%d", align_bits);
 
asm_fprintf (stream, "]");
 
@@ -18398,12 +18398,14 @@ enum neon_builtin_type_bits {
   T_V2SI  = 0x0004,
   T_V2SF  = 0x0008,
   T_DI    = 0x0010,
+  T_DREG  = 0x001F,
   T_V16QI = 0x0020,
   T_V8HI  = 0x0040,
   T_V4SI  = 0x0080,
   T_V4SF  = 0x0100,
   T_V2DI  = 0x0200,
   T_TI   = 0x0400,
+  T_QREG  = 0x07E0,
   T_EI   = 0x0800,
   T_OI   = 0x1000
 };
@@ -19049,10 +19051,9 @@ arm_init_neon_builtins (void)
if (is_load && k == 1)
  {
/* Neon load patterns always have

[3/5] Allow arrays of vectors to be stored in registers

2011-04-20 Thread Richard Sandiford

This patch allows the target to override MAX_FIXED_MODE_SIZE for
specific kinds of array.  We can then give a non-BLK mode to things
like uint32x2x4_t, which in turn allows them to be stored in registers.
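
As a rough illustration of the decision involved, here is a toy model in
plain C.  Everything in it is hypothetical: the toy_* names, the way modes
are represented, the size limit passed in by the caller and the Neon-like
hook are all assumptions made for the sketch.  In particular it does not
check that an integer mode of the required width exists, which the real
code has to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a machine mode: an integer mode of a given
   width, or "BLKmode" (0 bits), meaning the value must live in memory.  */
typedef struct
{
  unsigned bits;
} toy_mode;

typedef bool (*toy_array_hook) (unsigned elem_bits, unsigned nelems);

/* Default hook: never override the size limit.  */
static bool
toy_hook_false (unsigned elem_bits, unsigned nelems)
{
  (void) elem_bits;
  (void) nelems;
  return false;
}

/* Hypothetical Neon-like hook: allow 2 to 4 vectors of 64 or 128 bits.  */
static bool
toy_hook_neon (unsigned elem_bits, unsigned nelems)
{
  return (elem_bits == 64 || elem_bits == 128) && nelems >= 2 && nelems <= 4;
}

/* Choose a mode for an array of NELEMS elements of ELEM_BITS bits each.
   Arrays above MAX_FIXED_BITS only get a non-BLK mode if HOOK says so.  */
static toy_mode
toy_mode_for_array (unsigned elem_bits, unsigned nelems,
                    unsigned max_fixed_bits, toy_array_hook hook)
{
  toy_mode m;
  unsigned total;

  assert (elem_bits > 0 && nelems > 0 && hook != NULL);
  total = elem_bits * nelems;
  if (total <= max_fixed_bits || hook (elem_bits, nelems))
    m.bits = total;
  else
    m.bits = 0;
  return m;
}
```

With a limit of 128 bits, say, the val array of a uint32x2x4_t (four
64-bit vectors) gets "BLKmode" under toy_hook_false and a 256-bit mode
under toy_hook_neon.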

The patch is a backport of:

http://gcc.gnu.org/ml/gcc-patches/2011-03/msg02192.html

which Richard Guenther approved in principle, but which can't be
applied yet because of 4/5.  The only difference in the 4.5 version
is that 4.5 still uses the old target hook definition scheme,
rather than 4.7's target.def.

Richard


gcc/
* hooks.h (hook_bool_mode_uhwi_false): Declare.
* hooks.c (hook_bool_mode_uhwi_false): New function.
* doc/tm.texi (TARGET_ARRAY_MODE_SUPPORTED_P): Document.
* target.h (array_mode_supported_p): New hook.
* target-def.h (TARGET_ARRAY_MODE_SUPPORTED_P): Define if undefined.
(TARGET_INITIALIZER): Include it.
* stor-layout.c (mode_for_array): New function.
(layout_type): Use it.
* config/arm/arm.c (arm_array_mode_supported_p): New function.
(TARGET_ARRAY_MODE_SUPPORTED_P): Define.

Index: gcc/hooks.h
===
--- gcc/hooks.h 2011-04-19 14:14:01.0 +
+++ gcc/hooks.h 2011-04-19 16:19:06.0 +
@@ -32,6 +32,8 @@ extern bool hook_bool_const_int_const_in
 extern bool hook_bool_mode_false (enum machine_mode);
 extern bool hook_bool_mode_const_rtx_false (enum machine_mode, const_rtx);
 extern bool hook_bool_mode_const_rtx_true (enum machine_mode, const_rtx);
+extern bool hook_bool_mode_uhwi_false (enum machine_mode,
+  unsigned HOST_WIDE_INT);
 extern bool hook_bool_tree_false (tree);
 extern bool hook_bool_const_tree_false (const_tree);
 extern bool hook_bool_tree_true (tree);
Index: gcc/hooks.c
===
--- gcc/hooks.c 2011-04-19 14:14:01.0 +
+++ gcc/hooks.c 2011-04-19 16:19:06.0 +
@@ -86,6 +86,15 @@ hook_bool_mode_const_rtx_true (enum mach
   return true;
 }
 
+/* Generic hook that takes (enum machine_mode, unsigned HOST_WIDE_INT)
+   and returns false.  */
+bool
+hook_bool_mode_uhwi_false (enum machine_mode mode ATTRIBUTE_UNUSED,
+  unsigned HOST_WIDE_INT value ATTRIBUTE_UNUSED)
+{
+  return false;
+}
+
 /* Generic hook that takes (FILE *, const char *) and does nothing.  */
 void
 hook_void_FILEptr_constcharptr (FILE *a ATTRIBUTE_UNUSED, const char *b ATTRIBUTE_UNUSED)
Index: gcc/doc/tm.texi
===
--- gcc/doc/tm.texi 2011-04-19 14:14:01.0 +
+++ gcc/doc/tm.texi 2011-04-19 16:38:08.0 +
@@ -4367,6 +4367,34 @@ insns involving vector mode @var{mode}.
 must have move patterns for this mode.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_ARRAY_MODE_SUPPORTED_P (enum machine_mode @var{mode}, unsigned HOST_WIDE_INT @var{nelems})
+Return true if GCC should try to use a scalar mode to store an array
+of @var{nelems} elements, given that each element has mode @var{mode}.
+Returning true here overrides the usual @code{MAX_FIXED_MODE} limit
+and allows GCC to use any defined integer mode.
+
+One use of this hook is to support vector load and store operations
+that operate on several homogeneous vectors.  For example, ARM Neon
+has operations like:
+
+@smallexample
+int8x8x3_t vld3_s8 (const int8_t *)
+@end smallexample
+
+where the return type is defined as:
+
+@smallexample
+typedef struct int8x8x3_t
+@{
+  int8x8_t val[3];
+@} int8x8x3_t;
+@end smallexample
+
+If this hook allows @code{val} to have a scalar mode, then
+@code{int8x8x3_t} can have the same mode.  GCC can then store
+@code{int8x8x3_t}s in registers rather than forcing them onto the stack.
+@end deftypefn
+
 @node Scalar Return
 @subsection How Scalar Function Values Are Returned
 @cindex return values in registers
Index: gcc/target.h
===
--- gcc/target.h  2011-04-19 14:14:01.0 +
+++ gcc/target.h  2011-04-19 16:38:08.0 +
@@ -764,6 +764,9 @@ struct gcc_target
  for further details.  */
   bool (* vector_mode_supported_p) (enum machine_mode mode);
 
+  /* See tm.texi.  */
+  bool (* array_mode_supported_p) (enum machine_mode, unsigned HOST_WIDE_INT);
+
   /* Compute a (partial) cost for rtx X.  Return true if the complete
  cost has been computed, and false if subexpressions should be
  scanned.  In either case, *TOTAL contains the cost result.  */
Index: gcc/target-def.h
===
--- gcc/target-def.h  2011-04-19 14:14:01.0 +
+++ gcc/target-def.h  2011-04-19 16:38:08.0 +
@@ -553,6 +553,10 @@ #define TARGET_FIXED_POINT_SUPPORTED_P d
 #define TARGET_VECTOR_MODE_SUPPORTED_P hook_bool_mode_false
 #endif
 
+#ifndef TARGET_ARRAY_MODE_SUPPORTED_P
+

[4/5] Convert LEGITIMATE_CONSTANT_P into a hook and add a mode argument

2011-04-20 Thread Richard Sandiford

This patch converts LEGITIMATE_CONSTANT_P into a target hook and
passes along the mode of the constant.  This can then be used by 5/5.

The patch is a version of:

http://gcc.gnu.org/ml/gcc-patches/2011-04/msg00195.html

which is still pending review after two pings.  It seems pretty simple
though, so I think it's worth backporting now rather than waiting for
upstream approval.

The backport is very much a cut-down version.  Rather than convert all
targets to the new hook, I've kept LEGITIMATE_CONSTANT_P around and
made it the default implementation of the new hook.  Only ARM defines
the hook directly.

Note that the ARM definition is supposed to be identical to the old
LEGITIMATE_CONSTANT_P version.  Only 5/5 is meant to change it.
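
The general shape of this kind of conversion can be sketched in standalone
C.  The toy_* types and names below are hypothetical and are not the GCC
interfaces; the point is only to show a hook whose default falls back to
a legacy, mode-less macro, alongside a port that defines the hook
directly.  The mode check in the second function is the kind of thing
that 5/5 adds, not something this patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's rtx and machine_mode.  */
typedef struct { long value; } toy_rtx;
enum toy_machine_mode { TOY_SImode, TOY_STRUCTmode };

/* A legacy, mode-less target macro, as an unconverted port would have.  */
#define TOY_LEGITIMATE_CONSTANT_P(X) ((X)->value >= 0)

/* Default hook: ignore the mode and defer to the legacy macro.  */
static bool
toy_default_legitimate_constant_p (enum toy_machine_mode mode,
                                   const toy_rtx *x)
{
  (void) mode;
  return TOY_LEGITIMATE_CONSTANT_P (x);
}

/* A converted port defines the hook directly and can inspect the mode.  */
static bool
toy_port_legitimate_constant_p (enum toy_machine_mode mode, const toy_rtx *x)
{
  if (mode == TOY_STRUCTmode)
    return false;
  return x->value >= 0;
}

/* Hypothetical target vector with a single hook.  */
struct toy_target
{
  bool (*legitimate_constant_p) (enum toy_machine_mode, const toy_rtx *);
};

/* Callers go through the hook rather than using the macro.  */
static bool
toy_check (const struct toy_target *t, enum toy_machine_mode mode,
           const toy_rtx *x)
{
  assert (t != NULL && t->legitimate_constant_p != NULL && x != NULL);
  return t->legitimate_constant_p (mode, x);
}
```

Keeping the macro as the default implementation means that unconverted
targets behave exactly as before.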

Richard


gcc/
* doc/tm.texi (LEGITIMATE_CONSTANT_P): Replace with...
(TARGET_LEGITIMATE_CONSTANT_P): ...this.
* target.h (gcc_target): Add legitimate_constant_p.
* target-def.h (TARGET_LEGITIMATE_CONSTANT_P): Define.
(TARGET_INITIALIZER): Include it.
* calls.c (precompute_register_parameters): Replace uses of
LEGITIMATE_CONSTANT_P with targetm.legitimate_constant_p.
(emit_library_call_value_1): Likewise.
* expr.c (move_block_to_reg, can_store_by_pieces, emit_move_insn)
(compress_float_constant, emit_push_insn, expand_expr_real_1): Likewise.
* recog.c (general_operand, immediate_operand): Likewise.
* reload.c (find_reloads_toplev, find_reloads_address_part): Likewise.
* reload1.c (init_eliminable_invariants): Likewise.
* targhooks.h (default_legitimate_constant_p): Declare.
* targhooks.c (default_legitimate_constant_p): New function.

* config/arm/arm-protos.h (arm_cannot_force_const_mem): Delete.
* config/arm/arm.h (ARM_LEGITIMATE_CONSTANT_P): Likewise.
(THUMB_LEGITIMATE_CONSTANT_P, LEGITIMATE_CONSTANT_P): Likewise.
* config/arm/arm.c (TARGET_LEGITIMATE_CONSTANT_P): Define.
(arm_legitimate_constant_p_1, thumb_legitimate_constant_p)
(arm_legitimate_constant_p): New functions.
(arm_cannot_force_const_mem): Make static.

Index: gcc/doc/tm.texi
===
--- gcc/doc/tm.texi 2011-04-19 16:38:08.0 +
+++ gcc/doc/tm.texi 2011-04-19 16:38:15.0 +
@@ -2642,8 +2642,8 @@ instruction for loading an immediate val
 register, so @code{PREFERRED_RELOAD_CLASS} returns @code{NO_REGS} when
 @var{x} is a floating-point constant.  If the constant can't be loaded
 into any kind of register, code generation will be better if
-@code{LEGITIMATE_CONSTANT_P} makes the constant illegitimate instead
-of using @code{PREFERRED_RELOAD_CLASS}.
+@code{TARGET_LEGITIMATE_CONSTANT_P} makes the constant illegitimate instead
+of using @code{TARGET_PREFERRED_RELOAD_CLASS}.
 
 If an insn has pseudos in it after register allocation, reload will go
 through the alternatives and call repeatedly @code{PREFERRED_RELOAD_CLASS}
@@ -5628,13 +5628,13 @@ addresses.  Many RISC machines have no m
 You may assume that @var{addr} is a valid address for the machine.
 @end defmac
 
-@defmac LEGITIMATE_CONSTANT_P (@var{x})
-A C expression that is nonzero if @var{x} is a legitimate constant for
-an immediate operand on the target machine.  You can assume that
-@var{x} satisfies @code{CONSTANT_P}, so you need not check this.  In fact,
-@samp{1} is a suitable definition for this macro on machines where
-anything @code{CONSTANT_P} is valid.
-@end defmac
+@deftypefn {Target Hook} bool TARGET_LEGITIMATE_CONSTANT_P (enum machine_mode @var{mode}, rtx @var{x})
+This hook returns true if @var{x} is a legitimate constant for a
+@var{mode}-mode immediate operand on the target machine.  You can assume that
+@var{x} satisfies @code{CONSTANT_P}, so you need not check this.
+
+The default definition returns true.
+@end deftypefn
 
 @deftypefn {Target Hook} rtx TARGET_DELEGITIMIZE_ADDRESS (rtx @var{x})
 This hook is used to undo the possibly obfuscating effects of the
Index: gcc/target.h
===
--- gcc/target.h  2011-04-19 16:38:08.0 +
+++ gcc/target.h  2011-04-19 16:38:16.0 +
@@ -645,7 +645,10 @@ struct gcc_target
   /* Return true if the target supports conditional execution.  */
   bool (* have_conditional_execution) (void);
 
-  /* True if the constant X cannot be placed in the constant pool.  */
+  /* See tm.texi.  */
+  bool (* legitimate_constant_p) (enum machine_mode, rtx);
+
+/* True if the constant X cannot be placed in the constant pool.  */
   bool (* cannot_force_const_mem) (rtx);
 
   /* True if the insn X cannot be duplicated.  */
Index: gcc/target-def.h
===
--- gcc/target-def.h  2011-04-19 16:38:08.0 +
+++ gcc/target-def.h  2011-04-19 16:38:16.0 +
@@ -563,6 +563,7 @@ #define TARGE

[5/5] Fix PR target/46329

2011-04-20 Thread Richard Sandiford

This patch handles moves involving structure constants.  It's a backport of:

http://gcc.gnu.org/ml/gcc-patches/2011-04/msg00200.html

which Richard Earnshaw has approved, but which cannot be applied yet
because it depends on 4/5.  The patch is needed because 3/5 would
otherwise expose new instances of the PR.

Richard


gcc/
PR target/46329
* config/arm/arm.c (arm_legitimate_constant_p_1): Return false
for all Neon struct constants.

gcc/testsuite/
From  Richard Earnshaw  

PR target/46329
* gcc.target/arm/pr46329.c: New test.

Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c  2011-04-19 16:38:16.0 +
+++ gcc/config/arm/arm.c  2011-04-20 07:54:11.0 +
@@ -140,7 +140,7 @@ static void arm_internal_label (FILE *,
 static void arm_output_mi_thunk (FILE *, tree, HOST_WIDE_INT, HOST_WIDE_INT,
 tree);
 static bool arm_have_conditional_execution (void);
-static bool arm_cannot_force_const_mem (enum machine_mode, rtx);
+static bool arm_cannot_force_const_mem (rtx);
 static bool arm_legitimate_constant_p (enum machine_mode, rtx);
 static bool arm_rtx_costs_1 (rtx, enum rtx_code, int*, bool);
 static bool arm_size_rtx_costs (rtx, enum rtx_code, enum rtx_code, int *);
@@ -6465,8 +6465,14 @@ arm_tls_referenced_p (rtx x)
When generating pic allow anything.  */
 
 static bool
-arm_legitimate_constant_p_1 (enum machine_mode mode ATTRIBUTE_UNUSED, rtx x)
+arm_legitimate_constant_p_1 (enum machine_mode mode, rtx x)
 {
+  /* At present, we have no support for Neon structure constants, so forbid
+ them here.  It might be possible to handle simple cases like 0 and -1
+ in future.  */
+  if (TARGET_NEON && VALID_NEON_STRUCT_MODE (mode))
+return false;
+
   return flag_pic || !label_mentioned_p (x);
 }
 
Index: gcc/testsuite/gcc.target/arm/pr46329.c
===
--- /dev/null   2010-10-05 15:55:33.0 +
+++ gcc/testsuite/gcc.target/arm/pr46329.c  2011-04-19 16:38:16.0 +
@@ -0,0 +1,9 @@
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_neon } */
+
+int __attribute__ ((vector_size (32))) x;
+void
+foo (void)
+{
+  x <<= x;
+}


Re: Question on compressed vmlinux .got and .bss sections

2011-04-20 Thread Dave Martin

On Tue, Apr 19, 2011 at 01:33:12PM -0400, Nicolas Pitre wrote:
> On Wed, 20 Apr 2011, Shawn Guo wrote:
> 
> > On Tue, Apr 19, 2011 at 04:23:09PM +0100, Dave Martin wrote:
> > > Hopefully this explains what's going on, but what are you trying
> > > to achieve exactly?
> > > 
> > Thanks a ton, Dave.  It does explain what I'm seeing, and your
> > explanation looks like a very good learning material. 
> > 
> > I'm running into a problem with John Bonies' append-dtb-to-zImage
> > patch.  That is the header of dtb was overwritten by uart_base
> > value.  John's patch did fix up .bss entries in .got to move them
> > behind dtb image.  But as you explained, when uart_base is defined
> > as static one, its address is fixed up in pc-relative way at link
> > time, and John's patch does not help it, hence the write to
> > uart_base at runtime overwrites dtb image.
> > 
> > What do you think is the right fix to this problem?  Forbid the use
> > of static uninitialized variable?  I'm afraid not.  Is it possible
> > to fix up the cases like uart_base here at runtime?
> 
> You must not use static variable in the decompressor.  For one thing, 
> that breaks the ability to XIP the decompressor code and move writable 
> data elsewhere.
> 
> So the fix is indeed to _not_ declare any global variable as static in 
> this case.

After some thinking about this, I think I agree.

Having to relocate a GOT-full of addresses many of which are actually at
fixed PC-relative offsets just for this capability is a bit annoying,
but the GNU tools don't support other models very well.

We might be able to reduce the size of the GOT by building with
-fvisibility=hidden, and making judicious use of "extern" on all
data declarations/definitions:

[gcc-4.4.info]
 `extern' declarations are not affected by `-fvisibility', so a lot
 of code can be recompiled with `-fvisibility=hidden' with no
 modifications.  However, this means that calls to `extern'
 functions with no explicit visibility will use the PLT, so it is
 more effective to use `__attribute ((visibility))' and/or `#pragma
 GCC visibility' to tell the compiler which `extern' declarations
 should be treated as hidden.

This only seems to work reliably for data definitions; plus the
toolchain behaviour may "evolve" with respect to obscure features
like this.  So if we wanted to achieve such a thing reliably, we'd
probably need explicit visibility attributes on the affected
declarations.
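
For reference, an explicit attribute of that kind would look something
like the sketch below.  The variable and function names are hypothetical
and not taken from the decompressor, and how the attribute affects GOT
usage depends on the target and toolchain, so treat this as an
illustration of the syntax only.

```c
#include <assert.h>

/* Hypothetical variable.  The explicit attribute marks the declaration
   as hidden regardless of the -fvisibility setting.  */
extern unsigned long toy_output_ptr __attribute__ ((visibility ("hidden")));

/* The definition picks up the visibility of the earlier declaration.  */
unsigned long toy_output_ptr;

/* Advance the hypothetical output pointer and return its new value.  */
unsigned long
toy_advance_output (unsigned long nbytes)
{
  assert (nbytes > 0);
  toy_output_ptr += nbytes;
  return toy_output_ptr;
}
```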

The advantage is unlikely to be huge though since the GOT is small anyway;
and we wouldn't be able to throw away the GOT relocation code completely,
because of the need to relocate bss references...

Cheers
---Dave


Re: Question on compressed vmlinux .got and .bss sections

2011-04-20 Thread Nicolas Pitre

On Wed, 20 Apr 2011, Dave Martin wrote:

> On Tue, Apr 19, 2011 at 01:33:12PM -0400, Nicolas Pitre wrote:
> > You must not use static variable in the decompressor.  For one thing, 
> > that breaks the ability to XIP the decompressor code and move writable 
> > data elsewhere.
> > 
> > So the fix is indeed to _not_ declare any global variable as static in 
> > this case.
> 
> After some thinking about this, I think I agree.
> 
> Having to relocate a GOT-full of addresses many of which are actually at
> fixed PC-relative offsets just for this capability is a bit annoying,
> but the GNU tools don't support other models very well.

You cannot relocate PC-relative offsets at run time.  Those references 
are spread throughout the code into literal pools.  Forcing all 
references to go through the GOT makes it possible for the code to 
relocate selected parts of itself at run time.

> We might be able to reduce the size of the GOT by building with
> -fvisibility=hidden, and making judicious use of "extern" on all
> data declarations/definitions:
> 
> [gcc-4.4.info]
>  `extern' declarations are not affected by `-fvisibility', so a lot
>  of code can be recompiled with `-fvisibility=hidden' with no
>  modifications.  However, this means that calls to `extern'
>  functions with no explicit visibility will use the PLT, so it is
>  more effective to use `__attribute ((visibility))' and/or `#pragma
>  GCC visibility' to tell the compiler which `extern' declarations
>  should be treated as hidden.
> 
> This only seems to work reliably for data definitions; plus the
> toolchain behaviour may "evolve" with respect to obscure features
> like this.

That doesn't solve the problem at all.  In this case, we really want 
_all_ data references to go through the GOT, meaning that everything 
would have to be marked extern.  The only references which are OK to be 
PC relative are read-only references, and therefore they can just be 
marked as static const.
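
In C terms, the convention amounts to something like the following
sketch.  The names are hypothetical, and whether a given reference ends
up PC-relative or going through the GOT depends on how the decompressor
is compiled and linked, which this fragment does not show.

```c
#include <assert.h>

/* Read-only data: fine to mark static const, since it never has to move
   away from the code and a PC-relative reference stays valid.  */
static const char toy_hex_digits[] = "0123456789abcdef";

/* Writable, uninitialised data: deliberately NOT static, so that
   references can go through the GOT and be fixed up at run time.  */
unsigned long toy_uart_base;

/* Look up one hex digit in the read-only table.  */
char
toy_hex_digit (unsigned value)
{
  assert (value < 16);
  return toy_hex_digits[value];
}

/* Record the (hypothetical) UART base address in the .bss variable.  */
void
toy_set_uart_base (unsigned long base)
{
  toy_uart_base = base;
}
```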

> So if we wanted to achieve such a thing reliably, we'd
> probably need explicit visibility attributes on the affected
> declarations.

Like I said, it's about all of them.

> The advantage is unlikely to be huge though since the GOT is small anyway;
> and we wouldn't be able to throw away the GOT relocation code completely,
> because of the need to relocate bss references...

In fact, all that remains in the GOT, assuming that const data is marked 
static, are .bss references.  Again, for simplicity's sake, we don't 
support initialized and writable global variables, because in the XIP case 
those would have to be copied into RAM and the GOT patched accordingly.  
In practice this restriction is not hard to live with.  To enforce it, we 
simply discard the .data section early in the linker script.

Nicolas


Re: Question on compressed vmlinux .got and .bss sections

2011-04-20 Thread Dave Martin

Hi,

On Wed, Apr 20, 2011 at 1:42 PM, Nicolas Pitre wrote:
> On Wed, 20 Apr 2011, Dave Martin wrote:
>
>> On Tue, Apr 19, 2011 at 01:33:12PM -0400, Nicolas Pitre wrote:
>> > You must not use static variable in the decompressor.  For one thing,
>> > that breaks the ability to XIP the decompressor code and move writable
>> > data elsewhere.
>> >
>> > So the fix is indeed to _not_ declare any global variable as static in
>> > this case.
>>
>> After some thinking about this, I think I agree.
>>
>> Having to relocate a GOT-full of addresses many of which are actually at
>> fixed PC-relative offsets just for this capability is a bit annoying,
>> but the GNU tools don't support other models very well.
>
> You cannot relocate PC-relative offsets at run time.  Those references
> are spread throughout the code into literal pools.  Forcing all
> references to go through the GOT makes it possible for the code to
> relocate selected parts of itself at run time.

My point was that relocatability implies overhead, and the GOT
potentially contains a load of relocations for code and read-only data
which will never get moved in practice.

For writable/uninitialised data, it's different of course -- we often
will need to relocate that in real situations (as observed here).  I'd
guessed that only part of the GOT in the compressed loader was
addressing such data, but actually, it seems to be pretty much all of
it, as you suggest.

So the number of useless relocations, and any associated overhead,
looks low (if any).

>
>> We might be able to reduce the size of the GOT by building with
>> -fvisibility=hidden, and making judicious use of "extern" on all
>> data declarations/definitions:
>>
>> [gcc-4.4.info]
>>      `extern' declarations are not affected by `-fvisibility', so a lot
>>      of code can be recompiled with `-fvisibility=hidden' with no
>>      modifications.  However, this means that calls to `extern'
>>      functions with no explicit visibility will use the PLT, so it is
>>      more effective to use `__attribute ((visibility))' and/or `#pragma
>>      GCC visibility' to tell the compiler which `extern' declarations
>>      should be treated as hidden.
>>
>> This only seems to work reliably for data definitions; plus the
>> toolchain behaviour may "evolve" with respect to obscure features
>> like this.
>
> That doesn't solve the problem at all.  In this case, we really want
> _all_ data references to go through the GOT, meaning that everything
> would have to be marked extern.  The only references which are OK to be
> PC relative are read-only references, and therefore they can just be
> marked as static const.
>
>> So if we wanted to achieve such a thing reliably, we'd
>> probably need explicit visibility attributes on the affected
>> declarations.
>
> Like I said, it's about all of them.
>
>> The advantage is unlikely to be huge though since the GOT is small anyway;
>> and we wouldn't be able to throw away the GOT relocation code completely,
>> because of the need to relocate bss references...
>
> In fact, all that remains in the GOT, assuming that const data is marked
> static, are .bss references.  Again, for simplicity's sake, we don't
> support initialized and writable global variables as in the XIP case
> those would have to be copied into RAM and the GOT patched accordingly.
> In practice this is not hard to achieve. To ensure that, we simply
> discard the .data early in the linker script.

Sure -- my observations were simply based around the fact that we're
using the tools to do something they don't feel well adapted to,
compared with other tools with a more embedded/bare-metal focus.  So
if there were a better or more correct way to use the tools to get the
results we need, it would be worth considering.  But from the
discussion it sounds like the code already does pretty much the best
thing possible anyway.

Cheers
---Dave


Re: Question on compressed vmlinux .got and .bss sections

2011-04-20 Thread Nicolas Pitre

On Wed, 20 Apr 2011, Dave Martin wrote:

> Hi,
> 
> On Wed, Apr 20, 2011 at 1:42 PM, Nicolas Pitre wrote:
> > On Wed, 20 Apr 2011, Dave Martin wrote:
> >
> >> On Tue, Apr 19, 2011 at 01:33:12PM -0400, Nicolas Pitre wrote:
> >> > You must not use static variable in the decompressor.  For one thing,
> >> > that breaks the ability to XIP the decompressor code and move writable
> >> > data elsewhere.
> >> >
> >> > So the fix is indeed to _not_ declare any global variable as static in
> >> > this case.
> >>
> >> After some thinking about this, I think I agree.
> >>
> >> Having to relocate a GOT-full of addresses many of which are actually at
> >> fixed PC-relative offsets just for this capability is a bit annoying,
> >> but the GNU tools don't support other models very well.
> >
> > You cannot relocate PC-relative offsets at run time.  Those references
> > are spread throughout the code into literal pools.  Forcing all
> > references to go through the GOT makes it possible for the code to
> > relocate selected parts of itself at run time.
> 
> My point was that relocatability implies overhead, and the GOT
> potentially contains a load of relocations for code and read-only data
> which will never get moved in practice.

Sure, for code (already implicit) or read-only data, using GOTOFF relocs
is perfectly fine.  As long as the relevant data is marked const, there
is no problem with also marking it static, at which point the same effect
as -fvisibility=hidden is achieved, i.e. no GOT entries are allocated.
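
As a rough illustration of that distinction, here is a sketch only.  The
names (gzip_magic, output_ptr, bytes_out, has_gzip_magic, put_byte) are
made up for this example and are not taken from the decompressor source,
and the relocation behaviour described in the comments assumes a PIC build:

```c
#include <assert.h>
#include <stddef.h>

/* Read-only data: const and static together are fine.  The object never
 * moves relative to the code, so a GOTOFF-style reference is enough and
 * no GOT entry should be needed. */
static const unsigned char gzip_magic[2] = { 0x1f, 0x8b };

/* Writable data: deliberately NOT static.  In a PIC build these are
 * expected to be reached through the GOT, so they can still be found
 * if the writable data is placed somewhere else at run time. */
unsigned char *output_ptr;
size_t bytes_out;

/* Hypothetical helper: only reads the static const table. */
int has_gzip_magic(const unsigned char *buf, size_t len)
{
    return len >= 2 && buf[0] == gzip_magic[0] && buf[1] == gzip_magic[1];
}

/* Hypothetical helper: touches only the non-static writable state. */
void put_byte(unsigned char c)
{
    assert(output_ptr != NULL);
    *output_ptr++ = c;
    bytes_out++;
}
```

Whether a given reference really ends up in the GOT depends on the
compiler options and linker script in use, so treat the comments above
as the intent rather than a guarantee.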

> For writable/uninitialised data, it's different of course -- we often
> will need to relocate that in real situations (as observed here).  I'd
> guessed that only part of the GOT in the compressed loader was
> addressing such data, but actually, it seems to be pretty much all of
> it, as you suggest.

Yes, and in practice it contains only between 6 and 8 entries depending 
on the config used.  And all of them are references to .bss variables.  
So the overhead is pretty small.
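
To make that cost concrete, the fixup amounts to a loop of roughly this
shape.  This is a simplified C model with made-up names (relocate_got,
bss_start, bss_end, delta), not the decompressor's actual start-up code:

```c
#include <stddef.h>
#include <stdint.h>

/* Adjust every GOT slot that points into [bss_start, bss_end) by delta,
 * leaving slots that refer to code or read-only data untouched.
 * Returns the number of slots that were adjusted. */
size_t relocate_got(uintptr_t *got, size_t n,
                    uintptr_t bss_start, uintptr_t bss_end,
                    uintptr_t delta)
{
    size_t fixed = 0;

    for (size_t i = 0; i < n; i++) {
        if (got[i] >= bss_start && got[i] < bss_end) {
            got[i] += delta;   /* slot now points at the moved .bss */
            fixed++;
        }
    }
    return fixed;
}
```

With only a handful of slots to walk, the loop is a negligible part of
start-up.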


Nicolas

[OT] ti-omap-tools - script for setting up DVSDK 3 with various toolchains

2011-04-20 Thread AJ ONeal

Here's a script for installing TI's DVSDK 3:

https://bitbucket.org/thayne/ti-omap-tools/src

Works with

   - CodeSourcery
   - OpenEmbedded
   - Linaro

It downloads the bazillion dependencies scattered across TI's site and
makes it easier to gut the DVSDK's hard-coded paths to work for your setup.

DVSDK 4 isn't used because it is completely different from DVSDK 3, and
its hard-coded paths and checks are much harder to root out.

AJ ONeal
Linaro GCC 4.5 and 4.6 2011-04 released

2011-04-20 Thread Michael Hope

The Linaro Toolchain Working Group is pleased to announce the release
of both Linaro GCC 4.5 and Linaro GCC 4.6.

Linaro GCC 4.5 2011.04 is the ninth release in the 4.5 series.  Based
off the latest GCC 4.5.2+svn171921, it adds new optimisations, support
for Android, and fixes for many of the issues found in the last month.

Linaro GCC 4.6 2011.04 is the second release in the 4.6 series.  Based
off the latest GCC 4.6.0+svn171921, it is the first supported release
of the new series and includes a significant number of mainstreamed
patches from 4.5.

Interesting changes in 4.6 include:
 * Updates to 4.6.0+r171921
 * Adds conditional store sinking to the vectoriser
 * Brings in a significant number of the Linaro GCC 4.5 patches that
   are in mainline

Interesting changes in 4.5 include:
 * Updates to 4.5.2+r172013
 * Disables the shrink wrap optimisation by default
 * Adds support for swing-modulo scheduling (SMS) on ARM
 * Adds support for Android and the Bionic C library
 * Optimises -fvar-tracking, greatly reducing memory used when
   compiling large files (seen in QEMU)

Fixes:
 * 'volatile' being ignored on volatile struct members
 * A potential register clobber in arm_negdi2
 * An error in libgcc that prevented it being built with -Os
 * Multiple shrink wrap bugs (LP: #731665, 721023, 736081, 758082,
   730860, 736439)
 * LP: #730440 incorrect immediate for movt (seen in Firebird)
 * LP: #728315 extension elimination pass mishandles subregs of
   promoted variables (seen on MIPS)
 * LP: #675347 volatile int causes inline assembly build failure (seen
   in Qt)

SMS is an optimisation that works on innermost loops and reorders the
instructions so that different iterations overlap.  For example, the
values for the next iteration may be loaded during the current one,
so that they are already available when the next iteration starts.

SMS is disabled by default.  To try it, add the options '-fmodulo-sched
-fmodulo-sched-allow-regmoves'.
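
The overlap can be pictured with a hand-written version of what the
scheduler aims for.  This is only a sketch: the function names are made
up for illustration, and the code the compiler actually generates will
differ.

```c
#include <stddef.h>

/* Straightforward loop: each iteration loads, multiplies and stores. */
void scale_plain(int *dst, const int *src, size_t n, int k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* Hand-pipelined equivalent: the load for iteration i+1 is issued while
 * iteration i is still being computed and stored. */
void scale_pipelined(int *dst, const int *src, size_t n, int k)
{
    if (n == 0)
        return;

    int cur = src[0];               /* prologue: load for iteration 0 */
    for (size_t i = 0; i + 1 < n; i++) {
        int next = src[i + 1];      /* load for the next iteration */
        dst[i] = cur * k;           /* work for the current iteration */
        cur = next;
    }
    dst[n - 1] = cur * k;           /* epilogue: finish the last iteration */
}
```

Both functions produce the same results; the second only makes the
prologue, overlapped loop body and epilogue visible in the source.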

The source tarballs are available from:
 https://launchpad.net/gcc-linaro/+milestone/4.5-2011.04-0
 https://launchpad.net/gcc-linaro/+milestone/4.6-2011.04-0

Downloads are available from the Linaro GCC page on Launchpad:
 https://launchpad.net/gcc-linaro

-- Michael

Linaro GDB 7.2 2011-04 released

2011-04-20 Thread Michael Hope

The Linaro Toolchain Working Group is pleased to announce the release
of Linaro GDB 7.2.

Linaro GDB 7.2 2011.04 is the fifth release in the 7.2 series. Based
off the latest GDB 7.2, it includes a number of ARM-focused bug fixes.

This release fixes:
 * LP: #684218 failure to backtrace out of glibc system call stubs
 * LP: #667309 failure to single-step over a bad thumb->arm boundary
 * Accessing the "fpscr" register

The source tarball is available at:
 https://launchpad.net/gdb-linaro/+milestone/7.2-2011.04-0

More information on Linaro GDB is available at:
 https://launchpad.net/gdb-linaro

-- Michael
