Hi,
PR48250 happens under TARGET_NEON, where DImode is one of the valid
NEON modes. This restricts the legitimate constant indexes to multiples
of 4 (coprocessor load/store). When arm_legitimize_reload_address()
decomposes the [reg+index] reload address into
[(reg+index_high)+index_low], it can therefore cause an ICE later if the
'index_low' part is not aligned to 4.

I'm not sure why the current DImode index is computed as:
low = ((val & 0xf) ^ 0x8) - 0x8;
Sign-extending the low 4 bits into a negative value and then
subtracting it back actually produces indexes that are further off,
e.g. in the supplied testcase, [sp+13] was turned into [(sp+16)-3].
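
To make the arithmetic concrete, here is a small stand-alone C sketch of
how the old expression splits an index. It is not part of the patch;
old_low() and old_high() are names made up for illustration, and it
assumes a two's-complement host.

```c
/* Hypothetical helper mirroring the pre-patch DImode expression:
   keep the low 4 bits of VAL and sign-extend them into [-8, 7].  */
static long old_low(long val)
{
    return ((val & 0xf) ^ 0x8) - 0x8;
}

/* The remainder that gets added to the base register first,
   so that high + low == val.  */
static long old_high(long val)
{
    return val - old_low(val);
}
```

With val = 13 this gives low = -3 and high = 16, i.e. the
[(sp+16)-3] seen in the testcase, and -3 is not a multiple of 4.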

My patch changes the index decomposition to a more straightforward one.
It also hints that the way the other reload address indexes are split,
using and-masks, is not the most effective. Since the address is
computed by addition, subtracting away the parts to obtain low+high
should be the optimal way of giving the largest computable index range.
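
As a rough stand-alone sketch of the new decomposition (again not the
patch itself): new_low(), new_high() and the need_step4 flag are
hypothetical names, with need_step4 standing in for the
(TARGET_NEON || TARGET_THUMB2) test and 255 being the max_idx used in
the patch. A two's-complement host is assumed.

```c
/* Clamp VAL into [-255, 255]; optionally clear the low two bits
   for targets whose index must be a multiple of 4.  */
static long new_low(long val, int need_step4)
{
    const long max_idx = 255;
    long low = (-max_idx <= val && val <= max_idx)
               ? val
               : (val > 0 ? max_idx : -max_idx);

    if (need_step4)
        low &= ~0x3L;   /* two's complement: rounds toward -infinity */
    return low;
}

/* Whatever is left over goes into the base-register addition,
   so that high + low == val.  */
static long new_high(long val, int need_step4)
{
    return val - new_low(val, need_step4);
}
```

With the step-4 masking enabled, 13 now splits into high = 1 and
low = 12, and 1000 splits into high = 748 and low = 252. Note that the
mask rounds negative values downwards, so -13 yields low = -16.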

I have included a few Thumb-2 bits in the patch; I know
arm_legitimize_reload_address() is currently only used under TARGET_ARM,
but I guess that condition might eventually be widened to TARGET_32BIT.

Cross-tested on QEMU without regressions. Is this okay?

Thanks,
Chung-Lin

2011-03-24  Chung-Lin Tang  <clt...@codesourcery.com>

        PR target/48250
        * config/arm/arm.c (arm_legitimize_reload_address): Adjust
        DImode constant index decomposing. Mask out lower 2-bits for
        NEON and Thumb-2.

        testsuite/
        * gcc.target/arm/pr48250.c: New.
Index: config/arm/arm.c
===================================================================
--- config/arm/arm.c    (revision 171379)
+++ config/arm/arm.c    (working copy)
@@ -6416,7 +6416,22 @@
       HOST_WIDE_INT low, high;
 
       if (mode == DImode || (mode == DFmode && TARGET_SOFT_FLOAT))
-       low = ((val & 0xf) ^ 0x8) - 0x8;
+       {
+         /* ??? There may be more adjustments later for Thumb-2,
+            which has a ldrd insn with +-1020 index range.  */
+         int max_idx = 255;
+
+         /* low == val, if val is within range [-max_idx, +max_idx].
+            If not, low is set to the boundary +-max_idx.  */
+         low = (-max_idx <= val && val <= max_idx
+                ? val : (val > 0 ? max_idx : -max_idx));
+
+         /* Thumb-2 ldrd, and NEON coprocessor load/store indexes
+            are in steps of 4, so the least two bits need to be
+            cleared to zero.  */
+         if (TARGET_NEON || TARGET_THUMB2)
+           low &= ~0x3;
+       }
       else if (TARGET_MAVERICK && TARGET_HARD_FLOAT)
        /* Need to be careful, -256 is not a valid offset.  */
        low = val >= 0 ? (val & 0xff) : -((-val) & 0xff);
Index: testsuite/gcc.target/arm/pr48250.c
===================================================================
--- testsuite/gcc.target/arm/pr48250.c  (revision 0)
+++ testsuite/gcc.target/arm/pr48250.c  (revision 0)
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_neon } */
+
+void bar();
+
+struct S
+{
+  unsigned int u32;
+  unsigned long long int u64;
+};
+
+void
+foo ()
+{
+  char a[100];
+  unsigned int ptr = (unsigned int) &a;
+  struct S *unaligned_S;
+  int index;
+
+  ptr = ptr + (ptr & 1 ? 0 : 1);
+  unaligned_S = (struct S *) ptr;
+
+  for (index = 0; index < 3; index++)
+    {
+      switch (index)
+       {
+       case 1:
+         unaligned_S->u64 = 0;
+         bar ();
+       }
+    }
+}
