Re: [ACTIVITY] Feb 14 - Feb 17

2011-02-21 Thread Ulrich Weigand
Michael Hope  wrote on 02/20/2011 08:18:22 PM:
> On Sat, Feb 19, 2011 at 7:55 AM, Ulrich Weigand
> >  * Finished testing GDB HW watchpoints patch on vexpress,
> >   submitted complete patch set for mainline inclusion
>
> Let me know when and how and I'll update the build machines.

I've now committed HW watchpoint support to mainline GDB, and
backported to Linaro GDB 7.2 (rev. 32992).

Before you update the build machines, please make sure that
the hardware/kernel on those machines actually support HW
watchpoints.  The simplest way to find out is to verify that
the kernel boot messages include something like:

  hw-breakpoint: found 6 breakpoint and 1 watchpoint registers.
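
As a rough illustration of that check, here is a minimal C sketch that looks
for a line of the form quoted above and extracts the two register counts.
The helper name parse_hw_breakpoint_line is hypothetical (it is not part of
GDB or the kernel), and the exact wording of the kernel message may differ
between kernel versions, so the format string is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scan one line of kernel boot output for the
   "hw-breakpoint: found ..." message quoted above.  Returns 1 and fills
   in the counts if the line matches, 0 otherwise.  The message format is
   assumed from the example line only.  */
static int
parse_hw_breakpoint_line (const char *line, int *breakpoints, int *watchpoints)
{
  static const char tag[] = "hw-breakpoint: found ";
  const char *p;

  assert (line != NULL && breakpoints != NULL && watchpoints != NULL);

  p = strstr (line, tag);
  if (p == NULL)
    return 0;

  /* Both counts must be converted for the line to count as a match.  */
  return sscanf (p + strlen (tag),
                 "%d breakpoint and %d watchpoint registers",
                 breakpoints, watchpoints) == 2;
}
```

A caller would feed it each line of the boot log (for example the output of
dmesg) and treat a match with a nonzero watchpoint count as a sign that HW
watchpoints are likely to be usable.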


Best Regards

Ulrich Weigand

--
  Dr. Ulrich Weigand | Phone: +49-7031/16-3727
  STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
  IBM Deutschland Research & Development GmbH
  Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
  Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294


___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain


Improving the code generated for vld and vst intrinsics

2011-02-21 Thread Richard Sandiford
One of the vectorisation discussions from last year was about the poor
code GCC generates for vld{2,3,4}_*() and vst{2,3,4}_*().  It forces the
result of the loads onto the stack, then loads the individual pieces from
there.  It does the same thing in reverse for stores.

I think there are two major problems here:

1. The result of the vld*() is a record type such as:

typedef struct int16x4x3_t
{
  int16x4_t val[3];
} int16x4x3_t;

   Ideally, we'd like one of these structures to be stored in a pseudo
   register.  However, the ARM port currently limits in-register
   record types to 64 bits, so something this big is always given
   BLKmode and stored on the stack.

   A simple "fix" for this is to increase MAX_FIXED_MODE_SIZE.
   That would do the right thing for the structures in arm_neon.h,
   but wouldn't be safe in general.

2. The vld*() returns values as a single integer (such as EI mode),
   while uses of the value will typically be in a vector mode such
   as V4SI.  CANNOT_CHANGE_MODE_CLASS doesn't allow direct
   "mode-punning" between the two in VFP_REGS, so this again
   forces the punning to be done on the stack.

   The code in question is:

/* FPA registers can't do subreg as all values are reformatted to internal
   precision.  VFP registers may only be accessed in the mode they
   were set.  */
#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)   \
  (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)       \
   ? reg_classes_intersect_p (FPA_REGS, (CLASS))    \
     || reg_classes_intersect_p (VFP_REGS, (CLASS)) \
   : 0)

   However, the VFP restriction appears to be specific to VFPv1 --
   thanks to Peter for the archaeology -- and isn't a problem for v6+.
   In that case, removing this restriction is an important optimisation.
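
The restriction described in (2), and one way of limiting it to VFPv1, can
be modelled in plain C as below.  This is only a sketch: the struct and
function names are hypothetical stand-ins for GCC internals, mode sizes are
passed as byte counts, and the "relaxed" rule assumes the FPU revision is
available as a small integer with 1 meaning VFPv1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a register class: which register files it
   intersects.  Real GCC uses reg_classes_intersect_p for this.  */
struct reg_class_model
{
  bool has_fpa_regs;
  bool has_vfp_regs;
};

/* Model of the current rule: a size-changing mode change is rejected for
   any class that touches FPA or VFP registers.  */
static bool
cannot_change_mode_current (unsigned from_bytes, unsigned to_bytes,
                            struct reg_class_model rc)
{
  return from_bytes != to_bytes && (rc.has_fpa_regs || rc.has_vfp_regs);
}

/* Model of a relaxed rule: the VFP part only applies when targeting VFP
   and the FPU revision is 1 (assumed to mean VFPv1).  */
static bool
cannot_change_mode_relaxed (unsigned from_bytes, unsigned to_bytes,
                            struct reg_class_model rc,
                            bool target_vfp, int fpu_rev)
{
  assert (fpu_rev >= 0);
  return from_bytes != to_bytes
         && (rc.has_fpa_regs
             || (target_vfp && rc.has_vfp_regs && fpu_rev == 1));
}
```

With a VFP-only class, a 24-byte (EImode-sized) to 8-byte (64-bit vector)
change is rejected by the current rule whatever the FPU revision, while the
relaxed rule rejects it only for revision 1.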

I tried the patch below on the following simple testcase:

#include "arm_neon.h"

void
foo (uint16_t *a)
{
  uint16x4x3_t x, y;

  x = vld3_u16 (a);
  y = vld3_u16 (a + 12);
  x.val[0] = vadd_u16 (x.val[0], y.val[0]);
  x.val[1] = vadd_u16 (x.val[1], y.val[1]);
  x.val[2] = vadd_u16 (x.val[2], y.val[2]);
  vst3_u16 (a, x);
}

(not necessarily sensible!).  Before the patch, -O2 produced:

sub       sp, sp, #48
add       r3, r0, #24
vld3.16   {d16-d18}, [r3]
vld3.16   {d20-d22}, [r0]
add       r3, sp, #24
vstmia    sp, {d20-d22}
vstmia    r3, {d16-d18}
fldd      d19, [sp, #8]
fldd      d16, [sp, #0]
fldd      d17, [sp, #24]
fldd      d20, [sp, #32]
vadd.i16  d18, d16, d17
vadd.i16  d17, d19, d20
fldd      d19, [sp, #16]
fldd      d20, [sp, #40]
vadd.i16  d16, d19, d20
fstd      d18, [sp, #0]
fstd      d17, [sp, #8]
fstd      d16, [sp, #16]
vldmia    sp, {d16-d18}
vst3.16   {d16-d18}, [r0]
add       sp, sp, #48
bx        lr

After the patch we get:

vld3.16   {d24-d26}, [r0]
add       r3, r0, #24
vld3.16   {d20-d22}, [r3]
vmov      q8, q12  @ ti
vadd.i16  d17, d17, d21
vadd.i16  d16, d24, d20
vadd.i16  d18, d26, d22
vst3.16   {d16-d18}, [r0]
bx        lr

The VMOV is a bit disappointing, and needs further investigation.
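
As a cross-check when reading either listing, the net effect of foo() can
be written as a scalar loop.  vld3 de-interleaves 12 elements into three
vectors and vst3 re-interleaves them, so the two steps cancel out and each
of the first 12 elements simply has the element 12 places later added to
it.  The name foo_scalar is hypothetical, and the sketch assumes that the
pointer refers to at least 24 elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the testcase: a[i] += a[i + 12] for the first 12
   elements, with the usual wrap-around of 16-bit unsigned arithmetic.
   Assumes A points to at least 24 elements.  */
static void
foo_scalar (uint16_t *a)
{
  assert (a != NULL);

  for (int i = 0; i < 12; i++)
    a[i] = (uint16_t) (a[i] + a[i + 12]);
}
```

Running the intrinsic version and this loop on copies of the same buffer
should leave the two buffers identical.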

The first hunk fixes (2), and I think is correct.  The second hunk
hacks (1), and isn't suitable in itself.  I'll next try to make
arm_neon.h use built-in record types that are explicitly EImode,
which should remove the need to change MAX_FIXED_MODE_SIZE.
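
To put numbers on (1), the sketch below models the size check in plain C.
The type and function names are hypothetical stand-ins, the record is built
from ordinary arrays rather than real NEON types, and the 512-bit figure
for XImode is an assumption about the ARM mode definitions rather than
something stated above.  The real rule also needs an integer mode of
exactly the record's size to exist, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for int16x4_t: four 16-bit lanes, 64 bits in total.  */
typedef struct { int16_t lane[4]; } model_int16x4_t;

/* Same shape as the int16x4x3_t record from arm_neon.h.  */
typedef struct { model_int16x4_t val[3]; } model_int16x4x3_t;

/* Return nonzero if a record of SIZE_BYTES would be allowed an integer
   mode under a limit of MAX_BITS (a model of MAX_FIXED_MODE_SIZE).
   Assumes 8-bit bytes.  */
static int
fits_fixed_mode (size_t size_bytes, size_t max_bits)
{
  assert (max_bits > 0);
  return size_bytes * 8 <= max_bits;
}
```

On a typical ABI the record is 24 bytes (192 bits), so it fails a 64-bit
limit and passes a 512-bit one, which is why raising the limit moves these
structures out of BLKmode.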

Richard


Index: gcc/gcc/config/arm/arm.h
===
--- gcc.orig/gcc/config/arm/arm.h
+++ gcc/gcc/config/arm/arm.h
@@ -1171,10 +1171,12 @@ enum reg_class
 /* FPA registers can't do subreg as all values are reformatted to internal
    precision.  VFP registers may only be accessed in the mode they
    were set.  */
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)  \
-  (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)  \
-   ? reg_classes_intersect_p (FPA_REGS, (CLASS))   \
-     || reg_classes_intersect_p (VFP_REGS, (CLASS))\
+#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+  (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)  \
+   ? (reg_classes_intersect_p (FPA_REGS, (CLASS))  \
+      || (TARGET_VFP   \
+          && reg_classes_intersect_p (VFP_REGS, (CLASS))\
+          && arm_fpu_desc->rev == 1))   \
    : 0)
 
 /* The class value for index registers, and the one for base regs.  */
@@ -2458,4 +2460,6 @@ enum arm_builtins
instruction.  */
 #define MAX_LDM_STM_OPS 4
 
+#define MAX_FIXED_MODE_SIZE GET_MODE_BITSIZE (XImode)
+
 #endif /* ! GCC_ARM_H */


Re: [ACTIVITY] Feb 14 - Feb 17

2011-02-21 Thread Michael Hope
On Tue, Feb 22, 2011 at 5:55 AM, Ulrich Weigand
 wrote:
> Michael Hope  wrote on 02/20/2011 08:18:22 PM:
>> On Sat, Feb 19, 2011 at 7:55 AM, Ulrich Weigand
>> >  * Finished testing GDB HW watchpoints patch on vexpress,
>> >   submitted complete patch set for mainline inclusion
>>
>> Let me know when and how and I'll update the build machines.
>
> I've now committed HW watchpoint support to mainline GDB, and
> backported to Linaro GDB 7.2 (rev. 32992).
>
> Before you update the build machines, please make sure that
> the hardware/kernel on those machines actually support HW
> watchpoints.  The simplest way to find out is to verify that
> the kernel boot messages include something like:
>
>  hw-breakpoint: found 6 breakpoint and 1 watchpoint registers.

Ah.  My main build machines are OMAP3s.  I'll see about updating the
Pandas and using them for GDB builds.

-- Michael
