Re: DCC device driver and gdb server

2010-10-11 Thread Øyvind Harboe
On Mon, Oct 11, 2010 at 12:36 AM, Michael Hope wrote:
> Hi Øyvind.  You might want to ask the OpenOCD list about this.  I've
> used the DCC as an end user before for sending printf() style messages
> and RPC over and it's quite good for that.

I did, no response. (I'm also one of the main OpenOCD maintainers :-)

> I'm not sure how your proposal would work though.  My understanding is
> that the DCC is only accessible over JTAG and, if you already have a
> JTAG connection, then you might as well use that to debug.

OpenOCD / JTAG is not great for application debugging.

What I envisage is to use gdbserver for application debugging and
JTAG for kernel debugging. gdbserver would talk over JTAG DCC
serial port and kernel debugging would happen via normal JTAG
debugging.

In "normal" JTAG debugging mode the CPU is halted.

Here are some commercial implementations of kernel + application
debugging:

www.arium.com/pdf/CompleteLinuxDebugging.pdf

(This one is in German, but with a bit of googling, you'll find an English
one of the same PDF).

http://www.lauterbach.com/publications/integriertes_run_und_stop_mode_debugging_fuer_embedded_linux.pdf

-- 
Øyvind Harboe
US toll free 1-866-980-3434 / International +47 51 63 25 00
http://www.zylin.com/zy1000.html
ARM7 ARM9 ARM11 XScale Cortex
JTAG debugger and flash programmer

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain


NEON vectorization: use of specialized load/store instructions

2010-10-11 Thread Julian Brown
I meant to send this to the "external" Linaro toolchain mailing list,
not the internal CS one. Apologies to those who receive it twice!

In a follow-up message, Joseph Myers pointed out a post he'd written
previously on the same subject:

  http://gcc.gnu.org/ml/gcc-patches/2010-06/msg00409.html

In further followups (at the risk of misrepresenting Joseph & Paul
Brook's opinions!), there seemed to be general agreement that a scheme
something like the one outlined below, with "permuting" loads/stores and
some way of handling multiple in-register layouts for vectors, will be
a necessary addition to the vectorizer going forward.

Julian

Begin forwarded message:

Date: Thu, 7 Oct 2010 16:45:17 +0100
From: Julian Brown
To: Ira Rosen
Cc: Tejas Belagod, Linaro List
Subject: [gnu-linaro-tools] NEON vectorization: use of specialized load/store instructions


Hi,

We're having some system issues, so I thought I'd take the chance to
write down some things I've been thinking about re: utilising the NEON
load/store instructions more effectively. I've also attempted to
summarize the problems with big-endian mode. All unverified as of yet,
so please take with a pinch of salt :-). Comments appreciated. It's
been a while since I last thought about some of this stuff...

Cheers,

Julian

Use of specialized load instructions


GCC lacks a couple of key features needed to provide good support for
NEON's element and structure load/store instructions:

1. A good way of representing a set of two, three or four vector
registers (either D- or Q-sized), possibly with non-unit stride.

2. A generalised mapping between memory locations and lane numbers.

To start with point 1: currently the element and structure load/store
instructions are only supported via intrinsics. These are specified to
load and store as if going via an array embedded in a union, i.e.:

typedef struct int8x8x2_t
{
  int8x8_t val[2];
} int8x8x2_t;

__extension__ static __inline int8x8x2_t __attribute__
((__always_inline__)) vld2_s8 (const int8_t * __a)
{
  union { int8x8x2_t __i; __builtin_neon_ti __o; } __rv;
  __rv.__o = __builtin_neon_vld2v8qi ((const __builtin_neon_qi *) __a);
  return __rv.__i;
}

Even for a trivial test program, e.g.:

#include <arm_neon.h>

int foo (int8_t *x)
{
  int8x8x2_t result = vld2_s8 (x);
  return vget_lane_s8 (result.val[0], 1);
}

We will generate code like so:

sub     sp, sp, #32
vld2.8  {d16-d17}, [r0]
mov     r3, sp
vstmia  sp, {d16-d17}
add     ip, sp, #16
ldmia   r3, {r0, r1, r2, r3}
stmia   ip, {r0, r1, r2, r3}
fldd    d16, [sp, #16]
vmov.s8 r0, d16[1]
add     sp, sp, #32
bx      lr

I.e., rather than being used directly, the registers loaded by vld2
will always be spilled to the stack then reloaded. This obviously
reduces the usefulness of these intrinsics by a large factor. With some
planning, it'd be good to find a powerful enough solution to this
problem so that the same representation for multiple registers can be
used by the autovectorizer as well as the intrinsic-handling code.
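
For readers less familiar with vld2, the data movement that the test
program above relies on can be modelled in portable C. This is only a
scalar sketch of the de-interleaving behaviour, assuming little-endian
lane numbering; the names model_int8x8x2, model_vld2_s8 and model_foo
are hypothetical and are not part of arm_neon.h or GCC.

```c
#include <stdint.h>

/* Hypothetical scalar stand-in for int8x8x2_t: two vectors of 8 lanes. */
typedef struct model_int8x8x2 {
    int8_t val[2][8];
} model_int8x8x2;

/* Model of a 2-element structure load: 16 consecutive bytes are
   de-interleaved, even-indexed bytes into val[0] and odd-indexed
   bytes into val[1]. */
model_int8x8x2 model_vld2_s8(const int8_t *a)
{
    model_int8x8x2 r;
    for (int i = 0; i < 8; i++) {
        r.val[0][i] = a[2 * i];
        r.val[1][i] = a[2 * i + 1];
    }
    return r;
}

/* Scalar counterpart of foo() above: lane 1 of the first vector. */
int model_foo(const int8_t *x)
{
    model_int8x8x2 result = model_vld2_s8(x);
    return result.val[0][1];
}
```

In this model, lane 1 of val[0] is simply x[2], which is why one would
hope for the whole of foo to reduce to the vld2 and a single lane
extract, with no trip through the stack.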

(One difficulty is that the "foo.val[X]" interface should still be
available to user code. There's probably no need for "val" to literally
be an array, though other representations would require front-end
changes).

Assuming it's hard for the register allocator to deal with
highly-constrained situations like requiring four consecutive
registers, one (ugly) possibility might be to run a pass before
register allocation, looking for "big" multi-register vectors and
pre-allocating them to hard registers. Even using a fixed allocation of
a single set of registers (e.g. make it so that all multi-reg
loads/stores larger than a Q register must use d0-d7, or whatever)
would probably give better code than what we produce at present, in
most cases.
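
To make the constraint concrete, here is a minimal sketch of the kind of
search such a pre-allocation pass would need: finding a set of D
registers at a fixed stride that are all free. It assumes a bank of 32 D
registers tracked as a bitmask; the function name find_reg_run and the
bitmask representation are purely illustrative and do not correspond to
anything in GCC's register allocator.

```c
#include <stdint.h>

/* free_mask: bit n set means register dn is free (d0..d31 assumed).
   Returns the lowest first register such that first, first + stride,
   ..., first + (count - 1) * stride are all free, or -1 if none. */
int find_reg_run(uint32_t free_mask, int count, int stride)
{
    if (count < 1 || stride < 1)
        return -1;

    int span = (count - 1) * stride;  /* distance from first to last reg */

    for (int first = 0; first + span < 32; first++) {
        int ok = 1;
        for (int k = 0; k < count; k++) {
            if (!((free_mask >> (first + k * stride)) & 1u)) {
                ok = 0;
                break;
            }
        }
        if (ok)
            return first;
    }
    return -1;
}
```

A stride of 1 corresponds to a run such as d4-d7, and a stride of 2 to a
set such as d0, d2, d4, d6. The fixed-allocation idea above amounts to
skipping the search and always answering with the same register set.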

Now, point 2. To start with, an aside: AIUI, there is currently an
assumption in the vectoriser code that increasing element numbers in
vector registers correspond to increasing addresses when those
registers are loaded from and stored to memory (as if the vector was a
short array, or alternatively as if a union of the vector register and
an array of element-types had the same numberings for lanes and array
indices corresponding to the same elements). Unfortunately that is only
true for NEON in little-endian mode: in big-endian mode, the story is
more complicated, for reasons I will try to explain.
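
The assumption described above can be written down as a small check. The
sketch below uses GCC's generic vector extension rather than NEON
intrinsics, so it can be built on any host with GCC or Clang; the type
name v8hi_model and the function lanes_follow_memory_order are
hypothetical. It models only the "union of a vector and an array" view,
not the NEON architectural lane numbering.

```c
#include <stdint.h>
#include <string.h>

/* Eight 16-bit lanes in a 128-bit generic vector (GCC extension). */
typedef int16_t v8hi_model __attribute__((vector_size(16)));

/* Returns 1 if, after copying mem into the union, lane i of the vector
   view holds the same element as index i of the array view, for all i. */
int lanes_follow_memory_order(const int16_t mem[8])
{
    union {
        v8hi_model v;
        int16_t a[8];
    } u;

    memcpy(u.a, mem, sizeof u.a);

    for (int i = 0; i < 8; i++) {
        if (u.v[i] != mem[i])
            return 0;
    }
    return 1;
}
```

Generic vector subscripts are documented to behave like array
subscripts, so this check is expected to pass. The difficulty described
here is that the lane numbers used by the NEON instructions themselves
do not necessarily agree with this view in big-endian mode.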

To remain compliant with the soft-float variant of the ARM EABI, we
must pass vector register arguments in ARM registers (or the stack),
not vector registers. This means that we must be very careful with the
ordering of elements for values passed to functions. Consider the
trivial function:

int __attribute__((noinline)) qux (int16x8_t x)
{
  x = vaddq_s16 (x, x);
  return vgetq_lane_s16 (x, 1);
}

This is compiled by GCC to the following (slightly unimpressively):

vmov 

Re: DCC device driver and gdb server

2010-10-11 Thread Michael Hope
On Mon, Oct 11, 2010 at 8:06 PM, Øyvind Harboe wrote:
> On Mon, Oct 11, 2010 at 12:36 AM, Michael Hope wrote:
>> Hi Øyvind.  You might want to ask the OpenOCD list about this.  I've
>> used the DCC as an end user before for sending printf() style messages
>> and RPC over and it's quite good for that.
>
> I did, no response. (I'm also one of the main OpenOCD maintainers :-)

Heh.  I spotted the @zylin address but didn't twig past that.

>> I'm not sure how your proposal would work though.  My understanding is
>> that the DCC is only accessible over JTAG and, if you already have a
>> JTAG connection, then you might as well use that to debug.
>
> OpenOCD / JTAG is not great for application debugging.
>
> What I envisage is to use gdbserver for application debugging and
> JTAG for kernel debugging. gdbserver would talk over JTAG DCC
> serial port and kernel debugging would happen via normal JTAG
> debugging.
>
> In "normal" JTAG debugging mode the CPU is halted.
>
> Here are some commercial implementations of kernel + application
> debugging:
>
> www.arium.com/pdf/CompleteLinuxDebugging.pdf

Ah, I see.  You use the JTAG unit and on-chip debug hardware for the
kernel and use the JTAG-supplied DCC connection for applications.  So
you can use the DCC instead of Ethernet or serial, which is nice as it
frees up a port.

You might want to ask linaro-...@lists about this and see if any of
the kernel guys know about kernel support.  A quick poke about shows
that you can send the kernel's printf() style debug messages through
it but little else.

-- Michael



Re: DCC device driver and gdb server

2010-10-11 Thread Nicolas Pitre
On Tue, 12 Oct 2010, Michael Hope wrote:

> Ah, I see.  You use the JTAG unit and on-chip debug hardware for the
> kernel and use the JTAG-supplied DCC connection for applications.  So
> you can use the DCC instead of Ethernet or serial, which is nice as it
> frees up a port.
> 
> You might want to ask linaro-...@lists about this and see if any of
> the kernel guys know about kernel support.  A quick poke about shows
> that you can send the kernel's printf() style debug messages through
> it but little else.

Some patches were posted by Daniel Walker recently to expose the DCC as 
a true Linux serial port, and not only a debug channel for early 
printk's as it is right now.


Nicolas
