Re: Effect of alignment and peeling on vectorised loops

2011-11-30 Thread Ira Rosen
On 30 November 2011 02:33, Michael Hope wrote:

> I then converted the vld1 and vst1 to specify an alignment of 64
> bits. See:
>  http://people.linaro.org/~michaelh/incoming/set-alignment.png
>
> This improved throughput in all cases, and by 14 % for more than 50
> words.  This graph also shows the overhead of the runtime
> peeling check.  The blue line is the vectoriser version, which is
> slower to pick up due to the greater per-call overhead.

So, the auto-vectorized code doesn't have the alignment hints (peeling
or not peeling), right? Is this what a hint is supposed to look like:
vld1.i64 {d16-d17}, [r1 :"#_128"], or am I looking for the wrong thing?

I thought that peeling should be useful at least for the hints.

>
> I then went back to the vectoriser and changed the alignment of the
> struct to cause peeling to turn on and off.  See:
>  http://people.linaro.org/~michaelh/incoming/unroll.png
>
> At 200 words, the version without peeling is 2.9 % faster.  This is
> partly due to a fixed count loop turning into a runtime count due to
> unknown alignment.
>
> This run also showed the effect of loop unrolling.  The loop seems to
> be unrolled for loops of <= 64 words and drops off in performance past
> around 8 words.  When the unrolling finally drops out, performance
> increases by 101 %.

I see register spills starting from COUNT=36.

Ira

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain


Re: gcc4.6,how to remove werror

2011-11-30 Thread Dave Martin
On Tue, Nov 29, 2011 at 09:43:01PM +0700, tknv wrote:
> This issue happened when I was compiling an ARM kernel.
> I could not work out where -Werror gets overridden in tools/perf.
> Could you tell me where it is?

In tools/perf/Makefile:

34:# Define WERROR=0 to disable treating any warnings as errors.
69:ifneq ($(WERROR),0)
70: CFLAGS_WERROR := -Werror
105:CFLAGS = -fno-omit-frame-pointer -ggdb3 -Wall -Wextra -std=gnu99 $(CFLAGS_WERROR) $(CFLAGS_OPTIMIZE) -D_FORTIFY_SOURCE=2 $(EXTRA_WARNINGS) $(EXTRA_CFLAGS)

This Makefile _only_ applies to the perf tools.  The rest of the kernel
uses other Makefiles.

The perf tools are not built automatically, so if you are only trying to
build a kernel image then you must have some other problem.


As I suggested previously, can you please run

make V=1

and send the last few lines of output, including the error message *and*
the command which caused it.

Without a precise indication of where the error occurs, I can't offer much
help.

Cheers
---Dave





Re: Effect of alignment and peeling on vectorised loops

2011-11-30 Thread Michael Hope
On Thu, Dec 1, 2011 at 12:20 AM, Ira Rosen wrote:
> On 30 November 2011 02:33, Michael Hope wrote:
>
>> I then converted the vld1 and vst1 to specify an alignment of 64
>> bits. See:
>>  http://people.linaro.org/~michaelh/incoming/set-alignment.png
>>
>> This improved throughput in all cases, and by 14 % for more than 50
>> words.  This graph also shows the overhead of the runtime
>> peeling check.  The blue line is the vectoriser version, which is
>> slower to pick up due to the greater per-call overhead.
>
> So, the auto-vectorized code doesn't have the alignment hints (peeling
> or not peeling), right? Is this what a hint is supposed to look like:
> vld1.i64 {d16-d17}, [r1 :"#_128"], or am I looking for the wrong thing?

Yip.  We currently use a vldmia r1!, {d16-d17} which (on the A9 at
least) only works for aligned values and takes the same time as the
unaligned-friendly vld1.i64 {d16-d17}, [r1]!

> I thought that peeling should be useful at least for the hints.

Peeling and using the vld1.i64 {d16-d17}, [r1:64]! form should be
faster for larger loops.  For some reason vld1.i64 ..., [r1:128] gives
an illegal instruction trap on my board.  Note that the :128 is in
bits.
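A minimal sketch of what peeling for alignment means at the source level, assuming 32-bit words and a 64-bit alignment target. The names peel_count and add_words are hypothetical and are not taken from the benchmark or from GCC; the middle loop simply stands in for the vectorised body that could then use an aligned load.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of leading 32-bit words to handle one at a
   time so that the next element sits on an align_bytes boundary.
   align_bytes must be a power of two; p is assumed to be 4-byte aligned. */
static size_t peel_count(const uint32_t *p, size_t n, size_t align_bytes)
{
    uintptr_t addr = (uintptr_t)p;
    size_t misalign = (size_t)(addr & (align_bytes - 1));
    size_t peel = misalign ? (align_bytes - misalign) / sizeof(uint32_t) : 0;

    /* Never peel more elements than the loop has. */
    return peel < n ? peel : n;
}

/* dst[i] += src[i], written as prologue / aligned body / epilogue. */
static void add_words(uint32_t *dst, const uint32_t *src, size_t n)
{
    size_t i = 0;
    size_t peel = peel_count(dst, n, 8);   /* 8 bytes = 64 bits */

    /* Peeled prologue: runs until dst + i is 64-bit aligned. */
    for (; i < peel; i++)
        dst[i] += src[i];

    /* Main body: every dst access starts on a 64-bit boundary, which is
       where an aligned vector store could be used. */
    for (; i + 2 <= n; i += 2) {
        dst[i]     += src[i];
        dst[i + 1] += src[i + 1];
    }

    /* Epilogue for any leftover element. */
    for (; i < n; i++)
        dst[i] += src[i];
}
```

Note that the peel count depends on the pointer value, so it is only known at run time. Only dst is aligned here; src may still be misaligned.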

>> I then went back to the vectoriser and changed the alignment of the
>> struct to cause peeling to turn on and off.  See:
>>  http://people.linaro.org/~michaelh/incoming/unroll.png
>>
>> At 200 words, the version without peeling is 2.9 % faster.  This is
>> partly due to a fixed count loop turning into a runtime count due to
>> unknown alignment.
>>
>> This run also showed the effect of loop unrolling.  The loop seems to
>> be unrolled for loops of <= 64 words and drops off in performance past
>> around 8 words.  When the unrolling finally drops out, performance
>> increases by 101 %.
>
> I see register spills starting from COUNT=36.

Ah.  Does the vectoriser cost model take register pressure into
account?  How can I turn this on?

-- Michael



Re: Effect of alignment and peeling on vectorised loops

2011-11-30 Thread Michael Hope
On Thu, Dec 1, 2011 at 12:20 AM, Ira Rosen wrote:
> On 30 November 2011 02:33, Michael Hope wrote:
>
>> I then converted the vld1 and vst1 to specify an alignment of 64
>> bits. See:
>>  http://people.linaro.org/~michaelh/incoming/set-alignment.png
>>
>> This improved throughput in all cases, and by 14 % for more than 50
>> words.  This graph also shows the overhead of the runtime
>> peeling check.  The blue line is the vectoriser version, which is
>> slower to pick up due to the greater per-call overhead.
>
> So, the auto-vectorized code doesn't have the alignment hints (peeling
> or not peeling), right? Is this what a hint is supposed to look like:
> vld1.i64 {d16-d17}, [r1 :"#_128"], or am I looking for the wrong thing?

I had a look in the backend and the vld1/vst1 %A operand adds the
alignment if known.  It correctly adds [r1:64] if I feed in an array
of int64s.  The code checks based on MEM_ALIGN and MEM_SIZE of the
operand:
    align = MEM_ALIGN (x) >> 3;        /* MEM_ALIGN is in bits; >> 3 gives bytes */
    memsize = INTVAL (MEM_SIZE (x));   /* size of the access in bytes */

Not sure why the backend generates a vldmia instead of a vld1 though.
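As a rough illustration of that check, here is a standalone sketch of how an alignment qualifier could be chosen from an alignment in bits and an access size in bytes. This is not the GCC source: neon_align_hint_bits and format_address are hypothetical names, and the rule that :128 needs at least a 16-byte access and :256 a 32-byte access is an assumption based on the VLD1/VST1 encodings.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: pick the largest alignment qualifier (in bits) that
   the known alignment and access size allow, or 0 for no qualifier. */
static int neon_align_hint_bits(unsigned mem_align_bits, unsigned memsize_bytes)
{
    unsigned align = mem_align_bits >> 3;   /* bits -> bytes */

    if (memsize_bytes == 32 && align >= 32 && align % 32 == 0)
        return 256;
    if ((memsize_bytes == 16 || memsize_bytes == 32)
        && align >= 16 && align % 16 == 0)
        return 128;
    if (align >= 8 && align % 8 == 0)
        return 64;
    return 0;                               /* alignment too small or unknown */
}

/* Hypothetical helper: print the address operand, e.g. "[r1:64]" or "[r1]". */
static void format_address(char *buf, size_t len, const char *reg, int hint_bits)
{
    if (hint_bits)
        snprintf(buf, len, "[%s:%d]", reg, hint_bits);
    else
        snprintf(buf, len, "[%s]", reg);
}
```

With these assumptions, an access known to be 64-bit aligned gets the [r1:64] form, and one with only 32-bit alignment gets plain [r1].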

-- Michael
