Re: Effect of alignment and peeling on vectorised loops

2011-12-07 Thread Ramana Radhakrishnan
On 30 November 2011 20:28, Michael Hope wrote:
> On Thu, Dec 1, 2011 at 12:20 AM, Ira Rosen wrote:
>> On 30 November 2011 02:33, Michael Hope wrote:
>>
>
> Peeling and using the vld1.i64 {d16-d17}, [r1:64]! form should be
> faster for larger loops.  For some reason vld1.i64 ..., [r1:128] gives
>

Re: Effect of alignment and peeling on vectorised loops

2011-12-01 Thread Ira Rosen
On 30 November 2011 22:28, Michael Hope wrote:
>>> This run also showed the effect of loop unrolling. The loop seems to
>>> be unrolled for loops of <= 64 words and drops off in performance past
>>> around 8 words. When the unrolling finally drops out, performance
>>> increases by 101 %.
>>
>> I

Re: Effect of alignment and peeling on vectorised loops

2011-11-30 Thread Michael Hope
On Thu, Dec 1, 2011 at 12:20 AM, Ira Rosen wrote:
> On 30 November 2011 02:33, Michael Hope wrote:
>
>> I then converted the vld1 and vst1 to specify an alignment of 64
>> bits. See:
>>  http://people.linaro.org/~michaelh/incoming/set-alignment.png
>>
>> This improved the throughput in all cases

Re: Effect of alignment and peeling on vectorised loops

2011-11-30 Thread Michael Hope
On Thu, Dec 1, 2011 at 12:20 AM, Ira Rosen wrote:
> On 30 November 2011 02:33, Michael Hope wrote:
>
>> I then converted the vld1 and vst1 to specify an alignment of 64
>> bits. See:
>>  http://people.linaro.org/~michaelh/incoming/set-alignment.png
>>
>> This improved the throughput in all cases

Re: Effect of alignment and peeling on vectorised loops

2011-11-30 Thread Ira Rosen
On 30 November 2011 02:33, Michael Hope wrote:
> I then converted the vld1 and vst1 to specify an alignment of 64
> bits. See:
>  http://people.linaro.org/~michaelh/incoming/set-alignment.png
>
> This improved the throughput in all cases and in cases for more than 50
> words by 14 %.  This graph
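
For readers new to the term, the peeling discussed in this thread can be sketched in plain C. This is only an illustration under stated assumptions, not the benchmark code or the code GCC emits: the function name add_one_peeled, the add-one loop body, and the four-word main loop are all hypothetical. The idea is to run scalar iterations until the pointer reaches a 64-bit boundary, so that the main loop always starts from an address for which an aligned access form such as [r1:64] would be valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical example: add 1 to every 32-bit word in buf[0..n).
 * Returns the number of peeled (scalar prologue) iterations.
 */
size_t add_one_peeled(uint32_t *buf, size_t n)
{
    size_t i = 0;

    /* Peel: scalar iterations until &buf[i] is 64-bit (8-byte) aligned. */
    while (i < n && ((uintptr_t)&buf[i] & 7u) != 0) {
        buf[i] += 1;
        i++;
    }
    size_t peeled = i;

    /* Main loop: four words per iteration, starting from an aligned
     * address. A vectoriser could use an aligned load/store form here. */
    for (; i + 4 <= n; i += 4) {
        assert(((uintptr_t)&buf[i] & 7u) == 0);
        buf[i + 0] += 1;
        buf[i + 1] += 1;
        buf[i + 2] += 1;
        buf[i + 3] += 1;
    }

    /* Tail: remaining words that do not fill a whole group of four. */
    for (; i < n; i++) {
        buf[i] += 1;
    }

    return peeled;
}
```

The peeled prologue costs a few scalar iterations, which is consistent with the suggestion in the thread that peeling pays off mainly for larger loops.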