On 30 November 2011 20:28, Michael Hope wrote:
> On Thu, Dec 1, 2011 at 12:20 AM, Ira Rosen wrote:
>> On 30 November 2011 02:33, Michael Hope wrote:
>>
>
> Peeling and using the vld1.i64 {d16-d17}, [r1:64]! form should be
> faster for larger loops. For some reason vld1.i64 ..., [r1:128] gives
>
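A rough sketch of what "peeling" means here, written in C rather than NEON assembly: handle leading elements one at a time until the pointer reaches a 64-bit boundary, then run the main loop on an address known to be aligned. The function name and the add-one loop body are hypothetical and do not come from the benchmark in this thread. `__builtin_assume_aligned` is a GCC/Clang builtin; whether the compiler then emits the `[r1:64]` form of vld1/vst1 depends on the compiler version and flags, so treat that as an assumption to check in the generated assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: add 1 to every 32-bit word in buf[0..n). */
void add_one_peeled(uint32_t *buf, size_t n)
{
    size_t i = 0;

    /* Peel: process leading words one by one until buf + i is 8-byte aligned. */
    while (i < n && ((uintptr_t)(buf + i) & 7u) != 0) {
        buf[i] += 1;
        i++;
    }

    /* Nothing left for the main loop. */
    if (i == n)
        return;

    /* The peel loop stopped because the address is now 64-bit aligned. */
    assert(((uintptr_t)(buf + i) & 7u) == 0);

    /* Main loop: pass the alignment on to the compiler (GCC/Clang builtin). */
    uint32_t *p = __builtin_assume_aligned(buf + i, 8);
    size_t rest = n - i;
    for (size_t j = 0; j < rest; j++)
        p[j] += 1;
}
```

For 32-bit elements the peel loop runs at most once, so its cost is only noticeable on very short loops.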
On 30 November 2011 22:28, Michael Hope wrote:
>>> This run also showed the effect of loop unrolling. The loop seems to
>>> be unrolled for loops of <= 64 words and drops off in performance past
>>> around 8 words. When the unrolling finally drops out, performance
>>> increases by 101 %.
>>
>> I
On Thu, Dec 1, 2011 at 12:20 AM, Ira Rosen wrote:
> On 30 November 2011 02:33, Michael Hope wrote:
>
>> I then converted the vld1 and vst1 to specify an alignment of 64
>> bits. See:
>> http://people.linaro.org/~michaelh/incoming/set-alignment.png
>>
>> This improved the throughput in all cases
On 30 November 2011 02:33, Michael Hope wrote:
> I then converted the vld1 and vst1 to specify an alignment of 64
> bits. See:
> http://people.linaro.org/~michaelh/incoming/set-alignment.png
>
> This improved the throughput in all cases, and by 14 % for more than
> 50 words. This graph