On 21 November 2012 09:20, Zhenqiang Chen wrote:
> On 21 November 2012 03:26, Michael Hope wrote:
>> On 20 November 2012 22:10, Zhenqiang Chen wrote:
>>> Hi,
>>>
>>> I tried ARM, MIPS, PowerPC and x86 on the povray benchmark. None of
>>> them can shrink-wrap the function Ray_In_Bound.
>>>
>>> Here is the function:
>>> bool Ray_In_Bound (RAY *Ray, OBJECT *Bounding_Object)
>>> {
>>> ...
>>> for (Bound = Bounding_Object; Bound != NULL; Bound = Bound->Sibling)
>>> {...}
>>> return (true);
>>> }
>>> For ARM at -O2/-O3, "Bound" is allocated to "r6" during IRA, so there is
>>> a copy "r6 = r1" before the test "Bound != NULL".
>>
>> Could you hack the benchmark to make the early exit explicit and see
>> if that changes the result? That lets us know if improving shrink
>> wrap is worthwhile.
>>
>> Something like:
>>
>> bool Ray_In_Bound (RAY *Ray, OBJECT *Bounding_Object)
>> {
>> if (Bounding_Object == NULL) return true;
>
> I tried it. The result is the same as the original. (The hacked
> code is optimized away.)
After hacking the assembly code by hand, I got a 2-3% performance
improvement at -O2. Here is the assembly change:
Original code:
push {r4, r5, r6, r7, r8, r9, lr}
.save {r4, r5, r6, r7, r8, r9, lr}
mov r6, r1
.pad #196
sub sp, sp, #196
cbz r1, .L113
ldr r8, .L117
...
.L113:
movs r0, #1
add sp, sp, #196
@ sp needed
pop {r4, r5, r6, r7, r8, r9, pc}
After shrink-wrap:
cbz r1, .L1131
push {r4, r5, r6, r7, r8, r9, lr}
.save {r4, r5, r6, r7, r8, r9, lr}
mov r6, r1
.pad #196
sub sp, sp, #196
ldr r8, .L117
...
.L113:
movs r0, #1
add sp, sp, #196
@ sp needed
pop {r4, r5, r6, r7, r8, r9, pc}
.L1131:
movs r0, #1
bx lr
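
At the source level, the hand edit above corresponds roughly to splitting
the function into a cheap NULL check and a separate slow path, so that the
early exit never touches the stack. The following is only a sketch under
assumptions: the RAY and OBJECT definitions, the Inside hook, the loop body
and the helper name ray_in_bound_slow are hypothetical placeholders (the
real loop body is elided above), and __attribute__((noinline)) is a
GCC/Clang extension. Whether the compiler emits the same code as the
hand-edited assembly, or gives the same speedup, has not been measured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the povray types; the real RAY and OBJECT
   structures are larger and differ in detail. */
typedef struct ray {
    double origin[3];
    double direction[3];
} RAY;

typedef struct object {
    struct object *Sibling;
    /* Hypothetical per-object test, standing in for the elided loop body. */
    bool (*Inside)(const RAY *ray, const struct object *self);
} OBJECT;

/* Slow path: the loop that needs callee-saved registers and stack space.
   noinline (GCC/Clang extension) keeps it from being merged back into
   the wrapper below. */
__attribute__((noinline))
static bool ray_in_bound_slow(RAY *Ray, OBJECT *Bounding_Object)
{
    assert(Bounding_Object != NULL);   /* the wrapper filters out NULL */

    for (OBJECT *Bound = Bounding_Object; Bound != NULL; Bound = Bound->Sibling) {
        /* Placeholder body: reject as soon as one bound fails. */
        if (Bound->Inside != NULL && !Bound->Inside(Ray, Bound))
            return false;
    }
    return true;
}

/* Wrapper: the early exit returns before any prologue work is needed. */
bool Ray_In_Bound(RAY *Ray, OBJECT *Bounding_Object)
{
    if (Bounding_Object == NULL)
        return true;
    return ray_in_bound_slow(Ray, Bounding_Object);
}
```

Calling Ray_In_Bound with a NULL Bounding_Object takes only the wrapper
path; any non-NULL list goes through ray_in_bound_slow.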
But the same simple hack at -O3 shows a ~1% regression. A change in code
alignment is the likely root cause. To verify this, I added 6 NOPs after
"bx lr", which makes block .L1131 16 bytes in size. With that change, -O3
also shows a 2-3% performance improvement.
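
The NOP count can be checked with a little arithmetic, assuming that
"movs r0, #1", "bx lr" and "nop" all use 2-byte (16-bit Thumb) encodings:
2 + 2 + 6 * 2 = 16 bytes. A small sketch of that calculation, with
hypothetical helper names pad_bytes and nops_needed:

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: every instruction in the block is a 2-byte Thumb encoding. */
enum { THUMB16_BYTES = 2 };

/* Bytes of padding needed to grow `size` to the next multiple of `align`.
   `align` must be a non-zero power of two. */
static size_t pad_bytes(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (align - (size & (align - 1))) & (align - 1);
}

/* Number of 2-byte NOPs needed to pad a block of `insns` 2-byte
   instructions out to a multiple of `align` bytes. */
static size_t nops_needed(size_t insns, size_t align)
{
    return pad_bytes(insns * THUMB16_BYTES, align) / THUMB16_BYTES;
}
```

For the two-instruction block .L1131, nops_needed(2, 16) gives 6, which
matches the 6 NOPs used in the experiment.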
-Zhenqiang
___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain