On 25/10/13 19:04, Kyrill Tkachov wrote:
> On 24/10/13 20:03, Kugan wrote:
>>
>> Hi Kyrill,
>>
>> It happens for armv5te arm-none-linux-gnueabi. --with-mode=arm
>> --with-arch=armv5te --with-float=soft
>
> Ah ok, I can reproduce it now. So, while I agree that we should add a scan
> for vbit and vbif to these testcases, there seems to be something dodgy
> going on with the register allocation.
>
> With -march=armv5te I'm getting the following snippet of code in the
> ltgt case:
>
> .L12:
> ldr r4, [ip]
> ldr r5, [ip, #4]
> ldr r6, [ip, #8]
> ldr r7, [ip, #12]
> vmov d20, r4, r5 @ v4sf
> vmov d21, r6, r7
> vcgt.f32 q8, q10, q9
> vcgt.f32 q10, q9, q10
> vorr q8, q8, q10
> vmov d22, r4, r5 @ v4sf
> vmov d23, r6, r7
> vbit q11, q9, q8
> vmov r4, r5, d22 @ v4sf
> vmov r6, r7, d23
>
> The second vcgt.f32 trashes q10, then recreates it in q11 with:
> vmov d22, r4, r5 @ v4sf
> vmov d23, r6, r7
>
> so it can do the vbit. Surely there's something better that can be done?
>
> In contrast, with -march=armv7-a we get:
>
> .L12:
> vld1.32 {q9}, [r4]!
> vcgt.f32 q8, q9, q10
> vcgt.f32 q11, q10, q9
> vorr q8, q8, q11
> vbsl q8, q10, q9
> vst1.32 {q8}, [lr]!
>
This is because unaligned access is enabled by default for armv7-a but not
for armv5te. arm.c has the following comment:
/* Enable -munaligned-access by default for
- all ARMv6 architecture-based processors
- ARMv7-A, ARMv7-R, and ARMv7-M architecture-based processors.
- ARMv8 architecture-base processors.
Disable -munaligned-access by default for
- all pre-ARMv6 architecture-based processors
- ARMv6-M architecture-based processors. */
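For reference, the loop being vectorized here is roughly of the following
shape (just a sketch of an LTGT compare-and-select loop; the exact source of
neon-vcond-ltgt.c may differ):

/* Hypothetical reconstruction of the kind of loop in neon-vcond-ltgt.c.
   With -O2 -mfpu=neon -mfloat-abi=softfp the vectorizer turns the LTGT
   compare-and-select into the vcgt/vcgt/vorr/vbsl (or vbit) sequence
   quoted above.  */
void
foo (int n, float *w, float *w2)
{
  int i;
  for (i = 0; i < n; i++)
    w[i] = __builtin_islessgreater (w[i], w2[i]) ? w[i] : w2[i];
}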
Please look at the RTL difference below (- is armv7-a, + is armv5te):
;; vect_var_.18_61 = MEM[(float *)vect_pw2.14_59];
-(insn 71 70 72 (set (reg:V4SF 192)
-        (unspec:V4SF [
-                (mem:V4SF (reg:SI 163 [ ivtmp.47 ]) [0 MEM[(float *)vect_pw2.14_59]+0 S16 A32])
-            ] UNSPEC_MISALIGNED_ACCESS)) neon-vcond-ltgt.c:12 -1
+(insn 71 70 72 (clobber (reg:V4SF 168 [ vect_var_.18 ])) neon-vcond-ltgt.c:12 -1
+     (nil))
+
+(insn 72 71 73 (set (subreg:SI (reg:V4SF 168 [ vect_var_.18 ]) 0)
+        (mem:SI (reg:SI 163 [ ivtmp.47 ]) [0 MEM[(float *)vect_pw2.14_59]+0 S4 A32])) neon-vcond-ltgt.c:12 -1
+     (nil))
+
+(insn 73 72 74 (set (subreg:SI (reg:V4SF 168 [ vect_var_.18 ]) 4)
+        (mem:SI (plus:SI (reg:SI 163 [ ivtmp.47 ])
+                (const_int 4 [0x4])) [0 MEM[(float *)vect_pw2.14_59]+4 S4 A32])) neon-vcond-ltgt.c:12 -1
+     (nil))
+
+(insn 74 73 75 (set (subreg:SI (reg:V4SF 168 [ vect_var_.18 ]) 8)
+        (mem:SI (plus:SI (reg:SI 163 [ ivtmp.47 ])
+                (const_int 8 [0x8])) [0 MEM[(float *)vect_pw2.14_59]+8 S4 A32])) neon-vcond-ltgt.c:12 -1
     (nil))
-(insn 72 71 0 (set (reg:V4SF 168 [ vect_var_.18 ])
-        (reg:V4SF 192)) neon-vcond-ltgt.c:12 -1
+(insn 75 74 0 (set (subreg:SI (reg:V4SF 168 [ vect_var_.18 ]) 12)
+        (mem:SI (plus:SI (reg:SI 163 [ ivtmp.47 ])
+                (const_int 12 [0xc])) [0 MEM[(float *)vect_pw2.14_59]+12 S4 A32])) neon-vcond-ltgt.c:12 -1
     (nil))
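If it helps to confirm this, compiling the armv5te variant with
-munaligned-access added explicitly should (I have not verified this) make
the misaligned V4SF access available again and bring back the
vld1.32/vst1.32 form, along the lines of:

arm-none-linux-gnueabi-gcc -O2 -march=armv5te -mfpu=neon -mfloat-abi=softfp \
    -munaligned-access -S neon-vcond-ltgt.c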