Re: vec_ld versus vec_vsx_ld on power8

2015-03-13 Thread Bill Schmidt
Hi Tim, Sorry to have confused you. This stuff is a bit boggling the first 200 times you look at it... For both 32-bit and 64-bit floating-point, you should use vec_vsx_ld on both BE and LE machines, and the compiler will take care of doing the right thing for you in both cases. You do not have
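
A minimal sketch of that advice, assuming GCC's <altivec.h> and a VSX-enabled build (for example -mcpu=power8). The function name add_f64 is hypothetical and not from the thread. On targets without VSX the same function falls back to a scalar loop, so the sketch still compiles elsewhere.

```c
#include <stddef.h>

#if defined(__VSX__)
#include <altivec.h>
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for n doubles.
   The source code is identical for BE and LE builds; per the advice
   above, the compiler is left to handle element ordering. */
static void add_f64(const double *a, const double *b, double *out, size_t n)
{
    size_t i = 0;
#if defined(__VSX__)
    /* Two doubles per 16-byte vector; vec_vsx_ld/vec_vsx_st do not
       require 16-byte alignment. */
    for (; i + 2 <= n; i += 2) {
        __vector double va = vec_vsx_ld(0, &a[i]);
        __vector double vb = vec_vsx_ld(0, &b[i]);
        vec_vsx_st(vec_add(va, vb), 0, &out[i]);
    }
#endif
    /* Scalar tail, and the whole loop when VSX is unavailable. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

Calling add_f64(a, b, out, 4) on four-element arrays should give the same element-wise sums on either endianness.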

Re: vec_ld versus vec_vsx_ld on power8

2015-03-13 Thread Ewart Timothée
Hello, I am super confused now. Scenario 1, what I have in my code: machine boots in LE. 1) memory: LE 2) I load (vec_ld) 3) register: LE 4) VSU computes in LE 5) I store (vec_st) 6) memory: LE. Scenario 2 (I did not test it, but it is what I get if I tell gcc to compile for BE): machine boots in BE

Re: vec_ld versus vec_vsx_ld on power8

2015-03-13 Thread Bill Schmidt
Hi Tim, Actually, I left out another very good reason why you may want to use vec_vsx_ld/st. Sorry for forgetting this. As you saw, vec_ld translates into the lvx instruction. This instruction loads a sequence of 16 bytes into a vector register. For big endian, the first byte in memory is load

Re: vec_ld versus vec_vsx_ld on power8

2015-03-13 Thread Ewart Timothée
Thank you very much for this answer. I know my memory is aligned, so I will use vec_ld/st only. Best, Tim

vec_ld versus vec_vsx_ld on power8

2015-03-13 Thread Bill Schmidt
Hi Tim, I'll discuss the loads here for simplicity; the situation for stores is analogous. There are a couple of differences between lvx and lxvd2x. The most important one is that lxvd2x supports unaligned loads, while lvx does not. You'll note that lvx will zero out the lower 4 bits of the eff
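
The alignment difference described here can be illustrated without Power hardware. The sketch below is plain C and the helper names are hypothetical; it only models the address masking that lvx performs (clearing the low 4 bits of the address), not the load itself.

```c
#include <stdint.h>

/* Hypothetical helper: models how lvx forms its effective address by
   clearing the low 4 bits, i.e. rounding down to a 16-byte boundary. */
static uintptr_t lvx_effective_address(uintptr_t addr)
{
    return addr & ~(uintptr_t)0xF;
}

/* Nonzero when an lvx at addr would read exactly the 16 bytes at addr. */
static int is_16_byte_aligned(uintptr_t addr)
{
    return lvx_effective_address(addr) == addr;
}
```

For an address such as 0x1008, the model returns 0x1000, so an lvx-style load would fetch data starting 8 bytes before the requested address, whereas a load that supports unaligned access would start at 0x1008 itself.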

vec_ld versus vec_vsx_ld on power8

2015-03-13 Thread Ewart Timothée
Hello all, I have an issue/question about using VMX/VSX on a Power8 processor on a little-endian system. Using intrinsic functions, if I perform an operation with vec_vsx_ld(…) - vec_vsx_st(), the compiler will add a permutation, and then perform an operation (memory correctly aligned) lxvd2x … xxpermd