Hi Tim,
Sorry to have confused you. This stuff is a bit boggling the first 200
times you look at it...
For both 32-bit and 64-bit floating-point, you should use vec_vsx_ld on
both BE and LE machines, and the compiler will take care of doing the
right thing for you in both cases. You do not have
Hello,
I am super confused now.
Scenario 1, what I have in my code:
machine boots in LE.
1) memory: LE
2) I load (vec_ld)
3) register : LE
4) VSU compute in LE
5) I store (vec_st)
6) memory: LE
Scenario 2 (I did not test it, but it is what I would get if I told gcc to compile
for BE):
machine boots in BE
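As a rough sketch of scenario 1 (load, compute, store), assuming all three buffers are 16-byte aligned: the function name add4_aligned is made up for illustration, and the scalar fallback is there only so the snippet also builds on machines without VMX. The intent is that the same source compiles unchanged for BE and LE.

```c
#include <stddef.h>
#if defined(__ALTIVEC__)
#include <altivec.h>
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for 4 floats.
 * Assumption: a, b and dst are all 16-byte aligned. */
static void add4_aligned(const float *a, const float *b, float *dst)
{
#if defined(__ALTIVEC__)
    vector float va = vec_ld(0, a);      /* memory -> register */
    vector float vb = vec_ld(0, b);
    vec_st(vec_add(va, vb), 0, dst);     /* compute, then register -> memory */
#else
    /* Scalar fallback with the same result, used when VMX is unavailable. */
    for (size_t i = 0; i < 4; i++)
        dst[i] = a[i] + b[i];
#endif
}
```

The caller has to provide the alignment, for example by declaring the arrays with _Alignas(16).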
Hi Tim,
Actually, I left out another very good reason why you may want to use
vec_vsx_ld/st. Sorry for forgetting this.
As you saw, vec_ld translates into the lvx instruction. This
instruction loads a sequence of 16 bytes into a vector register. For
big endian, the first byte in memory is load
Thank you very much for this answer.
I know my memory is aligned, so I will use vec_ld/st only.
best
Tim
Hi Tim,
I'll discuss the loads here for simplicity; the situation for stores is
analogous.
There are a couple of differences between lvx and lxvd2x. The most
important one is that lxvd2x supports unaligned loads, while lvx does
not. You'll note that lvx will zero out the lower 4 bits of the
eff
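To illustrate the alignment point: lvx ignores the low 4 bits of the effective address, so a misaligned pointer silently loads from the enclosing 16-byte block instead. Here is a portable C model of that address computation. It is not real VMX code, and both function names are made up for illustration.

```c
#include <stdint.h>

/* Model of the address lvx actually reads from:
 * the low 4 bits of the requested address are cleared. */
static uintptr_t lvx_effective_address(uintptr_t requested)
{
    return requested & ~(uintptr_t)0xF;
}

/* Number of bytes by which the requested address misses 16-byte alignment. */
static unsigned lvx_misalignment(uintptr_t requested)
{
    return (unsigned)(requested & (uintptr_t)0xF);
}
```

In this model a request for address 0x1007 reads the block starting at 0x1000, which is why lvx is only safe when the data is known to be aligned.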
Hello all,
I have an issue/question about using VMX/VSX on a Power8 processor on a little-endian
system.
Using intrinsic functions, if I perform an operation with vec_vsx_ld(…) -
vec_vsx_st(), the compiler will add
a permutation and then perform the operation (memory correctly aligned):
lxvd2x …
xxpermd
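One way to picture the permutation, as a portable C model rather than real VSX code: on LE, lxvd2x delivers the two 64-bit doublewords in the opposite order from what LE element numbering expects, and the permute that follows swaps them back. The type and function names below are hypothetical.

```c
#include <stdint.h>

/* Model of a 128-bit vector register as two 64-bit doublewords. */
typedef struct {
    uint64_t dw[2];
} vec128_model;

/* Models the doubleword swap that the compiler emits after the load. */
static vec128_model swap_doublewords(vec128_model v)
{
    vec128_model r = { { v.dw[1], v.dw[0] } };
    return r;
}

/* Models the two-instruction LE load sequence:
 * the raw load leaves the doublewords swapped, the permute restores them. */
static vec128_model model_le_vsx_load(const uint64_t *mem)
{
    vec128_model raw = { { mem[1], mem[0] } };  /* result of the raw load in this model */
    return swap_doublewords(raw);               /* result after the permute */
}
```

After the swap, element 0 of the modeled register matches mem[0] again, which is the ordering the rest of the LE code expects.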