On Mon, Jun 8, 2015 at 12:57 PM, Oded Gabbay <[email protected]> wrote:
> On Wed, Jun 3, 2015 at 6:42 AM, Siarhei Siamashka
> <[email protected]> wrote:
>>> + AVV (endian_xor.c[1]),0);
>>> + perm = vec_xor (perm,(vector unsigned char) AVV (
>>> + 0x00, 0x00, 0x00, 0x00, 0x04, 0x04, 0x04, 0x04,
>>> + 0x08, 0x08, 0x08, 0x08, 0x0C, 0x0C, 0x0C, 0x0C));
>>> + return vec_perm (pix, pix, perm);
>>> }
>>
>> For this part, both the original and the patched code resulted in
>> identical instruction sequences:
>>
>> 0000000000000000 <.vmx_splat_alpha>:
>> 0: 3d 22 00 00 addis r9,r2,0
>> 4: 39 29 00 00 addi r9,r9,0
>> 8: 7c 00 48 ce lvx v0,0,r9
>> c: 10 42 10 2b vperm v2,v2,v2,v0
>> 10: 4e 80 00 20 blr
>>
>> This is actually good. I was afraid that the compiler might screw up
>> it a bit and do something stupid like adding an extra VXOR instruction
>> here (for the 'vec_xor' intrinsic).
>>
>
> Actually, I get a different disassembly:
>
> 0000000000007b10 <vmx_splat_alpha>:
> 7b10: 00 00 4c 3c addis r2,r12,0
> 7b14: 00 00 42 38 addi r2,r2,0
> 7b18: 00 00 22 3d addis r9,r2,0
> 7b1c: 0c 03 23 10 vspltisb v1,3
> 7b20: 00 00 29 39 addi r9,r9,0
> 7b24: 99 4e 00 7c lxvd2x vs32,0,r9
> 7b28: 57 02 00 f0 xxswapd vs32,vs32
> 7b2c: d7 04 01 f0 xxlxor vs32,vs33,vs32
> 7b30: 17 05 00 f0 xxlnor vs32,vs32,vs32
> 7b34: 2b 10 42 10 vperm v2,v2,v2,v0
> 7b38: 20 00 80 4e blr
>
> And without the patch, I get this:
>
> 0000000000007930 <vmx_splat_alpha>:
> 7930: 00 00 4c 3c addis r2,r12,0
> 7934: 00 00 42 38 addi r2,r2,0
> 7938: 00 00 22 3d addis r9,r2,0
> 793c: 00 00 29 39 addi r9,r9,0
> 7940: 98 4e 00 7c lxvd2x vs0,0,r9
> 7944: 50 02 00 f0 xxswapd vs0,vs0
> 7948: 11 05 00 f0 xxlnor vs32,vs0,vs0
> 794c: 2b 10 42 10 vperm v2,v2,v2,v0
> 7950: 20 00 80 4e blr
>
> So there is an added vspltisb + xxlxor command.
> I used the default configure+make.
> Maybe I need to define some special flag to the compiler ?
>
> This is my gcc version:
> gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9)
> I'm running RHEL 7.1 ppc64le on POWER8 machine.
>
> Oded
I now understand where my confusion came from. The disassembly you showed is for ppc64 (big endian), while I work on ppc64le.

On ppc64le, the compiler adds two instructions even without the patch:

  xxswapd vs0,vs0       <-- swap the perm vector after loading it from memory
  xxlnor  vs32,vs0,vs0  <-- NOR the perm vector with itself (i.e. complement it) before the vperm instruction

From what I understand, there is no way to eliminate these two short of writing inline assembly. As I said, the patch adds two more instructions, vspltisb + xxlxor. It is therefore definitely better to remove them and keep the ppc64le overhead at 2 instructions instead of 4. Using the #ifdef BIG will eliminate them.

Oded

_______________________________________________
Pixman mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/pixman
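The following is a small scalar C model of the trade-off above. It is a sketch, not pixman's actual AltiVec code. It models vec_perm (pix, pix, perm) as out[i] = pix[perm[i] & 0x0F] and assumes the permute indices count bytes in array (memory) order.

The two masks are built in different ways:
- The patch-style mask XORs the base indices with a splatted endian value at run time. In this model that value is 0 on big endian and 3 on little endian; the real code loads it from endian_xor.
- The #ifdef-style mask is a single constant.

The helper names (perm_same, mask_runtime, mask_static, host_endian_xor) are hypothetical. The compile-time branch uses GCC's predefined __BYTE_ORDER__ macro as a stand-in for the "#ifdef BIG" check mentioned above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of vec_perm (pix, pix, perm): with both inputs equal,
 * result byte i is pix[perm[i] & 0x0F] (assumed memory-order indices). */
static void perm_same (const uint8_t pix[16], const uint8_t perm[16],
                       uint8_t out[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = pix[perm[i] & 0x0F];
}

/* Patch-style mask: base ^ splat (endian_xor), computed at run time.
 * Here endian_xor is 0 on big endian (alpha is byte 0 of each 32-bit
 * pixel) and 3 on little endian (alpha is byte 3). In the vector code
 * this XOR is what costs the extra vspltisb + xxlxor. */
static void mask_runtime (uint8_t endian_xor, uint8_t perm[16])
{
    static const uint8_t base[16] = {
        0x00, 0x00, 0x00, 0x00, 0x04, 0x04, 0x04, 0x04,
        0x08, 0x08, 0x08, 0x08, 0x0C, 0x0C, 0x0C, 0x0C };

    assert (endian_xor == 0 || endian_xor == 3);
    for (int i = 0; i < 16; i++)
        perm[i] = base[i] ^ endian_xor;
}

/* #ifdef-style mask: one constant picked by the preprocessor, so nothing
 * is computed at run time. Assumption: GCC/Clang __BYTE_ORDER__; if the
 * macro is missing, this falls back to the little-endian mask. */
static const uint8_t mask_static[16] = {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    0x00, 0x00, 0x00, 0x00, 0x04, 0x04, 0x04, 0x04,
    0x08, 0x08, 0x08, 0x08, 0x0C, 0x0C, 0x0C, 0x0C
#else
    0x03, 0x03, 0x03, 0x03, 0x07, 0x07, 0x07, 0x07,
    0x0B, 0x0B, 0x0B, 0x0B, 0x0F, 0x0F, 0x0F, 0x0F
#endif
};

/* Runtime endianness probe returning the XOR value: 3 on LE, 0 on BE. */
static uint8_t host_endian_xor (void)
{
    const uint32_t probe = 1;
    uint8_t first;

    memcpy (&first, &probe, 1);
    return first == 1 ? 3 : 0;
}
```

For example, mask_runtime (host_endian_xor (), perm) yields the same 16 bytes as mask_static. Applying either mask with perm_same to four ARGB pixels replicates each pixel's alpha into all four of its bytes. The difference lies only in where the mask comes from. With the preprocessor choice, the loaded constant already carries the endian-specific indices. Only the xxswapd/xxlnor pair that ppc64le adds around vec_perm would then remain.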
