On 22/10/2021 11:47, Lin Liu wrote: > The swab() is massively over complicated > Simplify it with compiler builtins and fallback to plain C function > if undefined. > Update components to switch to this new swap bytes. > > <snip> > 34 files changed, 150 insertions(+), 646 deletions(-)
It is worth saying a couple of things. x86's ___arch__swab64 is wrong. Well - it was mostly ok for 32bit builds of Xen, and is not ok for 64bit builds. As a consequence, this series nets an improvement of: $ ../scripts/bloat-o-meter xen-syms-before xen-syms-after add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-54 (-54) Function old new delta elf_access_unsigned 173 151 -22 unlzo 1128 1096 -32 Total: Before=3803328, After=3803274, chg -0.00% because the code generation for bswap64 goes from: ffff82d0402059b0 <_bswap64>: ffff82d0402059b0: 48 89 f8 mov %rdi,%rax ffff82d0402059b3: 48 c1 e8 20 shr $0x20,%rax ffff82d0402059b7: 0f cf bswap %edi ffff82d0402059b9: 0f c8 bswap %eax ffff82d0402059bb: 97 xchg %eax,%edi ffff82d0402059bc: 48 89 c2 mov %rax,%rdx ffff82d0402059bf: 89 f8 mov %edi,%eax ffff82d0402059c1: 48 c1 e2 20 shl $0x20,%rdx ffff82d0402059c5: 48 09 d0 or %rdx,%rax ffff82d0402059c8: c3 retq to ffff82d0402059b0 <_bswap64>: ffff82d0402059b0: 48 89 f8 mov %rdi,%rax ffff82d0402059b3: 48 0f c8 bswap %rax ffff82d0402059b6: c3 retq Almost all byteswapping is done on 32bit quantities, not 64, which is why the delta is so small. However, it also drops 500 lines of code, which is a demonstration of how silly the swab() infrastructure was. It also removes the need for per-arch code to do any of this. I'd say its safe to go into 4.16, but I'll understand if others want to push back on that at this point in the release. ~Andrew
