Re: [lldb-dev] Improve performance of crc32 calculation

2017-04-21 Thread Ed Maste via lldb-dev
On 13 April 2017 at 07:28, Pavel Labath via lldb-dev wrote: > Improving the checksumming speed is definitely a worthwhile contribution, > but be aware that there is a pretty simple way to avoid computing the crc > altogether, and that is to make sure your binaries have a build ID. This is > genera

Re: [lldb-dev] Improve performance of crc32 calculation

2017-04-18 Thread Scott Smith via lldb-dev
Thank you for that clarification. Sounds like we can't change the crc code then. I realized I had been using GNU's gold linker. I switched to linking with lld(-4.0) and now linking uses less than 1/3rd the cpu. It seems that the default hashing (fast == xxHash) is faster than whatever gold was

Re: [lldb-dev] Improve performance of crc32 calculation

2017-04-18 Thread Pavel Labath via lldb-dev
What we need is the ability to connect a stripped version of an SO to one with debug symbols present. Currently there are (at least) two ways to achieve that: - build-id: both SOs have a build-id section with the same value. Normally, that's added by a linker in the final link, and subsequent stri

Re: [lldb-dev] Improve performance of crc32 calculation

2017-04-13 Thread Scott Smith via lldb-dev
Interesting. That saves lldb startup time (after crc improvements/parallelization) by about 1.25 seconds wall clock / 10 seconds cpu time, but increases linking by about 2 seconds of cpu time (and an inconsistent amount of wall clock time). That's only a good tradeoff if you run the debugger a lo

Re: [lldb-dev] Improve performance of crc32 calculation

2017-04-13 Thread Kamil Rytarowski via lldb-dev
There is good crc32c code (assuming we want crc32c) in DPDK (BSD-licensed): http://dpdk.org/browse/dpdk/tree/lib/librte_hash It has a hardware-assisted algorithm for x86 and arm64 (if the hardware supports it), with a fallback to a lookup-table implementation. CRC32 is definitely worth merging wit

Re: [lldb-dev] Improve performance of crc32 calculation

2017-04-13 Thread Pavel Labath via lldb-dev
Improving the checksumming speed is definitely a worthwhile contribution, but be aware that there is a pretty simple way to avoid computing the crc altogether, and that is to make sure your binaries have a build ID. This is generally as simple as adding -Wl,--build-id to your compiler flags. +1 to

Re: [lldb-dev] Improve performance of crc32 calculation

2017-04-12 Thread Zachary Turner via lldb-dev
I know this is outside of your initial goal, but it would be really great if JamCRC could be updated in llvm to be parallel. I see that you're making use of TaskRunner for the parallelism, but that looks pretty generic, so perhaps that could be raised into llvm as well if it helps. Not trying to throw e

Re: [lldb-dev] Improve performance of crc32 calculation

2017-04-12 Thread Scott Smith via lldb-dev
OK, I stripped out the zlib crc algorithm and just left the parallelism + calls to zlib's crc32_combine, but only if we are actually linking with zlib. I left those calls here (rather than folding them into JamCRC) because I'm taking advantage of TaskRunner to parallelize the work. I moved the sys
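The chunk-and-combine idea above can be sketched as follows. This is a minimal illustration, not the code from the patch: the names step, crc32Of, combineNaive and crc32Chunked are hypothetical, the chunks are processed in a plain loop rather than through TaskRunner, and the combine step is a deliberately naive O(len) version written out here instead of a call to zlib's crc32_combine. It relies on the CRC register update being linear over GF(2): feeding crc(A) through len(B) zero bytes and XORing with crc(B) gives crc(A||B) for the standard CRC-32.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Advance a raw CRC register by one input byte (no init value, no final XOR).
// 0xEDB88320 is the reflected form of the standard CRC-32 polynomial.
static uint32_t step(uint32_t reg, uint8_t byte) {
  reg ^= byte;
  for (int bit = 0; bit < 8; ++bit)
    reg = (reg & 1) ? (reg >> 1) ^ 0xEDB88320u : (reg >> 1);
  return reg;
}

// Standard CRC-32 of a buffer (init 0xFFFFFFFF, final XOR 0xFFFFFFFF).
uint32_t crc32Of(const uint8_t *p, size_t n) {
  assert(p != nullptr || n == 0);
  uint32_t reg = 0xFFFFFFFFu;
  for (size_t i = 0; i < n; ++i)
    reg = step(reg, p[i]);
  return reg ^ 0xFFFFFFFFu;
}

// Naive combine (hypothetical helper): returns crc(A||B) given crc(A),
// crc(B) and len(B), by pushing crc(A) through len(B) zero bytes.
uint32_t combineNaive(uint32_t crcA, uint32_t crcB, size_t lenB) {
  for (size_t i = 0; i < lenB; ++i)
    crcA = step(crcA, 0);
  return crcA ^ crcB;
}

// Checksum a buffer chunk by chunk. Each per-chunk CRC depends only on its
// own bytes, so in a real implementation those calls could run on workers.
uint32_t crc32Chunked(const uint8_t *p, size_t n, size_t chunk) {
  assert(chunk > 0);
  uint32_t total = 0; // CRC-32 of the empty string
  for (size_t off = 0; off < n; off += chunk) {
    size_t len = (n - off < chunk) ? n - off : chunk;
    uint32_t part = crc32Of(p + off, len);
    total = combineNaive(total, part, len);
  }
  return total;
}
```

The naive combine costs as much as checksumming the second chunk again, so it would cancel out any gain from parallelism. A production combine such as zlib's crc32_combine applies the same zero-extension in time logarithmic in the chunk length, which is what makes the approach pay off.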

Re: [lldb-dev] Improve performance of crc32 calculation

2017-04-12 Thread Zachary Turner via lldb-dev
It seems like the crc32_combine is not too hard to implement, but we do already have code in LLVM that is predicated on the existence of zlib, so it seems reasonable to leave it that way. And yes, you would submit those changes to llvm-dev. Perhaps the JamCRC implementation could be updated t

Re: [lldb-dev] Improve performance of crc32 calculation

2017-04-12 Thread Scott Smith via lldb-dev
What about the crc combining? I don't feel comfortable reimplementing that on my own. Can I leave that as a feature predicated on zlib? For the JamCRC improvements, I assume I submit that to llvm-dev@ instead? On Wed, Apr 12, 2017 at 12:45 PM, Zachary Turner wrote: > BTW, the JamCRC is used

Re: [lldb-dev] Improve performance of crc32 calculation

2017-04-12 Thread Zachary Turner via lldb-dev
BTW, the JamCRC is used in writing Windows COFF object files, PGO instrumentation, and PDB Debug Info reading / writing, so any work we do to make it faster will benefit many parts of the toolchain. On Wed, Apr 12, 2017 at 12:42 PM Zachary Turner wrote: > It would be nice if we could simply upda

Re: [lldb-dev] Improve performance of crc32 calculation

2017-04-12 Thread Zachary Turner via lldb-dev
It would be nice if we could simply update LLVM's implementation to be faster. Having multiple implementations of the same thing seems undesirable, especially if one (fast) implementation is always superior to some other reason. i.e. there's no reason anyone would ever want to use a slow implemen

Re: [lldb-dev] Improve performance of crc32 calculation

2017-04-12 Thread Scott Smith via lldb-dev
I didn't realize that existed; I just checked and it looks like there's JamCRC which uses the same polynomial. I don't know what "Jam" means in this context, unless it identifies the polynomial somehow? The code is also byte-at-a-time. Would you prefer I use JamCRC support code instead, and the
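For context on the shared polynomial: in the common CRC catalogues, the variant listed as JAMCRC is the standard CRC-32 computation with the final bit inversion left out, so one result is the bitwise complement of the other. The sketch below illustrates that relationship only; crc32Std and jamCrcOf are hypothetical names, and this is not LLVM's JamCRC class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Shared bitwise core: reflected polynomial 0xEDB88320, register starts
// at 0xFFFFFFFF. Returns the raw register with no final XOR applied.
static uint32_t crcRegister(const uint8_t *p, size_t n) {
  assert(p != nullptr || n == 0);
  uint32_t reg = 0xFFFFFFFFu;
  for (size_t i = 0; i < n; ++i) {
    reg ^= p[i];
    for (int bit = 0; bit < 8; ++bit)
      reg = (reg & 1) ? (reg >> 1) ^ 0xEDB88320u : (reg >> 1);
  }
  return reg;
}

// Standard CRC-32: invert the register at the end.
uint32_t crc32Std(const uint8_t *p, size_t n) {
  return crcRegister(p, n) ^ 0xFFFFFFFFu;
}

// JAMCRC as catalogued: same computation, no final inversion.
uint32_t jamCrcOf(const uint8_t *p, size_t n) {
  return crcRegister(p, n);
}
```

With the usual check string "123456789", the catalogued values are 0xCBF43926 for CRC-32 and 0x340BC6D9 for JAMCRC, which are complements of each other.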

Re: [lldb-dev] Improve performance of crc32 calculation

2017-04-12 Thread Zachary Turner via lldb-dev
Zlib is definitely optional and we cannot make it required. Did you check to see if llvm has a crc32 function somewhere in Support? On Wed, Apr 12, 2017 at 12:15 PM Scott Smith via lldb-dev < lldb-dev@lists.llvm.org> wrote: > The algorithm included in ObjectFileELF.cpp performs a byte at a time >

[lldb-dev] Improve performance of crc32 calculation

2017-04-12 Thread Scott Smith via lldb-dev
The algorithm included in ObjectFileELF.cpp performs a byte-at-a-time computation, which causes long pipeline stalls on modern processors. Unfortunately, the polynomial used is not the same one used by the SSE 4.2 instruction set, but there are two ways to make it faster: 1. Work on multiple bytes
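To make the byte-at-a-time bottleneck concrete, here is a hedged sketch contrasting the classic table-driven loop with one well-known multi-byte technique, slicing-by-4. It assumes the standard CRC-32 polynomial in reflected form (0xEDB88320); the function names are hypothetical, and slicing-by-4 is offered only as an example of working on several bytes per iteration, not necessarily the approach the patch takes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Tables = std::array<std::array<uint32_t, 256>, 4>;

// Table 0 is the classic byte-wise table. Table k holds the effect of a
// byte followed by k zero bytes, which lets four lookups run independently.
static Tables makeTables() {
  Tables t{};
  for (uint32_t i = 0; i < 256; ++i) {
    uint32_t c = i;
    for (int bit = 0; bit < 8; ++bit)
      c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : (c >> 1);
    t[0][i] = c;
  }
  for (uint32_t i = 0; i < 256; ++i)
    for (int k = 1; k < 4; ++k)
      t[k][i] = (t[k - 1][i] >> 8) ^ t[0][t[k - 1][i] & 0xFF];
  return t;
}

static const Tables &tables() {
  static const Tables t = makeTables(); // built once on first use
  return t;
}

// Byte-at-a-time: every lookup waits on the result of the previous one.
uint32_t crc32Bytewise(const uint8_t *p, size_t n) {
  assert(p != nullptr || n == 0);
  const Tables &t = tables();
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < n; ++i)
    crc = (crc >> 8) ^ t[0][(crc ^ p[i]) & 0xFF];
  return crc ^ 0xFFFFFFFFu;
}

// Slicing-by-4: fold in four bytes, then do four lookups that do not
// depend on each other, so the CPU can overlap them.
uint32_t crc32Slice4(const uint8_t *p, size_t n) {
  assert(p != nullptr || n == 0);
  const Tables &t = tables();
  uint32_t crc = 0xFFFFFFFFu;
  size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    // Assemble the word byte by byte so the code is endian-independent.
    crc ^= uint32_t(p[i]) | uint32_t(p[i + 1]) << 8 |
           uint32_t(p[i + 2]) << 16 | uint32_t(p[i + 3]) << 24;
    crc = t[3][crc & 0xFF] ^ t[2][(crc >> 8) & 0xFF] ^
          t[1][(crc >> 16) & 0xFF] ^ t[0][crc >> 24];
  }
  for (; i < n; ++i) // leftover tail bytes
    crc = (crc >> 8) ^ t[0][(crc ^ p[i]) & 0xFF];
  return crc ^ 0xFFFFFFFFu;
}
```

Both functions return the same value for any input; the second simply shortens the dependency chain from one lookup per byte to one round of lookups per four bytes, at the cost of 3 KiB of extra tables.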