Improving the checksumming speed is definitely a worthwhile contribution, but be aware that there is a pretty simple way to avoid computing the crc altogether, and that is to make sure your binaries have a build ID. This is generally as simple as adding -Wl,--build-id to your compiler flags.
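To check whether a binary already carries a build ID, `readelf -n` lists it as "Build ID" among the notes. The note layout is also simple enough to scan by hand. This is a minimal sketch, not lldb's code. It assumes you already have the raw bytes of a note section such as .note.gnu.build-id, in host byte order. FindGnuBuildId, Align4 and kNtGnuBuildId are hypothetical names used only for this illustration.

Each note is a 12-byte header (name size, descriptor size, type) followed by the name and then the descriptor, each padded to 4 bytes. The GNU build ID is the note named "GNU" with type 3 (NT_GNU_BUILD_ID).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// NT_GNU_BUILD_ID from <elf.h>, repeated here so the sketch has no
// platform-specific dependency.
constexpr uint32_t kNtGnuBuildId = 3;

// Note names and descriptors are padded to 4-byte boundaries.
static size_t Align4(size_t n) { return (n + 3) & ~size_t(3); }

// Scans raw note data (e.g. the contents of .note.gnu.build-id) and returns
// the build ID bytes if a GNU build ID note is present. Assumes the data is
// in host byte order; returns std::nullopt on truncated input.
std::optional<std::vector<uint8_t>> FindGnuBuildId(const uint8_t *data,
                                                   size_t size) {
  size_t offset = 0;
  while (size - offset >= 12) {  // room for a full note header
    uint32_t namesz, descsz, type;
    std::memcpy(&namesz, data + offset, 4);
    std::memcpy(&descsz, data + offset + 4, 4);
    std::memcpy(&type, data + offset + 8, 4);
    offset += 12;

    size_t name_len = Align4(namesz);
    if (name_len > size - offset) return std::nullopt;  // truncated name
    const uint8_t *name = data + offset;
    offset += name_len;

    size_t desc_len = Align4(descsz);
    if (desc_len > size - offset) return std::nullopt;  // truncated descriptor
    const uint8_t *desc = data + offset;
    offset += desc_len;

    // The build ID note is named "GNU\0" (namesz == 4) and has type 3.
    if (type == kNtGnuBuildId && namesz == 4 &&
        std::memcmp(name, "GNU", 4) == 0)
      return std::vector<uint8_t>(desc, desc + descsz);
  }
  return std::nullopt;  // no build ID note found
}
```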
+1 to moving the checksumming code to llvm

pl

On 13 April 2017 at 07:20, Zachary Turner via lldb-dev <lldb-dev@lists.llvm.org> wrote:
> I know this is outside of your initial goal, but it would be really great
> if JamCRC could be updated in llvm to be parallel. I see that you're making
> use of TaskRunner for the parallelism, but that looks pretty generic, so
> perhaps that could be raised into llvm as well if it helps.
>
> Not trying to throw extra work on you, but it seems like a really good
> general-purpose improvement and it would be a shame if only lldb could
> benefit from it.
>
> On Wed, Apr 12, 2017 at 8:35 PM Scott Smith via lldb-dev <lldb-dev@lists.llvm.org> wrote:
>
>> Ok, I stripped out the zlib crc algorithm and just left the parallelism +
>> calls to zlib's crc32_combine, but only if we are actually linking with
>> zlib. I left those calls here (rather than folding them into JamCRC)
>> because I'm taking advantage of TaskRunner to parallelize the work.
>>
>> I moved the system include block after the llvm includes, both because I
>> had to (to use the config #defines), and because it fit the published
>> coding convention.
>>
>> By itself, it reduces my test time from 55 to 47 seconds. (The original
>> time is slower than before because I pulled the latest code; I guess
>> there's another slowdown to fix.)
>>
>> On Wed, Apr 12, 2017 at 12:15 PM, Scott Smith <scott.sm...@purestorage.com> wrote:
>>
>>> The algorithm included in ObjectFileELF.cpp performs a byte-at-a-time
>>> computation, which causes long pipeline stalls in modern processors.
>>> Unfortunately, the polynomial used is not the same one used by the SSE 4.2
>>> instruction set, but there are two ways to make it faster:
>>>
>>> 1. Work on multiple bytes at a time, using multiple lookup tables. (see
>>> http://create.stephan-brumme.com/crc32/#slicing-by-8-overview)
>>> 2. Compute crcs over separate regions in parallel, then combine the
>>> results. (see http://stackoverflow.com/questions/23122312/crc-calculation-of-a-mostly-static-data-stream)
>>>
>>> As it happens, zlib provides functions for both:
>>> 1. The zlib crc32 function uses the same polynomial as
>>> ObjectFileELF.cpp, and uses slicing-by-4 along with loop unrolling.
>>> 2. The zlib library provides crc32_combine.
>>>
>>> I decided to just call out to the zlib library, since I see my version
>>> of lldb already links with zlib; however, the llvm CMakeLists.txt declares
>>> it optional.
>>>
>>> I'm including my patch that assumes zlib is always linked in. Let me
>>> know if you prefer:
>>> 1. I make the change conditional on having zlib (i.e. fall back to the
>>> old code if zlib is not present)
>>> 2. I copy all the code from zlib and put it in ObjectFileELF.cpp.
>>> However, I'm going to guess that requires updating some documentation to
>>> include zlib's copyright notice.
>>>
>>> This brings startup time on my machine / my binary from 50 seconds down
>>> to 32.
>>> (time ~/llvm/build/bin/lldb -b -o 'b main' -o 'run' MY_PROGRAM)
>>>
>> _______________________________________________
>> lldb-dev mailing list
>> lldb-dev@lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
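The first idea in the original message, processing several bytes per step with multiple lookup tables, can be sketched as follows. This is a minimal, unoptimized illustration of slicing-by-4, the variant the thread says zlib uses. It is written against the reflected CRC-32 polynomial 0xEDB88320, which zlib's crc32() uses and which, per the thread, ObjectFileELF.cpp shares. Crc32Bytewise, Crc32Slicing4 and kTables are hypothetical names for this example. The input word is assembled little-endian so the result should not depend on the host.

In the bytewise loop, every lookup needs the previous iteration's result, which is the serial dependency blamed for pipeline stalls. In the sliced loop, the four lookups for one word are independent of each other.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Reflected form of the CRC-32 polynomial 0x04C11DB7 (zlib's crc32()).
constexpr uint32_t kPoly = 0xEDB88320u;

using Tables = std::array<std::array<uint32_t, 256>, 4>;

// kTables[k][b] is the CRC contribution of byte b followed by k zero bytes.
static Tables MakeTables() {
  Tables t{};
  for (uint32_t b = 0; b < 256; ++b) {
    uint32_t c = b;
    for (int i = 0; i < 8; ++i) c = (c & 1) ? (c >> 1) ^ kPoly : c >> 1;
    t[0][b] = c;
  }
  for (uint32_t b = 0; b < 256; ++b)
    for (int k = 1; k < 4; ++k)
      t[k][b] = (t[k - 1][b] >> 8) ^ t[0][t[k - 1][b] & 0xFF];
  return t;
}
static const Tables kTables = MakeTables();

// Classic byte-at-a-time loop. Passing a previous result as `crc` continues
// the checksum, like zlib's crc32().
uint32_t Crc32Bytewise(const uint8_t *p, size_t n, uint32_t crc = 0) {
  crc = ~crc;
  while (n--) crc = (crc >> 8) ^ kTables[0][(crc ^ *p++) & 0xFF];
  return ~crc;
}

// Slicing-by-4: fold four input bytes per iteration with four lookups.
uint32_t Crc32Slicing4(const uint8_t *p, size_t n, uint32_t crc = 0) {
  crc = ~crc;
  while (n >= 4) {
    crc ^= uint32_t(p[0]) | uint32_t(p[1]) << 8 | uint32_t(p[2]) << 16 |
           uint32_t(p[3]) << 24;
    crc = kTables[3][crc & 0xFF] ^ kTables[2][(crc >> 8) & 0xFF] ^
          kTables[1][(crc >> 16) & 0xFF] ^ kTables[0][crc >> 24];
    p += 4;
    n -= 4;
  }
  // Finish any leftover tail bytes one at a time.
  while (n--) crc = (crc >> 8) ^ kTables[0][(crc ^ *p++) & 0xFF];
  return ~crc;
}
```

Both functions should return 0xCBF43926 for the ASCII string "123456789", the standard CRC-32 check value. Any real speedup has to be measured. The timings in the thread come from zlib's tuned code, not from this sketch.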
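The second idea is to compute CRCs over separate regions in parallel and then combine them. It works because the CRC of a concatenation depends only on the two partial CRCs and the length of the second part:

crc(A || B) = Z_len(B)(crc(A)) XOR crc(B)

Here Z_n advances the CRC register over n zero bytes without the usual pre- and post-inversion. This is a minimal sketch under stated assumptions. It uses a plain bitwise CRC-32 per chunk. It also reimplements the GF(2) matrix-squaring approach behind zlib's crc32_combine, so it does not need to link zlib. std::async stands in for lldb's TaskRunner, whose interface isn't shown in this thread. ParallelCrc32, Crc32Combine, MatTimes and MatSquare are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <future>
#include <vector>

// Reflected CRC-32 polynomial (zlib's crc32()).
constexpr uint32_t kPoly = 0xEDB88320u;

// Simple bitwise CRC-32 for one chunk; a faster serial routine could be
// substituted without changing the combining logic.
static uint32_t Crc32(const uint8_t *p, size_t n) {
  uint32_t crc = 0xFFFFFFFFu;
  while (n--) {
    crc ^= *p++;
    for (int bit = 0; bit < 8; ++bit)
      crc = (crc & 1) ? (crc >> 1) ^ kPoly : crc >> 1;
  }
  return ~crc;
}

// Multiplies a 32x32 GF(2) matrix (stored as 32 column words) by a vector.
static uint32_t MatTimes(const uint32_t *mat, uint32_t vec) {
  uint32_t sum = 0;
  for (int i = 0; vec != 0; ++i, vec >>= 1)
    if (vec & 1) sum ^= mat[i];
  return sum;
}

static void MatSquare(uint32_t *square, const uint32_t *mat) {
  for (int i = 0; i < 32; ++i) square[i] = MatTimes(mat, mat[i]);
}

// Returns crc(A || B) given crc(A), crc(B) and len(B): advances crc(A) over
// len(B) zero bytes in O(log len) matrix steps, then XORs in crc(B).
static uint32_t Crc32Combine(uint32_t crc1, uint32_t crc2, size_t len2) {
  if (len2 == 0) return crc1;
  uint32_t odd[32], even[32];
  odd[0] = kPoly;  // operator for a single zero bit
  for (int i = 1; i < 32; ++i) odd[i] = 1u << (i - 1);
  MatSquare(even, odd);  // two zero bits
  MatSquare(odd, even);  // four zero bits
  do {
    MatSquare(even, odd);  // 1, 4, 16, ... zero bytes
    if (len2 & 1) crc1 = MatTimes(even, crc1);
    len2 >>= 1;
    if (len2 == 0) break;
    MatSquare(odd, even);  // 2, 8, 32, ... zero bytes
    if (len2 & 1) crc1 = MatTimes(odd, crc1);
    len2 >>= 1;
  } while (len2 != 0);
  return crc1 ^ crc2;
}

// Splits the buffer into `chunks` pieces, checksums each on its own thread,
// then folds the partial CRCs together left to right.
uint32_t ParallelCrc32(const uint8_t *data, size_t size, size_t chunks) {
  chunks = std::max<size_t>(1, std::min(chunks, size));
  std::vector<std::future<uint32_t>> parts;
  std::vector<size_t> lens;
  for (size_t i = 0; i < chunks; ++i) {
    size_t begin = size * i / chunks, end = size * (i + 1) / chunks;
    lens.push_back(end - begin);
    parts.push_back(std::async(std::launch::async,
                               [=] { return Crc32(data + begin, end - begin); }));
  }
  uint32_t crc = 0;  // CRC-32 of the empty prefix
  for (size_t i = 0; i < chunks; ++i)
    crc = Crc32Combine(crc, parts[i].get(), lens[i]);
  return crc;
}
```

The combine step costs O(log n) small matrix products per chunk instead of touching the data again. That is why splitting the work this way can pay off once each chunk is large.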
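Scott's follow-up keeps the zlib calls only when lldb is actually linked with zlib, selected through a config #define. A minimal sketch of that shape is below. EXAMPLE_HAVE_ZLIB is a hypothetical placeholder, since the thread does not name the macro LLVM's config headers provide. ChecksumBuffer is likewise illustrative. crc32() and crc32_combine() are zlib's own functions. The fallback branch is a byte-at-a-time loop like the original code, so both branches should agree.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(EXAMPLE_HAVE_ZLIB)  // hypothetical stand-in for LLVM's config macro
#include <zlib.h>
#endif

// Checksums `size` bytes with CRC-32. With zlib available, the two halves are
// checksummed separately and merged with crc32_combine, as a parallel version
// would do; otherwise it falls back to a single byte-at-a-time pass.
uint32_t ChecksumBuffer(const uint8_t *data, size_t size) {
#if defined(EXAMPLE_HAVE_ZLIB)
  // Assumes each half fits in zlib's 32-bit uInt length parameter.
  size_t half = size / 2;
  uLong a = crc32(0L, data, static_cast<uInt>(half));
  uLong b = crc32(0L, data + half, static_cast<uInt>(size - half));
  return static_cast<uint32_t>(
      crc32_combine(a, b, static_cast<z_off_t>(size - half)));
#else
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < size; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit)
      crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
  }
  return ~crc;
#endif
}
```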