BTW, the JamCRC is used in writing Windows COFF object files, PGO instrumentation, and PDB Debug Info reading / writing, so any work we do to make it faster will benefit many parts of the toolchain.
On Wed, Apr 12, 2017 at 12:42 PM Zachary Turner <ztur...@google.com> wrote: > It would be nice if we could simply update LLVM's implementation to be > faster. Having multiple implementations of the same thing seems > undesirable, especially if one (fast) implementation is always superior to > some other reason. i.e. there's no reason anyone would ever want to use a > slow implementation if a fast one is available. > > Can we change the JamCRC implementation in LLVM to use 4-byte slicing and > parallelize it ourselves? This way there's no dependency on zlib, so even > people who have non-zlib enabled builds of LLDB get the benefits of the > fast algorithm. > > On Wed, Apr 12, 2017 at 12:36 PM Scott Smith <scott.sm...@purestorage.com> > wrote: > >> I didn't realize that existed; I just checked and it looks like there's >> JamCRC which uses the same polynomial. I don't know what "Jam" means in >> this context, unless it identifies the polynomial some how? The code is >> also byte-at-a-time. >> >> Would you prefer I use JamCRC support code instead, and then change >> JamCRC to optionally use zlib if it's available? >> >> On Wed, Apr 12, 2017 at 12:23 PM, Zachary Turner <ztur...@google.com> >> wrote: >> >>> Zlib is definitely optional and we cannot make it required. >>> >>> Did you check to see if llvm has a crc32 function somewhere in Support? >>> On Wed, Apr 12, 2017 at 12:15 PM Scott Smith via lldb-dev < >>> lldb-dev@lists.llvm.org> wrote: >>> >>>> The algorithm included in ObjectFileELF.cpp performs a byte at a time >>>> computation, which causes long pipeline stalls in modern processors. >>>> Unfortunately, the polynomial used is not the same one used by the SSE 4.2 >>>> instruction set, but there are two ways to make it faster: >>>> >>>> 1. Work on multiple bytes at a time, using multiple lookup tables. (see >>>> http://create.stephan-brumme.com/crc32/#slicing-by-8-overview) >>>> 2. Compute crcs over separate regions in parallel, then combine the >>>> results. (see >>>> http://stackoverflow.com/questions/23122312/crc-calculation-of-a-mostly-static-data-stream >>>> ) >>>> >>>> As it happens, zlib provides functions for both: >>>> 1. The zlib crc32 function uses the same polynomial as >>>> ObjectFileELF.cpp, and uses slicing-by-4 along with loop unrolling. >>>> 2. The zlib library provides crc32_combine. >>>> >>>> I decided to just call out to the zlib library, since I see my version >>>> of lldb already links with zlib; however, the llvm CMakeLists.txt declares >>>> it optional. >>>> >>>> I'm including my patch that assumes zlib is always linked in. Let me >>>> know if you prefer: >>>> 1. I make the change conditional on having zlib (i.e. fall back to the >>>> old code if zlib is not present) >>>> 2. I copy all the code from zlib and put it in ObjectFileELF.cpp. >>>> However, I'm going to guess that requires updating some documentation to >>>> include zlib's copyright notice. >>>> >>>> This brings startup time on my machine / my binary from 50 seconds down >>>> to 32. >>>> (time ~/llvm/build/bin/lldb -b -o 'b main' -o 'run' MY_PROGRAM) >>>> >>>> _______________________________________________ >>>> lldb-dev mailing list >>>> lldb-dev@lists.llvm.org >>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev >>>> >>> >>
_______________________________________________ lldb-dev mailing list lldb-dev@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev