What about the crc combining? I don't feel comfortable reimplementing that on my own. Can I leave that as a feature predicated on zlib?
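For context on what combining involves, here is a minimal sketch of the underlying idea, assuming the standard reflected CRC-32 polynomial (0xEDB88320) that zlib's crc32 uses. The names raw_update, crc32_simple and crc32_combine_slow are hypothetical and come from neither zlib nor LLVM. Because the CRC register update is linear over GF(2), the CRC of A followed by B can be recovered by running crc(A) through len(B) zero bytes and XORing in crc(B). This naive version takes time proportional to len(B); zlib's crc32_combine is designed to do the zero-extension step much faster, which is the part that is harder to reimplement.

```cpp
#include <cstddef>
#include <cstdint>

// Raw register update for the reflected CRC-32 polynomial 0xEDB88320,
// one bit at a time. No initial value or final XOR is applied here.
static uint32_t raw_update(uint32_t state, const unsigned char *data,
                           size_t len) {
  for (size_t i = 0; i < len; ++i) {
    state ^= data[i];
    for (int bit = 0; bit < 8; ++bit)
      state = (state >> 1) ^ (0xEDB88320u & (0u - (state & 1u)));
  }
  return state;
}

// Standard CRC-32: initial value 0xFFFFFFFF, final complement.
static uint32_t crc32_simple(const unsigned char *data, size_t len) {
  return ~raw_update(0xFFFFFFFFu, data, len);
}

// Hypothetical helper: CRC of A||B from crc(A), crc(B) and len(B).
// Feeds len_b zero bytes through crc(A) as a raw register value,
// then XORs in crc(B). Cost is O(len_b).
static uint32_t crc32_combine_slow(uint32_t crc_a, uint32_t crc_b,
                                   size_t len_b) {
  const unsigned char zero = 0;
  uint32_t state = crc_a;
  for (size_t i = 0; i < len_b; ++i)
    state = raw_update(state, &zero, 1);
  return state ^ crc_b;
}
```

With this, two regions can be checksummed independently (for example on separate threads) and merged afterwards: crc32_combine_slow(crc32_simple(a, len_a), crc32_simple(b, len_b), len_b) should equal crc32_simple over the concatenated buffer.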
For the JamCRC improvements, I assume I submit that to llvm-dev@ instead?

On Wed, Apr 12, 2017 at 12:45 PM, Zachary Turner <ztur...@google.com> wrote:

> BTW, the JamCRC is used in writing Windows COFF object files, PGO
> instrumentation, and PDB Debug Info reading / writing, so any work we do to
> make it faster will benefit many parts of the toolchain.
>
> On Wed, Apr 12, 2017 at 12:42 PM Zachary Turner <ztur...@google.com>
> wrote:
>
>> It would be nice if we could simply update LLVM's implementation to be
>> faster. Having multiple implementations of the same thing seems
>> undesirable, especially if one (fast) implementation is always superior to
>> the other, i.e. there's no reason anyone would ever want to use a
>> slow implementation if a fast one is available.
>>
>> Can we change the JamCRC implementation in LLVM to use 4-byte slicing and
>> parallelize it ourselves? This way there's no dependency on zlib, so even
>> people who have non-zlib enabled builds of LLDB get the benefits of the
>> fast algorithm.
>>
>> On Wed, Apr 12, 2017 at 12:36 PM Scott Smith <scott.sm...@purestorage.com>
>> wrote:
>>
>>> I didn't realize that existed; I just checked and it looks like there's
>>> JamCRC which uses the same polynomial. I don't know what "Jam" means in
>>> this context, unless it identifies the polynomial somehow? The code is
>>> also byte-at-a-time.
>>>
>>> Would you prefer I use JamCRC support code instead, and then change
>>> JamCRC to optionally use zlib if it's available?
>>>
>>> On Wed, Apr 12, 2017 at 12:23 PM, Zachary Turner <ztur...@google.com>
>>> wrote:
>>>
>>>> Zlib is definitely optional and we cannot make it required.
>>>>
>>>> Did you check to see if llvm has a crc32 function somewhere in Support?
>>>>
>>>> On Wed, Apr 12, 2017 at 12:15 PM Scott Smith via lldb-dev <
>>>> lldb-dev@lists.llvm.org> wrote:
>>>>
>>>>> The algorithm included in ObjectFileELF.cpp performs a byte at a time
>>>>> computation, which causes long pipeline stalls in modern processors.
>>>>> Unfortunately, the polynomial used is not the same one used by the SSE 4.2
>>>>> instruction set, but there are two ways to make it faster:
>>>>>
>>>>> 1. Work on multiple bytes at a time, using multiple lookup tables.
>>>>> (see http://create.stephan-brumme.com/crc32/#slicing-by-8-overview)
>>>>> 2. Compute crcs over separate regions in parallel, then combine the
>>>>> results. (see
>>>>> http://stackoverflow.com/questions/23122312/crc-calculation-of-a-mostly-static-data-stream)
>>>>>
>>>>> As it happens, zlib provides functions for both:
>>>>> 1. The zlib crc32 function uses the same polynomial as
>>>>> ObjectFileELF.cpp, and uses slicing-by-4 along with loop unrolling.
>>>>> 2. The zlib library provides crc32_combine.
>>>>>
>>>>> I decided to just call out to the zlib library, since I see my version
>>>>> of lldb already links with zlib; however, the llvm CMakeLists.txt declares
>>>>> it optional.
>>>>>
>>>>> I'm including my patch that assumes zlib is always linked in. Let me
>>>>> know if you prefer:
>>>>> 1. I make the change conditional on having zlib (i.e. fall back to the
>>>>> old code if zlib is not present)
>>>>> 2. I copy all the code from zlib and put it in ObjectFileELF.cpp.
>>>>> However, I'm going to guess that requires updating some documentation to
>>>>> include zlib's copyright notice.
>>>>>
>>>>> This brings startup time on my machine / my binary from 50 seconds
>>>>> down to 32.
>>>>> (time ~/llvm/build/bin/lldb -b -o 'b main' -o 'run' MY_PROGRAM)
>>>>>
>>>>> _______________________________________________
>>>>> lldb-dev mailing list
>>>>> lldb-dev@lists.llvm.org
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>>>>>
>>>>
>>>
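The "multiple bytes at a time, using multiple lookup tables" approach from the original message can be sketched as follows. This is only an illustration of slicing-by-4 for the reflected polynomial 0xEDB88320, not the zlib or LLVM code; the names Crc32Tables, make_tables, crc32_bytewise and crc32_slice4 are hypothetical. It builds the input word byte by byte so it does not depend on host endianness, and it omits the loop unrolling mentioned above.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Crc32Tables = std::array<std::array<uint32_t, 256>, 4>;

// Table 0 is the usual byte-at-a-time table; table k gives the effect of
// a byte that is followed by k more bytes of zero input.
static Crc32Tables make_tables() {
  Crc32Tables t{};
  for (uint32_t i = 0; i < 256; ++i) {
    uint32_t c = i;
    for (int bit = 0; bit < 8; ++bit)
      c = (c >> 1) ^ (0xEDB88320u & (0u - (c & 1u)));
    t[0][i] = c;
  }
  for (uint32_t i = 0; i < 256; ++i)
    for (int k = 1; k < 4; ++k)
      t[k][i] = (t[k - 1][i] >> 8) ^ t[0][t[k - 1][i] & 0xFF];
  return t;
}

// Reference: one table lookup per input byte.
static uint32_t crc32_bytewise(uint32_t crc, const unsigned char *p,
                               size_t n) {
  static const Crc32Tables t = make_tables();
  crc = ~crc;
  while (n--)
    crc = (crc >> 8) ^ t[0][(crc ^ *p++) & 0xFF];
  return ~crc;
}

// Slicing-by-4: fold four input bytes into the register, then do four
// independent lookups, shortening the serial dependency chain.
static uint32_t crc32_slice4(uint32_t crc, const unsigned char *p,
                             size_t n) {
  static const Crc32Tables t = make_tables();
  crc = ~crc;
  while (n >= 4) {
    crc ^= uint32_t(p[0]) | (uint32_t(p[1]) << 8) |
           (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
    crc = t[3][crc & 0xFF] ^ t[2][(crc >> 8) & 0xFF] ^
          t[1][(crc >> 16) & 0xFF] ^ t[0][crc >> 24];
    p += 4;
    n -= 4;
  }
  while (n--) // leftover tail bytes
    crc = (crc >> 8) ^ t[0][(crc ^ *p++) & 0xFF];
  return ~crc;
}
```

Both functions take a starting CRC of 0 for a fresh computation, mirroring the calling convention of zlib's crc32. The variant catalogued as JAMCRC is the same computation without the final complement, so under that definition its value is the bitwise complement of what these functions return.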