Re: [lldb-dev] Improve performance of crc32 calculation

Scott Smith via lldb-dev Wed, 12 Apr 2017 12:52:24 -0700

What about the crc combining?  I don't feel comfortable reimplementing that
on my own.  Can I leave that as a feature predicated on zlib?
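
For reference, here is a rough sketch of what the combining step involves.
It is not zlib's code and has not been measured against it. The names
crc32Simple, mulMod, xPow8n and crc32Combine are made up for illustration.
It assumes the zlib convention (reflected polynomial 0xEDB88320, initial
value and final XOR of 0xFFFFFFFF). The idea: appending lenB bytes to the
first region multiplies its CRC by x^(8*lenB) modulo the polynomial, and
the CRC of the second region is then XORed in.

```cpp
#include <cstddef>
#include <cstdint>

// Reflected form of the CRC-32 polynomial (assumed to match zlib's crc32).
constexpr uint32_t kPoly = 0xEDB88320u;

// Bit-at-a-time CRC-32, included only so the sketch can be checked.
// Pass 0 as the starting value for a fresh computation.
uint32_t crc32Simple(uint32_t crc, const uint8_t *data, size_t len) {
  crc = ~crc;
  for (size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit)
      crc = (crc & 1) ? (crc >> 1) ^ kPoly : (crc >> 1);
  }
  return ~crc;
}

// Multiplies two polynomials modulo the CRC polynomial.
// Bit order is reflected: bit 31 holds the x^0 coefficient.
static uint32_t mulMod(uint32_t a, uint32_t b) {
  uint32_t product = 0;
  for (uint32_t mask = 0x80000000u; mask != 0; mask >>= 1) {
    if (a & mask)
      product ^= b;
    // Multiply b by x: shift right, reduce if the x^31 term overflowed.
    b = (b & 1) ? (b >> 1) ^ kPoly : (b >> 1);
  }
  return product;
}

// Returns x^(8 * n) modulo the CRC polynomial, by repeated squaring.
static uint32_t xPow8n(uint64_t n) {
  uint32_t result = 0x80000000u; // the polynomial "1"
  uint32_t base = 0x00800000u;   // the polynomial "x^8"
  while (n != 0) {
    if (n & 1)
      result = mulMod(result, base);
    base = mulMod(base, base);
    n >>= 1;
  }
  return result;
}

// Given crcA = CRC(A), crcB = CRC(B) and the byte length of B,
// returns CRC(A followed by B) without touching the data again.
uint32_t crc32Combine(uint32_t crcA, uint32_t crcB, uint64_t lenB) {
  return mulMod(xPow8n(lenB), crcA) ^ crcB;
}
```

With this, two threads could each run a CRC over half of a buffer and the
results could be merged with one crc32Combine call, at a cost that grows
with the logarithm of the second region's length.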


For the JamCRC improvements, I assume I submit that to llvm-dev@ instead?


On Wed, Apr 12, 2017 at 12:45 PM, Zachary Turner <[email protected]> wrote:

> BTW, the JamCRC is used in writing Windows COFF object files, PGO
> instrumentation, and PDB Debug Info reading / writing, so any work we do to
> make it faster will benefit many parts of the toolchain.
>
> On Wed, Apr 12, 2017 at 12:42 PM Zachary Turner <[email protected]>
> wrote:
>
>> It would be nice if we could simply update LLVM's implementation to be
>> faster.  Having multiple implementations of the same thing seems
>> undesirable, especially if one (fast) implementation is always superior to
>> the other, i.e. there's no reason anyone would ever want to use a
>> slow implementation if a fast one is available.
>>
>> Can we change the JamCRC implementation in LLVM to use 4-byte slicing and
>> parallelize it ourselves?  This way there's no dependency on zlib, so even
>> people who have non-zlib enabled builds of LLDB get the benefits of the
>> fast algorithm.
>>
>> On Wed, Apr 12, 2017 at 12:36 PM Scott Smith <[email protected]>
>> wrote:
>>
>>> I didn't realize that existed; I just checked and it looks like there's
>>> JamCRC which uses the same polynomial.  I don't know what "Jam" means in
>>> this context, unless it identifies the polynomial somehow?  The code is
>>> also byte-at-a-time.
>>>
>>> Would you prefer I use JamCRC support code instead, and then change
>>> JamCRC to optionally use zlib if it's available?
>>>
>>> On Wed, Apr 12, 2017 at 12:23 PM, Zachary Turner <[email protected]>
>>> wrote:
>>>
>>>> Zlib is definitely optional and we cannot make it required.
>>>>
>>>> Did you check to see if llvm has a crc32 function somewhere in Support?
>>>> On Wed, Apr 12, 2017 at 12:15 PM Scott Smith via lldb-dev <
>>>> [email protected]> wrote:
>>>>
>>>>> The algorithm included in ObjectFileELF.cpp performs a byte at a time
>>>>> computation, which causes long pipeline stalls in modern processors.
>>>>> Unfortunately, the polynomial used is not the same one used by the SSE 4.2
>>>>> instruction set, but there are two ways to make it faster:
>>>>>
>>>>> 1. Work on multiple bytes at a time, using multiple lookup tables.
>>>>> (see http://create.stephan-brumme.com/crc32/#slicing-by-8-overview)
>>>>> 2. Compute crcs over separate regions in parallel, then combine the
>>>>> results.  (see
>>>>> http://stackoverflow.com/questions/23122312/crc-calculation-of-a-mostly-static-data-stream)
>>>>>
>>>>> As it happens, zlib provides functions for both:
>>>>> 1. The zlib crc32 function uses the same polynomial as
>>>>> ObjectFileELF.cpp, and uses slicing-by-4 along with loop unrolling.
>>>>> 2. The zlib library provides crc32_combine.
>>>>>
>>>>> I decided to just call out to the zlib library, since I see my version
>>>>> of lldb already links with zlib; however, the llvm CMakeLists.txt declares
>>>>> it optional.
>>>>>
>>>>> I'm including my patch that assumes zlib is always linked in.  Let me
>>>>> know if you prefer:
>>>>> 1. I make the change conditional on having zlib (i.e. fall back to the
>>>>> old code if zlib is not present)
>>>>> 2. I copy all the code from zlib and put it in ObjectFileELF.cpp.
>>>>> However, I'm going to guess that requires updating some documentation to
>>>>> include zlib's copyright notice.
>>>>>
>>>>> This brings startup time on my machine / my binary from 50 seconds
>>>>> down to 32.
>>>>> (time ~/llvm/build/bin/lldb -b -o 'b main' -o 'run' MY_PROGRAM)
>>>>>
>>>>> _______________________________________________
>>>>> lldb-dev mailing list
>>>>> [email protected]
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>>>>>
>>>>
>>>
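
The multiple-lookup-table approach mentioned in the original message
(option 1) can be sketched roughly as below. This is a minimal
illustration of slicing-by-4, assuming the zlib convention (reflected
polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF). It is
not the code from zlib, LLVM's JamCRC, or ObjectFileELF.cpp, and the names
makeTables and crc32Slice4 are hypothetical. No loop unrolling or
alignment handling is attempted.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Reflected form of the CRC-32 polynomial (assumed to match zlib's crc32).
constexpr uint32_t kPoly = 0xEDB88320u;

using Tables = std::array<std::array<uint32_t, 256>, 4>;

// Table 0 is the usual byte-at-a-time table. Table k holds the effect of
// one input byte followed by k zero bytes.
static Tables makeTables() {
  Tables t{};
  for (uint32_t i = 0; i < 256; ++i) {
    uint32_t c = i;
    for (int bit = 0; bit < 8; ++bit)
      c = (c & 1) ? (c >> 1) ^ kPoly : (c >> 1);
    t[0][i] = c;
  }
  for (uint32_t i = 0; i < 256; ++i)
    for (int k = 1; k < 4; ++k)
      t[k][i] = (t[k - 1][i] >> 8) ^ t[0][t[k - 1][i] & 0xFF];
  return t;
}

// Pass 0 as crc for a fresh computation, or a previous result to continue.
uint32_t crc32Slice4(uint32_t crc, const uint8_t *data, size_t len) {
  static const Tables t = makeTables();
  crc = ~crc;
  // Main loop: consume four bytes per iteration with four independent
  // table lookups instead of four dependent ones.
  while (len >= 4) {
    // Assemble the word byte by byte so the result does not depend on
    // host endianness or pointer alignment.
    crc ^= uint32_t(data[0]) | uint32_t(data[1]) << 8 |
           uint32_t(data[2]) << 16 | uint32_t(data[3]) << 24;
    crc = t[3][crc & 0xFF] ^ t[2][(crc >> 8) & 0xFF] ^
          t[1][(crc >> 16) & 0xFF] ^ t[0][crc >> 24];
    data += 4;
    len -= 4;
  }
  // Tail: fall back to one byte at a time for the last 0 to 3 bytes.
  while (len-- > 0)
    crc = (crc >> 8) ^ t[0][(crc ^ *data++) & 0xFF];
  return ~crc;
}
```

The four lookups in the main loop do not depend on each other, which is
what shortens the dependency chain compared with the byte-at-a-time loop.
For the ASCII string "123456789" this convention gives the standard check
value 0xCBF43926.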

