On 13 April 2017 at 07:28, Pavel Labath via lldb-dev
wrote:
> Improving the checksumming speed is definitely a worthwhile contribution,
> but be aware that there is a pretty simple way to avoid computing the crc
> altogether, and that is to make sure your binaries have a build ID. This is
> generally as simple as adding -Wl,--build-id to your compiler flags.
Thank you for that clarification. Sounds like we can't change the crc code
then.
I realized I had been using GNU's gold linker. I switched to linking with
lld(-4.0) and now linking uses less than 1/3rd the cpu. It seems that the
default hashing (fast == xxHash) is faster than whatever gold was
What we need is the ability to connect a stripped version of an SO to one
with debug symbols present. Currently there are (at least) two ways to
achieve that:
- build-id: both SOs have a build-id section with the same value. Normally,
that's added by a linker in the final link, and subsequent stri
Interesting. That saves lldb startup time (after crc
improvements/parallelization) by about 1.25 seconds wall clock / 10 seconds
cpu time, but increases linking by about 2 seconds of cpu time (and an
inconsistent amount of wall clock time). That's only a good tradeoff if
you run the debugger a lo
There is good crc32c code (assuming we want crc32c) in DPDK
(BSD-licensed):
http://dpdk.org/browse/dpdk/tree/lib/librte_hash
It has a hardware-assisted algorithm for x86 and arm64 (if the hardware
supports it), with a fallback to a lookup-table implementation.
CRC32 is definitely worth merging wit
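For reference, CRC-32C uses the Castagnoli polynomial (reflected form 0x82F63B78), which is what the SSE4.2 crc32 instruction computes. Below is a minimal sketch of the "hardware path plus table fallback" shape, assuming an x86-64 build with -msse4.2 for the hardware branch. It is not DPDK's code or API, the arm64 path is omitted, and crc32c/makeCrc32cTable are illustrative names. As the original message in this thread points out, this polynomial differs from the one the existing ObjectFileELF code uses, so it only helps if we actually switch to crc32c.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__SSE4_2__) && defined(__x86_64__)
#include <nmmintrin.h>  // _mm_crc32_u8 / _mm_crc32_u64
#endif

// CRC-32C (Castagnoli), reflected form of polynomial 0x1EDC6F41.
constexpr uint32_t kCrc32cPoly = 0x82F63B78u;

// 256-entry table for the portable byte-at-a-time fallback.
std::array<uint32_t, 256> makeCrc32cTable() {
  std::array<uint32_t, 256> table{};
  for (uint32_t i = 0; i < 256; ++i) {
    uint32_t c = i;
    for (int k = 0; k < 8; ++k)
      c = (c & 1) ? (c >> 1) ^ kCrc32cPoly : c >> 1;
    table[i] = c;
  }
  return table;
}

uint32_t crc32c(const uint8_t *data, size_t len) {
  uint32_t crc = 0xFFFFFFFFu;
#if defined(__SSE4_2__) && defined(__x86_64__)
  // Hardware path: one crc32 instruction per 8-byte word, then the tail.
  uint64_t c64 = crc;
  for (; len >= 8; data += 8, len -= 8) {
    uint64_t word;
    std::memcpy(&word, data, sizeof word);  // x86 is little-endian
    c64 = _mm_crc32_u64(c64, word);
  }
  crc = static_cast<uint32_t>(c64);
  for (; len > 0; ++data, --len)
    crc = _mm_crc32_u8(crc, *data);
#else
  // Fallback: classic lookup-table loop, one byte per iteration.
  static const std::array<uint32_t, 256> table = makeCrc32cTable();
  for (size_t i = 0; i < len; ++i)
    crc = table[(crc ^ data[i]) & 0xFF] ^ (crc >> 8);
#endif
  return crc ^ 0xFFFFFFFFu;
}
```

Both branches start from 0xFFFFFFFF and invert at the end, so they produce the same value; the standard check string "123456789" should give 0xE3069283 either way.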
Improving the checksumming speed is definitely a worthwhile contribution,
but be aware that there is a pretty simple way to avoid computing the crc
altogether, and that is to make sure your binaries have a build ID. This is
generally as simple as adding -Wl,--build-id to your compiler flags.
+1 to
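To check whether a binary already carries a build ID, one looks for the NT_GNU_BUILD_ID note that -Wl,--build-id asks the linker to emit. A minimal sketch, assuming the raw bytes of the note section (e.g. .note.gnu.build-id) are already in memory and in host byte order; findGnuBuildId is a hypothetical helper, not an LLDB or LLVM API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <optional>
#include <string>
#include <vector>

// NT_GNU_BUILD_ID as defined in <elf.h>; repeated here to stay self-contained.
constexpr uint32_t kNtGnuBuildId = 3;

// Reads a 32-bit field in host byte order (assumption: the section was
// produced for the same endianness as the machine running this code).
static uint32_t readU32(const uint8_t *p) {
  uint32_t v;
  std::memcpy(&v, p, sizeof v);
  return v;
}

// Note names and descriptors are padded to 4-byte boundaries.
static size_t align4(size_t n) { return (n + 3) & ~static_cast<size_t>(3); }

// Hypothetical helper: scans the raw bytes of a SHT_NOTE section and returns
// the GNU build ID as lowercase hex, or std::nullopt if there is none.
std::optional<std::string> findGnuBuildId(const std::vector<uint8_t> &notes) {
  size_t off = 0;
  while (off + 12 <= notes.size()) {
    const uint32_t namesz = readU32(&notes[off]);
    const uint32_t descsz = readU32(&notes[off + 4]);
    const uint32_t type = readU32(&notes[off + 8]);
    const size_t nameOff = off + 12;
    const size_t descOff = nameOff + align4(namesz);
    const size_t next = descOff + align4(descsz);
    if (next > notes.size())
      break;  // malformed or truncated note
    const bool isGnu =
        namesz == 4 && std::memcmp(&notes[nameOff], "GNU", 4) == 0;
    if (isGnu && type == kNtGnuBuildId) {
      std::string hex;
      char byte[3];
      for (size_t i = 0; i < descsz; ++i) {
        std::snprintf(byte, sizeof byte, "%02x",
                      static_cast<unsigned>(notes[descOff + i]));
        hex += byte;
      }
      return hex;
    }
    off = next;
  }
  return std::nullopt;
}
```

A stripped SO and its debug-symbol counterpart can then be paired by comparing the two strings, with no checksum over the file contents at all.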
I know this is outside of your initial goal, but it would be really great
if JamCRC were updated in LLVM to be parallel. I see that you're making use
of TaskRunner for the parallelism, but that looks pretty generic, so
perhaps that could be raised into llvm as well if it helps.
Not trying to throw e
Ok I stripped out the zlib crc algorithm and just left the parallelism +
calls to zlib's crc32_combine, but only if we are actually linking with
zlib. I left those calls here (rather than folding them into JamCRC)
because I'm taking advantage of TaskRunner to parallelize the work.
I moved the sys
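A sketch of the chunk-and-combine idea: hash fixed-size pieces concurrently, then fold the partial CRCs together using only each piece's length. Here std::async stands in for LLDB's TaskRunner, and crc32Combine is a hand-written port of the GF(2) matrix-squaring scheme from zlib 1.2.11's crc32_combine, so nothing needs to link against zlib. All function names are hypothetical; build with -pthread on toolchains that require it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <future>
#include <vector>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320, init and final XOR
// 0xFFFFFFFF): the zlib / gnu_debuglink variant. Deliberately simple.
static uint32_t crc32Serial(const uint8_t *data, size_t len) {
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int k = 0; k < 8; ++k)
      crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
  }
  return ~crc;
}

// Matrix-vector product over GF(2); mat[i] is the image of bit i.
static uint32_t gf2Times(const uint32_t *mat, uint32_t vec) {
  uint32_t sum = 0;
  for (; vec != 0; vec >>= 1, ++mat)
    if (vec & 1)
      sum ^= *mat;
  return sum;
}

// square = mat * mat over GF(2).
static void gf2Square(uint32_t *square, const uint32_t *mat) {
  for (int n = 0; n < 32; ++n)
    square[n] = gf2Times(mat, mat[n]);
}

// Returns crc32(A || B) given crc1 = crc32(A), crc2 = crc32(B), len2 = |B|.
static uint32_t crc32Combine(uint32_t crc1, uint32_t crc2, size_t len2) {
  if (len2 == 0)
    return crc1;
  uint32_t even[32], odd[32];
  odd[0] = 0xEDB88320u;  // operator that feeds one zero bit
  for (int n = 1; n < 32; ++n)
    odd[n] = 1u << (n - 1);
  gf2Square(even, odd);  // two zero bits
  gf2Square(odd, even);  // four zero bits
  do {
    gf2Square(even, odd);  // first pass: one zero byte, then doubling
    if (len2 & 1)
      crc1 = gf2Times(even, crc1);
    len2 >>= 1;
    if (len2 == 0)
      break;
    gf2Square(odd, even);
    if (len2 & 1)
      crc1 = gf2Times(odd, crc1);
    len2 >>= 1;
  } while (len2 != 0);
  return crc1 ^ crc2;
}

// Hypothetical helper: hashes `chunks` slices concurrently, then folds the
// partial CRCs left to right.
uint32_t crc32Parallel(const std::vector<uint8_t> &buf, size_t chunks) {
  chunks = std::max<size_t>(chunks, 1);
  const size_t step = (buf.size() + chunks - 1) / chunks;
  std::vector<std::future<uint32_t>> parts;
  std::vector<size_t> lens;
  for (size_t off = 0; off < buf.size(); off += step) {
    const size_t len = std::min(step, buf.size() - off);
    lens.push_back(len);
    parts.push_back(
        std::async(std::launch::async, crc32Serial, buf.data() + off, len));
  }
  uint32_t crc = 0;  // CRC-32 of the empty message
  for (size_t i = 0; i < parts.size(); ++i)
    crc = crc32Combine(crc, parts[i].get(), lens[i]);
  return crc;
}
```

Each combine costs O(log len) 32x32 bit-matrix products, which is negligible next to hashing the chunk itself.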
It seems like the crc32_combine is not too hard to implement, but we do
already have code in LLVM that is predicated on the existence of zlib, so
it seems reasonable to leave it that way. And yes, you would submit those
changes to llvm-dev. Perhaps the JamCRC implementation could be updated t
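Why combining needs only the second piece's length: let F = 0xFFFFFFFF, let R(s, M) be the raw 32-bit register after feeding message M starting from state s, and let Z_n be the linear map that feeds n zero bytes (T is the 32x32 GF(2) matrix for one zero bit). Because each register update is linear over GF(2), R is affine in s, which gives the identity below. Computing Z_n = T^{8n} by repeated squaring, rather than by feeding n zero bytes, is what keeps the combine cheap.

```latex
\begin{aligned}
\mathrm{crc}(M) &= R(F, M) \oplus F \\
R(s, M) &= Z_{|M|}\, s \oplus R(0, M) \\
\mathrm{crc}(A \Vert B) &= Z_{|B|}\bigl(\mathrm{crc}(A) \oplus F\bigr) \oplus R(0, B) \oplus F \\
\mathrm{crc}(B) &= Z_{|B|}\, F \oplus R(0, B) \oplus F \\
\Rightarrow\quad \mathrm{crc}(A \Vert B) &= Z_{|B|}\, \mathrm{crc}(A) \oplus \mathrm{crc}(B),
  \qquad Z_n = T^{8n}
\end{aligned}
```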
What about the crc combining? I don't feel comfortable reimplementing that
on my own. Can I leave that as a feature predicated on zlib?
For the JamCRC improvements, I assume I submit that to llvm-dev@ instead?
On Wed, Apr 12, 2017 at 12:45 PM, Zachary Turner wrote:
BTW, the JamCRC is used in writing Windows COFF object files, PGO
instrumentation, and PDB Debug Info reading / writing, so any work we do to
make it faster will benefit many parts of the toolchain.
On Wed, Apr 12, 2017 at 12:42 PM Zachary Turner wrote:
It would be nice if we could simply update LLVM's implementation to be
faster. Having multiple implementations of the same thing seems
undesirable, especially if one (fast) implementation is always superior to
some other reason. i.e. there's no reason anyone would ever want to use a
slow implemen
I didn't realize that existed; I just checked and it looks like there's
JamCRC which uses the same polynomial. I don't know what "Jam" means in
this context, unless it identifies the polynomial somehow? The code is
also byte-at-a-time.
Would you prefer I use JamCRC support code instead, and the
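On the naming question: in the usual CRC catalogues, JAMCRC is the reflected 0xEDB88320 CRC with initial value 0xFFFFFFFF but no final XOR, so it is simply the bitwise complement of standard CRC-32 (the variant zlib and gnu_debuglink use). A small sketch checking that relationship on the standard check string; reflectedCrc, crc32Ieee and jamcrc are illustrative names, not LLVM functions. If LLVM's JamCRC follows that convention, the debuglink value would just be the complement of its result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise reflected CRC, polynomial 0xEDB88320, initial value 0xFFFFFFFF.
// `finalXor` selects the variant: 0xFFFFFFFF gives standard CRC-32,
// 0 gives JAMCRC.
static uint32_t reflectedCrc(const uint8_t *data, size_t len,
                             uint32_t finalXor) {
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int k = 0; k < 8; ++k)
      crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
  }
  return crc ^ finalXor;
}

uint32_t crc32Ieee(const uint8_t *d, size_t n) {
  return reflectedCrc(d, n, 0xFFFFFFFFu);
}

uint32_t jamcrc(const uint8_t *d, size_t n) { return reflectedCrc(d, n, 0); }
```

For "123456789" the standard CRC-32 check value is 0xCBF43926 and its complement, 0x340BC6D9, is the JAMCRC check value.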
Zlib is definitely optional and we cannot make it required.
Did you check to see if llvm has a crc32 function somewhere in Support?
On Wed, Apr 12, 2017 at 12:15 PM Scott Smith via lldb-dev <
lldb-dev@lists.llvm.org> wrote:
The algorithm included in ObjectFileELF.cpp performs a byte at a time
computation, which causes long pipeline stalls in modern processors.
Unfortunately, the polynomial used is not the same one used by the SSE 4.2
instruction set, but there are two ways to make it faster:
1. Work on multiple bytes
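One common way to work on multiple bytes is slicing-by-4: precompute three extra tables so that each group of four input bytes is resolved with four independent lookups instead of a chain of four dependent ones, which gives the pipeline more parallel work. Below is a sketch for the same reflected 0xEDB88320 polynomial, assuming that is the CRC in question. The message above is truncated here, so this is not necessarily the approach it went on to describe, and the function and table names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Tables = std::array<std::array<uint32_t, 256>, 4>;

// T[0] is the classic byte-at-a-time table; T[k][i] is T[0][i] advanced by
// k further zero bytes.
static Tables makeTables() {
  Tables t{};
  for (uint32_t i = 0; i < 256; ++i) {
    uint32_t c = i;
    for (int k = 0; k < 8; ++k)
      c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
    t[0][i] = c;
  }
  for (uint32_t i = 0; i < 256; ++i)
    for (int k = 1; k < 4; ++k)
      t[k][i] = (t[k - 1][i] >> 8) ^ t[0][t[k - 1][i] & 0xFF];
  return t;
}

static const Tables kTables = makeTables();

// Reference: one lookup per byte, each depending on the previous result.
uint32_t crc32Bytewise(const uint8_t *p, size_t n) {
  uint32_t crc = 0xFFFFFFFFu;
  while (n--)
    crc = kTables[0][(crc ^ *p++) & 0xFF] ^ (crc >> 8);
  return ~crc;
}

// Slicing-by-4: four bytes per iteration via four independent lookups.
uint32_t crc32Slice4(const uint8_t *p, size_t n) {
  uint32_t crc = 0xFFFFFFFFu;
  while (n >= 4) {
    // Assemble a little-endian word regardless of host byte order.
    const uint32_t w = crc ^ (uint32_t(p[0]) | uint32_t(p[1]) << 8 |
                              uint32_t(p[2]) << 16 | uint32_t(p[3]) << 24);
    crc = kTables[3][w & 0xFF] ^ kTables[2][(w >> 8) & 0xFF] ^
          kTables[1][(w >> 16) & 0xFF] ^ kTables[0][w >> 24];
    p += 4;
    n -= 4;
  }
  while (n--)  // tail bytes when n is not a multiple of 4
    crc = kTables[0][(crc ^ *p++) & 0xFF] ^ (crc >> 8);
  return ~crc;
}
```

The byte-wise loop is kept only as a reference to check the sliced version against; wider variants (slicing-by-8 or -16) follow the same pattern with more tables.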