[lldb-dev] LLDB performance drop from 3.9 to 4.0

2017-04-12 Thread Scott Smith via lldb-dev
I worked on some performance improvements for lldb 3.9, and was about to
forward port them so I can submit them for inclusion, but I realized there
has been a major performance drop from 3.9 to 4.0.  I am using the official
builds on an Ubuntu 16.04 machine with 16 cores / 32 hyperthreads.

Running: time lldb-4.0 -b -o 'b main' -o 'run' MY_PROGRAM > /dev/null

With 3.9, I get:
real    0m31.782s
user    0m50.024s
sys     0m4.348s

With 4.0, I get:
real    0m51.652s
user    1m19.780s
sys     0m10.388s

(with my changes + 3.9, I got real down to 4.8 seconds!  But I'm not
convinced you'll like all the changes.)

Is this expected?  I get roughly the same results when compiling llvm+lldb
from source.

I guess I can spend some time trying to bisect what happened.  5.0 looks to
be another 8% slower.


[lldb-dev] Improve performance of crc32 calculation

2017-04-12 Thread Scott Smith via lldb-dev
The algorithm included in ObjectFileELF.cpp performs a byte at a time
computation, which causes long pipeline stalls in modern processors.
Unfortunately, the polynomial used is not the same one used by the SSE 4.2
instruction set, but there are two ways to make it faster:

1. Work on multiple bytes at a time, using multiple lookup tables. (see
http://create.stephan-brumme.com/crc32/#slicing-by-8-overview)
2. Compute crcs over separate regions in parallel, then combine the
results.  (see
http://stackoverflow.com/questions/23122312/crc-calculation-of-a-mostly-static-data-stream
)

As it happens, zlib provides functions for both:
1. The zlib crc32 function uses the same polynomial as ObjectFileELF.cpp,
and uses slicing-by-4 along with loop unrolling.
2. The zlib library provides crc32_combine.
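
For illustration, here is a minimal sketch of both ideas combined, assuming
zlib is linked in.  The chunk count, the use of std::thread, and the
parallel_crc32 name are illustrative stand-ins, not the attached patch:

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>
#include <zlib.h>

// Checksum each chunk on its own thread, then merge the partial CRCs.
static uint32_t parallel_crc32(const uint8_t *buf, size_t size,
                               size_t nchunks) {
  const size_t chunk = (size + nchunks - 1) / nchunks;
  std::vector<uLong> partial(nchunks, crc32(0L, Z_NULL, 0));
  std::vector<size_t> lens(nchunks, 0);
  std::vector<std::thread> threads;
  for (size_t i = 0; i < nchunks && i * chunk < size; ++i) {
    const size_t off = i * chunk;
    const size_t len = std::min(chunk, size - off);
    lens[i] = len;
    threads.emplace_back([&partial, buf, i, off, len] {
      partial[i] = crc32(partial[i], buf + off, static_cast<uInt>(len));
    });
  }
  for (std::thread &t : threads)
    t.join();
  // crc32_combine() takes the length of its *second* operand.
  uLong result = partial[0];
  for (size_t i = 1; i < threads.size(); ++i)
    result = crc32_combine(result, partial[i], static_cast<z_off_t>(lens[i]));
  return static_cast<uint32_t>(result);
}

The point is that crc32_combine lets each region be checksummed
independently, so the work scales with the number of cores.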

I decided to just call out to the zlib library, since I see my version of
lldb already links with zlib; however, the llvm CMakeLists.txt declares it
optional.

I'm including my patch that assumes zlib is always linked in.  Let me know
if you prefer:
1. I make the change conditional on having zlib (i.e. fall back to the old
code if zlib is not present)
2. I copy all the code from zlib and put it in ObjectFileELF.cpp.  However,
I'm going to guess that requires updating some documentation to include
zlib's copyright notice.

This brings startup time on my machine / my binary from 50 seconds down to
32.
(time ~/llvm/build/bin/lldb -b -o 'b main' -o 'run' MY_PROGRAM)
Use zlib crc functions

diff --git a/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp b/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
index 6e2001b..ce4d2b0 100644
--- a/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
+++ b/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include <zlib.h>
 
 #include "lldb/Core/ArchSpec.h"
 #include "lldb/Core/FileSpecList.h"
@@ -28,6 +29,7 @@
 #include "lldb/Utility/Error.h"
 #include "lldb/Utility/Log.h"
 #include "lldb/Utility/Stream.h"
+#include "lldb/Utility/TaskPool.h"
 
 #include "llvm/ADT/PointerUnion.h"
 #include "llvm/ADT/StringRef.h"
@@ -474,67 +476,40 @@ bool ObjectFileELF::MagicBytesMatch(DataBufferSP &data_sp,
   return false;
 }
 
-/*
- * crc function from http://svnweb.freebsd.org/base/head/sys/libkern/crc32.c
- *
- *   COPYRIGHT (C) 1986 Gary S. Brown. You may use this program, or
- *   code or tables extracted from it, as desired without restriction.
- */
-static uint32_t calc_crc32(uint32_t crc, const void *buf, size_t size) {
-  static const uint32_t g_crc32_tab[] = {
-  0x, 0x77073096, 0xee0e612c, 0x990951ba, 0x076dc419, 0x706af48f,
-  0xe963a535, 0x9e6495a3, 0x0edb8832, 0x79dcb8a4, 0xe0d5e91e, 0x97d2d988,
-  0x09b64c2b, 0x7eb17cbd, 0xe7b82d07, 0x90bf1d91, 0x1db71064, 0x6ab020f2,
-  0xf3b97148, 0x84be41de, 0x1adad47d, 0x6ddde4eb, 0xf4d4b551, 0x83d385c7,
-  0x136c9856, 0x646ba8c0, 0xfd62f97a, 0x8a65c9ec, 0x14015c4f, 0x63066cd9,
-  0xfa0f3d63, 0x8d080df5, 0x3b6e20c8, 0x4c69105e, 0xd56041e4, 0xa2677172,
-  0x3c03e4d1, 0x4b04d447, 0xd20d85fd, 0xa50ab56b, 0x35b5a8fa, 0x42b2986c,
-  0xdbbbc9d6, 0xacbcf940, 0x32d86ce3, 0x45df5c75, 0xdcd60dcf, 0xabd13d59,
-  0x26d930ac, 0x51de003a, 0xc8d75180, 0xbfd06116, 0x21b4f4b5, 0x56b3c423,
-  0xcfba9599, 0xb8bda50f, 0x2802b89e, 0x5f058808, 0xc60cd9b2, 0xb10be924,
-  0x2f6f7c87, 0x58684c11, 0xc1611dab, 0xb6662d3d, 0x76dc4190, 0x01db7106,
-  0x98d220bc, 0xefd5102a, 0x71b18589, 0x06b6b51f, 0x9fbfe4a5, 0xe8b8d433,
-  0x7807c9a2, 0x0f00f934, 0x9609a88e, 0xe10e9818, 0x7f6a0dbb, 0x086d3d2d,
-  0x91646c97, 0xe6635c01, 0x6b6b51f4, 0x1c6c6162, 0x856530d8, 0xf262004e,
-  0x6c0695ed, 0x1b01a57b, 0x8208f4c1, 0xf50fc457, 0x65b0d9c6, 0x12b7e950,
-  0x8bbeb8ea, 0xfcb9887c, 0x62dd1ddf, 0x15da2d49, 0x8cd37cf3, 0xfbd44c65,
-  0x4db26158, 0x3ab551ce, 0xa3bc0074, 0xd4bb30e2, 0x4adfa541, 0x3dd895d7,
-  0xa4d1c46d, 0xd3d6f4fb, 0x4369e96a, 0x346ed9fc, 0xad678846, 0xda60b8d0,
-  0x44042d73, 0x33031de5, 0xaa0a4c5f, 0xdd0d7cc9, 0x5005713c, 0x270241aa,
-  0xbe0b1010, 0xc90c2086, 0x5768b525, 0x206f85b3, 0xb966d409, 0xce61e49f,
-  0x5edef90e, 0x29d9c998, 0xb0d09822, 0xc7d7a8b4, 0x59b33d17, 0x2eb40d81,
-  0xb7bd5c3b, 0xc0ba6cad, 0xedb88320, 0x9abfb3b6, 0x03b6e20c, 0x74b1d29a,
-  0xead54739, 0x9dd277af, 0x04db2615, 0x73dc1683, 0xe3630b12, 0x94643b84,
-  0x0d6d6a3e, 0x7a6a5aa8, 0xe40ecf0b, 0x9309ff9d, 0x0a00ae27, 0x7d079eb1,
-  0xf00f9344, 0x8708a3d2, 0x1e01f268, 0x6906c2fe, 0xf762575d, 0x806567cb,
-  0x196c3671, 0x6e6b06e7, 0xfed41b76, 0x89d32be0, 0x10da7a5a, 0x67dd4acc,
-  0xf9b9df6f, 0x8ebeeff9, 0x17b7be43, 0x60b08ed5, 0xd6d6a3e8, 0xa1d1937e,
-  0x38d8c2c4, 0x4fdff252, 0xd1bb67f1, 0xa6bc5767, 0x3fb506dd, 0x48b2364b,
-  0xd80d2bda, 0xaf0a1b4c, 0x36034af6, 0x41047a60, 0xdf60efc3, 0xa867df55,
-  0x316e8eef, 0x4669be79, 0xcb61b38c, 0xbc66831a, 0x256fd2a0, 0x5268e236,
-  0xcc0c7795, 0xbb0b4703, 0x220216b9, 0x5505262f, 0xc5ba3bbe, 0xb2bd

Re: [lldb-dev] Improve performance of crc32 calculation

2017-04-12 Thread Scott Smith via lldb-dev
I didn't realize that existed; I just checked and it looks like there's
JamCRC which uses the same polynomial.  I don't know what "Jam" means in
this context, unless it identifies the polynomial somehow?  The code is
also byte-at-a-time.

Would you prefer I use JamCRC support code instead, and then change JamCRC
to optionally use zlib if it's available?

On Wed, Apr 12, 2017 at 12:23 PM, Zachary Turner  wrote:

> Zlib is definitely optional and we cannot make it required.
>
> Did you check to see if llvm has a crc32 function somewhere in Support?
> On Wed, Apr 12, 2017 at 12:15 PM Scott Smith via lldb-dev <
> lldb-dev@lists.llvm.org> wrote:
>
>> The algorithm included in ObjectFileELF.cpp performs a byte at a time
>> computation, which causes long pipeline stalls in modern processors.
>> Unfortunately, the polynomial used is not the same one used by the SSE 4.2
>> instruction set, but there are two ways to make it faster:
>>
>> 1. Work on multiple bytes at a time, using multiple lookup tables. (see
>> http://create.stephan-brumme.com/crc32/#slicing-by-8-overview)
>> 2. Compute crcs over separate regions in parallel, then combine the
>> results.  (see http://stackoverflow.com/questions/23122312/crc-calculation-of-a-mostly-static-data-stream)
>>
>> As it happens, zlib provides functions for both:
>> 1. The zlib crc32 function uses the same polynomial as ObjectFileELF.cpp,
>> and uses slicing-by-4 along with loop unrolling.
>> 2. The zlib library provides crc32_combine.
>>
>> I decided to just call out to the zlib library, since I see my version of
>> lldb already links with zlib; however, the llvm CMakeLists.txt declares it
>> optional.
>>
>> I'm including my patch that assumes zlib is always linked in.  Let me
>> know if you prefer:
>> 1. I make the change conditional on having zlib (i.e. fall back to the
>> old code if zlib is not present)
>> 2. I copy all the code from zlib and put it in ObjectFileELF.cpp.
>> However, I'm going to guess that requires updating some documentation to
>> include zlib's copyright notice.
>>
>> This brings startup time on my machine / my binary from 50 seconds down
>> to 32.
>> (time ~/llvm/build/bin/lldb -b -o 'b main' -o 'run' MY_PROGRAM)
>>


Re: [lldb-dev] LLDB performance drop from 3.9 to 4.0

2017-04-12 Thread Scott Smith via lldb-dev
For my app I think it's largely parsing debug symbols tables for shared
libraries.  My main performance improvement was to increase the parallelism
of parsing that information.

Funny, gdb/gold has a similar accelerator table (created when you link with
--gdb-index).  I assume lldb doesn't know how to parse it.

I'll work on bisecting the change.

On Wed, Apr 12, 2017 at 12:26 PM, Jason Molenda  wrote:

> I don't know exactly when the 3.9 / 4.0 branches were cut, and what was
> done between those two points, but in general we don't expect/want to see
> performance regressions like that.  I'm more familiar with the perf
> characteristics on macos, Linux is different in some important regards, so
> I can only speak in general terms here.
>
> In your example, you're measuring three things, assuming you have debug
> information for MY_PROGRAM.  The first is "Do the initial read of the main
> binary and its debug information".  The second is "Find all symbol names
> 'main'".  The third is "Scan a newly loaded solib's symbols" (assuming you
> don't have debug information from solibs from /usr/lib etc).  Technically
> there's some additional stuff here -- launching the process, detecting
> solibs as they're loaded, looking up the symbol context when we hit the
> breakpoint, backtracing a frame or two, etc, but that stuff is rarely where
> you'll see perf issues on a local debug session.
>
> Which of these is likely to be important will depend on your MY_PROGRAM.
> If you have a 'int main(){}', it's not going to be dwarf parsing.  If your
> binary only pulls in three solib's by the time it is running, it's not
> going to be new module scanning. A popular place to spend startup time is
> in C++ name demangling if you have a lot of solibs with C++ symbols.
>
>
> On Darwin systems, we have a nonstandard accelerator table in our DWARF
> emitted by clang that lldb reads.  The "apple_types", "apple_names" etc
> tables.  So when we need to find a symbol named "main", for Modules that
> have a SymbolFile, we can look in the accelerator table.  If that
> SymbolFile has a 'main', the accelerator table gives us a reference into
> the DWARF for the definition, and we can consume the DWARF lazily.  We
> should never need to do a full scan over the DWARF, that's considered a
> failure.
>
> (in fact, I'm working on a branch of the llvm.org sources from
> mid-October and I suspect Darwin lldb is often consuming a LOT more dwarf
> than it should be when I'm debugging, I need to figure out what is causing
> that, it's a big problem.)
>
>
> In general, I've been wanting to add a new "perf counters" infrastructure
> & testsuite to lldb, but haven't had time.  One thing I work on a lot is
> debugging over a bluetooth connection; it turns out that BT is very slow,
> and any extra packets we send between lldb and debugserver are very
> costly.  The communication is so fast over a local host, or over a usb
> cable, that it's easy for regressions to sneak in without anyone noticing.
> So the original idea was hey, we can have something that counts packets for
> distinct operations.  Like, this "next" command should take no more than 40
> packets, that kind of thing.  And it could be expanded -- "b main should
> fully parse the DWARF for only 1 symbol", or "p *this should only look up 5
> types", etc.
>
>
>
>
> > On Apr 12, 2017, at 11:26 AM, Scott Smith via lldb-dev <
> lldb-dev@lists.llvm.org> wrote:
> >
> > I worked on some performance improvements for lldb 3.9, and was about to
> forward port them so I can submit them for inclusion, but I realized there
> has been a major performance drop from 3.9 to 4.0.  I am using the official
> builds on an Ubuntu 16.04 machine with 16 cores / 32 hyperthreads.
> >
> > Running: time lldb-4.0 -b -o 'b main' -o 'run' MY_PROGRAM > /dev/null
> >
> > With 3.9, I get:
> > real    0m31.782s
> > user    0m50.024s
> > sys     0m4.348s
> >
> > With 4.0, I get:
> > real    0m51.652s
> > user    1m19.780s
> > sys     0m10.388s
> >
> > (with my changes + 3.9, I got real down to 4.8 seconds!  But I'm not
> convinced you'll like all the changes.)
> >
> > Is this expected?  I get roughly the same results when compiling
> llvm+lldb from source.
> >
> > I guess I can spend some time trying to bisect what happened.  5.0 looks
> to be another 8% slower.
> >


Re: [lldb-dev] Improve performance of crc32 calculation

2017-04-12 Thread Scott Smith via lldb-dev
What about the crc combining?  I don't feel comfortable reimplementing that
on my own.  Can I leave that as a feature predicated on zlib?

For the JamCRC improvements, I assume I submit that to llvm-dev@ instead?


On Wed, Apr 12, 2017 at 12:45 PM, Zachary Turner  wrote:

> BTW, the JamCRC is used in writing Windows COFF object files, PGO
> instrumentation, and PDB Debug Info reading / writing, so any work we do to
> make it faster will benefit many parts of the toolchain.
>
> On Wed, Apr 12, 2017 at 12:42 PM Zachary Turner 
> wrote:
>
>> It would be nice if we could simply update LLVM's implementation to be
>> faster.  Having multiple implementations of the same thing seems
>> undesirable, especially if one (fast) implementation is always superior to
>> the other.  i.e. there's no reason anyone would ever want to use a
>> slow implementation if a fast one is available.
>>
>> Can we change the JamCRC implementation in LLVM to use 4-byte slicing and
>> parallelize it ourselves?  This way there's no dependency on zlib, so even
>> people who have non-zlib enabled builds of LLDB get the benefits of the
>> fast algorithm.
>>
>> On Wed, Apr 12, 2017 at 12:36 PM Scott Smith 
>> wrote:
>>
>>> I didn't realize that existed; I just checked and it looks like there's
>>> JamCRC which uses the same polynomial.  I don't know what "Jam" means in
>>> this context, unless it identifies the polynomial somehow?  The code is
>>> also byte-at-a-time.
>>>
>>> Would you prefer I use JamCRC support code instead, and then change
>>> JamCRC to optionally use zlib if it's available?
>>>
>>> On Wed, Apr 12, 2017 at 12:23 PM, Zachary Turner 
>>> wrote:
>>>
>>>> Zlib is definitely optional and we cannot make it required.
>>>>
>>>> Did you check to see if llvm has a crc32 function somewhere in Support?
>>>> On Wed, Apr 12, 2017 at 12:15 PM Scott Smith via lldb-dev <
>>>> lldb-dev@lists.llvm.org> wrote:
>>>>
>>>>> The algorithm included in ObjectFileELF.cpp performs a byte at a time
>>>>> computation, which causes long pipeline stalls in modern processors.
>>>>> Unfortunately, the polynomial used is not the same one used by the SSE 4.2
>>>>> instruction set, but there are two ways to make it faster:
>>>>>
>>>>> 1. Work on multiple bytes at a time, using multiple lookup tables.
>>>>> (see http://create.stephan-brumme.com/crc32/#slicing-by-8-overview)
>>>>> 2. Compute crcs over separate regions in parallel, then combine the
>>>>> results.  (see http://stackoverflow.com/questions/23122312/crc-calculation-of-a-mostly-static-data-stream)
>>>>>
>>>>> As it happens, zlib provides functions for both:
>>>>> 1. The zlib crc32 function uses the same polynomial as
>>>>> ObjectFileELF.cpp, and uses slicing-by-4 along with loop unrolling.
>>>>> 2. The zlib library provides crc32_combine.
>>>>>
>>>>> I decided to just call out to the zlib library, since I see my version
>>>>> of lldb already links with zlib; however, the llvm CMakeLists.txt declares
>>>>> it optional.
>>>>>
>>>>> I'm including my patch that assumes zlib is always linked in.  Let me
>>>>> know if you prefer:
>>>>> 1. I make the change conditional on having zlib (i.e. fall back to the
>>>>> old code if zlib is not present)
>>>>> 2. I copy all the code from zlib and put it in ObjectFileELF.cpp.
>>>>> However, I'm going to guess that requires updating some documentation to
>>>>> include zlib's copyright notice.
>>>>>
>>>>> This brings startup time on my machine / my binary from 50 seconds
>>>>> down to 32.
>>>>> (time ~/llvm/build/bin/lldb -b -o 'b main' -o 'run' MY_PROGRAM)
>>>>>


Re: [lldb-dev] Improve performance of crc32 calculation

2017-04-12 Thread Scott Smith via lldb-dev
Ok I stripped out the zlib crc algorithm and just left the parallelism +
calls to zlib's crc32_combine, but only if we are actually linking with
zlib.  I left those calls here (rather than folding them into JamCRC)
because I'm taking advantage of TaskRunner to parallelize the work.

I moved the system include block after the llvm includes, both because I
had to (to use the config #defines), and because it fit the published
coding convention.

By itself, it reduces my test time from 55 to 47 seconds. (The original
time is slower than before because I pulled the latest code, guess there's
another slowdown to fix).

On Wed, Apr 12, 2017 at 12:15 PM, Scott Smith 
wrote:

> The algorithm included in ObjectFileELF.cpp performs a byte at a time
> computation, which causes long pipeline stalls in modern processors.
> Unfortunately, the polynomial used is not the same one used by the SSE 4.2
> instruction set, but there are two ways to make it faster:
>
> 1. Work on multiple bytes at a time, using multiple lookup tables. (see
> http://create.stephan-brumme.com/crc32/#slicing-by-8-overview)
> 2. Compute crcs over separate regions in parallel, then combine the
> results.  (see http://stackoverflow.com/questions/23122312/crc-calculation-of-a-mostly-static-data-stream)
>
> As it happens, zlib provides functions for both:
> 1. The zlib crc32 function uses the same polynomial as ObjectFileELF.cpp,
> and uses slicing-by-4 along with loop unrolling.
> 2. The zlib library provides crc32_combine.
>
> I decided to just call out to the zlib library, since I see my version of
> lldb already links with zlib; however, the llvm CMakeLists.txt declares it
> optional.
>
> I'm including my patch that assumes zlib is always linked in.  Let me know
> if you prefer:
> 1. I make the change conditional on having zlib (i.e. fall back to the old
> code if zlib is not present)
> 2. I copy all the code from zlib and put it in ObjectFileELF.cpp.
> However, I'm going to guess that requires updating some documentation to
> include zlib's copyright notice.
>
> This brings startup time on my machine / my binary from 50 seconds down to
> 32.
> (time ~/llvm/build/bin/lldb -b -o 'b main' -o 'run' MY_PROGRAM)
>
>


zlib_crc.patch
Description: Binary data


[lldb-dev] Parallelize loading of shared libraries

2017-04-12 Thread Scott Smith via lldb-dev
The POSIX dynamic loader processes one module at a time.  If you have a lot
of shared libraries, each with a lot of symbols, this creates unneeded
serialization (despite the use of TaskRunners during symbol loading, there
is still quite a bit of serialization when loading a library).

In order to parallelize this, I actually had to do two things.  Neither one
makes any difference, only the combination improves performance (I left
them as separate patches for clarity):

1. Change the POSIX dynamic loader to fork each module into its own
thread.  I didn't use TaskRunner because some of the called functions use
TaskRunner, and it isn't recursion safe.  The final modules are added to
the list in the original order despite whatever order the threads finish.

2. Change Module::AppendImpl to fire off some expensive work as a separate
thread.

These two changes bring startup time down from 36 (assuming the previously
mentioned crc changes) seconds to 11.  It doesn't improve efficiency, it
just increases parallelism.
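
As a rough sketch of the shape of patch 1 (Module and LoadOneModule below
are hypothetical stand-ins for the real dynamic-loader code), the
per-module threads can be joined in submission order, so the final module
list is ordered as if loading had been serial:

#include <future>
#include <memory>
#include <string>
#include <vector>

struct Module {};  // stand-in for lldb_private::Module

// Hypothetical: whatever per-library work the loader does today.
static std::shared_ptr<Module> LoadOneModule(const std::string &path) {
  return std::make_shared<Module>();
}

static std::vector<std::shared_ptr<Module>>
LoadAllModules(const std::vector<std::string> &paths) {
  std::vector<std::future<std::shared_ptr<Module>>> futures;
  futures.reserve(paths.size());
  for (const std::string &p : paths)
    futures.push_back(
        std::async(std::launch::async, [p] { return LoadOneModule(p); }));
  std::vector<std::shared_ptr<Module>> modules;
  modules.reserve(futures.size());
  // Joining in submission order preserves the original module ordering
  // regardless of which threads finish first.
  for (auto &f : futures)
    modules.push_back(f.get());
  return modules;
}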


dyn_load_thread.patch
Description: Binary data


prime_caches.patch
Description: Binary data


Re: [lldb-dev] Parallelize loading of shared libraries

2017-04-13 Thread Scott Smith via lldb-dev
Ok.  I tried doing something similar to gdb but was unable to make any
headway because they have so many global variables.  It looked more
promising with lldb since there were already some locks.

I assume you're talking about check-lldb?
https://lldb.llvm.org/test.html

I'll work on getting those to pass reliably.

As for eager vs not, I was just running code that already runs as part of:
b main
run

That said, I'm sure all the symbol loading is due to setting a breakpoint
on a function name.  Is there really that much value in deferring that?
What if loading the symbols was done in parallel without delaying execution
of the debugged program if you didn't have a breakpoint?  Then the impact
would be (nearly) invisible to the end user.

On Thu, Apr 13, 2017 at 5:35 AM, Pavel Labath  wrote:

> I've have looked at paralelization of the module loading code some time
> ago, albeit with a slightly different use case in mind. I eventually
> abandoned it (at least temporarily) because I could not get it to work
> correctly for all use cases.
>
> I do think that doing this is a good idea, but I think it will have to be
> done with a very steady hand. E.g., if I patch your changes in right now I
> get about 10 random tests failing on every test suite run, so it's clear
> that you are introducing a race somewhere.
>
> We will also need to have a discussion about what kind of work can be done
> eagerly, as I believe we are trying to do a lot of things very lazily (which
> unfortunately makes efficient paralelization more complicated).
>
>
>
> On 13 April 2017 at 06:34, Scott Smith via lldb-dev <
> lldb-dev@lists.llvm.org> wrote:
>
>> The POSIX dynamic loader processes one module at a time.  If you have a
>> lot of shared libraries, each with a lot of symbols, this creates unneeded
>> serialization (despite the use of TaskRunners during symbol loading, there
>> is still quite a bit of serialization when loading a library).
>>
>> In order to parallelize this, I actually had to do two things.  Neither
>> one makes any difference, only the combination improves performance (I left
>> them as separate patches for clarity):
>>
>> 1. Change the POSIX dynamic loader to fork each module into its own
>> thread.  I didn't use TaskRunner because some of the called functions use
>> TaskRunner, and it isn't recursion safe.  The final modules are added to
>> the list in the original order despite whatever order the threads finish.
>>
>> 2. Change Module::AppendImpl to fire off some expensive work as a
>> separate thread.
>>
>> These two changes bring startup time down from 36 (assuming the
>> previously mentioned crc changes) seconds to 11.  It doesn't improve
>> efficiency, it just increases parallelism.
>>
>>


Re: [lldb-dev] Improve performance of crc32 calculation

2017-04-13 Thread Scott Smith via lldb-dev
Interesting.  That saves lldb startup time (after crc
improvements/parallelization) by about 1.25 seconds wall clock / 10 seconds
cpu time, but increases linking by about 2 seconds of cpu time (and an
inconsistent amount of wall clock time).  That's only a good tradeoff if
you run the debugger a lot.

If all you need is a unique id, there are cheaper ways of going about it.
The SSE crc instruction would be cheaper, or using CityHash/MurmurHash for
other cpus.  I thought it was specifically tied to that crc algorithm.  In
that case it doesn't make sense to fold this into JamCRC, since that's tied
to a difficult-to-optimize algorithm.
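
For reference, a sketch of that cheaper route on x86 (build with -msse4.2).
Note the SSE 4.2 instructions compute CRC-32C, the Castagnoli polynomial,
which is not the polynomial ObjectFileELF.cpp expects, so this only helps
where any stable checksum will do:

#include <cstddef>
#include <cstdint>
#include <cstring>
#include <nmmintrin.h>  // SSE 4.2 CRC32 intrinsics

static uint32_t crc32c_sse42(const uint8_t *buf, size_t size) {
  uint64_t crc = 0xFFFFFFFFu;
  while (size >= 8) {  // 8 bytes per crc32 instruction on x86-64
    uint64_t word;
    std::memcpy(&word, buf, sizeof(word));
    crc = _mm_crc32_u64(crc, word);
    buf += 8;
    size -= 8;
  }
  uint32_t crc32 = static_cast<uint32_t>(crc);
  while (size--)  // mop up the tail a byte at a time
    crc32 = _mm_crc32_u8(crc32, *buf++);
  return crc32 ^ 0xFFFFFFFFu;
}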

On Thu, Apr 13, 2017 at 4:28 AM, Pavel Labath  wrote:

> Improving the checksumming speed is definitely a worthwhile contribution,
> but be aware that there is a pretty simple way to avoid computing the crc
> altogether, and that is to make sure your binaries have a build ID. This is
> generally as simple as adding -Wl,--build-id to your compiler flags.
>
> +1 to moving the checksumming code to llvm
>
> pl
>
> On 13 April 2017 at 07:20, Zachary Turner via lldb-dev <
> lldb-dev@lists.llvm.org> wrote:
>
>> I know this is outside of your initial goal, but it would be really great
>> if JamCRC be updated in llvm to be parallel. I see that you're making use
>> of TaskRunner for the parallelism, but that looks pretty generic, so
>> perhaps that could be raised into llvm as well if it helps.
>>
>> Not trying to throw extra work on you, but it seems like a really good
>> general purpose improvement and it would be a shame if only lldb can
>> benefit from it.
>> On Wed, Apr 12, 2017 at 8:35 PM Scott Smith via lldb-dev <
>> lldb-dev@lists.llvm.org> wrote:
>>
>>> Ok I stripped out the zlib crc algorithm and just left the parallelism +
>>> calls to zlib's crc32_combine, but only if we are actually linking with
>>> zlib.  I left those calls here (rather than folding them into JamCRC)
>>> because I'm taking advantage of TaskRunner to parallelize the work.
>>>
>>> I moved the system include block after the llvm includes, both because I
>>> had to (to use the config #defines), and because it fit the published
>>> coding convention.
>>>
>>> By itself, it reduces my test time from 55 to 47 seconds. (The original
>>> time is slower than before because I pulled the latest code, guess there's
>>> another slowdown to fix).
>>>
>>> On Wed, Apr 12, 2017 at 12:15 PM, Scott Smith <
>>> scott.sm...@purestorage.com> wrote:
>>>
>>>> The algorithm included in ObjectFileELF.cpp performs a byte at a time
>>>> computation, which causes long pipeline stalls in modern processors.
>>>> Unfortunately, the polynomial used is not the same one used by the SSE 4.2
>>>> instruction set, but there are two ways to make it faster:
>>>>
>>>> 1. Work on multiple bytes at a time, using multiple lookup tables. (see
>>>> http://create.stephan-brumme.com/crc32/#slicing-by-8-overview)
>>>> 2. Compute crcs over separate regions in parallel, then combine the
>>>> results.  (see http://stackoverflow.com/questions/23122312/crc-calculation-of-a-mostly-static-data-stream)
>>>>
>>>> As it happens, zlib provides functions for both:
>>>> 1. The zlib crc32 function uses the same polynomial as
>>>> ObjectFileELF.cpp, and uses slicing-by-4 along with loop unrolling.
>>>> 2. The zlib library provides crc32_combine.
>>>>
>>>> I decided to just call out to the zlib library, since I see my version
>>>> of lldb already links with zlib; however, the llvm CMakeLists.txt declares
>>>> it optional.
>>>>
>>>> I'm including my patch that assumes zlib is always linked in.  Let me
>>>> know if you prefer:
>>>> 1. I make the change conditional on having zlib (i.e. fall back to the
>>>> old code if zlib is not present)
>>>> 2. I copy all the code from zlib and put it in ObjectFileELF.cpp.
>>>> However, I'm going to guess that requires updating some documentation to
>>>> include zlib's copyright notice.
>>>>
>>>> This brings startup time on my machine / my binary from 50 seconds down
>>>> to 32.
>>>> (time ~/llvm/build/bin/lldb -b -o 'b main' -o 'run' MY_PROGRAM)
>>>>
>>>>


Re: [lldb-dev] Improve performance of crc32 calculation

2017-04-18 Thread Scott Smith via lldb-dev
Thank you for that clarification.  Sounds like we can't change the crc code
then.

I realized I had been using GNU's gold linker.  I switched to linking with
lld(-4.0) and now linking uses less than 1/3rd the cpu.  It seems that the
default hashing (fast == xxHash) is faster than whatever gold was using.
I'll just switch to that and call it a day.

On Tue, Apr 18, 2017 at 5:46 AM, Pavel Labath  wrote:

> What we need is the ability to connect a stripped version of an SO to one
> with debug symbols present. Currently there are (at least) two ways to
> achieve that:
>
> - build-id: both SOs have a build-id section with the same value.
> Normally, that's added by a linker in the final link, and subsequent strip
> steps do not remove it. Normally the build-id is some sort of a hash of the
> *initial* file contents, which is why you feel like you are trading
> debugger startup time for link time. However, that is not a requirement, as
> the exact checksumming algorithm does not matter here. A random byte
> sequence would do just fine, which is what "--build-id=uuid" does and it
> should have no impact on your link time. Be sure **not** to use this if you
> care about deterministic builds though.
>
> - gnu_debuglink: here, the stripped SO contains a checksum of the original
> SO, which is added at strip time. This is done using a fixed algorithm, and
> this is important as the debugger needs to arrive at the same checksum as
> the strip tool. Also worth noting is that this mechanism embeds the path of
> the original SO into the stripped one, whereas the first one leaves the
> search task up to the debugger. This may be a plus or a minus, depending on
> your use case.
>
> Hope that makes things a bit clearer. Cheers,
> pl
>
>
> On 13 April 2017 at 18:31, Scott Smith 
> wrote:
>
>> Interesting.  That saves lldb startup time (after crc
>> improvements/parallelization) by about 1.25 seconds wall clock / 10 seconds
>> cpu time, but increases linking by about 2 seconds of cpu time (and an
>> inconsistent amount of wall clock time).  That's only a good tradeoff if
>> you run the debugger a lot.
>>
>> If all you need is a unique id, there are cheaper ways of going about
>> it.  The SSE crc instruction would be cheaper, or using CityHash/MurmurHash
>> for other cpus.  I thought it was specifically tied to that crc algorithm.
>> In that case it doesn't make sense to fold this into JamCRC, since that's
>> tied to a difficult-to-optimize algorithm.
>>
>> On Thu, Apr 13, 2017 at 4:28 AM, Pavel Labath  wrote:
>>
>>> Improving the checksumming speed is definitely a worthwhile
>>> contribution, but be aware that there is a pretty simple way to avoid
>>> computing the crc altogether, and that is to make sure your binaries have a
>>> build ID. This is generally as simple as adding -Wl,--build-id to your
>>> compiler flags.
>>>
>>> +1 to moving the checksumming code to llvm
>>>
>>> pl
>>>
>>> On 13 April 2017 at 07:20, Zachary Turner via lldb-dev <
>>> lldb-dev@lists.llvm.org> wrote:
>>>
>>>> I know this is outside of your initial goal, but it would be really
>>>> great if JamCRC be updated in llvm to be parallel. I see that you're making
>>>> use of TaskRunner for the parallelism, but that looks pretty generic, so
>>>> perhaps that could be raised into llvm as well if it helps.
>>>>
>>>> Not trying to throw extra work on you, but it seems like a really good
>>>> general purpose improvement and it would be a shame if only lldb can
>>>> benefit from it.
>>>> On Wed, Apr 12, 2017 at 8:35 PM Scott Smith via lldb-dev <
>>>> lldb-dev@lists.llvm.org> wrote:
>>>>
>>>>> Ok I stripped out the zlib crc algorithm and just left the parallelism
>>>>> + calls to zlib's crc32_combine, but only if we are actually linking with
>>>>> zlib.  I left those calls here (rather than folding them into JamCRC)
>>>>> because I'm taking advantage of TaskRunner to parallelize the work.
>>>>>
>>>>> I moved the system include block after the llvm includes, both because
>>>>> I had to (to use the config #defines), and because it fit the published
>>>>> coding convention.
>>>>>
>>>>> By itself, it reduces my test time from 55 to 47 seconds. (The
>>>>> original time is slower than before because I pulled the latest code,
>>>>> guess there's another slowdown to fix).
>>

[lldb-dev] Running check-lldb

2017-04-18 Thread Scott Smith via lldb-dev
I'm trying to make sure some of my changes don't break lldb tests, but I'm
having trouble getting a clean run even with a plain checkout.  I've tried
the latest head of master, as well as release_40.  I'm running Ubuntu
16.04/amd64.  I built with:

cmake ../llvm -G Ninja -DCMAKE_BUILD_TYPE=Debug
ninja lldb
ninja check-lldb

Compiler is gcc-5.4, though I've also tried with clang-4.0.

Am I missing something obvious?  Is there a docker image / vm image / known
good environments that I can use to reproduce a clean test run (on
something Linux-y - sorry, I don't have a Mac)?


Re: [lldb-dev] Running check-lldb

2017-04-19 Thread Scott Smith via lldb-dev
Yeah I found the buildbot instance for lldb on Ubuntu 14.04, but it looks
like it is only running release builds. Is that on purpose?

On Wed, Apr 19, 2017 at 3:59 AM, Pavel Labath  wrote:

> It looks like we are triggering an assert in llvm on a debug build. I'll
> try to track this down ASAP.
>
>
> On 18 April 2017 at 21:24, Scott Smith via lldb-dev <
> lldb-dev@lists.llvm.org> wrote:
>
>> I'm trying to make sure some of my changes don't break lldb tests, but
>> I'm having trouble getting a clean run even with a plain checkout.  I've
>> tried the latest head of master, as well as release_40.  I'm running Ubuntu
>> 16.04/amd64.  I built with:
>>
>> cmake ../llvm -G Ninja -DCMAKE_BUILD_TYPE=Debug
>> ninja lldb
>> ninja check-lldb
>>
>> Compiler is gcc-5.4, though I've also tried with clang-4.0.
>>
>> Am I missing something obvious?  Is there a docker image / vm image /
>> known good environments that I can use to reproduce a clean test run (on
>> something Linux-y - sorry, I don't have a Mac)?
>>
>>


Re: [lldb-dev] Running check-lldb

2017-04-19 Thread Scott Smith via lldb-dev
A combination of:
1. Updating to a known good release according to buildbot
2. using Ubuntu 14.04
3. compiling release using clang-4.0
4. using the dotest command line that buildbot uses
5. specifying gcc-4.8 instead of the locally compiled clang

has most of the tests passing, with a handful of unexpected successes:

UNEXPECTED SUCCESS:
TestRegisterVariables.RegisterVariableTestCase.test_and_run_command_dwarf
(lang/c/register_variables/TestRegisterVariables.py)
UNEXPECTED SUCCESS:
TestRegisterVariables.RegisterVariableTestCase.test_and_run_command_dwo
(lang/c/register_variables/TestRegisterVariables.py)
UNEXPECTED SUCCESS:
TestExitDuringBreak.ExitDuringBreakpointTestCase.test_dwarf
(functionalities/thread/exit_during_break/TestExitDuringBreak.py)
UNEXPECTED SUCCESS:
TestExitDuringBreak.ExitDuringBreakpointTestCase.test_dwo
(functionalities/thread/exit_during_break/TestExitDuringBreak.py)
UNEXPECTED SUCCESS:
TestThreadStates.ThreadStateTestCase.test_process_interrupt_dwarf
(functionalities/thread/state/TestThreadStates.py)
UNEXPECTED SUCCESS:
TestThreadStates.ThreadStateTestCase.test_process_interrupt_dwo
(functionalities/thread/state/TestThreadStates.py)
UNEXPECTED SUCCESS: TestRaise.RaiseTestCase.test_restart_bug_dwarf
(functionalities/signal/raise/TestRaise.py)
UNEXPECTED SUCCESS: TestRaise.RaiseTestCase.test_restart_bug_dwo
(functionalities/signal/raise/TestRaise.py)
UNEXPECTED SUCCESS:
TestMultithreaded.SBBreakpointCallbackCase.test_sb_api_listener_resume_dwarf
(api/multithreaded/TestMultithreaded.py)
UNEXPECTED SUCCESS:
TestMultithreaded.SBBreakpointCallbackCase.test_sb_api_listener_resume_dwo
(api/multithreaded/TestMultithreaded.py)
UNEXPECTED SUCCESS: lldbsuite.test.lldbtest.TestPrintf.test_with_dwarf
(lang/cpp/printf/TestPrintf.py)
UNEXPECTED SUCCESS: lldbsuite.test.lldbtest.TestPrintf.test_with_dwo
(lang/cpp/printf/TestPrintf.py)

This looks different than another user's issue:
http://lists.llvm.org/pipermail/lldb-dev/2016-February/009504.html

I also tried gcc-4.9.4 (via the ubuntu-toolchain-r ppa) and got a different
set of problems:

FAIL: TestNamespaceDefinitions.NamespaceDefinitionsTestCase.test_expr_dwarf
(lang/cpp/namespace_definitions/TestNamespaceDefinitions.py)
FAIL: TestNamespaceDefinitions.NamespaceDefinitionsTestCase.test_expr_dwo
(lang/cpp/namespace_definitions/TestNamespaceDefinitions.py)
FAIL:
TestTopLevelExprs.TopLevelExpressionsTestCase.test_top_level_expressions_dwarf
(expression_command/top-level/TestTopLevelExprs.py)
FAIL:
TestTopLevelExprs.TopLevelExpressionsTestCase.test_top_level_expressions_dwo
(expression_command/top-level/TestTopLevelExprs.py)
UNEXPECTED SUCCESS:
TestExitDuringBreak.ExitDuringBreakpointTestCase.test_dwarf
(functionalities/thread/exit_during_break/TestExitDuringBreak.py)
UNEXPECTED SUCCESS:
TestExitDuringBreak.ExitDuringBreakpointTestCase.test_dwo
(functionalities/thread/exit_during_break/TestExitDuringBreak.py)
UNEXPECTED SUCCESS:
TestThreadStates.ThreadStateTestCase.test_process_interrupt_dwarf
(functionalities/thread/state/TestThreadStates.py)
UNEXPECTED SUCCESS: TestRaise.RaiseTestCase.test_restart_bug_dwarf
(functionalities/signal/raise/TestRaise.py)
UNEXPECTED SUCCESS: TestRaise.RaiseTestCase.test_restart_bug_dwo
(functionalities/signal/raise/TestRaise.py)
UNEXPECTED SUCCESS:
TestMultithreaded.SBBreakpointCallbackCase.test_sb_api_listener_resume_dwarf
(api/multithreaded/TestMultithreaded.py)
UNEXPECTED SUCCESS:
TestMultithreaded.SBBreakpointCallbackCase.test_sb_api_listener_resume_dwo
(api/multithreaded/TestMultithreaded.py)
UNEXPECTED SUCCESS: lldbsuite.test.lldbtest.TestPrintf.test_with_dwarf
(lang/cpp/printf/TestPrintf.py)
UNEXPECTED SUCCESS: lldbsuite.test.lldbtest.TestPrintf.test_with_dwo
(lang/cpp/printf/TestPrintf.py)

Well, at least the list is consistent, which gives me a base to start
testing race conditions :-)



On Wed, Apr 19, 2017 at 7:37 AM, Pavel Labath  wrote:

> It is on purpose, although whether that purpose is worthwhile is
> debatable...
>
> We chose to run release builds there so to align the bots closer to the
> binaries we release. Unfortunately, it does mean we run into situations
> like these...
>
> In any case, I have now a patch up for fixing one of the crashers. The
> main one (assert during relocation processing) seems to be caused by a
> recent change in llvm. I am working towards identifying the cause, but that
> may take a while.
>
> Then we can hopefully have a look at failures on your machine.
>
>
> On 19 April 2017 at 14:28, Scott Smith 
> wrote:
>
>> Yeah I found the buildbot instance for lldb on Ubuntu 14.04, but it looks
>> like it is only running release builds. Is that on purpose?
>>
>> On Wed, Apr 19, 2017 at 3:59 AM, Pavel Labath  wrote:
>>
>>> It looks like we are triggering an assert in llvm on a debug build. I'll
>>> try to track this down ASAP.
>>>
>>>
>>> On

Re: [lldb-dev] LLDB performance drop from 3.9 to 4.0

2017-04-19 Thread Scott Smith via lldb-dev
It looks like it was this change:

commit 45fb8d00309586c3f7027f66f9f8a0b56bf1cc4a
Author: Zachary Turner 
Date:   Thu Oct 6 21:22:44 2016 +0000

Convert UniqueCStringMap to use StringRef.

git-svn-id: https://llvm.org/svn/llvm-project/lldb/trunk@283494
91177308-0d34-0410-b5e6-96231b3b80d8


I'm guessing it's because the old code assumed ConstString, which meant
that uniqueness comparisons could be done by simply comparing the pointer.
Now it needs to use an actual string comparison routine.  This code:

 bool operator<(const Entry &rhs) const { return cstring < rhs.cstring; }

didn't actually change in the revision, but cstring went from 'const char
*' to 'StringRef'.  If you know for sure that all the StringRefs come from
ConstString, then it'd be easy enough to change the comparison, but I don't
know how you guarantee that.

I assume the change was made to allow proper memory cleanup when the
symbols are discarded?
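
To make the mechanism concrete, here is a toy interned-string type, a
stand-in for lldb's ConstString (ignoring thread safety).  Because each
distinct value is stored exactly once, operator< can compare the pointers:
one compare instead of a byte-by-byte walk.  The resulting order is
arbitrary but consistent, which is all a uniqueness map needs:

#include <set>
#include <string>

class InternedString {
public:
  explicit InternedString(const std::string &s) : m_cstr(Intern(s)) {}
  // Equal values share one pointer, so pointer comparison suffices.
  bool operator<(const InternedString &rhs) const {
    return m_cstr < rhs.m_cstr;
  }
  bool operator==(const InternedString &rhs) const {
    return m_cstr == rhs.m_cstr;
  }
  const char *c_str() const { return m_cstr; }

private:
  static const char *Intern(const std::string &s) {
    static std::set<std::string> pool;  // one copy of each distinct value
    return pool.insert(s).first->c_str();  // set nodes are pointer-stable
  }
  const char *m_cstr;
};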

On Thu, Apr 13, 2017 at 5:37 AM, Pavel Labath  wrote:

> Bisecting the performance regression would be extremely valuable. If you
> want to do that, it would be very appreciated.
>
> On 12 April 2017 at 20:39, Scott Smith via lldb-dev <
> lldb-dev@lists.llvm.org> wrote:
>
>> For my app I think it's largely parsing debug symbols tables for shared
>> libraries.  My main performance improvement was to increase the parallelism
>> of parsing that information.
>>
>> Funny, gdb/gold has a similar accelerator table (created when you link
>> with --gdb-index).  I assume lldb doesn't know how to parse it.
>>
>> I'll work on bisecting the change.
>>
>> On Wed, Apr 12, 2017 at 12:26 PM, Jason Molenda 
>> wrote:
>>
>>> I don't know exactly when the 3.9 / 4.0 branches were cut, and what was
>>> done between those two points, but in general we don't expect/want to see
>>> performance regressions like that.  I'm more familiar with the perf
>>> characteristics on macos, Linux is different in some important regards, so
>>> I can only speak in general terms here.
>>>
>>> In your example, you're measuring three things, assuming you have debug
>>> information for MY_PROGRAM.  The first is "Do the initial read of the main
>>> binary and its debug information".  The second is "Find all symbol names
>>> 'main'".  The third is "Scan a newly loaded solib's symbols" (assuming you
>>> don't have debug information from solibs from /usr/lib etc).  Technically
>>> there's some additional stuff here -- launching the process, detecting
>>> solibs as they're loaded, looking up the symbol context when we hit the
>>> breakpoint, backtracing a frame or two, etc, but that stuff is rarely where
>>> you'll see perf issues on a local debug session.
>>>
>>> Which of these is likely to be important will depend on your
>>> MY_PROGRAM.  If you have a 'int main(){}', it's not going to be dwarf
>>> parsing.  If your binary only pulls in three solib's by the time it is
>>> running, it's not going to be new module scanning. A popular place to spend
>>> startup time is in C++ name demangling if you have a lot of solibs with C++
>>> symbols.
>>>
>>>
>>> On Darwin systems, we have a nonstandard accelerator table in our DWARF
>>> emitted by clang that lldb reads.  The "apple_types", "apple_names" etc
>>> tables.  So when we need to find a symbol named "main", for Modules that
>>> have a SymbolFile, we can look in the accelerator table.  If that
>>> SymbolFile has a 'main', the accelerator table gives us a reference into
>>> the DWARF for the definition, and we can consume the DWARF lazily.  We
>>> should never need to do a full scan over the DWARF, that's considered a
>>> failure.
>>>
>>> (in fact, I'm working on a branch of the llvm.org sources from
>>> mid-October and I suspect Darwin lldb is often consuming a LOT more dwarf
>>> than it should be when I'm debugging, I need to figure out what is causing
>>> that, it's a big problem.)
>>>
>>>
>>> In general, I've been wanting to add a new "perf counters"
>>> infrastructure & testsuite to lldb, but haven't had time.  One thing I work
>>> on a lot is debugging over a bluetooth connection; it turns out that BT is
>>> very slow, and any extra packets we send between lldb and debugserver are
>>> very costly.  The communication is so fast over a

Re: [lldb-dev] LLDB performance drop from 3.9 to 4.0

2017-04-19 Thread Scott Smith via lldb-dev
If I just assume the pointers are from ConstString, then doesn't that
defeat the purpose of making the interface safer?  Why not use an actual
ConstString and provide conversion operators from ConstString to
StringRef?  Seems we should be able to rely on the type system to get us
safety and performance.

I'll try putting something together tomorrow.
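
A minimal sketch of that direction; GetStringRef() exists on the real
ConstString, while the implicit conversion operator is the hypothetical
addition:

#include "llvm/ADT/StringRef.h"

class ConstString {
public:
  // Hypothetical: lets a ConstString flow into StringRef-taking APIs,
  // while map keys stay ConstString and keep pointer comparisons.
  operator llvm::StringRef() const { return GetStringRef(); }
  llvm::StringRef GetStringRef() const { return llvm::StringRef(m_cstr); }

private:
  const char *m_cstr = "";  // uniqued storage, as in the real class
};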

On Wed, Apr 19, 2017 at 4:48 PM, Zachary Turner  wrote:

> The change was made to make the interface safer and allow propagation of
> StringRef through other layers.  The previous code was already taking a
> const char *, and so it was working under the assumption that the const
> char* passed in came from a ConstString.  As such, continuing to make that
> same assumption seems completely reasonable.
>
> So perhaps you can just change the operator to compare the pointers, as
> was being done before.
>
> On Wed, Apr 19, 2017 at 4:24 PM Scott Smith 
> wrote:
>
>> It looks like it was this change:
>>
>> commit 45fb8d00309586c3f7027f66f9f8a0b56bf1cc4a
>> Author: Zachary Turner 
>> Date:   Thu Oct 6 21:22:44 2016 +0000
>>
>> Convert UniqueCStringMap to use StringRef.
>>
>> git-svn-id: https://llvm.org/svn/llvm-project/lldb/trunk@283494
>> 91177308-0d34-0410-b5e6-96231b3b80d8
>>
>>
>> I'm guessing it's because the old code assumed ConstString, which meant
>> that uniqueness comparisons could be done by simply comparing the pointer.
>> Now it needs to use an actual string comparison routine.  This code:
>>
>>  bool operator<(const Entry &rhs) const { return cstring < rhs.cstring; }
>>
>> didn't actually change in the revision, but cstring went from 'const char
>> *' to 'StringRef'.  If you know for sure that all the StringRefs come from
>> ConstString, then it'd be easy enough to change the comparison, but I don't
>> know how you guarantee that.
>>
>> I assume the change was made to allow proper memory cleanup when the
>> symbols are discarded?
>>
>> On Thu, Apr 13, 2017 at 5:37 AM, Pavel Labath  wrote:
>>
>>> Bisecting the performance regression would be extremely valuable. If you
>>> want to do that, it would be very appreciated.
>>>
>>> On 12 April 2017 at 20:39, Scott Smith via lldb-dev <
>>> lldb-dev@lists.llvm.org> wrote:
>>>
>>>> For my app I think it's largely parsing debug symbols tables for shared
>>>> libraries.  My main performance improvement was to increase the parallelism
>>>> of parsing that information.
>>>>
>>>> Funny, gdb/gold has a similar accelerator table (created when you link
>>>> with --gdb-index).  I assume lldb doesn't know how to parse it.
>>>>
>>>> I'll work on bisecting the change.
>>>>
>>>> On Wed, Apr 12, 2017 at 12:26 PM, Jason Molenda 
>>>> wrote:
>>>>
>>>>> I don't know exactly when the 3.9 / 4.0 branches were cut, and what
>>>>> was done between those two points, but in general we don't expect/want to
>>>>> see performance regressions like that.  I'm more familiar with the perf
>>>>> characteristics on macos, Linux is different in some important regards, so
>>>>> I can only speak in general terms here.
>>>>>
>>>>> In your example, you're measuring three things, assuming you have
>>>>> debug information for MY_PROGRAM.  The first is "Do the initial read of the
>>>>> main binary and its debug information".  The second is "Find all symbol
>>>>> names 'main'".  The third is "Scan a newly loaded solib's symbols"
>>>>> (assuming you don't have debug information from solibs from /usr/lib etc).
>>>>> Technically there's some additional stuff here -- launching the process,
>>>>> detecting solibs as they're loaded, looking up the symbol context when we
>>>>> hit the breakpoint, backtracing a frame or two, etc, but that stuff is
>>>>> rarely where you'll see perf issues on a local debug session.
>>>>>
>>>>> Which of these is likely to be important will depend on your
>>>>> MY_PROGRAM.  If you have a 'int main(){}', it's not going to be dwarf
>>>>> parsing.  If your binary only pulls in three solib's by the time it is
>>>>> running, it's not going to be new module scanning. A popular place to 
>>>>> sp

Re: [lldb-dev] LLDB performance drop from 3.9 to 4.0

2017-04-20 Thread Scott Smith via lldb-dev
What's the preferred way to post changes?  In the past I tried emailing
here but it was pointed out I should send to lldb-commit instead.  But,
there's also phabricator for web-based code reviews.

So,

1. just email lldb-commits?
2. post on http://reviews.llvm.org/?

On Thu, Apr 20, 2017 at 3:16 AM, Pavel Labath  wrote:

> Thank you very much for tracking this down.
>
> +1 for making UniqueCStringMap speak ConstString -- i think it just makes
> sense given that it already has "unique" in the name.
>
> ConstString already has a GetStringRef accessor. Also adding a conversion
> operator may be a good idea, although it probably won't help in all
> situations (you'll still have to write StringRef(X).drop_front() etc. if
> you want to do stringref operations on the string)
>
> pl
>
> On 20 April 2017 at 01:46, Zachary Turner  wrote:
>
>> It doesn't entirely defeat the purpose, it's just not *as good* as making
>> the interfaces take ConstStrings.  StringRef already has a lot of safety
>> and usability improvements over raw char pointers, and those improvements
>> don't disappear just because you aren't using ConstString.  Although I
>> agree that if you can make it work where the interface only accepts and
>> returns ConstStrings, and make conversion from ConstString to StringRef
>> more seamless, that would be an even better improvement.
>>
>> On Wed, Apr 19, 2017 at 5:33 PM Scott Smith 
>> wrote:
>>
>>> If I just assume the pointers are from ConstString, then doesn't that
>>> defeat the purpose of making the interface safer?  Why not use an actual
>>> ConstString and provide conversion operators from ConstString to
>>> StringRef?  Seems we should be able to rely on the type system to get us
>>> safety and performance.
>>>
>>> I'll try putting something together tomorrow.
>>>
>>> On Wed, Apr 19, 2017 at 4:48 PM, Zachary Turner 
>>> wrote:
>>>
>>>> The change was made to make the interface safer and allow propagation
>>>> of StringRef through other layers.  The previous code was already taking a
>>>> const char *, and so it was working under the assumption that the const
>>>> char* passed in came from a ConstString.  As such, continuing to make that
>>>> same assumption seems completely reasonable.
>>>>
>>>> So perhaps you can just change the operator to compare the pointers, as
>>>> was being done before.
>>>>
>>>> On Wed, Apr 19, 2017 at 4:24 PM Scott Smith <
>>>> scott.sm...@purestorage.com> wrote:
>>>>
>>>>> It looks like it was this change:
>>>>>
>>>>> commit 45fb8d00309586c3f7027f66f9f8a0b56bf1cc4a
>>>>> Author: Zachary Turner 
>>>>> Date:   Thu Oct 6 21:22:44 2016 +0000
>>>>>
>>>>> Convert UniqueCStringMap to use StringRef.
>>>>>
>>>>> git-svn-id: https://llvm.org/svn/llvm-project/lldb/trunk@283494
>>>>> 91177308-0d34-0410-b5e6-96231b3b80d8
>>>>>
>>>>>
>>>>> I'm guessing it's because the old code assumed ConstString, which
>>>>> meant that uniqueness comparisons could be done by simply comparing the
>>>>> pointer.  Now it needs to use an actual string comparison routine.  This
>>>>> code:
>>>>>
>>>>>  bool operator<(const Entry &rhs) const { return cstring < rhs.cstring; }
>>>>>
>>>>> didn't actually change in the revision, but cstring went from 'const
>>>>> char *' to 'StringRef'.  If you know for sure that all the StringRefs come
>>>>> from ConstString, then it'd be easy enough to change the comparison, but I
>>>>> don't know how you guarantee that.
>>>>>
>>>>> I assume the change was made to allow proper memory cleanup when the
>>>>> symbols are discarded?
>>>>>
>>>>> On Thu, Apr 13, 2017 at 5:37 AM, Pavel Labath 
>>>>> wrote:
>>>>>
>>>>>> Bisecting the performance regression would be extremely valuable. If
>>>>>> you want to do that, it would be very appreciated.
>>>>>>
>>>>>> On 12 April 2017 at 20:39, Scott Smith via lldb-dev <
>>>>>> lldb-dev@lists.llvm.org> wrote:
>>>>>>
>>>>>>> For my app I think it's largely parsing

Re: [lldb-dev] Running check-lldb

2017-04-20 Thread Scott Smith via lldb-dev
On Thu, Apr 20, 2017 at 6:47 AM, Pavel Labath  wrote:

> 5. specifying gcc-4.8 instead of the locally compiled clang
>
> has most of the tests passing, with a handful of unexpected successes:
>>
>> UNEXPECTED SUCCESS:
>> TestRegisterVariables.RegisterVariableTestCase.test_and_run_command_dwarf
>> (lang/c/register_variables/TestRegisterVariables.py)
>> UNEXPECTED SUCCESS:
>> TestRegisterVariables.RegisterVariableTestCase.test_and_run_command_dwo
>> (lang/c/register_variables/TestRegisterVariables.py)
>> UNEXPECTED SUCCESS:
>> TestExitDuringBreak.ExitDuringBreakpointTestCase.test_dwarf
>> (functionalities/thread/exit_during_break/TestExitDuringBreak.py)
>> UNEXPECTED SUCCESS:
>> TestExitDuringBreak.ExitDuringBreakpointTestCase.test_dwo
>> (functionalities/thread/exit_during_break/TestExitDuringBreak.py)
>> UNEXPECTED SUCCESS:
>> TestThreadStates.ThreadStateTestCase.test_process_interrupt_dwarf
>> (functionalities/thread/state/TestThreadStates.py)
>> UNEXPECTED SUCCESS:
>> TestThreadStates.ThreadStateTestCase.test_process_interrupt_dwo
>> (functionalities/thread/state/TestThreadStates.py)
>> UNEXPECTED SUCCESS: TestRaise.RaiseTestCase.test_restart_bug_dwarf
>> (functionalities/signal/raise/TestRaise.py)
>> UNEXPECTED SUCCESS: TestRaise.RaiseTestCase.test_restart_bug_dwo
>> (functionalities/signal/raise/TestRaise.py)
>> UNEXPECTED SUCCESS:
>> TestMultithreaded.SBBreakpointCallbackCase.test_sb_api_listener_resume_dwarf
>> (api/multithreaded/TestMultithreaded.py)
>> UNEXPECTED SUCCESS:
>> TestMultithreaded.SBBreakpointCallbackCase.test_sb_api_listener_resume_dwo
>> (api/multithreaded/TestMultithreaded.py)
>> UNEXPECTED SUCCESS: lldbsuite.test.lldbtest.TestPrintf.test_with_dwarf
>> (lang/cpp/printf/TestPrintf.py)
>> UNEXPECTED SUCCESS: lldbsuite.test.lldbtest.TestPrintf.test_with_dwo
>> (lang/cpp/printf/TestPrintf.py)
>>
> The unexpected successes are expected, unfortunately. :) What happens here
> is that the tests are flaky and they fail like 1% of the time, so they are
> marked as xfail.
>

Top of tree clang has the same set of unexpected successes.


Re: [lldb-dev] Running check-lldb

2017-04-20 Thread Scott Smith via lldb-dev
Sorry, I take that back.  I forgot to save the buffer that ran the test
script.  Oops :-(

I get a number of errors that make me think it's missing libc++, which
makes sense because I never installed it.  However, I thought clang
automatically falls back to using gcc's libstdc++.

Failures include:

Build Command Output:
main.cpp:10:10: fatal error: 'atomic' file not found
#include <atomic>
 ^~~~
1 error generated.

and

Build Command Output:
main.cpp:1:10: fatal error: 'string' file not found
#include <string>
 ^~~~
1 error generated.




On Thu, Apr 20, 2017 at 11:30 AM, Scott Smith 
wrote:

> On Thu, Apr 20, 2017 at 6:47 AM, Pavel Labath  wrote:
>
>> 5. specifying gcc-4.8 instead of the locally compiled clang
>>
>> has most of the tests passing, with a handful of unexpected successes:
>>>
>>> UNEXPECTED SUCCESS:
>>> TestRegisterVariables.RegisterVariableTestCase.test_and_run_command_dwarf
>>> (lang/c/register_variables/TestRegisterVariables.py)
>>> UNEXPECTED SUCCESS:
>>> TestRegisterVariables.RegisterVariableTestCase.test_and_run_command_dwo
>>> (lang/c/register_variables/TestRegisterVariables.py)
>>> UNEXPECTED SUCCESS:
>>> TestExitDuringBreak.ExitDuringBreakpointTestCase.test_dwarf
>>> (functionalities/thread/exit_during_break/TestExitDuringBreak.py)
>>> UNEXPECTED SUCCESS:
>>> TestExitDuringBreak.ExitDuringBreakpointTestCase.test_dwo
>>> (functionalities/thread/exit_during_break/TestExitDuringBreak.py)
>>> UNEXPECTED SUCCESS:
>>> TestThreadStates.ThreadStateTestCase.test_process_interrupt_dwarf
>>> (functionalities/thread/state/TestThreadStates.py)
>>> UNEXPECTED SUCCESS:
>>> TestThreadStates.ThreadStateTestCase.test_process_interrupt_dwo
>>> (functionalities/thread/state/TestThreadStates.py)
>>> UNEXPECTED SUCCESS: TestRaise.RaiseTestCase.test_restart_bug_dwarf
>>> (functionalities/signal/raise/TestRaise.py)
>>> UNEXPECTED SUCCESS: TestRaise.RaiseTestCase.test_restart_bug_dwo
>>> (functionalities/signal/raise/TestRaise.py)
>>> UNEXPECTED SUCCESS:
>>> TestMultithreaded.SBBreakpointCallbackCase.test_sb_api_listener_resume_dwarf
>>> (api/multithreaded/TestMultithreaded.py)
>>> UNEXPECTED SUCCESS:
>>> TestMultithreaded.SBBreakpointCallbackCase.test_sb_api_listener_resume_dwo
>>> (api/multithreaded/TestMultithreaded.py)
>>> UNEXPECTED SUCCESS: lldbsuite.test.lldbtest.TestPrintf.test_with_dwarf
>>> (lang/cpp/printf/TestPrintf.py)
>>> UNEXPECTED SUCCESS: lldbsuite.test.lldbtest.TestPrintf.test_with_dwo
>>> (lang/cpp/printf/TestPrintf.py)
>>>
>> The unexpected successes are expected, unfortunately. :) What happens
>> here is that the tests are flaky and they fail like 1% of the time, so they
>> are marked as xfail.
>>
>
> Top of tree clang has the same set of unexpected successes.
>
>
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


[lldb-dev] Parallelizing loading of shared libraries

2017-04-26 Thread Scott Smith via lldb-dev
After dealing with a bunch of microoptimizations, I'm back to
parallelizing loading of shared modules.  My naive approach was to just
create a new thread per shared library.  I have a feeling some users may
not like that; I think I read an email from someone who has thousands of
shared libraries.  That's a lot of threads :-)

The problem is loading a shared library can cause downstream
parallelization through TaskPool.  I can't then also have the loading of a
shared library itself go through TaskPool, as that could cause a deadlock -
if all the worker threads are waiting on work that TaskPool needs to run on
a worker thread then nothing will happen.
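
To make the failure mode concrete, here is a minimal standalone sketch (an
illustrative fixed-size pool, not the actual TaskPool API) that deadlocks
for exactly this reason:

#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <vector>

class FixedPool {
public:
  explicit FixedPool(unsigned n) {
    for (unsigned i = 0; i < n; ++i)
      workers_.emplace_back([this] { Work(); });
  }
  std::future<void> AddTask(std::function<void()> f) {
    std::packaged_task<void()> task(std::move(f));
    std::future<void> fut = task.get_future();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push_back(std::move(task));
    }
    cv_.notify_one();
    return fut;
  }
  // Shutdown/joining elided for brevity.

private:
  void Work() {
    for (;;) {
      std::packaged_task<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !queue_.empty(); });
        task = std::move(queue_.front());
        queue_.pop_front();
      }
      task();
    }
  }
  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<std::packaged_task<void()>> queue_;
  std::vector<std::thread> workers_;
};

int main() {
  FixedPool pool(2);  // pretend the machine has 2 cores
  std::vector<std::future<void>> loads;
  for (int i = 0; i < 2; ++i)
    loads.push_back(pool.AddTask([&pool] {
      // "Loading a shared library" fans out into the same pool...
      std::future<void> inner =
          pool.AddTask([] { /* e.g. index one compile unit */ });
      inner.wait();  // ...but both workers are parked right here, so the
                     // inner task never runs: deadlock.
    }));
  for (auto &f : loads)
    f.wait();  // never returns
}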

Three possible solutions:

1. Remove the notion of a single global TaskPool, but instead have a static
pool at each callsite that wants it.  That way multiple paths into the same
code would share the same pool, but different places in the code would have
their own pool.

2. Change the wait code for TaskRunner to note whether it is already on a
TaskPool thread, and if so, spawn another one.  However, I don't think that
fully solves the issue of having too many threads loading shared libraries,
as there is no guarantee the new worker would work on the "deepest" work.
I suppose each task would be annotated with depth, and the work could be
sorted in TaskPool though...

3. Leave a separate thread per shared library.

Thoughts?
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


Re: [lldb-dev] Parallelizing loading of shared libraries

2017-04-26 Thread Scott Smith via lldb-dev
A worker thread would call DynamicLoader::LoadModuleAtAddress.  This in
turn eventually calls SymbolFileDWARF::Index, which uses TaskRunners to
1. extract DIEs for each DWARF compile unit in a separate thread
2. parse/unmangle/etc. all the symbols

The code distance from DynamicLoader to SymbolFileDWARF is enough that
disallowing LoadModuleAtAddress from blocking seems to be a nonstarter.


On Wed, Apr 26, 2017 at 4:23 PM, Zachary Turner  wrote:

> Under what conditions would a worker thread spawn additional work to be
> run in parallel and then wait for it, as opposed to just doing it serially?
> Is it feasible to just require tasks to be non blocking?
> On Wed, Apr 26, 2017 at 4:12 PM Scott Smith via lldb-dev <
> lldb-dev@lists.llvm.org> wrote:
>
>> After dealing with a bunch of microoptimizations, I'm back to
>> parallelizing loading of shared modules.  My naive approach was to just
>> create a new thread per shared library.  I have a feeling some users may
>> not like that; I think I read an email from someone who has thousands of
>> shared libraries.  That's a lot of threads :-)
>>
>> The problem is loading a shared library can cause downstream
>> parallelization through TaskPool.  I can't then also have the loading of a
>> shared library itself go through TaskPool, as that could cause a deadlock -
>> if all the worker threads are waiting on work that TaskPool needs to run on
>> a worker thread then nothing will happen.
>>
>> Three possible solutions:
>>
>> 1. Remove the notion of a single global TaskPool, but instead have a
>> static pool at each callsite that wants it.  That way multiple paths into
>> the same code would share the same pool, but different places in the code
>> would have their own pool.
>>
>> 2. Change the wait code for TaskRunner to note whether it is already on a
>> TaskPool thread, and if so, spawn another one.  However, I don't think that
>> fully solves the issue of having too many threads loading shared libraries,
>> as there is no guarantee the new worker would work on the "deepest" work.
>> I suppose each task would be annotated with depth, and the work could be
>> sorted in TaskPool though...
>>
>> 3. Leave a separate thread per shared library.
>>
>> Thoughts?
>>
>> ___
>> lldb-dev mailing list
>> lldb-dev@lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>>
>
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


Re: [lldb-dev] Parallelizing loading of shared libraries

2017-04-27 Thread Scott Smith via lldb-dev
So as it turns out, at least on my platform (Ubuntu 14.04), the symbols are
loaded regardless.  I changed my test so:
1. main() just returns right away
2. cmdline is: lldb -b -o run /path/to/my/binary

and it takes the same amount of time as setting a breakpoint.

On Wed, Apr 26, 2017 at 5:00 PM, Jim Ingham  wrote:

>
> We started out with the philosophy that lldb wouldn't touch any more
> information in a shared library than we actually needed.  So when a library
> gets loaded we might need to read in and resolve its section list, but we
> won't read in any symbols if we don't need to look at them.  The idea was
> that if you did "load a binary, and run it" until the binary stops for some
> reason, we haven't done any unnecessary work.  Similarly, if all the
> breakpoints the user sets are scoped to a shared library then there's no
> need for us to read any symbols for any other shared libraries.  I think
> that is a good goal, it allows the debugger to be used in special purpose
> analysis tools w/o forcing it to pay costs that a more general purpose
> debug session might require.
>
> I think it would be hard to convert all the usages of modules from "do
> something with a shared library" mode to "tell me you are interested in a
> shared library and give me a callback" so that the module reading could be
> parallelized on demand.  But at the very least we need to allow a mode
> where symbol reading is done lazily.
>
> The other concern is that lldb keeps the modules it reads in a global
> cache, shared by all debuggers & targets.  It is very possible that you
> could have two targets or two debuggers each with one target that are
> reading in shared libraries simultaneously, and adding them to the global
> cache.  In some of the uses that lldb has under Xcode this is actually very
> common.  So the task pool will have to be built up as things are added to
> the global shared module cache, not at the level of individual targets
> noticing the read-in of a shared library.
>
> Jim
>
>
>
> > On Apr 26, 2017, at 4:12 PM, Scott Smith via lldb-dev <
> lldb-dev@lists.llvm.org> wrote:
> >
> > After dealing with a bunch of microoptimizations, I'm back to
> parallelizing loading of shared modules.  My naive approach was to just
> create a new thread per shared library.  I have a feeling some users may
> not like that; I think I read an email from someone who has thousands of
> shared libraries.  That's a lot of threads :-)
> >
> > The problem is loading a shared library can cause downstream
> parallelization through TaskPool.  I can't then also have the loading of a
> shared library itself go through TaskPool, as that could cause a deadlock -
> if all the worker threads are waiting on work that TaskPool needs to run on
> a worker thread then nothing will happen.
> >
> > Three possible solutions:
> >
> > 1. Remove the notion of a single global TaskPool, but instead have a
> static pool at each callsite that wants it.  That way multiple paths into
> the same code would share the same pool, but different places in the code
> would have their own pool.
> >
> > 2. Change the wait code for TaskRunner to note whether it is already on
> a TaskPool thread, and if so, spawn another one.  However, I don't think
> that fully solves the issue of having too many threads loading shared
> libraries, as there is no guarantee the new worker would work on the
> "deepest" work.  I suppose each task would be annotated with depth, and the
> work could be sorted in TaskPool though...
> >
> > 3. Leave a separate thread per shared library.
> >
> > Thoughts?
> >
> > ___
> > lldb-dev mailing list
> > lldb-dev@lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>
>
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


Re: [lldb-dev] Parallelizing loading of shared libraries

2017-04-27 Thread Scott Smith via lldb-dev
Hmm, turns out I was wrong about delayed symbol loading not working under
Linux.  I've added timings to the review.

On Thu, Apr 27, 2017 at 11:12 AM, Jim Ingham  wrote:

> Interesting.  Do you have to catch this information as the JIT modules get
> loaded, or can you recover the data after-the-fact?  For most uses, I don't
> think you need to track JIT modules as they are loaded, but it would be
> good enough to refresh the list on stop.
>
> Jim
>
>
> > On Apr 27, 2017, at 10:51 AM, Pavel Labath  wrote:
> >
> > It's the gdb jit interface breakpoint. I don't think there is a good
> > way to scope that to a library, as that symbol can be anywhere...
> >
> >
> > On 27 April 2017 at 18:35, Jim Ingham via lldb-dev
> >  wrote:
> >> Somebody is probably setting an internal breakpoint for some purpose
> w/o scoping it to the shared library it's to be found in.  Either that or
> somebody has broken lazy loading altogether.  But that's not intended
> behavior.
> >>
> >> Jim
> >>
> >>> On Apr 27, 2017, at 7:02 AM, Scott Smith 
> wrote:
> >>>
> >>> So as it turns out, at least on my platform (Ubuntu 14.04), the
> symbols are loaded regardless.  I changed my test so:
> >>> 1. main() just returns right away
> >>> 2. cmdline is: lldb -b -o run /path/to/my/binary
> >>>
> >>> and it takes the same amount of time as setting a breakpoint.
> >>>
> >>> On Wed, Apr 26, 2017 at 5:00 PM, Jim Ingham  wrote:
> >>>
> >>> We started out with the philosophy that lldb wouldn't touch any more
> information in a shared library than we actually needed.  So when a library
> gets loaded we might need to read in and resolve its section list, but we
> won't read in any symbols if we don't need to look at them.  The idea was
> that if you did "load a binary, and run it" until the binary stops for some
> reason, we haven't done any unnecessary work.  Similarly, if all the
> breakpoints the user sets are scoped to a shared library then there's no
> need for us to read any symbols for any other shared libraries.  I think
> that is a good goal, it allows the debugger to be used in special purpose
> analysis tools w/o forcing it to pay costs that a more general purpose
> debug session might require.
> >>>
> >>> I think it would be hard to convert all the usages of modules from
> "do something with a shared library" mode to "tell me you are interested in
> a shared library and give me a callback" so that the module reading could
> be parallelized on demand.  But at the very least we need to allow a mode
> where symbol reading is done lazily.
> >>>
> >>> The other concern is that lldb keeps the modules it reads in a global
> cache, shared by all debuggers & targets.  It is very possible that you
> could have two targets or two debuggers each with one target that are
> reading in shared libraries simultaneously, and adding them to the global
> cache.  In some of the uses that lldb has under Xcode this is actually very
> common.  So the task pool will have to be built up as things are added to
> the global shared module cache, not at the level of individual targets
> noticing the read-in of a shared library.
> >>>
> >>> Jim
> >>>
> >>>
> >>>
> >>>> On Apr 26, 2017, at 4:12 PM, Scott Smith via lldb-dev <
> lldb-dev@lists.llvm.org> wrote:
> >>>>
> >>>> After dealing with a bunch of microoptimizations, I'm back to
> parallelizing loading of shared modules.  My naive approach was to just
> create a new thread per shared library.  I have a feeling some users may
> not like that; I think I read an email from someone who has thousands of
> shared libraries.  That's a lot of threads :-)
> >>>>
> >>>> The problem is loading a shared library can cause downstream
> parallelization through TaskPool.  I can't then also have the loading of a
> shared library itself go through TaskPool, as that could cause a deadlock -
> if all the worker threads are waiting on work that TaskPool needs to run on
> a worker thread then nothing will happen.
> >>>>
> >>>> Three possible solutions:
> >>>>
> >>>> 1. Remove the notion of a single global TaskPool, but instead have a
> static pool at each callsite that wants it.  That way multiple paths into
> the same code would share the same pool, but different places in the code
> would have their

Re: [lldb-dev] Parallelizing loading of shared libraries

2017-04-28 Thread Scott Smith via lldb-dev
Hmmm ok, I don't like hard coding pools.  Your idea about limiting the
number of high level threads gave me an idea:

1. System has one high level TaskPool.
2. TaskPools have up to one child and one parent (the parent for the high
level TaskPool = nullptr).
3. When a worker starts up for a given TaskPool, it ensures a single child
exists.
4. There is a thread local variable that indicates which TaskPool that
thread enqueues into (via AddTask).  If that variable is nullptr, then it
is the high level TaskPool. Threads that are not workers enqueue into this
TaskPool.  If the thread is a worker thread, then the variable points to
the worker's child.
5. When creating a thread in a TaskPool, its thread count AND the thread
count of the parent, grandparent, etc. are incremented.
6. In the main worker loop, if there is no more work to do, OR the thread
count is too high, the worker "promotes" itself.  Promotion means:
a. decrement the thread count for the current task pool
b. if there is no parent, exit; otherwise, become a worker for the parent
task pool (and update the thread local TaskPool enqueue pointer).

The main points are:
1. We don't hard code the number of task pools; the code automatically uses
the fewest number of taskpools needed regardless of the number of places in
the code that want task pools.
2. When the child taskpools are busy, parent taskpools reduce their number
of workers over time to reduce oversubscription.

You can fiddle with the # of allowed threads per level; for example, if you
take into account the height of the pool and the number of child threads,
then you could allocate each level half as many threads as the level below
it, unless the level below wasn't using all of its threads; then the steady
state would be 2 * cores, rather than height * cores.  I think that is
probably overkill though.
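
Here is a rough sketch of that worker loop (illustrative only: the spawning
policy, shutdown, and worker wakeup are elided, and the existing TaskPool
queueing is reduced to a mutex-protected deque):

#include <atomic>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>

class TaskPool {
public:
  static thread_local TaskPool *current;  // where AddTask enqueues (point 4)

  void AddTask(std::function<void()> f) {
    std::lock_guard<std::mutex> lock(mutex_);
    queue_.push_back(std::move(f));
    // A real version would wake or spawn a worker here.
  }

  void SpawnWorker() {
    for (TaskPool *p = this; p; p = p->parent_)  // point 5
      ++p->thread_count_;
    std::thread([this] { Worker(); }).detach();
  }

  void Worker() {
    EnsureChildExists();  // point 3
    current = child_;     // point 4: workers enqueue into their child
    for (;;) {
      if (thread_count_ > MaxThreads())
        break;  // too many threads at this level -> promote
      std::function<void()> f;
      {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty())
          break;  // no more work -> promote
        f = std::move(queue_.front());
        queue_.pop_front();
      }
      f();
    }
    --thread_count_;  // point 6a
    if (parent_)
      parent_->Worker();  // point 6b: become a worker for the parent
    // else: top level pool, let the thread exit
  }

private:
  void EnsureChildExists() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (!child_) {
      child_ = new TaskPool;
      child_->parent_ = this;
    }
  }
  unsigned MaxThreads() const { return std::thread::hardware_concurrency(); }

  TaskPool *parent_ = nullptr;
  TaskPool *child_ = nullptr;  // points 1 and 2
  std::atomic<unsigned> thread_count_{0};
  std::mutex mutex_;
  std::deque<std::function<void()>> queue_;
};

thread_local TaskPool *TaskPool::current = nullptr;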


On Fri, Apr 28, 2017 at 4:37 AM, Pavel Labath  wrote:

> On 27 April 2017 at 00:12, Scott Smith via lldb-dev
>  wrote:
> > After dealing with a bunch of microoptimizations, I'm back to
> > parallelizing loading of shared modules.  My naive approach was to just
> > create a new thread per shared library.  I have a feeling some users may
> not
> > like that; I think I read an email from someone who has thousands of
> shared
> > libraries.  That's a lot of threads :-)
> >
> > The problem is loading a shared library can cause downstream
> parallelization
> > through TaskPool.  I can't then also have the loading of a shared library
> > itself go through TaskPool, as that could cause a deadlock - if all the
> > worker threads are waiting on work that TaskPool needs to run on a worker
> > thread then nothing will happen.
> >
> > Three possible solutions:
> >
> > 1. Remove the notion of a single global TaskPool, but instead have a
> static
> > pool at each callsite that wants it.  That way multiple paths into the
> same
> > code would share the same pool, but different places in the code would
> have
> > their own pool.
> >
>
> I looked at this option in the past and this was my preferred
> solution. My suggestion would be to have two task pools. One for
> low-level parallelism, which spawns
> std::thread::hardware_concurrency() threads, and another one for
> higher level tasks, which can only spawn a smaller number of threads
> (the algorithm for the exact number TBD). The high-level threads can
> access the low-level ones, but not the other way around, which
> guarantees progress.
>
> I propose to hardcode 2 pools, as I don't want to make it easy for
> people to create additional ones -- I think we should be having this
> discussion every time someone tries to add one, and have a very good
> justification for it (FWIW, I think your justification is good in this
> case, and I am grateful that you are pursuing this).
>
> pl
>
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


Re: [lldb-dev] Parallelizing loading of shared libraries

2017-04-30 Thread Scott Smith via lldb-dev
The overall concept is similar; it comes down to implementation details like:
1. llvm doesn't have a global pool; it's probably instantiated on demand.
2. llvm keeps threads around until the pool is destroyed, rather than
letting the threads exit when they have nothing to do.
3. llvm starts up all the threads immediately, rather than on demand.

Overall I like the current lldb version better than the llvm version, but I
haven't examined any of the use cases of the llvm version to know whether
it could be dropped in without issue.  However, neither does what I want,
so I'll move forward prototyping what I think it should do, and then see
how applicable it is to llvm.
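
For reference, the llvm::ThreadPool usage model looks roughly like this
(sketch from memory; see llvm/Support/ThreadPool.h for the exact
signatures):

#include "llvm/Support/ThreadPool.h"

#include <future>

void Example() {
  // All threads are created here, up front (point 3), and live until the
  // pool is destroyed (point 2).  There is no global instance (point 1).
  llvm::ThreadPool Pool;  // defaults to hardware_concurrency() threads
  std::shared_future<void> F = Pool.async([] { /* work */ });
  F.wait();     // wait for a single task
  Pool.wait();  // or drain everything
}  // worker threads join when Pool goes out of scope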

On Sun, Apr 30, 2017 at 9:02 PM, Zachary Turner  wrote:

> Have we examined llvm::ThreadPool to see if it can work for our needs?
> And if not, what kind of changes would be needed to llvm::ThreadPool to
> make it suitable?
>
> On Fri, Apr 28, 2017 at 8:04 AM Scott Smith via lldb-dev <
> lldb-dev@lists.llvm.org> wrote:
>
>> Hmmm ok, I don't like hard coding pools.  Your idea about limiting the
>> number of high level threads gave me an idea:
>>
>> 1. System has one high level TaskPool.
>> 2. TaskPools have up to one child and one parent (the parent for the high
>> level TaskPool = nullptr).
>> 3. When a worker starts up for a given TaskPool, it ensures a single
>> child exists.
>> 4. There is a thread local variable that indicates which TaskPool that
>> thread enqueues into (via AddTask).  If that variable is nullptr, then it
>> is the high level TaskPool. Threads that are not workers enqueue into this
>> TaskPool.  If the thread is a worker thread, then the variable points to
>> the worker's child.
>> 5. When creating a thread in a TaskPool, its thread count AND the thread
>> count of the parent, grandparent, etc. are incremented.
>> 6. In the main worker loop, if there is no more work to do, OR the thread
>> count is too high, the worker "promotes" itself.  Promotion means:
>> a. decrement the thread count for the current task pool
>> b. if there is no parent, exit; otherwise, become a worker for the parent
>> task pool (and update the thread local TaskPool enqueue pointer).
>>
>> The main points are:
>> 1. We don't hard code the number of task pools; the code automatically
>> uses the fewest number of taskpools needed regardless of the number of
>> places in the code that want task pools.
>> 2. When the child taskpools are busy, parent taskpools reduce their
>> number of workers over time to reduce oversubscription.
>>
>> You can fiddle with the # of allowed threads per level; for example, if
>> you take into account the height of the pool and the number of child
>> threads, then you could allocate each level half as many threads as the
>> level below it, unless the level below wasn't using all of its threads;
>> then the steady state would be 2 * cores, rather than height *
>> cores.  I think that is probably overkill though.
>>
>>
>> On Fri, Apr 28, 2017 at 4:37 AM, Pavel Labath  wrote:
>>
>>> On 27 April 2017 at 00:12, Scott Smith via lldb-dev
>>>  wrote:
>>> > After dealing with a bunch of microoptimizations, I'm back to
>>> > parallelizing loading of shared modules.  My naive approach was to just
>>> > create a new thread per shared library.  I have a feeling some users
>>> may not
>>> > like that; I think I read an email from someone who has thousands of
>>> shared
>>> > libraries.  That's a lot of threads :-)
>>> >
>>> > The problem is loading a shared library can cause downstream
>>> parallelization
>>> > through TaskPool.  I can't then also have the loading of a shared
>>> library
>>> > itself go through TaskPool, as that could cause a deadlock - if all the
>>> > worker threads are waiting on work that TaskPool needs to run on a
>>> worker
>>> > thread then nothing will happen.
>>> >
>>> > Three possible solutions:
>>> >
>>> > 1. Remove the notion of a single global TaskPool, but instead have a
>>> static
>>> > pool at each callsite that wants it.  That way multiple paths into the
>>> same
>>> > code would share the same pool, but different places in the code would
>>> have
>>> > their own pool.
>>> >
>>>
>>> I looked at this option in the past and this was my preferred
>>> solution. My suggestion would be to have two task pools. One for
>>> low-leve

Re: [lldb-dev] Parallelizing loading of shared libraries

2017-05-01 Thread Scott Smith via lldb-dev
On Mon, May 1, 2017 at 2:42 PM, Pavel Labath  wrote:

> Besides, hardcoding the nesting logic into "add" is kinda wrong.
> Adding a task is not the problematic operation, waiting for the result
> of one is. Granted, generally these happen on the same thread, but
> they don't have to be -- you can write a continuation-style
> computation, where you do a bit of work, and then enqueue a task to do
> the rest. This would create an infinite pool depth here.
>

True, but that doesn't seem to be the style of code here.  If it were you
wouldn't need multiple pools, since you'd just wait for the callback that
your work was done.


>
> Btw, are we sure it's not possible to solve this with just one thread
> pool? What would happen if we changed the implementation of "wait" so
> that if the target task is not scheduled yet, we just go ahead and
> compute it on our thread? I haven't thought through all the details,
> but it sounds like this could actually give better performance in some
> scenarios...
>

My initial reaction was "that wouldn't work, what if you ran another posix
dl load?"  But then I suppose *it* would run more work, and eventually
you'd run a leaf task and finish something.

You'd have to make sure your work could be run regardless of what mutexes
the caller already had (since you may be running work for another
subsystem), but that's probably not too onerous, esp given how many
recursive mutexes lldb uses...
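
As a sketch, the "helping" wait might look something like this (illustrative
only; the globals stand in for the pool's internal queue):

#include <chrono>
#include <deque>
#include <future>
#include <mutex>

std::mutex g_queue_mutex;                        // the pool's queue lock
std::deque<std::packaged_task<void()>> g_queue;  // pending tasks

// If the task we're waiting on hasn't been picked up yet, drain queued
// tasks on this thread until our future becomes ready.
void WaitHelping(std::shared_future<void> f) {
  while (f.wait_for(std::chrono::seconds(0)) != std::future_status::ready) {
    std::packaged_task<void()> task;
    {
      std::lock_guard<std::mutex> lock(g_queue_mutex);
      if (g_queue.empty())
        break;  // nothing left to help with; our task is running somewhere
      task = std::move(g_queue.front());
      g_queue.pop_front();
    }
    task();  // run someone else's work inline; this is exactly where the
             // "what mutexes does the caller hold" question bites
  }
  f.wait();  // block normally for the remainder
}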
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


Re: [lldb-dev] Parallelizing loading of shared libraries

2017-05-01 Thread Scott Smith via lldb-dev
IMO we should start by proving out a better version in the lldb codebase,
and then work on pushing it upstream.  I have found much more resistance
getting changes into llvm than lldb, and for good reason - more projects
depend on llvm than lldb.


On Mon, May 1, 2017 at 9:48 PM, Zachary Turner  wrote:

> I would still very much prefer we see if there is a way we can adapt
> LLVM's ThreadPool class to be suitable for our needs.  Unless some
> fundamental aspect of its design results in unacceptable performance for
> our needs, I think we should just use it and not re-invent another one.  If
> there are improvements to be made, let's make them there instead of in LLDB
> so that other LLVM users can benefit.
>
> On Mon, May 1, 2017 at 2:58 PM Scott Smith via lldb-dev <
> lldb-dev@lists.llvm.org> wrote:
>
>> On Mon, May 1, 2017 at 2:42 PM, Pavel Labath  wrote:
>>
>>> Besides, hardcoding the nesting logic into "add" is kinda wrong.
>>> Adding a task is not the problematic operation, waiting for the result
>>> of one is. Granted, generally these happen on the same thread, but
>>> they don't have to be -- you can write a continuation-style
>>> computation, where you do a bit of work, and then enqueue a task to do
>>> the rest. This would create an infinite pool depth here.
>>>
>>
>> True, but that doesn't seem to be the style of code here.  If it were you
>> wouldn't need multiple pools, since you'd just wait for the callback that
>> your work was done.
>>
>>
>>>
>>> Btw, are we sure it's not possible to solve this with just one thread
>>> pool? What would happen if we changed the implementation of "wait" so
>>> that if the target task is not scheduled yet, we just go ahead and
>>> compute it on our thread? I haven't thought through all the details,
>>> but it sounds like this could actually give better performance in some
>>> scenarios...
>>>
>>
>> My initial reaction was "that wouldn't work, what if you ran another
>> posix dl load?"  But then I suppose *it* would run more work, and
>> eventually you'd run a leaf task and finish something.
>>
>> You'd have to make sure your work could be run regardless of what mutexes
>> the caller already had (since you may be running work for another
>> subsystem), but that's probably not too onerous, esp given how many
>> recursive mutexes lldb uses..
>> ___
>> lldb-dev mailing list
>> lldb-dev@lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>>
>
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


[lldb-dev] Lack of parallelism

2017-05-02 Thread Scott Smith via lldb-dev
I've been trying to improve the parallelism of lldb but have run into an
odd roadblock.  I have the code at the point where it creates 40 worker
threads, and it stays that way because it has enough work to do.  However,
running 'top -d 1' shows that for the time in question, cpu load never gets
above 4-8 cpus (even though I have 40).

1. I tried mutrace, which measures mutex contention (I had to call
unsetenv("LD_PRELOAD") in main() so it wouldn't propagate to the process
being tested).  It indicated some minor contention, but not enough to be
the problem.  Regardless, I converted everything I could to lockfree
structures (TaskPool and ConstString) and it didn't help.

2. I tried strace, but I don't think strace can figure out how to trace
lldb.  It says it waits on a single futex for 8 seconds, and then is done.

I'm about to try lttng to trace all syscalls, but I was wondering if anyone
else had any ideas?  At one point I wondered if it was mmap kernel
semaphore contention, but that shouldn't affect faulting individual pages,
and I assume lldb doesn't call mmap all the time.

I'm getting a bit frustrated because lldb should be taking 1-2 seconds to
start up (it has ~45s of user+system work to do), but instead is taking
8-10, and I've been stuck there for a while.
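
(For what it's worth, one way to test a theory like allocator contention is
a standalone micro-benchmark along these lines; if N threads doing the same
total amount of malloc/free work aren't close to N times faster, something
is serializing them:)

#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

static void Churn(int iters) {
  for (int i = 0; i < iters; ++i) {
    char *p = new char[32 + (i % 256)];
    p[0] = 1;  // touch the allocation so it can't be trivially elided
    delete[] p;
  }
}

static double TimeIt(int threads, int total_iters) {
  auto start = std::chrono::steady_clock::now();
  std::vector<std::thread> ts;
  for (int i = 0; i < threads; ++i)
    ts.emplace_back(Churn, total_iters / threads);
  for (auto &t : ts)
    t.join();
  return std::chrono::duration<double>(std::chrono::steady_clock::now() -
                                       start)
      .count();
}

int main() {
  const int kIters = 40 * 1000 * 1000;
  std::printf("1 thread:   %.2fs\n", TimeIt(1, kIters));
  std::printf("40 threads: %.2fs\n", TimeIt(40, kIters));
}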
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


Re: [lldb-dev] Parallelizing loading of shared libraries

2017-05-02 Thread Scott Smith via lldb-dev
LLDB has TaskRunner and TaskPool.  TaskPool is nearly the same as
llvm::ThreadPool.  TaskRunner itself is a layer on top, though, and doesn't
seem to have an analogy in llvm.  Not that I'm defending TaskRunner...

I have written a new one called TaskMap.  The idea is that if all you want
is to call a lambda over the values 0 .. N-1, then it's more efficient to
use a single shared std::atomic counter rather than a separate
std::function with std::future and std::bind for each work item.  It is
also a layer on top of TaskPool, so it'd be easy to port to
llvm::ThreadPool if that's how we end up going. It ends up reducing lock
contention within TaskPool without needing to fall back on a lockfree queue.
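
Reduced to a self-contained sketch (the real TaskMap sits on top of TaskPool
rather than raw std::threads, and the actual signatures may differ):

#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// All workers share one atomic counter instead of one queued std::function
// per work item, so enqueueing is O(workers) rather than O(items).
void TaskMapExample(size_t n, const std::function<void(size_t)> &fn,
                    unsigned workers = std::thread::hardware_concurrency()) {
  std::atomic<size_t> index{0};
  std::vector<std::thread> pool;
  for (unsigned w = 0; w < workers; ++w)
    pool.emplace_back([&] {
      for (size_t i = index++; i < n; i = index++)
        fn(i);  // claim the next unprocessed index; no locks involved
    });
  for (auto &t : pool)
    t.join();
}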

On Tue, May 2, 2017 at 6:44 AM, Zachary Turner  wrote:

> Fwiw I haven't even followed the discussion closely enough to know what
> the issues with the lldb task runner even are.
>
> My motivation is simple though: don't reinvent the wheel.
>
> IIRC LLDB task runner was added before llvm's thread pool existed (I
> haven't checked, so I may be wrong about this). If that's the case, I would
> just as soon replace all existing users of lldb task runner with llvm's as
> well and delete lldb's.
>
> Regarding the issue with making debugging harder, llvm has functions to
> set thread names now. We could name all threadpool threads.
> On Tue, May 2, 2017 at 3:05 AM Pavel Labath via lldb-dev <
> lldb-dev@lists.llvm.org> wrote:
>
>> On 1 May 2017 at 22:58, Scott Smith  wrote:
>> > On Mon, May 1, 2017 at 2:42 PM, Pavel Labath  wrote:
>> >>
>> >> Besides, hardcoding the nesting logic into "add" is kinda wrong.
>> >> Adding a task is not the problematic operation, waiting for the result
>> >> of one is. Granted, generally these happen on the same thread, but
>> >> they don't have to be -- you can write a continuation-style
>> >> computation, where you do a bit of work, and then enqueue a task to do
>> >> the rest. This would create an infinite pool depth here.
>> >
>> >
>> > True, but that doesn't seem to be the style of code here.  If it were
>> you
>> > wouldn't need multiple pools, since you'd just wait for the callback
>> that
>> > your work was done.
>> >
>> >>
>> >>
>> >> Btw, are we sure it's not possible to solve this with just one thread
>> >> pool? What would happen if we changed the implementation of "wait" so
>> >> that if the target task is not scheduled yet, we just go ahead and
>> >> compute it on our thread? I haven't thought through all the details,
>> >> but it sounds like this could actually give better performance in some
>> >> scenarios...
>> >
>> >
>> > My initial reaction was "that wouldn't work, what if you ran another
>> posix
>> > dl load?"  But then I suppose *it* would run more work, and eventually
>> you'd
>> > run a leaf task and finish something.
>> >
>> > You'd have to make sure your work could be run regardless of what
>> mutexes
>> > the caller already had (since you may be running work for another
>> > subsystem), but that's probably not too onerous, esp given how many
>> > recursive mutexes lldb uses..
>>
>> Is it any worse than if the thread got stuck in the "wait" call? Even
>> with a deadlock-free thread pool the task at hand still would not be
>> able to make progress, as the waiter would hold the mutex even while
>> blocked (and recursiveness will not save you here).
>>
>> >
>> > I think that's all the more reason we *should* work on getting
>> something into LLVM first.  Anything we already have in LLDB, or any
>> modifications we make will likely not be pushed up to LLVM, especially
>> since LLVM already has a ThreadPool, so any changes you make to LLDB's
>> thread pool will likely have to be re-written when trying to get it to
>> LLVM.  And since, as you said, more projects depend on LLVM than LLDB,
>> there's a good chance that the baseline you'd be starting from when making
>> improvements is more easily adaptable to what you want to do.  LLDB has a
>> long history of being shy of making changes in LLVM where appropriate, and
>> myself and others have started pushing back on that more and more, because
>> it accumulates long term technical debt.
>> > In my experience, "let's just get it into LLDB first and then work on
>> getting it up to LLVM later" ends up being "well, it's in LLDB now, so
>> since my immediate problem is solved I may or may not have time to revisit
>> this in the future"  (even if the original intent is sincere).
>> > If there is some resistance getting changes into LLVM, feel free to add
>> me as a reviewer, and I can find the right people to move it along.  I'd
>> still like to at least hear a strong argument for why the existing
>> implementation in LLVM is unacceptable for what we need.  I'm ok with "non
>> optimal".  Unless it's "unsuitable", we should start there and make
>> incremental improvements.
>>
>> I think we could solve our current problem by just having two global
>> instances of llvm::ThreadPool. The only issue I have with that is that
>> I will then have 

Re: [lldb-dev] Lack of parallelism

2017-05-02 Thread Scott Smith via lldb-dev
As it turns out, it was lock contention in the memory allocator.  Using
tcmalloc brought it from 8+ seconds down to 4.2.

I think this didn't show up in mutrace because glibc's malloc doesn't use
pthread mutexes.

Greg, that joke about adding tcmalloc wholesale is looking less funny and
more serious...  Or maybe it's enough to make it a cmake link option (use
if present, or use if requested).

On Tue, May 2, 2017 at 8:42 AM, Jim Ingham  wrote:

> I'm not sure about Linux, but on OS X lldb will mmap the debug information
> rather than using straight reads.  But that should just be once per loaded
> module.
>
> Jim
>
> > On May 2, 2017, at 8:09 AM, Scott Smith via lldb-dev <
> lldb-dev@lists.llvm.org> wrote:
> >
> > I've been trying to improve the parallelism of lldb but have run into an
> odd roadblock.  I have the code at the point where it creates 40 worker
> threads, and it stays that way because it has enough work to do.  However,
> running 'top -d 1' shows that for the time in question, cpu load never gets
> above 4-8 cpus (even though I have 40).
> >
> > 1. I tried mutrace, which measures mutex contention (I had to call
> unsetenv("LD_PRELOAD") in main() so it wouldn't propagate to the process
> being tested).  It indicated some minor contention, but not enough to be
> the problem.  Regardless, I converted everything I could to lockfree
> structures (TaskPool and ConstString) and it didn't help.
> >
> > 2. I tried strace, but I don't think strace can figure out how to trace
> lldb.  It says it waits on a single futex for 8 seconds, and then is done.
> >
> > I'm about to try lttng to trace all syscalls, but I was wondering if
> anyone else had any ideas?  At one point I wondered if it was mmap kernel
> semaphore contention, but that shouldn't affect faulting individual pages,
> and I assume lldb doesn't call mmap all the time.
> >
> > I'm getting a bit frustrated because lldb should be taking 1-2 seconds
> to start up (it has ~45s of user+system work to do), but instead is taking
> 8-10, and I've been stuck there for a while.
> >
> > ___
> > lldb-dev mailing list
> > lldb-dev@lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>
>
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


Re: [lldb-dev] Lack of parallelism

2017-05-02 Thread Scott Smith via lldb-dev
On Tue, May 2, 2017 at 12:43 PM, Greg Clayton  wrote:

> The other thing would be to try and move the demangler to use a custom
> allocator everywhere. Not sure what demangler you are using when you are
> doing these tests, but we can either use the native system one from
> the #include <cxxabi.h>, or the fast demangler in FastDemangle.cpp. If it
> is the latter, then we can probably optimize this.
>

I'm using the demangler I modified here: https://reviews.llvm.org/D32500
I think it still starts with FastDemangle.cpp, but one test showed the
modified llvm demangler is almost as fast (~1.25% slowdown by disabling
FastDemangle).  I might be able to narrow that further by putting the
initial arena on the stack.
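
To illustrate the stack-arena idea, a minimal sketch (not the actual
allocator in D32500):

#include <cstddef>
#include <cstdlib>

// Bump allocator whose first arena is embedded in the object; if the
// allocator itself lives on the stack, short demangles never call malloc.
class ArenaAllocator {
public:
  ArenaAllocator() : next_(initial_), end_(initial_ + sizeof(initial_)) {}
  ~ArenaAllocator() {
    while (blocks_) {
      Block *dead = blocks_;
      blocks_ = blocks_->prev;
      std::free(dead);
    }
  }
  void *Allocate(std::size_t size) {
    size = (size + 15) & ~std::size_t(15);  // round sizes to keep alignment
    if (static_cast<std::size_t>(end_ - next_) < size)
      Grow(size);
    void *p = next_;
    next_ += size;
    return p;
  }

private:
  struct Block { Block *prev; };

  void Grow(std::size_t min_size) {
    std::size_t block_size = 4096;
    if (block_size < sizeof(Block) + min_size)
      block_size = sizeof(Block) + min_size;
    Block *b = static_cast<Block *>(std::malloc(block_size));
    b->prev = blocks_;
    blocks_ = b;
    next_ = reinterpret_cast<char *>(b + 1);
    end_ = reinterpret_cast<char *>(b) + block_size;
  }

  char initial_[2048];  // the "initial arena on the stack"
  char *next_;
  char *end_;
  Block *blocks_ = nullptr;
};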

Now that I moved past the parallelism bottleneck, I think I need to revisit
my changes to make sure they're having the desired effect.
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


[lldb-dev] OperatingSystem plugins

2017-05-04 Thread Scott Smith via lldb-dev
I would like to change the list of threads that lldb presents to the user
for an internal application (not to be submitted upstream).  It seems the
right way to do this is to write an OperatingSystem plugin.

1. Can I still make it so the user can see real threads as well as whatever
other "threads" I make up?

2. Is the purpose of the Python OperatingSystem plugin to allow the user to
write plugins in Python?  It doesn't look like it's meant to help with
debugging Python programs.

2a. If that's true, is there a reason the Go OperatingSystem plugin is
written in C++ instead of Python?  Is it just historical, or is there some
advantage to writing it in C++?

3. Does this work just as well when dealing with core files as when dealing
with a running process?
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


[lldb-dev] Setting shared library search paths and core files

2017-05-04 Thread Scott Smith via lldb-dev
Before I dive into the code to see if there's a bug, I wanted to see if I
was just doing it wrong.

I have an application with a different libc, etc than the machine I'm
running the debugger on.  The application also has a bunch of libraries
that simply don't exist in the normal location on my dev machine.  I do
have everything extracted in a subdirectory with proper relative paths
(i.e. my_extract/lib/..., my_extract/opt/..., my_extract/usr/..., etc).

With gdb, I'd do something like:

set sysroot .
file opt/my_cool_program
core my_broken_coredump

then everything would work.

I've tried (http://lists.llvm.org/pipermail/lldb-dev/2016-January/009233.html):

platform select --sysroot . host  (also tried remote-linux, that didn't
work either)
target create opt/my_cool_program --core my_broken_coredump

or based on:
http://lists.llvm.org/pipermail/lldb-dev/2016-January/009235.html

settings set target.exec-search-paths .
target create opt/my_cool_program --core my_broken_coredump

or, based on:
http://lists.llvm.org/pipermail/lldb-dev/2016-January/009236.html

target create opt/my_cool_program --core my_broken_coredump
target modules search-paths add /lib ./lib
...

None of them seem to work.  I tried lldb-3.9 in case any recent changes
affected this functionality.

Is there a more correct way to do this?  Or does this seem like a bug?
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


Re: [lldb-dev] [llvm-dev] RFC: Cleaning up the Itanium demangler

2017-06-22 Thread Scott Smith via lldb-dev
When I looked at demangler performance, I was able to make significant
improvements to the llvm demangler.  At that point removing lldb's fast
demangler didn't hurt performance very much, but the fast demangler was
still faster.  I forget (and apparently didn't write down) how much it
mattered, but post this change I think it was single digit %.

https://reviews.llvm.org/D32500


On Thu, Jun 22, 2017 at 11:07 AM, Jim Ingham via lldb-dev <
lldb-dev@lists.llvm.org> wrote:

> This is Greg's area, he'll be able to answer in detail how the name
> chopper gets used.  IIRC it chops demangled names, so it is indirectly a
> client of the demangler, but it doesn't use the demangler to do this
> directly.  Name lookup is done by finding all the base name matches, then
> comparing the context.  We don't do a very good job of doing fuzzy full
> name matches - for instance when trying to break on one overload you have
> to get the arguments exactly as the demangler would produce them.  We could
> do some more heuristics here (remove all the spaces you can before
> comparison, etc.) though it would be even easier if we had something that
> could tokenize names - both mangled & natural.
>
> The Swift demangler produces a node tree for the demangled elements of a
> name which is very handy on the Swift side.  A long time ago Greg
> experimented with such a thing for the C++ demangler, but it ended up being
> too slow.
>
> On that note, the demangler is a performance bottleneck for lldb.  Going
> to the fast demangler over the system one was a big performance win.  Maybe
> the system demangler is fast enough nowadays, but if it isn't then we can't
> get rid of the FastDemangler.
>
> Jim
>
> > On Jun 22, 2017, at 8:08 AM, Pavel Labath via lldb-dev <
> lldb-dev@lists.llvm.org> wrote:
> >
> > On 22 June 2017 at 15:21, Erik Pilkington wrote:
> >>
> >>
> >>
> >> On June 22, 2017 at 5:51:39 AM, Pavel Labath (lab...@google.com) wrote:
> >>
> >> I don't have any concrete feedback, but:
> >>
> >> - +1 for removing the "FastDemangler"
> >>
> >> - If you already construct an AST as a part of your demangling
> >> process, would it be possible to export that AST for external
> >> consumption somehow? Right now in lldb we sometimes need to parse the
> >> demangled name (to get the "basename" of a function for example), and
> >> the code for doing that is quite ugly. It would be much nicer if we
> >> could just query the parsed representation of the name somehow, and
> >> the AST would enable us to do that.
> >>
> >>
> >> I was thinking about this use case a little, actually. I think it makes
> >> more sense to provide a function, say getItaniumDemangledBasename(),
> >> which could just parse and query the AST for the base name (the AST
> >> already has a way of doing this). This would allow the demangler to
> >> bail out if it knows that the rest of the input string isn’t relevant,
> >> i.e., we could bail out after parsing the ‘foo’ in _Z3fooiii. That, and
> >> not having to print out the AST should make parsing the base name
> >> significantly faster on top of this.
> >>
> >> Do you have any other use case for the AST outside of base names? It
> >> still would be possible to export it from ItaniumDemangle.
> >>
> >
> > Well.. the current parser chops the name into "basename", "context",
> > "arguments", and "qualifiers" part. All of them seem to be used right
> > now, but I don't know e.g. how unavoidable that is. I know about this
> > because I was fixing some bugs there, but I am actually not that
> > familiar with this part of LLDB. I am cc-ing lldb-dev if they have any
> > thoughts on this. We also have the ability to set breakpoints by
> > providing just a part of the context (e.g. "breakpoint set -n
> > foo::bar" even though the full function name is baz::booze::foo::bar),
> > but this seems to be implemented in some different way.
> >
> > I don't think having the ability to short-circuit the demangling would
> > bring us any speed benefit, at least not without a major refactor, as
> > we demangle all the names anyway. Even the AST solution will probably
> > require a fair deal of plumbing on our part to make it useful.
> >
> > Also, any custom-tailored solution will probably make it hard to
> > retrieve any additional info, should we later need it, so I'd be in
> > favor of the AST solution. (I don't know how much it would complicate
> > the implementation though).
> > ___
> > lldb-dev mailing list
> > lldb-dev@lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>
> ___
> lldb-dev mailing list
> lldb-dev@lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev