Re: [lldb-dev] RFC: Moving debug info parsing out of process

2019-03-02 Thread Adrian Prantl via lldb-dev


> On Feb 25, 2019, at 10:21 AM, Zachary Turner via lldb-dev 
>  wrote:
> 
> Hi all,
> 
> We've got some internal efforts in progress, and one of those would benefit 
> from debug info parsing being out of process (independently of whether or not 
> the rest of LLDB is out of process).
> 
> There's a couple of advantages to this, which I'll enumerate here:
> It improves one source of instability in LLDB which has been known to be 
> problematic -- specifically, that debug info can be bad and handling this can 
> often be difficult and bring down the entire debug session.  While other 
> efforts have been made to address stability by moving things out of process, 
> they have not been upstreamed, and even if they had I think we would still 
> want this anyway, for reasons that follow.
Where do you draw the line between debug info and the in-process part of LLDB?
I'm asking because I have never seen the mechanical parsing of DWARF to be a
source of instability; most crashes in LLDB happen when reconstructing Clang
ASTs, because we're breaking some subtle and badly enforced invariants in
Clang's Sema. Perhaps parsing PDBs is less stable? If you do mean the AST
level, then I agree with the sentiment that it is a common source of crashes,
but I don't see a good way of moving that component out of process.
Serializing ASTs, or types in general, is a hard problem, and I'd find the
idea of inventing yet another serialization format for types that we would
have to develop, test, and maintain quite scary.
> It becomes theoretically possible to move debug info parsing not just to 
> another process, but to another machine entirely.  In a broader sense, this 
> decouples the physical debug info location (and for that matter, 
> representation) from the debugger host.
I can see how that can be useful in some settings. You'd need a really
low-latency network connection to make interactive debugging work, but I
expect you've got that covered :-)
> It becomes testable as an independent component, because you can just send 
> requests to it and dump the results and see if they make sense.  Currently 
> there is almost zero test coverage of this aspect of LLDB apart from what you 
> can get after going through many levels of indirection via spinning up a full 
> debug session and doing things that indirectly result in symbol queries.
You are right that the type system's debug info ingestion and AST
reconstruction are primarily tested end-to-end.

> The big win here, at least from my point of view, is the second one.  
> Traditional symbol servers operate by copying entire symbol files (DSYM, DWP, 
> PDB) from some machine to the debugger host.  These can be very large -- 
> we've seen 12+ GB in some cases -- which ranges from "slow bandwidth hog" to 
> "complete non-starter" depending on the debugger host and network. 

12 GB sounds suspiciously large. Do you know how this breaks down between line 
table, types, and debug locations? If it's types, are you deduplicating them? 
For comparison, the debug info of LLDB (which contains two compilers and a 
debugger) compresses to under 500MB, but perhaps the binaries you are working 
with are really just that much larger.
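
(For reference, one way to get such a breakdown on an ELF binary is to look
at the sizes of the individual debug sections, e.g. with something along the
lines of

size -A your_binary | grep debug

which lists .debug_info, .debug_line, .debug_loc, and friends separately;
"your_binary" is of course just a placeholder.)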

> In this kind of scenario, one could theoretically run the debug info process 
> on the same NAS, cloud, or whatever as the symbol server.  Then, rather than 
> copying over an entire symbol file, it responds only to the query you issued 
> -- if you asked for a type, it just returns a packet describing the type you 
> requested.
> 
> The API itself would be stateless (so that you could make queries for 
> multiple targets in any order) as well as asynchronous (so that responses 
> might arrive out of order).  Blocking could be implemented in LLDB, but 
> having the server be asynchronous means multiple clients could connect to the 
> same server instance.  This raises interesting possibilities.  For example, 
> one can imagine thousands of developers connecting to an internal symbol 
> server on the network and being able to debug remote processes or core dumps 
> over slow network connections or on machines with very little storage (e.g. 
> chromebooks).
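
For illustration only, here is a minimal sketch of what such a stateless,
asynchronous query interface could look like. Every name below is
hypothetical; this is not an existing or proposed LLDB API, just one shape
the idea could take in C++:

// Minimal sketch -- hypothetical names only, not an actual LLDB interface.
#include <cstdint>
#include <functional>
#include <string>

// Each request is self-contained: it identifies the symbol file it refers
// to, so the server needs no per-client session state.
struct TypeQuery {
  uint64_t request_id;     // lets the client match out-of-order replies
  std::string module_uuid; // identifies the symbol file on the server
  std::string type_name;   // the type being looked up
};

// The reply describes only the requested type, not the whole symbol file.
struct TypeReply {
  uint64_t request_id;
  bool found;
  std::string encoded_type; // some agreed-upon type encoding
};

// Asynchronous client interface: the callback fires when the reply arrives,
// possibly in a different order than the requests were sent.
class SymbolQueryClient {
public:
  virtual ~SymbolQueryClient() = default;
  using ReplyCallback = std::function<void(const TypeReply &)>;
  virtual void SendTypeQuery(const TypeQuery &query,
                             ReplyCallback on_reply) = 0;
};

Because each request carries the module identity and a request id, the server
can stay stateless and replies can arrive in any order, which is what would
let multiple clients share one server instance while any blocking behavior is
implemented on the LLDB side.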

You *could* just run LLDB remotely ;-)

That all sounds cool, but in my opinion you are leaving out the really
important part: what is the abstraction level of the API going to be?

To be blunt, I'm against inventing yet another serialization format for
*types*, not just because of the considerable engineering effort it will take
to get this right, but also because of the maintenance burden it would impose.
We already have to support loading types from DWARF, PDB, Clang modules, the
Objective-C runtime, Swift modules, and probably more sources, all of which
operate to some degree at different levels of abstraction. Adding another
source or abstraction layer into the mix needs to be really well thought out
and justified.

> On the LLDB side, all of this is hidden behind the SymbolFile interface, 

Re: [lldb-dev] RFC: Moving debug info parsing out of process

2019-03-02 Thread Davide Italiano via lldb-dev
On Sat, Mar 2, 2019 at 2:56 PM Adrian Prantl via lldb-dev
 wrote:
>
>
>
> On Feb 25, 2019, at 10:21 AM, Zachary Turner via lldb-dev 
>  wrote:
>
> Hi all,
>
> We've got some internal efforts in progress, and one of those would benefit 
> from debug info parsing being out of process (independently of whether or not 
> the rest of LLDB is out of process).
>
> There's a couple of advantages to this, which I'll enumerate here:
>
> It improves one source of instability in LLDB which has been known to be 
> problematic -- specifically, that debug info can be bad and handling this can 
> often be difficult and bring down the entire debug session.  While other 
> efforts have been made to address stability by moving things out of process, 
> they have not been upstreamed, and even if they had I think we would still 
> want this anyway, for reasons that follow.
>
> Where do you draw the line between debug info and the in-process part of 
> LLDB? I'm asking because I have never seen the mechanical parsing of DWARF to 
> be a source of instability;

We recently ran some testing and found lldb crashing while parsing
DWARF (or, sometimes, failing to parse allegedly valid DWARF, returning
some default-constructed object, and crashing later on). See, e.g.
https://bugs.llvm.org/show_bug.cgi?id=40827
Qirun did his testing on Linux, FWIW. I would like to point out that the
problems we ended up finding exercise some less-stressed (but, IMHO, equally
important) configurations, namely older compilers (clang 3.8/4.0/5.0, etc.)
and optimized code (-O1/-O2/-O3/-Os).


--
Davide


[lldb-dev] Remote debugging a docker process

2019-03-02 Thread Mason Kramer via lldb-dev

Greetings and salutations!

I am trying to remotely debug a process running inside of a Docker 
container. I can connect to lldb-server from my host, but can't launch a 
debugging process.  I can debug the target locally, inside or outside of 
the container.


Container:

lldb-server-4.0 platform --verbose --listen "*:5000"
Connection established.

Host:

$ lldb

(lldb) target create target/debug/hist

(lldb) platform connect connect://localhost:5000
  Platform: remote-linux
    Triple: x86_64-pc-linux-gnu
OS Version: 4.15.0 (4.15.0-45-generic)
    Kernel: #48-Ubuntu SMP Tue Jan 29 16:28:13 UTC 2019
  Hostname: 4ce058c8dba3
 Connected: yes
WorkingDir: /seraphim
(lldb) run
error: connect remote failed (Connection refused)
error: process launch failed: Connection refused

Docker is a containerization system that sandboxes the processes it
manages in various ways. For instance, processes inside the container run
on a virtualized network stack: they do not know the IP address of their
host and cannot communicate with the outside world except through
"published" ports.
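
(For concreteness, the usual ways around this are either to publish the
relevant ports when starting the container or to share the host's network
namespace; the image name below is just a placeholder:

docker run -p 5000:5000 -p 5001:5001 my-image
docker run --network host my-image

Publishing the ports only helps if the gdbserver connection also goes
through one of them, which is what I attempt below.)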


A mail [1] on this list dating to 2017 suggested that the problem is that
the gdbserver child process spawned by `process launch` can't communicate
through Docker's firewall. However, I don't believe that's the issue - or
at least, not the only one.


To isolate this problem, I bound ports 5000 and 5001 in the container to
the same ports on the host. Then I restricted the acceptable range of
gdbserver ports to just 5001, using the flags suggested in the email.


lldb-server-4.0 platform --verbose --listen "*:5000" 
--min-gdbserver-port 5001 --max-gdbserver-port 5001


This had no apparent effect.

I also found a pull request [2] from 2018 which suggested that the 
problem is the virtualized IP address in the container. That is a 
promising direction, but unfortunately, that patch was abandoned and 
nothing took its place.


There are traces of this issue all over the net, but none of them that I 
have found were ever resolved. I think that remote-debugging a Docker 
container is an increasingly important use-case for lldb-remote, and if 
anyone is interested in this, I'm happy to work with you to hammer it out.


An unanswered Stack Overflow question [3]

A bug filed on Swift's tracker [4]

[1] http://lists.llvm.org/pipermail/lldb-dev/2017-February/012004.html

[2] https://reviews.llvm.org/D42845

[3] 
https://stackoverflow.com/questions/45533026/remote-lldb-debugging-docker-container


[4] https://bugs.swift.org/browse/SR-3596?attachmentViewMode=list
