Re: [lldb-dev] [RFC] Segmented Address Space Support in LLDB

Jonas Devlieghere via lldb-dev Tue, 10 Nov 2020 13:02:00 -0800

> On Nov 10, 2020, at 12:58 PM, Zdenek Prikryl <prik...@codasip.com> wrote:
> 
> Hi all,
> 
> Just for the record, we have successfully implemented the wrapping of addr_t 
> into a class to support multiple address spaces. The info about address space 
> is stored in the ELF file, so we get the info from ELF parser and then pass 
> it to the rest of the system. The CLI/MI interface has been extended as well, 
> so the user can select which address space to use for memory printing. Similarly, 
> we patched expression evaluation, disassembler, etc.


That's really interesting, I'm excited to hear that this is feasible and has 
been done before. Is this code available publicly and/or is this something 
you'd be willing to upstream (with our help)? 

> 
> If the address wrap is part of the upstream version, it will be awesome :-)...
> 
> Best regards.
> 
> On 10/20/20 9:30 PM, Ted Woodward via lldb-dev wrote:
>> I agree with Pavel about the larger picture - we need to know the driver 
>> behind address spaces before we can discuss a workable solution.
>> 
>> I've dealt with 2 use cases - Harvard architecture cores, and low level 
>> hardware debugging.
>> 
>> A Harvard architecture core has separate instruction and data memories. 
>> These often use the same addresses, so to distinguish between them you need 
>> address spaces. The Motorola DSP56300 had 1 program and 2 data memories, 
>> called p, x and y. p:100, x:100 and y:100 were all separate memories, so 
>> "address 100" isn't enough to get what the user needed to see.
>> 
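
To make the p:100 / x:100 / y:100 point concrete, here is a minimal sketch of a toy Harvard-style memory keyed on (space, offset). Every name in it (Space, ToyMemory, CheckSpacesAreDistinct) is hypothetical and exists only for illustration; nothing here is LLDB or DSP56300 code, and the 32-bit word type is an arbitrary choice.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// All names here are hypothetical; none of them exist in LLDB.
enum class Space : uint8_t { P, X, Y }; // program, x data, y data

// A toy Harvard-style memory: three independent memories that share offsets.
class ToyMemory {
public:
  void Write(Space space, uint64_t offset, uint32_t value) {
    m_words[std::make_pair(space, offset)] = value;
  }

  // Returns 0 for locations that were never written.
  uint32_t Read(Space space, uint64_t offset) const {
    auto it = m_words.find(std::make_pair(space, offset));
    return it == m_words.end() ? 0 : it->second;
  }

private:
  // Keyed on (space, offset) because the offset alone is ambiguous.
  std::map<std::pair<Space, uint64_t>, uint32_t> m_words;
};

// Writes a different value to "address 100" in each memory and checks that
// the three locations do not alias.
inline void CheckSpacesAreDistinct() {
  ToyMemory mem;
  mem.Write(Space::P, 100, 0xAAAA);
  mem.Write(Space::X, 100, 0xBBBB);
  mem.Write(Space::Y, 100, 0xCCCC);
  assert(mem.Read(Space::P, 100) == 0xAAAA);
  assert(mem.Read(Space::X, 100) == 0xBBBB);
  assert(mem.Read(Space::Y, 100) == 0xCCCC);
}
```

A debugger that only carries the offset has no way to say which of the three words a "read 100" request refers to, which is the gap an address space discriminator would fill.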
>> For low level hardware debugging (often using JTAG), many devices let you 
>> access memories in ways like "virtual using the TLB", or "virtual == 
>> physical, through the core", or "physical, through the SoC, not cached". 
>> Memory spaces, done right, can give the user the flexibility to pick how to 
>> view memory.
>> 
>> 
>> Are these the use cases you were envisioning, Jonas?
>> 
>>> -----Original Message-----
>>> From: lldb-dev <lldb-dev-boun...@lists.llvm.org> On Behalf Of Pavel Labath
>>> via lldb-dev
>>> Sent: Tuesday, October 20, 2020 12:51 PM
>>> To: Jonas Devlieghere <jo...@devlieghere.com>; LLDB <lldb-
>>> d...@lists.llvm.org>
>>> Subject: [EXT] Re: [lldb-dev] [RFC] Segmented Address Space Support in
>>> LLDB
>>> 
>>> There's a lot of things that are unclear to me about this proposal. The
>>> mechanics of representing a segmented address are one thing, but I think
>>> that the really interesting part will be the interaction with the rest of
>>> lldb. Like:
>>> - What's going to be the source of this address space information? Is it
>>> going to be statically baked into lldb (a function of the target
>>> architecture?), or dynamically retrieved from the target or platform we're
>>> debugging? How would that work?
>>> - How is this going to interact with Object/SymbolFile classes? Are you
>>> expecting to use existing object and symbol formats for address space
>>> information, or some custom ones? AFAIK, none of the existing formats
>>> actually support encoding address space information (though that hasn't
>>> stopped people from trying).
>>> 
>>> Without understanding the bigger picture it's hard for me to say whether the
>>> proposed large scale refactoring is a good idea. Nonetheless, I am doubtful
>>> of the viability of that approach. Some of my reasons for that are:
>>> - not all addr_ts represent an actual address -- sometimes that is a
>>> difference between two addresses, which still uses addr_t, as that's
>>> guaranteed to fit.
>>> - related to that, there is a difference (I'd expect) between the
>>> operations supported by the two types. addr_t supports all integral
>>> operations (though I hope we don't use all of them), but I wouldn't expect
>>> to be able to do the same with a SegmentedAddress. For one, I'd expect it
>>> wouldn't be possible to add two SegmentedAddresses together (which is
>>> possible for addr_t). OTOH, adding a SegmentedAddress and an addr_t would
>>> probably be fine? Should subtracting two SegmentedAddresses result in an
>>> addr_t? But only if they have matching address spaces (and assert
>>> otherwise)?
>>> - I'd also be worried about over-generalizing specialized code which can
>>> afford to work with plain addresses, and where the added address space
>>> would be a nuisance (or a source of bugs). E.g. ELF has no notion of address
>>> space, so I don't think I'd find it helpful to replace all plain integer
>>> calculations in elf parsing code with something more complex.
>>> (I'm aware that some people are using elf to encode address space
>>> information, but this is a pretty nonstandard extension, and it'd take more
>>> than type substitution to support anything like that.)
>>> - large scale refactorings are very much not the norm in llvm
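
The questions about which operations a SegmentedAddress should support can be sketched in code. This is only one possible answer, under stated assumptions: the class below is hypothetical (no such type exists in LLDB), the 32-bit width of aspace_t is a guess since the RFC does not specify it, and asserting on mismatched spaces is just the option floated in the message above.

```cpp
#include <cassert>
#include <cstdint>

using addr_t = uint64_t;   // the RFC describes addr_t as a uint64_t
using aspace_t = uint32_t; // assumption: the RFC does not give a width

// Hypothetical type, for illustration only.
class SegmentedAddress {
public:
  SegmentedAddress(aspace_t space, addr_t offset)
      : m_space(space), m_offset(offset) {}

  aspace_t GetSpace() const { return m_space; }
  addr_t GetOffset() const { return m_offset; }

  // address + plain offset: the result stays in the same address space.
  friend SegmentedAddress operator+(SegmentedAddress a, addr_t delta) {
    return SegmentedAddress(a.m_space, a.m_offset + delta);
  }

  // address - address: yields a plain distance, and is only meaningful
  // when both operands live in the same address space.
  friend addr_t operator-(SegmentedAddress a, SegmentedAddress b) {
    assert(a.m_space == b.m_space && "address spaces must match");
    return a.m_offset - b.m_offset;
  }

  // There is deliberately no operator+(SegmentedAddress, SegmentedAddress)
  // and no implicit conversion to addr_t, so adding two addresses together
  // fails to compile.

private:
  aspace_t m_space;
  addr_t m_offset;
};
```

With this shape, differences between addresses keep using addr_t, while anything that names a location has to carry its space along.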
>>> 
>>> 
>>> 
>>> On 19/10/2020 23:56, Jonas Devlieghere via lldb-dev wrote:
>>>> We want to support segmented address spaces in LLDB. Currently, all of
>>>> LLDB’s external API, command line interface, and internals assume that
>>>> an address in memory can be addressed unambiguously as an addr_t (aka
>>>> uint64_t). To support a segmented address space we’d need to extend
>>>> addr_t with a discriminator (an aspace_t) to uniquely identify a
>>>> location in memory. This RFC outlines what would need to change and
>>>> how we propose to do that.
>>>> 
>>>> ### Addresses in LLDB
>>>> 
>>>> Currently, LLDB has two ways of representing an address:
>>>> 
>>>>   - Address object. Mostly represents addresses as Section+offset for
>>>> a binary image loaded in the Target. An Address in this form can
>>>> persist across executions, e.g. an address breakpoint in a binary
>>>> image that loads at a different address every execution. An Address
>>>> object can represent memory not mapped to a binary image. Heap, stack,
>>>> jitted items, will all be represented as the uint64_t load address of
>>>> the object, and cannot persist across multiple executions. You must
>>>> have the Target object available to get the current load address of an
>>>> Address object in the current process run. Some parts of lldb do not
>>>> have a Target available to them, so they require that the Address can
>>>> be devolved to an addr_t (aka uint64_t) and passed in.
>>>>   - The addr_t (aka uint64_t) type. Primarily used when receiving
>>>> input (e.g. from a user on the command line) or when interacting with
>>>> the inferior (reading/writing memory) for addresses that need not
>>>> persist across runs. Also used when reading DWARF and in our symbol
>>>> tables to represent file offset addresses, where the size of an
>>>> Address object would be objectionable.
>>>> 
>>>> ## Proposal
>>>> 
>>>> ### Address + ProcessAddress
>>>> 
>>>>   - The Address object gains a segment discriminator member variable.
>>>> Everything that creates an Address will need to provide this segment
>>>> discriminator.
>>>>   - A ProcessAddress object which is a uint64_t and a segment
>>>> discriminator as a replacement for addr_t. ProcessAddress objects
>>>> would not persist across multiple executions. Similar to how you can
>>>> create an addr_t from an Address+Target today, you can create a
>>>> ProcessAddress given an Address+Target. When we pass around addr_ts
>>>> today, they would be replaced with ProcessAddress, with the exception
>>>> of symbol tables where the added space would be significant, and we do
>>>> not believe we need segment discriminators today.
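
A minimal sketch of the Address + ProcessAddress split described above, assuming a toy model: ToyAddress and ToyTarget are hypothetical stand-ins (not lldb_private::Address or Target), section identity is reduced to an integer id, and the aspace_t width is assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using addr_t = uint64_t;   // the RFC describes addr_t as a uint64_t
using aspace_t = uint32_t; // assumption: the RFC does not give a width

// Process-specific location: a raw address plus a segment discriminator.
struct ProcessAddress {
  aspace_t space;
  addr_t value;
};

// Hypothetical stand-in for a Section+offset Address that also carries a
// segment discriminator. It stays valid across runs.
struct ToyAddress {
  int section_id;
  addr_t offset;
  aspace_t space;
};

// Hypothetical stand-in for a Target that knows where sections are loaded
// in the current run.
class ToyTarget {
public:
  void SetSectionLoadAddress(int section_id, addr_t load_addr) {
    m_load_addrs[section_id] = load_addr;
  }

  // Fills `out` and returns true if the section is loaded; returns false
  // and leaves `out` untouched otherwise.
  bool Resolve(const ToyAddress &addr, ProcessAddress &out) const {
    auto it = m_load_addrs.find(addr.section_id);
    if (it == m_load_addrs.end())
      return false;
    out = ProcessAddress{addr.space, it->second + addr.offset};
    return true;
  }

private:
  std::map<int, addr_t> m_load_addrs;
};

// Resolves one address before and after its section is loaded.
inline void CheckResolve() {
  ToyTarget target;
  ToyAddress addr{7, 0x20, 2};
  ProcessAddress resolved{0, 0};
  assert(!target.Resolve(addr, resolved));
  target.SetSectionLoadAddress(7, 0x1000);
  assert(target.Resolve(addr, resolved));
  assert(resolved.space == 2 && resolved.value == 0x1020);
}
```

The discriminator travels from the persistent form to the per-run form unchanged; only the numeric part depends on where the section was loaded.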
>>> I'm strongly in favor of the first approach. The reason for that is that we
>>> have a lot of code that can only reasonably deal with one kind of an address,
>>> and I'd like to be able to express that in the type system. In fact, I think
>>> we could have more distinct types even now, but adding address spaces makes
>>> that even more important.
>>> 
>>>> ### Address Only
>>>> 
>>>> Extend the lldb_private::Address class to be the one representation of
>>>> locations; including file based ones valid before running, file
>>>> addresses resolved in a process, and process specific addresses
>>>> (heap/stack/JIT code) that are only valid during a run. That is
>>>> attractive because it would provide a uniform interface to any “where
>>>> is something” question you would ask, either about symbols in files,
>>>> variables in stack frames, etc.
>>>> 
>>>> At present, when we resolve a Section+Offset Address to a “load address”
>>>> we provide a Target to the resolution API.  Providing the Target
>>>> externally makes sense because a Target knows whether the Section is
>>>> present or not and can unambiguously return a load address.  We
>>>> could continue that approach since the Target always holds only one
>>>> process, or extend it to allow passing in a Process when resolving
>>>> non-file backed addresses.  But this would make the conversion from
>>>> addr_t uses to Address uses more difficult, since we will have to push
>>>> the Target or Process into all the API’s that make use of just an
>>>> addr_t.  Using a single Address class seems less attractive when you
>>>> have to provide an external entity to make sense of it at all the use sites.
>>>> 
>>>> We could improve this situation by including a Process (as a weak
>>>> pointer) and fill that in on the boundaries where in the current code
>>>> we go from an Address to a process specific addr_t.  That would make
>>>> the conversion easier, but add complexity.  Since Addresses are
>>>> ubiquitous, you won’t know what any given Address you’ve been handed
>>>> actually contains.  It could even have been resolved for another
>>>> process than the current one.  Making Address usage-dependent in this
>>>> way reduces the attractiveness of the solution.
>>>> 
>>>> ## Approach
>>>> 
>>>> Replacing all the instances of addr_t by hand would be a lot of work.
>>>> Therefore we propose writing a clang-based tool to automate this
>>>> menial task. The tool would update function signatures and replace
>>>> uses of addr_t inside those functions to get the addr_t from the
>>>> ProcessAddress or Address and return the appropriate object for
>>>> functions that currently return an addr_t. The goal of this tool is to
>>>> generate one big NFC patch. This tool need not be perfect; at some
>>>> point it will be more work to improve the tool than to fix up the remaining
>>>> code by hand.
>>>> After this patch LLDB would still not really understand address spaces
>>>> but it will have everything in place to support them.
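
As an illustration of what the mechanical, behavior-preserving rewrite might look like for a single function, here is a hedged before/after sketch. The function names, the ProcessAddress layout, and the aspace_t width are all assumptions; the RFC does not show the tool's actual output.

```cpp
#include <cassert>
#include <cstdint>

using addr_t = uint64_t;   // the RFC describes addr_t as a uint64_t
using aspace_t = uint32_t; // assumption: the RFC does not give a width

// Hypothetical layout for the proposed ProcessAddress.
struct ProcessAddress {
  aspace_t space;
  addr_t value;
};

// Before: a helper that takes and returns a raw addr_t.
inline addr_t AlignDownBefore(addr_t addr, addr_t alignment) {
  assert(alignment != 0);
  return addr - (addr % alignment);
}

// After: the signature uses ProcessAddress, the body unwraps the addr_t,
// performs the same arithmetic, and rewraps the result. The space value is
// carried through but not yet interpreted.
inline ProcessAddress AlignDownAfter(ProcessAddress addr, addr_t alignment) {
  assert(alignment != 0);
  addr_t raw = addr.value;
  return ProcessAddress{addr.space, raw - (raw % alignment)};
}
```

Both versions compute the same numeric result, which is what would let the existing test suite cover the large patch.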
>>>> 
>>>> Once all the APIs are updated, we can start working on the functional
>>>> changes. This means actually interpreting the aspace_t values and
>>>> making sure they don’t get dropped.
>>>> 
>>>> Finally, when all this work is done and we’re happy with the approach,
>>>> we extend the SB API with overloads for the functions that currently
>>>> take or return addr_t. I want to do this last so we have time to
>>>> iterate before committing to a stable interface.
>>>> 
>>>> ## Testing
>>>> 
>>>> By splitting off the intrusive non-functional changes we are able to
>>>> rely on the existing tests for coverage. Smaller functional changes
>>>> can be tested in isolation, either through a unit test or a small GDB
>>>> remote test. For end-to-end testing we can run the test suite with a
>>>> modified debugserver that spoofs address spaces.
>>>> 
>>>> Thanks,
>>>> Jonas
>>>> 
>>>> 
>>>> _______________________________________________
>>>> lldb-dev mailing list
>>>> lldb-dev@lists.llvm.org
>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>>>> 
> 
> -- 
> Zdenek Prikryl
> CTO
> T +420 541 141 475
> Codasip.com
> 

