> On Nov 10, 2020, at 12:58 PM, Zdenek Prikryl <prik...@codasip.com> wrote:
>
> Hi all,
>
> Just for the record, we have successfully implemented the wrapping of addr_t
> into a class to support multiple address spaces. The info about address space
> is stored in the ELF file, so we get the info from ELF parser and then pass
> it to the rest of the system. CLI/MI interface has been extended as well, so
> user can select which address space he wants for memory printing. Similarly,
> we patched expression evaluation, disassembler, etc.
That's really interesting, I'm excited to hear that this is feasible and has been done before. Is this code available publicly and/or is this something you'd be willing to upstream (with our help)?

> If the address wrap is part of the upstream version, it will be awesome :-)...
>
> Best regards.
>
> On 10/20/20 9:30 PM, Ted Woodward via lldb-dev wrote:
>> I agree with Pavel about the larger picture - we need to know the driver
>> behind address spaces before we can discuss a workable solution.
>>
>> I've dealt with 2 use cases - Harvard architecture cores, and low level
>> hardware debugging.
>>
>> A Harvard architecture core has separate instruction and data memories.
>> These often use the same addresses, so to distinguish between them you need
>> address spaces. The Motorola DSP56300 had 1 program and 2 data memories,
>> called p, x and y. p:100, x:100 and y:100 were all separate memories, so
>> "address 100" isn't enough to get what the user needed to see.
>>
>> For low level hardware debugging (often using JTAG), many devices let you
>> access memories in ways like "virtual using the TLB", or "virtual ==
>> physical, through the core", or "physical, through the SoC, not cached".
>> Memory spaces, done right, can give the user the flexibility to pick how to
>> view memory.
>>
>>
>> Are these the use cases you were envisioning, Jonas?
>>
>>> -----Original Message-----
>>> From: lldb-dev <lldb-dev-boun...@lists.llvm.org> On Behalf Of Pavel Labath
>>> via lldb-dev
>>> Sent: Tuesday, October 20, 2020 12:51 PM
>>> To: Jonas Devlieghere <jo...@devlieghere.com>; LLDB <lldb-
>>> d...@lists.llvm.org>
>>> Subject: [EXT] Re: [lldb-dev] [RFC] Segmented Address Space Support in
>>> LLDB
>>>
>>> There's a lot of things that are unclear to me about this proposal. The
>>> mechanics of representing a segmented address are one thing, but I think
>>> that the really interesting part will be the interaction with the rest of
>>> lldb.
>>> Like
>>> - What's going to be the source of this address space information? Is it
>>> going to be statically baked into lldb (a function of the target
>>> architecture?), or dynamically retrieved from the target or platform we're
>>> debugging? How would that work?
>>> - How is this going to interact with Object/SymbolFile classes? Are you
>>> expecting to use existing object and symbol formats for address space
>>> information, or some custom ones? AFAIK, none of the existing formats
>>> actually support encoding address space information (though that hasn't
>>> stopped people from trying).
>>>
>>> Without understanding the bigger picture it's hard for me to say whether the
>>> proposed large scale refactoring is a good idea. Nonetheless, I am doubtful
>>> of the viability of that approach. Some of my reasons for that are:
>>> - not all addr_ts represent an actual address -- sometimes that is a
>>> difference between two addresses, which still uses addr_t, as that's
>>> guaranteed to fit.
>>> - related to that, there is a difference (I'd expect) between the
>>> operations supported by the two types. addr_t supports all integral
>>> operations (though I hope we don't use all of them), but I wouldn't expect
>>> to be able to do the same with a SegmentedAddress. For one, I'd expect it
>>> wouldn't be possible to add two SegmentedAddresses together (which is
>>> possible for addr_t). OTOH, adding a SegmentedAddress and an addr_t would
>>> probably be fine? Should subtracting two SegmentedAddresses result in an
>>> addr_t? But only if they have matching address spaces (and assert
>>> otherwise)?
>>> - I'd also be worried about over-generalizing specialized code which can
>>> afford to work with plain addresses, and where the added address space
>>> would be a nuisance (or a source of bugs). E.g.
>>> ELF has no notion of address space, so I don't think I'd find it helpful to
>>> replace all plain integer calculations in elf parsing code with something
>>> more complex.
>>> (I'm aware that some people are using elf to encode address space
>>> information, but this is a pretty nonstandard extension, and it'd take more
>>> than type substitution to support anything like that.)
>>> - large scale refactorings are very much not the norm in llvm
>>>
>>>
>>>
>>> On 19/10/2020 23:56, Jonas Devlieghere via lldb-dev wrote:
>>>> We want to support segmented address spaces in LLDB. Currently, all of
>>>> LLDB’s external API, command line interface, and internals assume that
>>>> an address in memory can be addressed unambiguously as an addr_t (aka
>>>> uint64_t). To support a segmented address space we’d need to extend
>>>> addr_t with a discriminator (an aspace_t) to uniquely identify a
>>>> location in memory. This RFC outlines what would need to change and
>>>> how we propose to do that.
>>>>
>>>> ### Addresses in LLDB
>>>>
>>>> Currently, LLDB has two ways of representing an address:
>>>>
>>>> - Address object. Mostly represents addresses as Section+offset for
>>>> a binary image loaded in the Target. An Address in this form can
>>>> persist across executions, e.g. an address breakpoint in a binary
>>>> image that loads at a different address every execution. An Address
>>>> object can represent memory not mapped to a binary image. Heap, stack,
>>>> jitted items, will all be represented as the uint64_t load address of
>>>> the object, and cannot persist across multiple executions. You must
>>>> have the Target object available to get the current load address of an
>>>> Address object in the current process run. Some parts of lldb do not
>>>> have a Target available to them, so they require that the Address can
>>>> be devolved to an addr_t (aka uint64_t) and passed in.
>>>> - The addr_t (aka uint64_t) type. Primarily used when receiving
>>>> input (e.g.
>>>> from a user on the command line) or when interacting with
>>>> the inferior (reading/writing memory) for addresses that need not
>>>> persist across runs. Also used when reading DWARF and in our symbol
>>>> tables to represent file offset addresses, where the size of an
>>>> Address object would be objectionable.
>>>>
>>>> ## Proposal
>>>>
>>>> ### Address + ProcessAddress
>>>>
>>>> - The Address object gains a segment discriminator member variable.
>>>> Everything that creates an Address will need to provide this segment
>>>> discriminator.
>>>> - A ProcessAddress object which is a uint64_t and a segment
>>>> discriminator as a replacement for addr_t. ProcessAddress objects
>>>> would not persist across multiple executions. Similar to how you can
>>>> create an addr_t from an Address+Target today, you can create a
>>>> ProcessAddress given an Address+Target. When we pass around addr_ts
>>>> today, they would be replaced with ProcessAddress, with the exception
>>>> of symbol tables where the added space would be significant, and we do
>>>> not believe we need segment discriminators today.
>>> I'm strongly in favor of the first approach. The reason for that is that we
>>> have a lot of code that can only reasonably deal with one kind of an
>>> address, and I'd like to be able to express that in the type system. In
>>> fact, I think we could have more distinct types even now, but adding
>>> address spaces makes that even more important.
>>>
>>>> ### Address Only
>>>>
>>>> Extend the lldb_private::Address class to be the one representation of
>>>> locations; including file based ones valid before running, file
>>>> addresses resolved in a process, and process specific addresses
>>>> (heap/stack/JIT code) that are only valid during a run. That is
>>>> attractive because it would provide a uniform interface to any “where
>>>> is something” question you would ask, either about symbols in files,
>>>> variables in stack frames, etc.
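
To make the ProcessAddress idea from the proposal, and the arithmetic questions Pavel raises further up, a bit more concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the field names, the width chosen for aspace_t, and the Distance helper are hypothetical and not existing LLDB API. It returns an empty optional for mismatched spaces, where Pavel's suggestion was an assert.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical sketch, not LLDB code.
using addr_t = std::uint64_t;   // the RFC describes addr_t as "aka uint64_t"
using aspace_t = std::uint32_t; // assumed width for the segment discriminator

// A process-specific address: a plain offset plus the space it lives in.
struct ProcessAddress {
  addr_t offset;
  aspace_t space;

  // Two addresses are equal only if both the space and the offset match.
  friend bool operator==(const ProcessAddress &a, const ProcessAddress &b) {
    return a.space == b.space && a.offset == b.offset;
  }
};

// Address + plain offset is fine and stays in the same address space.
inline ProcessAddress operator+(ProcessAddress a, addr_t delta) {
  return {a.offset + delta, a.space};
}

// The distance between two addresses is a plain addr_t, but it is only
// meaningful when both addresses are in the same space.
inline std::optional<addr_t> Distance(ProcessAddress hi, ProcessAddress lo) {
  if (hi.space != lo.space)
    return std::nullopt;
  return hi.offset - lo.offset;
}

// Deliberately no operator+(ProcessAddress, ProcessAddress): adding two
// addresses together is rejected at compile time.
```

With this shape, two addresses with the same offset in different spaces (Ted's p:100 versus x:100) compare unequal, and code that really wants a difference has to handle the mismatched-space case explicitly.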
>>>>
>>>> At present, when we resolve a Section+Offset Address to a “load address”
>>>> we provide a Target to the resolution API. Providing the Target
>>>> externally makes sense because a Target knows whether the Section is
>>>> present or not and can unambiguously return a load address. We
>>>> could continue that approach since the Target always holds only one
>>>> process, or extend it to allow passing in a Process when resolving
>>>> non-file backed addresses. But this would make the conversion from
>>>> addr_t uses to Address uses more difficult, since we will have to push
>>>> the Target or Process into all the APIs that make use of just an
>>>> addr_t. Using a single Address class seems less attractive when you
>>>> have to provide an external entity to make sense of it at all the use
>>>> sites.
>>>>
>>>> We could improve this situation by including a Process (as a weak
>>>> pointer) and fill that in on the boundaries where in the current code
>>>> we go from an Address to a process specific addr_t. That would make
>>>> the conversion easier, but add complexity. Since Addresses are
>>>> ubiquitous, you won’t know what any given Address you’ve been handed
>>>> actually contains. It could even have been resolved for another
>>>> process than the current one. Making Address usage-dependent in this
>>>> way reduces the attractiveness of the solution.
>>>>
>>>> ## Approach
>>>>
>>>> Replacing all the instances of addr_t by hand would be a lot of work.
>>>> Therefore we propose writing a clang-based tool to automate this
>>>> menial task. The tool would update function signatures and replace
>>>> uses of addr_t inside those functions to get the addr_t from the
>>>> ProcessAddress or Address and return the appropriate object for
>>>> functions that currently return an addr_t. The goal of this tool is to
>>>> generate one big NFC patch.
>>>> This tool need not be perfect; at some point it will be more work to
>>>> improve the tool than to fix up the remaining code by hand.
>>>> After this patch LLDB would still not really understand address spaces
>>>> but it will have everything in place to support them.
>>>>
>>>> Once all the APIs are updated, we can start working on the functional
>>>> changes. This means actually interpreting the aspace_t values and
>>>> making sure they don’t get dropped.
>>>>
>>>> Finally, when all this work is done and we’re happy with the approach,
>>>> we extend the SB API with overloads for the functions that currently
>>>> take or return addr_t. I want to do this last so we have time to
>>>> iterate before committing to a stable interface.
>>>>
>>>> ## Testing
>>>>
>>>> By splitting off the intrusive non-functional changes we are able to
>>>> rely on the existing tests for coverage. Smaller functional changes
>>>> can be tested in isolation, either through a unit test or a small GDB
>>>> remote test. For end-to-end testing we can run the test suite with a
>>>> modified debugserver that spoofs address spaces.
>>>>
>>>> Thanks,
>>>> Jonas
>>>>
>>>>
>>>> _______________________________________________
>>>> lldb-dev mailing list
>>>> lldb-dev@lists.llvm.org
>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>>>>
>
> --
> Zdenek Prikryl
> CTO
> T +420 541 141 475
> Codasip.com