Hi Jason, This is brilliant! :-) When we'll have segmented address spaces in LLDB maybe it will be possible to add support for WebAssembly debugging. My patch ( https://reviews.llvm.org/D78801) was blocked by my inability to manage addresses for different Wasm modules in a clean (or at least acceptably clean) way. Looking forward to this implementation, let me know if I can help in any way! Thanks!
-- Paolo Severini On Mon, Oct 26, 2020 at 3:45 PM via lldb-dev <lldb-dev@lists.llvm.org> wrote: > Send lldb-dev mailing list submissions to > lldb-dev@lists.llvm.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev > or, via email, send a message with subject or body 'help' to > lldb-dev-requ...@lists.llvm.org > > You can reach the person managing the list at > lldb-dev-ow...@lists.llvm.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of lldb-dev digest..." > > > Today's Topics: > > 1. Re: [RFC] Segmented Address Space Support in LLDB > (Greg Clayton via lldb-dev) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Mon, 26 Oct 2020 15:45:02 -0700 > From: Greg Clayton via lldb-dev <lldb-dev@lists.llvm.org> > To: Jason Molenda <jmole...@apple.com> > Cc: LLDB <lldb-dev@lists.llvm.org> > Subject: Re: [lldb-dev] [RFC] Segmented Address Space Support in LLDB > Message-ID: <ec34a86e-f2f4-441c-8b60-0ca7b6acc...@gmail.com> > Content-Type: text/plain; charset="utf-8" > > > > > On Oct 22, 2020, at 1:25 AM, Jason Molenda <jmole...@apple.com> wrote: > > > > Hi Greg, Pavel. > > > > I think it's worth saying that this is very early in this project. We > know we're going to need the ability to track segments on addresses, but > honestly a lot of the finer details aren't clear yet. It's such a > fundamental change that we wanted to start a discussion, even though I know > it's hard to have detailed discussions still. > > > > In the envisioned environment, there will be a default segment, and most > addresses will be in the default segment. DWARF, user input (lldb > cmdline), SB API, and clang expressions are going to be the places where > segments are specified --- Dump methods and ProcessGDBRemote will be the > main place where the segments are displayed/used. There will be > modifications to the memory read/write gdb RSP packets to include these. > > > > This early in the project, it's hard to tell what will be upstreamed to > the llvm.org <http://llvm.org/> monorepo, or when. My personal opinion > is that we don't actually want to add segment support to llvm.org < > http://llvm.org/> lldb at this point. We'd be initializing every address > object with LLDB_INVALID_SEGMENT or LLDB_DEFAULT_SEGMENT, and then testing > that each object is initialized this way? I don't see this actually being > useful. > > > > However, changing lldb's target addresses to be strictly handled in > terms of objects will allow us to add a segment discriminator ivar to > Address and ProcessAddress on our local branch while this is in > development, and minimize the places where we're diverging from the > llvm.org <http://llvm.org/> sources. We'll need to have local > modifications at the places where a segment is input (DWARF, cmdline, SB > API, compiler type) or output (Dump, ProcesssGDBRemote) and, hopefully, the > vast majority of lldb can be unmodified. > > > > The proposal was written in terms of what we need to accomplish based on > our current understanding for this project, but I think there will be a lot > of details figured out as we get more concrete experience of how this all > works. And when it's appropriate to upstream to llvm.org < > http://llvm.org/>, we'll be better prepared to discuss the tradeoffs of > the approaches we took in extending Address/ProcessAddress to incorporate a > segment. > > > > My hope is that these generic OO'ification of target addresses will not > change lldb beyond moving off of addr_t for now. > > > > I included a couple of inlined comments, but I need to address more of > yours & Pavel's notes later, I've been dealing with a few crazy things and > am way behind on emails but didn't want to wait any longer to send > something out. > > No worries! I would vote to upstream as much as possible as soon as > possible to avoid differences and merging issues for you guys. I would > really like to see LLDB have support for segmented address spaces. Many > comments I made were just my thinking out loud and trying to ease the > changes in with as little disruption as possible. > > > > > > > >> On Oct 19, 2020, at 4:11 PM, Greg Clayton via lldb-dev < > lldb-dev@lists.llvm.org> wrote: > >> > >> > >> > >>> On Oct 19, 2020, at 2:56 PM, Jonas Devlieghere via lldb-dev < > lldb-dev@lists.llvm.org> wrote: > >>> > >>> We want to support segmented address spaces in LLDB. Currently, all of > LLDB’s external API, command line interface, and internals assume that an > address in memory can be addressed unambiguously as an addr_t (aka > uint64_t). To support a segmented address space we’d need to extend addr_t > with a discriminator (an aspace_t) to uniquely identify a location in > memory. This RFC outlines what would need to change and how we propose to > do that. > >>> > >>> ### Addresses in LLDB > >>> > >>> Currently, LLDB has two ways of representing an address: > >>> > >>> - Address object. Mostly represents addresses as Section+offset for a > binary image loaded in the Target. An Address in this form can persist > across executions, e.g. an address breakpoint in a binary image that loads > at a different address every execution. An Address object can represent > memory not mapped to a binary image. Heap, stack, jitted items, will all be > represented as the uint64_t load address of the object, and cannot persist > across multiple executions. You must have the Target object available to > get the current load address of an Address object in the current process > run. Some parts of lldb do not have a Target available to them, so they > require that the Address can be devolved to an addr_t (aka uint64_t) and > passed in. > >>> - The addr_t (aka uint64_t) type. Primarily used when receiving input > (e.g. from a user on the command line) or when interacting with the > inferior (reading/writing memory) for addresses that need not persist > across runs. Also used when reading DWARF and in our symbol tables to > represent file offset addresses, where the size of an Address object would > be objectionable. > >>> > >> > >> Correction: LLDB has 3 kinds of uint64_t addresses: > >> - "file address" which are always mapped to a section + offset if put > into a Address object. This value only makes sense to the > lldb_private::Module that contains it. The only way to pass this around is > as a lldb_private::Address. You can make queries on a file address using > "image lookup --address" before you are debugging, but a single file > address can result in multiple matches in multiple modules because each > module might contain something at this virtual address. This object might > be able to be converted to a "load address" if the section is loaded in > your debug target. Since the target contains the section load list, the > target is needed when converting between Address and addr_t objects. > >> - "load address" which is guaranteed to be unique in a process with no > segments. It can always be put into a lldb_private::Address object, but > that object won't always have a section. If there is no section, it means > the memory location maps to stack, heap, or other memory that doesn't > reside in a object file section. This object might be able to be converted > to a section + offset address if the address matches one of the loaded > sections in a target. If this can be converted to a Address object that has > a section, then it can persist across debug sessions, otherwise, not. > >> - "host address" which is a pointer to memory in the LLDB process > itself. Used for storing expression results and other things. You cannot > convert this to/from a "file" or "load" address. > > > > Yes, good point, host memory is a third type of address that we use. > And our symbols tables, for instance, internally represent themselves as > uint64_t offsets into the file or section, I forget which, and we're not > talking about changing those uint64_t style addresses. On our project, I > do not believe the symbol table will give us segment information. > > You should be able to classify symbols from the symbol table to a segment > though right? > > We could add a special define for host addresses if needed. > > > > > > > > >> > >>> ## Proposal > >>> > >>> ### Address + ProcessAddress > >>> > >>> - The Address object gains a segment discriminator member variable. > Everything that creates an Address will need to provide this segment > discriminator. > >> > >> So an interesting thing to think about is if lldb_private::Section > object should contain a segment identifier? If this is the case, then an > Address object can have a Section that has a segment _and_ the Address > object itself might have one that was set from the section as well. It > would be good to figure out what the rules are for this case and it might > lead to the need for an intelligent accessor that always prefers the > section's segment if a section is available. The Address object must have > one in case we have a pointer to memory in data and there is no section for > this (like any heap addresses). > > > > I don't believe a Section in this project will have a segment. We're > looking purely at individual variables, primarily from debug information. > > So if you have a global variable, it will have a symbol right? And it will > have debug info. Are you saying that only the debug info would have segment > info? It seems important to be able to view a global variable without debug > info. > > > > > > >>> - A ProcessAddress object which is a uint64_t and a segment > discriminator as a replacement for addr_t. ProcessAddress objects would not > persist across multiple executions. Similar to how you can create an addr_t > from an Address+Target today, you can create a ProcessAddress given an > Address+Target. When we pass around addr_ts today, they would be replaced > with ProcessAddress, with the exception of symbol tables where the added > space would be significant, and we do not believe we need segment > discriminators today. > >> > >> Would SegmentedAddress be a more descriptive name here? > >> > >> A few things I would like to see on ProcessAddress or SegmentedAddress: > >> - Have a segment definition that says "no segment" like > LLDB_INVALID_SEGMENT or LLDB_NO_SEGMENT and allow these objects to be > constructed with just a lldb::addr_t and the segment gets auto set to > LLDB_NO_SEGMENT > > > >> - Any code that uses these should test if there is no segment and > continue to do what they used to do before > >> - like read/write memory in ProcessGDBRemote > > > > > > To be honest, testing this is going to be one of the tricky things I'm > not sure how we'll do. we will have a default segment that addresses will > use unless overridden, but how we spot places that *incorrectly* failed to > initialize the segment of an Address/ProcessAddress is something we're > going to need to figure out. > > Tests I can think of: > - read function disassembly from a code segment that would have the same > address as something from a data segment > - read a variable from a data segment that would have the same address as > something from a code segment > > It would be good to figure out where segments are going to come from. I > would hope that some sections would be able to be mapped to certain > segments so that we can live with a binary that has no debug info and still > read say a global variable. I know we can do things right inside of the > debug info since DWARF can have segment information. > > > > > > >> - Anything that dumps one of these objects should dump just like they > used to (just a uint64_t hex representation and no other notation) > >> - Add code that can convert a "load address" into a ProcessAddress or > SegmentedAddress that invent the segment notation and have no changes for > targets that don't support segmented address spaces > >> - 0x1000 should convert to ProcessAddress where the address is 0x1000 > and segment is LLDB_INVALID_SEGMENT or LLDB_NO_SEGMENT if the process > doesn't support segmented addresses > > > > > >> - 0x1000 would return an error on conversion for processes that do > support segmented addresses as the segment must be specified? Or should > there be a default segment if we run into this case? > >> - Come up with some quick way to represent segmented addresses for an > address of 0x1000 in segment 2: ideas: > >> - [2]0x1000 > >> - {2}0x1000 > >> - 0x1000[2] > >> - 0x1000{2} > >> - {0x1000, 2} > > > > To be honest, we haven't thought about the UI side of this very much > yet. I think there will be ABI or ArchSpec style information that maps > segment numbers to human-understandable names. It's ABI style enumerated > numbers - the DWARF will include a number that is passed down to the remote > gdb stub. > > Should be fine to have named segments if needed, but we will need to come > up with a way to specify a segment. We could always add new arguments to > command line commands if needed. > > > > > >> > >>> > >>> ### Address Only > >>> > >>> Extend the lldb_private::Address class to be the one representation of > locations; including file based ones valid before running, file addresses > resolved in a process, and process specific addresses (heap/stack/JIT code) > that are only valid during a run. That is attractive because it would > provide a uniform interface to any “where is something” question you would > ask, either about symbols in files, variables in stack frames, etc. > >>> > >>> At present, when we resolve a Section+Offset Address to a “load > address” we provide a Target to the resolution API. Providing the Target > externally makes sense because a Target knows whether the Section is > present or not and can unambiguously return a load address. We could > continue that approach since the Target always holds only one process, or > extend it to allow passing in a Process when resolving non-file backed > addresses. But this would make the conversion from addr_t uses to Address > uses more difficult, since we will have to push the Target or Process into > all the API’s that make use of just an addr_t. Using a single Address > class seems less attractive when you have to provide an external entity to > make sense of it at all the use sites. > >>> > >>> We could improve this situation by including a Process (as a weak > pointer) and fill that in on the boundaries where in the current code we go > from an Address to a process specific addr_t. That would make the > conversion easier, but add complexity. Since Addresses are ubiquitous, you > won’t know what any given Address you’ve been handed actually contains. It > could even have been resolved for another process than the current one. > Making Address usage-dependent in this way reduces the attractiveness of > the solution. > >>> > >>> ## Approach > >>> > >>> Replacing all the instances of addr_t by hand would be a lot of work. > Therefore we propose writing a clang-based tool to automate this menial > task. The tool would update function signatures and replace uses of addr_t > inside those functions to get the addr_t from the ProcessAddress or Address > and return the appropriate object for functions that currently return an > addr_t. The goal of this tool is to generate one big NFC patch. This tool > needs not be perfect, at some point it will be more work to improve the > tool than fixing up the remaining code by hand. After this patch LLDB would > still not really understand address spaces but it will have everything in > place to support them. > >> > >> This won't be NFC really as each location that plays with what used to > be addr_t now must check if the segment is invalid before doing what it did > before _and_ return an error if the segment is something valid. > >> > >> It might be better to look at all of the APIs that could end up using a > plain "addr_t" and adding new APIs that take a ProcessAddress and call the > old API if the segment is LLDB_INVALID_SEGMENT or LLDB_NO_SEGMENT, and > return an error if the segment is valid. For example in the Process class > we have: > >> > >> virtual size_t Process::DoReadMemory(lldb::addr_t vm_addr, void *buf, > size_t size, Status &error) = 0; > >> > >> We could add a new overload: > >> > >> virtual size_t Process::DoReadMemory(ProcessAddress proc_addr, void > *buf, size_t size, Status &error) { > >> if (proc_addr.GetSegment() == LLDB_NO_SEGMENT) > >> return DoReadMemory(proc_addr.GetAddress(), but, size, error); > >> error.SetErrorString("segmented addresses are not supported on this > process"); > >> return 0 > >> } > >> > >> Then we can start modifying the locations that need to support > segmented addresses as needed. For instance, if we were to add segmented > address support to ProcessGDBRemote, then we would override this function > in that class. > >> > >> I am not sure if slowly adding this functionality is better than > replacing this all right away, but we can't just do a global replace > without adding functionality or error checking IMHO. > >> > >> > >>> Once all the APIs are updated, we can start working on the functional > changes. This means actually interpreting the aspace_t values and making > sure they don’t get dropped. > >>> > >>> Finally, when all this work is done and we’re happy with the approach, > we extend the SB API with overloads for the functions that currently take > or return addr_t . I want to do this last so we have time to iterate before > committing to a stable interface. > >> > >> This might be one reason for doing the approach suggested above where > we add new internal APIs that take a ProcessAddress and cut over to using > them. As it would mean all of the current APIs in the lldb::SB layer would > remain in place (they can't be removed) and would still make sense. > >> > >>> > >>> ## Testing > >>> > >>> By splitting off the intrusive non-functional changes we are able to > rely on the existing tests for coverage. Smaller functional changes can be > tested in isolation, either through a unit test or a small GDB remote test. > For end-to-end testing we can run the test suite with a modified > debugserver that spoofs address spaces. > >> > >> That makes sense. ProcessGDBRemote will need to dynamically respond > with wether it supports segmented addresses by overloading the DoReadMemory > that takes a ProcessAddress and do the right thing. > >> > >> Thanks for taking this on. I hope some of the comments above help > moving this forward. > >> > >> Greg > >> > >>> > >>> Thanks, > >>> Jonas > >>> > >>> _______________________________________________ > >>> lldb-dev mailing list > >>> lldb-dev@lists.llvm.org > >>> https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev > >> > >> _______________________________________________ > >> lldb-dev mailing list > >> lldb-dev@lists.llvm.org <mailto:lldb-dev@lists.llvm.org> > >> https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev < > https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev> > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: < > http://lists.llvm.org/pipermail/lldb-dev/attachments/20201026/2ed416fd/attachment.html > > > > ------------------------------ > > Subject: Digest Footer > > _______________________________________________ > lldb-dev mailing list > lldb-dev@lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev > > > ------------------------------ > > End of lldb-dev Digest, Vol 125, Issue 31 > ***************************************** >
_______________________________________________ lldb-dev mailing list lldb-dev@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev