[lldb-dev] Unwinding call frames with separated data and return address stacks

2019-03-04 Thread Thomas Goodfellow via lldb-dev
I'm adding LLDB support for an unconventional platform which uses two
stacks: one purely for return addresses and another for frame context
(spilled registers, local variables, etc). There is no explicit link
between the two stacks, i.e. the frame context doesn't include any
pointer or index to identify the return address: the epilog for a
subroutine amounts to unwinding the frame context then finally popping
the top return address from the return stack. It has some resemblance
to the Intel CET scheme of shadow stacks, but without the primary
stack having a copy of the return address.

I can extend the emulation of the platform to better support LLDB. For
example, while the real hardware platform provides no access to the
return address stack, the emulation can expose it in the memory map,
provide an additional debug register for querying it, etc., which DWARF
expressions could then extract return addresses from. However, doing
this seems to require knowing the frame number, and I haven't found a
way of obtaining it (a pseudo-register manipulated by DWARF expressions
worked, but it needed some LLDB hacks to sneak it through the existing
link register handling, and also seemed likely to be unstable against
LLDB implementation changes).

Is there a way to access the call frame number (or a reliable proxy)
from a DWARF expression? Or an existing example of unwinding a shadow
stack?

Thanks,
Tom
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


Re: [lldb-dev] Unwinding call frames with separated data and return address stacks

2019-03-05 Thread Thomas Goodfellow via lldb-dev
Hi Jason

Thanks for the advice - I've been surprised overall by how capable DWARF
expressions are, so it wouldn't have surprised me to learn that there is
also a category of pseudo-variables (not that I can think of any
others, or of other circumstances where one would be useful: the usual
combined code/data stack is ubiquitous). The RestoreType suggestion is
interesting, as it might be a less intrusive change.

Cheers,
Tom

On Mon, 4 Mar 2019 at 22:05, Jason Molenda  wrote:
>
> Hi Tom, interesting problem you're working on there.
>
> I'm not sure any of the DWARF expression operators would work here.  You want 
> to have an expression that works for a given frame, saying "to find the 
> caller's pc value, look at the saved-pc stack, third entry from the bottom of 
> that stack."  But that would require generating a different DWARF expression 
> for the frame each time it shows up in a backtrace - which is unlike lldb's 
> normal design of having an UnwindPlan for a function which is computed once 
> and reused for the duration of the debug session.
>
> I suppose you could add a user-defined DW_OP which means "get the current 
> stack frame number" and then have your expression deref the emulated saved-pc 
> stack to get the value?
>
> lldb uses an intermediate representation of unwind information (UnwindPlan) 
> which will use a DWARF expression, but you could also add an entry to 
> UnwindPlan::Row::RegisterLocation::RestoreType which handled this, I suppose.
>
>
> > On Mar 4, 2019, at 2:46 AM, Thomas Goodfellow via lldb-dev 
> >  wrote:
> >
> > I'm adding LLDB support for an unconventional platform which uses two
> > stacks: one purely for return addresses and another for frame context
> > (spilled registers, local variables, etc). There is no explicit link
> > between the two stacks, i.e. the frame context doesn't include any
> > pointer or index to identify the return address: the epilog for a
> > subroutine amounts to unwinding the frame context then finally popping
> > the top return address from the return stack. It has some resemblance
> > to the Intel CET scheme of shadow stacks, but without the primary
> > stack having a copy of the return address.
> >
> > I can extend the emulation of the platform to better support LLDB. For
> > example while the real hardware platform provides no access to the
> > return address stack the emulation can expose it in the memory map,
> > provide an additional debug register for querying it, etc, which DWARF
> > expressions could then extract return addresses from. However doing
> > this seems to require knowing the frame number and I haven't found a
> > way of doing this (a pseudo-register manipulated by DWARF expressions
> > worked but needed some LLDB hacks to sneak it through the existing
> > link register handling, also seemed likely to be unstable against LLDB
> > implementation changes)
> >
> > Is there a way to access the call frame number (or a reliable proxy)
> > from a DWARF expression? Or an existing example of unwinding a shadow
> > stack?
> >
> > Thanks,
> > Tom
> > ___
> > lldb-dev mailing list
> > lldb-dev@lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>


Re: [lldb-dev] Unwinding call frames with separated data and return address stacks

2019-03-14 Thread Thomas Goodfellow via lldb-dev
Hi Pavel

Thanks for that very useful notion - adding a fake stack pointer to
the target was indeed the least-intrusive approach (I had tried the
RestoreType approach suggested by Jason and it seemed likely to work
but required backporting 7.0 fixes into the regrettably-6.0-based
codeline).

I suspect there is a mild guideline here for avoiding further pain on
this unconventional target: when necessary, use the platform emulation
to fake its being conventional.

Cheers,
Tom



On Tue, 5 Mar 2019 at 12:55, Pavel Labath  wrote:
>
> On 04/03/2019 11:46, Thomas Goodfellow via lldb-dev wrote:
> > I'm adding LLDB support for an unconventional platform which uses two
> > stacks: one purely for return addresses and another for frame context
> > (spilled registers, local variables, etc). There is no explicit link
> > between the two stacks, i.e. the frame context doesn't include any
> > pointer or index to identify the return address: the epilog for a
> > subroutine amounts to unwinding the frame context then finally popping
> > the top return address from the return stack. It has some resemblance
> > to the Intel CET scheme of shadow stacks, but without the primary
> > stack having a copy of the return address.
> >
> > I can extend the emulation of the platform to better support LLDB. For
> > example while the real hardware platform provides no access to the
> > return address stack the emulation can expose it in the memory map,
> > provide an additional debug register for querying it, etc, which DWARF
> > expressions could then extract return addresses from. However doing
> > this seems to require knowing the frame number and I haven't found a
> > way of doing this (a pseudo-register manipulated by DWARF expressions
> > worked but needed some LLDB hacks to sneak it through the existing
> > link register handling, also seemed likely to be unstable against LLDB
> > implementation changes)
> >
> > Is there a way to access the call frame number (or a reliable proxy)
> > from a DWARF expression? Or an existing example of unwinding a shadow
> > stack?
> >
>
> I'm not sure I fully understood your setup, but it seems to me that this
> could be easily fixed if, in addition to the "fake" memory map, you
> could provide a fake "stack pointer" register which points to it.
>
> Then, it should be possible to express the unwind info in regular
> debug_frame syntax:
> previous_IP := [ fake_SP ]
> previous_fake_SP := fake_SP +/- sizeof(IP)
>
> regards,
> pavel


[lldb-dev] Are overlapping ELF sections problematic?

2019-06-03 Thread Thomas Goodfellow via lldb-dev
I'm working with an embedded platform that segregates memory between
executable code, RAM, and constant values. The three kinds occupy
three separate address spaces, accessed by specific instructions (e.g.
"load from RAM address #0" vs "load from constant ROM address #0")
with fairly small ranges for literal address values. So necessarily
all three address spaces start at zero.

We're using the LLVM toolchain with ELF32 files, mapping the three
spaces as .text, .data, and .crom sections, with a linker script
setting the address for all three sections to zero and so producing a
non-relocatable executable image (the .text section becomes a ROM for
an embedded device, so final addresses are required). To support
debugging with LLDB (where the GDB server protocol presumes a single
flat memory space) the sections are mapped to address ranges in a
larger space (using the top two bits), and the debugger stub of the
platform then demuxes the memory accesses to the appropriate address
spaces.

Until recently this was done by loading the ELF file in LLDB, e.g.:
"target modules load --file test.elf .data 0 .crom 0x4000 .text
0x8000". However the changes introduced through
https://reviews.llvm.org/D55998 removed support for overlapping
sections, with a remark "I don't anticipate running into this
situation in the real world. However, if we do run into it, and the
current behavior is not suitable for some reason, we can implement
this logic differently."

Our immediate coping strategy was implementing the remapping in the
file parsing of ObjectFileELF, but this LLDB change makes us
apprehensive that we may start encountering similar issues elsewhere
in the LLVM tooling. Are ELF sections with overlapping addresses so
rare (or even actually invalid) that ongoing support will be fragile?


Re: [lldb-dev] Are overlapping ELF sections problematic?

2019-06-04 Thread Thomas Goodfellow via lldb-dev
Hi Pavel

> I can't say what's the situation in the rest of llvm, but right now lldb
> has zero test coverage for the flow you are using, so the fact that this
> has worked until now was pretty much an accident.

It was a pleasant surprise that it worked at all, since flat memory
maps have become near-ubiquitous. But it's good to at least know that
the conceptual ice hasn't become any thinner through the patch, i.e.
it refines the existing state rather than reflecting a more explicit
policy change.

> In the mean time, I believe you can just patch out the part which drops
> the overlapping sections from the section list and get behavior which
> was more-or-less identical to the old one.

I think this also requires reverting the use of the IntervalMap as the
VM address container, since that relies upon non-overlapping
intervals? That smells like a bigger fork than I would like to
keep alive indefinitely.

> I believe that a long term solution here would be to introduce some
> concept of address spaces to lldb. Then these queries would no longer be
> ambiguous as the function FindSectionContainingFileAddress would
> (presumably) take an additional address-space identifier as an argument.
> I know this is what some downstream users are doing to make things like
> this work. However, this is a fairly invasive change, so doing something
> like this upstream would require a lot of previous discussion.

Would this also extend the GDB remote protocol, where the single flat
address space seems the only current option? (at least the common
solution in various GDB discussions of DSP targets is address muxing
of the sort we're using)

I imagine such changes are hampered by the lack of in-tree targets
that require them, both to motivate the change and to keep it testable
(the recent "removing magic numbers assuming 8-bit bytes" discussion
in llvm-dev features the same issue). Previously Embecosm was
attempting to upstream a LLVM target for its demonstration AAP
architecture (features multiple address spaces), e.g.
http://lists.llvm.org/pipermail/llvm-dev/2017-February/109776.html .
However their public forks on GitHub only reveal GDB support rather
than LLDB, and that implementation is by an address mux.

Unfortunately the architecture I'm working with is (yet another) poor
candidate for upstreaming, since it lacks general availability, but
hopefully one of the exotic architectures lurking in the LLVM shadows
someday steps forth with a commitment to keep it alive in-tree.

Cheers,
Tom

On Mon, 3 Jun 2019 at 13:19, Pavel Labath  wrote:
>
> On 03/06/2019 10:19, Thomas Goodfellow via lldb-dev wrote:
> > I'm working with an embedded platform that segregates memory between
> > executable code, RAM, and constant values. The three kinds occupy
> > three separate address spaces, accessed by specific instructions (e.g.
> > "load from RAM address #0" vs "load from constant ROM address #0")
> > with fairly small ranges for literal address values. So necessarily
> > all three address spaces all start at zero.
> >
> > We're using the LLVM toolchain with ELF32 files, mapping the three
> > spaces as.text, .data, and .crom sections, with a linker script
> > setting the address for all three sections to zero and so producing a
> > non-relocatable executable image (the .text section becomes a ROM for
> > an embedded device so final addresses are required). To support
> > debugging with LLDB (where the GDB server protocol presumes a single
> > flat memory space) the sections are mapped to address ranges in a
> > larger space (using the top two bits) and the debugger stub of the
> > platform then demuxes the memory accesses to the appropriate address
> > spaces).
> >
> > Until recently this was done by loading the ELF file in LLDB, e.g:
> > "target modules load --file test.elf .data 0 .crom 0x4000 .text
> > 0x8000". However the changes introduced through
> > https://reviews.llvm.org/D55998 removed support for overlapping
> > sections, with a remark "I don't anticipate running into this
> > situation in the real world. However, if we do run into it, and the
> > current behavior is not suitable for some reason, we can implement
> > this logic differently."
> >
> > Our immediate coping strategy was implementing the remapping in the
> > file parsing of ObjectFileELF, but this LLDB change makes us
> > apprehensive that we may start encountering similar issues elsewhere
> > in the LLVM tooling. Are ELF sections with overlapping addresses so
> > rare (or even actually invalid) that ongoing support will be fragile?

Re: [lldb-dev] Are overlapping ELF sections problematic?

2019-06-04 Thread Thomas Goodfellow via lldb-dev
Hi Zdenek

In an ideal world LLVM and LLDB would support a common approach to
address spaces. Currently our LLVM backend doesn't yet support address
spaces anyway, e.g. access to a variable declared as constant data:

const int my_val __attribute__((section (".crom"))) = { 42 };

is only possible from assembler code. Since current code already
features a blend of C and assembler, this limitation is cumbersome
rather than catastrophic, but of course we expect to add proper
lowering for such addresses, so we're certainly interested in this
domain.

The work already done in coupling LLDB more closely to LLVM is obvious
(e.g. migrating away from duplicated utility code), but the backends
still seem a little disjoint, e.g. some targets specify the same
architectural attributes, such as registers, in both projects. It would
be nice if a new(?) feature like address spaces were added in a way
that minimised redundancy.

Cheers,
Tom

On Mon, 3 Jun 2019 at 15:08, Zdenek Prikryl  wrote:
>
> Hi Pavel, Thomas,
>
> Just a note that this topic is repeating now and then. It'd be nice to
> have a concept at least. We can go with an additional argument, or
> enhance addr_t, or enhance Address, or create a new type for it. So,
> some sort of discussion that would clarify the concept a little bit is
> welcome, I think.
>
> Best regards.
>
> On 6/3/19 1:21 PM, Pavel Labath via lldb-dev wrote:
> > On 03/06/2019 10:19, Thomas Goodfellow via lldb-dev wrote:
> >> I'm working with an embedded platform that segregates memory between
> >> executable code, RAM, and constant values. The three kinds occupy
> >> three separate address spaces, accessed by specific instructions (e.g.
> >> "load from RAM address #0" vs "load from constant ROM address #0")
> >> with fairly small ranges for literal address values. So necessarily
> >> all three address spaces all start at zero.
> >>
> >> We're using the LLVM toolchain with ELF32 files, mapping the three
> >> spaces as.text, .data, and .crom sections, with a linker script
> >> setting the address for all three sections to zero and so producing a
> >> non-relocatable executable image (the .text section becomes a ROM for
> >> an embedded device so final addresses are required). To support
> >> debugging with LLDB (where the GDB server protocol presumes a single
> >> flat memory space) the sections are mapped to address ranges in a
> >> larger space (using the top two bits) and the debugger stub of the
> >> platform then demuxes the memory accesses to the appropriate address
> >> spaces).
> >>
> >> Until recently this was done by loading the ELF file in LLDB, e.g:
> >> "target modules load --file test.elf .data 0 .crom 0x4000 .text
> >> 0x8000". However the changes introduced through
> >> https://reviews.llvm.org/D55998 removed support for overlapping
> >> sections, with a remark "I don't anticipate running into this
> >> situation in the real world. However, if we do run into it, and the
> >> current behavior is not suitable for some reason, we can implement
> >> this logic differently."
> >>
> >> Our immediate coping strategy was implementing the remapping in the
> >> file parsing of ObjectFileELF, but this LLDB change makes us
> >> apprehensive that we may start encountering similar issues elsewhere
> >> in the LLVM tooling. Are ELF sections with overlapping addresses so
> >> rare (or even actually invalid) that ongoing support will be fragile?
> >> ___
> >> lldb-dev mailing list
> >> lldb-dev@lists.llvm.org
> >> https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
> >>
> >
> > Hi Thomas,
> >
> > I can't say what's the situation in the rest of llvm, but right now
> > lldb has zero test coverage for the flow you are using, so the fact
> > that this has worked until now was pretty much an accident.
> >
> > The reason I chose to disallow the overlapping sections in the patch
> > you quote was because it was very hard to say what will be the meaning
> > of this to the upper layers of lldb. For instance, a lot things in
> > lldb work with "file addresses" (that is, virtual address, as they are
> > known in the file, without any remapping). This means that the
> > overlapping sections become ambiguous even though you have remapped
> > them to non-overlapping "load addresses" with the "target modules
> > load" command. For instance, the result of a query like
> > "

[lldb-dev] Deadlock with DWARF logging and symbol enumeration

2019-07-01 Thread Thomas Goodfellow via lldb-dev
I'm describing this initially through email rather than raising a defect,
because I haven't developed a usable reproduction (I'm working on an
out-of-tree target based off the v8 release). Also, since it only bites
with DWARF logging enabled, it's unlikely to affect many people.

The deadlock comes from SymbolVendor::FindFunctions() holding a module lock
across the delegated FindFunctions() call, which for SymbolFileDWARF
results in TaskMapToInt() passing the task to a worker "task runner"
thread. With DWARF logging enabled the worker thread calls
Module::GetDescription() from LogMessageVerboseBacktrace(), which tries to
claim the same mutex.

With a simple workaround (don't always enable DWARF logging) this
particular instance is easy to avoid, but perhaps there are other cases or
some wider implications, e.g. rules for task-runner behaviour to avoid such
states? (Possibly part of the problem is that the FindFunctions() interface
looks synchronous, so holding a recursive mutex across the call isn't
obviously risky.)


[lldb-dev] Reporting source errors from MCCodeEmitter::encodeInstruction() ?

2020-01-30 Thread Thomas Goodfellow via lldb-dev
We have a backend for a target that at present only detects some
assembler errors when emitting instructions (basically because the
platform has configurable properties, with dependencies between
instructions, and it was easier to check their interaction late than
to detect it earlier, e.g. through custom encoder methods and
tablegen). Emitting diagnostics through
SourceManager::PrintMessage() "works" in the limited sense of
communicating the problem to a human, but it doesn't prevent
generation of an incorrect output file or change the process exit
code.

We'd prefer not to resort to report_fatal_error() since that isn't a
polite way to diagnose problems in the source.

Is there a sensible way to properly signal a source error from the
level of encodeInstruction()? Or is it expected that all such errors
are reported earlier?


[lldb-dev] RISC-V (bare-board) debugging?

2021-06-19 Thread Thomas Goodfellow via lldb-dev
I saw the impending patch https://reviews.llvm.org/D62732, which suggests
that current LLDB doesn't support RISC-V targets at all? If that's
correct, do I understand rightly that the patch will support
bare-board debugging, e.g. via openocd?

Is there any sentiment on when the patch will land?

(background being that so far we've been using gdb + openocd to debug
LLVM-built images running on a VexRiscV platform, but obviously would
prefer to be purely-LLVM)

Cheers,
Tom