Re: [lldb-dev] [RFC] Fast Conditional Breakpoints (FCB)
On 8/22/19 12:36 AM, Ismail Bennani via lldb-dev wrote:
>> On Aug 21, 2019, at 3:48 PM, Pedro Alves wrote:
>>
>> Say, you're using a 5-byte jmp instruction to jump to the
>> trampoline, so you need to replace 5 bytes at the breakpoint address.
>> But the instruction at the breakpoint address is shorter than
>> 5 bytes. Like:
>>
>>  ADDR | BEFORE          | AFTER
>>  --------------------------------
>>  0000 | INSN1 (1 byte)  | JMP (5 bytes)
>>  0001 | INSN2 (2 bytes) |   <<< thread T's PC points here
>>  0002 |                 |
>>  0003 | INSN3 (2 bytes) |
>>
>> Now once you resume execution, thread T is going to execute a bogus
>> instruction at ADDR 0001.
>
> That's a relevant point.
>
> I hadn't thought of it, but I think this can be mitigated by checking, at
> the time of replacing the instructions, whether any thread is within the
> bounds of the copied instructions.
>
> If so, I'll change the PC of every thread that is in the critical region to
> point to the new location of the copied instruction (inside the trampoline).
>
> This way, it won't change the execution flow of the program.

Yes, I think that would work, assuming that you can stop all threads,
or all threads are already stopped, which I believe is true with
LLDB currently. If any thread is running (like in gdb's non-stop mode)
then you can't do that, of course.

> Thanks for pointing out this issue, I'll make sure to add a fix to my
> implementation.
>
> If you have any other suggestion on how to tackle this problem, I'd
> really like to know about it :).

Not off hand. I think I'd take a look at Dyninst and see if they have
some sophisticated way to handle this scenario.

Thanks,
Pedro Alves
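For illustration, here is a rough sketch of the PC-relocation check discussed above. This is not LLDB API: `ThreadState`, `RelocationMap`, and the function name are hypothetical stand-ins, and it assumes all threads are already stopped, as noted in the reply.

```cpp
#include <cstdint>
#include <map>
#include <vector>

struct ThreadState {
  uint64_t pc;
};

// Hypothetical: maps each original instruction address inside the patched
// region to the address of its relocated copy in the trampoline.
using RelocationMap = std::map<uint64_t, uint64_t>;

// Before overwriting [bp_addr, bp_addr + patch_size) with the jmp, move any
// thread whose PC points strictly inside that region onto the relocated
// copy of the same instruction in the trampoline. A PC equal to bp_addr
// itself is fine: that thread will simply execute the new jmp.
void RelocateThreadsInPatchRange(std::vector<ThreadState> &threads,
                                 uint64_t bp_addr, uint64_t patch_size,
                                 const RelocationMap &relocated_addr_of) {
  for (ThreadState &t : threads) {
    if (t.pc > bp_addr && t.pc < bp_addr + patch_size) {
      auto it = relocated_addr_of.find(t.pc);
      if (it != relocated_addr_of.end())
        t.pc = it->second; // resume from the copied instruction
    }
  }
}
```

The `relocated_addr_of` map would be built while copying the displaced instructions into the trampoline, since redirecting a PC is only meaningful on an instruction boundary.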
Re: [lldb-dev] Evaluating the same expression at the same breakpoint gets slower after a certain number of steps
Hey Jim,

We just noticed that 'target.experimental.inject-local-vars' is true by
default. If we disable that setting, expression evaluation performance is
significantly better.

From the flag description:
"If true, inject local variables explicitly into the expression text. This
will fix symbol resolution when there are name collisions between ivars and
local variables. But it can make expressions run much more slowly."

I put together a simple example:

class bar {
 public:
  int foo = 2;
  int Run(int foo) {
    return foo + 1;
  }
};

Evaluating 'foo' when stopped in bar::Run() seems to work as expected. Is
there something I'm not capturing in my example? Do you have an example of a
name collision that the experimental flag fixes?

On Mon, Jul 15, 2019 at 1:53 PM Guilherme Andrade via lldb-dev <
lldb-dev@lists.llvm.org> wrote:

> Gábor,
> Thanks for pointing this out to me. The AST changes - the resulting log
> grows from 7k lines to 11k. I also verified that the fallback branch is
> executed: 18k iterations during the first evaluation and 93k afterwards.
> However, that only accounts for a couple of extra milliseconds of slowness
> (~4 ms), whereas the overall performance hit is on the order of hundreds
> of milliseconds.
>
> Jim,
> Thank you for the explanation. I think I understand the lazy approach to
> type realization, but it is still not clear to me how that could cause the
> performance to degrade. If we encounter an already realized type, won't
> that save us work and make things run faster? Do you know of other points
> in the code that could be particularly sensitive to the size of the
> realized-types pool (something like the branch Gábor mentioned)?
>
> On Fri, Jul 12, 2019 at 6:08 AM Gábor Márton wrote:
>
>> Guilherme,
>>
>> Could you please check whether you see any structural differences between
>> the ASTs when
>> 1) you first evaluate your expression at breakpoint A, and
>> 2) you evaluate your expression a second time at breakpoint A?
>> The AST of the expression evaluator's context is dumped once a TagDecl is
>> completed, but you need to enable a specific logging channel:
>> (log enable lldb ast).
>>
>> I have a theory about the root cause of the slowness you are experiencing,
>> but it needs confirmation from your test scenario.
>> There are known problems with the lookup we use in LLDB [1].
>> If the lookup mechanism cannot find a symbol, then clang::ASTImporter will
>> create a new AST node.
>> Thus I am expecting a growing AST in your case at 2), with duplicated and
>> redundant AST nodes.
>> Why would the grown AST result in any slowness?
>> Because the lookup we use is `clang::DeclContext::localUncachedLookup()`,
>> and it has a fallback branch which I assume is executed in your case:
>> ```
>>   // Slow case: grovel through the declarations in our chain looking for
>>   // matches.
>>   // FIXME: If we have lazy external declarations, this will not find them!
>>   // FIXME: Should we CollectAllContexts and walk them all here?
>>   for (Decl *D = FirstDecl; D; D = D->getNextDeclInContext()) {
>>     if (auto *ND = dyn_cast<NamedDecl>(D))
>>       if (ND->getDeclName() == Name)
>>         Results.push_back(ND);
>>   }
>> ```
>> This for loop does a linear search over the list of decls in a decl
>> context, which would explain the slowdown as the AST grows.
>> I wonder if you could help prove this theory by first checking whether the
>> AST grows, and if so, whether this linear search is executed or not.
>>
>> [1] If a symbol is in an `extern "C"` block then the existing lookup
>> fails to find it. I try to fix it in https://reviews.llvm.org/D61333
>>
>> Thanks,
>> Gabor
>>
>> On Thu, Jul 11, 2019 at 8:24 PM Jim Ingham via lldb-dev <
>> lldb-dev@lists.llvm.org> wrote:
>>
>>> lldb realizes types from DWARF lazily. So, for instance, if an
>>> expression refers to a pointer to type Foo, we won't necessarily realize
>>> the full type of Foo from DWARF to parse that expression. Then if you
>>> write a second expression that accesses a member of an object of type Foo,
>>> we will realize the full type of Foo. Then if you run the first
>>> expression again, the pointer-to-Foo type in the lldb type system will now
>>> point to a realized type of Foo. That should not make any difference,
>>> since if we were right the first time that we didn't need to know anything
>>> about Foo, it shouldn't matter whether the full type is realized or not.
>>>
>>> Similarly, the "expression with no side effects" could have also caused
>>> lldb to realize any number of other types. We find names from type
>>> information in two ways: looking in the realized types, and looking in the
>>> name indexes we read from DWARF or (on systems without accelerator tables)
>>> from indexing the DWARF manually. So the expression with no side effects
>>> will change the types in the realized-types pool. That also "should not
>>> matter" because if the expression X was looking up by name, the lookup
>>> through the name indexes when th
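To make the effect Gábor describes concrete, here is a small standalone toy model — not LLDB code, and the decl counts are invented. If each failed lookup causes more redundant decls to be imported into the context, the linear fallback scan visits proportionally more nodes on every subsequent evaluation, even though each individual comparison is cheap.

```cpp
#include <cstdio>
#include <string>
#include <vector>

struct Decl { std::string name; };

// Mirrors the shape of the fallback loop: walk the whole decl chain,
// counting how many nodes we visit before giving up.
static std::size_t linearLookup(const std::vector<Decl> &chain,
                                const std::string &name) {
  std::size_t visited = 0;
  for (const Decl &d : chain) {
    ++visited;
    if (d.name == name)
      break;
  }
  return visited;
}

int main() {
  std::vector<Decl> chain;
  const std::size_t dupsPerEval = 1000; // stand-in for redundantly imported decls
  for (int eval = 1; eval <= 5; ++eval) {
    for (std::size_t i = 0; i < dupsPerEval; ++i)
      chain.push_back({"dup"});
    std::printf("evaluation %d visits %zu decls\n", eval,
                linearLookup(chain, "not_present"));
  }
}
```

The visited count grows linearly per evaluation, so the total work across N evaluations is quadratic — consistent with the "gets slower after a certain number of steps" symptom, if the duplicated-AST theory holds.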
[lldb-dev] [Bug 43091] New: lldb very slow single stepping rep stosb
https://bugs.llvm.org/show_bug.cgi?id=43091

            Bug ID: 43091
           Summary: lldb very slow single stepping rep stosb
           Product: lldb
           Version: unspecified
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: All Bugs
          Assignee: lldb-dev@lists.llvm.org
          Reporter: nty...@qq.com
                CC: jdevliegh...@apple.com, llvm-b...@lists.llvm.org

When single-stepping through asm instructions, I find lldb very slow when it
encounters a (large) rep stosb. The debugger essentially hangs. While I can
use Ctrl-C to break the program into the debugger, I still can't continue
debugging: even if I put a breakpoint on the next instruction and use the "c"
command, it is still very slow.

However, if I don't try to single-step the rep instruction, and instead
directly put a breakpoint on the next instruction and use "c", it's way
faster.
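The report does not include source, but a minimal standalone reproducer of the kind of code it describes might look like the following sketch (my assumption; it targets x86-64 and uses GCC/Clang extended inline asm):

```cpp
#include <cstddef>

int main() {
  // Zero-fill a large buffer with a single `rep stosb`:
  // RDI = destination, RCX = byte count, AL = fill value.
  // Single-stepping onto this one instruction is where the reported
  // slowdown shows up.
  static unsigned char buf[64 * 1024 * 1024];
  unsigned char *dst = buf;
  std::size_t count = sizeof(buf);
  asm volatile("rep stosb"
               : "+D"(dst), "+c"(count)
               : "a"(static_cast<unsigned char>(0))
               : "memory");
  return buf[0];
}
```

Build with `-g -O0`, start the program under lldb, and `si` over the `rep stosb` to observe the behavior described in the report.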
Re: [lldb-dev] Evaluating the same expression at the same breakpoint gets slower after a certain number of steps
The test
lldb/packages/Python/lldbsuite/test/lang/cpp/member-and-local-vars-with-same-name/main.cpp
is testing this feature, so with the feature disabled you should get a 10
(instead of the correct 12345) when you break at main.cpp:31 in that test and
eval "expr a". At least for me that's the case.

- Raphael

On Thu, Aug 22, 2019 at 4:31 PM Scott Funkenhauser via lldb-dev wrote:
>
> Hey Jim,
>
> We just noticed that 'target.experimental.inject-local-vars' is true by
> default. If we disable that setting, expression evaluation performance is
> significantly better.
>
> From the flag description:
> "If true, inject local variables explicitly into the expression text. This
> will fix symbol resolution when there are name collisions between ivars and
> local variables. But it can make expressions run much more slowly."
>
> I put together a simple example:
>
> class bar {
>  public:
>   int foo = 2;
>   int Run(int foo) {
>     return foo + 1;
>   }
> };
>
> Evaluating 'foo' when stopped in bar::Run() seems to work as expected. Is
> there something I'm not capturing in my example? Do you have an example of
> a name collision that the experimental flag fixes?
>
> [...]
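For a self-contained picture of the kind of collision that test exercises, here is a hypothetical reduction (not the actual test source, and the specific values are only chosen to match the 10 vs. 12345 symptom Raphael mentions): a local variable shadows a member of the same name, and injecting locals into the expression text is what makes `expr a` resolve to the local, as the language rules require.

```cpp
struct S {
  int a = 10; // member ("ivar")

  int method() {
    int a = 12345; // local that shadows the member
    return a;      // breakpoint here; `expr a` should print 12345.
                   // Without local-variable injection, the lookup can
                   // resolve `a` to the member and print 10 instead.
  }
};

int main() { return S().method(); }
```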
Re: [lldb-dev] [RFC] Fast Conditional Breakpoints (FCB)
Another possibility is to have the IDE insert NOP opcodes for you when you
write a breakpoint with a condition, and compile the NOPs into your program.

So the flow is:
- set a breakpoint in the IDE
- modify the breakpoint to add a condition
- compile and debug; the IDE inserts NOP instructions at the right places
- now when you debug you have a NOP you can use, and you don't have to worry
  about moving instructions

> On Aug 22, 2019, at 5:29 AM, Pedro Alves via lldb-dev wrote:
>
> [...]
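One concrete way to reserve the needed patch space — an illustration of the idea above, not something the thread specifies — is to emit a single 5-byte NOP (0F 1F 44 00 00 on x86-64) at each candidate breakpoint site. The debugger can later overwrite it in place with a 5-byte rel32 jmp to the trampoline without ever splitting a neighboring instruction. The helper below assumes GCC/Clang inline asm and is purely hypothetical:

```cpp
// Hypothetical marker a build step might expand at each candidate
// breakpoint site. The 5-byte NOP occupies exactly the space a rel32
// `jmp` needs, so patching never spans an instruction boundary.
inline void fcb_patch_site() {
  asm volatile(".byte 0x0f, 0x1f, 0x44, 0x00, 0x00"); // 5-byte NOP
}

int compute(int x) {
  fcb_patch_site(); // space reserved for a fast conditional breakpoint
  return x * 2;
}
```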
Re: [lldb-dev] [RFC] Fast Conditional Breakpoints (FCB)
Hi Greg,

Thanks for your suggestion!

> On Aug 22, 2019, at 3:35 PM, Greg Clayton wrote:
>
> Another possibility is to have the IDE insert NOP opcodes for you when you
> write a breakpoint with a condition and compile NOPs into your program.
>
> So the flow is:
> - set a breakpoint in IDE
> - modify breakpoint to add a condition
> - compile and debug, the IDE inserts NOP instructions at the right places

We're trying to avoid rebuilding every time we want to debug, but I'll keep
this in mind as an eventual fallback.

> - now when you debug you have a NOP you can use and not have to worry about
>   moving instructions
>
> [...]

Sincerely,

Ismail
Re: [lldb-dev] [RFC] Fast Conditional Breakpoints (FCB)
> On Aug 22, 2019, at 3:58 PM, Ismail Bennani via lldb-dev wrote:
>
> Hi Greg,
>
> Thanks for your suggestion!
>
>> On Aug 22, 2019, at 3:35 PM, Greg Clayton wrote:
>>
>> Another possibility is to have the IDE insert NOP opcodes for you when you
>> write a breakpoint with a condition and compile NOPs into your program.
>>
>> So the flow is:
>> - set a breakpoint in IDE
>> - modify breakpoint to add a condition
>> - compile and debug, the IDE inserts NOP instructions at the right places
>
> We're trying to avoid rebuilding every time we want to debug, but I'll keep
> this in mind as an eventual fallback.

It's also valuable to use FCBs on third-party code. You might want to put an
FCB on dlopen(), strcmp'ing the first argument against a specific value,
without rebuilding the C libraries. Recompilation/instrumentation makes this
a lot simpler, but it also reduces the usefulness of the feature.

> [...]
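As a concrete illustration of that dlopen example (my own sketch, not from the thread): today, a regular conditional breakpoint for it might be set roughly like this, assuming x86-64 where the first argument arrives in $rdi, since libc typically has no debug info for the parameter; the exact expression may need adjusting per platform.

```
(lldb) breakpoint set --name dlopen \
        --condition '(int)strcmp((const char *)$rdi, "libfoo.so") == 0'
```

The FCB proposal aims to make exactly this kind of condition cheap enough to leave enabled on hot functions, without the usual stop-evaluate-resume round trip for every hit.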
Re: [lldb-dev] [RFC] Fast Conditional Breakpoints (FCB)
If you can rely on the IDE and on compiling before debugging, you might as
well have the IDE and compiler bake the breakpoint condition and the
trampoline into the code, without having the debugger build the trampoline
afterwards.

Thanks,
Pedro Alves

On 8/22/19 11:35 PM, Greg Clayton wrote:
> Another possibility is to have the IDE insert NOP opcodes for you when you
> write a breakpoint with a condition and compile NOPs into your program.
>
> So the flow is:
> - set a breakpoint in IDE
> - modify breakpoint to add a condition
> - compile and debug, the IDE inserts NOP instructions at the right places
> - now when you debug you have a NOP you can use and not have to worry about
>   moving instructions
>
> [...]
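At source level, "baking the condition in" could look something like the following hypothetical illustration. The IDE/compiler would generate the equivalent; `__builtin_debugtrap()` is a Clang builtin, and other compilers would need a different trap intrinsic.

```cpp
void process(int count) {
  // Generated guard for the user's breakpoint condition "count == 42":
  // trap into the debugger only when the condition holds, so no runtime
  // patching or trampoline is needed.
  if (count == 42)
    __builtin_debugtrap();

  // ... original function body ...
}
```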
Re: [lldb-dev] [RFC] Fast Conditional Breakpoints (FCB)
On 23/08/2019 00:58, Ismail Bennani via lldb-dev wrote:
> Hi Greg,
>
> Thanks for your suggestion!
>
>> On Aug 22, 2019, at 3:35 PM, Greg Clayton wrote:
>>
>> Another possibility is to have the IDE insert NOP opcodes for you when you
>> write a breakpoint with a condition and compile NOPs into your program.
>>
>> So the flow is:
>> - set a breakpoint in IDE
>> - modify breakpoint to add a condition
>> - compile and debug, the IDE inserts NOP instructions at the right places
>
> We're trying to avoid rebuilding every time we want to debug, but I'll keep
> this in mind as an eventual fallback.

A slight variation on that feature would be to just have the compiler
guarantee that there will always be enough space between two jump targets
for us to insert a trampoline jump. One way to guarantee that would be to
align all jump targets to 16-byte boundaries (on x86, anyway).

I say this because I have a vague recollection that some of the more exotic
llvm backends (webassembly?) may already have such a requirement, albeit for
different reasons (to do with being able to statically analyze control
flow), so the code for doing this might already be there, and maybe all it
would take is a little tinkering with the codegen options to enable it.
Unfortunately, I don't remember the details of this, but someone on this
list might...

pl
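For reference, GCC already exposes alignment knobs along these lines: `-falign-labels=N` pads all branch targets to an N-byte boundary, which would keep any two distinct jump targets at least N bytes apart. Whether Clang honors the same options, and whether this is the mechanism Pavel has in mind, is an open question, so treat the invocation below as an illustration rather than a verified recipe.

```
g++ -O2 -falign-functions=16 -falign-labels=16 -falign-loops=16 -falign-jumps=16 foo.cpp
```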