[lldb-dev] Reporting bugs which only affect (semi-proprietary) downstream consumers.

2021-06-22 Thread Adam HARRIES via lldb-dev
Hi all,

I've recently taken over maintenance of my company's llvm+lldb branch,
where we have added support for our in-house architecture (in llvm) as well
as support for debugging through both hardware and our simulator. Our llvm
fork is public/open source, however many of our runtime libraries and
drivers (which are linked into lldb, clang, etc, and provide built-ins and
driver support etc) are not.

While attempting to update our branch from llvm-11 to llvm-12 we came
across a commit[1] in lldb which quite reliably causes a deadlock when we
launch a process to debug a core dump. Luckily, said commit simply modifies
some concurrency primitives, and reverting it is sufficient to fix the bug
without any further effects. We are quite confident that the commit is the
issue, as we performed a thorough bisect which maintained "our" code
unchanged throughout.

Unfortunately, however, we are unable to reproduce this bug on any "open"
architecture (such as x86-64, AArch64, etc.), so we are not entirely sure how
we should go about reporting the bug. Additionally, it makes it difficult
to open a discussion regarding whether the commit is correct (and thus we
may need to modify our additions to lldb to match new implicit behaviour),
as third parties may be unable to reproduce the issue. Finally, as the bug
results in a deadlock (which requires a SIGKILL to end) we won't (as I
understand it) be able to use a "Reproducer" to demonstrate the bug to
third parties.

Although we are able to "solve" the issue locally (by reverting the
commit), we feel that the better solution would be to feed back our
findings to the community and solve the issue, rather than (privately)
sweeping it under the rug. As components of our compiler are proprietary,
however, this process becomes difficult due to the reasons listed above.

To summarise, there are two main questions that I feel unable to answer:
- Is there an existing process for reporting bugs that only affect third
parties, and which cannot be reproduced in "core" targets?
- To what extent is it possible to discuss (or report) bugs "on faith" - as
in without any concrete evidence that a third party can reproduce?

We are currently looking into opening up our build process so that we are
able to distribute binary libraries to enable third parties to build our
compiler + debugger, but as this is currently a work-in-progress it is
unfortunately not a solution to this issue.

Many thanks in advance for any and all advice.
Yours,

-- 
*Adam Brouwers-Harries*
Compiler Engineer
aharr...@upmem.com

[1] Please note, I have specifically not named this commit as I wish to
better understand the "meta"-bug filing process, and I do not wish to
publicly assign blame for any bugs without understanding how and why I can
do so respectfully and properly.
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


Re: [lldb-dev] Reporting bugs which only affect (semi-proprietary) downstream consumers.

2021-06-23 Thread Adam HARRIES via lldb-dev
Hi Raphael,

Thanks for the advice!

> I think the best idea is to comment on the commit on Phabricator (
reviews.llvm.org ) as it seems to be a relatively recent change. Otherwise
if you can somehow provide a way to reproduce the deadlock using only code
you can share + LLVM.org sources then filing a bug would be an option too.

I'll definitely leave a comment then, as at the very least I should be able
to get some feedback on the commit itself. I can't (sadly) reproduce the
deadlock using public code - I'm still looking into how we can share our
(private) llvm/lldb dependencies so that public parties can build them, so
I may hold off on filing a bug until I have sorted that.

> At least the backtrace of all threads in the deadlocked state would be
good to know. And of course the commit your bisect stopped at if it's a bug
report.

I can absolutely share all of these, and I'll make sure to include them in
any bug report I file.

> And I believe you can't use the reproducer feature here as that requires
having the respective LLDB binary to replay (which you probably can't
share).

Our LLDB binaries are publicly available, however there are a number of
static libraries that we link into our LLVM backend whose source is
proprietary, hence why I cannot reproduce the bug using public code.

Thanks,
Adam

On Tue, 22 Jun 2021 at 18:33, Raphael “Teemperor” Isemann <
teempe...@gmail.com> wrote:

> Hi Adam,
>
> I think the best idea is to comment on the commit on Phabricator (
> reviews.llvm.org ) as it seems to be a relatively recent change.
> Otherwise if you can somehow provide a way to reproduce the deadlock using
> only code you can share + LLVM.org sources then filing a bug would be an
> option too.
>
> Regarding what information you should provide: Pretty much everything that
> you can share would help. At least the backtrace of all threads in the
> deadlocked state would be good to know. And of course the commit your
> bisect stopped at if it's a bug report. From there people might have an
> idea how to reproduce the issue in a unit test or via the SB API (or what
> could be going wrong in your downstream fork).
>
> And I believe you can't use the reproducer feature here as that requires
> having the respective LLDB binary to replay (which you probably can't
> share).
>
> - Raphael
>

Re: [lldb-dev] Reporting bugs which only affect (semi-proprietary) downstream consumers.

2021-06-23 Thread Adam HARRIES via lldb-dev
Hi Greg,

Thanks for the advice!

> [...] I would suggest just submitting a bug and attaching stack traces of
your deadlock. Loading a core file is very similar across all targets, so I
can't imagine this being hard to reproduce with another core file?

Glad to hear this - I'll do so soon then. I also imagine that this bug
affects other "backends", but I can't confirm that myself (due to lack of
experience with other lldb backends), so hopefully others will be able to
verify it if I file a bug.

> Is there something special about your core file or setup?

As I understand it there is not that much "weird" about our LLDB
integrations. We have made some specific additions to be able to debug
threads/processes running on our co-processor and allow printf/debugging
information to be passed back to the host, but aside from that we haven't
touched any of the core code.

It is, however, possible that we've subclassed one of the
native thread/process classes incorrectly and violated some concurrency
invariant. This is part of my hesitation for filing a bug report, as I'm
not sure whether the commit itself was at fault, or whether we accidentally
relied on some incorrect concurrency behaviour which has now been
corrected, leaving our plugin broken.
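
To make the second possibility concrete, here is a minimal Python sketch of
how a change to a locking primitive can break a subclass that was silently
relying on the old behaviour. Everything in it is hypothetical (the class
names `Process` and `PluginProcess`, the `held` helper, the timeout); it is
not the LLDB code or the commit in question, only an illustration of the
general pattern of a base class calling a subclass hook while holding a lock.

```python
import threading
from contextlib import contextmanager


class WouldDeadlock(RuntimeError):
    """Raised instead of hanging, so the demo can report the problem."""


@contextmanager
def held(lock, timeout=0.2):
    # A timeout stands in for "blocks forever" so the example terminates.
    if not lock.acquire(timeout=timeout):
        raise WouldDeadlock("could not acquire lock in time")
    try:
        yield
    finally:
        lock.release()


class Process:
    """Hypothetical base class that calls a subclass hook under its lock."""

    def __init__(self, lock):
        self._lock = lock
        self._state = "stopped"

    def get_state(self):
        with held(self._lock):
            return self._state

    def load_core(self):
        with held(self._lock):
            return self.on_load()

    def on_load(self):
        return "loaded"


class PluginProcess(Process):
    """Hypothetical downstream subclass that re-enters the base class lock."""

    def on_load(self):
        return "loaded while " + self.get_state()


def try_load(lock):
    try:
        return PluginProcess(lock).load_core()
    except WouldDeadlock:
        return "deadlock"


if __name__ == "__main__":
    print(try_load(threading.RLock()))  # recursive lock: re-entry is allowed
    print(try_load(threading.Lock()))   # plain lock: re-entry cannot succeed
```

With a recursive lock the subclass works; once the base class switches to a
non-recursive lock, the same subclass can no longer complete the call. In
that situation neither side is obviously "wrong", which is why the stack
traces of all threads are the most useful thing to share.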

> I would go ahead and debug the deadlock, attach repro steps for how you
are loading your core file (exact commands or APIs that are being used) and
then maybe attach the output "bt all" so we can see all of the threads and
see what is deadlocking your LLDB.

Okay, thanks for the advice regarding what would be good to include. I'll
make sure to add as much of this as I can when I file the bug report.
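
One way to collect the "bt all" output from a hung debugger is to attach a
second lldb to the deadlocked lldb process. The sketch below is only an
assumption about how this could be scripted: the helper names
`backtrace_command` and `dump_backtraces` are made up for illustration, it
assumes an `lldb` executable on the PATH and permission to attach to the
process, and it relies only on the `--batch`, `-p` and `-o` driver options.

```python
import subprocess


def backtrace_command(pid, debugger="lldb"):
    """Build the argv that attaches to `pid` and prints all thread backtraces."""
    return [
        debugger,
        "--batch",        # run the given commands, then exit
        "-p", str(pid),   # attach to the already-running (hung) process
        "-o", "bt all",   # backtrace of every thread
        "-o", "detach",   # leave the hung process as it was
    ]


def dump_backtraces(pid, debugger="lldb"):
    """Run the command and return its output as text for a bug report."""
    result = subprocess.run(
        backtrace_command(pid, debugger),
        capture_output=True,
        text=True,
        timeout=120,
    )
    return result.stdout
```

For example, `dump_backtraces(12345)` (where 12345 stands in for the PID of
the deadlocked lldb) would return text that can be pasted into the report
alongside the exact commands used to load the core file.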

Thanks again,
Adam

On Tue, 22 Jun 2021 at 18:34, Greg Clayton wrote:

>
>
> On Jun 22, 2021, at 10:10 AM, Adam HARRIES via lldb-dev <
> lldb-dev@lists.llvm.org> wrote:
>
> To summarise, there are two main questions that I feel unable to answer:
> - Is there an existing process for reporting bugs that only affect third
> parties, and which cannot be reproduced in "core" targets?
>
>
> I don't believe there is a formal process for this. Though I would suggest
> just submitting a bug and attaching stack traces of your deadlock. Loading
> a core file is very similar across all targets, so I can't imagine this
> being hard to reproduce with another core file? Is there something special
> about your core file or setup? I know that logging used to be able to cause
> deadlocks due to the Module::GetDescription(...) that tried to take the
> module lock. It no longer does this on top of tree.
>
> - To what extent is it possible to discuss (or report) bugs "on faith" -
> as in without any concrete evidence that a third party can reproduce?
>
> We are currently looking into opening up ou