Re: [lldb-dev] TestRaise.py test_restart_bug flakey stats

2015-10-19 Thread Tamas Berghammer via lldb-dev
The expected flakey marking works a bit differently than you described:
* Run the test.
* If it passes, record it as a successful test and we are done.
* Otherwise, run the test again.
* If it passes the 2nd time, record it as an expected failure (IMO
"expected flakey" would be a better result, but we don't have that category).
* If it fails 2 times in a row, record it as a failure, because a flakey
test should pass at least once in every 2 runs (which means we need a ~95%
success rate to keep the build bot green most of the time). If it isn't
passing often enough for that, it should be marked as an expected failure
instead. This is done this way to detect the case where a flakey test gets
broken completely by a new change.

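In rough Python terms, that retry logic amounts to something like the
following (a sketch only, not the actual test-runner code; run_once is a
stand-in for executing the test method once):

    def run_with_flakey_retry(run_once):
        # run_once() returns True on pass, False on fail.
        if run_once():
            return "pass"               # first run passed: ordinary success
        if run_once():
            return "expected failure"   # passed only on the retry
        return "fail"                   # failed twice in a row: real failure
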
I checked some stats for TestRaise on the build bot, and under the current
definition of expected flakey we shouldn't mark it as flakey, because it
will often fail 2 times in a row (its passing rate is ~50%), which will be
reported as a failure and turn the build bot red.

I will send you the full stats from the last 100 builds in a separate
off-list mail, as they are too big for the mailing list. If somebody else
is interested, let me know.

Tamas

On Sun, Oct 18, 2015 at 2:18 AM Todd Fiala  wrote:

> Nope, no good either when I limit the flakey to DWO.
>
> So perhaps I don't understand how the flakey marking works.  I thought it
> meant:
> * run the test.
> * If it passes, it goes as a successful test.  Then we're done.
> * run the test again.
> * If it passes, then we're done and mark it a successful test.  If it
> fails, then mark it an expected failure.
>
> But that's definitely not the behavior I'm seeing, as a flakey marking in
> the above scheme should never produce a failing test.
>
> I'll have to revisit the flakey test marking to see what it's really doing
> since my understanding is clearly flawed!
>
> On Sat, Oct 17, 2015 at 5:57 PM, Todd Fiala  wrote:
>
>> Hmm, the flakey behavior may be specific to dwo.  Testing it locally as
>> unconditionally flaky on Linux is failing on dwarf.  All the ones I see
>> succeed are dwo.  I wouldn't expect a diff there but that seems to be the
>> case.
>>
>> So, the request still stands but I won't be surprised if we find that dwo
>> sometimes passes while dwarf doesn't (or at least not enough to get through
>> the flakey setting).
>>
>> On Sat, Oct 17, 2015 at 4:57 PM, Todd Fiala  wrote:
>>
>>> Hi Tamas,
>>>
>>> I think you grabbed me stats on failing tests in the past.  Can you dig
>>> up the failure rate for TestRaise.py's test_restart_bug() variants on
>>> Ubuntu 14.04 x86_64?  I'd like to mark it as flaky on Linux, since it is
>>> passing most of the time over here.  But I want to see if that's valid
>>> across all Ubuntu 14.04 x86_64.  (If it is passing some of the time, I'd
>>> prefer marking it flakey so that we don't see unexpected successes).
>>>
>>> Thanks!
>>>
>>> --
>>> -Todd
>>>
>>
>>
>>
>> --
>> -Todd
>>
>
>
>
> --
> -Todd
>
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


Re: [lldb-dev] TestRaise.py test_restart_bug flakey stats

2015-10-19 Thread Pavel Labath via lldb-dev
I have created this test to reproduce a race condition in
ProcessGDBRemote. Given that it tests a race condition, it cannot be
failing 100% of the time, but I agree with Tamas that we should keep
it as XFAIL to avoid noise in the buildbots.

pl

On 19 October 2015 at 12:30, Tamas Berghammer via lldb-dev
 wrote:
> The expected flakey marking works a bit differently than you described:
> * Run the test.
> * If it passes, record it as a successful test and we are done.
> * Otherwise, run the test again.
> * If it passes the 2nd time, record it as an expected failure (IMO
> "expected flakey" would be a better result, but we don't have that category).
> * If it fails 2 times in a row, record it as a failure, because a flakey
> test should pass at least once in every 2 runs (which means we need a ~95%
> success rate to keep the build bot green most of the time). If it isn't
> passing often enough for that, it should be marked as an expected failure
> instead. This is done this way to detect the case where a flakey test gets
> broken completely by a new change.
>
> I checked some stats for TestRaise on the build bot, and under the current
> definition of expected flakey we shouldn't mark it as flakey, because it
> will often fail 2 times in a row (its passing rate is ~50%), which will be
> reported as a failure and turn the build bot red.
>
> I will send you the full stats from the last 100 builds in a separate
> off-list mail, as they are too big for the mailing list. If somebody else
> is interested, let me know.
>
> Tamas
>
> On Sun, Oct 18, 2015 at 2:18 AM Todd Fiala  wrote:
>>
>> Nope, no good either when I limit the flakey to DWO.
>>
>> So perhaps I don't understand how the flakey marking works.  I thought it
>> meant:
>> * run the test.
>> * If it passes, it goes as a successful test.  Then we're done.
>> * run the test again.
>> * If it passes, then we're done and mark it a successful test.  If it
>> fails, then mark it an expected failure.
>>
>> But that's definitely not the behavior I'm seeing, as a flakey marking in
>> the above scheme should never produce a failing test.
>>
>> I'll have to revisit the flakey test marking to see what it's really doing
>> since my understanding is clearly flawed!
>>
>> On Sat, Oct 17, 2015 at 5:57 PM, Todd Fiala  wrote:
>>>
>>> Hmm, the flakey behavior may be specific to dwo.  Testing it locally as
>>> unconditionally flaky on Linux is failing on dwarf.  All the ones I see
>>> succeed are dwo.  I wouldn't expect a diff there but that seems to be the
>>> case.
>>>
>>> So, the request still stands but I won't be surprised if we find that dwo
>>> sometimes passes while dwarf doesn't (or at least not enough to get through
>>> the flakey setting).
>>>
>>> On Sat, Oct 17, 2015 at 4:57 PM, Todd Fiala  wrote:

 Hi Tamas,

 I think you grabbed me stats on failing tests in the past.  Can you dig
 up the failure rate for TestRaise.py's test_restart_bug() variants on 
 Ubuntu
 14.04 x86_64?  I'd like to mark it as flaky on Linux, since it is passing
 most of the time over here.  But I want to see if that's valid across all
 Ubuntu 14.04 x86_64.  (If it is passing some of the time, I'd prefer 
 marking
 it flakey so that we don't see unexpected successes).

 Thanks!

 --
 -Todd
>>>
>>>
>>>
>>>
>>> --
>>> -Todd
>>
>>
>>
>>
>> --
>> -Todd
>
>
> ___
> lldb-dev mailing list
> lldb-dev@lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


[lldb-dev] [Bug 25251] New: Infinite recursion in LLDB stack unwinding

2015-10-19 Thread via lldb-dev
https://llvm.org/bugs/show_bug.cgi?id=25251

Bug ID: 25251
   Summary: Infinite recursion in LLDB stack unwinding
   Product: lldb
   Version: unspecified
  Hardware: PC
OS: Linux
Status: NEW
  Severity: normal
  Priority: P
 Component: All Bugs
  Assignee: lldb-dev@lists.llvm.org
  Reporter: tbergham...@google.com
CC: llvm-b...@lists.llvm.org
Classification: Unclassified

Created attachment 15112
  --> https://llvm.org/bugs/attachment.cgi?id=15112&action=edit
Source to reproduce the bug

Infinite recursion in LLDB stack unwinding (resulting in SIGSEGV)

Steps to reproduce the issue:
* g++ -g -std=c++11 ParallelTask.cpp (source file attached)
* ./bin/lldb a.out
* breakpoint set -f ParallelTask.cpp -l 144
* process launch
* thread backtrace all

The issue was introduced by http://reviews.llvm.org/rL249673

Tested on Ubuntu 14.04 with g++ 4.8.4

So far I haven't managed to reproduce the issue with a smaller example.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


[lldb-dev] [Bug 25251] Infinite recursion in LLDB stack unwinding

2015-10-19 Thread via lldb-dev
https://llvm.org/bugs/show_bug.cgi?id=25251

ravithejaw...@gmail.com changed:

   What|Removed |Added

 CC||ravithejaw...@gmail.com
   Assignee|lldb-dev@lists.llvm.org |ravithejaw...@gmail.com

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


[lldb-dev] License & Patents discussion on llvm-dev

2015-10-19 Thread Chris Lattner via lldb-dev
FYI, I just started a discussion on llvm-dev about the license & patents 
situation in the project. It also affects LLDB, so if you’re interested, 
please check it out there.

-Chris
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


[lldb-dev] [Bug 25253] New: Expression evaluation crashes when base and derived classes are the same

2015-10-19 Thread via lldb-dev
https://llvm.org/bugs/show_bug.cgi?id=25253

Bug ID: 25253
   Summary: Expression evaluation crashes when base and derived
classes are the same
   Product: lldb
   Version: unspecified
  Hardware: PC
OS: Linux
Status: NEW
  Severity: normal
  Priority: P
 Component: All Bugs
  Assignee: lldb-dev@lists.llvm.org
  Reporter: tbergham...@google.com
CC: llvm-b...@lists.llvm.org
Classification: Unclassified

Created attachment 15115
  --> https://llvm.org/bugs/attachment.cgi?id=15115&action=edit
Source to reproduce the bug

If the base class of a class is the same class as the class itself, but with
some (not all) of the template arguments different, then expression
evaluation runs into an infinite recursion with the following calling sequence:

frame #7036: 0x7f7813d99397 liblldb.so.3.8`(anonymous
namespace)::EmptySubobjectMap::ComputeEmptySubobjectSizes(this=0x7ffd0e9ae938)
+ 167 at RecordLayoutBuilder.cpp:192
frame #7037: 0x7f7813d8e35d liblldb.so.3.8`(anonymous
namespace)::EmptySubobjectMap::EmptySubobjectMap(this=0x7ffd0e9ae938,
Context=0x1c48f650, Class=0x1918e340) + 125 at
RecordLayoutBuilder.cpp:171
frame #7038: 0x7f7813d8d8ca
liblldb.so.3.8`clang::ASTContext::getASTRecordLayout(this=0x1c48f650,
D=0x1918e340) const + 1546 at RecordLayoutBuilder.cpp:2909
frame #7039: 0x7f7813d99397 liblldb.so.3.8`(anonymous
namespace)::EmptySubobjectMap::ComputeEmptySubobjectSizes(this=0x7ffd0e9af708)
+ 167 at RecordLayoutBuilder.cpp:192
frame #7040: 0x7f7813d8e35d liblldb.so.3.8`(anonymous
namespace)::EmptySubobjectMap::EmptySubobjectMap(this=0x7ffd0e9af708,
Context=0x1c48f650, Class=0x1918e340) + 125 at
RecordLayoutBuilder.cpp:171
frame #7041: 0x7f7813d8d8ca
liblldb.so.3.8`clang::ASTContext::getASTRecordLayout(this=0x1c48f650,
D=0x1918e340) const + 1546 at RecordLayoutBuilder.cpp:2909

Steps to reproduce the issue:
* g++ -g -std=c++11 RecursiveBase.cpp (source attached)
* ./bin/lldb a.out
* breakpoint set -n main
* process launch
* expression A

The attached code is based on the implementation of the __atomic_base class in
libcxx after simplifying it to the minimal test case.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


Re: [lldb-dev] TestRaise.py test_restart_bug flakey stats

2015-10-19 Thread Todd Fiala via lldb-dev
Thanks, Tamas.

On Mon, Oct 19, 2015 at 4:30 AM, Tamas Berghammer 
wrote:

> The expected flakey marking works a bit differently than you described:
> * Run the test.
> * If it passes, record it as a successful test and we are done.
> * Otherwise, run the test again.
> * If it passes the 2nd time, record it as an expected failure (IMO
> "expected flakey" would be a better result, but we don't have that category).
>

I agree.  I plan to add that category (I think I even have a bugzilla bug I
created for myself on that).  The intent would be to have a "pass flakey"
and a "fail flakey" end state for a run.  How many times to run, and the
entry/exit criteria for a run, are TBD.  If we mark it right, and we know
how many times we should be able to run it to get a single pass, we could
really do this right.


> * If it fails 2 times in a row, record it as a failure, because a flakey
> test should pass at least once in every 2 runs (which means we need a ~95%
> success rate to keep the build bot green most of the time). If it isn't
> passing often enough for that, it should be marked as an expected failure
> instead. This is done this way to detect the case where a flakey test gets
> broken completely by a new change.
>
>
I see.  Thanks.  That totally explains what I was seeing.

Internally I have been using "unexpected success" as an actionable item,
failing our testbots.  The idea is that if something is supposed to fail
and it is now passing, that indicates either (1) somebody fixed it with a
change and didn't update the test as an oversight, or (2) somebody fixed it
with a change that shouldn't have fixed it, which means the test logic is
not testing something properly and the test should be updated.

That is kind of stymied by this type of test result, as unexpected success
becomes a "sometimes meaningless" signal.  And anything that is sometimes
meaningless can make the meaningful ones get overlooked.

So I would actively like to move away from unexpected success containing a
sometimes useful / sometimes not useful semantic.  We should tackle that
soon.



> I checked some stats for TestRaise on the build bot, and under the current
> definition of expected flakey we shouldn't mark it as flakey, because it
> will often fail 2 times in a row (its passing rate is ~50%), which will be
> reported as a failure and turn the build bot red.
>
>
> I will send you the full stats from the last 100 builds in a separate
> off-list mail, as they are too big for the mailing list. If somebody else
> is interested, let me know.
>
>
Thanks, Tamas!


> Tamas
>
> On Sun, Oct 18, 2015 at 2:18 AM Todd Fiala  wrote:
>
>> Nope, no good either when I limit the flakey to DWO.
>>
>> So perhaps I don't understand how the flakey marking works.  I thought it
>> meant:
>> * run the test.
>> * If it passes, it goes as a successful test.  Then we're done.
>> * run the test again.
>> * If it passes, then we're done and mark it a successful test.  If it
>> fails, then mark it an expected failure.
>>
>> But that's definitely not the behavior I'm seeing, as a flakey marking in
>> the above scheme should never produce a failing test.
>>
>> I'll have to revisit the flakey test marking to see what it's really
>> doing since my understanding is clearly flawed!
>>
>> On Sat, Oct 17, 2015 at 5:57 PM, Todd Fiala  wrote:
>>
>>> Hmm, the flakey behavior may be specific to dwo.  Testing it locally as
>>> unconditionally flaky on Linux is failing on dwarf.  All the ones I see
>>> succeed are dwo.  I wouldn't expect a diff there but that seems to be the
>>> case.
>>>
>>> So, the request still stands but I won't be surprised if we find that
>>> dwo sometimes passes while dwarf doesn't (or at least not enough to get
>>> through the flakey setting).
>>>
>>> On Sat, Oct 17, 2015 at 4:57 PM, Todd Fiala 
>>> wrote:
>>>
 Hi Tamas,

 I think you grabbed me stats on failing tests in the past.  Can you dig
 up the failure rate for TestRaise.py's test_restart_bug() variants on
 Ubuntu 14.04 x86_64?  I'd like to mark it as flaky on Linux, since it is
 passing most of the time over here.  But I want to see if that's valid
 across all Ubuntu 14.04 x86_64.  (If it is passing some of the time, I'd
 prefer marking it flakey so that we don't see unexpected successes).

 Thanks!

 --
 -Todd

>>>
>>>
>>>
>>> --
>>> -Todd
>>>
>>
>>
>>
>> --
>> -Todd
>>
>


-- 
-Todd
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


Re: [lldb-dev] TestRaise.py test_restart_bug flakey stats

2015-10-19 Thread Todd Fiala via lldb-dev
Okay.  I think for the time being, the XFAIL makes sense.  Per my previous
email, though, I think we should move away from unexpected success (XPASS)
being a "sometimes meaningful, sometimes meaningless" signal.  For almost
all cases, an unexpected success is an actionable signal.  I don't want it
to become the kind of warning that everybody lives with without fixing, so
that it ends up hiding a real issue when one surfaces.

Thanks for explaining what I was seeing!

-Todd

On Mon, Oct 19, 2015 at 6:49 AM, Pavel Labath  wrote:

> I have created this test to reproduce a race condition in
> ProcessGDBRemote. Given that it tests a race condition, it cannot be
> failing 100% of the time, but I agree with Tamas that we should keep
> it as XFAIL to avoid noise in the buildbots.
>
> pl
>
> On 19 October 2015 at 12:30, Tamas Berghammer via lldb-dev
>  wrote:
> > The expected flakey works a bit differently then you are described:
> > * Run the tests
> > * If it passes, it goes as a successful test and we are done
> > * Run the test again
> > * If it is passes the 2nd time then record it as expected failure (IMO
> > expected falkey would be a better result, but we don't have that
> category)
> > * If it fails 2 times in a row then record it as a failure because a
> flakey
> > test should pass at least once in every 2 run (it means we need ~95%
> success
> > rate to keep the build bot green in most of the time). If it isn't
> passing
> > often enough for that then it should be marked as expected failure. This
> is
> > done this way to detect the case when a flakey test get broken
> completely by
> > a new change.
> >
> > I checked some states for TestRaise on the build bot and in the current
> > definition of expected flakey we shouldn't mark it as flakey because it
> will
> > often fail 2 times in a row (it passing rate is ~50%) what will be
> reported
> > as a failure making the build bot red.
> >
> > I will send you the full stats from the lass 100 build in a separate off
> > list mail as it is a too big for the mailing list. If somebody else is
> > interested in it then let me know.
> >
> > Tamas
> >
> > On Sun, Oct 18, 2015 at 2:18 AM Todd Fiala  wrote:
> >>
> >> Nope, no good either when I limit the flakey to DWO.
> >>
> >> So perhaps I don't understand how the flakey marking works.  I thought
> it
> >> meant:
> >> * run the test.
> >> * If it passes, it goes as a successful test.  Then we're done.
> >> * run the test again.
> >> * If it passes, then we're done and mark it a successful test.  If it
> >> fails, then mark it an expected failure.
> >>
> >> But that's definitely not the behavior I'm seeing, as a flakey marking
> in
> >> the above scheme should never produce a failing test.
> >>
> >> I'll have to revisit the flakey test marking to see what it's really
> doing
> >> since my understanding is clearly flawed!
> >>
> >> On Sat, Oct 17, 2015 at 5:57 PM, Todd Fiala 
> wrote:
> >>>
> >>> Hmm, the flakey behavior may be specific to dwo.  Testing it locally as
> >>> unconditionally flaky on Linux is failing on dwarf.  All the ones I see
> >>> succeed are dwo.  I wouldn't expect a diff there but that seems to be
> the
> >>> case.
> >>>
> >>> So, the request still stands but I won't be surprised if we find that
> dwo
> >>> sometimes passes while dwarf doesn't (or at least not enough to get
> through
> >>> the flakey setting).
> >>>
> >>> On Sat, Oct 17, 2015 at 4:57 PM, Todd Fiala 
> wrote:
> 
>  Hi Tamas,
> 
>  I think you grabbed me stats on failing tests in the past.  Can you
> dig
>  up the failure rate for TestRaise.py's test_restart_bug() variants on
> Ubuntu
>  14.04 x86_64?  I'd like to mark it as flaky on Linux, since it is
> passing
>  most of the time over here.  But I want to see if that's valid across
> all
>  Ubuntu 14.04 x86_64.  (If it is passing some of the time, I'd prefer
> marking
>  it flakey so that we don't see unexpected successes).
> 
>  Thanks!
> 
>  --
>  -Todd
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> -Todd
> >>
> >>
> >>
> >>
> >> --
> >> -Todd
> >
> >
> > ___
> > lldb-dev mailing list
> > lldb-dev@lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
> >
>



-- 
-Todd
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


[lldb-dev] Does anyone depend on using LLDB with Python 2.6?

2015-10-19 Thread Zachary Turner via lldb-dev
AKA: Is Python 2.6 a supported configuration?  I found this
`argparse_compat.py` file in tests, and it opens with this:

"""
Compatibility module to use the lldb test-suite with Python 2.6.

Warning: This may be buggy. It has not been extensively tested and should only
be used when it is impossible to use a newer Python version.
It is also a special-purpose class for lldb's test-suite.
"""

import sys

if sys.version_info >= (2, 7):
    raise "This module shouldn't be used when argparse is available (Python >= 2.7)"
else:
    print("Using Python 2.6 compatibility layer. Some command line options may not be supported")
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


[lldb-dev] proposal for reworked flaky test category

2015-10-19 Thread Todd Fiala via lldb-dev
Hi all,

I'd like unexpected successes (i.e. tests marked as unexpected failure that
in fact pass) to retain the actionable meaning that something is wrong.
The wrong part is that either (1) the test now passes consistently and the
author of the fix just missed updating the test definition (or perhaps was
unaware of the test), or (2) the test is not covering the condition it is
testing completely, and some change to the code just happened to make the
test pass (due to the test being not comprehensive enough).  Either of
those requires some sort of adjustment by the developers.

We have a category of test known as "flaky" or "flakey" (both are valid
spellings, for those who care:
http://www.merriam-webster.com/dictionary/flaky, although flaky is
considered the primary).  Flaky tests are tests that we can't get to pass
100% of the time.  This might be because it is extremely difficult to write
the test as such and deemed not worth the effort, or it is a condition that
is just not going to present itself successfully 100% of the time.  These
are tests we still want to exercise, but we don't want to have them start
generating test failures if they don't pass 100% of the time.  Currently
the flaky test mechanism requires a test to pass one in two times.  That is
okay for a test that exhibits a slim degree of flakiness.  For others, that
is not a large enough sample of runs to elicit a successful result.  Those
tests get marked as XFAIL, and generate a non-actionable "unexpected
success" result when they do happen to pass.

GOAL

* Enhance expectedFlakey* test decorators.  Allow specification of the
number of times in which a flaky test should be run to be expected to pass
at least once.  Call that MAX_RUNS.

* When running a flaky test, run it up to MAX_RUNS times.  The first
time it passes, mark it as a successful test completion.  The test event
system will be given the number of times it was run before passing.
Whether we consume this info or not is TBD (and falls into the purview of
the test results formatter).

* If the test does not pass within MAX_RUNS attempts, mark it as a flaky fail.
For purposes of the standard output, this can look like FAIL: (flaky) or
something similar so fail scanners still see it.  (Note it's highly likely
I'll do the normal output counts with the TestResults formatter-based
output at the same time, so we get accurate test method counts and the
like).

* Flaky tests never generate a non-actionable "unexpected pass".  This
occurs because we no longer need to mark tests as XFAIL when they require
more than two runs to get a high degree of confidence in a passing test.

* Flaky tests get marked with a flaky category, so that test runners can
choose to skip flaky tests by skipping the category.  This may not be
necessary if tests don't take an excessively long time to get a passing
grade with high degree of confidence.

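As a rough sketch (names and details are illustrative only, not a final
implementation), the enhanced decorator could look something like this:

    import functools

    def expectedFlakey(max_runs=3, bugnumber=None):
        """Illustrative sketch only, not the real lldb decorator."""
        def decorator(test_method):
            @functools.wraps(test_method)
            def wrapper(self, *args, **kwargs):
                for attempt in range(1, max_runs + 1):
                    try:
                        return test_method(self, *args, **kwargs)  # passed: done
                    except Exception:
                        if attempt == max_runs:
                            raise  # exhausted MAX_RUNS: surfaces as the flaky fail
                        # otherwise retry; re-initializing fixtures between
                        # attempts is glossed over in this sketch
            return wrapper
        return decorator
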
Let me know what you all think.  Once we come up with something, I'll
implement it.

-- 
-Todd
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


Re: [lldb-dev] proposal for reworked flaky test category

2015-10-19 Thread Todd Fiala via lldb-dev
> I'd like unexpected successes (i.e. tests marked as unexpected failure
that in fact pass)

argh, that should have been "(i.e. tests marked as *expected* failure that
in fact pass)"

On Mon, Oct 19, 2015 at 12:50 PM, Todd Fiala  wrote:

> Hi all,
>
> I'd like unexpected successes (i.e. tests marked as unexpected failure
> that in fact pass) to retain the actionable meaning that something is
> wrong.  The wrong part is that either (1) the test now passes consistently
> and the author of the fix just missed updating the test definition (or
> perhaps was unaware of the test), or (2) the test is not covering the
> condition it is testing completely, and some change to the code just
> happened to make the test pass (due to the test being not comprehensive
> enough).  Either of those requires some sort of adjustment by the
> developers.
>
> We have a category of test known as "flaky" or "flakey" (both are valid
> spellings, for those who care:
> http://www.merriam-webster.com/dictionary/flaky, although flaky is
> considered the primary).  Flaky tests are tests that we can't get to pass
> 100% of the time.  This might be because it is extremely difficult to write
> the test as such and deemed not worth the effort, or it is a condition that
> is just not going to present itself successfully 100% of the time.  These
> are tests we still want to exercise, but we don't want to have them start
> generating test failures if they don't pass 100% of the time.  Currently
> the flaky test mechanism requires a test to pass one in two times.  That is
> okay for a test that exhibits a slim degree of flakiness.  For others, that
> is not a large enough sample of runs to elicit a successful result.  Those
> tests get marked as XFAIL, and generate a non-actionable "unexpected
> success" result when they do happen to pass.
>
> GOAL
>
> * Enhance expectedFlakey* test decorators.  Allow specification of the
> number of times in which a flaky test should be run to be expected to pass
> at least once.  Call that MAX_RUNS.
>
> * When running a flaky test, run it up MAX_RUNS number of times.  The
> first time it passes, mark it as a successful test completion.  The test
> event system will be given the number of times it was run before passing.
> Whether we consume this info or not is TBD (and falls into the purview of
> the test results formatter).
>
> * If the test does not pass within MAX_RUNS time, mark it as a flaky
> fail.  For purposes of the standard output, this can look like FAIL:
> (flaky) or something similar so fail scanners still see it.  (Note it's
> highly likely I'll do the normal output counts with the TestResults
> formatter-based output at the same time, so we get accurate test method
> counts and the like).
>
> * Flaky tests never generate a non-actionable "unexpected pass".  This
> occurs because we no longer need to mark tests as XFAIL when they require
> more than two runs to get a high degree of confidence in a passing test.
>
> * Flaky tests get marked with a flaky category, so that test runners can
> choose to skip flaky tests by skipping the category.  This may not be
> necessary if tests don't take an excessively long time to get a passing
> grade with high degree of confidence.
>
> Let me know what you all think.  Once we come up with something, I'll
> implement it.
>
> --
> -Todd
>



-- 
-Todd
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


Re: [lldb-dev] Does anyone depend on using LLDB with Python 2.6?

2015-10-19 Thread Todd Fiala via lldb-dev
I think the older Ubuntus and the RHEL 7 line both still have a 2.7-based
python.  I am not aware of any system on the Linux/OS X side where we are
seeing Python 2.6 systems anymore.

Can't speak to the BSDs.

My guess would be we don't need to worry about python < 2.7.

-Todd

On Mon, Oct 19, 2015 at 12:43 PM, Zachary Turner via lldb-dev <
lldb-dev@lists.llvm.org> wrote:

> AKA: Is Python 2.6 a supported configuration?  I found this
> `argparse_compat.py` file in tests, and it opens with this:
>
> """
> Compatibility module to use the lldb test-suite with Python 2.6.
>
> Warning: This may be buggy. It has not been extensively tested and should
> only
> be used when it is impossible to use a newer Python version.
> It is also a special-purpose class for lldb's test-suite.
> """
>
> import sys
>
> if sys.version_info >= (2, 7):
> raise "This module shouldn't be used when argparse is available
> (Python >= 2.7)"
> else:
> print("Using Python 2.6 compatibility layer. Some command line options
> may not be supported")
>
>
>
> ___
> lldb-dev mailing list
> lldb-dev@lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>
>


-- 
-Todd
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


Re: [lldb-dev] proposal for reworked flaky test category

2015-10-19 Thread Zachary Turner via lldb-dev
On Mon, Oct 19, 2015 at 12:50 PM Todd Fiala via lldb-dev <
lldb-dev@lists.llvm.org> wrote:

> Hi all,
>
> I'd like unexpected successes (i.e. tests marked as unexpected failure
> that in fact pass) to retain the actionable meaning that something is
> wrong.  The wrong part is that either (1) the test now passes consistently
> and the author of the fix just missed updating the test definition (or
> perhaps was unaware of the test), or (2) the test is not covering the
> condition it is testing completely, and some change to the code just
> happened to make the test pass (due to the test being not comprehensive
> enough).  Either of those requires some sort of adjustment by the
> developers.
>
I'd add #3: the test is actually flaky but is tagged incorrectly.


>
> We have a category of test known as "flaky" or "flakey" (both are valid
> spellings, for those who care:
> http://www.merriam-webster.com/dictionary/flaky, although flaky is
> considered the primary).  Flaky tests are tests that we can't get to pass
> 100% of the time.  This might be because it is extremely difficult to write
> the test as such and deemed not worth the effort, or it is a condition that
> is just not going to present itself successfully 100% of the time.
>
IMO if it's not worth the effort to write the test correctly, we should
delete the test.  Flaky is useful as a temporary status, but if nobody ends
up fixing the flakiness, I think the test should be deleted (more reasons
follow).



> These are tests we still want to exercise, but we don't want to have them
> start generating test failures if they don't pass 100% of the time.
> Currently the flaky test mechanism requires a test to pass one in two
> times.  That is okay for a test that exhibits a slim degree of flakiness.
> For others, that is not a large enough sample of runs to elicit a
> successful result.  Those tests get marked as XFAIL, and generate a
> non-actionable "unexpected success" result when they do happen to pass.
>
> GOAL
>
> * Enhance expectedFlakey* test decorators.  Allow specification of the
> number of times in which a flaky test should be run to be expected to pass
> at least once.  Call that MAX_RUNS.
>
I think it's worth considering whether it's a good idea to include the date at
which they were declared flakey.  After a certain amount of time has
passed, if it's still flakey they can be relegated to hard failures.  I
don't think flakey should be a permanent state.


>
> * When running a flaky test, run it up MAX_RUNS number of times.  The
> first time it passes, mark it as a successful test completion.  The test
> event system will be given the number of times it was run before passing.
> Whether we consume this info or not is TBD (and falls into the purview of
> the test results formatter).
>

> * If the test does not pass within MAX_RUNS time, mark it as a flaky
> fail.  For purposes of the standard output, this can look like FAIL:
> (flaky) or something similar so fail scanners still see it.  (Note it's
> highly likely I'll do the normal output counts with the TestResults
> formatter-based output at the same time, so we get accurate test method
> counts and the like).
>
The concern I have here (and the reason I would like to delete flakey tests
if the flakiness isn't removed after a certain amount of time) is because
some of our tests are slow.  Repeating them many times is going to have an
impact on how long the test suite takes to run.  It's already tripled over
the past 3 weeks, and I think we need to be careful to keep out things that
have the potential to lead to significant slowness of the test suite runner.
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


Re: [lldb-dev] Does anyone depend on using LLDB with Python 2.6?

2015-10-19 Thread Ted Woodward via lldb-dev
Ubuntu 10.04 uses 2.6 by default; Ubuntu 12.04 uses 2.7.

 

We have a bunch of Ubuntu 10 machines here, but anything that runs lldb has 2.7 
installed. I’m OK with dropping 2.6 support.

 

--

Qualcomm Innovation Center, Inc.

The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project

 

From: lldb-dev [mailto:lldb-dev-boun...@lists.llvm.org] On Behalf Of Todd Fiala 
via lldb-dev
Sent: Monday, October 19, 2015 3:04 PM
To: Zachary Turner
Cc: LLDB
Subject: Re: [lldb-dev] Does anyone depend on using LLDB with Python 2.6?

 

I think the older Ubuntus and the RHEL 7 line both still have a 2.7-based 
python.  I am not aware of any system on the Linux/OS X side where we are 
seeing Python 2.6 systems anymore.

 

Can't speak to the BSDs.

 

My guess would be we don't need to worry about python < 2.7.

 

-Todd

 

On Mon, Oct 19, 2015 at 12:43 PM, Zachary Turner via lldb-dev 
<lldb-dev@lists.llvm.org> wrote:

AKA: Is Python 2.6 a supported configuration?  I found this 
`argparse_compat.py` file in tests, and it opens with this:

 

"""

Compatibility module to use the lldb test-suite with Python 2.6.

 

Warning: This may be buggy. It has not been extensively tested and should only

be used when it is impossible to use a newer Python version.

It is also a special-purpose class for lldb's test-suite.

"""

 

import sys

 

if sys.version_info >= (2, 7):

raise "This module shouldn't be used when argparse is available (Python >= 
2.7)"

else:

print("Using Python 2.6 compatibility layer. Some command line options may 
not be supported")

 

 


___
lldb-dev mailing list
lldb-dev@lists.llvm.org  
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev





 

-- 

-Todd

___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


Re: [lldb-dev] Does anyone depend on using LLDB with Python 2.6?

2015-10-19 Thread Kamil Rytarowski via lldb-dev

(NetBSD) Python 2.6 was retired with pkgsrc-2015Q2

http://mail-index.netbsd.org/pkgsrc-users/2015/07/06/msg021778.html

On 19.10.2015 21:43, Zachary Turner via lldb-dev wrote:
> AKA: Is Python 2.6 a supported configuration?  I found this 
> `argparse_compat.py` file in tests, and it opens with this:
> 
> """ Compatibility module to use the lldb test-suite with Python
> 2.6.
> 
> Warning: This may be buggy. It has not been extensively tested and 
> should only be used when it is impossible to use a newer Python
> version. It is also a special-purpose class for lldb's test-suite. 
> """
> 
> import sys
> 
> if sys.version_info >= (2, 7): raise "This module shouldn't be used
> when argparse is available (Python >= 2.7)" else: print("Using
> Python 2.6 compatibility layer. Some command line options may not
> be supported")
> 
> 
> 
> 
> ___ lldb-dev mailing
> list lldb-dev@lists.llvm.org 
> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
> 

___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


[lldb-dev] issues with simultaneous summary & synthetic formatters

2015-10-19 Thread Mike Mayers via lldb-dev
I have figured out how to get both synthetic and summary formatters
attached to a given datatype: I call GetChildAtIndex from the summary,
which returns the synthetic child.  (GetNonSyntheticValue has no effect,
which makes me ask: why bother having it then?)
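
For reference, a minimal sketch of that pattern using the SB API (the type
name, child index, and registration commands below are illustrative only):

    # Summary provider; valobj arrives as the synthetic value, so
    # GetChildAtIndex() hands back the synthetic child rather than the raw
    # member, and GetNonSyntheticValue() does not change that (per the above).
    def MyType_SummaryProvider(valobj, internal_dict):
        child = valobj.GetChildAtIndex(0)
        return "first element: %s" % child.GetValue()

    # Hooked up (e.g. from ~/.lldbinit) with something like:
    #   type synthetic add -l mymodule.MyTypeSynthProvider MyType
    #   type summary add -F mymodule.MyType_SummaryProvider MyType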

Given these 2 bugs:
http://reviews.llvm.org/D10624
http://reviews.llvm.org/D10581

I cannot figure out how to have a data item present in the summary that is
NOT present in the synthetic children.  (I don't like the fact that the data
is repeated twice; it wastes screen real estate, and with larger data
structures and the Xcode UI on a 15" screen, it matters.)  ;<


Thanks,

mtm
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


Re: [lldb-dev] LLDB: Unwinding based on Assembly Instruction Profiling

2015-10-19 Thread Jason Molenda via lldb-dev
Hi all, sorry I missed this discussion last week, I was a little busy.

Greg's original statement isn't correct -- about a year ago Tong Shen changed 
lldb to use eh_frame for the currently-executing frame.  While it is true 
that eh_frame is not guaranteed to describe the prologue/epilogue, in practice 
eh_frame always describes the epilogue (gdb couldn't work without this, 
with its much more simplistic unwinder).  Newer gcc's also describe the 
with its much more simplistic unwinder).  Newer gcc's also describe the 
epilogue.  clang does not (currently) describe the epilogue.  Tong's changes 
*augment* the eh_frame with an epilogue description if it doesn't already have 
one.

gcc does have an "asynchronous unwind tables" option -- "asynchronous" meaning 
the unwind rules are defined at every instruction location.  But the last time 
I tried it, it did nothing.  They've settled on an unfortunate middle ground 
where eh_frame (which should be compact and only describe enough for exception 
handling) has *some* async unwind instructions.  And the same unwind rules are 
emitted into the debug_frame section, even if -fasynchronous-unwind-tables is 
used.  

In the ideal world, eh_frame should be extremely compact and only sufficient 
for exception handling.  debug_frame should be extremely verbose and describe 
the unwind rules at all unwind locations.

As Tamas says, there's no indication in eh_frame or debug_frame as to how much 
is described:  call-sites only (for exception handling), call-sites + prologue, 
call-sites + prologue + epilogue, or fully asynchronous.  It's a drag; if the 
DWARF committee ever has enough reason to break open the debug_frame format for 
some other changes, I'd like to get more information in there.


Anyway, point is, we're living off of eh_frame (possibly "augmented") for the 
currently-executing stack frame these days.  lldb may avoid using the assembly 
unwinder altogether in an environment where it finds eh_frame unwind 
instructions for every stack frame.

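(For anyone who wants to poke at this themselves, here is a rough sketch,
using the Python SB API, of dumping the unwind plans lldb has for a given
function.  The binary path and function name are placeholders, and depending
on the lldb version some of the plans may only be populated once a process
is live.)

    import lldb

    debugger = lldb.SBDebugger.Create()
    debugger.HandleCommand("target create ./a.out")    # placeholder binary
    result = lldb.SBCommandReturnObject()
    debugger.GetCommandInterpreter().HandleCommand(
        "image show-unwind -n main", result)           # placeholder function
    print(result.GetOutput())   # lists the eh_frame, assembly, etc. unwind plans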

(on Mac, we've switched to a format called "compact unwind" -- much like the 
ARM unwind info that Tamas recently added support for, this is an extremely 
small bit of information which describes one unwind rule for the entire 
function.  It is only applicable for exception handling; it has no way to 
describe prologues/epilogues.  compact unwind is two 4-byte words per function. 
 lldb will use compact unwind / ARM unwind info for the non-zeroth stack 
frames.  It will use its assembly instruction profiler for the 
currently-executing stack frame.)

Hope that helps.

J


> On Oct 15, 2015, at 2:56 AM, Tamas Berghammer via lldb-dev 
>  wrote:
> 
> If we are trying to unwind from a non call site (frame 0 or signal handler) 
> then the current implementation first try to use the non call site unwind 
> plan (usually assembly emulation) and if that one fails then it will fall 
> back to the call site unwind plan (eh_frame, compact unwind info, etc.) 
> instead of falling back to the architecture default unwind plan because it 
> should be a better guess in general and we usually fail with the assembly 
> emulation based unwind plan for hand written assembly functions where 
> eh_frame is usually valid at all address.
> 
> Generating asynchronous eh_frame (valid at all address) is possible with gcc 
> (I am not sure about clang) but there is no way to tell if a given eh_frame 
> inside an object file is valid at all address or only at call sites. The best 
> approximation what we can do is to say that each eh_frame entry is valid only 
> at the address what it specifies as start address but we don't make a use of 
> it in LLDB at the moment.
> 
> For the 2nd part of the original question, I think changing the eh_frame 
> based unwind plan after a failed unwind using instruction emulation is only a 
> valid option for the PC where we tried to unwind from because the assembly 
> based unwind plan could be valid at other parts of the function. Making the 
> change for that 1 concrete PC address would make sense, but have practically 
> no effect because the next time we want to unwind from the given address we 
> use the same fall back mechanism as in the first case and the change would 
> have only a very small performance gain.
> 
> Tamas
> 
> On Wed, Oct 14, 2015 at 9:36 PM Greg Clayton via lldb-dev 
>  wrote:
> 
> > On Oct 14, 2015, at 1:02 PM, Joerg Sonnenberger via lldb-dev 
> >  wrote:
> >
> > On Wed, Oct 14, 2015 at 11:42:06AM -0700, Greg Clayton via lldb-dev wrote:
> >> EH frame can't be used to unwind when we are in the first frame because
> >> it is only valid at call sites. It also can't be used in frames that
> >> are asynchronously interrupted like signal handler frames.
> >
> > This is not necessarily true, GCC can build them like that. I don't
> > think we have a flag for clang/LLVM to create full async unwind tables.
> 
> Most compilers don't generate stuff that is complete, and if it is complete, 
> I am not aware of any markings on EH frame that states it is complete. So w

Re: [lldb-dev] LLDB: Unwinding based on Assembly Instruction Profiling

2015-10-19 Thread Jason Molenda via lldb-dev

> On Oct 19, 2015, at 2:54 PM, Jason Molenda via lldb-dev 
>  wrote:

> Greg's original statement isn't correct -- about a year ago Tong Shen changed 
> lldb to use eh_frame for the currently-executing frame.  While it is true 
> that eh_frame is not guaranteed to describe the prologue/epilogue, in 
> practice eh_frame always describes the epilogue (gdb couldn't work 
> without this, with its much more simplistic unwinder).  Newer gcc's also 
> describe the epilogue.  clang does not (currently) describe the epilogue.  
> Tong's changes *augment* the eh_frame with an epilogue description if it 
> doesn't already have one.


Ahhh that paragraph was not clear.  I wrote that "in practice eh_frame 
always describes the epilogue".  I meant "always describes the prologue".

lldb needs the prologue description to step in to/step over functions 
correctly, at least at the first instruction of the function.

It's been five-six years since I worked on gdb's unwinder, but back when I 
worked on it, it didn't have multiple unwind schemes it could pick from, or the 
ability to use different unwind schemes in different contexts, or the ability 
to fall back to different unwind schemes.  That may not be true any longer, I 
don't know.  But back then it was an all-or-nothing approach, so if it was 
going to use eh_frame, it had to use it for everything.




> 
> gcc does have an "asynchronous unwind tables" option -- "asynchronous" 
> meaning the unwind rules are defined at every instruction location.  But the 
> last time I tried it, it did nothing.  They've settled on an unfortunate 
> middle ground where eh_frame (which should be compact and only describe 
> enough for exception handling) has *some* async unwind instructions.  And the 
> same unwind rules are emitted into the debug_frame section, even if 
> -fasynchronous-unwind-tables is used.  
> 
> In the ideal world, eh_frame should be extremely compact and only sufficient 
> for exception handling.  debug_frame should be extremely verbose and describe 
> the unwind rules at all unwind locations.
> 
> As Tamas says, there's no indication in eh_frame or debug_frame as to how 
> much is described:  call-sites only (for exception handling), call-sites + 
> prologue, call-sites + prologue + epilogue, or fully asynchronous.  It's a 
> drag, if the DWARF committee ever has enough reason to break open the 
> debug_frame format for some other changes, I'd like to get more information 
> in there.
> 
> 
> Anyway, point is, we're living off of eh_frame (possibly "augmented") for the 
> currently-executing stack frame these days.  lldb may avoid using the 
> assembly unwinder altogether in an environment where it finds eh_frame unwind 
> instructions for every stack frame.
> 
> 
> (on Mac, we've switched to a format called "compact unwind" -- much like the 
> ARM unwind info that Tamas recently added support for, this is an extremely 
> small bit of information which describes one unwind rule for the entire 
> function.  It is only applicable for exception handling; it has no way to 
> describe prologues/epilogues.  compact unwind is two 4-byte words per 
> function.  lldb will use compact unwind / ARM unwind info for the non-zeroth 
> stack frames.  It will use its assembly instruction profiler for the 
> currently-executing stack frame.)
> 
> Hope that helps.
> 
> J
> 
> 
>> On Oct 15, 2015, at 2:56 AM, Tamas Berghammer via lldb-dev 
>>  wrote:
>> 
>> If we are trying to unwind from a non call site (frame 0 or signal handler) 
>> then the current implementation first try to use the non call site unwind 
>> plan (usually assembly emulation) and if that one fails then it will fall 
>> back to the call site unwind plan (eh_frame, compact unwind info, etc.) 
>> instead of falling back to the architecture default unwind plan because it 
>> should be a better guess in general and we usually fail with the assembly 
>> emulation based unwind plan for hand written assembly functions where 
>> eh_frame is usually valid at all address.
>> 
>> Generating asynchronous eh_frame (valid at all address) is possible with gcc 
>> (I am not sure about clang) but there is no way to tell if a given eh_frame 
>> inside an object file is valid at all address or only at call sites. The 
>> best approximation what we can do is to say that each eh_frame entry is 
>> valid only at the address what it specifies as start address but we don't 
>> make a use of it in LLDB at the moment.
>> 
>> For the 2nd part of the original question, I think changing the eh_frame 
>> based unwind plan after a failed unwind using instruction emulation is only 
>> a valid option for the PC where we tried to unwind from because the assembly 
>> based unwind plan could be valid at other parts of the function. Making the 
>> change for that 1 concrete PC address would make sense, but have practically 
>> no effect because the next time we want to unwind from the given address we 
>> use the same fall back mechanism 

Re: [lldb-dev] proposal for reworked flaky test category

2015-10-19 Thread Todd Fiala via lldb-dev
Okay, so I'm not a fan of the flaky tests myself, nor of test suites taking
longer to run than needed.

Enrico is going to add a new 'flakey' category to the test categorization.

Scratch all the other complexity I offered up.  What we're going to ask is
that if a test is flakey, please add it to the 'flakey' category.  We won't
do anything different with the category by default, so everyone will still
get flakey tests running in the same manner they do now.  However, on our
test runners, we will be disabling the category entirely using the
skipCategories mechanism, since those tests are generating too much noise.

We may need to add a per-test-method category mechanism, since right now
our only mechanisms for adding categories are to (1) place a dot-file in a
directory to have everything in it tagged with a category, or (2) override
the categorization via the TestCase getCategories() method.

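As a sketch of option (2), and assuming the test-suite's lldbtest module is
importable, per-class tagging could look roughly like this (illustrative
only, not a complete test case):

    from lldbtest import TestBase

    class RestartBugTestCase(TestBase):    # hypothetical test case
        def getCategories(self):
            # Runners that put 'flakey' in skipCategories will then skip
            # every test method in this class.
            return ["flakey"]

        def test_restart_bug(self):
            pass  # test body elided
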
-Todd

On Mon, Oct 19, 2015 at 1:03 PM, Zachary Turner  wrote:

>
>
> On Mon, Oct 19, 2015 at 12:50 PM Todd Fiala via lldb-dev <
> lldb-dev@lists.llvm.org> wrote:
>
>> Hi all,
>>
>> I'd like unexpected successes (i.e. tests marked as unexpected failure
>> that in fact pass) to retain the actionable meaning that something is
>> wrong.  The wrong part is that either (1) the test now passes consistently
>> and the author of the fix just missed updating the test definition (or
>> perhaps was unaware of the test), or (2) the test is not covering the
>> condition it is testing completely, and some change to the code just
>> happened to make the test pass (due to the test being not comprehensive
>> enough).  Either of those requires some sort of adjustment by the
>> developers.
>>
> I'dd add #3.  The test is actually flaky but is tagged incorrectly.
>
>
>>
>> We have a category of test known as "flaky" or "flakey" (both are valid
>> spellings, for those who care:
>> http://www.merriam-webster.com/dictionary/flaky, although flaky is
>> considered the primary).  Flaky tests are tests that we can't get to pass
>> 100% of the time.  This might be because it is extremely difficult to write
>> the test as such and deemed not worth the effort, or it is a condition that
>> is just not going to present itself successfully 100% of the time.
>>
> IMO if it's not worth the effort to write the test correctly, we should
> delete the test.  Flaky is useful as a temporary status, but if nobody ends
> up fixing the flakiness, I think the test should be deleted (more reasons
> follow).
>
>
>
>> These are tests we still want to exercise, but we don't want to have them
>> start generating test failures if they don't pass 100% of the time.
>> Currently the flaky test mechanism requires a test to pass one in two
>> times.  That is okay for a test that exhibits a slim degree of flakiness.
>> For others, that is not a large enough sample of runs to elicit a
>> successful result.  Those tests get marked as XFAIL, and generate a
>> non-actionable "unexpected success" result when they do happen to pass.
>>
>> GOAL
>>
>> * Enhance expectedFlakey* test decorators.  Allow specification of the
>> number of times in which a flaky test should be run to be expected to pass
>> at least once.  Call that MAX_RUNS.
>>
> I think it's worth considering it it's a good idea include the date at
> which they were declared flakey.  After a certain amount of time has
> passed, if it's still flakey they can be relegated to hard failures.  I
> don't think flakey should be a permanent state.
>
>
>>
>> * When running a flaky test, run it up MAX_RUNS number of times.  The
>> first time it passes, mark it as a successful test completion.  The test
>> event system will be given the number of times it was run before passing.
>> Whether we consume this info or not is TBD (and falls into the purview of
>> the test results formatter).
>>
>
>> * If the test does not pass within MAX_RUNS time, mark it as a flaky
>> fail.  For purposes of the standard output, this can look like FAIL:
>> (flaky) or something similar so fail scanners still see it.  (Note it's
>> highly likely I'll do the normal output counts with the TestResults
>> formatter-based output at the same time, so we get accurate test method
>> counts and the like).
>>
> The concern I have here (and the reason I would like to delete flakey
> tests if the flakiness isn't removed after  certain amount of time) is
> because some of our tests are slow.  Repeating them many times is going to
> have an impact on how long the test suite takes to run.  It's already
> tripled over the past 3 weeks, and I think we need to be careful to keep
> out things that have the potential to lead to significant slowness of the
> test suite runner.
>
>



-- 
-Todd
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


Re: [lldb-dev] proposal for reworked flaky test category

2015-10-19 Thread Zachary Turner via lldb-dev
Don't get me wrong, I like the idea of running flakey tests a couple of
times and seeing if one passes (Chromium does this as well, so it's not
without precedent).  If I sounded harsh, it's because I *want* to be harsh
on flaky tests.  Flaky tests indicate literally the *worst* kind of bugs
because you don't even know what kind of problems they're causing in the
wild, so by increasing the amount of pain they cause people (test suite
running longer, etc) the hope is that it will motivate someone to fix it.

On Mon, Oct 19, 2015 at 4:04 PM Todd Fiala  wrote:

> Okay, so I'm not a fan of the flaky tests myself, nor of test suites
> taking longer to run than needed.
>
> Enrico is going to add a new 'flakey' category to the test categorization.
>
> Scratch all the other complexity I offered up.  What we're going to ask is
> if a test is flakey, please add it to the 'flakey' category.  We won't do
> anything different with the category by default, so everyone will still get
> flakey tests running the same manner they do now.  However, on our test
> runners, we will be disabling the category entirely using the
> skipCategories mechanism since those are generating too much noise.
>
> We may need to add a per-test-method category mechanism since right now
> our only mechanism to add categories (1) specify a dot-file to the
> directory to have everything in it get tagged with a category, or (2)
> override the categorization for the TestCase getCategories() mechanism.
>
> -Todd
>
> On Mon, Oct 19, 2015 at 1:03 PM, Zachary Turner 
> wrote:
>
>>
>>
>> On Mon, Oct 19, 2015 at 12:50 PM Todd Fiala via lldb-dev <
>> lldb-dev@lists.llvm.org> wrote:
>>
>>> Hi all,
>>>
>>> I'd like unexpected successes (i.e. tests marked as unexpected failure
>>> that in fact pass) to retain the actionable meaning that something is
>>> wrong.  The wrong part is that either (1) the test now passes consistently
>>> and the author of the fix just missed updating the test definition (or
>>> perhaps was unaware of the test), or (2) the test is not covering the
>>> condition it is testing completely, and some change to the code just
>>> happened to make the test pass (due to the test being not comprehensive
>>> enough).  Either of those requires some sort of adjustment by the
>>> developers.
>>>
>> I'dd add #3.  The test is actually flaky but is tagged incorrectly.
>>
>>
>>>
>>> We have a category of test known as "flaky" or "flakey" (both are valid
>>> spellings, for those who care:
>>> http://www.merriam-webster.com/dictionary/flaky, although flaky is
>>> considered the primary).  Flaky tests are tests that we can't get to pass
>>> 100% of the time.  This might be because it is extremely difficult to write
>>> the test as such and deemed not worth the effort, or it is a condition that
>>> is just not going to present itself successfully 100% of the time.
>>>
>> IMO if it's not worth the effort to write the test correctly, we should
>> delete the test.  Flaky is useful as a temporary status, but if nobody ends
>> up fixing the flakiness, I think the test should be deleted (more reasons
>> follow).
>>
>>
>>
>>> These are tests we still want to exercise, but we don't want to have
>>> them start generating test failures if they don't pass 100% of the time.
>>> Currently the flaky test mechanism requires a test to pass one in two
>>> times.  That is okay for a test that exhibits a slim degree of flakiness.
>>> For others, that is not a large enough sample of runs to elicit a
>>> successful result.  Those tests get marked as XFAIL, and generate a
>>> non-actionable "unexpected success" result when they do happen to pass.
>>>
>>> GOAL
>>>
>>> * Enhance expectedFlakey* test decorators.  Allow specification of the
>>> number of times in which a flaky test should be run to be expected to pass
>>> at least once.  Call that MAX_RUNS.
>>>
>> I think it's worth considering whether it's a good idea to include the date
>> at which they were declared flakey.  After a certain amount of time has
>> passed, if it's still flakey they can be relegated to hard failures.  I
>> don't think flakey should be a permanent state.
>>
>>
>>>
>>> * When running a flaky test, run it up to MAX_RUNS times.  The
>>> first time it passes, mark it as a successful test completion.  The test
>>> event system will be given the number of times it was run before passing.
>>> Whether we consume this info or not is TBD (and falls into the purview of
>>> the test results formatter).
>>>
>>
>>> * If the test does not pass within MAX_RUNS tries, mark it as a flaky
>>> fail.  For purposes of the standard output, this can look like FAIL:
>>> (flaky) or something similar so fail scanners still see it.  (Note it's
>>> highly likely I'll do the normal output counts with the TestResults
>>> formatter-based output at the same time, so we get accurate test method
>>> counts and the like).
>>>
>> The concern I have here (and the reason I would like to delete flakey
>> tests if the flakiness isn't remo

Re: [lldb-dev] proposal for reworked flaky test category

2015-10-19 Thread Todd Fiala via lldb-dev
Nope, I have no issue with what you said.  We don't want to run them over
here at all because we don't see enough useful info come out of them.  You
need time series data for that to be somewhat useful, and even then it only
is useful if you see a sharp change in it after a specific change.

So I really don't want to be running flaky tests at all as their signals
are not useful on a per-run basis.

On Mon, Oct 19, 2015 at 4:16 PM, Zachary Turner  wrote:

> Don't get me wrong, I like the idea of running flakey tests a couple of
> times and seeing if one passes (Chromium does this as well, so it's not
> without precedent).  If I sounded harsh, it's because I *want* to be harsh
> on flaky tests.  Flaky tests indicate literally the *worst* kind of bugs
> because you don't even know what kind of problems they're causing in the
> wild, so by increasing the amount of pain they cause people (test suite
> running longer, etc) the hope is that it will motivate someone to fix it.
>
> On Mon, Oct 19, 2015 at 4:04 PM Todd Fiala  wrote:
>
>> Okay, so I'm not a fan of the flaky tests myself, nor of test suites
>> taking longer to run than needed.
>>
>> Enrico is going to add a new 'flakey' category to the test categorization.
>>
>> Scratch all the other complexity I offered up.  What we're going to ask
>> is if a test is flakey, please add it to the 'flakey' category.  We won't
>> do anything different with the category by default, so everyone will still
>> get flakey tests running the same manner they do now.  However, on our test
>> runners, we will be disabling the category entirely using the
>> skipCategories mechanism since those are generating too much noise.
>>
>> We may need to add a per-test-method category mechanism, since right now
>> our only mechanisms for adding categories are to (1) place a dot-file in a
>> directory so that everything in it gets tagged with a category, or (2)
>> override the categorization via the TestCase getCategories() mechanism.
>>
>> -Todd
>>
>> On Mon, Oct 19, 2015 at 1:03 PM, Zachary Turner 
>> wrote:
>>
>>>
>>>
>>> On Mon, Oct 19, 2015 at 12:50 PM Todd Fiala via lldb-dev <
>>> lldb-dev@lists.llvm.org> wrote:
>>>
 Hi all,

 I'd like unexpected successes (i.e. tests marked as unexpected failure
 that in fact pass) to retain the actionable meaning that something is
 wrong.  The wrong part is that either (1) the test now passes consistently
 and the author of the fix just missed updating the test definition (or
 perhaps was unaware of the test), or (2) the test is not covering the
 condition it is testing completely, and some change to the code just
 happened to make the test pass (due to the test being not comprehensive
 enough).  Either of those requires some sort of adjustment by the
 developers.

>>> I'd add #3.  The test is actually flaky but is tagged incorrectly.
>>>
>>>

 We have a category of test known as "flaky" or "flakey" (both are valid
 spellings, for those who care:
 http://www.merriam-webster.com/dictionary/flaky, although flaky is
 considered the primary).  Flaky tests are tests that we can't get to pass
 100% of the time.  This might be because it is extremely difficult to write
 the test as such and deemed not worth the effort, or it is a condition that
 is just not going to present itself successfully 100% of the time.

>>> IMO if it's not worth the effort to write the test correctly, we should
>>> delete the test.  Flaky is useful as a temporary status, but if nobody ends
>>> up fixing the flakiness, I think the test should be deleted (more reasons
>>> follow).
>>>
>>>
>>>
 These are tests we still want to exercise, but we don't want to have
 them start generating test failures if they don't pass 100% of the time.
 Currently the flaky test mechanism requires a test to pass one in two
 times.  That is okay for a test that exhibits a slim degree of flakiness.
 For others, that is not a large enough sample of runs to elicit a
 successful result.  Those tests get marked as XFAIL, and generate a
 non-actionable "unexpected success" result when they do happen to pass.

 GOAL

 * Enhance expectedFlakey* test decorators.  Allow specification of the
 number of times in which a flaky test should be run to be expected to pass
 at least once.  Call that MAX_RUNS.

>>> I think it's worth considering whether it's a good idea to include the date
>>> at which they were declared flakey.  After a certain amount of time has
>>> passed, if it's still flakey they can be relegated to hard failures.  I
>>> don't think flakey should be a permanent state.
>>>
>>>

 * When running a flaky test, run it up to MAX_RUNS times.  The
 first time it passes, mark it as a successful test completion.  The test
 event system will be given the number of times it was run before passing.
 Whether we consume this info or not is TBD (and falls into the 

Re: [lldb-dev] proposal for reworked flaky test category

2015-10-19 Thread Todd Fiala via lldb-dev
My initial proposal was an attempt to not entirely skip running them on our
end and still get them to generate actionable signals without conflating
them with unexpected successes (which they absolutely are not in a semantic
way).
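
For reference, a minimal sketch of the MAX_RUNS scheme from that original
proposal (quoted further down in this message); this is not the actual lldb
decorator, and the name and generic exception handling are illustrative:

    import functools

    def expectedFlakey(max_runs=2):
        """Re-run a flaky test up to max_runs times; the first pass wins."""
        def decorator(test_method):
            @functools.wraps(test_method)
            def wrapper(self, *args, **kwargs):
                last_failure = None
                for attempt in range(1, max_runs + 1):
                    try:
                        result = test_method(self, *args, **kwargs)
                        # First pass wins; 'attempt' could be fed to the results formatter.
                        return result
                    except Exception as failure:  # in practice, the framework's failure exception
                        last_failure = failure
                # Never passed within max_runs attempts: surface as a (flaky) failure.
                raise last_failure
            return wrapper
        return decorator
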

On Mon, Oct 19, 2015 at 4:33 PM, Todd Fiala  wrote:

> Nope, I have no issue with what you said.  We don't want to run them over
> here at all because we don't see enough useful info come out of them.  You
> need time series data for that to be somewhat useful, and even then it only
> is useful if you see a sharp change in it after a specific change.
>
> So I really don't want to be running flaky tests at all as their signals
> are not useful on a per-run basis.
>
> On Mon, Oct 19, 2015 at 4:16 PM, Zachary Turner 
> wrote:
>
>> Don't get me wrong, I like the idea of running flakey tests a couple of
>> times and seeing if one passes (Chromium does this as well, so it's not
>> without precedent).  If I sounded harsh, it's because I *want* to be harsh
>> on flaky tests.  Flaky tests indicate literally the *worst* kind of bugs
>> because you don't even know what kind of problems they're causing in the
>> wild, so by increasing the amount of pain they cause people (test suite
>> running longer, etc) the hope is that it will motivate someone to fix it.
>>
>> On Mon, Oct 19, 2015 at 4:04 PM Todd Fiala  wrote:
>>
>>> Okay, so I'm not a fan of the flaky tests myself, nor of test suites
>>> taking longer to run than needed.
>>>
>>> Enrico is going to add a new 'flakey' category to the test
>>> categorization.
>>>
>>> Scratch all the other complexity I offered up.  What we're going to ask
>>> is if a test is flakey, please add it to the 'flakey' category.  We won't
>>> do anything different with the category by default, so everyone will still
>>> get flakey tests running the same manner they do now.  However, on our test
>>> runners, we will be disabling the category entirely using the
>>> skipCategories mechanism since those are generating too much noise.
>>>
>>> We may need to add a per-test-method category mechanism, since right now
>>> our only mechanisms for adding categories are to (1) place a dot-file in a
>>> directory so that everything in it gets tagged with a category, or (2)
>>> override the categorization via the TestCase getCategories() mechanism.
>>>
>>> -Todd
>>>
>>> On Mon, Oct 19, 2015 at 1:03 PM, Zachary Turner 
>>> wrote:
>>>


 On Mon, Oct 19, 2015 at 12:50 PM Todd Fiala via lldb-dev <
 lldb-dev@lists.llvm.org> wrote:

> Hi all,
>
> I'd like unexpected successes (i.e. tests marked as unexpected failure
> that in fact pass) to retain the actionable meaning that something is
> wrong.  The wrong part is that either (1) the test now passes consistently
> and the author of the fix just missed updating the test definition (or
> perhaps was unaware of the test), or (2) the test is not covering the
> condition it is testing completely, and some change to the code just
> happened to make the test pass (due to the test being not comprehensive
> enough).  Either of those requires some sort of adjustment by the
> developers.
>
 I'd add #3.  The test is actually flaky but is tagged incorrectly.


>
> We have a category of test known as "flaky" or "flakey" (both are
> valid spellings, for those who care:
> http://www.merriam-webster.com/dictionary/flaky, although flaky is
> considered the primary).  Flaky tests are tests that we can't get to pass
> 100% of the time.  This might be because it is extremely difficult to 
> write
> the test as such and deemed not worth the effort, or it is a condition 
> that
> is just not going to present itself successfully 100% of the time.
>
 IMO if it's not worth the effort to write the test correctly, we should
 delete the test.  Flaky is useful as a temporary status, but if nobody ends
 up fixing the flakiness, I think the test should be deleted (more reasons
 follow).



> These are tests we still want to exercise, but we don't want to have
> them start generating test failures if they don't pass 100% of the time.
> Currently the flaky test mechanism requires a test to pass one in two
> times.  That is okay for a test that exhibits a slim degree of flakiness.
> For others, that is not a large enough sample of runs to elicit a
> successful result.  Those tests get marked as XFAIL, and generate a
> non-actionable "unexpected success" result when they do happen to pass.
>
> GOAL
>
> * Enhance expectedFlakey* test decorators.  Allow specification of the
> number of times in which a flaky test should be run to be expected to pass
> at least once.  Call that MAX_RUNS.
>
 I think it's worth considering whether it's a good idea to include the date
 at which they were declared flakey.  After a certain amount of time has
 passed, if it's still flakey t

Re: [lldb-dev] proposal for reworked flaky test category

2015-10-19 Thread Zachary Turner via lldb-dev
Yea, I definitely agree with you there.

Is this going to end up with an @expectedFlakeyWindows,
@expectedFlakeyLinux, @expectedFlakeyDarwin, @expectedFlakeyAndroid,
@expectedFlakeyFreeBSD?

It's starting to get a little crazy; at some point I think we just need
something that we can use like this:

@test_status(status=flaky, host=[win, linux, android, darwin, bsd],
target=[win, linux, android, darwin, bsd], compiler=[gcc, clang],
debug_info=[dsym, dwarf, dwo])
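
A rough sketch of what such a combined decorator could look like, purely to
make the shape of the idea concrete; none of this is existing lldb test-suite
API, and the parameter names simply mirror the example above:

    def test_status(status, host=None, target=None, compiler=None, debug_info=None):
        """Attach a status plus host/target/compiler/debug-info constraints to a test."""
        def decorator(test_method):
            test_method.__lldb_test_status__ = {
                'status': status,
                'host': host or [],
                'target': target or [],
                'compiler': compiler or [],
                'debug_info': debug_info or [],
            }
            return test_method
        return decorator

    # Usage, mirroring the example above (values as strings for illustration):
    # @test_status(status='flaky', host=['linux'], compiler=['clang'], debug_info=['dwo'])
    # def test_restart_bug(self):
    #     ...

The runner would then consult the attached metadata to decide how a result on
the current host/target/compiler/debug-info combination should be counted.
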

On Mon, Oct 19, 2015 at 4:35 PM Todd Fiala  wrote:

> My initial proposal was an attempt to not entirely skip running them on
> our end and still get them to generate actionable signals without
> conflating them with unexpected successes (which they absolutely are not in
> a semantic way).
>
> On Mon, Oct 19, 2015 at 4:33 PM, Todd Fiala  wrote:
>
>> Nope, I have no issue with what you said.  We don't want to run them over
>> here at all because we don't see enough useful info come out of them.  You
>> need time series data for that to be somewhat useful, and even then it only
>> is useful if you see a sharp change in it after a specific change.
>>
>> So I really don't want to be running flaky tests at all as their signals
>> are not useful on a per-run basis.
>>
>> On Mon, Oct 19, 2015 at 4:16 PM, Zachary Turner 
>> wrote:
>>
>>> Don't get me wrong, I like the idea of running flakey tests a couple of
>>> times and seeing if one passes (Chromium does this as well, so it's not
>>> without precedent).  If I sounded harsh, it's because I *want* to be harsh
>>> on flaky tests.  Flaky tests indicate literally the *worst* kind of bugs
>>> because you don't even know what kind of problems they're causing in the
>>> wild, so by increasing the amount of pain they cause people (test suite
>>> running longer, etc) the hope is that it will motivate someone to fix it.
>>>
>>> On Mon, Oct 19, 2015 at 4:04 PM Todd Fiala  wrote:
>>>
 Okay, so I'm not a fan of the flaky tests myself, nor of test suites
 taking longer to run than needed.

 Enrico is going to add a new 'flakey' category to the test
 categorization.

 Scratch all the other complexity I offered up.  What we're going to ask
 is if a test is flakey, please add it to the 'flakey' category.  We won't
 do anything different with the category by default, so everyone will still
 get flakey tests running the same manner they do now.  However, on our test
 runners, we will be disabling the category entirely using the
 skipCategories mechanism since those are generating too much noise.

 We may need to add a per-test-method category mechanism, since right now
 our only mechanisms for adding categories are to (1) place a dot-file in a
 directory so that everything in it gets tagged with a category, or (2)
 override the categorization via the TestCase getCategories() mechanism.

 -Todd

 On Mon, Oct 19, 2015 at 1:03 PM, Zachary Turner 
 wrote:

>
>
> On Mon, Oct 19, 2015 at 12:50 PM Todd Fiala via lldb-dev <
> lldb-dev@lists.llvm.org> wrote:
>
>> Hi all,
>>
>> I'd like unexpected successes (i.e. tests marked as unexpected
>> failure that in fact pass) to retain the actionable meaning that 
>> something
>> is wrong.  The wrong part is that either (1) the test now passes
>> consistently and the author of the fix just missed updating the test
>> definition (or perhaps was unaware of the test), or (2) the test is not
>> covering the condition it is testing completely, and some change to the
>> code just happened to make the test pass (due to the test being not
>> comprehensive enough).  Either of those requires some sort of adjustment 
>> by
>> the developers.
>>
> I'd add #3.  The test is actually flaky but is tagged incorrectly.
>
>
>>
>> We have a category of test known as "flaky" or "flakey" (both are
>> valid spellings, for those who care:
>> http://www.merriam-webster.com/dictionary/flaky, although flaky is
>> considered the primary).  Flaky tests are tests that we can't get to pass
>> 100% of the time.  This might be because it is extremely difficult to 
>> write
>> the test as such and deemed not worth the effort, or it is a condition 
>> that
>> is just not going to present itself successfully 100% of the time.
>>
> IMO if it's not worth the effort to write the test correctly, we
> should delete the test.  Flaky is useful as a temporary status, but if
> nobody ends up fixing the flakiness, I think the test should be deleted
> (more reasons follow).
>
>
>
>> These are tests we still want to exercise, but we don't want to have
>> them start generating test failures if they don't pass 100% of the time.
>> Currently the flaky test mechanism requires a test to pass one in two
>> times.  That is okay for a test that exhibits a slim degree of flakiness.
>

[lldb-dev] llvm assertion while evaluating expressions for MIPS on Linux

2015-10-19 Thread Bhushan Attarde via lldb-dev
Hi,

I am facing an issue (an llvm assertion) when evaluating expressions for MIPS on Linux.

(lldb) p fooptr(a,b)
lldb: /home/battarde/git/llvm/lib/MC/ELFObjectWriter.cpp:791: void 
{anonymous}::ELFObjectWriter::computeSymbolTable(llvm::MCAssembler&, const 
llvm::MCAsmLayout&, const SectionIndexMapTy&, const RevGroupMapTy&, 
{anonymous}::ELFObjectWriter::SectionOffsetsTy&): Assertion `Local || 
!Symbol.isTemporary()' failed.

I debugged it and found that LLDB inserts calls to a dynamic checker function
for pointer validation at appropriate locations in the expression's IR.

The issue is that this checker function's name (hard-coded in LLDB in
lldb\source\Expression\IRDynamicChecks.cpp) starts with "$", i.e.
"$__lldb_valid_pointer_check".
While creating an MCSymbol for this function (MCContext::createSymbol() in
llvm/lib/MC/MCContext.cpp), llvm detects that the name starts with "$" and
marks the symbol as 'temporary' (PrivateGlobalPrefix is '$' for MIPS).
Later, while computing the symbol table in
ELFObjectWriter::computeSymbolTable(), the assertion triggers because this
symbol is 'temporary'.

I tried a couple of things that solve this issue for MIPS.

1. Remove '$' from the function name.
2. Remove "C Language linkage" from the dynamic pointer validation function i.e 
the below piece of code in lldb\source\Expression\IRDynamicChecks.cpp
-
static const char g_valid_pointer_check_text[] =
"extern \"C\" void\n"
"$__lldb_valid_pointer_check (unsigned char *$__lldb_arg_ptr)\n"
"{\n"
"unsigned char $__lldb_local_val = *$__lldb_arg_ptr;\n"
"}";
--

becomes


static const char g_valid_pointer_check_text[] =
"void\n"
"$__lldb_valid_pointer_check (unsigned char *$__lldb_arg_ptr)\n"
"{\n"
"unsigned char $__lldb_local_val = *$__lldb_arg_ptr;\n"
"}";


Removing the C language linkage will enable mangling, which will mangle
"$__lldb_valid_pointer_check" to something like
"_Z27$__lldb_valid_pointer_checkPh".
The mangled name then won't start with '$', so the symbol will not be marked
as temporary and the assertion won't be triggered.
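
To make that concrete, here is a tiny model of the prefix check described
above (the real logic is C++ inside llvm/lib/MC/MCContext.cpp; this is only an
illustration of its effect):

    # PrivateGlobalPrefix is '$' on MIPS, so the unmangled checker name is
    # classified as an assembler-temporary symbol, while the mangled name is not.
    def looks_like_temporary(name, private_global_prefix='$'):
        return name.startswith(private_global_prefix)

    print(looks_like_temporary('$__lldb_valid_pointer_check'))        # True  -> 'temporary'
    print(looks_like_temporary('_Z27$__lldb_valid_pointer_checkPh'))  # False -> regular symbol
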

Please let me know if there is any better solution to this issue.

Regards,
Bhushan
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev