Re: [lldb-dev] Python object lifetimes affect the reliability of tests

Adrian McCarthy via lldb-dev Thu, 15 Oct 2015 15:45:09 -0700

Thanks everyone.

To close the loop, we've address this problem with this patch:
http://reviews.llvm.org/rL250467




On Thu, Oct 15, 2015 at 11:49 AM, Zachary Turner <ztur...@google.com> wrote:

> it doesn't appear to be a bug in my implementation.  So it seems like
> everyone has been experiencing this problem, but Windows it just made a big
> difference because of the mandatory file locks.  From the Python
> dcoumentation <https://docs.python.org/2/library/sys.html>,
>
> This function returns a tuple of three values that give information about
> the exception that is currently being handled. The information returned is
> specific both to the current thread and to the current stack frame. If the
> current stack frame is not handling an exception, the information is taken
> from the calling stack frame, or its caller, and so on until a stack frame
> is found that is handling an exception. *Here, “handling an exception” is
> defined as “executing or having executed an except clause.”* For any
> stack frame, only information about the most recently handled exception is
> accessible.
>
> So within a given stack frame, if an exception has been handled before
> you, then every frame and every variable reachable from that frame still
> has a strong reference to it.
>
> So it sounds like our solution of moving this exception handler into a
> function or calling sys.exc_clear() is the right fix.
>
> On Thu, Oct 15, 2015 at 11:43 AM Zachary Turner <ztur...@google.com>
> wrote:
>
>> It's not that simple.  A test method could be holding onto arbitrary
>> resources which could in theory prevent cleanup.  tests can register their
>> own cleanup handlers for example, and the teardown will call back into the
>> cleanup handler.  So one of those handlers could be making the assumption
>> that it can clean something up even though it can't because the test method
>> is still holding onto some resources.
>>
>> I want to find out if this is a bug in my Python implementation, because
>> it seems quite strange to me that sys.exc_info(), which is documented to
>> return information about the exception *currently being processed*, still
>> returns something after it is finished being processed.
>>
>> On Thu, Oct 15, 2015 at 11:36 AM Ryan Brown <rib...@google.com> wrote:
>>
>>> Couldn't we just change DeleteTarget to make sure everything is unmapped?
>>>
>>> On Thu, Oct 15, 2015 at 11:34 AM Zachary Turner via lldb-dev <
>>> lldb-dev@lists.llvm.org> wrote:
>>>
>>>> To add more evidence for this, here's a small repro:
>>>>
>>>> import sys
>>>>
>>>> print "sys.exc_info() = ", "Empty" if sys.exc_info() == (None, None,
>>>> None) else "Valid"
>>>> try:
>>>>     raise Exception
>>>> except Exception, e:
>>>>     print "sys.exc_info() = ", "Empty" if sys.exc_info() == (None,
>>>> None, None) else "Valid"
>>>>     pass
>>>>
>>>> print "sys.exc_info() = ", "Empty" if sys.exc_info() == (None, None,
>>>> None) else "Valid"
>>>> print "e = ", "Bound" if 'e' in vars() else "Unbound"
>>>> pass
>>>>
>>>> For me this prints
>>>> sys.exc_info() =  Empty
>>>> sys.exc_info() =  Valid
>>>> sys.exc_info() =  Valid
>>>> e =  Bound
>>>>
>>>> On Thu, Oct 15, 2015 at 11:21 AM Zachary Turner <ztur...@google.com>
>>>> wrote:
>>>>
>>>>> We actually do already to the self.dbg.DeleteTarget(target), and
>>>>> that's the line that's failing.  The reason it's failing is because the
>>>>> 'sc' reference is still alive, which is holding an mmap, which causes a
>>>>> mandatory file lock on Windows.
>>>>>
>>>>> The diagnostics went pretty deep into python internals, but I think we
>>>>> might have figured it out.  I don't know if this is a bug in Python, but I
>>>>> think we'd probably need to ask Guido to be sure :)
>>>>>
>>>>> As far as we can tell, what happens is that on the exceptional
>>>>> codepath (e.g the assert fails), you walk back up the stack until you get
>>>>> to the except handler.  This exception handler is in TestCase.run().  
>>>>> After
>>>>> it handles the exception it goes and runs teardown.  However, for some
>>>>> reason, Python is still holding a strong reference to the *traceback*, 
>>>>> even
>>>>> though we're completely out of the finally block.  What this means is that
>>>>> if you call `sys.exc_info()` *even after you've exited the finally block,
>>>>> it still returns info about the previous exception that's not even being
>>>>> handled anymore.  I would have expected this to be gone since there's no
>>>>> exception in-fligth anymore.  So basically, Python is still holding a
>>>>> reference to the active exception, the exception holds the stack frame, 
>>>>> the
>>>>> stack frame holds the test method, the test method has locals, one of 
>>>>> which
>>>>> is a SymbolList, a member of which is symbol context, which has the file
>>>>> locked.
>>>>>
>>>>> Our best guess is that if you have something like this:
>>>>>
>>>>> def foo():
>>>>>     try:
>>>>>        # Do stuff
>>>>>     except Exception, e:
>>>>>        pass
>>>>>     # Do more stuff
>>>>>
>>>>> that if the exceptional path is executed, then both e and
>>>>> sys.exc_info() are alive *while* do more stuff is happening.  We've found
>>>>> two ways to fixthis:
>>>>>
>>>>> 1) Change to this:
>>>>> def foo():
>>>>>     try:
>>>>>        # Do stuff
>>>>>     except Exception, e:
>>>>>        pass
>>>>>     del e
>>>>>     sys.exc_clear()
>>>>>     # Do more stuff
>>>>>
>>>>> 2) Put the try / except inside a function.  When the function returns,
>>>>> sys.exc_info() is cleared.
>>>>>
>>>>> I like 2 better, but we're still testing some more to make sure this
>>>>> really fixes it 100% of the time.
>>>>>
>>>>> On Thu, Oct 15, 2015 at 10:25 AM Greg Clayton via lldb-dev <
>>>>> lldb-dev@lists.llvm.org> wrote:
>>>>>
>>>>>>
>>>>>> > On Oct 15, 2015, at 8:50 AM, Adrian McCarthy via lldb-dev <
>>>>>> lldb-dev@lists.llvm.org> wrote:
>>>>>> >
>>>>>> > I've tracked down a source of flakiness in tests on Windows to
>>>>>> Python object lifetimes and the SB interface, and I'm wondering how best 
>>>>>> to
>>>>>> handle it.
>>>>>> >
>>>>>> > Consider this portion of a test from TestTargetAPI:
>>>>>> >
>>>>>> >  def find_functions(self, exe_name):
>>>>>> >      """Exercise SBTaget.FindFunctions() API."""
>>>>>> >      exe = os.path.join(os.getcwd(), exe_name)
>>>>>> >
>>>>>> >      # Create a target by the debugger.
>>>>>> >      target = self.dbg.CreateTarget(exe)
>>>>>> >      self.assertTrue(target, VALID_TARGET)
>>>>>> >      list = target.FindFunctions('c', lldb.eFunctionNameTypeAuto)
>>>>>> >      self.assertTrue(list.GetSize() == 1)
>>>>>> >
>>>>>> >      for sc in list:
>>>>>> >          self.assertTrue(sc.GetModule().GetFileSpec().GetFilename()
>>>>>> == exe_name)
>>>>>> >          self.assertTrue(sc.GetSymbol().GetName() == 'c')
>>>>>> >
>>>>>> > The local variables go out of scope when the function exits, but
>>>>>> the SB (C++) objects they represent aren't (always) immediately 
>>>>>> destroyed.
>>>>>> At least some of these objects keep references to the executable module 
>>>>>> in
>>>>>> the shared module list, so when the test framework cleans up and calls
>>>>>> `SBDebugger::DeleteTarget`, the module isn't orphaned, so LLDB maintains 
>>>>>> an
>>>>>> open handle to the executable.
>>>>>>
>>>>>> Creating a target with:
>>>>>>
>>>>>>         target = self.dbg.CreateTarget(exe)
>>>>>>
>>>>>> Will give you a SBTarget object that has a strong reference to the
>>>>>> target, but the debugger still has a copy in its target list, so the
>>>>>> SBTarget isn't designed to delete the object when the target variable 
>>>>>> goes
>>>>>> out of scope. If you want the target to be deleted, you actually have to
>>>>>> call through to the debugger with:
>>>>>>
>>>>>>
>>>>>>  bool
>>>>>>  SBDebugger:DeleteTarget (lldb::SBTarget &target);
>>>>>>
>>>>>>
>>>>>> So the right way to clean up the target is:
>>>>>>
>>>>>>  self.dbg.DeleteTarget(target);
>>>>>>
>>>>>> Even though there might be code within LLDB that has a valid shared
>>>>>> pointer to the lldb_private::Target still, it calls
>>>>>> lldb_private::Target::Destroy() which clears out most instance variable
>>>>>> (the module list, the process, any plug-ins, etc).
>>>>>>
>>>>>> SBTarget objects have strong references so that they _can_ keep the
>>>>>> object alive if needed in case someone else destroys the target on 
>>>>>> another
>>>>>> thread, but they don't control the lifetime of the target.
>>>>>>
>>>>>> Other objects have weak references to the objects: SBProcess,
>>>>>> SBThread, SBFrame. If the objects are actually destroyed already, the 
>>>>>> weak
>>>>>> pointer won't be able to get a valid shared pointer to the underlying 
>>>>>> object
>>>>>> and any SB API calls on these objects will return error, none, zero,
>>>>>> etc...
>>>>>>
>>>>>> >
>>>>>> > The result of the lingering handle is that, when the next test case
>>>>>> in the test suite tries to re-build the executable, it fails because the
>>>>>> file is not writable.  (This is problematic on Windows because the file
>>>>>> system works differently in this regard than Unix derivatives.)  Every
>>>>>> subsequent case in the test suite fails.
>>>>>> >
>>>>>> > I managed to make the test work reliably by rewriting it like this:
>>>>>> >
>>>>>> >  def find_functions(self, exe_name):
>>>>>> >      """Exercise SBTaget.FindFunctions() API."""
>>>>>> >      exe = os.path.join(os.getcwd(), exe_name)
>>>>>> >
>>>>>> >      # Create a target by the debugger.
>>>>>> >      target = self.dbg.CreateTarget(exe)
>>>>>> >      self.assertTrue(target, VALID_TARGET)
>>>>>> >
>>>>>> >      try:
>>>>>> >          list = target.FindFunctions('c',
>>>>>> lldb.eFunctionNameTypeAuto)
>>>>>> >          self.assertTrue(list.GetSize() == 1)
>>>>>> >
>>>>>> >          for sc in list:
>>>>>> >              try:
>>>>>> >
>>>>>> self.assertTrue(sc.GetModule().GetFileSpec().GetFilename() == exe_name)
>>>>>> >                  self.assertTrue(sc.GetSymbol().GetName() == 'c')
>>>>>> >              finally:
>>>>>> >                  del sc
>>>>>> >
>>>>>> >      finally:
>>>>>> >          del list
>>>>>> >
>>>>>> > The finally blocks ensure that the corresponding C++ objects are
>>>>>> destroyed, even if the function exits as a result of a Python exception
>>>>>> (e.g., if one of the assertion expressions is false and the code throws 
>>>>>> an
>>>>>> exception).  Since the objects are destroyed, the reference counts are 
>>>>>> back
>>>>>> to where they should be, and the orphaned module is closed when the 
>>>>>> target
>>>>>> is deleted.
>>>>>> >
>>>>>> > But this is ugly and maintaining it would be error prone.  Is there
>>>>>> a better way to address this?
>>>>>>
>>>>>> So you should be able to fix this by deleting the target with
>>>>>> "self.dbg.DeleteTarget(target)"
>>>>>>
>>>>>> We could change all tests over to always store any targets they
>>>>>> create in the test object itself:
>>>>>>
>>>>>> self.target = self.dbg.CreateTarget(exe)
>>>>>>
>>>>>> Then the test suite could check for the existance of "self.target"
>>>>>> and if it exists, it could call "self.dbg.DeleteTarget(self.target)"
>>>>>> automatically to avoid such issues?
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> lldb-dev mailing list
>>>>>> lldb-dev@lists.llvm.org
>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>>>>>>
>>>>> _______________________________________________
>>>> lldb-dev mailing list
>>>> lldb-dev@lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>>>>
>>>

_______________________________________________
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev

Re: [lldb-dev] Python object lifetimes affect the reliability of tests

Reply via email to