[Lldb-commits] [lldb] Fix a bug with cancelling "attach -w" after you have run a process previously (PR #65822)

via lldb-commits Wed, 20 Sep 2023 13:41:13 -0700

jimingham wrote:



> On Sep 20, 2023, at 1:39 PM, Jim Ingham ***@***.***> wrote:
> 
> Interesting...  I split the end stage of the test up into:
> 
> (a) wait for up to 20 seconds the state to switch to eStateExited
> (b) wait another second, then check the command output file
> 
> On macOS, as I said, that seemed to make the test quite robust, but when I 
> checked the test in I got a bot failure at stage (a).

That is, a Linux bot failure...

Jim


>  That really sounds like the underlying feature of interrupting a process in 
> this state doesn't work reliably on those systems.  The patch this is testing 
> just changed a bug in generic code that prevented us from TRYING to do this 
> interrupt, but didn't change any of the underlying process interrupt 
> machinery.  So it looks like the test is uncovering underlying flakiness.
> 
> I changed the test to be macOS only for now.  We should go back and make the 
> feature work reliably on the other platforms when we get a change.
> 
> Jim
> 
> 
>> On Sep 20, 2023, at 1:14 PM, Jim Ingham ***@***.***> wrote:
>> 
>> You are right, there is a racy bit here - I was also getting ~one fails per 
>> 50 runs.  
>> 
>> The raciness is due to the time it takes between when we send the interrupt 
>> and when we finish up the command and produce the results.  So we can end up 
>> reading from the command output before the command has written its result.
>> 
>> I can make this more deterministic by waiting for the process state to shift 
>> to eStateExited.   But even with that it was still a little flakey because 
>> there's still some time between when the interrupt is noticed and causes us 
>> to switch the process state and when lldb makes the command return and 
>> prints that to stdout.
>> 
>> Unfortunately this is a test of command-line behavior, you don't get the 
>> same problem with the SB API's directly.  So I have to do it as a "run a 
>> command-line command" test, and that makes it harder to get direct signals 
>> about when to look at the result.
>> 
>> If we sent "command completed" events I could wait for that, then the test 
>> would be deterministic.  That seems like a feature we should design, not jam 
>> in to fix a test problem..  
>> 
>> However, if I just insert a time.sleep(1) between seeing the state flip to 
>> eStateExited and looking at the results, I can run this #200 and see no 
>> failures.  That's just the time to back out of the command execution stack 
>> and write the result to stdout, so that shouldn't be as variable.  If this 
>> still ends up failing intermittently, then I'll have to cook up some kind of 
>> "command completed" event that the test can wait on.
>> 
>> Jim
>>  
>> 
>> 
>>> On Sep 20, 2023, at 11:05 AM, Jim Ingham ***@***.***> wrote:
>>> 
>>> The way the test works, we run one real process and let it exit.  Then we 
>>> do "attach -w -n noone_would_use_this_name" because we don't want the 
>>> second attach attempt to be able to succeed.  lldb should just stay stuck 
>>> till the interrupt succeeds, there shouldn't be anything other way to get 
>>> out of this.  It sounds like something else is kicking us out of the attach 
>>> wait loop on these systems?
>>> 
>>> Jim
>>> 
>>> 
>>> 
>>>> On Sep 20, 2023, at 2:56 AM, David Spickett ***@***.***> wrote:
>>>> 
>>>> 
>>>> @DavidSpickett commented on this pull request.
>>>> 
>>>> In lldb/test/API/commands/process/attach/TestProcessAttach.py 
>>>> <https://github.com/llvm/llvm-project/pull/65822#discussion_r1331380724>:
>>>> 
>>>> > +            time.sleep(1)
>>>> +            if target.process.state == lldb.eStateAttaching:
>>>> +                break
>>>> +
>>>> +        self.dbg.DispatchInputInterrupt()
>>>> +        self.dbg.DispatchInputInterrupt()
>>>> +
>>>> +        self.out_filehandle.flush()
>>>> +        reader = open(self.stdout_path, "r")
>>>> +        results = reader.readlines()
>>>> +        found_result = False
>>>> +        for line in results:
>>>> +            if "Cancelled async attach" in line:
>>>> +                found_result = True
>>>> +                break
>>>> +        self.assertTrue(found_result, "Found async error in results")
>>>> This fails if results is empty.
>>>> 
>>>> ********************************
>>>> []
>>>> ********************************
>>>> FAIL: LLDB (/home/david.spickett/build-llvm-aarch64/bin/clang-aarch64) :: 
>>>> test_run_then_attach_wait_interrupt 
>>>> (TestProcessAttach.ProcessAttachTestCase)
>>>> <bound method SBProcess.Kill of SBProcess: pid = 0, state = exited, 
>>>> threads = 0, executable = ProcessAttach>: success
>>>> 
>>>> Restore dir to: /home/david.spickett/build-llvm-aarch64
>>>> ======================================================================
>>>> FAIL: test_run_then_attach_wait_interrupt 
>>>> (TestProcessAttach.ProcessAttachTestCase)
>>>> ----------------------------------------------------------------------
>>>> Traceback (most recent call last):
>>>>   File 
>>>> "/home/david.spickett/llvm-project/lldb/test/API/commands/process/attach/TestProcessAttach.py",
>>>>  line 195, in test_run_then_attach_wait_interrupt
>>>>     self.assertTrue(found_result, "Found async error in results")
>>>> AssertionError: False is not True : Found async error in results
>>>> Config=aarch64-/home/david.spickett/build-llvm-aarch64/bin/clang
>>>> ----------------------------------------------------------------------
>>>> When the test works the results contain:
>>>> 
>>>> ********************************
>>>> ['error: attach failed: Cancelled async attach.\n', '\n', '... 
>>>> Interrupted.\n']
>>>> ********************************
>>>> Running it in a loop it took ~40 runs to get a failing one.
>>>> 
>>>> I wonder if that is because the attach happens to finish a lot faster 
>>>> sometimes, so there's no time to cancel it? If that's down to the OS and 
>>>> machine load, I'm not sure how we'd make this predictable.
>>>> 
>>>> The ugly option is to say if the results are empty, pass the test and 
>>>> assume the other 39 runs will check for the bug.
>>>> 
>>>> —
>>>> Reply to this email directly, view it on GitHub 
>>>> <https://github.com/llvm/llvm-project/pull/65822#pullrequestreview-1635242071>,
>>>>  or unsubscribe 
>>>> <https://github.com/notifications/unsubscribe-auth/ADUPVW225PO46OIV2JRC2I3X3K4WPANCNFSM6AAAAAA4RAWCQY>.
>>>> You are receiving this because you modified the open/close state.
>>>> 
>>> 
>> 
> 



https://github.com/llvm/llvm-project/pull/65822
_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

[Lldb-commits] [lldb] Fix a bug with cancelling "attach -w" after you have run a process previously (PR #65822)

Reply via email to