[Lldb-commits] [lldb] Fix a bug with cancelling "attach -w" after you have run a process previously (PR #65822)

via lldb-commits Wed, 20 Sep 2023 13:15:21 -0700

jimingham wrote:

You are right, there is a racy bit here: I was also getting roughly one failure
per 50 runs.


The raciness comes from the time between when we send the interrupt and when
we finish up the command and produce the results, so we can end up reading
the command output before the command has written its result.
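One way to avoid reading the output exactly once is to re-read the output file until the expected line appears or a timeout expires. The test in this PR does not do this. What follows is a minimal Python sketch that assumes the command output lands in a file like the test's `self.stdout_path`. The helper name `wait_for_output` and its parameters are hypothetical:

```python
import time


def wait_for_output(path, needle, flush=None, timeout=10.0, interval=0.1):
    """Re-read the file at `path` until some line contains `needle`.

    `flush`, if given, is called before each read so buffered output
    reaches the file. Returns the first matching line, or None if
    `timeout` seconds pass without a match.
    """
    deadline = time.monotonic() + timeout
    while True:
        if flush is not None:
            flush()
        with open(path, "r") as reader:
            for line in reader:
                if needle in line:
                    return line
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

In the test this might be called as `wait_for_output(self.stdout_path, "Cancelled async attach", flush=self.out_filehandle.flush)`. Unlike a fixed sleep, the full timeout is only paid when the message never shows up.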

I can make this more deterministic by waiting for the process state to shift
to eStateExited. But even with that it was still a little flaky, because there
is still some time between when the interrupt is noticed and switches the
process state, and when lldb makes the command return and prints the result
to stdout.
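Waiting for the state change works best as a bounded poll, so that a hang fails the test instead of wedging it. The sketch below is minimal and assumes the state is read through a callable. `wait_for` is a hypothetical helper name and is not an lldb API:

```python
import time


def wait_for(predicate, timeout=10.0, interval=0.1):
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Returns True if the predicate became true, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

In the test the predicate could be `lambda: target.process.state == lldb.eStateExited`. As noted above, this only narrows the race, because the state flips before the command result is printed.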

Unfortunately this is a test of command-line behavior; you don't get the same
problem when using the SB APIs directly. So I have to write it as a "run a
command-line command" test, and that makes it harder to get direct signals
about when to look at the result.

If we sent "command completed" events, I could wait for one, and then the test
would be deterministic. But that seems like a feature we should design, not jam
in to fix a test problem.
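The following is a hypothetical model of what such an event would let the test do. lldb does not send a "command completed" event, so `FakeCommandRunner`, `run_async`, and `read_result_when_done` are invented for illustration. A `threading.Event` stands in for the broadcast event:

```python
import io
import threading
import time


class FakeCommandRunner:
    """Hypothetical stand-in for an interpreter that signals completion.

    lldb does not broadcast a "command completed" event; this only models
    how a test could use one if it existed.
    """

    def __init__(self):
        self.output = io.StringIO()
        self.command_completed = threading.Event()

    def run_async(self, result_text, delay):
        def worker():
            # Simulate the time spent unwinding the command stack.
            time.sleep(delay)
            self.output.write(result_text)
            # Signal only after the result has been written.
            self.command_completed.set()

        threading.Thread(target=worker, daemon=True).start()


def read_result_when_done(runner, timeout=10.0):
    """Block until the completion event fires, then read the output once."""
    if not runner.command_completed.wait(timeout):
        raise TimeoutError("command did not complete in time")
    return runner.output.getvalue()
```

The event is set only after the result is written, so the reader never sees empty output and no sleep is needed.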

However, if I just insert a time.sleep(1) between seeing the state flip to
eStateExited and looking at the results, I can run this 200 times and see no
failures. That sleep only has to cover the time to back out of the command
execution stack and write the result to stdout, so it shouldn't be as
variable. If this still ends up failing intermittently, I'll have to cook up
some kind of "command completed" event that the test can wait on.
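The approach described above waits for eStateExited, then sleeps for a fixed period before reading. It could be factored into a helper along these lines. This is a minimal sketch and not the code in the PR. `scan_after_settle` and its parameters are hypothetical. The process state and the output are passed in as callables so the helper does not depend on lldb:

```python
import time


def scan_after_settle(state_fn, exited_state, read_lines, needle,
                      timeout=10.0, settle=1.0):
    """Wait for `exited_state`, pause for `settle` seconds, then search output.

    Returns (found, lines) so a failing assertion can report what was read.
    """
    deadline = time.monotonic() + timeout
    while state_fn() != exited_state:
        if time.monotonic() >= deadline:
            raise TimeoutError("process never reached the exited state")
        time.sleep(0.1)
    # The state flips before the command has unwound and printed its
    # result, so allow a fixed grace period before reading.
    time.sleep(settle)
    lines = read_lines()
    found = any(needle in line for line in lines)
    return found, lines
```

In the test, `state_fn` might be `lambda: target.process.state`, `exited_state` would be `lldb.eStateExited`, and `read_lines` could flush `self.out_filehandle` and return the lines of `self.stdout_path`. The helper returns the lines along with the flag, so a failing assertion can show what was actually read, such as the empty `[]` in the failing run.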

Jim
 


> On Sep 20, 2023, at 11:05 AM, Jim Ingham ***@***.***> wrote:
> 
> The way the test works, we run one real process and let it exit. Then we do
> "attach -w -n noone_would_use_this_name" because we don't want the second
> attach attempt to be able to succeed. lldb should just stay stuck until the
> interrupt succeeds; there shouldn't be any other way to get out of this.
> It sounds like something else is kicking us out of the attach wait loop on
> these systems?
> 
> Jim
> 
> 
> 
>> On Sep 20, 2023, at 2:56 AM, David Spickett ***@***.***> wrote:
>> 
>> 
>> @DavidSpickett commented on this pull request.
>> 
>> In lldb/test/API/commands/process/attach/TestProcessAttach.py 
>> <https://github.com/llvm/llvm-project/pull/65822#discussion_r1331380724>:
>> 
>> > +            time.sleep(1)
>> +            if target.process.state == lldb.eStateAttaching:
>> +                break
>> +
>> +        self.dbg.DispatchInputInterrupt()
>> +        self.dbg.DispatchInputInterrupt()
>> +
>> +        self.out_filehandle.flush()
>> +        reader = open(self.stdout_path, "r")
>> +        results = reader.readlines()
>> +        found_result = False
>> +        for line in results:
>> +            if "Cancelled async attach" in line:
>> +                found_result = True
>> +                break
>> +        self.assertTrue(found_result, "Found async error in results")
>> This fails if results is empty.
>> 
>> ********************************
>> []
>> ********************************
>> FAIL: LLDB (/home/david.spickett/build-llvm-aarch64/bin/clang-aarch64) :: 
>> test_run_then_attach_wait_interrupt (TestProcessAttach.ProcessAttachTestCase)
>> <bound method SBProcess.Kill of SBProcess: pid = 0, state = exited, threads 
>> = 0, executable = ProcessAttach>: success
>> 
>> Restore dir to: /home/david.spickett/build-llvm-aarch64
>> ======================================================================
>> FAIL: test_run_then_attach_wait_interrupt 
>> (TestProcessAttach.ProcessAttachTestCase)
>> ----------------------------------------------------------------------
>> Traceback (most recent call last):
>>   File 
>> "/home/david.spickett/llvm-project/lldb/test/API/commands/process/attach/TestProcessAttach.py",
>>  line 195, in test_run_then_attach_wait_interrupt
>>     self.assertTrue(found_result, "Found async error in results")
>> AssertionError: False is not True : Found async error in results
>> Config=aarch64-/home/david.spickett/build-llvm-aarch64/bin/clang
>> ----------------------------------------------------------------------
>> When the test works the results contain:
>> 
>> ********************************
>> ['error: attach failed: Cancelled async attach.\n', '\n', '... 
>> Interrupted.\n']
>> ********************************
>> Running it in a loop, it took ~40 runs to get a failing one.
>> 
>> I wonder if that is because the attach happens to finish a lot faster 
>> sometimes, so there's no time to cancel it? If that's down to the OS and 
>> machine load, I'm not sure how we'd make this predictable.
>> 
>> The ugly option is to say if the results are empty, pass the test and assume 
>> the other 39 runs will check for the bug.
>> 
>> 
> 



https://github.com/llvm/llvm-project/pull/65822
_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
