jimingham wrote:
> On Sep 20, 2023, at 1:39 PM, Jim Ingham ***@***.***> wrote: > > Interesting... I split the end stage of the test up into: > > (a) wait for up to 20 seconds the state to switch to eStateExited > (b) wait another second, then check the command output file > > On macOS, as I said, that seemed to make the test quite robust, but when I > checked the test in I got a bot failure at stage (a). That is, a Linux bot failure... Jim > That really sounds like the underlying feature of interrupting a process in > this state doesn't work reliably on those systems. The patch this is testing > just changed a bug in generic code that prevented us from TRYING to do this > interrupt, but didn't change any of the underlying process interrupt > machinery. So it looks like the test is uncovering underlying flakiness. > > I changed the test to be macOS only for now. We should go back and make the > feature work reliably on the other platforms when we get a change. > > Jim > > >> On Sep 20, 2023, at 1:14 PM, Jim Ingham ***@***.***> wrote: >> >> You are right, there is a racy bit here - I was also getting ~one fails per >> 50 runs. >> >> The raciness is due to the time it takes between when we send the interrupt >> and when we finish up the command and produce the results. So we can end up >> reading from the command output before the command has written its result. >> >> I can make this more deterministic by waiting for the process state to shift >> to eStateExited. But even with that it was still a little flakey because >> there's still some time between when the interrupt is noticed and causes us >> to switch the process state and when lldb makes the command return and >> prints that to stdout. >> >> Unfortunately this is a test of command-line behavior, you don't get the >> same problem with the SB API's directly. So I have to do it as a "run a >> command-line command" test, and that makes it harder to get direct signals >> about when to look at the result. >> >> If we sent "command completed" events I could wait for that, then the test >> would be deterministic. That seems like a feature we should design, not jam >> in to fix a test problem.. >> >> However, if I just insert a time.sleep(1) between seeing the state flip to >> eStateExited and looking at the results, I can run this #200 and see no >> failures. That's just the time to back out of the command execution stack >> and write the result to stdout, so that shouldn't be as variable. If this >> still ends up failing intermittently, then I'll have to cook up some kind of >> "command completed" event that the test can wait on. >> >> Jim >> >> >> >>> On Sep 20, 2023, at 11:05 AM, Jim Ingham ***@***.***> wrote: >>> >>> The way the test works, we run one real process and let it exit. Then we >>> do "attach -w -n noone_would_use_this_name" because we don't want the >>> second attach attempt to be able to succeed. lldb should just stay stuck >>> till the interrupt succeeds, there shouldn't be anything other way to get >>> out of this. It sounds like something else is kicking us out of the attach >>> wait loop on these systems? >>> >>> Jim >>> >>> >>> >>>> On Sep 20, 2023, at 2:56 AM, David Spickett ***@***.***> wrote: >>>> >>>> >>>> @DavidSpickett commented on this pull request. >>>> >>>> In lldb/test/API/commands/process/attach/TestProcessAttach.py >>>> <https://github.com/llvm/llvm-project/pull/65822#discussion_r1331380724>: >>>> >>>> > + time.sleep(1) >>>> + if target.process.state == lldb.eStateAttaching: >>>> + break >>>> + >>>> + self.dbg.DispatchInputInterrupt() >>>> + self.dbg.DispatchInputInterrupt() >>>> + >>>> + self.out_filehandle.flush() >>>> + reader = open(self.stdout_path, "r") >>>> + results = reader.readlines() >>>> + found_result = False >>>> + for line in results: >>>> + if "Cancelled async attach" in line: >>>> + found_result = True >>>> + break >>>> + self.assertTrue(found_result, "Found async error in results") >>>> This fails if results is empty. >>>> >>>> ******************************** >>>> [] >>>> ******************************** >>>> FAIL: LLDB (/home/david.spickett/build-llvm-aarch64/bin/clang-aarch64) :: >>>> test_run_then_attach_wait_interrupt >>>> (TestProcessAttach.ProcessAttachTestCase) >>>> <bound method SBProcess.Kill of SBProcess: pid = 0, state = exited, >>>> threads = 0, executable = ProcessAttach>: success >>>> >>>> Restore dir to: /home/david.spickett/build-llvm-aarch64 >>>> ====================================================================== >>>> FAIL: test_run_then_attach_wait_interrupt >>>> (TestProcessAttach.ProcessAttachTestCase) >>>> ---------------------------------------------------------------------- >>>> Traceback (most recent call last): >>>> File >>>> "/home/david.spickett/llvm-project/lldb/test/API/commands/process/attach/TestProcessAttach.py", >>>> line 195, in test_run_then_attach_wait_interrupt >>>> self.assertTrue(found_result, "Found async error in results") >>>> AssertionError: False is not True : Found async error in results >>>> Config=aarch64-/home/david.spickett/build-llvm-aarch64/bin/clang >>>> ---------------------------------------------------------------------- >>>> When the test works the results contain: >>>> >>>> ******************************** >>>> ['error: attach failed: Cancelled async attach.\n', '\n', '... >>>> Interrupted.\n'] >>>> ******************************** >>>> Running it in a loop it took ~40 runs to get a failing one. >>>> >>>> I wonder if that is because the attach happens to finish a lot faster >>>> sometimes, so there's no time to cancel it? If that's down to the OS and >>>> machine load, I'm not sure how we'd make this predictable. >>>> >>>> The ugly option is to say if the results are empty, pass the test and >>>> assume the other 39 runs will check for the bug. >>>> >>>> — >>>> Reply to this email directly, view it on GitHub >>>> <https://github.com/llvm/llvm-project/pull/65822#pullrequestreview-1635242071>, >>>> or unsubscribe >>>> <https://github.com/notifications/unsubscribe-auth/ADUPVW225PO46OIV2JRC2I3X3K4WPANCNFSM6AAAAAA4RAWCQY>. >>>> You are receiving this because you modified the open/close state. >>>> >>> >> > https://github.com/llvm/llvm-project/pull/65822 _______________________________________________ lldb-commits mailing list lldb-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits