This is an automated email from the ASF dual-hosted git repository.
ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 735eda8a5b6b [SPARK-54874][TESTS][INFRA] Avoid interleave failed test logs with test outputs
735eda8a5b6b is described below
commit 735eda8a5b6b9394686ec97a586e6aa25d0d8771
Author: Tian Gao <[email protected]>
AuthorDate: Sun Jan 4 08:33:54 2026 +0800
[SPARK-54874][TESTS][INFRA] Avoid interleave failed test logs with test outputs
### What changes were proposed in this pull request?
1. `FAILURE_REPORTING_LOCK` is only for the unified log file; we don't need it for `per_test_output`.
2. Use `LOGGER` instead of `print` to emit the data, because `LOGGER` has an internal lock that prevents interleaving.
3. Collect all the lines and emit them in a single call to avoid interleaving.
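The pattern behind points 2 and 3 can be shown with a minimal sketch (this is not the actual `run-tests.py` code; `report_failure` is a made-up name for illustration). The whole failure report is assembled first and then emitted with a single logger call; `logging` handlers serialize each record with an internal lock, so concurrent reports cannot interleave line by line:

```python
import logging

LOGGER = logging.getLogger("run-tests")

def report_failure(test_name, output_lines):
    # Build the entire message first, then emit it once. Each call to
    # LOGGER.error() produces a single record, and logging.Handler
    # holds a lock around emit(), so the block stays contiguous even
    # when many worker threads report failures at the same time.
    message = f"{test_name} failed:\n" + "".join(output_lines)
    LOGGER.error(message)
```

By contrast, calling `print` once per line gives other threads a chance to write between lines.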
### Why are the changes needed?
A thread pool runs the individual tests, so the output of one test can be interleaved with error messages from another.
https://github.com/apache/spark/actions/runs/20594052053/job/59144974177
This makes it difficult to tell which test a debugging message belongs to.
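The failure mode can be illustrated with a small hypothetical sketch (`run_fake_test` and the in-memory buffer are made up for illustration; the real tests write to the process's stdout). Each per-line write is an independent operation on the shared stream, so lines from concurrently running tests may land interleaved:

```python
import io
from concurrent.futures import ThreadPoolExecutor

# Shared output stream standing in for stdout.
buf = io.StringIO()

def run_fake_test(tag):
    for i in range(3):
        # One write per line: another worker may write between
        # these calls, interleaving its output with ours.
        buf.write(f"[{tag}] line {i}\n")

with ThreadPoolExecutor(max_workers=2) as pool:
    list(pool.map(run_fake_test, ["test_a", "test_b"]))
```

Whether interleaving actually occurs depends on scheduling, which is exactly why it is hard to debug from CI logs.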
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Tested locally.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #53648 from gaogaotiantian/avoid-interleave-logs.
Authored-by: Tian Gao <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
---
python/run-tests.py | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/python/run-tests.py b/python/run-tests.py
index c7348ec34e86..b3522a13df4a 100755
--- a/python/run-tests.py
+++ b/python/run-tests.py
@@ -301,16 +301,21 @@ def run_individual_python_test(target_dir, test_name, pyspark_python, keep_test_
     # Exit on the first failure but exclude the code 5 for no test ran, see SPARK-46801.
     if retcode != 0 and retcode != 5:
         try:
+            per_test_output.seek(0)
             with FAILURE_REPORTING_LOCK:
                 with open(LOG_FILE, 'ab') as log_file:
-                    per_test_output.seek(0)
                     log_file.writelines(per_test_output)
-            per_test_output.seek(0)
-            for line in per_test_output:
-                decoded_line = line.decode("utf-8", "replace")
-                if not re.match('[0-9]+', decoded_line):
-                    print(decoded_line, end='')
-            per_test_output.close()
+
+            # We don't want the logging lines interleave with the test output, so we read the
+            # full file and output with LOGGER which has internal locking.
+            per_test_output.seek(0)
+            lines = []
+            for line in per_test_output:
+                line = line.decode("utf-8", "replace")
+                if not re.match('[0-9]+', line):
+                    lines.append(line)
+            LOGGER.error(f"{test_name} with {pyspark_python} failed:\n{''.join(lines)}")
+            per_test_output.close()
         except BaseException:
             LOGGER.exception("Got an exception while trying to print failed test output")
         finally:
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]