This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch branch-4.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-4.0 by this push:
     new 25bee87585fe [SPARK-51758][SS][FOLLOWUP][TESTS] Fix flaky test around 
watermark due to additional batch causing empty df
25bee87585fe is described below

commit 25bee87585fe0d2490c5fa5e0a51a32db25a8a3b
Author: Anish Shrigondekar <anish.shrigonde...@databricks.com>
AuthorDate: Thu Apr 17 14:33:56 2025 +0900

    [SPARK-51758][SS][FOLLOWUP][TESTS] Fix flaky test around watermark due to 
additional batch causing empty df
    
    ### What changes were proposed in this pull request?
    Fix flaky test around watermark due to additional batch causing empty df
    
    ### Why are the changes needed?
    Without the changes, sometimes we would see the test fail because the extra 
batch with empty df case was not accounted for.
    
    ### Does this PR introduce _any_ user-facing change?
    No
    
    ### How was this patch tested?
    Test only change
    
    ```
    Will test the following Python tests: 
['pyspark.sql.tests.pandas.test_pandas_transform_with_state 
TransformWithStateInPandasTests.test_transform_with_state_with_wmark_and_non_event_time']
    python3.9 python_implementation is CPython
    python3.9 version is: Python 3.9.20
    Starting test(python3.9): 
pyspark.sql.tests.pandas.test_pandas_transform_with_state 
TransformWithStateInPandasTests.test_transform_with_state_with_wmark_and_non_event_time
 (temp output: 
/Users/anish.shrigondekar/spark/spark/python/target/ad6814cf-2fcb-4f6c-a6c3-d63ce94013d4/python3.9__pyspark.sql.tests.pandas.test_pandas_transform_with_state_TransformWithStateInPandasTests.test_transform_with_state_with_wmark_and_non_event_time__7nb2vczb.log)
    Finished test(python3.9): 
pyspark.sql.tests.pandas.test_pandas_transform_with_state 
TransformWithStateInPandasTests.test_transform_with_state_with_wmark_and_non_event_time
 (40s)
    Tests passed in 40 seconds
    ```
    
    ### Was this patch authored or co-authored using generative AI tooling?
    No
    
    Closes #50615 from anishshri-db/task/SPARK-51758-test.
    
    Authored-by: Anish Shrigondekar <anish.shrigonde...@databricks.com>
    Signed-off-by: Jungtaek Lim <kabhwan.opensou...@gmail.com>
    (cherry picked from commit 2b78cec410cafad994ca3a2a3204c1b17fec86fb)
    Signed-off-by: Jungtaek Lim <kabhwan.opensou...@gmail.com>
---
 python/pyspark/sql/tests/pandas/test_pandas_transform_with_state.py | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git 
a/python/pyspark/sql/tests/pandas/test_pandas_transform_with_state.py 
b/python/pyspark/sql/tests/pandas/test_pandas_transform_with_state.py
index 44ea90ab6659..ed343085854d 100644
--- a/python/pyspark/sql/tests/pandas/test_pandas_transform_with_state.py
+++ b/python/pyspark/sql/tests/pandas/test_pandas_transform_with_state.py
@@ -641,11 +641,14 @@ class TransformWithStateInPandasTestsMixin:
                 assert set(batch_df.sort("id").collect()) == {
                     Row(id="a", timestamp="4"),
                 }
-            else:
+            elif batch_id == 2:
                 # watermark for late event = 10 and min event = 2 with no 
filtering
                 assert set(batch_df.sort("id").collect()) == {
                     Row(id="a", timestamp="2"),
                 }
+            else:
+                for q in self.spark.streams.active:
+                    q.stop()
 
         self._test_transform_with_state_in_pandas_event_time(
             MinEventTimeStatefulProcessor(), check_results, "None"


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

Reply via email to