This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new a0ccdf27e5ff [SPARK-47824][PS] Fix nondeterminism in pyspark.pandas.series.asof
a0ccdf27e5ff is described below
commit a0ccdf27e5ff30817b8f058f08f98d5b44bad2db
Author: Mark Jarvin <[email protected]>
AuthorDate: Fri Apr 12 09:37:19 2024 +0900
[SPARK-47824][PS] Fix nondeterminism in pyspark.pandas.series.asof
### What changes were proposed in this pull request?
Use the monotonically increasing ID column as the sorting condition for
`max_by` instead of a literal string.
### Why are the changes needed?
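https://github.com/apache/spark/pull/35191 had an error where the literal
string `"__monotonically_increasing_id__"` was used as the tie-breaker in
`max_by` instead of the actual ID. Because that literal is the same for every
matching row, it cannot order them, so `max_by` may return any of them.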
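
The effect can be illustrated with a small pure-Python model. This is only a sketch, not Spark's implementation: `toy_max_by`, `toy_asof`, and the sample rows are hypothetical names and data introduced here, and the "first row seen wins on ties" rule is an assumption that stands in for whatever order a distributed engine happens to process rows in.

```python
from itertools import permutations

ID_COLUMN_NAME = "__monotonically_increasing_id__"

# Hypothetical rows: "id" stands in for the monotonically increasing ID column.
ROWS = [
    {"id": 0, "index": 10, "value": 1.0},
    {"id": 1, "index": 20, "value": 2.0},
    {"id": 2, "index": 30, "value": 3.0},
    {"id": 3, "index": 40, "value": 4.0},
]


def toy_max_by(rows, value_of, key_of):
    """Toy max_by: value of the row with the largest non-null key.

    On ties the first row seen wins (an assumption of this model),
    so the answer depends on the order in which rows arrive.
    """
    best_row, best_key = None, None
    for row in rows:
        key = key_of(row)
        if key is None:
            continue  # rows that do not match the condition are ignored
        if best_key is None or key > best_key:
            best_row, best_key = row, key
    return None if best_row is None else value_of(best_row)


def toy_asof(rows, where, use_literal):
    """Model of asof for one `where` value, with the old or the new ordering key."""
    def key_of(row):
        matches = row["index"] <= where and row["value"] is not None
        if not matches:
            return None
        # Old behaviour: the same literal string for every matching row.
        # New behaviour: the row's actual ID.
        return ID_COLUMN_NAME if use_literal else row["id"]

    return toy_max_by(rows, lambda r: r["value"], key_of)


# Try every row order to mimic an engine that gives no ordering guarantee.
literal_results = {toy_asof(p, 30, use_literal=True) for p in permutations(ROWS)}
id_results = {toy_asof(p, 30, use_literal=False) for p in permutations(ROWS)}

print(sorted(literal_results))  # [1.0, 2.0, 3.0]
print(sorted(id_results))       # [3.0]
```

With the literal key, every matching row ties, so the result changes with row order. With the actual ID as the key, the last matching row always wins, which is the pandas behaviour that `asof` is meant to reproduce.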
### Does this PR introduce _any_ user-facing change?
Yes, it fixes nondeterministic results from `asof`.
### How was this patch tested?
In some circumstances, running
`//python:pyspark.pandas.tests.connect.series.test_parity_as_of` is sufficient
to reproduce the nondeterminism.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #46018 from markj-db/SPARK-47824.
Authored-by: Mark Jarvin <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
---
python/pyspark/pandas/series.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/python/pyspark/pandas/series.py b/python/pyspark/pandas/series.py
index 98818a368a9f..8edc2c531b51 100644
--- a/python/pyspark/pandas/series.py
+++ b/python/pyspark/pandas/series.py
@@ -5870,7 +5870,7 @@ class Series(Frame, IndexOpsMixin, Generic[T]):
# then return monotonically_increasing_id. This will let max by
# to return last index value, which is the behaviour of pandas
else spark_column.isNotNull(),
- monotonically_increasing_id_column,
+ F.col(monotonically_increasing_id_column),
),
)
for index in where