This is an automated email from the ASF dual-hosted git repository.
ueshin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 89fc3c5c8376 [SPARK-55901][PS] Raise an error from Series.replace() with no arguments
89fc3c5c8376 is described below
commit 89fc3c5c8376543a6c47f5b96753a138a181ed31
Author: Takuya Ueshin <[email protected]>
AuthorDate: Tue Mar 10 11:03:27 2026 -0700
[SPARK-55901][PS] Raise an error from Series.replace() with no arguments
### What changes were proposed in this pull request?
Raises an error from `Series.replace()` with no arguments.
### Why are the changes needed?
In pandas 3, `Series.replace()` can no longer be called with no arguments.
For example:
```py
>>> pser = pd.Series([10, 20, 15, 30, np.nan], name="x")
```
- pandas 2
```py
>>> pser.replace()
<stdin>:1: FutureWarning: Series.replace without 'value' and with
non-dict-like 'to_replace' is deprecated and will raise in a future version.
Explicitly specify the new values instead.
0 10.0
1 20.0
2 15.0
3 30.0
4 30.0
Name: x, dtype: float64
```
- pandas 3
```py
>>> pser.replace()
Traceback (most recent call last):
...
ValueError: Series.replace must specify either 'value', a dict-like
'to_replace', or dict-like 'regex'.
```
### Does this PR introduce _any_ user-facing change?
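Yes. With pandas 3, calling `Series.replace()` with no arguments now raises a `ValueError`, matching pandas 3. With pandas 2, the behavior is unchanged.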
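The version gate can be illustrated with a minimal stand-in that needs neither pandas nor PySpark. This is only a sketch: `no_arg_replace_path` and its string return values are hypothetical names made up for illustration, and the version is parsed by hand instead of with `LooseVersion`. It mirrors only the `to_replace is None` branch touched by this patch.

```py
def no_arg_replace_path(pandas_version, to_replace=None, value=None, regex=False):
    """Hypothetical stand-in for the branch changed in this patch.

    Returns "ffill" when the legacy forward-fill path would be taken,
    "continue" when the remaining checks in Series.replace would run,
    and raises ValueError for a no-argument call under pandas 3 or later.
    """
    # Assumption: a plain "major.minor.patch" version string.
    major = int(pandas_version.split(".")[0])
    if to_replace is None:
        if major < 3:
            # pandas 2 and earlier: keep the old forward-fill behavior.
            return "ffill"
        if value is None and regex is False:
            raise ValueError(
                "Series.replace must specify either 'value', a dict-like "
                "'to_replace', or dict-like 'regex'."
            )
    return "continue"
```

Under these assumptions, `no_arg_replace_path("2.2.3")` takes the forward-fill path, while `no_arg_replace_path("3.0.0")` raises the same `ValueError` message that pandas 3 reports.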
### How was this patch tested?
Updated the related tests.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #54703 from ueshin/issues/SPARK-55901/replace.
Authored-by: Takuya Ueshin <[email protected]>
Signed-off-by: Takuya Ueshin <[email protected]>
---
python/pyspark/pandas/series.py | 10 +++++++++-
python/pyspark/pandas/tests/series/test_missing_data.py | 13 ++++++++++++-
2 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/python/pyspark/pandas/series.py b/python/pyspark/pandas/series.py
index 4eddbe5ad8ad..79898c6de9dd 100644
--- a/python/pyspark/pandas/series.py
+++ b/python/pyspark/pandas/series.py
@@ -5179,7 +5179,15 @@ class Series(Frame, IndexOpsMixin, Generic[T]):
), "If 'regex' is True then 'to_replace' must be a string"
if to_replace is None:
- return self._fillna_with_method(method="ffill")
+ if LooseVersion(pd.__version__) < "3.0.0":
+ return self._fillna_with_method(method="ffill")
+ else:
+ if value is None and regex is False:
+ raise ValueError(
+ "Series.replace must specify either 'value', a dict-like "
+ "'to_replace', or dict-like 'regex'."
+ )
+
if not isinstance(to_replace, (str, list, tuple, dict, int, float)):
raise TypeError("'to_replace' should be one of str, list, tuple, dict, int, float")
diff --git a/python/pyspark/pandas/tests/series/test_missing_data.py b/python/pyspark/pandas/tests/series/test_missing_data.py
index 2f90fdd63914..545b52f15b79 100644
--- a/python/pyspark/pandas/tests/series/test_missing_data.py
+++ b/python/pyspark/pandas/tests/series/test_missing_data.py
@@ -134,7 +134,18 @@ class SeriesMissingDataMixin:
pser = pd.Series([10, 20, 15, 30, np.nan], name="x")
psser = ps.from_pandas(pser)
- self.assert_eq(psser.replace(), pser.replace())
+ if LooseVersion(pd.__version__) < "3.0.0":
+ self.assert_eq(psser.replace(), pser.replace())
+ else:
+ msg = (
+ "Series.replace must specify either 'value', a dict-like "
+ "'to_replace', or dict-like 'regex'."
+ )
+ with self.assertRaisesRegex(ValueError, msg):
+ psser.replace()
+ with self.assertRaisesRegex(ValueError, msg):
+ pser.replace()
+
self.assert_eq(psser.replace({}), pser.replace({}))
self.assert_eq(psser.replace(np.nan, 45), pser.replace(np.nan, 45))