This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 20af8bdfb907 [SPARK-54787][PS] Use list comprehension in pandas _bool_column_labels
20af8bdfb907 is described below
commit 20af8bdfb9073be5711cea5df0c5ce0ff168e92c
Author: Devin Petersohn <[email protected]>
AuthorDate: Sun Dec 21 14:50:40 2025 +0900
[SPARK-54787][PS] Use list comprehension in pandas _bool_column_labels
### What changes were proposed in this pull request?
Use a list comprehension in the pyspark.pandas DataFrame method
_bool_column_labels. This modestly improves memory use and performance, and
also reduces the filtering logic to a single expression.
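As a rough illustration of the refactoring described above, the sketch below compares the append-in-a-loop form with the list-comprehension form. It is not the pyspark.pandas code: the `columns` dict, `is_all_bool`, and both `bool_labels_*` functions are hypothetical stand-ins, and `is_all_bool` only approximates the dtype check by rejecting any column that holds a non-bool value such as None.

```python
from typing import Dict, List


def is_all_bool(values: List[object]) -> bool:
    """Hypothetical stand-in for the dtype check: every value must be a real bool."""
    return all(isinstance(v, bool) for v in values)


def bool_labels_loop(columns: Dict[str, List[object]], labels: List[str]) -> List[str]:
    """Loop form: build the result by appending matching labels one at a time."""
    result = []
    for label in labels:
        if is_all_bool(columns[label]):
            result.append(label)
    return result


def bool_labels_comprehension(
    columns: Dict[str, List[object]], labels: List[str]
) -> List[str]:
    """Comprehension form: same filter, expressed as a single expression."""
    return [label for label in labels if is_all_bool(columns[label])]


# Assumed sample data: "b" mixes bools with None, "c" holds ints, so only "a" qualifies.
columns = {"a": [True, False], "b": [True, None], "c": [1, 0]}
labels = list(columns)

loop_result = bool_labels_loop(columns, labels)
comprehension_result = bool_labels_comprehension(columns, labels)
```

Both functions return the same labels in the same order; the comprehension simply avoids the temporary list variable and the repeated `append` calls.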
### Why are the changes needed?
For maintainability and performance.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #53550 from devin-petersohn/devin/pandas_maintain_01.
Authored-by: Devin Petersohn <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
---
python/pyspark/pandas/frame.py | 12 +++---------
1 file changed, 3 insertions(+), 9 deletions(-)
diff --git a/python/pyspark/pandas/frame.py b/python/pyspark/pandas/frame.py
index 0ec7ee60bb5b..7f0a516d5963 100644
--- a/python/pyspark/pandas/frame.py
+++ b/python/pyspark/pandas/frame.py
@@ -11268,15 +11268,9 @@ defaultdict(<class 'list'>, {'col..., 'col...})]
"""
Filter column labels of boolean columns (without None).
"""
- bool_column_labels = []
- for label in column_labels:
- psser = self._psser_for(label)
- if is_bool_dtype(psser):
- # Rely on dtype rather than spark type because
- # columns that consist of bools and Nones should be excluded
- # if bool_only is True
- bool_column_labels.append(label)
- return bool_column_labels
+ # Rely on dtype rather than spark type because columns that consist of bools and
+ # Nones should be excluded if bool_only is True
+ return [label for label in column_labels if is_bool_dtype(self._psser_for(label))]
def _result_aggregated(
self, column_labels: List[Label], scols: Sequence[PySparkColumn]