(spark) branch master updated: [SPARK-55624][PS][TESTS][FOLLOW-UP] Fix `_ignore_arrow_dtypes` util for DataFrame

ueshin Mon, 23 Feb 2026 17:17:47 -0800

This is an automated email from the ASF dual-hosted git repository.

ueshin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new c3a97cf3d4c5 [SPARK-55624][PS][TESTS][FOLLOW-UP] Fix `_ignore_arrow_dtypes` util for DataFrame
c3a97cf3d4c5 is described below

commit c3a97cf3d4c534af427c9ec75bf05e825deabf11
Author: Takuya Ueshin <[email protected]>
AuthorDate: Mon Feb 23 17:17:27 2026 -0800

    [SPARK-55624][PS][TESTS][FOLLOW-UP] Fix `_ignore_arrow_dtypes` util for DataFrame
    
    ### What changes were proposed in this pull request?
    
    This is a follow-up of apache/spark#54412.
    
    Fix `_ignore_arrow_dtypes` util for DataFrame.
    
    ### Why are the changes needed?
    
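    There is a bug in `_ignore_arrow_dtypes` for the DataFrame case.
    
    Iterating over `obj.columns` yields the column labels (strings here), not the column `Series`, so the `col.dtype` check never ran against an actual column. Iterating over `obj.items()` yields `(name, Series)` pairs instead.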
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    The existing tests should pass.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #54430 from ueshin/issues/SPARK-55624/fix.
    
    Authored-by: Takuya Ueshin <[email protected]>
    Signed-off-by: Takuya Ueshin <[email protected]>
---
 python/pyspark/testing/pandasutils.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/python/pyspark/testing/pandasutils.py b/python/pyspark/testing/pandasutils.py
index ff5bfa8d0d48..3c529e524a2d 100644
--- a/python/pyspark/testing/pandasutils.py
+++ b/python/pyspark/testing/pandasutils.py
@@ -498,13 +498,13 @@ class PandasOnSparkTestUtils:
         else:
             if isinstance(obj, pd.DataFrame):
                 arrow_boolean_columns = [
-                    col
-                    for col in obj.columns
+                    name
+                    for name, col in obj.items()
                     if isinstance(col.dtype, pd.ArrowDtype)
                     and col.dtype.pyarrow_dtype == pa.bool_()
                 ]
                 if arrow_boolean_columns:
-                    return obj.astype({col: "boolean" for col in arrow_boolean_columns})
+                    return obj.astype({name: "boolean" for name in arrow_boolean_columns})
             elif isinstance(obj, (pd.Series, pd.Index)):
                 if isinstance(obj.dtype, pd.ArrowDtype) and obj.dtype.pyarrow_dtype == pa.bool_():
                     return obj.astype("boolean")

