This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new a43f93d76c9d [SPARK-46728][PYTHON] Check Pandas installation properly
a43f93d76c9d is described below

commit a43f93d76c9d0e12cf8c79419e55abd4601a1fe4
Author: Haejoon Lee <[email protected]>
AuthorDate: Wed Jan 24 10:33:05 2024 +0900

    [SPARK-46728][PYTHON] Check Pandas installation properly
    
    ### What changes were proposed in this pull request?
    
    This PR makes `require_minimum_pandas_version` verify that Pandas is actually installed, instead of relying only on whether `import pandas` succeeds.
    
    ### Why are the changes needed?
    
    The Pandas installation check does not work correctly: when Pandas is not installed, it can raise an improper exception (`AttributeError`) instead of the intended import error.
    
    This happens because uninstalling Pandas does not completely remove the `pandas` path while a related extension package (e.g. `pandas-stubs`) is still installed, so `import pandas` still succeeds.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No API change, but the user-facing error message now properly guides the user:
    
    **Before**
    ```python
    >>> import pyspark.pandas
    AttributeError: module 'pandas' has no attribute '__version__'
    ```
    
    **After**
    ```python
    >>> import pyspark.pandas
    pyspark.errors.exceptions.base.PySparkImportError: [PACKAGE_NOT_INSTALLED] Pandas >= 1.4.4 must be installed; however, it was not found.
    ```
    
    ### How was this patch tested?
    
    Manually tested
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #44745 from itholic/pandas_check.
    
    Authored-by: Haejoon Lee <[email protected]>
    Signed-off-by: Hyukjin Kwon <[email protected]>
---
 python/pyspark/sql/pandas/utils.py | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/sql/pandas/utils.py b/python/pyspark/sql/pandas/utils.py
index 63554c5a50ce..ff8183c61746 100644
--- a/python/pyspark/sql/pandas/utils.py
+++ b/python/pyspark/sql/pandas/utils.py
@@ -27,7 +27,15 @@ def require_minimum_pandas_version() -> None:
     try:
         import pandas
 
-        have_pandas = True
+        # Even if pandas is deleted, if the pandas extension package (e.g. pandas-stubs) is still
+        # installed, the pandas path will not be completely deleted.
+        # Therefore, even if the import is successful, additional check is required here to verify
+        # that pandas is actually installed properly.
+        if hasattr(pandas, "__version__"):
+            have_pandas = True
+        else:
+            have_pandas = False
+            raised_error = None
     except ImportError as error:
         have_pandas = False
         raised_error = error
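
The check added in the diff above can be illustrated outside PySpark with a minimal, standalone sketch. It assumes one plausible way the situation arises: a leftover directory with no `__init__.py` that Python imports as an empty namespace package. The commit message does not confirm that this is exactly what `pandas-stubs` leaves behind. The helper `is_properly_installed` and the `demo_*` package names are hypothetical and exist only for this illustration; they are not part of PySpark.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def is_properly_installed(module_name: str) -> bool:
    """Return True only if the module imports AND exposes `__version__`.

    Hypothetical helper mirroring the idea of the patch, not PySpark code.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    # A leftover directory can import as an empty namespace package, so a
    # successful import alone is not proof of a real installation.
    return hasattr(module, "__version__")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Assumption: a "leftover" package is modeled as a directory without
    # __init__.py; a "real" package defines __version__ in __init__.py.
    (root / "demo_leftover_pkg").mkdir()
    (root / "demo_real_pkg").mkdir()
    (root / "demo_real_pkg" / "__init__.py").write_text('__version__ = "1.0"\n')

    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    try:
        # The bare import succeeds even though nothing usable is installed.
        leftover_imports = importlib.import_module("demo_leftover_pkg") is not None
        leftover_ok = is_properly_installed("demo_leftover_pkg")
        real_ok = is_properly_installed("demo_real_pkg")
        missing_ok = is_properly_installed("demo_missing_pkg")
    finally:
        # Undo the changes to interpreter state made for the demo.
        sys.path.remove(str(root))
        for name in ("demo_leftover_pkg", "demo_real_pkg"):
            sys.modules.pop(name, None)

print(leftover_imports, leftover_ok, real_ok, missing_ok)
```

In this sketch the leftover directory imports without error yet fails the `__version__` check, which is the case the patch routes to the `have_pandas = False` branch instead of letting a later attribute access fail.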


