raulcd opened a new issue, #44986:
URL: https://github.com/apache/arrow/issues/44986

   ### Describe the bug, including details regarding any error messages, 
version, and platform.
   
   As seen here:
   https://github.com/apache/arrow/pull/44981#issuecomment-2529087381
   When I tried to run `pyspark.sql.tests.arrow.test_arrow_grouped_map` and 
`pyspark.sql.tests.arrow.test_arrow_cogrouped_map`, they failed due to 
missing pandas:
   ```
    Traceback (most recent call last):
     File "/spark/python/pyspark/sql/tests/arrow/test_arrow_grouped_map.py", 
line 264, in test_self_join
       df2 = df.groupby("k").applyInArrow(arrow_func, schema="x long, y long")
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/spark/python/pyspark/sql/pandas/group_ops.py", line 809, in 
applyInArrow
       udf = pandas_udf(
             ^^^^^^^^^^^
     File "/spark/python/pyspark/sql/pandas/functions.py", line 372, in 
pandas_udf
       require_minimum_pandas_version()
     File "/spark/python/pyspark/sql/pandas/utils.py", line 43, in 
require_minimum_pandas_version
       raise PySparkImportError(
   pyspark.errors.exceptions.base.PySparkImportError: [PACKAGE_NOT_INSTALLED] 
Pandas >= 2.0.0 must be installed; however, it was not found.
   ```
   Those tests were never executed in the past, but it might be worth 
including them in the job since they are Arrow-related.
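   For context, the traceback shows the failure originates in PySpark's 
import guard, which verifies that pandas is importable and meets a minimum 
version before building the UDF. A minimal sketch of that kind of guard 
(the function and helper names here are illustrative, not PySpark's actual 
implementation):

   ```python
   import importlib.util

   # PySpark's documented minimum, per the error message above.
   MIN_PANDAS_VERSION = (2, 0, 0)

   def _parse_version(version: str) -> tuple:
       """Turn '2.1.0' into (2, 1, 0) for tuple comparison (sketch only)."""
       return tuple(int(part) for part in version.split(".")[:3])

   def require_minimum_pandas_version() -> None:
       """Raise ImportError if pandas is missing or older than the minimum."""
       if importlib.util.find_spec("pandas") is None:
           raise ImportError(
               "[PACKAGE_NOT_INSTALLED] Pandas >= 2.0.0 must be installed; "
               "however, it was not found."
           )
       import pandas

       if _parse_version(pandas.__version__) < MIN_PANDAS_VERSION:
           raise ImportError(
               f"Pandas >= 2.0.0 required, found {pandas.__version__}"
           )
   ```

   This is why the Arrow-only `applyInArrow` tests still need pandas in the 
CI image: the code path goes through `pandas_udf`, which runs this check 
unconditionally.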
   
   ### Component(s)
   
   Continuous Integration, Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
