raulcd opened a new issue, #44986: URL: https://github.com/apache/arrow/issues/44986
### Describe the bug, including details regarding any error messages, version, and platform.

As seen here: https://github.com/apache/arrow/pull/44981#issuecomment-2529087381

When I tried to run "pyspark.sql.tests.arrow.test_arrow_grouped_map" and "pyspark.sql.tests.arrow.test_arrow_cogrouped_map", they failed due to missing pandas:

```
Traceback (most recent call last):
  File "/spark/python/pyspark/sql/tests/arrow/test_arrow_grouped_map.py", line 264, in test_self_join
    df2 = df.groupby("k").applyInArrow(arrow_func, schema="x long, y long")
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/spark/python/pyspark/sql/pandas/group_ops.py", line 809, in applyInArrow
    udf = pandas_udf(
          ^^^^^^^^^^^
  File "/spark/python/pyspark/sql/pandas/functions.py", line 372, in pandas_udf
    require_minimum_pandas_version()
  File "/spark/python/pyspark/sql/pandas/utils.py", line 43, in require_minimum_pandas_version
    raise PySparkImportError(
pyspark.errors.exceptions.base.PySparkImportError: [PACKAGE_NOT_INSTALLED] Pandas >= 2.0.0 must be installed; however, it was not found.
```

Those tests were never executed in the past, but it might be worth including them in the job since they are Arrow-related.

### Component(s)

Continuous Integration, Python
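For context on why the tests fail before any Arrow code runs: the traceback ends in PySpark's `require_minimum_pandas_version()` gate, which is invoked inside `pandas_udf` (and thus by `applyInArrow`) regardless of whether pandas is actually used by the UDF. A minimal, hypothetical sketch of such a dependency gate (this is not PySpark's actual implementation; the function name and version-parsing logic here are illustrative only) might look like:

```python
import importlib


def require_minimum_version(package: str, minimum: str) -> None:
    """Raise ImportError if `package` is missing or older than `minimum`.

    Hypothetical sketch of a dependency gate like PySpark's
    require_minimum_pandas_version. Assumes plain numeric versions
    (e.g. "2.0.0"); pre-release suffixes are not handled.
    """
    try:
        mod = importlib.import_module(package)
    except ImportError as e:
        # Mirrors the error class of message seen in the report.
        raise ImportError(
            f"[PACKAGE_NOT_INSTALLED] {package} >= {minimum} must be "
            f"installed; however, it was not found."
        ) from e
    have = tuple(int(p) for p in mod.__version__.split(".")[:3])
    want = tuple(int(p) for p in minimum.split(".")[:3])
    if have < want:
        raise ImportError(
            f"{package} >= {minimum} must be installed; "
            f"found {mod.__version__}."
        )
```

Because the gate fires at UDF-construction time, the fix on the CI side is simply to have pandas >= 2.0.0 present in the image that runs these two test modules.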