monircefalo opened a new issue, #43774:
URL: https://github.com/apache/arrow/issues/43774
### Describe the bug, including details regarding any error messages, version, and platform.
I'm working with PySpark and trying to use a `pandas_udf` on my macOS system with an M3 chip. My environment is Python 3.10 running from a virtual environment. The code runs fine until I introduce the `pandas_udf`, at which point it crashes with an `EOFError`.

Here is a simplified version of my code:
```python
from pyspark.sql import SparkSession
from typing import Iterator
import pandas as pd
from pyspark.sql.functions import pandas_udf, col

# Create a Spark session
spark = (
    SparkSession.builder.appName("InferenceValidation")
    .config("spark.driver.memory", "8g")
    .config("spark.executor.memory", "8g")
    .config("spark.sql.execution.arrow.pyspark.enabled", "false")
    .config("spark.sql.execution.arrow.pyspark.fallback.enabled", "true")
    .getOrCreate()
)

pdf = pd.DataFrame([1, 2, 3], columns=["x"])
df = spark.createDataFrame(pdf)
df.show()

@pandas_udf("long")
def plus_one(batch_iter: Iterator[pd.Series]) -> Iterator[pd.Series]:
    for x in batch_iter:
        yield x + 1

df.select(plus_one(col("x"))).show()
```
The first df.show() works successfully, but when I try to use the
pandas_udf, I get the following error:
```
24/08/21 10:32:52 ERROR ArrowPythonRunner: Python worker exited unexpectedly (crashed)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/Users/monir/Documents/work/image-inference/.venv/lib/python3.10/site-packages/pyspark/python/lib/pyspark.zip/pyspark/worker.py", line 1225, in main
    eval_type = read_int(infile)
  File "/Users/monir/Documents/work/image-inference/.venv/lib/python3.10/site-packages/pyspark/python/lib/pyspark.zip/pyspark/serializers.py", line 596, in read_int
    raise EOFError
EOFError
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:572)
    at org.apache.spark.sql.execution.python.PythonArrowOutput$$anon$1.read(PythonArrowOutput.scala:118)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:525)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:893)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:893)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
    at org.apache.spark.scheduler.Task.run(Task.scala:141)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1570)
Caused by: java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available
    at org.apache.arrow.memory.util.MemoryUtil.directBuffer(MemoryUtil.java:174)
    at org.apache.arrow.memory.ArrowBuf.getDirectBuffer(ArrowBuf.java:229)
    at org.apache.arrow.memory.ArrowBuf.nioBuffer(ArrowBuf.java:224)
    at org.apache.arrow.vector.ipc.WriteChannel.write(WriteChannel.java:133)
    at org.apache.arrow.vector.ipc.message.MessageSerializer.writeBatchBuffers(MessageSerializer.java:303)
    at org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:276)
    at org.apache.arrow.vector.ipc.ArrowWriter.writeRecordBatch(ArrowWriter.java:147)
    at org.apache.arrow.vector.ipc.ArrowWriter.writeBatch(ArrowWriter.java:133)
    at org.apache.spark.sql.execution.python.BasicPythonArrowInput.writeIteratorToArrowStream(PythonArrowInput.scala:140)
    at org.apache.spark.sql.execution.python.BasicPythonArrowInput.writeIteratorToArrowStream$(PythonArrowInput.scala:124)
    at org.apache.spark.sql.execution.python.ArrowPythonRunner.writeIteratorToArrowStream(ArrowPythonRunner.scala:30)
    at org.apache.spark.sql.execution.python.PythonArrowInput$$anon$1.$anonfun$writeIteratorToStream$1(PythonArrowInput.scala:96)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
    at org.apache.spark.sql.execution.python.PythonArrowInput$$anon$1.writeIteratorToStream(PythonArrowInput.scala:102)
    at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:451)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1928)
    at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:282)
24/08/21 10:32:52 ERROR ArrowPythonRunner: This may have been caused by a prior exception:
java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available
    at org.apache.arrow.memory.util.MemoryUtil.directBuffer(MemoryUtil.java:174)
    at org.apache.arrow.memory.ArrowBuf.getDirectBuffer(ArrowBuf.java:229)
    at org.apache.arrow.memory.ArrowBuf.nioBuffer(ArrowBuf.java:224)
    at org.apache.arrow.vector.ipc.WriteChannel.write(WriteChannel.java:133)
    at org.apache.arrow.vector.ipc.message.MessageSerializer.writeBatchBuffers(MessageSerializer.java:303)
    at org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:276)
    at org.apache.arrow.vector.ipc.ArrowWriter.writeRecordBatch(ArrowWriter.java:147)
    at org.apache.arrow.vector.ipc.ArrowWriter.writeBatch(ArrowWriter.java:133)
    at org.apache.spark.sql.execution.python.BasicPythonArrowInput.writeIteratorToArrowStream(PythonArrowInput.scala:140)
    at org.apache.spark.sql.execution.python.BasicPythonArrowInput.writeIteratorToArrowStream$(PythonArrowInput.scala:124)
    at org.apache.spark.sql.execution.python.ArrowPythonRunner.writeIteratorToArrowStream(ArrowPythonRunner.scala:30)
    at org.apache.spark.sql.execution.python.PythonArrowInput$$anon$1.$anonfun$writeIteratorToStream$1(PythonArrowInput.scala:96)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
    at org.apache.spark.sql.execution.python.PythonArrowInput$$anon$1.writeIteratorToStream(PythonArrowInput.scala:102)
    at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:451)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1928)
    at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:282)
24/08/21 10:32:52 ERROR Executor: Exception in task 2.0 in stage 4.0 (TID 14)
java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available
    at org.apache.arrow.memory.util.MemoryUtil.directBuffer(MemoryUtil.java:174)
    at org.apache.arrow.memory.ArrowBuf.getDirectBuffer(ArrowBuf.java:229)
    at org.apache.arrow.memory.ArrowBuf.nioBuffer(ArrowBuf.java:224)
    at org.apache.arrow.vector.ipc.WriteChannel.write(WriteChannel.java:133)
    at org.apache.arrow.vector.ipc.message.MessageSerializer.writeBatchBuffers(MessageSerializer.java:303)
    at org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:276)
    at org.apache.arrow.vector.ipc.ArrowWriter.writeRecordBatch(ArrowWriter.java:147)
    at org.apache.arrow.vector.ipc.ArrowWriter.writeBatch(ArrowWriter.java:133)
    at org.apache.spark.sql.execution.python.BasicPythonArrowInput.writeIteratorToArrowStream(PythonArrowInput.scala:140)
    at org.apache.spark.sql.execution.python.BasicPythonArrowInput.writeIteratorToArrowStream$(PythonArrowInput.scala:124)
    at org.apache.spark.sql.execution.python.ArrowPythonRunner.writeIteratorToArrowStream(ArrowPythonRunner.scala:30)
    at org.apache.spark.sql.execution.python.PythonArrowInput$$anon$1.$anonfun$writeIteratorToStream$1(PythonArrowInput.scala:96)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
    at org.apache.spark.sql.execution.python.PythonArrowInput$$anon$1.writeIteratorToStream(PythonArrowInput.scala:102)
    at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:451)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1928)
    at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:282)
24/08/21 10:32:52 WARN TaskSetManager: Lost task 2.0 in stage 4.0 (TID 14) (192.168.101.56 executor driver): java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available
    at org.apache.arrow.memory.util.MemoryUtil.directBuffer(MemoryUtil.java:174)
    at org.apache.arrow.memory.ArrowBuf.getDirectBuffer(ArrowBuf.java:229)
    at org.apache.arrow.memory.ArrowBuf.nioBuffer(ArrowBuf.java:224)
    at org.apache.arrow.vector.ipc.WriteChannel.write(WriteChannel.java:133)
    at org.apache.arrow.vector.ipc.message.MessageSerializer.writeBatchBuffers(MessageSerializer.java:303)
    at org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:276)
    at org.apache.arrow.vector.ipc.ArrowWriter.writeRecordBatch(ArrowWriter.java:147)
    at org.apache.arrow.vector.ipc.ArrowWriter.writeBatch(ArrowWriter.java:133)
    at org.apache.spark.sql.execution.python.BasicPythonArrowInput.writeIteratorToArrowStream(PythonArrowInput.scala:140)
    at org.apache.spark.sql.execution.python.BasicPythonArrowInput.writeIteratorToArrowStream$(PythonArrowInput.scala:124)
    at org.apache.spark.sql.execution.python.ArrowPythonRunner.writeIteratorToArrowStream(ArrowPythonRunner.scala:30)
    at org.apache.spark.sql.execution.python.PythonArrowInput$$anon$1.$anonfun$writeIteratorToStream$1(PythonArrowInput.scala:96)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
    at org.apache.spark.sql.execution.python.PythonArrowInput$$anon$1.writeIteratorToStream(PythonArrowInput.scala:102)
    at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:451)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1928)
    at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:282)
24/08/21 10:32:52 ERROR TaskSetManager: Task 2 in stage 4.0 failed 1 times; aborting job
Traceback (most recent call last):
  File "/Users/monir/Documents/work/image-inference/pandas_udf_3.py", line 24, in <module>
    df.select(plus_one(col("x"))).show()
  File "/Users/monir/Documents/work/image-inference/.venv/lib/python3.10/site-packages/pyspark/sql/dataframe.py", line 947, in show
    print(self._show_string(n, truncate, vertical))
  File "/Users/monir/Documents/work/image-inference/.venv/lib/python3.10/site-packages/pyspark/sql/dataframe.py", line 965, in _show_string
    return self._jdf.showString(n, 20, vertical)
  File "/Users/monir/Documents/work/image-inference/.venv/lib/python3.10/site-packages/py4j/java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
  File "/Users/monir/Documents/work/image-inference/.venv/lib/python3.10/site-packages/pyspark/errors/exceptions/captured.py", line 179, in deco
    return f(*a, **kw)
  File "/Users/monir/Documents/work/image-inference/.venv/lib/python3.10/site-packages/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o74.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 4.0 failed 1 times, most recent failure: Lost task 2.0 in stage 4.0 (TID 14) (192.168.101.56 executor driver): java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available
    at org.apache.arrow.memory.util.MemoryUtil.directBuffer(MemoryUtil.java:174)
    at org.apache.arrow.memory.ArrowBuf.getDirectBuffer(ArrowBuf.java:229)
    at org.apache.arrow.memory.ArrowBuf.nioBuffer(ArrowBuf.java:224)
    at org.apache.arrow.vector.ipc.WriteChannel.write(WriteChannel.java:133)
    at org.apache.arrow.vector.ipc.message.MessageSerializer.writeBatchBuffers(MessageSerializer.java:303)
    at org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:276)
    at org.apache.arrow.vector.ipc.ArrowWriter.writeRecordBatch(ArrowWriter.java:147)
    at org.apache.arrow.vector.ipc.ArrowWriter.writeBatch(ArrowWriter.java:133)
    at org.apache.spark.sql.execution.python.BasicPythonArrowInput.writeIteratorToArrowStream(PythonArrowInput.scala:140)
    at org.apache.spark.sql.execution.python.BasicPythonArrowInput.writeIteratorToArrowStream$(PythonArrowInput.scala:124)
    at org.apache.spark.sql.execution.python.ArrowPythonRunner.writeIteratorToArrowStream(ArrowPythonRunner.scala:30)
    at org.apache.spark.sql.execution.python.PythonArrowInput$$anon$1.$anonfun$writeIteratorToStream$1(PythonArrowInput.scala:96)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
    at org.apache.spark.sql.execution.python.PythonArrowInput$$anon$1.writeIteratorToStream(PythonArrowInput.scala:102)
    at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:451)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1928)
    at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:282)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2856)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2792)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2791)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2791)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1247)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1247)
    at scala.Option.foreach(Option.scala:407)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1247)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3060)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2994)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2983)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:989)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2393)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2414)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2433)
    at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:530)
    at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:483)
    at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:61)
    at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:4334)
    at org.apache.spark.sql.Dataset.$anonfun$head$1(Dataset.scala:3316)
    at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:4324)
    at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:546)
    at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:4322)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4322)
    at org.apache.spark.sql.Dataset.head(Dataset.scala:3316)
    at org.apache.spark.sql.Dataset.take(Dataset.scala:3539)
    at org.apache.spark.sql.Dataset.getRows(Dataset.scala:280)
    at org.apache.spark.sql.Dataset.showString(Dataset.scala:315)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
    at java.base/java.lang.reflect.Method.invoke(Method.java:580)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.base/java.lang.Thread.run(Thread.java:1570)
Caused by: java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available
    at org.apache.arrow.memory.util.MemoryUtil.directBuffer(MemoryUtil.java:174)
    at org.apache.arrow.memory.ArrowBuf.getDirectBuffer(ArrowBuf.java:229)
    at org.apache.arrow.memory.ArrowBuf.nioBuffer(ArrowBuf.java:224)
    at org.apache.arrow.vector.ipc.WriteChannel.write(WriteChannel.java:133)
    at org.apache.arrow.vector.ipc.message.MessageSerializer.writeBatchBuffers(MessageSerializer.java:303)
    at org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:276)
    at org.apache.arrow.vector.ipc.ArrowWriter.writeRecordBatch(ArrowWriter.java:147)
    at org.apache.arrow.vector.ipc.ArrowWriter.writeBatch(ArrowWriter.java:133)
    at org.apache.spark.sql.execution.python.BasicPythonArrowInput.writeIteratorToArrowStream(PythonArrowInput.scala:140)
    at org.apache.spark.sql.execution.python.BasicPythonArrowInput.writeIteratorToArrowStream$(PythonArrowInput.scala:124)
    at org.apache.spark.sql.execution.python.ArrowPythonRunner.writeIteratorToArrowStream(ArrowPythonRunner.scala:30)
    at org.apache.spark.sql.execution.python.PythonArrowInput$$anon$1.$anonfun$writeIteratorToStream$1(PythonArrowInput.scala:96)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
    at org.apache.spark.sql.execution.python.PythonArrowInput$$anon$1.writeIteratorToStream(PythonArrowInput.scala:102)
    at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:451)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1928)
    at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:282)
```
I've already tried setting `spark.sql.execution.arrow.pyspark.enabled` to `false` and `spark.sql.execution.arrow.pyspark.fallback.enabled` to `true`, but the issue persists.
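As another isolation step, I'm thinking of rewriting the same logic as a plain row-at-a-time Python UDF; this is only a sketch I haven't exercised yet, but since it shouldn't go through the Arrow serialization path, it may help confirm that the crash is specific to the Arrow-based `pandas_udf` transfer:

```python
# Plain (non-Arrow) Python UDF with the same logic, for comparison only.
# If this runs while the pandas_udf version crashes, the Arrow path is the culprit.
from pyspark.sql.functions import udf

@udf("long")
def plus_one_plain(x):
    return x + 1 if x is not None else None

df.select(plus_one_plain(col("x"))).show()
```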
Has anyone encountered a similar problem, or can anyone suggest a way to resolve this crash?
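One thing I have not tried yet is passing JVM module options to the driver, since the underlying `UnsupportedOperationException` suggests the JVM is blocking Arrow's reflective access to `java.nio.DirectByteBuffer`. Below is a sketch of what I had in mind; I'm assuming `PYSPARK_SUBMIT_ARGS` is the right hook when launching from a plain Python script (rather than via `spark-submit`) and that these are the flags Arrow needs, so please correct me if that's the wrong direction:

```python
import os

# Sketch (untested): open java.nio to unnamed modules and let Netty/Arrow use
# reflection. These options must be in place before the driver JVM is launched,
# so the env var has to be set before the SparkSession is created.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    '--driver-java-options "--add-opens=java.base/java.nio=ALL-UNNAMED '
    '-Dio.netty.tryReflectionSetAccessible=true" pyspark-shell'
)

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("InferenceValidation").getOrCreate()
```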
System information: Python 3.10 in a virtual environment.
OS: macOS-14.6.1-arm64-arm-64bit
Installed packages:
```
numpy==1.26.4
pandas==2.2.2
py4j==0.10.9.7
pyarrow==17.0.0
pyspark==3.5.2
python-dateutil==2.9.0.post0
pytz==2024.1
six==1.16.0
tzdata==2024.1
```
Java version:
```
openjdk 22.0.2 2024-07-16
OpenJDK Runtime Environment Homebrew (build 22.0.2)
OpenJDK 64-Bit Server VM Homebrew (build 22.0.2, mixed mode, sharing)
```
Note: all of the code above works fine on Ubuntu 22.04 with the same package configuration.
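Given that the same code works on Ubuntu with the same Python packages, I suspect the JVM is the real difference: the Spark 3.5 documentation lists Java 8/11/17, while I'm running OpenJDK 22 here. A sketch of what I plan to try next, assuming PySpark picks up `JAVA_HOME` when it launches the gateway and that a JDK 17 is registered with macOS (the lookup below would need adjusting for a Homebrew keg-only install):

```python
import os
import subprocess

# Sketch (untested): point PySpark at an installed JDK 17 before the session starts.
# /usr/libexec/java_home only finds JDKs registered with macOS; adjust if yours
# lives elsewhere.
os.environ["JAVA_HOME"] = subprocess.run(
    ["/usr/libexec/java_home", "-v", "17"],
    capture_output=True, text=True, check=True,
).stdout.strip()

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("InferenceValidation").getOrCreate()
```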
### Component(s)
Python