monircefalo opened a new issue, #43774:
URL: https://github.com/apache/arrow/issues/43774

   ### Describe the bug, including details regarding any error messages, 
version, and platform.
   
   I'm working with PySpark and trying to use a pandas_udf on my macOS system 
with an M3 chip. My environment is Python 3.10 running from a virtual 
environment. The code runs fine until I introduce the pandas_udf, at which 
point it crashes with an EOFError.
   
   Here is a simplified version of my code:
   
    ```python
   from pyspark.sql import SparkSession
   from typing import Iterator
   import pandas as pd
   from pyspark.sql.functions import pandas_udf, col
   
   # Create a Spark session
   spark = (SparkSession.builder.appName("InferenceValidation")
            .config("spark.driver.memory", "8g")
            .config("spark.executor.memory", "8g")
            .config("spark.sql.execution.arrow.pyspark.enabled", "false")
             .config("spark.sql.execution.arrow.pyspark.fallback.enabled", "true")
             .getOrCreate())
   
   pdf = pd.DataFrame([1, 2, 3], columns=["x"])
   df = spark.createDataFrame(pdf)
   df.show()
   
   @pandas_udf("long")
   def plus_one(batch_iter: Iterator[pd.Series]) -> Iterator[pd.Series]:
       for x in batch_iter:
           yield x + 1
   
   df.select(plus_one(col("x"))).show()
   ```
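
   As a sanity check (not part of the original report), the UDF body is plain pandas and runs fine outside Spark, which suggests the crash lives in the Arrow serialization layer between the JVM and the Python worker rather than in the function itself. A minimal standalone version of the same logic, with the Spark decorator removed:

   ```python
   from typing import Iterator

   import pandas as pd


   def plus_one(batch_iter: Iterator[pd.Series]) -> Iterator[pd.Series]:
       # Same body as the pandas_udf above, minus the Spark decorator.
       for x in batch_iter:
           yield x + 1


   # Feed it one batch, the way Spark would stream Series batches to the UDF.
   out = list(plus_one(iter([pd.Series([1, 2, 3])])))
   print(out[0].tolist())  # [2, 3, 4]
   ```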
   
   The first df.show() works successfully, but when I try to use the 
pandas_udf, I get the following error:
   
   ```
   24/08/21 10:32:52 ERROR ArrowPythonRunner: Python worker exited unexpectedly 
(crashed)
   org.apache.spark.api.python.PythonException: Traceback (most recent call 
last):
     File 
"/Users/monir/Documents/work/image-inference/.venv/lib/python3.10/site-packages/pyspark/python/lib/pyspark.zip/pyspark/worker.py",
 line 1225, in main
       eval_type = read_int(infile)
     File 
"/Users/monir/Documents/work/image-inference/.venv/lib/python3.10/site-packages/pyspark/python/lib/pyspark.zip/pyspark/serializers.py",
 line 596, in read_int
       raise EOFError
   EOFError
   
           at 
org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:572)
           at 
org.apache.spark.sql.execution.python.PythonArrowOutput$$anon$1.read(PythonArrowOutput.scala:118)
           at 
org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:525)
           at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
           at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
           at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
           at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown
 Source)
           at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
           at 
org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
           at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
           at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:893)
           at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:893)
           at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
           at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
           at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
           at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
           at 
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
           at org.apache.spark.scheduler.Task.run(Task.scala:141)
           at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
           at 
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
           at 
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
           at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
           at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
           at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
           at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
           at java.base/java.lang.Thread.run(Thread.java:1570)
   Caused by: java.lang.UnsupportedOperationException: sun.misc.Unsafe or 
java.nio.DirectByteBuffer.<init>(long, int) not available
           at 
org.apache.arrow.memory.util.MemoryUtil.directBuffer(MemoryUtil.java:174)
           at 
org.apache.arrow.memory.ArrowBuf.getDirectBuffer(ArrowBuf.java:229)
           at org.apache.arrow.memory.ArrowBuf.nioBuffer(ArrowBuf.java:224)
           at 
org.apache.arrow.vector.ipc.WriteChannel.write(WriteChannel.java:133)
           at 
org.apache.arrow.vector.ipc.message.MessageSerializer.writeBatchBuffers(MessageSerializer.java:303)
           at 
org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:276)
           at 
org.apache.arrow.vector.ipc.ArrowWriter.writeRecordBatch(ArrowWriter.java:147)
           at 
org.apache.arrow.vector.ipc.ArrowWriter.writeBatch(ArrowWriter.java:133)
           at 
org.apache.spark.sql.execution.python.BasicPythonArrowInput.writeIteratorToArrowStream(PythonArrowInput.scala:140)
           at 
org.apache.spark.sql.execution.python.BasicPythonArrowInput.writeIteratorToArrowStream$(PythonArrowInput.scala:124)
           at 
org.apache.spark.sql.execution.python.ArrowPythonRunner.writeIteratorToArrowStream(ArrowPythonRunner.scala:30)
           at 
org.apache.spark.sql.execution.python.PythonArrowInput$$anon$1.$anonfun$writeIteratorToStream$1(PythonArrowInput.scala:96)
           at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
           at 
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
           at 
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
           at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
           at 
org.apache.spark.sql.execution.python.PythonArrowInput$$anon$1.writeIteratorToStream(PythonArrowInput.scala:102)
           at 
org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:451)
           at 
org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1928)
           at 
org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:282)
   24/08/21 10:32:52 ERROR ArrowPythonRunner: This may have been caused by a 
prior exception:
   java.lang.UnsupportedOperationException: sun.misc.Unsafe or 
java.nio.DirectByteBuffer.<init>(long, int) not available
           at 
org.apache.arrow.memory.util.MemoryUtil.directBuffer(MemoryUtil.java:174)
           at 
org.apache.arrow.memory.ArrowBuf.getDirectBuffer(ArrowBuf.java:229)
           at org.apache.arrow.memory.ArrowBuf.nioBuffer(ArrowBuf.java:224)
           at 
org.apache.arrow.vector.ipc.WriteChannel.write(WriteChannel.java:133)
           at 
org.apache.arrow.vector.ipc.message.MessageSerializer.writeBatchBuffers(MessageSerializer.java:303)
           at 
org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:276)
           at 
org.apache.arrow.vector.ipc.ArrowWriter.writeRecordBatch(ArrowWriter.java:147)
           at 
org.apache.arrow.vector.ipc.ArrowWriter.writeBatch(ArrowWriter.java:133)
           at 
org.apache.spark.sql.execution.python.BasicPythonArrowInput.writeIteratorToArrowStream(PythonArrowInput.scala:140)
           at 
org.apache.spark.sql.execution.python.BasicPythonArrowInput.writeIteratorToArrowStream$(PythonArrowInput.scala:124)
           at 
org.apache.spark.sql.execution.python.ArrowPythonRunner.writeIteratorToArrowStream(ArrowPythonRunner.scala:30)
           at 
org.apache.spark.sql.execution.python.PythonArrowInput$$anon$1.$anonfun$writeIteratorToStream$1(PythonArrowInput.scala:96)
           at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
           at 
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
           at 
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
           at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
           at 
org.apache.spark.sql.execution.python.PythonArrowInput$$anon$1.writeIteratorToStream(PythonArrowInput.scala:102)
           at 
org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:451)
           at 
org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1928)
           at 
org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:282)
   24/08/21 10:32:52 ERROR Executor: Exception in task 2.0 in stage 4.0 (TID 14)
   java.lang.UnsupportedOperationException: sun.misc.Unsafe or 
java.nio.DirectByteBuffer.<init>(long, int) not available
           at 
org.apache.arrow.memory.util.MemoryUtil.directBuffer(MemoryUtil.java:174)
           at 
org.apache.arrow.memory.ArrowBuf.getDirectBuffer(ArrowBuf.java:229)
           at org.apache.arrow.memory.ArrowBuf.nioBuffer(ArrowBuf.java:224)
           at 
org.apache.arrow.vector.ipc.WriteChannel.write(WriteChannel.java:133)
           at 
org.apache.arrow.vector.ipc.message.MessageSerializer.writeBatchBuffers(MessageSerializer.java:303)
           at 
org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:276)
           at 
org.apache.arrow.vector.ipc.ArrowWriter.writeRecordBatch(ArrowWriter.java:147)
           at 
org.apache.arrow.vector.ipc.ArrowWriter.writeBatch(ArrowWriter.java:133)
           at 
org.apache.spark.sql.execution.python.BasicPythonArrowInput.writeIteratorToArrowStream(PythonArrowInput.scala:140)
           at 
org.apache.spark.sql.execution.python.BasicPythonArrowInput.writeIteratorToArrowStream$(PythonArrowInput.scala:124)
           at 
org.apache.spark.sql.execution.python.ArrowPythonRunner.writeIteratorToArrowStream(ArrowPythonRunner.scala:30)
           at 
org.apache.spark.sql.execution.python.PythonArrowInput$$anon$1.$anonfun$writeIteratorToStream$1(PythonArrowInput.scala:96)
           at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
           at 
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
           at 
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
           at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
           at 
org.apache.spark.sql.execution.python.PythonArrowInput$$anon$1.writeIteratorToStream(PythonArrowInput.scala:102)
           at 
org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:451)
           at 
org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1928)
           at 
org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:282)
   24/08/21 10:32:52 WARN TaskSetManager: Lost task 2.0 in stage 4.0 (TID 14) 
(192.168.101.56 executor driver): java.lang.UnsupportedOperationException: 
sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available
           at 
org.apache.arrow.memory.util.MemoryUtil.directBuffer(MemoryUtil.java:174)
           at 
org.apache.arrow.memory.ArrowBuf.getDirectBuffer(ArrowBuf.java:229)
           at org.apache.arrow.memory.ArrowBuf.nioBuffer(ArrowBuf.java:224)
           at 
org.apache.arrow.vector.ipc.WriteChannel.write(WriteChannel.java:133)
           at 
org.apache.arrow.vector.ipc.message.MessageSerializer.writeBatchBuffers(MessageSerializer.java:303)
           at 
org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:276)
           at 
org.apache.arrow.vector.ipc.ArrowWriter.writeRecordBatch(ArrowWriter.java:147)
           at 
org.apache.arrow.vector.ipc.ArrowWriter.writeBatch(ArrowWriter.java:133)
           at 
org.apache.spark.sql.execution.python.BasicPythonArrowInput.writeIteratorToArrowStream(PythonArrowInput.scala:140)
           at 
org.apache.spark.sql.execution.python.BasicPythonArrowInput.writeIteratorToArrowStream$(PythonArrowInput.scala:124)
           at 
org.apache.spark.sql.execution.python.ArrowPythonRunner.writeIteratorToArrowStream(ArrowPythonRunner.scala:30)
           at 
org.apache.spark.sql.execution.python.PythonArrowInput$$anon$1.$anonfun$writeIteratorToStream$1(PythonArrowInput.scala:96)
           at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
           at 
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
           at 
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
           at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
           at 
org.apache.spark.sql.execution.python.PythonArrowInput$$anon$1.writeIteratorToStream(PythonArrowInput.scala:102)
           at 
org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:451)
           at 
org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1928)
           at 
org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:282)
   
   24/08/21 10:32:52 ERROR TaskSetManager: Task 2 in stage 4.0 failed 1 times; 
aborting job
   Traceback (most recent call last):
     File "/Users/monir/Documents/work/image-inference/pandas_udf_3.py", line 
24, in <module>
       df.select(plus_one(col("x"))).show()
     File 
"/Users/monir/Documents/work/image-inference/.venv/lib/python3.10/site-packages/pyspark/sql/dataframe.py",
 line 947, in show
       print(self._show_string(n, truncate, vertical))
     File 
"/Users/monir/Documents/work/image-inference/.venv/lib/python3.10/site-packages/pyspark/sql/dataframe.py",
 line 965, in _show_string
       return self._jdf.showString(n, 20, vertical)
     File 
"/Users/monir/Documents/work/image-inference/.venv/lib/python3.10/site-packages/py4j/java_gateway.py",
 line 1322, in __call__
       return_value = get_return_value(
     File 
"/Users/monir/Documents/work/image-inference/.venv/lib/python3.10/site-packages/pyspark/errors/exceptions/captured.py",
 line 179, in deco
       return f(*a, **kw)
     File 
"/Users/monir/Documents/work/image-inference/.venv/lib/python3.10/site-packages/py4j/protocol.py",
 line 326, in get_return_value
       raise Py4JJavaError(
   py4j.protocol.Py4JJavaError: An error occurred while calling o74.showString.
   : org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 
in stage 4.0 failed 1 times, most recent failure: Lost task 2.0 in stage 4.0 
(TID 14) (192.168.101.56 executor driver): 
java.lang.UnsupportedOperationException: sun.misc.Unsafe or 
java.nio.DirectByteBuffer.<init>(long, int) not available
           at 
org.apache.arrow.memory.util.MemoryUtil.directBuffer(MemoryUtil.java:174)
           at 
org.apache.arrow.memory.ArrowBuf.getDirectBuffer(ArrowBuf.java:229)
           at org.apache.arrow.memory.ArrowBuf.nioBuffer(ArrowBuf.java:224)
           at 
org.apache.arrow.vector.ipc.WriteChannel.write(WriteChannel.java:133)
           at 
org.apache.arrow.vector.ipc.message.MessageSerializer.writeBatchBuffers(MessageSerializer.java:303)
           at 
org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:276)
           at 
org.apache.arrow.vector.ipc.ArrowWriter.writeRecordBatch(ArrowWriter.java:147)
           at 
org.apache.arrow.vector.ipc.ArrowWriter.writeBatch(ArrowWriter.java:133)
           at 
org.apache.spark.sql.execution.python.BasicPythonArrowInput.writeIteratorToArrowStream(PythonArrowInput.scala:140)
           at 
org.apache.spark.sql.execution.python.BasicPythonArrowInput.writeIteratorToArrowStream$(PythonArrowInput.scala:124)
           at 
org.apache.spark.sql.execution.python.ArrowPythonRunner.writeIteratorToArrowStream(ArrowPythonRunner.scala:30)
           at 
org.apache.spark.sql.execution.python.PythonArrowInput$$anon$1.$anonfun$writeIteratorToStream$1(PythonArrowInput.scala:96)
           at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
           at 
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
           at 
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
           at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
           at 
org.apache.spark.sql.execution.python.PythonArrowInput$$anon$1.writeIteratorToStream(PythonArrowInput.scala:102)
           at 
org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:451)
           at 
org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1928)
           at 
org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:282)
   
   Driver stacktrace:
           at 
org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2856)
           at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2792)
           at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2791)
           at 
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
           at 
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
           at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
           at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2791)
           at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1247)
           at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1247)
           at scala.Option.foreach(Option.scala:407)
           at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1247)
           at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3060)
           at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2994)
           at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2983)
           at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
           at 
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:989)
           at org.apache.spark.SparkContext.runJob(SparkContext.scala:2393)
           at org.apache.spark.SparkContext.runJob(SparkContext.scala:2414)
           at org.apache.spark.SparkContext.runJob(SparkContext.scala:2433)
           at 
org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:530)
           at 
org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:483)
           at 
org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:61)
           at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:4334)
           at org.apache.spark.sql.Dataset.$anonfun$head$1(Dataset.scala:3316)
           at 
org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:4324)
           at 
org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:546)
           at 
org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:4322)
           at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125)
           at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
           at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108)
           at 
org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
           at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66)
           at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4322)
           at org.apache.spark.sql.Dataset.head(Dataset.scala:3316)
           at org.apache.spark.sql.Dataset.take(Dataset.scala:3539)
           at org.apache.spark.sql.Dataset.getRows(Dataset.scala:280)
           at org.apache.spark.sql.Dataset.showString(Dataset.scala:315)
           at 
java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
           at java.base/java.lang.reflect.Method.invoke(Method.java:580)
           at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
           at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
           at py4j.Gateway.invoke(Gateway.java:282)
           at 
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
           at py4j.commands.CallCommand.execute(CallCommand.java:79)
           at 
py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
           at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
           at java.base/java.lang.Thread.run(Thread.java:1570)
   Caused by: java.lang.UnsupportedOperationException: sun.misc.Unsafe or 
java.nio.DirectByteBuffer.<init>(long, int) not available
           at 
org.apache.arrow.memory.util.MemoryUtil.directBuffer(MemoryUtil.java:174)
           at 
org.apache.arrow.memory.ArrowBuf.getDirectBuffer(ArrowBuf.java:229)
           at org.apache.arrow.memory.ArrowBuf.nioBuffer(ArrowBuf.java:224)
           at 
org.apache.arrow.vector.ipc.WriteChannel.write(WriteChannel.java:133)
           at 
org.apache.arrow.vector.ipc.message.MessageSerializer.writeBatchBuffers(MessageSerializer.java:303)
           at 
org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:276)
           at 
org.apache.arrow.vector.ipc.ArrowWriter.writeRecordBatch(ArrowWriter.java:147)
           at 
org.apache.arrow.vector.ipc.ArrowWriter.writeBatch(ArrowWriter.java:133)
           at 
org.apache.spark.sql.execution.python.BasicPythonArrowInput.writeIteratorToArrowStream(PythonArrowInput.scala:140)
           at 
org.apache.spark.sql.execution.python.BasicPythonArrowInput.writeIteratorToArrowStream$(PythonArrowInput.scala:124)
           at 
org.apache.spark.sql.execution.python.ArrowPythonRunner.writeIteratorToArrowStream(ArrowPythonRunner.scala:30)
           at 
org.apache.spark.sql.execution.python.PythonArrowInput$$anon$1.$anonfun$writeIteratorToStream$1(PythonArrowInput.scala:96)
           at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
           at 
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
           at 
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
           at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
           at 
org.apache.spark.sql.execution.python.PythonArrowInput$$anon$1.writeIteratorToStream(PythonArrowInput.scala:102)
           at 
org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:451)
           at 
org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1928)
           at 
org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:282)
   
   ```
   
   I've already tried setting `spark.sql.execution.arrow.pyspark.enabled` to `false` and `spark.sql.execution.arrow.pyspark.fallback.enabled` to `true`, but the issue persists.
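
   A possible workaround (untested here, and an assumption based on the `sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available` line in the trace): JDK 16+ strongly encapsulates the internals that Arrow's Java memory layer reaches into via reflection, and the Arrow Java documentation describes opening `java.base/java.nio` to Arrow with `--add-opens`. Passing that flag to the Spark JVM might look like this; the exact module target is taken from Arrow's documented JDK 9+ requirements, not verified on this setup:

   ```
   # spark-defaults.conf (or pass via spark-submit --conf /
   # SparkSession.builder.config before getOrCreate)
   spark.driver.extraJavaOptions    --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED
   spark.executor.extraJavaOptions  --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED
   ```

   Alternatively, since the same code reportedly works on Ubuntu 22.04, comparing the JDK there against the Homebrew OpenJDK 22 on this Mac (and matching an older LTS JDK if they differ) could confirm whether the JDK version is the trigger.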
   
   Has anyone encountered a similar problem, or can anyone suggest a solution 
to resolve this crash?
   
   System information: Python 3.10 in a virtual environment.
   OS: macOS-14.6.1-arm64-arm-64bit
   ```
   numpy==1.26.4
   pandas==2.2.2
   py4j==0.10.9.7
   pyarrow==17.0.0
   pyspark==3.5.2
   python-dateutil==2.9.0.post0
   pytz==2024.1
   six==1.16.0
   tzdata==2024.1
   ```
   
   Java version:
   ```
   openjdk 22.0.2 2024-07-16
   OpenJDK Runtime Environment Homebrew (build 22.0.2)
   OpenJDK 64-Bit Server VM Homebrew (build 22.0.2, mixed mode, sharing)
   ```
   
   Note: all of the code above works fine on Ubuntu 22.04 with the same package configuration.
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
