asfimport opened a new issue, #256:
URL: https://github.com/apache/arrow-java/issues/256
While running PySpark 3 with pandas 1.1.5 and pyarrow 2.0.0, the following
error occurs:
**Spark Code:**
```python
import pyarrow
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
df_sp = spark.createDataFrame(df)
df_sp.cache().count()
schema = df_sp.schema

def dummy_udf(data):
    return data

res = df_sp.groupby('col1').applyInPandas(dummy_udf, schema=schema)
print(res.cache().count())
print(res.toPandas())
```
**Exception:**
```
21/09/17 07:28:10 ERROR util.Utils: Uncaught exception in thread stdout writer for python3
java.lang.NoSuchMethodError: com.google.flatbuffers.FlatBufferBuilder.createString(Ljava/lang/CharSequence;)I
	at org.apache.arrow.vector.types.pojo.Field.getField(Field.java:204)
	at org.apache.arrow.vector.types.pojo.Schema.getSchema(Schema.java:178)
	at org.apache.arrow.vector.ipc.message.MessageSerializer.serializeMetadata(MessageSerializer.java:187)
	at org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:165)
	at org.apache.arrow.vector.ipc.ArrowWriter.ensureStarted(ArrowWriter.java:159)
	at org.apache.arrow.vector.ipc.ArrowWriter.start(ArrowWriter.java:112)
	at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.$anonfun$writeIteratorToStream$1(ArrowPythonRunner.scala:86)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
	at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.writeIteratorToStream(ArrowPythonRunner.scala:103)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:397)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:232)
21/09/17 07:28:10 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[stdout writer for python3,5,main]
java.lang.NoSuchMethodError: com.google.flatbuffers.FlatBufferBuilder.createString(Ljava/lang/CharSequence;)I
	at org.apache.arrow.vector.types.pojo.Field.getField(Field.java:204)
	at org.apache.arrow.vector.types.pojo.Schema.getSchema(Schema.java:178)
	at org.apache.arrow.vector.ipc.message.MessageSerializer.serializeMetadata(MessageSerializer.java:187)
	at org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:165)
	at org.apache.arrow.vector.ipc.ArrowWriter.ensureStarted(ArrowWriter.java:159)
	at org.apache.arrow.vector.ipc.ArrowWriter.start(ArrowWriter.java:112)
	at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.$anonfun$writeIteratorToStream$1(ArrowPythonRunner.scala:86)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
	at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.writeIteratorToStream(ArrowPythonRunner.scala:103)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:397)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:232)
21/09/17 07:28:10 WARN storage.BlockManager: Putting block rdd_25_69 failed due to exception org.apache.spark.SparkException: Python worker exited unexpectedly (crashed).
21/09/17 07:28:10 INFO memory.MemoryStore: MemoryStore cleared
21/09/17 07:28:10 INFO storage.BlockManager: BlockManager stopped
21/09/17 07:28:10 INFO util.ShutdownHookManager: Shutdown hook called
```
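The `NoSuchMethodError` on `FlatBufferBuilder.createString(CharSequence)` usually points to a classpath conflict: an older `flatbuffers-java` jar (which predates that overload) shadowing the version Arrow was built against. As a diagnostic sketch (the helper name and the `SPARK_HOME` layout are assumptions, not part of the original report), one could scan the jars on Spark's classpath for the class in question; finding it in an old or duplicate jar would explain the error:

```python
import glob
import os
import zipfile

def jars_containing(class_entry, jar_dir):
    """Return the jars under jar_dir that bundle the given class file.

    Hypothetical helper for spotting duplicate or stale copies of
    com/google/flatbuffers/FlatBufferBuilder.class on the classpath,
    a common cause of this kind of NoSuchMethodError.
    """
    hits = []
    for jar in sorted(glob.glob(os.path.join(jar_dir, "*.jar"))):
        with zipfile.ZipFile(jar) as zf:
            if class_entry in zf.namelist():
                hits.append(jar)
    return hits

# Example usage (assumes SPARK_HOME is set):
# jars_containing("com/google/flatbuffers/FlatBufferBuilder.class",
#                 os.path.join(os.environ["SPARK_HOME"], "jars"))
```

If more than one jar shows up, or the only hit is an old `flatbuffers-java`, removing or relocating the stale jar (or shading flatbuffers) would be the likely fix.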
**Reporter**: [Ranga Reddy](https://issues.apache.org/jira/browse/ARROW-14038)
<sub>**Note**: *This issue was originally created as [ARROW-14038](https://issues.apache.org/jira/browse/ARROW-14038). Please see the [migration documentation](https://github.com/apache/arrow/issues/14542) for further details.*</sub>