Hi,

Is there any reason that the Tuple class
<https://lucene.apache.org/solr/6_4_2/solr-solrj/org/apache/solr/client/solrj/io/Tuple.html>
does not implement Serializable, unlike SolrDocumentBase
<https://lucene.apache.org/solr/6_4_2/solr-solrj/org/apache/solr/common/SolrDocumentBase.html>,
which does?

In the spark-solr <https://github.com/LucidWorks/spark-solr> library, I want to
return an RDD of Tuple objects, but this fails because the Tuple class does
not implement Serializable:

> 2017-03-22 01:45:51,230 [Executor task launch worker-0] ERROR Executor  - Exception in task 0.0 in stage 0.0 (TID 0)
> java.io.NotSerializableException: org.apache.solr.client.solrj.io.Tuple
> Serialization stack:
>     - object not serializable (class: org.apache.solr.client.solrj.io.Tuple, value: org.apache.solr.client.solrj.io.Tuple@365e4da1)
>     - element of array (index: 0)
>     - array (class [Lorg.apache.solr.client.solrj.io.Tuple;, size 10)
>     at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
>     at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
>     at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:324)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)

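The stack trace shows Spark's JavaSerializer failing on an array of 10 Tuples. The same NotSerializableException can be reproduced with plain Java serialization and no Spark at all. The sketch below assumes a hypothetical stand-in class `PlainTuple` that, like Tuple in 6.4.2, does not implement Serializable; `SerializationCheck` is also a hypothetical helper name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: holds a field map but does NOT implement Serializable.
class PlainTuple {
    final Map<String, Object> fields = new HashMap<>();
}

final class SerializationCheck {
    private SerializationCheck() {}

    /** Returns true if Java serialization of obj succeeds, false on NotSerializableException. */
    static boolean isJavaSerializable(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            // Same exception type that Spark's JavaSerializer reported in the log.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Mirror the failing case: an array of 10 non-serializable tuples.
        PlainTuple[] batch = new PlainTuple[10];
        for (int i = 0; i < batch.length; i++) {
            batch[i] = new PlainTuple();
        }
        System.out.println(isJavaSerializable(batch)); // false
    }
}
```

The array itself is serializable; the write fails only when it reaches the first element, which matches the "element of array (index: 0)" line in the serialization stack.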

To get past this error, the Tuple class would need to implement Serializable.
Is there a reason not to do that?
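As a rough illustration of what the change might involve, here is a minimal sketch using a hypothetical `SerializableTuple` class (not the real SolrJ Tuple) that holds its values in a HashMap and implements Serializable. One caveat applies to the real class as well: marking the class Serializable only helps if every value stored in the map is itself serializable. `RoundTrip.copy` is a hypothetical helper that writes and reads back an object to confirm that it survives serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical class, not SolrJ's Tuple: shows the minimal shape of a serializable tuple.
class SerializableTuple implements Serializable {
    private static final long serialVersionUID = 1L;

    // HashMap is Serializable; each stored value must be too, or writeObject still fails.
    final Map<Object, Object> fields = new HashMap<>();

    SerializableTuple put(Object key, Object value) {
        fields.put(key, value);
        return this;
    }

    Object get(Object key) {
        return fields.get(key);
    }
}

final class RoundTrip {
    private RoundTrip() {}

    /** Serializes obj to bytes and deserializes it back into a new instance. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T copy(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        SerializableTuple t = new SerializableTuple().put("id", "doc1").put("count", 42L);
        SerializableTuple c = copy(t);
        System.out.println(c.get("id") + " " + c.get("count")); // doc1 42
    }
}
```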

We are currently working around this error by converting Tuple objects to
other objects, but for performance it would be better if we could work
with Tuple objects directly in Spark.
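For context, the conversion workaround can be as simple as copying each tuple's fields into a HashMap before the data leaves the executor. This is a sketch only. It assumes you can obtain the Tuple's underlying field map (in SolrJ 6.x that should be the public `fields` map or `getMap()`, but please verify against your version). `TupleConversion.toSerializableMap` is a hypothetical helper name.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class TupleConversion {
    private TupleConversion() {}

    /**
     * Copies a tuple's field map into a HashMap, which is Serializable.
     * Assumption: the caller passes the Tuple's underlying field map.
     * The values must themselves be Serializable for Spark to ship the result.
     */
    static HashMap<String, Object> toSerializableMap(Map<?, ?> fields) {
        HashMap<String, Object> copy = new HashMap<>();
        for (Map.Entry<?, ?> e : fields.entrySet()) {
            copy.put(String.valueOf(e.getKey()), e.getValue());
        }
        return copy;
    }

    public static void main(String[] args) {
        // Plain map standing in for a Tuple's fields.
        Map<Object, Object> fields = new LinkedHashMap<>();
        fields.put("id", "doc1");
        fields.put("score", 1.5);
        HashMap<String, Object> row = toSerializableMap(fields);
        System.out.println(row.get("id") + " " + row.get("score")); // doc1 1.5
    }
}
```

This costs one extra map allocation and copy per tuple. That per-record overhead is what making Tuple itself Serializable would avoid.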

Thanks,
-- 
Kiran Chitturi
