Here is the stack trace.

java.lang.NullPointerException
        at org.apache.solr.client.solrj.io.comp.FieldComparator$2.compare(FieldComparator.java:85)
        at org.apache.solr.client.solrj.io.comp.FieldComparator.compare(FieldComparator.java:92)
        at org.apache.solr.client.solrj.io.comp.FieldComparator.compare(FieldComparator.java:30)
        at org.apache.solr.client.solrj.io.comp.MultiComp.compare(MultiComp.java:45)
        at org.apache.solr.client.solrj.io.comp.MultiComp.compare(MultiComp.java:33)
        at org.apache.solr.client.solrj.io.stream.CloudSolrStream$TupleWrapper.compareTo(CloudSolrStream.java:396)
        at org.apache.solr.client.solrj.io.stream.CloudSolrStream$TupleWrapper.compareTo(CloudSolrStream.java:381)
        at java.util.TreeMap.put(TreeMap.java:560)
        at java.util.TreeSet.add(TreeSet.java:255)
        at org.apache.solr.client.solrj.io.stream.CloudSolrStream._read(CloudSolrStream.java:366)
        at org.apache.solr.client.solrj.io.stream.CloudSolrStream.read(CloudSolrStream.java:353)
        at *.*.*.*.SolrStreamResultIterator$$anon$1.run(SolrStreamResultIterator.scala:101)
        at java.lang.Thread.run(Thread.java:745)

16/11/17 13:04:31 ERROR SolrStreamResultIterator: missing exponent number: char=A,position=106596
BEFORE='p":1477189323},{"uuid":"//699/UzOPQx6thu","timestamp": 6EA'
AFTER='E 1476861439},{"uuid":"//699/vG8k4Tj'

org.noggit.JSONParser$ParseException: missing exponent number: char=A,position=106596
BEFORE='p":1477189323},{"uuid":"//699/UzOPQx6thu","timestamp": 6EA'
AFTER='E 1476861439},{"uuid":"//699/vG8k4Tj'
        at org.noggit.JSONParser.err(JSONParser.java:356)
        at org.noggit.JSONParser.readExp(JSONParser.java:513)
        at org.noggit.JSONParser.readNumber(JSONParser.java:419)
        at org.noggit.JSONParser.next(JSONParser.java:845)
        at org.noggit.JSONParser.nextEvent(JSONParser.java:951)
        at org.noggit.ObjectBuilder.getObject(ObjectBuilder.java:127)
        at org.noggit.ObjectBuilder.getVal(ObjectBuilder.java:57)
        at org.noggit.ObjectBuilder.getVal(ObjectBuilder.java:37)
        at org.apache.solr.client.solrj.io.stream.JSONTupleStream.next(JSONTupleStream.java:84)
        at org.apache.solr.client.solrj.io.stream.SolrStream.read(SolrStream.java:147)
        at org.apache.solr.client.solrj.io.stream.CloudSolrStream$TupleWrapper.next(CloudSolrStream.java:413)
        at org.apache.solr.client.solrj.io.stream.CloudSolrStream._read(CloudSolrStream.java:365)
        at org.apache.solr.client.solrj.io.stream.CloudSolrStream.read(CloudSolrStream.java:353)

Thanks!

On Fri, Dec 16, 2016 at 11:45 PM, Reth RM <reth.ik...@gmail.com> wrote:

> If you could provide the JSON parse exception stack trace, it might help
> to pinpoint the issue there.
>
>
> On Fri, Dec 16, 2016 at 5:52 PM, Chetas Joshi <chetas.jo...@gmail.com>
> wrote:
>
> > Hi Joel,
> >
> > The only non-alphanumeric characters I have in my data are '+' and '/'.
> > I don't have any backslashes.
> >
> > If the special characters were the issue, I would get the JSON parsing
> > exceptions every time, irrespective of the index size and of the
> > available memory on the machine. That is not the case here. The
> > streaming API successfully returns all the documents when the index
> > size is small and fits in the available memory. That's the reason I am
> > confused.
> >
> > Thanks!
> >
> > On Fri, Dec 16, 2016 at 5:43 PM, Joel Bernstein <joels...@gmail.com>
> > wrote:
> >
> > > The Streaming API may have been throwing exceptions because the JSON
> > > special characters were not escaped. This was fixed in Solr 6.0.
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > > On Fri, Dec 16, 2016 at 4:34 PM, Chetas Joshi <chetas.jo...@gmail.com>
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > > I am running Solr 5.5.0.
> > > > It is a SolrCloud of 50 nodes and I have the following config for
> > > > all the collections:
> > > > maxShardsPerNode: 1
> > > > replicationFactor: 1
> > > >
> > > > I was using the Streaming API to get results back from Solr. It
> > > > worked fine for a while, until the index size grew beyond 40 GB per
> > > > shard (i.e. per node). It then started throwing JSON parsing
> > > > exceptions while reading the TupleStream data. FYI: I have other
> > > > services (YARN, Spark) deployed on the same boxes on which the Solr
> > > > shards are running. Spark jobs also use a lot of disk cache, so the
> > > > free disk cache available on a box varies a lot depending on what
> > > > else is running on it.
> > > >
> > > > Due to this issue, I moved to the cursor approach. It works fine,
> > > > but as we all know it is much slower than the streaming approach.
> > > >
> > > > Currently the index size per shard is 80 GB (the machine has 512 GB
> > > > of RAM, shared by different services/programs: heap, off-heap, and
> > > > the disk cache requirements).
> > > >
> > > > When I have enough RAM available on the machine (more than 80 GB,
> > > > so that all the index data can fit in memory), the Streaming API
> > > > succeeds without running into any exceptions.
> > > >
> > > > Questions:
> > > > How does the index data caching mechanism (for HDFS) differ between
> > > > the Streaming API and the cursorMark approach?
> > > > Why does the cursor work every time, while streaming works only
> > > > when there is a lot of free disk cache?
> > > >
> > > > Thank you.
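
For context, this is roughly what the Streaming API read loop discussed above
looks like with SolrJ 5.x. It is only a minimal sketch, not my exact code: the
ZooKeeper ensemble, collection name, and field names are placeholders, and it
uses the Map-based CloudSolrStream constructor that I believe ships with 5.5.

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.solr.client.solrj.io.Tuple;
    import org.apache.solr.client.solrj.io.stream.CloudSolrStream;

    public class StreamExample {
      public static void main(String[] args) throws Exception {
        String zkHost = "zk1:2181,zk2:2181,zk3:2181/solr";   // placeholder ZK ensemble

        // Parameters for the /export handler; fl and sort fields must be docValues fields.
        Map<String, String> params = new HashMap<>();
        params.put("q", "*:*");
        params.put("fl", "uuid,timestamp");
        params.put("sort", "uuid asc");
        params.put("qt", "/export");

        CloudSolrStream stream = new CloudSolrStream(zkHost, "collection1", params);
        try {
          stream.open();
          while (true) {
            Tuple tuple = stream.read();   // this read() is where the JSON parse error surfaces
            if (tuple.EOF) {
              break;
            }
            String uuid = tuple.getString("uuid");
            // ... consume the tuple ...
          }
        } finally {
          stream.close();
        }
      }
    }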
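And the cursorMark fallback looks roughly like this (again just a sketch with
placeholder names; it assumes "uuid" is the uniqueKey field used as the sort
tiebreaker). Note that the cursor goes through the regular /select handler
rather than /export.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.params.CursorMarkParams;

    public class CursorExample {
      public static void main(String[] args) throws Exception {
        CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181/solr"); // placeholder
        client.setDefaultCollection("collection1");                                      // placeholder

        SolrQuery query = new SolrQuery("*:*");
        query.setRows(10000);                               // page size, tune as needed
        query.setSort(SolrQuery.SortClause.asc("uuid"));    // cursor needs a sort on a unique field

        String cursorMark = CursorMarkParams.CURSOR_MARK_START;
        boolean done = false;
        while (!done) {
          query.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
          QueryResponse rsp = client.query(query);
          for (SolrDocument doc : rsp.getResults()) {
            // ... consume the document ...
          }
          String nextCursorMark = rsp.getNextCursorMark();
          done = cursorMark.equals(nextCursorMark);         // unchanged cursor means we are done
          cursorMark = nextCursorMark;
        }
        client.close();
      }
    }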