I took another look at the stack trace and I'm pretty sure the issue is
with NULL values in one of the sort fields. The null pointer is occurring
during the comparison of sort values. See line 85 of:
https://github.com/apache/lucene-solr/blob/branch_5_5/solr/solrj/src/java/org/apache/solr/client/solrj/io/comp/FieldComparator.java
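
For illustration only (this is not the FieldComparator code, just the general
failure mode), here's a tiny sketch of how a comparator that dereferences the
sort value directly throws an NPE as soon as one tuple has no value for the
sort field, and how a null-aware comparator avoids it:

    import java.util.Comparator;
    import java.util.HashMap;
    import java.util.Map;

    public class NullSortValueDemo {
        public static void main(String[] args) {
            Map<String, Object> withValue = new HashMap<>();
            withValue.put("id", "1");
            withValue.put("timestamp", 42L);

            Map<String, Object> missingValue = new HashMap<>();
            missingValue.put("id", "2");   // no value for the sort field

            // Null-unsafe: ((Long) null).compareTo(...) -> NullPointerException
            Comparator<Map<String, Object>> unsafe =
                (a, b) -> ((Long) a.get("timestamp"))
                              .compareTo((Long) b.get("timestamp"));

            // Null-aware variant: missing values sort first instead of blowing up
            Comparator<Long> nullsFirst =
                Comparator.nullsFirst(Comparator.<Long>naturalOrder());
            Comparator<Map<String, Object>> safe =
                Comparator.comparing(m -> (Long) m.get("timestamp"), nullsFirst);

            System.out.println(safe.compare(missingValue, withValue));   // -1
            System.out.println(unsafe.compare(missingValue, withValue)); // NPE
        }
    }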

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Dec 19, 2016 at 4:43 PM, Chetas Joshi <chetas.jo...@gmail.com>
wrote:

> Hi Joel,
>
> I don't have any Solr documents with NULL values in the sort fields I use
> in my queries.
>
> Thanks!
>
> On Sun, Dec 18, 2016 at 12:56 PM, Joel Bernstein <joels...@gmail.com>
> wrote:
>
> > Ok, based on the stack trace I suspect one of your sort fields has NULL
> > values, which in the 5.x branch could produce null pointers if a segment
> > had no values for a sort field. This is also fixed in the Solr 6.x branch.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Sat, Dec 17, 2016 at 2:44 PM, Chetas Joshi <chetas.jo...@gmail.com>
> > wrote:
> >
> > > Here is the stack trace.
> > >
> > > java.lang.NullPointerException
> > >
> > >         at org.apache.solr.client.solrj.io.comp.FieldComparator$2.compare(FieldComparator.java:85)
> > >         at org.apache.solr.client.solrj.io.comp.FieldComparator.compare(FieldComparator.java:92)
> > >         at org.apache.solr.client.solrj.io.comp.FieldComparator.compare(FieldComparator.java:30)
> > >         at org.apache.solr.client.solrj.io.comp.MultiComp.compare(MultiComp.java:45)
> > >         at org.apache.solr.client.solrj.io.comp.MultiComp.compare(MultiComp.java:33)
> > >         at org.apache.solr.client.solrj.io.stream.CloudSolrStream$TupleWrapper.compareTo(CloudSolrStream.java:396)
> > >         at org.apache.solr.client.solrj.io.stream.CloudSolrStream$TupleWrapper.compareTo(CloudSolrStream.java:381)
> > >         at java.util.TreeMap.put(TreeMap.java:560)
> > >         at java.util.TreeSet.add(TreeSet.java:255)
> > >         at org.apache.solr.client.solrj.io.stream.CloudSolrStream._read(CloudSolrStream.java:366)
> > >         at org.apache.solr.client.solrj.io.stream.CloudSolrStream.read(CloudSolrStream.java:353)
> > >         at *.*.*.*.SolrStreamResultIterator$$anon$1.run(SolrStreamResultIterator.scala:101)
> > >         at java.lang.Thread.run(Thread.java:745)
> > >
> > > 16/11/17 13:04:31 ERROR SolrStreamResultIterator:missing exponent number:
> > > char=A,position=106596
> > > BEFORE='p":1477189323},{"uuid":"//699/UzOPQx6thu","timestamp": 6EA'
> > > AFTER='E 1476861439},{"uuid":"//699/vG8k4Tj'
> > >
> > > org.noggit.JSONParser$ParseException: missing exponent number:
> > > char=A,position=106596
> > > BEFORE='p":1477189323},{"uuid":"//699/UzOPQx6thu","timestamp": 6EA'
> > > AFTER='E 1476861439},{"uuid":"//699/vG8k4Tj'
> > >
> > >         at org.noggit.JSONParser.err(JSONParser.java:356)
> > >         at org.noggit.JSONParser.readExp(JSONParser.java:513)
> > >         at org.noggit.JSONParser.readNumber(JSONParser.java:419)
> > >         at org.noggit.JSONParser.next(JSONParser.java:845)
> > >         at org.noggit.JSONParser.nextEvent(JSONParser.java:951)
> > >         at org.noggit.ObjectBuilder.getObject(ObjectBuilder.java:127)
> > >         at org.noggit.ObjectBuilder.getVal(ObjectBuilder.java:57)
> > >         at org.noggit.ObjectBuilder.getVal(ObjectBuilder.java:37)
> > >         at org.apache.solr.client.solrj.io.stream.JSONTupleStream.next(JSONTupleStream.java:84)
> > >         at org.apache.solr.client.solrj.io.stream.SolrStream.read(SolrStream.java:147)
> > >         at org.apache.solr.client.solrj.io.stream.CloudSolrStream$TupleWrapper.next(CloudSolrStream.java:413)
> > >         at org.apache.solr.client.solrj.io.stream.CloudSolrStream._read(CloudSolrStream.java:365)
> > >         at org.apache.solr.client.solrj.io.stream.CloudSolrStream.read(CloudSolrStream.java:353)
> > >
> > > Thanks!
> > >
> > > On Fri, Dec 16, 2016 at 11:45 PM, Reth RM <reth.ik...@gmail.com> wrote:
> > >
> > > > If you could provide the JSON parse exception stack trace, it might
> > > > help to pinpoint the issue.
> > > >
> > > >
> > > > On Fri, Dec 16, 2016 at 5:52 PM, Chetas Joshi <chetas.jo...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Joel,
> > > > >
> > > > > The only non-alphanumeric characters I have in my data are '+' and
> > > > > '/'. I don't have any backslashes.
> > > > >
> > > > > If special characters were the issue, I would get the JSON parsing
> > > > > exceptions every time, irrespective of the index size and of the
> > > > > available memory on the machine. That is not the case here. The
> > > > > streaming API successfully returns all the documents when the index
> > > > > is small enough to fit in the available memory. That's why I am
> > > > > confused.
> > > > >
> > > > > Thanks!
> > > > >
> > > > > On Fri, Dec 16, 2016 at 5:43 PM, Joel Bernstein <joels...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > The Streaming API may have been throwing exceptions because the
> > > > > > JSON special characters were not escaped. This was fixed in Solr
> > > > > > 6.0.
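> > > > > >
> > > > > > For reference, a minimal way to reproduce the exact noggit error
> > > > > > shown in the trace (assuming noggit is on the classpath; the value
> > > > > > mimics the corrupted "timestamp": 6EA token from the BEFORE text):
> > > > > >
> > > > > >     import java.io.IOException;
> > > > > >     import org.noggit.ObjectBuilder;
> > > > > >
> > > > > >     public class NoggitExponentDemo {
> > > > > >         public static void main(String[] args) throws IOException {
> > > > > >             // A number token that ends in 'E' with no digit after it
> > > > > >             // fails in JSONParser.readExp() with "missing exponent
> > > > > >             // number", e.g. when the streamed JSON is corrupted or
> > > > > >             // truncated in the middle of a number.
> > > > > >             ObjectBuilder.fromJSON("{\"timestamp\": 6EA}");
> > > > > >         }
> > > > > >     }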
> > > > > >
> > > > > > Joel Bernstein
> > > > > > http://joelsolr.blogspot.com/
> > > > > >
> > > > > > On Fri, Dec 16, 2016 at 4:34 PM, Chetas Joshi <chetas.jo...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > I am running Solr 5.5.0. It is a SolrCloud of 50 nodes and I have
> > > > > > > the following config for all the collections:
> > > > > > > maxShardsPerNode: 1
> > > > > > > replicationFactor: 1
> > > > > > >
> > > > > > > I was using the Streaming API to get results back from Solr. It
> > > > > > > worked fine for a while, until the index data size grew beyond
> > > > > > > 40 GB per shard (i.e. per node). It then started throwing JSON
> > > > > > > parsing exceptions while reading the TupleStream data. FYI: I
> > > > > > > have other services (Yarn, Spark) deployed on the same boxes on
> > > > > > > which the Solr shards are running. Spark jobs also use a lot of
> > > > > > > disk cache, so the free disk cache available on these boxes
> > > > > > > varies a lot depending on what else is running on the box.
> > > > > > >
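> > > > > > > For reference, the basic 5.x streaming read loop looks roughly
> > > > > > > like the sketch below (just a sketch: the zkHost, collection and
> > > > > > > field names are placeholders, and it assumes the 5.x Map-based
> > > > > > > CloudSolrStream constructor):
> > > > > > >
> > > > > > >     import java.util.HashMap;
> > > > > > >     import java.util.Map;
> > > > > > >     import org.apache.solr.client.solrj.io.Tuple;
> > > > > > >     import org.apache.solr.client.solrj.io.stream.CloudSolrStream;
> > > > > > >
> > > > > > >     public class StreamReadSketch {
> > > > > > >         public static void main(String[] args) throws Exception {
> > > > > > >             Map<String, String> params = new HashMap<>();
> > > > > > >             params.put("q", "*:*");
> > > > > > >             params.put("qt", "/export");   // stream full result sets
> > > > > > >             params.put("fl", "uuid,timestamp");
> > > > > > >             params.put("sort", "timestamp asc,uuid asc");
> > > > > > >
> > > > > > >             CloudSolrStream stream = new CloudSolrStream(
> > > > > > >                 "zkhost1:2181,zkhost2:2181/solr", "collection1", params);
> > > > > > >             try {
> > > > > > >                 stream.open();
> > > > > > >                 while (true) {
> > > > > > >                     Tuple tuple = stream.read();
> > > > > > >                     if (tuple.EOF) {
> > > > > > >                         break;   // end-of-stream marker tuple
> > > > > > >                     }
> > > > > > >                     System.out.println(tuple.getString("uuid"));
> > > > > > >                 }
> > > > > > >             } finally {
> > > > > > >                 stream.close();
> > > > > > >             }
> > > > > > >         }
> > > > > > >     }
> > > > > > >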
> > > > > > > Due to this issue, I moved to using the cursor approach, and it
> > > > > > > works fine, but as we all know it is way slower than the
> > > > > > > streaming approach.
> > > > > > >
> > > > > > > Currently the index size per shard is 80 GB (the machine has
> > > > > > > 512 GB of RAM, which is shared by different services/programs
> > > > > > > for their heap/off-heap usage and disk cache requirements).
> > > > > > >
> > > > > > > When there is enough RAM available on the machine (more than
> > > > > > > 80 GB, so that all the index data can fit in memory), the
> > > > > > > streaming API succeeds without running into any exceptions.
> > > > > > >
> > > > > > > Questions:
> > > > > > > How does the index data caching mechanism (for HDFS) differ
> > > > > > > between the Streaming API and the cursorMark approach?
> > > > > > > Why does the cursor approach work every time, while streaming
> > > > > > > works only when there is a lot of free disk cache?
> > > > > > >
> > > > > > > Thank you.
> > > > > >
> > > > >
> > > >
> > >
> >
>
