Here is the stack trace.

java.lang.NullPointerException
        at org.apache.solr.client.solrj.io.comp.FieldComparator$2.compare(FieldComparator.java:85)
        at org.apache.solr.client.solrj.io.comp.FieldComparator.compare(FieldComparator.java:92)
        at org.apache.solr.client.solrj.io.comp.FieldComparator.compare(FieldComparator.java:30)
        at org.apache.solr.client.solrj.io.comp.MultiComp.compare(MultiComp.java:45)
        at org.apache.solr.client.solrj.io.comp.MultiComp.compare(MultiComp.java:33)
        at org.apache.solr.client.solrj.io.stream.CloudSolrStream$TupleWrapper.compareTo(CloudSolrStream.java:396)
        at org.apache.solr.client.solrj.io.stream.CloudSolrStream$TupleWrapper.compareTo(CloudSolrStream.java:381)
        at java.util.TreeMap.put(TreeMap.java:560)
        at java.util.TreeSet.add(TreeSet.java:255)
        at org.apache.solr.client.solrj.io.stream.CloudSolrStream._read(CloudSolrStream.java:366)
        at org.apache.solr.client.solrj.io.stream.CloudSolrStream.read(CloudSolrStream.java:353)
        at *.*.*.*.SolrStreamResultIterator$$anon$1.run(SolrStreamResultIterator.scala:101)
        at java.lang.Thread.run(Thread.java:745)

16/11/17 13:04:31 ERROR SolrStreamResultIterator: missing exponent number: char=A,position=106596
BEFORE='p":1477189323},{"uuid":"//699/UzOPQx6thu","timestamp": 6EA'
AFTER='E 1476861439},{"uuid":"//699/vG8k4Tj'

org.noggit.JSONParser$ParseException: missing exponent number: char=A,position=106596
BEFORE='p":1477189323},{"uuid":"//699/UzOPQx6thu","timestamp": 6EA'
AFTER='E 1476861439},{"uuid":"//699/vG8k4Tj'
        at org.noggit.JSONParser.err(JSONParser.java:356)
        at org.noggit.JSONParser.readExp(JSONParser.java:513)
        at org.noggit.JSONParser.readNumber(JSONParser.java:419)
        at org.noggit.JSONParser.next(JSONParser.java:845)
        at org.noggit.JSONParser.nextEvent(JSONParser.java:951)
        at org.noggit.ObjectBuilder.getObject(ObjectBuilder.java:127)
        at org.noggit.ObjectBuilder.getVal(ObjectBuilder.java:57)
        at org.noggit.ObjectBuilder.getVal(ObjectBuilder.java:37)
        at org.apache.solr.client.solrj.io.stream.JSONTupleStream.next(JSONTupleStream.java:84)
        at org.apache.solr.client.solrj.io.stream.SolrStream.read(SolrStream.java:147)
        at org.apache.solr.client.solrj.io.stream.CloudSolrStream$TupleWrapper.next(CloudSolrStream.java:413)
        at org.apache.solr.client.solrj.io.stream.CloudSolrStream._read(CloudSolrStream.java:365)
        at org.apache.solr.client.solrj.io.stream.CloudSolrStream.read(CloudSolrStream.java:353)

Thanks!

On Fri, Dec 16, 2016 at 11:45 PM, Reth RM <reth.ik...@gmail.com> wrote:

> If you could provide the JSON parse exception stack trace, it might help
> to pinpoint the issue there.
>
>
> On Fri, Dec 16, 2016 at 5:52 PM, Chetas Joshi <chetas.jo...@gmail.com>
> wrote:
>
> > Hi Joel,
> >
> > The only non-alphanumeric characters I have in my data are '+' and '/'.
> > I don't have any backslashes.
> >
> > If the special characters were the issue, I would get the JSON parsing
> > exceptions every time, irrespective of the index size and of the
> > available memory on the machine. That is not the case here. The
> > streaming API successfully returns all the documents when the index
> > size is small and fits in the available memory. That's the reason I am
> > confused.
> >
> > Thanks!
> >
> > On Fri, Dec 16, 2016 at 5:43 PM, Joel Bernstein <joels...@gmail.com>
> > wrote:
> >
> > > The Streaming API may have been throwing exceptions because the JSON
> > > special characters were not escaped. This was fixed in Solr 6.0.
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > > On Fri, Dec 16, 2016 at 4:34 PM, Chetas Joshi <chetas.jo...@gmail.com>
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > > I am running Solr 5.5.0.
> > > > It is a SolrCloud of 50 nodes and I have the following config for
> > > > all the collections:
> > > > maxShardsPerNode: 1
> > > > replicationFactor: 1
> > > >
> > > > I was using the Streaming API to get results back from Solr. It
> > > > worked fine for a while, until the index size grew beyond 40 GB per
> > > > shard (i.e. per node). It then started throwing JSON parsing
> > > > exceptions while reading the TupleStream data. FYI: I have other
> > > > services (YARN, Spark) deployed on the same boxes on which the Solr
> > > > shards are running. Spark jobs also use a lot of disk cache, so the
> > > > free disk cache available on a box varies a lot depending on what
> > > > else is running on it.
> > > >
> > > > Due to this issue, I moved to the cursor approach. It works fine,
> > > > but as we all know it is much slower than the streaming approach.
> > > >
> > > > Currently the index size per shard is 80 GB (the machine has 512 GB
> > > > of RAM, shared by different services/programs: heap, off-heap, and
> > > > the disk cache requirements).
> > > >
> > > > When I have enough RAM available on the machine (more than 80 GB,
> > > > so that all the index data can fit in memory), the Streaming API
> > > > succeeds without running into any exceptions.
> > > >
> > > > Questions:
> > > > How does the index data caching mechanism (for HDFS) differ between
> > > > the Streaming API and the cursorMark approach?
> > > > Why does the cursor work every time, while streaming works only
> > > > when there is a lot of free disk cache?
> > > >
> > > > Thank you.
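
For context, this is roughly what the Streaming API read loop discussed above
looks like with SolrJ 5.x. It is only a minimal sketch, not my exact code: the
ZooKeeper ensemble, collection name, and field names are placeholders, and it
uses the Map-based CloudSolrStream constructor that I believe ships with 5.5.

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.solr.client.solrj.io.Tuple;
    import org.apache.solr.client.solrj.io.stream.CloudSolrStream;

    public class StreamExample {
      public static void main(String[] args) throws Exception {
        String zkHost = "zk1:2181,zk2:2181,zk3:2181/solr";   // placeholder ZK ensemble

        // Parameters for the /export handler; fl and sort fields must be docValues fields.
        Map<String, String> params = new HashMap<>();
        params.put("q", "*:*");
        params.put("fl", "uuid,timestamp");
        params.put("sort", "uuid asc");
        params.put("qt", "/export");

        CloudSolrStream stream = new CloudSolrStream(zkHost, "collection1", params);
        try {
          stream.open();
          while (true) {
            Tuple tuple = stream.read();   // this read() is where the JSON parse error surfaces
            if (tuple.EOF) {
              break;
            }
            String uuid = tuple.getString("uuid");
            // ... consume the tuple ...
          }
        } finally {
          stream.close();
        }
      }
    }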
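And the cursorMark fallback looks roughly like this (again just a sketch with
placeholder names; it assumes "uuid" is the uniqueKey field used as the sort
tiebreaker). Note that the cursor goes through the regular /select handler
rather than /export.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.params.CursorMarkParams;

    public class CursorExample {
      public static void main(String[] args) throws Exception {
        CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181/solr"); // placeholder
        client.setDefaultCollection("collection1");                                      // placeholder

        SolrQuery query = new SolrQuery("*:*");
        query.setRows(10000);                               // page size, tune as needed
        query.setSort(SolrQuery.SortClause.asc("uuid"));    // cursor needs a sort on a unique field

        String cursorMark = CursorMarkParams.CURSOR_MARK_START;
        boolean done = false;
        while (!done) {
          query.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
          QueryResponse rsp = client.query(query);
          for (SolrDocument doc : rsp.getResults()) {
            // ... consume the document ...
          }
          String nextCursorMark = rsp.getNextCursorMark();
          done = cursorMark.equals(nextCursorMark);         // unchanged cursor means we are done
          cursorMark = nextCursorMark;
        }
        client.close();
      }
    }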