Hi Joel,

I don't have any Solr documents that have NULL values for the sort fields I use in my queries.
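For reference, a check along these lines would surface any documents that are missing a sort field (the ZooKeeper hosts, collection name, and field name below are placeholders, not my actual ones; the same fq pattern can be repeated for each sort field):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class MissingSortFieldCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper ensemble and collection name.
        CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181/solr");
        client.setDefaultCollection("my_collection");

        // Match only documents that have NO value in the sort field;
        // numFound should be 0 if every document has it populated.
        SolrQuery query = new SolrQuery("*:*");
        query.addFilterQuery("-my_sort_field:[* TO *]");
        query.setRows(0);

        QueryResponse rsp = client.query(query);
        System.out.println("docs missing my_sort_field: " + rsp.getResults().getNumFound());
        client.close();
    }
}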
Thanks!

On Sun, Dec 18, 2016 at 12:56 PM, Joel Bernstein <joels...@gmail.com> wrote:

> Ok, based on the stack trace I suspect one of your sort fields has NULL
> values, which in the 5x branch could produce null pointers if a segment
> had no values for a sort field. This is also fixed in the Solr 6x branch.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Sat, Dec 17, 2016 at 2:44 PM, Chetas Joshi <chetas.jo...@gmail.com> wrote:
>
> > Here is the stack trace.
> >
> > java.lang.NullPointerException
> >   at org.apache.solr.client.solrj.io.comp.FieldComparator$2.compare(FieldComparator.java:85)
> >   at org.apache.solr.client.solrj.io.comp.FieldComparator.compare(FieldComparator.java:92)
> >   at org.apache.solr.client.solrj.io.comp.FieldComparator.compare(FieldComparator.java:30)
> >   at org.apache.solr.client.solrj.io.comp.MultiComp.compare(MultiComp.java:45)
> >   at org.apache.solr.client.solrj.io.comp.MultiComp.compare(MultiComp.java:33)
> >   at org.apache.solr.client.solrj.io.stream.CloudSolrStream$TupleWrapper.compareTo(CloudSolrStream.java:396)
> >   at org.apache.solr.client.solrj.io.stream.CloudSolrStream$TupleWrapper.compareTo(CloudSolrStream.java:381)
> >   at java.util.TreeMap.put(TreeMap.java:560)
> >   at java.util.TreeSet.add(TreeSet.java:255)
> >   at org.apache.solr.client.solrj.io.stream.CloudSolrStream._read(CloudSolrStream.java:366)
> >   at org.apache.solr.client.solrj.io.stream.CloudSolrStream.read(CloudSolrStream.java:353)
> >   at *.*.*.*.SolrStreamResultIterator$$anon$1.run(SolrStreamResultIterator.scala:101)
> >   at java.lang.Thread.run(Thread.java:745)
> >
> > 16/11/17 13:04:31 ERROR SolrStreamResultIterator:missing exponent number:
> > char=A,position=106596
> > BEFORE='p":1477189323},{"uuid":"//699/UzOPQx6thu","timestamp": 6EA'
> > AFTER='E 1476861439},{"uuid":"//699/vG8k4Tj'
> >
> > org.noggit.JSONParser$ParseException: missing exponent number:
> > char=A,position=106596
> > BEFORE='p":1477189323},{"uuid":"//699/UzOPQx6thu","timestamp": 6EA'
> > AFTER='E 1476861439},{"uuid":"//699/vG8k4Tj'
> >   at org.noggit.JSONParser.err(JSONParser.java:356)
> >   at org.noggit.JSONParser.readExp(JSONParser.java:513)
> >   at org.noggit.JSONParser.readNumber(JSONParser.java:419)
> >   at org.noggit.JSONParser.next(JSONParser.java:845)
> >   at org.noggit.JSONParser.nextEvent(JSONParser.java:951)
> >   at org.noggit.ObjectBuilder.getObject(ObjectBuilder.java:127)
> >   at org.noggit.ObjectBuilder.getVal(ObjectBuilder.java:57)
> >   at org.noggit.ObjectBuilder.getVal(ObjectBuilder.java:37)
> >   at org.apache.solr.client.solrj.io.stream.JSONTupleStream.next(JSONTupleStream.java:84)
> >   at org.apache.solr.client.solrj.io.stream.SolrStream.read(SolrStream.java:147)
> >   at org.apache.solr.client.solrj.io.stream.CloudSolrStream$TupleWrapper.next(CloudSolrStream.java:413)
> >   at org.apache.solr.client.solrj.io.stream.CloudSolrStream._read(CloudSolrStream.java:365)
> >   at org.apache.solr.client.solrj.io.stream.CloudSolrStream.read(CloudSolrStream.java:353)
> >
> > Thanks!
> >
> > On Fri, Dec 16, 2016 at 11:45 PM, Reth RM <reth.ik...@gmail.com> wrote:
> >
> > > If you could provide the json parse exception stack trace, it might help
> > > to predict issue there.
> > >
> > > On Fri, Dec 16, 2016 at 5:52 PM, Chetas Joshi <chetas.jo...@gmail.com> wrote:
> > >
> > > > Hi Joel,
> > > >
> > > > The only NON-alphanumeric characters I have in my data are '+' and '/'.
> > > > I don't have any backslashes.
> > > >
> > > > If the special characters were the issue, I should get the JSON parsing
> > > > exceptions every time, irrespective of the index size and of the
> > > > available memory on the machine. That is not the case here. The
> > > > Streaming API successfully returns all the documents when the index
> > > > size is small and fits in the available memory. That's the reason I am
> > > > confused.
> > > >
> > > > Thanks!
> > > >
> > > > On Fri, Dec 16, 2016 at 5:43 PM, Joel Bernstein <joels...@gmail.com> wrote:
> > > >
> > > > > The Streaming API may have been throwing exceptions because the JSON
> > > > > special characters were not escaped. This was fixed in Solr 6.0.
> > > > >
> > > > > Joel Bernstein
> > > > > http://joelsolr.blogspot.com/
> > > > >
> > > > > On Fri, Dec 16, 2016 at 4:34 PM, Chetas Joshi <chetas.jo...@gmail.com> wrote:
> > > > >
> > > > > > Hello,
> > > > > >
> > > > > > I am running Solr 5.5.0.
> > > > > > It is a SolrCloud of 50 nodes, and I have the following config for
> > > > > > all the collections:
> > > > > > maxShardsPerNode: 1
> > > > > > replicationFactor: 1
> > > > > >
> > > > > > I was using the Streaming API to get results back from Solr. It
> > > > > > worked fine for a while, until the index data size grew beyond 40 GB
> > > > > > per shard (i.e. per node). It then started throwing JSON parsing
> > > > > > exceptions while reading the TupleStream data. FYI: I have other
> > > > > > services (YARN, Spark) deployed on the same boxes on which the Solr
> > > > > > shards are running. The Spark jobs also use a lot of disk cache, so
> > > > > > the free disk cache available on a box varies a lot depending on
> > > > > > what else is running on it.
> > > > > >
> > > > > > Due to this issue, I moved to the cursor approach. It works fine,
> > > > > > but, as we all know, it is much slower than the streaming approach.
> > > > > >
> > > > > > Currently the index size per shard is 80 GB. (The machine has 512 GB
> > > > > > of RAM, which is shared by different services/programs: heap,
> > > > > > off-heap, and the disk cache requirements.)
> > > > > >
> > > > > > When I have enough RAM available on the machine (more than 80 GB, so
> > > > > > that all the index data can fit in memory), the Streaming API
> > > > > > succeeds without running into any exceptions.
> > > > > >
> > > > > > Questions:
> > > > > > How does the index data caching mechanism (for HDFS) differ between
> > > > > > the Streaming API and the cursorMark approach?
> > > > > > Why does the cursor work every time, while streaming works only when
> > > > > > there is a lot of free disk cache?
> > > > > >
> > > > > > Thank you.
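For context, the cursor approach mentioned in the quoted thread is the standard cursorMark deep-paging loop. A minimal SolrJ sketch of it follows, assuming uuid is the uniqueKey field and with the ZooKeeper hosts, collection name, and sort fields as placeholders:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrQuery.SortClause;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorMarkExport {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper ensemble and collection name.
        CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181/solr");
        client.setDefaultCollection("my_collection");

        SolrQuery query = new SolrQuery("*:*");
        query.setRows(1000);
        // The sort must be deterministic and end with the uniqueKey field
        // (assumed here to be "uuid") as a tie-breaker.
        query.setSort(SortClause.asc("timestamp"));
        query.addSort(SortClause.asc("uuid"));

        String cursorMark = CursorMarkParams.CURSOR_MARK_START;
        boolean done = false;
        while (!done) {
            query.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
            QueryResponse rsp = client.query(query);
            for (SolrDocument doc : rsp.getResults()) {
                // process each document ...
            }
            String nextCursorMark = rsp.getNextCursorMark();
            // When the cursor stops advancing, every matching document has been read.
            done = cursorMark.equals(nextCursorMark);
            cursorMark = nextCursorMark;
        }
        client.close();
    }
}

Every page here is an ordinary query round-trip, which lines up with the "works fine but slower" behaviour described above.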