Dave: I should have asked this first. What version of Solr are you using? I'm not sure whether it was fixed in BETA or not (it certainly is in the 4.0 GA release). There was a problem with adding a doclist via SolrJ. Here's one related JIRA, although it wasn't the main fix: https://issues.apache.org/jira/browse/SOLR-3001. I suspect that's the "known problem" Mark mentioned.
Because what you're seeing _sure_ sounds similar....

Best
Erick

On Mon, Nov 19, 2012 at 12:49 PM, Buttler, David <buttl...@llnl.gov> wrote:

> Answers inline below
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Saturday, November 17, 2012 6:40 AM
> To: solr-user@lucene.apache.org
> Subject: Re: inconsistent number of results returned in solr cloud
>
> Hmmm, first an aside. If by "commit after every batch of documents" you
> mean after every call to server.add(doclist), there's no real need to do
> that unless you're striving for really low latency. The usual
> recommendation is to use commitWithin when adding and commit only at the
> very end of the run. This shouldn't actually be germane to your issue, just
> an FYI.
>
> DB> Good point. The code for committing docs to solr is fairly old. I
> will update it since I don't have a latency requirement.
>
> So you're saying that the inconsistency is permanent? By that I mean it
> keeps coming back inconsistently for minutes/hours/days?
>
> DB> Yes, it is permanent. I have collections that have been up for weeks,
> and are still returning inconsistent results, and I haven't been adding any
> additional documents.
> DB> Related to this, I seem to have a discrepancy between the number of
> documents I think I am sending to solr, and the number of documents it is
> reporting. I have tried reducing the number of shards for one of my small
> collections, so I deleted all references to this collection, and reloaded
> it. I think I have 260 documents submitted (counted from a hadoop job).
> Solr returns a count of ~430 (it varies), and the first returned document
> is not consistent.
>
> I guess if I were trying to test this I'd need to know how you added
> subsequent collections. In particular what you did re: zookeeper as you
> added each collection.
>
> DB> These are my steps
> DB> 1. Create the collection via the HTTP API:
> http://<host>:<port>/solr/admin/collections?action=CREATE&name=<collection>&numShards=6&%20collection.configName=<collection>
> DB> 2. Relaunch one of my JVM processes, bootstrapping the collection:
> DB> java -Xmx16g -Dcollection.configName=<collection> -Djetty.port=<port>
> -DzkHost=<zkhost> -Dsolr.solr.home=<solr home> -DnumShards=6
> -Dbootstrap_confdir=conf -jar start.jar
> DB> load data
>
> DB> Let me know if something is unclear. I can run through the process
> again and document it more carefully.
> DB>
> DB> Thanks for looking at it,
> DB> Dave
>
> Best
> Erick
>
>
> On Fri, Nov 16, 2012 at 2:55 PM, Buttler, David <buttl...@llnl.gov> wrote:
>
> > My typical way of adding documents is through SolrJ, where I commit after
> > every batch of documents (where the batch size is configurable)
> >
> > I have now tried committing several times, from the command line (curl)
> > with and without openSearcher=true. It does not affect anything.
> >
> > Dave
> >
> > -----Original Message-----
> > From: Mark Miller [mailto:markrmil...@gmail.com]
> > Sent: Friday, November 16, 2012 11:04 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: inconsistent number of results returned in solr cloud
> >
> > How did you do the final commit? Can you try a lone commit (with
> > openSearcher=true) and see if that affects things?
> >
> > Trying to determine if this is a known issue or not.
> >
> > - Mark
> >
> > On Nov 16, 2012, at 1:34 PM, "Buttler, David" <buttl...@llnl.gov> wrote:
> >
> > > Hi all,
> > > I buried an issue in my last post, so let me pop it up.
> > >
> > > I have a cluster with 10 collections on it. The first collection I
> > > loaded works perfectly. But every subsequent collection returns an
> > > inconsistent number of results for each query. The queries can be simply
> > > *:*, or more complex facet queries. If I go to individual cores and issue
> > > the query, with distrib=false, I get a consistent number of results. I am
> > > wondering if there is some delay in returning results from my shards, and
> > > the queried node just times out and displays the number of results that it
> > > has received so far. If there is such a timeout, it must be very small, as
> > > my QTime is around 11 ms.
> > >
> > > Dave
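
For anyone repeating the per-core check described in the original message (query each core with distrib=false and compare against the distributed count), here is a minimal sketch in Python. It is only an illustration, not something posted in the thread: the base URL, the core name, and the helper names (select_url, num_found, fetch_count, is_consistent) are all hypothetical. It assumes the standard /select handler with wt=json, where the hit count is reported under response.numFound.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def select_url(base_url, core, distrib):
    """Build a count-only *:* query URL for one core or collection.

    base_url and core are placeholders, e.g. "http://localhost:8983/solr"
    and "core1"; substitute your own host, port, and core name.
    """
    params = {"q": "*:*", "rows": 0, "wt": "json"}
    if not distrib:
        # distrib=false restricts the query to the core that receives it.
        params["distrib"] = "false"
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


def num_found(body):
    """Extract numFound from a Solr JSON response body."""
    return json.loads(body)["response"]["numFound"]


def fetch_count(url, timeout=10):
    """Run the query and return numFound (needs a reachable Solr node)."""
    with urlopen(url, timeout=timeout) as resp:
        return num_found(resp.read().decode("utf-8"))


def is_consistent(counts):
    """True when repeated runs of the same query all returned one count."""
    return len(set(counts)) <= 1
```

Calling fetch_count several times on select_url(base, name, distrib=True) and passing the results to is_consistent would show whether the distributed count drifts, while the same loop with distrib=False against each core should confirm that the per-core counts stay fixed.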