Dave:

I should have asked this first: what version of Solr are you using? There
was a problem with adding a doclist via SolrJ. I'm not sure whether it was
fixed in BETA, but it certainly is in the 4.0 GA release. Here's one
related JIRA, although it wasn't the main fix:
https://issues.apache.org/jira/browse/SOLR-3001. I suspect that's the
"known problem" Mark mentioned.

Because what you're seeing _sure_ sounds similar....
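
For reference, here is a rough sketch of the commitWithin pattern from my
earlier mail (quoted below). It only builds the request URLs for Solr's HTTP
update handler, so it needs nothing beyond the JDK. The class name, method
names, base URL, collection name, and the 10-second window are all made up
for illustration; commitWithin and commit are the request parameters the
update handler accepts.

```java
// Hypothetical helper: names and values are illustrative, not from the thread.
final class SolrUpdateUrls {

    // URL for posting a batch of documents. Solr commits them on its own
    // within commitWithinMs milliseconds, so no per-batch commit is needed.
    static String addUrl(String solrBaseUrl, String collection, int commitWithinMs) {
        return solrBaseUrl + "/" + collection + "/update?commitWithin=" + commitWithinMs;
    }

    // URL for the single explicit commit sent at the very end of the run.
    static String finalCommitUrl(String solrBaseUrl, String collection) {
        return solrBaseUrl + "/" + collection + "/update?commit=true";
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr"; // assumed host and port
        System.out.println(addUrl(base, "mycollection", 10000));
        System.out.println(finalCommitUrl(base, "mycollection"));
    }
}
```

Every batch would be posted to the first URL and the second would be hit
once when loading finishes. In SolrJ the rough equivalent is passing a
commitWithin value when adding documents instead of calling commit() after
each batch.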

Best
Erick


On Mon, Nov 19, 2012 at 12:49 PM, Buttler, David <buttl...@llnl.gov> wrote:

> Answers inline below
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Saturday, November 17, 2012 6:40 AM
> To: solr-user@lucene.apache.org
> Subject: Re: inconsistent number of results returned in solr cloud
>
> Hmmm, first an aside. If by "commit after every batch of documents" you
> mean after every call to server.add(doclist), there's no real need to do
> that unless you're striving for really low latency. The usual
> recommendation is to use commitWithin when adding and commit only at the
> very end of the run. This shouldn't actually be germane to your issue, just
> an FYI.
>
> DB> Good point.  The code for committing docs to solr is fairly old.  I
> will update it since I don't have a latency requirement.
>
> So you're saying that the inconsistency is permanent? By that I mean it
> keeps coming back inconsistently for minutes/hours/days?
>
> DB> Yes, it is permanent.  I have collections that have been up for weeks,
> and are still returning inconsistent results, and I haven't been adding any
> additional documents.
> DB> Related to this, I seem to have a discrepancy between the number of
> documents I think I am sending to solr, and the number of documents it is
> reporting.  I have tried reducing the number of shards for one of my small
> collections, so I deleted all references to this collection and reloaded
> it. I think I have 260 documents submitted (counted from a hadoop job).
> Solr returns a count of ~430 (it varies), and the first returned document
> is not consistent.
>
> I guess if I were trying to test this I'd need to know how you added
> subsequent collections. In particular what you did re: zookeeper as you
> added each collection.
>
> DB> These are my steps
> DB> 1. Create the collection via the HTTP API:
> http://<host>:<port>/solr/admin/collections?action=CREATE&name=<collection>&numShards=6&%20collection.configName=<collection>
> DB> 2. Relaunch one of my JVM processes, bootstrapping the collection:
> DB> java -Xmx16g -Dcollection.configName=<collection> -Djetty.port=<port>
> -DzkHost=<zkhost> -Dsolr.solr.home=<solr home> -DnumShards=6
> -Dbootstrap_confdir=conf -jar start.jar
> DB> 3. Load data
>
> DB> Let me know if something is unclear.  I can run through the process
> again and document it more carefully.
> DB>
> DB> Thanks for looking at it,
> DB> Dave
>
> Best
> Erick
>
>
> On Fri, Nov 16, 2012 at 2:55 PM, Buttler, David <buttl...@llnl.gov> wrote:
>
> > My typical way of adding documents is through SolrJ, where I commit after
> > every batch of documents (where the batch size is configurable).
> >
> > I have now tried committing several times, from the command line (curl)
> > with and without openSearcher=true.  It does not affect anything.
> >
> > Dave
> >
> > -----Original Message-----
> > From: Mark Miller [mailto:markrmil...@gmail.com]
> > Sent: Friday, November 16, 2012 11:04 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: inconsistent number of results returned in solr cloud
> >
> > How did you do the final commit? Can you try a lone commit (with
> > openSearcher=true) and see if that affects things?
> >
> > Trying to determine if this is a known issue or not.
> >
> > - Mark
> >
> > On Nov 16, 2012, at 1:34 PM, "Buttler, David" <buttl...@llnl.gov> wrote:
> >
> > > Hi all,
> > > I buried an issue in my last post, so let me pop it up.
> > >
> > > I have a cluster with 10 collections on it.  The first collection I
> > > loaded works perfectly.  But every subsequent collection returns an
> > > inconsistent number of results for each query.  The queries can be simply
> > > *:*, or more complex facet queries.  If I go to individual cores and issue
> > > the query, with distrib=false, I get a consistent number of results.  I am
> > > wondering if there is some delay in returning results from my shards, and
> > > the queried node just times out and displays the number of results that it
> > > has received so far.  If there is such a timeout, it must be very small, as
> > > my QTime is around 11 ms.
> > >
> > > Dave
> >
> >
>
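
The per-core check described in the original post (querying each core with
distrib=false and comparing against the distributed result) could be
sketched roughly as below. This is only an illustration: the class name,
method names, host, and core name are hypothetical, and the counts in main
are invented numbers, not data from this thread. It assumes one count is
taken per shard, so replicas are not double-counted.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for comparing per-shard counts with the distributed count.
final class ShardCountCheck {

    // Builds a match-all count query that stays on the addressed core
    // (distrib=false) and returns no rows (rows=0), only numFound.
    static String coreCountUrl(String coreBaseUrl) {
        try {
            return coreBaseUrl + "/select?q=" + URLEncoder.encode("*:*", "UTF-8")
                    + "&rows=0&distrib=false";
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Adds up numFound values, one per shard.
    static long sum(List<Long> perShardCounts) {
        long total = 0;
        for (long count : perShardCounts) {
            total += count;
        }
        return total;
    }

    // True when the distributed numFound equals the sum over the shards.
    static boolean matches(List<Long> perShardCounts, long distributedCount) {
        return sum(perShardCounts) == distributedCount;
    }

    public static void main(String[] args) {
        // Assumed core URL; real core names depend on the cluster.
        System.out.println(coreCountUrl("http://host1:8983/solr/mycore"));
        List<Long> counts = Arrays.asList(40L, 45L, 50L); // invented per-shard counts
        System.out.println("sum of shards = " + sum(counts));
        System.out.println("distributed count agrees: " + matches(counts, 135L));
    }
}
```

If the per-shard sum is stable but the distributed numFound is not, that
points at how the distributed request is assembled rather than at the
contents of any single core.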
