A good answer may also depend on WHY you want to restrict to 500K 
documents.

Are you seeking to reduce the time spent by Solr in determining the doc count?  
Are you just wanting to prevent people from moving too far into the result set?  
Is it the case that you can only display 6 digits for your return count? :)

If Solr is performing adequately, you could always just artificially restrict 
the result set.  Solr doesn't actually 'return' all 5M documents - it only 
returns the number you have specified in your query (as well as having some 
cache for the next results in anticipation of a subsequent query).  So, if the 
total count returned exceeds 500K, then just report 500K as the number of 
results, and similarly restrict how far a user can page through the results...

(And - you can (and sounds like you should) sort your results by descending 
post date so that you do in fact get the most recent ones coming back first...)
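
A rough sketch of what I mean, done on the client side in Python. This is only 
an illustration: the field name post_date and the helper names are made up, and 
the only Solr pieces assumed are the standard q, start, rows and sort 
parameters and the numFound value in the response.

```python
from urllib.parse import urlencode

MAX_RESULTS = 500_000  # the cap from the original question


def build_params(query, page, rows=10, date_field="post_date"):
    """Build Solr select parameters, refusing to page past MAX_RESULTS.

    date_field is a hypothetical field name; use whatever your schema
    calls the posting date.
    """
    start = page * rows
    if start >= MAX_RESULTS:
        raise ValueError("page is beyond the capped result set")
    # Never ask for rows that would cross the cap.
    rows = min(rows, MAX_RESULTS - start)
    return {
        "q": query,
        "start": start,
        "rows": rows,
        "sort": f"{date_field} desc",  # most recent documents first
    }


def reported_count(num_found):
    """Hit count to show the user, given numFound from Solr's response."""
    return min(num_found, MAX_RESULTS)


# Query string for the first page; append it to your core's select URL.
query_string = urlencode(build_params("India", 0))
```

Note that this does not make Solr do any less work; it only limits what the 
user sees and how far they can page.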

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com


> -----Original Message-----
> From: Ahmet Arslan [mailto:iori...@yahoo.com]
> Sent: Monday, July 11, 2011 7:43 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Restricting the Solr Posting List (retrieved set)
> 
> 
> > We want to search in an index in such a way that even if a
> > clause has a long
> > posting list - Solr should stop collecting documents for
> > the clause
> > after receiving X documents that match the clause.
> >
> > For example, if for query "India", Solr can return 5M
> > documents, we would
> > like to restrict the set at only 500K documents.
> >
> > The assumption is that since we are posting chronologically
> > - we would like
> > the X most recent documents to be matched for the clause
> > only.
> >
> > Is it possible anyway?
> 
> Looks like your use-case is suitable for time based sharding.
> http://wiki.apache.org/solr/DistributedSearch
> 
> Let's say you divide your shards according to months. You will have a
> separate core for each month.
> http://wiki.apache.org/solr/CoreAdmin
> 
> When a query comes in, you will hit the most recent core. If you don't
> obtain enough results, add a new value (the previous month's core) to the
> &shards= parameter.
> 
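
The "most recent core first, then widen" idea quoted above could be sketched 
roughly like this in Python. It is only an illustration under some 
assumptions: shard_urls is a newest-first list of shard addresses that you 
maintain yourself, and run_query is a hypothetical callable standing in for 
the actual HTTP request; it takes the comma-separated value for the shards 
parameter and returns numFound.

```python
def search_recent_first(shard_urls, run_query, wanted):
    """Query the newest shard first, adding older shards until enough hits.

    shard_urls: shard addresses ordered newest to oldest (hypothetical values).
    run_query:  hypothetical callable; receives the comma-separated string to
                send as the shards parameter and returns the hit count.
    wanted:     minimum number of hits we would like to have.
    Returns the list of shards used and the hit count of the last query.
    """
    used = []
    found = 0
    for url in shard_urls:
        used.append(url)
        # Re-run the query across all shards selected so far.
        found = run_query(",".join(used))
        if found >= wanted:
            break
    return used, found
```

Each widening step re-runs the query, so popular terms are answered from the 
newest core alone and only rare terms pay for the extra round trips.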