James,

Regarding your question about n users per index: this is a fine approach.
The largest social network that you know of uses the same approach for various
things, including full-text indices (not Solr, but close). You'd have to
maintain a user->shard/index mapping somewhere, of course. What should n be,
you ask? I'd say look at the overall index size against your server
capabilities (RAM, disk, CPU), and increase n up to the point where you're
maximizing your hardware at some target query rate.
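
To make the user->index mapping and the choice of n a little more concrete, here is a minimal sketch in Python. It is not tied to Solr in any way: the names `users_per_index` and `UserIndexMap`, the in-memory dict, and the idea of a fixed per-index size budget are all assumptions made for illustration. In practice the mapping would live in a database or similar, and n would be tuned by measuring against real hardware at your target query rate rather than computed from a formula.

```python
def users_per_index(avg_bytes_per_user: int, max_index_bytes: int) -> int:
    """Rough starting value for n: how many average-sized users fit
    within a per-index size budget (the budget itself is an assumption)."""
    return max(1, max_index_bytes // avg_bytes_per_user)


class UserIndexMap:
    """Hypothetical in-memory user -> index mapping.

    Fills the newest index until it holds n users, then opens a new one.
    A real deployment would persist this mapping somewhere durable.
    """

    def __init__(self, n: int):
        self.n = n
        self._user_to_index = {}   # user id -> index number
        self._counts = []          # number of users assigned to each index

    def index_for(self, user_id: str) -> int:
        # Existing users always resolve to the index they were assigned.
        if user_id in self._user_to_index:
            return self._user_to_index[user_id]
        # Open a new index when there is none yet or the newest one is full.
        if not self._counts or self._counts[-1] >= self.n:
            self._counts.append(0)
        idx = len(self._counts) - 1
        self._counts[idx] += 1
        self._user_to_index[user_id] = idx
        return idx


if __name__ == "__main__":
    # Example: ~500MB of index per user, 10GB budget per index.
    n = users_per_index(500 * 1024**2, 10 * 1024**3)
    mapping = UserIndexMap(n)
    print(n, mapping.index_for("james"))
```

Queries and updates for a given user would then be routed to whichever index `index_for` returns, and n can be raised or lowered as measurements come in.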

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: James Brady <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Wednesday, February 27, 2008 10:08:02 PM
> Subject: Strategy for handling large (and growing) index: horizontal 
> partitioning?
> 
> Hi all,
> Our current setup is a master and slave pair on a single machine,  
> with an index size of ~50GB.
> 
> Query and update times are still respectable, but commits are taking
> ~20% of time on the master, while our daily index optimise can take up to
> 4 hours...
> Here's the most relevant part of solrconfig.xml:
>      true
>      10
>      1000
>      10000
>      10000
> 
> I've given both master and slave 2.5GB of RAM.
> 
> Does an index optimise read and re-write the whole thing? If so,  
> taking about 4 hours is pretty good! However, the documentation here:
> http://wiki.apache.org/solr/CollectionDistribution?highlight=%28ten+minutes%29#head-cf174eea2524ae45171a8486a13eea8b6f511f8b
> states "Optimizations can take nearly ten minutes to run..." which  
> leads me to think that we've grossly misconfigured something...
> 
> Firstly, we would obviously love any way to reduce this optimise time  
> - I have yet to experiment extensively with the settings above, and  
> optimise frequency, but some general guidance would be great.
> 
> Secondly, this index size is increasing monotonically over time and as
> we acquire new users. We need to take action to ensure we can scale
> in the future. The approach we're favouring at the moment is  
> horizontal partitioning of indices by user id as our data suits this  
> scheme well. A given index would hold the indexed data for n users,  
> where n would probably be between 1 and 100 users, and we will have  
> multiple indices per search server.
> 
> Running a server per index is impractical, especially for a small n, so
> is a single Solr instance capable of managing multiple searchers and
> writers in this way? Following on from that, does anyone know of  
> limiting factors in Solr or Lucene that would influence our decision  
> on the value of n - the number of users per index?
> 
> Thanks!
> James
> 
> 
> 
> 

