James,

Regarding your question about n users per index: this is a fine approach. The largest social network you know of uses the same approach for various things, including full-text indices (not Solr, but close). You'd have to maintain a user->shard/index mapping somewhere, of course.

What should n be, you ask? I'd say look at the overall index size against your server capabilities (RAM, disk, CPU), and increase n up to the point where you're making full use of your hardware at some target query rate.
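To make the mapping and the sizing idea concrete, here is a minimal Python sketch, not anything Solr provides. The class `UserIndexMap`, the function `estimate_users_per_index`, and the `index_<k>` naming are all hypothetical; the in-memory dict stands in for whatever persistent store would really hold the mapping, and the sizing assumes every user contributes roughly the same amount of index data, which may well not hold.

```python
class UserIndexMap:
    """Hypothetical user -> index mapping: fill one index with n users, then open the next."""

    def __init__(self, users_per_index):
        self.users_per_index = users_per_index
        # Stand-in for a persistent store (database table, key-value store, ...).
        self.assignments = {}

    def index_for(self, user_id):
        # A user already assigned keeps the same index, so lookups stay stable.
        if user_id not in self.assignments:
            slot = len(self.assignments) // self.users_per_index
            self.assignments[user_id] = f"index_{slot}"
        return self.assignments[user_id]


def estimate_users_per_index(total_index_gb, total_users, target_index_gb):
    """Rough starting value for n, assuming all users are about the same size."""
    return max(1, int(target_index_gb * total_users / total_index_gb))


mapping = UserIndexMap(users_per_index=2)
for user in ["alice", "bob", "carol"]:
    print(user, "->", mapping.index_for(user))
```

The estimate only gives a starting point: the target index size per index would still have to be found by measuring query rate and optimise time on the actual hardware, then adjusting n.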
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: James Brady <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Wednesday, February 27, 2008 10:08:02 PM
> Subject: Strategy for handling large (and growing) index: horizontal partitioning?
>
> Hi all,
> Our current setup is a master and slave pair on a single machine,
> with an index size of ~50GB.
>
> Query and update times are still respectable, but commits are taking
> ~20% of time on the master, while our daily index optimise can take
> up to 4 hours...
> Here's the most relevant part of solrconfig.xml:
> true
> 10
> 1000
> 10000
> 10000
>
> I've given both master and slave 2.5GB of RAM.
>
> Does an index optimise read and re-write the whole thing? If so,
> taking about 4 hours is pretty good! However, the documentation here:
> http://wiki.apache.org/solr/CollectionDistribution?highlight=%28ten+minutes%29#head-cf174eea2524ae45171a8486a13eea8b6f511f8b
> states "Optimizations can take nearly ten minutes to run..." which
> leads me to think that we've grossly misconfigured something...
>
> Firstly, we would obviously love any way to reduce this optimise time
> - I have yet to experiment extensively with the settings above, and
> optimise frequency, but some general guidance would be great.
>
> Secondly, this index size is increasing monotonically over time and as
> we acquire new users. We need to take action to ensure we can scale
> in the future. The approach we're favouring at the moment is
> horizontal partitioning of indices by user id as our data suits this
> scheme well. A given index would hold the indexed data for n users,
> where n would probably be between 1 and 100 users, and we will have
> multiple indices per search server.
>
> Running a server per index is impractical, especially for a small n, so
> is a single Solr instance capable of managing multiple searchers and
> writers in this way? Following on from that, does anyone know of
> limiting factors in Solr or Lucene that would influence our decision
> on the value of n - the number of users per index?
>
> Thanks!
> James