James,

I can't comment further on the SN's architecture choices.

As for your question: one Solr instance can hold one or more indices, either via JNDI (see the Wiki) or via the new multi-core support, which works but is still being hacked on. For distributed search, see SOLR-303 in JIRA. Yonik committed it just the other day, so it's now in the nightly builds if you want to try it. There are two Wiki pages about that, too; see the Recent Changes log on the Wiki to find them quickly.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: James Brady <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Friday, February 29, 2008 1:11:07 AM
> Subject: Re: Strategy for handling large (and growing) index: horizontal
> partitioning?
>
> Hi Otis,
> Thanks for your comments -- I didn't realise the wiki is open to
> editing; my apologies. I've put in a few words to try and clear
> things up a bit.
>
> So determining n will probably be a best guess followed by trial and
> error, that's fine. I'm still not clear about whether single Solr
> servers can operate across several indices, however.. can anyone give
> me some pointers here?
> An alternative would be to have 1 index per instance, and n instances
> per server, where n is small. This might actually be a practical
> solution -- I'm spending ~20% of my time committing, so I should
> probably only have 3 or 4 indices in total per server to avoid two
> committing at the same time.
>
> Your mention of The Large Social Network was interesting! A social
> network's data is by definition pretty poorly partitioned by user id,
> so unless they've done something extremely clever like co-locating
> social cliques in the same indices, I would have thought it would be a
> sub-optimal architecture. If me and my friends are scattered around
> different indices, each search would have to be federated massively.
>
> James
>
>
> On 28 Feb 2008, at 20:49, Otis Gospodnetic wrote:
>
> > James,
> >
> > Regarding your questions about n users per index - this is a fine
> > approach. The largest Social Network that you know of uses the
> > same approach for various things, including full-text indices (not
> > Solr, but close). You'd have to maintain user->shard/index mapping
> > somewhere, of course. What should the n be, you ask? Look at the
> > overall index size, I'd say, against server capabilities (RAM,
> > disk, CPU), increase n up to a point where you're maximizing your
> > hardware at some target query rate.
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> > ----- Original Message ----
> >> From: James Brady
> >> To: solr-user@lucene.apache.org
> >> Sent: Wednesday, February 27, 2008 10:08:02 PM
> >> Subject: Strategy for handling large (and growing) index:
> >> horizontal partitioning?
> >>
> >> Hi all,
> >> Our current setup is a master and slave pair on a single machine,
> >> with an index size of ~50GB.
> >>
> >> Query and update times are still respectable, but commits are taking
> >> ~20% of time on the master, while our daily index optimise can take
> >> up to 4 hours...
> >> Here's the most relevant part of solrconfig.xml:
> >> true
> >> 10
> >> 1000
> >> 10000
> >> 10000
> >>
> >> I've given both master and slave 2.5GB of RAM.
> >>
> >> Does an index optimise read and re-write the whole thing? If so,
> >> taking about 4 hours is pretty good! However, the documentation here:
> >> http://wiki.apache.org/solr/CollectionDistribution?highlight=%28ten+minutes%29#head-cf174eea2524ae45171a8486a13eea8b6f511f8b
> >> states "Optimizations can take nearly ten minutes to run..." which
> >> leads me to think that we've grossly misconfigured something...
> >>
> >> Firstly, we would obviously love any way to reduce this optimise time
> >> - I have yet to experiment extensively with the settings above, and
> >> optimise frequency, but some general guidance would be great.
> >>
> >> Secondly, this index size is increasing monotonically over time and as
> >> we acquire new users. We need to take action to ensure we can scale
> >> in the future. The approach we're favouring at the moment is
> >> horizontal partitioning of indices by user id as our data suits this
> >> scheme well. A given index would hold the indexed data for n users,
> >> where n would probably be between 1 and 100 users, and we will have
> >> multiple indices per search server.
> >>
> >> Running a server per index is impractical, especially for a small n, so
> >> is a single Solr instance capable of managing multiple searchers and
> >> writers in this way? Following on from that, does anyone know of
> >> limiting factors in Solr or Lucene that would influence our decision
> >> on the value of n - the number of users per index?
> >>
> >> Thanks!
> >> James
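
For illustration, the user->shard/index mapping mentioned in the thread could be kept in a small directory like the one below. This is only a rough sketch under stated assumptions, not anything Solr provides: the `ShardDirectory` class, its method names, and the fill-the-newest-index-first policy are all hypothetical, and a real deployment would persist the mapping somewhere durable rather than in memory.

```python
from dataclasses import dataclass, field


@dataclass
class ShardDirectory:
    """Hypothetical user -> index ("shard") directory; not part of Solr."""

    users_per_shard: int  # the "n" discussed in the thread
    _user_to_shard: dict = field(default_factory=dict)
    _shard_sizes: list = field(default_factory=list)

    def assign(self, user_id: str) -> int:
        """Return the user's shard number, allocating one on first sight."""
        if user_id in self._user_to_shard:
            return self._user_to_shard[user_id]
        # Assumed policy: fill the newest shard, open a new one when it is full.
        if not self._shard_sizes or self._shard_sizes[-1] >= self.users_per_shard:
            self._shard_sizes.append(0)
        shard = len(self._shard_sizes) - 1
        self._shard_sizes[shard] += 1
        self._user_to_shard[user_id] = shard
        return shard

    def shard_for(self, user_id: str) -> int:
        """Look up an already-assigned user; raises KeyError if unknown."""
        return self._user_to_shard[user_id]
```

With `users_per_shard=2`, assigning users "a", "b", "c" in that order would place "a" and "b" in index 0 and "c" in index 1. An explicit table like this (rather than hashing the user id) is one way to keep the option of moving a user to another index later, at the cost of having to store and look up the mapping.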