How many documents are in the index? If you haven't already, I'd take a really close look at your schema and make sure you're only storing the fields that genuinely need to be stored, and likewise only indexing the ones you actually search on. I drastically reduced my index size just by changing the indexed/stored options on a few fields.
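As a minimal sketch (the field names below are hypothetical, not from your schema), the change is just the indexed/stored attributes on each <field> in schema.xml:

    <!-- full-text field that is searched but never returned in results:
         index it, but don't store the original text -->
    <field name="body_text" type="text" indexed="true" stored="false"/>

    <!-- field that is only displayed, never searched: store it, skip indexing -->
    <field name="display_url" type="string" indexed="false" stored="true"/>

Fields that are neither searched nor returned can usually be dropped from the schema altogether.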
On Thu, Feb 28, 2008 at 10:54 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> James,
>
> I can't comment more on the SN's arch choices.
>
> Here is the story about your questions:
> - 1 Solr instance can hold 1+ indices, either via JNDI (see Wiki) or via the
>   new multi-core support, which works but is still being hacked on
> - See SOLR-303 in JIRA for distributed search. Yonik committed it just the
>   other day, so now that's in nightly builds if you want to try it. There are
>   2 Wiki pages about that, too; see the Recent changes log on the Wiki to
>   quickly find them.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> > From: James Brady <[EMAIL PROTECTED]>
> > To: solr-user@lucene.apache.org
> > Sent: Friday, February 29, 2008 1:11:07 AM
> > Subject: Re: Strategy for handling large (and growing) index:
> >          horizontal partitioning?
> >
> > Hi Otis,
> > Thanks for your comments -- I didn't realise the wiki is open to
> > editing; my apologies. I've put in a few words to try and clear
> > things up a bit.
> >
> > So determining n will probably be a best guess followed by trial and
> > error; that's fine. I'm still not clear about whether a single Solr
> > server can operate across several indices, however -- can anyone give
> > me some pointers here?
> >
> > An alternative would be to have 1 index per instance, and n instances
> > per server, where n is small. This might actually be a practical
> > solution -- I'm spending ~20% of my time committing, so I should
> > probably only have 3 or 4 indices in total per server to avoid two
> > committing at the same time.
> >
> > Your mention of The Large Social Network was interesting! A social
> > network's data is by definition pretty poorly partitioned by user id,
> > so unless they've done something extremely clever like co-locating
> > social cliques in the same indices, I would have thought it would be a
> > sub-optimal architecture. If my friends and I are scattered around
> > different indices, each search would have to be federated massively.
> >
> > James
> >
> >
> > On 28 Feb 2008, at 20:49, Otis Gospodnetic wrote:
> >
> > > James,
> > >
> > > Regarding your questions about n users per index - this is a fine
> > > approach. The largest Social Network that you know of uses the
> > > same approach for various things, including full-text indices (not
> > > Solr, but close). You'd have to maintain a user->shard/index mapping
> > > somewhere, of course. What should the n be, you ask? Look at the
> > > overall index size, I'd say, against server capabilities (RAM,
> > > disk, CPU), and increase n up to the point where you're maximizing
> > > your hardware at some target query rate.
> > >
> > > Otis
> > > --
> > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > >
> > > ----- Original Message ----
> > >> From: James Brady
> > >> To: solr-user@lucene.apache.org
> > >> Sent: Wednesday, February 27, 2008 10:08:02 PM
> > >> Subject: Strategy for handling large (and growing) index:
> > >>          horizontal partitioning?
> > >>
> > >> Hi all,
> > >> Our current setup is a master and slave pair on a single machine,
> > >> with an index size of ~50GB.
> > >>
> > >> Query and update times are still respectable, but commits are taking
> > >> ~20% of time on the master, while our daily index optimise can take
> > >> up to 4 hours...
> > >> Here's the most relevant part of solrconfig.xml:
> > >>   true
> > >>   10
> > >>   1000
> > >>   10000
> > >>   10000
> > >>
> > >> I've given both master and slave 2.5GB of RAM.
> > >>
> > >> Does an index optimise read and re-write the whole thing? If so,
> > >> taking about 4 hours is pretty good! However, the documentation here:
> > >> http://wiki.apache.org/solr/CollectionDistribution?highlight=%28ten+minutes%29#head-cf174eea2524ae45171a8486a13eea8b6f511f8b
> > >> states "Optimizations can take nearly ten minutes to run..." which
> > >> leads me to think that we've grossly misconfigured something...
> > >>
> > >> Firstly, we would obviously love any way to reduce this optimise time
> > >> - I have yet to experiment extensively with the settings above and
> > >> with optimise frequency, but some general guidance would be great.
> > >>
> > >> Secondly, this index size is increasing monotonically over time and as
> > >> we acquire new users. We need to take action to ensure we can scale
> > >> in the future. The approach we're favouring at the moment is
> > >> horizontal partitioning of indices by user id, as our data suits this
> > >> scheme well. A given index would hold the indexed data for n users,
> > >> where n would probably be between 1 and 100 users, and we will have
> > >> multiple indices per search server.
> > >>
> > >> Running a server per index is impractical, especially for small n, so
> > >> is a single Solr instance capable of managing multiple searchers and
> > >> writers in this way? Following on from that, does anyone know of
> > >> limiting factors in Solr or Lucene that would influence our decision
> > >> on the value of n - the number of users per index?
> > >>
> > >> Thanks!
> > >> James
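For reference, the element names in the solrconfig.xml excerpt above were stripped somewhere along the way and only the values survived. Going by the stock solrconfig.xml of that era, they most likely correspond to something like the following -- the mapping of values to elements is a guess:

    <!-- assumed reconstruction; only the values appear in the quoted mail -->
    <useCompoundFile>true</useCompoundFile>   <!-- pack segment files into one compound file -->
    <mergeFactor>10</mergeFactor>             <!-- merge segments in batches of 10 -->
    <maxBufferedDocs>1000</maxBufferedDocs>   <!-- flush buffered docs to disk every 1000 docs -->
    <maxMergeDocs>10000</maxMergeDocs>        <!-- cap, in docs, on merged segment size -->
    <maxFieldLength>10000</maxFieldLength>    <!-- max tokens indexed per field -->

If that reading is right, raising mergeFactor and maxBufferedDocs is the usual first knob to turn for write-heavy setups, at the cost of more segments to search between merges.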