Hola James, I think what Wunder was suggesting was really a copy (time cp -a oldIndex newIndex).
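If you'd rather script the measurement than run it by hand, here is a rough Python sketch of the same idea. The helper names `time_index_copy` and `dir_size_bytes` are made up for illustration, and `oldIndex`/`newIndex` are placeholder paths. It copies the index directory and reports the elapsed time, which gives a rough floor for optimise time on that disk.

```python
import shutil
import time
from pathlib import Path


def time_index_copy(src: str, dst: str) -> float:
    """Copy directory src to dst and return elapsed wall-clock seconds.

    dst must not exist yet; shutil.copytree creates it.
    """
    start = time.perf_counter()
    shutil.copytree(src, dst)
    return time.perf_counter() - start


def dir_size_bytes(path: str) -> int:
    """Total size in bytes of all regular files under path."""
    return sum(p.stat().st_size for p in Path(path).rglob("*") if p.is_file())


if __name__ == "__main__":
    # Placeholder paths, matching the cp example above.
    seconds = time_index_copy("oldIndex", "newIndex")
    size_gb = dir_size_bytes("newIndex") / 1e9
    print(f"copied {size_gb:.1f} GB in {seconds:.0f} s")
```

Bear in mind that if the index files were read recently, part of the copy may be served from the OS page cache, so the number can look better than what a cold optimise would see.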
I'm not sure why you have both the master and the slave on the same box... :)

As for the 10 minute Wiki thing: it's a wiki, so please edit it. Anyone can get an account and help with the Wiki.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: James Brady <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Thursday, February 28, 2008 12:59:50 PM
> Subject: Re: Strategy for handling large (and growing) index: horizontal partitioning?
>
> Hi, yes a post-optimise copy takes 45 minutes at present. Disk IO is
> definitely the bottleneck, you're right -- iostat was showing 100%
> utilisation for the 5 hours it took to optimise yesterday...
>
> The master and slave are on the same disk, and it's definitely on my
> list to fix that, but the searcher is so lightly loaded compared to
> the indexer that I don't think it will win us too much.
>
> As there has been another optimise time question on the list today,
> could I request that the "10 minute" claim is taken off the
> CollectionDistribution wiki page? It's extremely misleading for
> newcomers who don't necessarily realise an optimise entails reading
> and writing the whole index, and that optimise time is going to be at
> least O(n).
>
> James
>
>
> On 28 Feb 2008, at 09:07, Walter Underwood wrote:
>
> > Have you timed how long it takes to copy the index files? Optimizing
> > can never be faster than that, since it must read every byte and write
> > a whole new set. Disc speed may be your bottleneck.
> >
> > You could also look at disc access rates in a monitoring tool.
> >
> > Is there read contention between the master and slave for the same
> > disc?
> >
> > wunder
> >
> > On 2/27/08 7:08 PM, "James Brady" wrote:
> >
> >> Hi all,
> >> Our current setup is a master and slave pair on a single machine,
> >> with an index size of ~50GB.
> >>
> >> Query and update times are still respectable, but commits are taking
> >> ~20% of time on the master, while our daily index optimise can take
> >> up to 4 hours...
> >> Here's the most relevant part of solrconfig.xml:
> >> true
> >> 10
> >> 1000
> >> 10000
> >> 10000
> >>
> >> I've given both master and slave 2.5GB of RAM.
> >>
> >> Does an index optimise read and re-write the whole thing? If so,
> >> taking about 4 hours is pretty good! However, the documentation here:
> >> http://wiki.apache.org/solr/CollectionDistribution?highlight=%28ten+minutes%29#head-cf174eea2524ae45171a8486a13eea8b6f511f8b
> >> states "Optimizations can take nearly ten minutes to run..." which
> >> leads me to think that we've grossly misconfigured something...
> >>
> >> Firstly, we would obviously love any way to reduce this optimise time
> >> - I have yet to experiment extensively with the settings above, and
> >> optimise frequency, but some general guidance would be great.
> >>
> >> Secondly, this index size is increasing monotonically over time and as
> >> we acquire new users. We need to take action to ensure we can scale
> >> in the future. The approach we're favouring at the moment is
> >> horizontal partitioning of indices by user id as our data suits this
> >> scheme well. A given index would hold the indexed data for n users,
> >> where n would probably be between 1 and 100 users, and we will have
> >> multiple indices per search server.
> >>
> >> Running a server per index is impractical, especially for a small n, so
> >> is a single Solr instance capable of managing multiple searchers and
> >> writers in this way? Following on from that, does anyone know of
> >> limiting factors in Solr or Lucene that would influence our decision
> >> on the value of n - the number of users per index?
> >>
> >> Thanks!
> >> James
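
For the partitioning scheme described in the original message (n users per index), here is a minimal Python sketch of one possible mapping from user id to index. It assumes non-negative integer user ids handed out roughly in sequence; the function name and the contiguous-block scheme are illustrative assumptions, not a Solr API.

```python
def index_for_user(user_id: int, users_per_index: int) -> int:
    """Return the number of the index that would hold this user's documents.

    Users are grouped into contiguous blocks of users_per_index ids:
    ids 0..n-1 map to index 0, ids n..2n-1 map to index 1, and so on.
    """
    if users_per_index < 1:
        raise ValueError("users_per_index must be at least 1")
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    return user_id // users_per_index
```

With contiguous blocks, new users land in new indices and existing indices are never reshuffled, which may suit an index set that only grows. A hash of the user id would spread load more evenly, but then changing the number of indices means moving existing users.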