We should probably work out a rule of thumb, like "10-20 minutes per gigabyte". I'll send a separate message to collect that info.
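As a rough sketch of the arithmetic behind such a rule, the Python below turns the figures already quoted in this thread (about 50 GB of index, a 45 minute copy, a 5 hour optimise) into rates. The function names are hypothetical, and the 10-20 minutes per gigabyte range is only the placeholder suggested above, not a measured result.

```python
# Back-of-envelope arithmetic for optimise time, using figures from this thread.
# All function names are illustrative; none of this is a Solr API.

def copy_rate_mb_per_s(index_gb, copy_minutes):
    """Effective disk throughput implied by a plain copy of the index files."""
    return index_gb * 1024 / (copy_minutes * 60)

def observed_minutes_per_gb(index_gb, optimise_hours):
    """Minutes of optimise time per gigabyte for one measured run."""
    return optimise_hours * 60 / index_gb

def rule_of_thumb_minutes(index_gb, low=10, high=20):
    """Range predicted by a 'low-high minutes per gigabyte' rule (assumed values)."""
    return index_gb * low, index_gb * high

if __name__ == "__main__":
    index_gb = 50  # "~50GB" from the original post
    print("copy rate MB/s:", round(copy_rate_mb_per_s(index_gb, 45), 2))
    print("observed min/GB:", observed_minutes_per_gb(index_gb, 5))
    print("rule-of-thumb minutes:", rule_of_thumb_minutes(index_gb))
```

On these numbers the reported optimise works out to about 6 minutes per gigabyte, with the copy time as the floor, so the range would need adjusting once more measurements are collected.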
wunder

On 2/28/08 9:59 AM, "James Brady" <[EMAIL PROTECTED]> wrote:

> Hi, yes a post-optimise copy takes 45 minutes at present. Disk IO is
> definitely the bottleneck, you're right -- iostat was showing 100%
> utilisation for the 5 hours it took to optimise yesterday...
>
> The master and slave are on the same disk, and it's definitely on my
> list to fix that, but the searcher is so lightly loaded compared to
> the indexer that I don't think it will win us too much.
>
> As there has been another optimise time question on the list today
> could I request that the "10 minute" claim is taken off the
> CollectionDistribution wiki page? It's extremely misleading for
> newcomers who don't necessarily realise an optimise entails reading
> and writing the whole index, and that optimise time is going to be at
> least O(n)
>
> James
>
>
> On 28 Feb 2008, at 09:07, Walter Underwood wrote:
>
>> Have you timed how long it takes to copy the index files? Optimizing
>> can never be faster than that, since it must read every byte and write
>> a whole new set. Disc speed may be your bottleneck.
>>
>> You could also look at disc access rates in a monitoring tool.
>>
>> Is there read contention between the master and slave for the same
>> disc?
>>
>> wunder
>>
>> On 2/27/08 7:08 PM, "James Brady" <[EMAIL PROTECTED]> wrote:
>>
>>> Hi all,
>>> Our current setup is a master and slave pair on a single machine,
>>> with an index size of ~50GB.
>>>
>>> Query and update times are still respectable, but commits are taking
>>> ~20% of time on the master, while our daily index optimise can take
>>> up to 4 hours...
>>> Here's the most relevant part of solrconfig.xml:
>>> <useCompoundFile>true</useCompoundFile>
>>> <mergeFactor>10</mergeFactor>
>>> <maxBufferedDocs>1000</maxBufferedDocs>
>>> <maxMergeDocs>10000</maxMergeDocs>
>>> <maxFieldLength>10000</maxFieldLength>
>>>
>>> I've given both master and slave 2.5GB of RAM.
>>>
>>> Does an index optimise read and re-write the whole thing? If so,
>>> taking about 4 hours is pretty good! However, the documentation here:
>>> http://wiki.apache.org/solr/CollectionDistribution?highlight=%28ten+minutes%29#head-cf174eea2524ae45171a8486a13eea8b6f511f8b
>>> states "Optimizations can take nearly ten minutes to run..." which
>>> leads me to think that we've grossly misconfigured something...
>>>
>>> Firstly, we would obviously love any way to reduce this optimise time
>>> - I have yet to experiment extensively with the settings above, and
>>> optimise frequency, but some general guidance would be great.
>>>
>>> Secondly, this index size is increasing monotonically over time and as
>>> we acquire new users. We need to take action to ensure we can scale
>>> in the future. The approach we're favouring at the moment is
>>> horizontal partitioning of indices by user id as our data suits this
>>> scheme well. A given index would hold the indexed data for n users,
>>> where n would probably be between 1 and 100 users, and we will have
>>> multiple indices per search server.
>>>
>>> Running a server per index is impractical, especially for a small n, so
>>> is a single Solr instance capable of managing multiple searchers and
>>> writers in this way? Following on from that, does anyone know of
>>> limiting factors in Solr or Lucene that would influence our decision
>>> on the value of n - the number of users per index?
>>>
>>> Thanks!
>>> James