Hi all,
Our current setup is a master and slave pair on a single machine, with an index size of ~50GB.

Query and update times are still respectable, but commits are taking ~20% of the master's time, while our daily index optimise can take up to 4 hours...
Here's the most relevant part of solrconfig.xml:
    <useCompoundFile>true</useCompoundFile>
    <mergeFactor>10</mergeFactor>
    <maxBufferedDocs>1000</maxBufferedDocs>
    <maxMergeDocs>10000</maxMergeDocs>
    <maxFieldLength>10000</maxFieldLength>

I've given both master and slave 2.5GB of RAM.

Does an index optimise read and re-write the whole thing? If so, taking about 4 hours is pretty good! However, the documentation at http://wiki.apache.org/solr/CollectionDistribution?highlight=%28ten+minutes%29#head-cf174eea2524ae45171a8486a13eea8b6f511f8b states "Optimizations can take nearly ten minutes to run...", which leads me to think that we've grossly misconfigured something...
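For reference, we trigger the optimise by posting the standard XML message to the update handler; I assume this is the usual way to do it:

    <optimize waitFlush="true" waitSearcher="true"/>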

Firstly, we would obviously love any way to reduce this optimise time. I have yet to experiment extensively with the settings above, or with optimise frequency, but some general guidance would be great.
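For example, one combination I plan to try, on the (untested) assumption that letting larger segments form leaves fewer segments for the optimise to merge:

    <!-- skip the extra copy into compound (.cfs) files, at the cost of more file handles -->
    <useCompoundFile>false</useCompoundFile>
    <mergeFactor>10</mergeFactor>
    <!-- buffer more documents in RAM before flushing a new segment -->
    <maxBufferedDocs>10000</maxBufferedDocs>
    <!-- effectively unbounded segment size; this is the Lucene default -->
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>10000</maxFieldLength>

My (unverified) suspicion is that our current maxMergeDocs of 10000 caps every segment at 10k documents, so a ~50GB index ends up as a huge pile of small segments for the optimise to stitch back together.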

Secondly, this index size is increasing monotonically over time as we acquire new users. We need to take action to ensure we can scale in the future. The approach we're favouring at the moment is horizontal partitioning of indices by user id, as our data suits this scheme well. A given index would hold the indexed data for n users, where n would probably be between 1 and 100, and we would have multiple indices per search server.

Running a server per index is impractical, especially for a small n, so is a single Solr instance capable of managing multiple searchers and writers in this way? Following on from that, does anyone know of limiting factors in Solr or Lucene that would influence our decision on the value of n, the number of users per index?
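If a single instance can do this, I imagine it would look something like a multicore setup, e.g. a solr.xml along these lines (assuming a Solr version with multicore support; the core names are invented for illustration):

    <solr persistent="true">
      <cores adminPath="/admin/cores">
        <!-- one core per partition of n users; names are illustrative only -->
        <core name="users_0000_0099" instanceDir="cores/users_0000_0099"/>
        <core name="users_0100_0199" instanceDir="cores/users_0100_0199"/>
      </cores>
    </solr>

Our application would keep the user-id-to-core mapping and address each partition at /solr/<corename>/select.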

Thanks!
James


