Hi all,
Our current setup is a master and slave pair on a single machine, with an index size of ~50GB.

Query and update times are still respectable, but commits are taking ~20% of the master's time, while our daily index optimise can take up to 4 hours...
Here's the most relevant part of solrconfig.xml:
    <useCompoundFile>true</useCompoundFile>
    <mergeFactor>10</mergeFactor>
    <maxBufferedDocs>1000</maxBufferedDocs>
    <maxMergeDocs>10000</maxMergeDocs>
    <maxFieldLength>10000</maxFieldLength>

I've given both master and slave 2.5GB of RAM.

Does an index optimise read and re-write the whole thing? If so, taking about 4 hours is pretty good! However, the documentation at http://wiki.apache.org/solr/CollectionDistribution?highlight=%28ten+minutes%29#head-cf174eea2524ae45171a8486a13eea8b6f511f8b states "Optimizations can take nearly ten minutes to run...", which leads me to think that we've grossly misconfigured something...
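For reference, we trigger the optimise by posting the standard XML message to the update handler; I assume this is the usual way to do it:

    <optimize waitFlush="true" waitSearcher="true"/>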

Firstly, we would obviously love any way to reduce this optimise time. I have yet to experiment extensively with the settings above, or with optimise frequency, but some general guidance would be great.
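For example, one combination I plan to try, on the (untested) assumption that letting larger segments form leaves fewer segments for the optimise to merge:

    <!-- skip the extra copy into compound (.cfs) files, at the cost of more file handles -->
    <useCompoundFile>false</useCompoundFile>
    <mergeFactor>10</mergeFactor>
    <!-- buffer more documents in RAM before flushing a new segment -->
    <maxBufferedDocs>10000</maxBufferedDocs>
    <!-- effectively unbounded segment size; this is the Lucene default -->
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>10000</maxFieldLength>

My (unverified) suspicion is that our current maxMergeDocs of 10000 caps every segment at 10k documents, so a ~50GB index ends up as a huge pile of small segments for the optimise to stitch back together.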

Secondly, this index size is increasing monotonically over time as we acquire new users. We need to take action to ensure we can scale in the future. The approach we're favouring at the moment is horizontal partitioning of indices by user id, as our data suits this scheme well. A given index would hold the indexed data for n users, where n would probably be between 1 and 100, and we would have multiple indices per search server.

Running a server per index is impractical, especially for a small n, so is a single Solr instance capable of managing multiple searchers and writers in this way? Following on from that, does anyone know of limiting factors in Solr or Lucene that would influence our decision on the value of n, the number of users per index?
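If a single instance can do this, I imagine it would look something like a multicore setup, e.g. a solr.xml along these lines (assuming a Solr version with multicore support; the core names are invented for illustration):

    <solr persistent="true">
      <cores adminPath="/admin/cores">
        <!-- one core per partition of n users; names are illustrative only -->
        <core name="users_0000_0099" instanceDir="cores/users_0000_0099"/>
        <core name="users_0100_0199" instanceDir="cores/users_0100_0199"/>
      </cores>
    </solr>

Our application would keep the user-id-to-core mapping and address each partition at /solr/<corename>/select.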

Thanks!
James


