Re: SolrCloud loadbalancing, replication, and failover

Shawn Heisey Thu, 31 Jul 2014 09:06:06 -0700

On 7/31/2014 12:58 AM, shuss...@del.aithent.com wrote:
> Thanks for the great explanation of the memory requirements. Could you
> tell me which parameters I need to change in my solrconfig.xml to
> handle a large index? What are the optimal values to use?
>
> My indexed data size is 65 GB (for 8.6 million documents) and I have 48
> GB of RAM on my server. Whenever I perform delta-indexing, the server
> becomes unresponsive while updating the index.
>
> Following are the changes that I made in solrconfig.xml after searching the net:
> <writeLockTimeout>60000</writeLockTimeout>
> <ramBufferSizeMB>256</ramBufferSizeMB>
> <useCompoundFile>false</useCompoundFile>
> <maxBufferedDocs>1000</maxBufferedDocs>
>
>  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>           <int name="maxMergeAtOnce">10</int>
>           <int name="segmentsPerTier">10</int>
>  </mergePolicy>
>  
> <mergeFactor>10</mergeFactor>
> <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
>
> <lockType>simple</lockType>
> <unlockOnStartup>true</unlockOnStartup>
>
> <updateHandler class="solr.DirectUpdateHandler2">
>   <autoCommit>
>     <maxDocs>15000</maxDocs>
>     <openSearcher>true</openSearcher>
>   </autoCommit>
>   <updateLog>
>   <str name="dir">${solr.data.dir:}</str>
>  </updateLog>
> </updateHandler>
>
> So, please provide your valuable suggestions on this problem.


You replied directly to me, not to the list.  I am redirecting this back
to the list.

The first thing I would do is change openSearcher to false in your
autoCommit settings.  Automatic commits will then still flush changes
to disk, but they will no longer open a new searcher, so you must take
care of commits yourself when you index to make documents visible.  If
you want any more suggestions, we'll need to see the entire
solrconfig.xml file.
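
A minimal sketch of that change, based on the updateHandler block quoted
above.  Only openSearcher differs from the quoted config; the maxDocs
value is copied from it as-is and is not a tuning recommendation.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- Hard commit trigger, copied unchanged from the quoted config -->
    <maxDocs>15000</maxDocs>
    <!-- Flush to disk without opening a new searcher -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <updateLog>
    <str name="dir">${solr.data.dir:}</str>
  </updateLog>
</updateHandler>
```

With this in place, newly indexed documents become visible only after
the indexing client sends its own commit.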

The fact that you don't have enough RAM to cache your whole index could
be a problem.  If 8.6 million documents result in 65GB of index, then
your documents are probably quite large, and that can lead to other
challenges, because it usually means that a lot of work must be done to
index a single document.  There are also probably a lot of terms to
match when querying.

I do not know how much of your 48GB has been allocated to the Java heap.
Whatever the heap uses is taken away from the memory that the operating
system can use to cache index files.

Thanks,
Shawn
