On 4/9/2018 4:04 AM, mganeshs wrote:
> Regarding the high CPU: while troubleshooting we found that the merge
> threads keep running and take most of the CPU time (as per Visual JVM).
With a one second autoSoftCommit, nearly constant indexing will produce
a lot of very small index segments. Those index segments will have to
be merged eventually. You have increased the merge policy numbers which
will reduce the total number of merges, but each merge is going to be
larger than it would with defaults, so it's going to take a little bit
longer. This isn't too big a deal with first-level merges, but at the
higher levels, they do get large -- no matter what the configuration is.
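To make that concrete, this is roughly what the commit settings under
discussion look like in solrconfig.xml -- just a sketch, assuming the
one-second soft commit and the 15000 ms automatic hard commit you describe
below; your actual values and openSearcher setting may differ:

  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>

Lengthening the autoSoftCommit maxTime is the most direct way to reduce how
many tiny segments get created in the first place.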
> Note: the following is the code snippet we use for indexing / adding Solr
> documents in batches, per collection:
>     for (SolrCollectionList solrCollection : SolrCollectionList.values()) {
>         CollectionBucket collectionBucket = getCollectionBucket(solrCollection);
>         List<SolrInputDocument> solrInputDocuments =
>             collectionBucket.getSolrInputDocumentList();
>         String collectionName = collectionBucket.getCollectionName();
>         try {
>             if (solrInputDocuments.size() > 0) {
>                 CloudSolrClient solrClient = PlatformIndexManager
>                     .getInstance().getCloudSolrClient(collectionName);
>                 solrClient.add(collectionName, solrInputDocuments);
>             }
>         }
> where solrClient is created as below:
>     this.cloudSolrClient = new CloudSolrClient.Builder()
>         .withZkHost(zooKeeperHost)
>         .withHttpClient(HttpClientUtil.HttpClientFactory.createHttpClient())
>         .build();
>     this.cloudSolrClient.setZkClientTimeout(30000);
Is that code running on the Solr server, or on a different machine? Are
you creating a SolrClient each time you use it, or have you created
client objects that get re-used?
You don't need a different SolrClient object for each collection. Your
"getCloudSolrClient" method takes a collection name, which suggests
there might be a different client object for each one. Most of the
time, you need precisely one client object for the entire application.
> Hard commit is kept as automatic and set to 15000 ms.
> In this process we also see that when a merge is happening and
> maxMergeCount (the default) has already been reached, commits get delayed
> and the SolrJ client (where we add documents) is blocked; once one of the
> merge threads processes the merge, the SolrJ client returns the result.
> How do we avoid this blocking of the SolrJ client? Do I need to go beyond
> the default config for this scenario? I mean, change the merge factor
> configuration?
> Can you suggest what the merge config would be for such a scenario? Based
> on forums, I tried to change the merge settings to the following,
What are you trying to accomplish by changing the merge policy? It's
fine to find information for a config on the Internet, but you need to
know what that config *does* before you use it, and make sure it aligns
with your goals. In my own config, I change maxMergeAtOnce and segmentsPerTier
to 35, and maxMergeAtOnceExplicit to 105. I know exactly what I'm
trying to do with this config -- reduce the frequency of merges. Each
merge is going to be larger with this config, but they will happen less
frequently. These three settings are the only ones that I change in my
merge policy. Changing all of the other settings that you have changed
should not be necessary. I make one other adjustment in this area -- to
the merge scheduler.
> On the same Solr node we have multiple indexes / collections. In that
> case, will TieredMergePolicyFactory be the right option, or should we go
> for another merge policy (like LogByte, etc.) for multiple collections on
> the same node?
TieredMergePolicy was made the default policy after a great deal of
testing and discussion by Lucene developers. They found that it works
better than the others for the vast majority of users. It is likely the
best choice for you too.
These are the settings that I use in indexConfig to reduce the impact of
merges on my indexing:
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">35</int>
    <int name="segmentsPerTier">35</int>
    <int name="maxMergeAtOnceExplicit">105</int>
  </mergePolicy>

  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxThreadCount">1</int>
    <int name="maxMergeCount">6</int>
  </mergeScheduler>
Note that this config is designed for 6.x and earlier. I do not know if
it will work in 7.x. It probably needs to be adjusted to the new
Factory config. You can use it as a guide, though.
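For 7.x, the equivalent would probably look something like the untested
sketch below, using the factory syntax -- assuming
org.apache.solr.index.TieredMergePolicyFactory accepts the same three
settings; the mergeScheduler element should not need to change:

  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <int name="maxMergeAtOnce">35</int>
    <int name="segmentsPerTier">35</int>
    <int name="maxMergeAtOnceExplicit">105</int>
  </mergePolicyFactory>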
Thanks,
Shawn