On 4/9/2018 4:04 AM, mganeshs wrote:
> Regarding the high CPU: while troubleshooting we found that the merge
> threads keep running and take most of the CPU time (as per Visual JVM).
With a one second autoSoftCommit, nearly constant indexing will produce
a lot of very small index segments. Those index segments will have to
be merged eventually. You have increased the merge policy numbers which
will reduce the total number of merges, but each merge is going to be
larger than it would with defaults, so it's going to take a little bit
longer. This isn't too big a deal with first-level merges, but at the
higher levels, they do get large -- no matter what the configuration is.
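To make that concrete, this is roughly what the commit settings under
discussion look like in solrconfig.xml -- just a sketch, assuming the
one-second soft commit and the 15000 ms automatic hard commit you describe
below; your actual values and openSearcher setting may differ:

  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>

Lengthening the autoSoftCommit maxTime is the most direct way to reduce how
many tiny segments get created in the first place.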
> Note: the following is the code snippet we use for indexing / adding Solr
> documents in batches, per collection:
>     for (SolrCollectionList solrCollection : SolrCollectionList.values()) {
>         CollectionBucket collectionBucket = getCollectionBucket(solrCollection);
>         List<SolrInputDocument> solrInputDocuments =
>             collectionBucket.getSolrInputDocumentList();
>         String collectionName = collectionBucket.getCollectionName();
>         try {
>             if (solrInputDocuments.size() > 0) {
>                 CloudSolrClient solrClient = PlatformIndexManager
>                     .getInstance().getCloudSolrClient(collectionName);
>                 solrClient.add(collectionName, solrInputDocuments);
>             }
>         }
> where solrClient is created as below:
>     this.cloudSolrClient = new CloudSolrClient.Builder()
>         .withZkHost(zooKeeperHost)
>         .withHttpClient(HttpClientUtil.HttpClientFactory.createHttpClient())
>         .build();
>     this.cloudSolrClient.setZkClientTimeout(30000);
Is that code running on the Solr server, or on a different machine? Are
you creating a SolrClient each time you use it, or have you created
client objects that get re-used?
You don't need a different SolrClient object for each collection. Your
"getCloudSolrClient" method takes a collection name, which suggests
there might be a different client object for each one. Most of the
time, you need precisely one client object for the entire application.
> Hard commit is kept as automatic and set to 15000 ms.
> In this process we also see that when a merge is happening and
> maxMergeCount (the default) has already been reached, commits get delayed
> and the SolrJ client (where we add documents) is blocked; once one of the
> merge threads processes the merge, the SolrJ client returns the result.
> How do we avoid this blocking of the SolrJ client? Do I need to go beyond
> the default config for this scenario? I mean, change the merge factor
> configuration?
> Can you suggest what the merge config would be for such a scenario? Based
> on forums, I tried to change the merge settings to the following,
What are you trying to accomplish by changing the merge policy? It's
fine to find information for a config on the Internet, but you need to
know what that config *does* before you use it, and make sure it aligns
with your goals. In my own config, I change maxMergeAtOnce and segmentsPerTier
to 35, and maxMergeAtOnceExplicit to 105. I know exactly what I'm
trying to do with this config -- reduce the frequency of merges. Each
merge is going to be larger with this config, but they will happen less
frequently. These three settings are the only ones that I change in my
merge policy. Changing all of the other settings that you have changed
should not be necessary. I make one other adjustment in this area -- to
the merge scheduler.
> On the same Solr node we have multiple indexes / collections. In that
> case, will TieredMergePolicyFactory be the right option, or should we go
> for another merge policy (like LogByte, etc.) for multiple collections on
> the same node?
TieredMergePolicy was made the default policy after a great deal of
testing and discussion by Lucene developers. They found that it works
better than the others for the vast majority of users. It is likely the
best choice for you too.
These are the settings that I use in indexConfig to reduce the impact of
merges on my indexing:
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">35</int>
    <int name="segmentsPerTier">35</int>
    <int name="maxMergeAtOnceExplicit">105</int>
  </mergePolicy>

  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxThreadCount">1</int>
    <int name="maxMergeCount">6</int>
  </mergeScheduler>
Note that this config is designed for 6.x and earlier. I do not know if
it will work in 7.x. It probably needs to be adjusted to the new
Factory config. You can use it as a guide, though.
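For 7.x, the equivalent would probably look something like the untested
sketch below, using the factory syntax -- assuming
org.apache.solr.index.TieredMergePolicyFactory accepts the same three
settings; the mergeScheduler element should not need to change:

  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <int name="maxMergeAtOnce">35</int>
    <int name="segmentsPerTier">35</int>
    <int name="maxMergeAtOnceExplicit">105</int>
  </mergePolicyFactory>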
Thanks,
Shawn