[
https://issues.apache.org/jira/browse/CONNECTORS-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16651942#comment-16651942
]
Steph van Schalkwyk commented on CONNECTORS-1546:
-------------------------------------------------
Hans is correct. I would remove it. It can mess up merging later if not used
correctly. It may also take a long time to complete.
I'm going to upload a patch or two soon and will remove it if you concur.
BTW, from the ES 6.4 doc:
"Force merge should only be called against *read-only indices*. Running force
merge against a read-write index can cause very large segments to be produced
(>5Gb per segment), and the merge policy +*will never consider it for merging
again until it mostly consists of deleted docs*+. This can cause very large
segments to remain in the shards."
But I agree: it isn't up to MCF to decide whether to force merge, since doing
so does impact ingesting.
Hans may want to try this before ingesting:
PUT /_cluster/settings
{"transient" : {"indices.store.throttle.type" : "none"}}
and after ingesting:
PUT /_cluster/settings
{"transient" : {"indices.store.throttle.type" : "merge"}}
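
For anyone who wants to script the two calls around a crawl, here is a minimal Python sketch using only the standard library. The base URL (http://localhost:9200, no authentication) and the helper names are assumptions for illustration. The setting name is copied verbatim from the requests above, and it may not be accepted by every Elasticsearch version, so check it against your cluster first.

```python
import json
import urllib.request

# Assumption: a local, unauthenticated cluster. Adjust for your setup.
ES_URL = "http://localhost:9200"


def throttle_request(throttle_type, base_url=ES_URL):
    """Build (but do not send) a PUT /_cluster/settings request.

    The setting name is taken as-is from the comment; it is not verified
    against any particular Elasticsearch version.
    """
    body = {"transient": {"indices.store.throttle.type": throttle_type}}
    return urllib.request.Request(
        url=base_url + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Prepared requests: one for before ingesting, one for after.
before_ingest = throttle_request("none")
after_ingest = throttle_request("merge")
```

Calling send(before_ingest) before the crawl and send(after_ingest) once it finishes would issue the two requests. Building and sending are kept separate so the payloads can be inspected without a running cluster.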
> Optimize Elasticsearch performance by removing 'forcemerge'
> -----------------------------------------------------------
>
> Key: CONNECTORS-1546
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1546
> Project: ManifoldCF
> Issue Type: Improvement
> Components: Elastic Search connector
> Reporter: Hans Van Goethem
> Assignee: Steph van Schalkwyk
> Priority: Major
>
> After crawling with ManifoldCF, forcemerge is applied to optimize the
> Elasticsearch index. This optimization makes Elasticsearch faster for
> read operations but not for write operations. On the contrary, performance on
> write operations becomes worse after every forcemerge.
> Can you remove this forcemerge from ManifoldCF to optimize performance for
> recurrent crawling to Elasticsearch?
> If someone needs this forcemerge, it can be applied manually against
> Elasticsearch directly.
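
As a rough illustration of applying the force merge manually, the sketch below builds a request for the Elasticsearch `_forcemerge` endpoint with Python's standard library. The base URL, the helper name, and the index name "crawl-index" are hypothetical. Per the Elasticsearch documentation quoted in the comment, this should only be run against an index that is no longer being written to.

```python
import urllib.parse
import urllib.request


def forcemerge_request(index, max_num_segments=None,
                       base_url="http://localhost:9200"):
    """Build (but do not send) a POST /<index>/_forcemerge request.

    base_url is an assumption (local, unauthenticated cluster).
    """
    path = "/" + urllib.parse.quote(index, safe="") + "/_forcemerge"
    if max_num_segments is not None:
        # Optional: ask for the index to be merged down to this many segments.
        path += "?" + urllib.parse.urlencode(
            {"max_num_segments": max_num_segments}
        )
    return urllib.request.Request(url=base_url + path, method="POST")


# Hypothetical index name, for illustration only.
request = forcemerge_request("crawl-index", max_num_segments=1)
```

Passing the request to urllib.request.urlopen would send it. As noted in the comment, a force merge can take a long time to complete, so a generous timeout is advisable.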