On 10/24/2019 1:52 AM, Danilo Tomasoni wrote:
For every document processed, a soft commit is performed to make the update visible to other concurrent update processes.

This is not the way to do things. Doing a commit after every document means that Solr will spend more time doing commits than anything else.

Documents should be indexed in batches.

https://lucidworks.com/post/really-batch-updates-solr-2/
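
As a rough sketch of what batching can look like on the client side: the code below groups documents into chunks and sends each chunk to the update handler as one JSON request, without sending any commit. The batch size of 1000, the function names, and whatever update URL you pass in (something like http://localhost:8983/solr/mycollection/update) are assumptions for illustration, not values taken from your setup.

```python
import json
import urllib.request
from itertools import islice


def batches(docs, size=1000):
    """Yield lists of up to `size` documents from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def index_in_batches(docs, update_url, size=1000):
    """POST each batch as a single JSON update request.

    No commit parameter is sent; visibility and durability are left
    to autoSoftCommit and autoCommit on the server.
    `update_url` is a placeholder for your collection's /update endpoint.
    """
    sent = 0
    for chunk in batches(docs, size):
        req = urllib.request.Request(
            update_url,
            data=json.dumps(chunk).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            resp.read()  # drain the response; errors raise HTTPError
        sent += len(chunk)
    return sent
```

The point is only that one request carries many documents and that the indexing code never issues a commit itself.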

At the end, every process will perform a hard commit.

Use autoCommit to do hard commits. I would suggest NOT using maxDocs, only use maxTime, and set it to 60000 -- one minute. Also ensure that openSearcher is set to false. Commits that do not open a new searcher are VERY fast. These hard commits will not do anything for document visibility; they are about data durability.

Then you can use autoSoftCommit for change visibility, and not worry about sending commits in your indexing application. Again, don't set maxDocs. Set maxTime to as long an interval as you can stand. I would suggest a minimum of two minutes, but make it longer if you can. Something like 5 or 10 minutes.

https://lucidworks.com/post/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
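
To make the two settings concrete, here is a small sketch that builds a request body for Solr's Config API with the values discussed above. The same settings can instead be written as the autoCommit and autoSoftCommit elements in solrconfig.xml. The property names are the ones I believe the Config API accepts, so check them against the reference guide for your Solr version; the five-minute soft commit interval and the function name are just example choices.

```python
import json


def commit_settings(hard_ms=60000, soft_ms=300000):
    """Build a Config API 'set-property' body for the commit settings.

    hard_ms: autoCommit maxTime (durability), one minute by default.
    soft_ms: autoSoftCommit maxTime (visibility), five minutes here as
             an example value; use as long an interval as you can stand.
    maxDocs is deliberately left unset for both.
    """
    return {
        "set-property": {
            "updateHandler.autoCommit.maxTime": hard_ms,
            "updateHandler.autoCommit.openSearcher": False,
            "updateHandler.autoSoftCommit.maxTime": soft_ms,
        }
    }


# Print the JSON that would be POSTed to /solr/<collection>/config
print(json.dumps(commit_settings(), indent=2))
```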

The issue I have is that hard commits never terminate (it's ongoing by more than 3 days) and the number of segments and the solr index will grow a lot.

What do you mean by "terminate" here? I cannot figure this out from the context. The only thing I'm aware of that a hard commit is going to terminate is the current transaction log ... the current log is closed and the next time a document is indexed, a new one will be created. Hard commits are the only thing that will close a transaction log.

In the past, when the commit finished, I used to optimize the index incrementally (from 40 segments to 39, to 38 and so on),
but here too the process is very slow.

If you're going to optimize, which we generally recommend NOT doing, optimize in a single pass. Optimizing with multiple passes means reading the index and writing the index multiple times ... and each forced merge will require significant system resources. A forced merge may not use everything the system has, but it will use a lot of it.

Thanks,
Shawn
