On 10/24/2019 1:52 AM, Danilo Tomasoni wrote:
> For every document processed, a soft commit is performed to make the
> update visible to other concurrent update processes.
This is not the way to do things. Doing a commit after every document
means that Solr will spend more time doing commits than anything else.
Documents should be indexed in batches.
https://lucidworks.com/post/really-batch-updates-solr-2/
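
As a rough sketch of what client-side batching could look like (Python
here, purely for illustration): the helper names, the batch size of 1000,
and the `send` callback are all assumptions, not anything from your
indexing code. The point is only that many documents go out per request
and no commit is sent by the client.

```python
import json
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List


def batched(docs: Iterable[Dict], size: int) -> Iterator[List[Dict]]:
    """Yield lists of at most `size` documents from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def index_in_batches(
    docs: Iterable[Dict],
    send: Callable[[bytes], None],  # hypothetical: performs the HTTP POST
    size: int = 1000,               # assumed batch size, tune for your data
) -> int:
    """Send documents as JSON arrays, one request per batch.

    No commit is issued here; visibility and durability are left to
    autoSoftCommit and autoCommit on the server.
    Returns the number of requests made.
    """
    requests_made = 0
    for chunk in batched(docs, size):
        send(json.dumps(chunk).encode("utf-8"))
        requests_made += 1
    return requests_made
```

In a real application `send` would POST the body to the collection's
/update handler with a Content-Type of application/json.
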
> Every process will perform a hard commit at the end.
Use autoCommit to do hard commits. I would suggest NOT using maxDocs,
only use maxTime, and set it to 60000 -- one minute. Also ensure that
openSearcher is set to false. Commits that do not open a new searcher
are VERY fast. These hard commits will not do anything for document
visibility; they are about data durability.
Then you can use autoSoftCommit for change visibility, and not worry
about sending commits in your indexing application. Again, don't set
maxDocs. Set maxTime to as long an interval as you can stand. I would
suggest a minimum of two minutes, but make it longer if you can.
Something like 5 or 10 minutes.
https://lucidworks.com/post/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
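
For illustration, here is a small Python sketch that builds a Config API
"set-property" command with the settings described above. It assumes the
Config API is enabled and that the command would be POSTed to the
collection's /config endpoint; the 5 minute soft commit interval is just
one of the example values mentioned, and the function name is made up.
The same values can be set in the autoCommit and autoSoftCommit sections
of solrconfig.xml instead.

```python
import json


def commit_settings_payload(
    hard_commit_ms: int = 60_000,   # autoCommit maxTime: one minute
    soft_commit_ms: int = 300_000,  # autoSoftCommit maxTime: assumed 5 minutes
) -> str:
    """Build a Config API command for time-based commits only.

    maxDocs is deliberately left unset for both commit types, and
    openSearcher is false so hard commits stay cheap.
    """
    command = {
        "set-property": {
            "updateHandler.autoCommit.maxTime": hard_commit_ms,
            "updateHandler.autoCommit.openSearcher": False,
            "updateHandler.autoSoftCommit.maxTime": soft_commit_ms,
        }
    }
    return json.dumps(command, indent=2)


if __name__ == "__main__":
    print(commit_settings_payload())
```
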
> The issue I have is that hard commits never terminate (one has been
> running for more than 3 days), and the number of segments and the size
> of the Solr index grow a lot.
What do you mean by "terminate" here? I cannot figure this out from
the context. The only thing I'm aware of that a hard commit is going to
terminate is the current transaction log ... the current log is closed
and the next time a document is indexed, a new one will be created.
Hard commits are the only thing that will close a transaction log.
> In the past, when the commit finished, I used to incrementally
> optimize the index (from 40 segments to 39, to 38 and so on),
> but here too the process is very slow.
If you're going to optimize, which we generally recommend NOT doing,
optimize in a single pass. Optimizing with multiple passes means
reading the index and writing the index multiple times ... and each
forced merge will require significant system resources. It may not
require them all, but it is significant.
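
If you do decide to optimize anyway, a single pass means one request
that goes straight to the final segment count. A minimal Python sketch
that builds such a request URL follows; the base URL, the collection
name, and the target of one segment are assumptions for the example.

```python
from urllib.parse import urlencode


def optimize_url(base_url: str, collection: str, max_segments: int = 1) -> str:
    """Build the URL for one forced-merge request on the update handler.

    One request with the final maxSegments value replaces the
    40 -> 39 -> 38 sequence, so the index is rewritten only once.
    """
    params = {"optimize": "true", "maxSegments": max_segments}
    return f"{base_url.rstrip('/')}/{collection}/update?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host and collection name.
    print(optimize_url("http://localhost:8983/solr", "mycollection"))
```
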
Thanks,
Shawn