On 1/25/2018 7:48 AM, Vincenzo D'Amore wrote:
I have a few questions about SolrCloud and how it behaves in an
environment where multiple concurrent clients update the same
collection.
We have a SolrCloud 4.8.1 collection that stores a catalog of millions
of products (index size about 20GB).
Currently there is only one SolrJ client committing all the
modifications; this client takes care of updating all product
descriptions, attributes, prices, availability, etc.
Every few minutes this client submits a batch of documents
(thousands or more) that have to be updated.
This is the current updateHandler configuration:
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>300000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
That autoCommit configuration is not affecting document visibility at
all, because openSearcher is set to false. Don't rush to change this;
this kind of configuration is what you want. I would probably use one
minute here (a maxTime of 60000) rather than five minutes, but again,
don't be in a rush to change it without some evidence that the change
is needed.
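As a sketch, the one-minute variant would change only maxTime (which is in milliseconds); this assumes the rest of the updateHandler section stays as it is:

```xml
<autoCommit>
  <!-- hard commit at most 60 seconds after the first uncommitted update -->
  <maxTime>60000</maxTime>
  <!-- flush to disk without opening a new searcher, so visibility is unaffected -->
  <openSearcher>false</openSearcher>
</autoCommit>
```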
But in order to make price updates visible as soon as possible, we're
planning to add a second client that, even while the first client is
running, would submit many atomic price updates.
Now I'm worried about having two clients on the same collection. Even
if those clients can be orchestrated using a kind of semaphore, I'm
afraid that the commits for those atomic updates could come too quickly
or, in the worst case, might even overlap with the first client's
commits.
Having multiple clients send updates should not be a problem. This is
the recommended way to increase indexing speed.
When/where are the commits that open a new searcher happening? The
autoCommit settings aren't handling that.
There are several ways to accomplish commits that make changes visible.
Three of them are what I would call "correct" to pair with your
autoCommit settings.
One good option is to configure autoSoftCommit, with a value that
describes how long after the first update a commit will take place. A
second option is to include a "commitWithin" parameter on every update
request, with a value that works similarly to autoSoftCommit. Another
is to send an update request that explicitly commits. Ideally those
manual commits will be soft commits.
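A minimal sketch of the first option: an autoSoftCommit block added inside the same updateHandler element as the existing autoCommit. The 120000 ms (two minute) value is a placeholder assumption; following the guidance in the next paragraph, it should be several times longer than a typical commit takes on this index.

```xml
<autoSoftCommit>
  <!-- soft commit (opens a new searcher, making updates visible) at most
       120 seconds after the first update that is not yet visible.
       Placeholder value: tune it against measured commit times. -->
  <maxTime>120000</maxTime>
</autoSoftCommit>
```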
I would recommend one of the first two options, but be very careful
about making the intervals too short. The interval should be longer
than it takes for a typical commit to actually happen, probably two to
three times as long, or longer. I would not recommend manual commit
requests unless you can be sure that only one of your clients will send
them, and that the commits will be spaced far enough apart that they
can't overlap. I am using manual soft commits for updates on my own
indexes.
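For the second client, the commitWithin option could be combined with an atomic price update in one XML update request posted to the collection's /update handler. This is a sketch under assumptions: the field names id and price and the values shown are hypothetical, and atomic updates in Solr 4.x require the updateLog to be enabled, a _version_ field in the schema, and all other fields to be stored.

```xml
<!-- hypothetical request body, POSTed to /update with Content-Type: text/xml -->
<!-- commitWithin (ms): ask Solr to make these changes visible within 120 s -->
<add commitWithin="120000">
  <doc>
    <!-- hypothetical uniqueKey field and value -->
    <field name="id">PRODUCT-123</field>
    <!-- atomic update: replace only the price, leaving other fields as they are -->
    <field name="price" update="set">19.99</field>
  </doc>
</add>
```

Because commitWithin schedules a commit instead of forcing one per request, updates from both clients that arrive within the same window should generally be covered by a single pending commit rather than producing overlapping ones.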
Here's a blog post that talks about commits:
https://lucidworks.com/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
If commits are taking a long time to occur, then the most common way to
reduce that time is to reduce autowarmCount on your caches.
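For example, autowarming can be reduced or turned off on the caches in the query section of solrconfig.xml. The sizes below are placeholder assumptions, not recommendations for this index. The trade-off is that the first queries after each new searcher opens may be slower because the caches start empty.

```xml
<query>
  <!-- autowarmCount="0": do not pre-populate the new searcher's cache
       from the old one, which shortens the time a commit takes -->
  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <!-- other query settings omitted -->
</query>
```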
Thanks,
Shawn