On 1/25/2018 7:48 AM, Vincenzo D'Amore wrote:
I have a few questions about SolrCloud and how it behaves in an
environment where multiple concurrent clients update the same
collection.

We have a SolrCloud 4.8.1 collection that stores a catalog of millions
of products (index size about 20GB).

Currently there is only one SolrJ client committing all the
modifications; this client takes care of updating all product
descriptions, attributes, prices, availabilities, etc.
Every few minutes this client submits a group of documents
(thousands or more) that have to be updated.

This is the current updateHandler configuration:

<updateHandler class="solr.DirectUpdateHandler2">

<autoCommit>
<maxTime>300000</maxTime>
<openSearcher>false</openSearcher>
</autoCommit>

That autoCommit configuration is not affecting document visibility at all, because openSearcher is set to false. Don't rush to change this -- this kind of configuration is what you want. I would probably use one minute here rather than five minutes, but again, don't be in a rush to change it without some evidence that the change is needed.

But in order to have price updates visible as soon as possible, we're
planning to add a second client that, even while the first client is
running, would submit many atomic price updates.

Now I'm worried about having two clients on the same collection. Even
if those clients can be orchestrated using a kind of semaphore, I'm
afraid that those atomic commits could come too quickly or, in the
worst case, might even overlap those of the other (first) client.

Having multiple clients send updates should not be a problem. This is the recommended way to increase indexing speed.

When/where are the commits that open a new searcher happening? The autoCommit settings aren't handling that.

There are several ways to accomplish commits that make changes visible. Three of them are what I would call "correct" to pair with your autoCommit settings.

One good option is to configure autoSoftCommit, with a value that describes how long after the first update a commit will take place. A second option is to include a "commitWithin" parameter on every update request, with a value that works similarly to autoSoftCommit. Another is to send an update request that explicitly commits. Ideally those manual commits will be soft commits.
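
As a rough sketch of the second and third options, this is what the requests could look like in Solr's XML update format. The id and price field names, the document id, and the 60-second commitWithin value are made up for illustration; use whatever fits your schema and commit timing.

```xml
<!-- Atomic price update with commitWithin (milliseconds).
     "id", "price", and PRODUCT-123 are hypothetical names/values. -->
<add commitWithin="60000">
  <doc>
    <field name="id">PRODUCT-123</field>
    <field name="price" update="set">19.99</field>
  </doc>
</add>

<!-- Sent as a separate request: an explicit soft commit (third option). -->
<commit softCommit="true"/>
```

With commitWithin on every request, neither client needs to send its own commit, so the two clients have nothing to coordinate.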

I would recommend one of the first two options, but be very careful about making the intervals too short. The interval should be longer than it takes for a typical commit to actually happen, probably two to three times as long, or longer. I would not recommend manual commit requests unless you can be sure that only one of your clients will send them, and that the commits will be spaced far enough apart that they can't overlap. I am using manual soft commits for updates on my own indexes.
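
For the first option, a minimal solrconfig.xml fragment might look like the following. The two-minute value is only a placeholder, not a recommendation for your index; pick it based on how long your commits actually take.

```xml
<!-- Goes inside the existing <updateHandler> section, next to <autoCommit>. -->
<autoSoftCommit>
  <!-- Illustrative value: open a new searcher at most 2 minutes
       after the first uncommitted update arrives. -->
  <maxTime>120000</maxTime>
</autoSoftCommit>
```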

Here's a blog post that talks about commits:

https://lucidworks.com/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

If commits are taking a long time to occur, then the most common way to reduce that time is to reduce autowarmCount on your caches.
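
As an illustration only, the cache definitions in the query section of solrconfig.xml could be adjusted along these lines. The sizes and autowarmCount values here are assumptions, not tuned numbers; compare them against what your config has now.

```xml
<!-- Inside the <query> section of solrconfig.xml.
     A lower autowarmCount means less work each time a new searcher opens. -->
<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="16"/>

<queryResultCache class="solr.LRUCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="0"/>
```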

Thanks,
Shawn
