Hi All,

I have a few questions about SolrCloud and how it behaves in an
environment where several concurrent clients update the same
collection.

We have a SolrCloud 4.8.1 collection that stores a catalog of millions
of products (index size about 20GB).

Currently there is only one SolrJ client committing all the
modifications. This client takes care of updating all product
descriptions, attributes, prices, availabilities, etc.
Every few minutes it submits a batch of documents (thousands or
more) that have to be updated.

This is the current updateHandler configuration:

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>300000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>

And things work well even when a large number of users are searching.

But in order to have price updates applied as soon as possible, we're
planning to add a second client that, even while the first client is
running, should submit many atomic price updates.
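
To make the second client's workload concrete, here is a minimal
sketch of the kind of atomic update I mean. The field names `id` and
`price` and the class `AtomicPriceUpdate` are placeholders, not our
real schema or code. The sketch uses plain maps so it runs without
SolrJ; with SolrJ, I would expect each entry to be passed to
`SolrInputDocument.addField(name, value)`.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds the field map for one atomic update.
class AtomicPriceUpdate {
    // "id" and "price" are assumed field names, used only for illustration.
    static Map<String, Object> forPrice(String id, double price) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", id); // identifies the existing document
        // The "set" modifier asks Solr to replace only this field's value.
        doc.put("price", Collections.singletonMap("set", price));
        return doc;
    }
}
```

So `AtomicPriceUpdate.forPrice("P1", 9.99)` describes a document whose
`price` value is the map {set=9.99} rather than a plain number.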

Now I'm worried about having two clients on the same collection. Even
if those clients can be orchestrated using a kind of semaphore, I'm
afraid that the commits for those atomic updates could come too
quickly or, in the worst case, might even overlap with the commits of
the other (first) client.
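
The semaphore I have in mind would look roughly like the sketch
below. It is only an illustration, assuming both clients run in the
same JVM. `WriterGate` and `demo` are made-up names, and the "batch"
is a stand-in for the real SolrJ calls. If the clients were separate
processes, some external lock would be needed instead.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical gate that lets only one writer submit a batch at a time.
class WriterGate {
    private final Semaphore permit = new Semaphore(1, true); // fair, single permit
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger maxActive = new AtomicInteger();

    void submit(Runnable batch) {
        permit.acquireUninterruptibly(); // wait until the other writer is done
        try {
            int now = active.incrementAndGet();
            maxActive.accumulateAndGet(now, Math::max); // record peak concurrency
            batch.run(); // placeholder for the real update + commit
        } finally {
            active.decrementAndGet();
            permit.release();
        }
    }

    int maxConcurrentWriters() {
        return maxActive.get();
    }

    // Runs ten fake batches from two threads; returns {batches sent, peak writers}.
    static int[] demo() {
        WriterGate gate = new WriterGate();
        AtomicInteger sent = new AtomicInteger();
        Runnable fakeBatch = () -> {
            try {
                Thread.sleep(2); // simulate the time spent talking to Solr
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            sent.incrementAndGet();
        };
        ExecutorService pool = Executors.newFixedThreadPool(2);
        for (int i = 0; i < 10; i++) {
            pool.execute(() -> gate.submit(fakeBatch));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] { sent.get(), gate.maxConcurrentWriters() };
    }
}
```

This only serializes the two writers. It does nothing about how often
commits happen, which is the other half of my concern.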

As far as I have read, very frequent commits can seriously degrade
the performance of the search engine.

If commits from the two clients overlap, this could even compromise
the collection's integrity, given that there is no transaction
isolation in Solr.

To be clear, what happens if the second client does an atomic update
while the first client is doing a full delete and re-indexing of the
entire collection?

My idea is that it is better to always have only one client that
updates the collection, maybe using near real time indexing, but
always only one client.

Could anyone please confirm my concerns, or is there a workaround I
have not considered that would allow more than one client?

Best regards,
Vincenzo


-- 
Vincenzo D'Amore
