On 1/11/2018 12:05 AM, Bernd Fehling wrote:
This will never pass a Jepsen test and I call it _NOT_ thread safe.

I haven't looked into the code yet, to see if the queue is FIFO, otherwise
this would be stupid.

I was not thinking about order of operations when I said that the client was threadsafe. I meant that one client object can be used simultaneously by multiple threads without anything getting cross-contaminated within the program.

If you are absolutely reliant on operations happening in a precise order, such that a document could get indexed in one request and then replaced (or updated) with a later request, you should not use the concurrent client. You could define it with a single thread, but if you do that, then the concurrent client doesn't work any faster than the standard client.
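
To illustrate the single-thread case, here is a minimal sketch that uses only java.util.concurrent, not SolrJ. The class name SingleThreadOrder and the numbered "requests" are made up for the example. With exactly one worker thread, requests are handled in the order they were submitted, and that guarantee is paid for by giving up all parallelism.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a client configured with one processing thread.
public class SingleThreadOrder {

    // Submits n numbered requests to a single worker thread and returns
    // the order in which the worker handled them.
    public static List<Integer> process(int n) {
        List<Integer> processed = Collections.synchronizedList(new ArrayList<>());
        ExecutorService worker = Executors.newSingleThreadExecutor();
        for (int i = 0; i < n; i++) {
            final int requestId = i;
            worker.execute(() -> processed.add(requestId));
        }
        worker.shutdown();
        try {
            worker.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed;
    }
}
```

Calling SingleThreadOrder.process(5) returns the requests in submission order, 0 through 4.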

When a concurrent client is built, it creates the specified number of processing threads. When updates are sent, they are added to an internal queue. The processing threads will handle requests from the queue as long as the queue is not empty.
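
The queue-and-workers arrangement described above can be modeled roughly like this. It is a simplified sketch built on plain Java threads and a LinkedBlockingQueue, not the actual SolrJ implementation, whose internals may differ. QueueModel and drain are names invented for the example.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model: a shared FIFO queue drained by a fixed number of threads.
public class QueueModel {

    // Enqueues `requests` items, lets `threadCount` workers drain the queue,
    // and returns how many requests were handled in total.
    public static int drain(int threadCount, int requests) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < requests; i++) {
            queue.add(i);
        }
        AtomicInteger handled = new AtomicInteger();
        Thread[] workers = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            workers[t] = new Thread(() -> {
                // poll() returns null once the queue is empty, which ends the worker.
                while (queue.poll() != null) {
                    handled.incrementAndGet();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return handled.get();
    }
}
```

Every request is handled exactly once, but nothing in this model says which worker finishes first. Each worker takes the oldest item from the queue, yet the work itself overlaps in time.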

Those threads will simultaneously process the requests they have been assigned. Although I'm sure that each thread pulls requests off the queue in a FIFO manner, I have a scenario for you to consider. This scenario is not just an intellectual exercise. It is the kind of thing that can easily happen in the wild.

Let's say that when document X is initially indexed, it is at position 997 in a batch of 1000 documents. Then two update requests later, the new version of document X is at position 2 in another batch of 1000 documents.

If there are at least three threads in the concurrent client, those update requests may begin execution at nearly the same time. In that situation, Solr is likely to index document X in the request added later before it indexes document X in the request added earlier, resulting in outdated information ending up in the index.
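
The scenario can be checked with a small deterministic model. This is a sketch under simplifying assumptions, not a real multi-threaded run and not Solr code: it assumes every batch gets its own thread, all threads start at the same tick, and each thread indexes one document per tick. ReorderModel and its methods are names invented for the example, and the "index" simply records which batch last wrote each document id.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lockstep model of three update requests of 1000 documents each.
public class ReorderModel {
    static final int BATCH_SIZE = 1000;

    // Builds a batch of filler documents. If positionOfX is 1..BATCH_SIZE,
    // document "X" is placed at that 1-based position; 0 means no X at all.
    public static String[] batchWithX(int batchNo, int positionOfX) {
        String[] docs = new String[BATCH_SIZE];
        for (int i = 0; i < BATCH_SIZE; i++) {
            docs[i] = "filler-" + batchNo + "-" + i;
        }
        if (positionOfX > 0) {
            docs[positionOfX - 1] = "X";
        }
        return docs;
    }

    // All batches advance together, one document per batch per tick.
    // Returns a map of document id -> number of the batch that wrote it last.
    public static Map<String, Integer> indexConcurrently(String[][] batches) {
        Map<String, Integer> index = new HashMap<>();
        for (int tick = 0; tick < BATCH_SIZE; tick++) {
            for (int b = 0; b < batches.length; b++) {
                index.put(batches[b][tick], b + 1);
            }
        }
        return index;
    }

    // Batches run one after another, as a single thread would process them.
    public static Map<String, Integer> indexSequentially(String[][] batches) {
        Map<String, Integer> index = new HashMap<>();
        for (int b = 0; b < batches.length; b++) {
            for (String doc : batches[b]) {
                index.put(doc, b + 1);
            }
        }
        return index;
    }
}
```

With X at position 997 in batch 1 and at position 2 in batch 3, the lockstep run leaves batch 1's copy of X in the index, because batch 3 writes X at the second tick and batch 1 overwrites it at tick 997. The sequential run leaves batch 3's copy, which is the intended result.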

The same thing can happen even with a non-concurrent client when it is used in a multi-threaded manner.

Order of operations cannot be guaranteed when there are multiple threads. It might be possible to add some VERY sophisticated synchronization capabilities, but writing that code would be very difficult, and it wouldn't be trivial to use either.
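
For comparison, here is a much narrower technique than the general synchronization mentioned above. It is not a SolrJ feature. The sketch routes every update for a given document id to the same single-threaded lane, so updates to one document are applied in submission order while different documents can still proceed in parallel. KeyedLanes and its methods are invented for the example. It assumes that one thread submits the updates and that updates are sent per document rather than in mixed batches, which is exactly the part that gets hard in practice.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical per-id routing: same id -> same single-threaded lane.
public class KeyedLanes {
    private final ExecutorService[] lanes;
    // Stand-in for the index: document id -> last version written.
    private final Map<String, Integer> index = new ConcurrentHashMap<>();

    public KeyedLanes(int laneCount) {
        lanes = new ExecutorService[laneCount];
        for (int i = 0; i < laneCount; i++) {
            lanes[i] = Executors.newSingleThreadExecutor();
        }
    }

    // Picks the lane from the id's hash, so one id never spans two lanes.
    public void update(String id, int version) {
        int lane = Math.floorMod(id.hashCode(), lanes.length);
        lanes[lane].execute(() -> index.put(id, version));
    }

    // Waits for all lanes to finish and returns the resulting index.
    public Map<String, Integer> finish() {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
        try {
            for (ExecutorService lane : lanes) {
                lane.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return index;
    }

    // Sends versions 1..100 of documents "X" and "Y" through three lanes.
    public static Map<String, Integer> demo() {
        KeyedLanes client = new KeyedLanes(3);
        for (int v = 1; v <= 100; v++) {
            client.update("X", v);
            client.update("Y", v);
        }
        return client.finish();
    }
}
```

In this sketch the last submitted version of each document is the one that survives. The guarantee disappears as soon as several threads submit updates for the same id, or as soon as documents are grouped into batches that land on different lanes.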

Thanks,
Shawn
