On 1/11/2018 1:38 AM, Bernd Fehling wrote:
> To sum it up, there is no way to do bulk loading in Solr, due to the lack of preserving the order of operations. Solr can only support bulk loading if you really have unique data, right?
Bulk loading implies that every document is inserted exactly once and that there are no other operations, like updates or deletes. If there are other operations, then in my mind, it's not bulk loading.
> By the way, the queue used is a java.util.concurrent.BlockingQueue. Changing that to an ArrayBlockingQueue (to force FIFO) would not really help, I guess.
Correct. The issue is that updates are processed simultaneously, so making absolutely sure that removal from the queue is FIFO wouldn't make any difference. That said, I think the current implementation is probably just as FIFO as the array implementation.
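To see why FIFO removal alone doesn't help, here's a minimal stdlib sketch (not Solr's actual code): two workers take "updates" from a FIFO queue, but because both are in flight at once, the later update can finish before the earlier one.

```java
import java.util.List;
import java.util.concurrent.*;

// Sketch only: two workers draining a FIFO queue concurrently.
// Removal order is FIFO, but completion order depends on scheduling.
public class FifoNotEnough {
    public static List<Integer> run() throws Exception {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(10);
        queue.put(1); // "update 1" (initial indexing)
        queue.put(2); // "update 2" (new version of the document)

        List<Integer> completed = new CopyOnWriteArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch bothStarted = new CountDownLatch(2);
        for (int w = 0; w < 2; w++) {
            pool.submit(() -> {
                Integer update = queue.take(); // strictly FIFO removal
                bothStarted.countDown();
                bothStarted.await();           // both updates now in flight
                if (update == 1) Thread.sleep(100); // update 1 happens to be slow
                completed.add(update);
                return null;
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return completed; // later update finishes first: [2, 1]
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
    }
}
```

Applied to Solr documents, the newer version of a document can be overwritten by an older one whenever both are being processed at the same moment.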
> You say "If there are at least three threads in the concurrent client...", but with two threads it would work?
The thread count of three was specific to the exact scenario I described, where update 1 contains the initial indexing and update 3 (two updates later) contains the new version. If it were update 1 and update 7, then there would need to be a thread count of seven to see the problem.
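The arithmetic behind that, assuming each client thread handles one queued update at a time (my simplification, not a description of Solr internals): updates i and j can only be in flight concurrently when the thread pool spans the whole range from i to j.

```java
// Hypothetical helper, names are mine: minimum pool size for updates i and j
// (i < j) to possibly run at the same time, one queued update per thread.
public class ThreadOverlap {
    public static int minThreadsForOverlap(int i, int j) {
        return j - i + 1;
    }

    public static void main(String[] args) {
        System.out.println(minThreadsForOverlap(1, 3)); // 3, the scenario above
        System.out.println(minThreadsForOverlap(1, 7)); // 7
    }
}
```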
> How are other users doing bulk loading from archived backups while preserving the order? I can't believe that I'm the only one on earth with this need.
If the backup is a log of changes rather than a snapshot of the final state, then the only reliable way you could guarantee correct operation is to do the indexing with one thread. But then indexing will be slower, possibly a LOT slower.
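A minimal sketch of that single-threaded approach, with a plain list standing in for the change log and `applied` standing in for whatever client call performs each update (both are my stand-ins, not Solr API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: replaying a change log through a single worker thread.
// One operation completes before the next begins, so order is preserved.
public class SingleThreadReplay {
    public static List<String> replay(List<String> changeLog) throws Exception {
        List<String> applied = new ArrayList<>();
        ExecutorService single = Executors.newSingleThreadExecutor();
        for (String op : changeLog) {
            // .get() blocks until this operation is done before sending the next
            single.submit(() -> applied.add(op)).get();
        }
        single.shutdown();
        return applied;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(replay(
            Arrays.asList("add:doc1", "update:doc1", "delete:doc1")));
    }
}
```

The trade-off is exactly the one above: strict ordering costs all the throughput that concurrent threads would otherwise provide.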
Thanks, Shawn