First, a brief description of how the topic expression works.

The topic expression allows you to subscribe to a query. Below is how it
works internally.

The topic expression maps to the TopicStream in the Java Streaming API, so
I encourage anyone who is interested to review that code.

Under the covers, the TopicStream persists checkpoints to a SolrCloud
collection that describe where the topic left off. The TopicStream uses
these checkpoints as a filter on the query so that it returns only
documents with version numbers higher than the last checkpoints sent.
After each call to the topic, the checkpoints are updated.
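
To make the filtering step concrete, here is a minimal sketch. It is not
the actual TopicStream code, and the class and method names are
hypothetical. It turns a map of per-shard checkpoints into per-shard
filter queries that match only documents with a higher _version_. It
uses standard Lucene range syntax with an exclusive lower bound. The real
TopicStream may express this filter differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, NOT the TopicStream implementation: build one filter
// query per shard from the checkpoints persisted by the previous topic call.
class CheckpointFilterSketch {

    // Exclusive lower bound on _version_: "{" excludes the bound, "*" is open-ended.
    static String versionFilter(long checkpoint) {
        return "_version_:{" + checkpoint + " TO *]";
    }

    // shard name -> checkpoint  becomes  shard name -> filter query
    static Map<String, String> filtersFor(Map<String, Long> checkpoints) {
        Map<String, String> filters = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : checkpoints.entrySet()) {
            filters.put(e.getKey(), versionFilter(e.getValue()));
        }
        return filters;
    }

    public static void main(String[] args) {
        // Illustrative checkpoint values only; real _version_ values are much larger longs.
        Map<String, Long> checkpoints = new LinkedHashMap<>();
        checkpoints.put("shard1", 1553L);
        checkpoints.put("shard2", 1610L);
        filtersFor(checkpoints).forEach((shard, fq) -> System.out.println(shard + " fq=" + fq));
    }
}
```

Filtering on each shard's own checkpoint, rather than on one global value,
matters because each shard assigns its own version numbers.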

What is a checkpoint?

A checkpoint is the highest version number read from each shard in the
collection. The topic stream sorts by _version_ asc. As it cycles through
the documents from the shards, it tracks the highest version number for
each shard and persists it.
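
The tracking step can be sketched like this. The class and its Doc type
are hypothetical stand-ins, not Solr classes. As documents stream past,
the sketch keeps the highest _version_ seen per shard, and that map is
what would be persisted as the checkpoint.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: remember the highest _version_ seen per shard.
class CheckpointTrackerSketch {

    // Hypothetical minimal document: the shard it came from and its _version_.
    static final class Doc {
        final String shard;
        final long version;

        Doc(String shard, long version) {
            this.shard = shard;
            this.version = version;
        }
    }

    // shard name -> highest _version_ seen so far (the checkpoint to persist)
    private final Map<String, Long> highest = new HashMap<>();

    void observe(Doc doc) {
        // Keep whichever is larger: the stored checkpoint or this doc's version.
        highest.merge(doc.shard, doc.version, Long::max);
    }

    Map<String, Long> checkpoints() {
        return new HashMap<>(highest); // defensive copy
    }

    public static void main(String[] args) {
        CheckpointTrackerSketch tracker = new CheckpointTrackerSketch();
        tracker.observe(new Doc("shard1", 101L));
        tracker.observe(new Doc("shard2", 205L));
        tracker.observe(new Doc("shard1", 107L));
        System.out.println("shard1=" + tracker.checkpoints().get("shard1")); // shard1=107
        System.out.println("shard2=" + tracker.checkpoints().get("shard2")); // shard2=205
    }
}
```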

Why use version numbers?

Version numbers are monotonic longs. Each new document receives a version
number that is higher than that of the last document on the shard. So by
sorting on _version_ asc, you can cycle through all the documents in a
shard in batches.
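
Here is a toy model of that batching loop for a single shard. It makes no
Solr calls: an in-memory list stands in for the shard, and the names are
hypothetical. Each pass asks for "version > checkpoint, sorted by version
asc, limited to N" and then advances the checkpoint to the last version
returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: drain one shard in batches using only version numbers.
class VersionBatchingSketch {

    // Up to batchSize versions strictly greater than checkpoint, in ascending order.
    static List<Long> nextBatch(List<Long> shardVersions, long checkpoint, int batchSize) {
        return shardVersions.stream()
            .filter(v -> v > checkpoint)
            .sorted()
            .limit(batchSize)
            .collect(Collectors.toList());
    }

    // Repeatedly fetch the next batch, advancing the checkpoint each time.
    static List<List<Long>> drain(List<Long> shardVersions, int batchSize) {
        List<List<Long>> batches = new ArrayList<>();
        long checkpoint = Long.MIN_VALUE; // assumption: "nothing read yet"
        while (true) {
            List<Long> batch = nextBatch(shardVersions, checkpoint, batchSize);
            if (batch.isEmpty()) {
                break;
            }
            batches.add(batch);
            checkpoint = batch.get(batch.size() - 1); // highest version in this batch
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Long> shard = List.of(30L, 10L, 50L, 20L, 40L);
        System.out.println(drain(shard, 2)); // [[10, 20], [30, 40], [50]]
    }
}
```

Because versions only increase, the single "last version seen" value is
enough state to resume. No offsets or cursors are needed.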

Can a topic miss documents?

Currently the answer is: theoretically, yes. In practice, though, I believe
it would be very rare. To miss documents, all of the following must occur:

1) Documents must be indexed with out-of-order version numbers. I believe
this is no longer possible on the leader, so currently only the replicas
have this issue.

2) The out-of-order version numbers must cross commit boundaries. This
means that a commit must occur while an out-of-order document is still
outside the index.

3) The topic must pull the committed out-of-order documents before the
next commit occurs. Once the out-of-order document is committed, the sort
by version number fixes up the ordering.
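
The three conditions can be illustrated with a toy, single-replica model.
This is not Solr code, and the class and methods are hypothetical. A
document with the lower version 5 becomes visible in a later commit than
the document with version 7. If the topic reads between the two commits,
its checkpoint moves past 5, and that document is filtered out afterward.
If the topic reads only after both commits, the sort by version delivers
both.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical toy model of one replica and one topic reading from it.
class OutOfOrderMissSketch {

    private final TreeSet<Long> committed = new TreeSet<>(); // searchable versions, ascending
    private long checkpoint = 0L;                            // assumption: start below all versions
    private final List<Long> delivered = new ArrayList<>();

    // A commit makes these versions visible to searches.
    void commit(long... versions) {
        for (long v : versions) {
            committed.add(v);
        }
    }

    // One topic call: every committed version above the checkpoint, in ascending order.
    void pollTopic() {
        for (long v : committed.tailSet(checkpoint, false)) {
            delivered.add(v);
            checkpoint = v;
        }
    }

    List<Long> delivered() {
        return delivered;
    }

    public static void main(String[] args) {
        // Conditions #1-#3 line up: version 5 arrives late and is never delivered.
        OutOfOrderMissSketch racy = new OutOfOrderMissSketch();
        racy.commit(7L);   // 7 is committed while 5 is still outside the index
        racy.pollTopic();  // the topic reads before the next commit; checkpoint becomes 7
        racy.commit(5L);   // the out-of-order document is finally committed
        racy.pollTopic();  // 5 is not above the checkpoint, so it is skipped
        System.out.println(racy.delivered()); // [7]

        // The topic reads only after both commits: the version sort restores order.
        OutOfOrderMissSketch safe = new OutOfOrderMissSketch();
        safe.commit(7L);
        safe.commit(5L);
        safe.pollTopic();
        System.out.println(safe.delivered()); // [5, 7]
    }
}
```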


Since #1 can be eliminated by querying only the leaders, that is one
possible way to deal with the issue. However, this reduces scalability.

That said, in my testing it is very hard to get #1, #2 and #3 to actually
occur together. This is particularly true if commit windows are short,
because that leaves very little time for #2 and #3 to line up. For
example, a one-second softCommit would leave only a one-second window for
#2 and #3 to occur together, and that window would also have to coincide
with #1.

I've spent days attempting to make the TopicStream lose data with
different types of stress tests, and I've never been able to make it
happen.

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Dec 13, 2016 at 11:22 AM, Joel Bernstein <joels...@gmail.com> wrote:

> I plan on using this thread to address questions that were posted to
> SOLR-4587. Below are the questions asked:
>
>
> 1) You mentioned that "The issue here is that it's possible that an out of
> order version number could persist across commits."
>
> Is the above possible even if I am using optimistic concurrency (
> http://yonik.com/solr/optimistic-concurrency/) to write documents to Solr?
>
> 2) Query subscription is going to be a critical part of my project, and
> our subscribers cannot afford to lose alerts. What can I do to make sure
> that no alerts are lost? As long as I get an error message whenever there
> is a failure, I will make sure that my system retries/replays indexing
> that specific document.
>
> 3) Do you happen to have any stats about the possibility of data loss in
> Solr? How often does it happen? Are there any best practices that we can
> follow to avoid it?
>
> 4) In general, are stream expressions robust enough to be used in
> production?
>
> 5) Is there any deeper documentation about topic()? I would love to know
> its stats for a query volume as big as ours (9-10 million), or how it
> works internally.
>
