On 2/10/2016 8:02 AM, tedsolr wrote:
> I have my head wrapped around sending index requests in parallel, but in a
> later post you mentioned how you separately track the most recent update and
> are able to sync from that point if needed. That I don't get. Is it an index
> version you are tracking? Is the version number unique to the cluster or is
> it a per-collection/shard number? So you have built something that can read the
> ZooKeeper transaction logs and forward updates to another cluster. That's
> clever. If you have time to provide any more pointers on implementing such a
> solution I would really appreciate it.

What I track is the position within the source MySQL database that
indicates what data still needs to be indexed.  I do this by saving the
last value of the autoincrement primary key column that has already been
processed, for each of the relevant tables in the database.
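A minimal sketch of that idea in Python, assuming a single hypothetical table `docs` whose `id` column is the autoincrement primary key. It uses the standard-library `sqlite3` module as a stand-in for MySQL so it runs without a server; `fetch_new_rows` and `next_position` are illustrative names, not part of the build program described here. The saved position is the highest key already indexed, and each pass asks only for rows above it.

```python
import sqlite3


def fetch_new_rows(conn, last_id, batch_size=1000):
    """Return rows whose autoincrement key is greater than last_id, lowest key first.

    Hypothetical schema: docs(id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT).
    """
    cur = conn.execute(
        "SELECT id, body FROM docs WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, batch_size),
    )
    return cur.fetchall()


def next_position(rows, last_id):
    """New saved position: the largest key returned, or the old one if nothing was new."""
    return rows[-1][0] if rows else last_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT)")
    conn.executemany("INSERT INTO docs (body) VALUES (?)", [("a",), ("b",), ("c",)])

    last_id = 1  # pretend row 1 was indexed on a previous cycle
    rows = fetch_new_rows(conn, last_id)
    print(rows)                          # rows 2 and 3 still need indexing
    print(next_position(rows, last_id))  # 3
```

With a MySQL DB-API driver such as PyMySQL the query would be the same apart from using `%s` placeholders instead of `?`. Ordering by the key matters: it guarantees that the last row returned carries the highest key, so that key is safe to save as the next position.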

My build program knows what data still needs to be inserted, as well as
which records in the delete table and the reindex table still need to be
processed.  The saved positions for all of these are updated only if the
entire index cycle (deletes, reindexes, and inserts) is successful.
These positions are tracked separately for each copy of the index.
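The all-or-nothing update can be sketched in Python as follows. This is a hedged illustration, not the actual build program: the `Positions` dataclass, the `run_cycle` helper, and the idea that each step is a callable taking the old position and returning the new one are all assumptions made for the example. One `Positions` value is kept per copy of the index, and the new values are returned for saving only when all three steps complete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Positions:
    """Hypothetical saved positions for one copy of the index."""
    delete_id: int   # last processed row in the delete table
    reindex_id: int  # last processed row in the reindex table
    insert_id: int   # last main-table key already indexed


def run_cycle(saved, do_deletes, do_reindexes, do_inserts):
    """Run deletes, reindexes, and inserts in order; return the positions to persist.

    Each step takes its old position and returns the new one, raising on failure.
    If any step raises, the previously saved positions are returned unchanged,
    so the next cycle repeats the same work.
    """
    try:
        new_delete = do_deletes(saved.delete_id)
        new_reindex = do_reindexes(saved.reindex_id)
        new_insert = do_inserts(saved.insert_id)
    except Exception:
        return saved
    return Positions(new_delete, new_reindex, new_insert)


if __name__ == "__main__":
    # Hypothetical stubs standing in for real indexing work.
    def ok(pos):
        return pos + 1

    def fail(pos):
        raise RuntimeError("index copy unreachable")

    # Positions are tracked separately for each copy of the index.
    positions = {"primary": Positions(10, 5, 1000), "backup": Positions(10, 5, 1000)}
    positions["primary"] = run_cycle(positions["primary"], ok, ok, ok)
    positions["backup"] = run_cycle(positions["backup"], ok, fail, ok)
    print(positions["primary"])  # Positions(delete_id=11, reindex_id=6, insert_id=1001)
    print(positions["backup"])   # Positions(delete_id=10, reindex_id=5, insert_id=1000)
```

Because a failed cycle leaves the saved positions where they were, any step that did partly succeed is repeated next time, so each step must be safe to run twice against the same data.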

The way the entire system works is somewhat complex, but above are the
immediately relevant pieces of information for your question.

I do not use SolrCloud, so there is no ZooKeeper, but this approach
would also work there.

Thanks,
Shawn
