On 2/10/2016 8:02 AM, tedsolr wrote:
> I have my head wrapped around sending index requests in parallel, but in a
> later post you mentioned how you separately track the most recent update and
> are able to sync from that point if needed. That I don't get. Is it an index
> version you are tracking? Is the version number unique to the cluster or is
> a per collection/shard number? So you have built something that can read the
> zookeeper transaction logs and forward updates to another cluster. That's
> clever. If you have time to provide any more pointers on implementing such a
> solution I would really appreciate it.

What I track is the position within the source MySQL database indicating what data still needs to be indexed. I do this by tracking the last value in the autoincrement primary key columns for various tables in the database. My build program knows what data still needs to be inserted, as well as which records in the delete table and reindex table still need to be processed.

The saved positions for these things are only updated if the entire index cycle (deletes, reindexes, and inserts) is successful. These pieces of information are tracked separately for each copy of the index.

The way the entire system works is somewhat complex, but above are the immediately relevant pieces of information for your question. I do not use SolrCloud, so there is no zookeeper, but this approach would also work there.

Thanks,
Shawn
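
For illustration only, the position tracking described above could be sketched roughly as follows in Python. This is not the actual build program: the names `Positions`, `rows_after`, and `run_cycle` are hypothetical, the source tables are stood in for by in-memory lists of `(id, payload)` rows, and the `apply` callback is a placeholder for whatever sends deletes and documents to one copy of the index.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, str]  # (autoincrement primary key, payload); hypothetical shape


@dataclass
class Positions:
    """Last processed autoincrement id per source table, for ONE index copy."""
    insert_id: int = 0
    delete_id: int = 0
    reindex_id: int = 0


def rows_after(table: List[Row], last_id: int) -> List[Row]:
    # Stand-in for a query such as: SELECT ... WHERE id > last_id ORDER BY id
    return sorted(r for r in table if r[0] > last_id)


def run_cycle(
    source: Dict[str, List[Row]],
    saved: Positions,
    apply: Callable[[str, List[Row]], None],
) -> Positions:
    """Run one full cycle and return the new positions.

    If any step raises, the exception propagates and no new Positions
    object is returned, so the caller's saved positions stay unchanged
    and the next cycle retries from the same point.
    """
    deletes = rows_after(source["delete"], saved.delete_id)
    reindexes = rows_after(source["reindex"], saved.reindex_id)
    inserts = rows_after(source["insert"], saved.insert_id)

    # Same order as the cycle described above: deletes, reindexes, inserts.
    apply("delete", deletes)
    apply("reindex", reindexes)
    apply("insert", inserts)

    # Only reached when every step succeeded.
    return Positions(
        insert_id=max((r[0] for r in inserts), default=saved.insert_id),
        delete_id=max((r[0] for r in deletes), default=saved.delete_id),
        reindex_id=max((r[0] for r in reindexes), default=saved.reindex_id),
    )
```

A caller would keep one `Positions` value per index copy (for example in a dict keyed by copy name) and overwrite it only with the return value of a successful `run_cycle`. A copy that failed or was offline keeps its old positions and catches up from there on its next cycle.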