I've implemented a custom solr2solr ongoing unidirectional replication
mechanism.
A Replicator (acting as a SolrJ client) crawls documents from SolrCloud1 and
writes them to SolrCloud2 in batches.
The replicator's crawl logic is to read documents with a time greater than or
equal to the time of the last replicated document.
Whenever a document is added/updated, I automatically update a tdate field,
"last_updated_in_solr", using TimestampUpdateProcessorFactory.
*My problem:* When a client indexes a batch of 100 documents, all 100 docs
have the same "last_updated_in_solr" value. This makes my ongoing
replication check for new documents much more complex than it would be if
the time value were unique, because a replication batch can end in the
middle of a group of documents that share one timestamp.
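To illustrate the extra bookkeeping this forces, here is a minimal sketch (not my actual replicator). It assumes each crawl returns documents sorted ascending by "last_updated_in_solr", and it stays in memory rather than calling SolrJ. The names ReplicationCursor, Doc, nextQuery and accept are hypothetical, made up for this example. Because the lower bound is inclusive, the replicator has to remember which ids it already copied at the boundary timestamp and skip them on the next pass.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ReplicationCursor {
    // Hypothetical stand-in for a Solr document: only the id and the timestamp field.
    public static final class Doc {
        public final String id;
        public final Instant lastUpdated;

        public Doc(String id, Instant lastUpdated) {
            this.id = id;
            this.lastUpdated = lastUpdated;
        }
    }

    // Timestamp of the last replicated document.
    private Instant lastTime = Instant.EPOCH;
    // Ids already replicated that carry exactly lastTime.
    private final Set<String> idsAtLastTime = new HashSet<>();

    // Range query for the next crawl; the lower bound is inclusive ("greater than or equal").
    public String nextQuery() {
        return "last_updated_in_solr:[" + lastTime + " TO *]";
    }

    // Keeps only documents not replicated yet and advances the cursor.
    // Assumes the batch is sorted ascending by lastUpdated.
    public List<Doc> accept(List<Doc> batch) {
        List<Doc> fresh = new ArrayList<>();
        for (Doc d : batch) {
            int cmp = d.lastUpdated.compareTo(lastTime);
            if (cmp < 0) {
                continue; // older than the cursor
            }
            if (cmp == 0 && idsAtLastTime.contains(d.id)) {
                continue; // same timestamp, already copied in an earlier pass
            }
            if (cmp > 0) {
                // A newer timestamp starts a new group of boundary ids.
                lastTime = d.lastUpdated;
                idsAtLastTime.clear();
            }
            idsAtLastTime.add(d.id);
            fresh.add(d);
        }
        return fresh;
    }
}
```

With a unique, strictly increasing value, the id set would be unnecessary and the query could use an exclusive lower bound instead.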
1. Can I use some other processor to generate increasing unique values?
2. Can I use the internal _version_ field for this? Is it guaranteed to be
monotonically increasing across the entire collection, or only per document
with each add/update?
Any other options?
schema.xml:
<field name="last_updated_in_solr" type="tdate" indexed="true"
stored="true" multiValued="false"/>
solrconfig.xml:
<updateRequestProcessorChain name="default">
  <processor class="solr.TimestampUpdateProcessorFactory">
    <str name="fieldName">last_updated_in_solr</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
I know a built-in replication mechanism is in the works, but it has not been
released yet.
Using Solr 4.7.2.