[
https://issues.apache.org/jira/browse/SOLR-14022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992556#comment-16992556
]
Joel Bernstein edited comment on SOLR-14022 at 12/10/19 2:10 PM:
-----------------------------------------------------------------
I'll open a ticket to discuss further but here is a possible design that I
think would be more resilient:
1) Add a flag that turns on CDCR mode. In CDCR mode, document batches are
assigned a batch number and each document carries a field called "batch_number".
After each batch is added to the index, a final *batch record* is added to the
index with a few fields describing the batch.
2) All updates and deletes would get a batch record. Delete records would
contain enough information to re-apply the deletes from the batch record.
3) In a separate data center, a Solr Cloud would be set up to pull records from
the primary Solr Cloud. This process would first check the local index to find
the last batch processed, then retrieve the next N batch records from the
primary. For each batch record, it would query the primary Solr Cloud and
retrieve the records for that batch; if the batch record is for a delete, the
deletes are applied instead. After finishing a batch, it would also index the
batch record for that batch.
4) Another process checks the integrity of the remote Solr Cloud against the
primary Solr Cloud by reconciling the batch records.
5) The batch records can be deleted over time on the primary Solr Cloud. These
deletes would also get a batch record so they can be applied automatically on
the follower Solr Clouds.
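The pull-and-reconcile logic in steps 3 and 4 can be sketched in a few lines.
This is a minimal illustration only: the batch-record shape (fields like
"type", "doc_ids", "docs") and the in-memory "index" are assumptions for the
sketch, not an existing Solr API; a real follower would issue Solr queries
over HTTP instead.

```python
def next_batches(last_processed, primary_batches, n):
    """Step 3: given the highest batch number already indexed locally,
    pick the next N batch numbers to pull from the primary."""
    return sorted(b for b in primary_batches if b > last_processed)[:n]

def apply_batch(batch_record, local_index):
    """Apply one batch to a toy in-memory 'index' (a dict of id -> doc).
    Field names here ("type", "doc_ids", "docs") are hypothetical."""
    if batch_record["type"] == "delete":
        # Delete batch records carry enough information to re-apply the deletes.
        for doc_id in batch_record["doc_ids"]:
            local_index.pop(doc_id, None)
    else:
        for doc in batch_record["docs"]:
            local_index[doc["id"]] = doc
    # Finally, index the batch record itself so the follower can resume later.
    local_index["batch-%d" % batch_record["batch_number"]] = batch_record

def reconcile(primary_batch_numbers, follower_batch_numbers):
    """Step 4: batch numbers present on the primary but missing on the
    follower, found by comparing the two sets of batch records."""
    return sorted(set(primary_batch_numbers) - set(follower_batch_numbers))
```

Because each batch record is itself indexed on the follower, crash recovery
falls out naturally: on restart, the follower re-derives "last batch
processed" from its own index and resumes from there.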
> Remove CDCR from Solr
> ---------------------
>
> Key: SOLR-14022
> URL: https://issues.apache.org/jira/browse/SOLR-14022
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Components: CDCR
> Reporter: Joel Bernstein
> Priority: Major
>
> This ticket will remove CDCR from Solr
--
This message was sent by Atlassian Jira
(v8.3.4#803005)