[
https://issues.apache.org/jira/browse/SOLR-14022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992556#comment-16992556
]
Joel Bernstein edited comment on SOLR-14022 at 12/10/19 2:10 PM:
-----------------------------------------------------------------
I'll open a ticket to discuss further but here is a possible design that I
think would be more resilient:
1) Add a flag that turns on CDCR mode. In CDCR mode, document batches are
assigned a batch number and each document carries a field called "batch_number".
After each batch is added to the index, a final *batch record* is added to the
index with a few fields describing the batch.
2) All updates and deletes would get a batch record. Delete records would
contain enough information to re-apply the deletes from the batch record.
3) In a separate data center, a Solr Cloud would be set up to pull records from
the primary Solr Cloud. This process would first check the local index to find
the last batch processed, then retrieve the next N batch records from the
primary. For each batch record, it would query the primary Solr Cloud and
retrieve the records for that batch; if the batch record is for a delete, the
deletes are applied instead. After finishing a batch, it would also index the
batch record for that batch.
4) Another process checks the integrity of the remote Solr Cloud against the
primary Solr Cloud by reconciling the batch records.
5) The batch records can be deleted over time on the primary Solr Cloud. These
deletes would also get a batch record so they can be applied automatically on
the follower Solr Clouds.
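The pull-and-reconcile logic in steps 3 and 4 can be sketched in a few lines.
This is a minimal illustration only: the batch-record shape (fields like
"type", "doc_ids", "docs") and the in-memory "index" are assumptions for the
sketch, not an existing Solr API; a real follower would issue Solr queries
over HTTP instead.

```python
def next_batches(last_processed, primary_batches, n):
    """Step 3: given the highest batch number already indexed locally,
    pick the next N batch numbers to pull from the primary."""
    return sorted(b for b in primary_batches if b > last_processed)[:n]

def apply_batch(batch_record, local_index):
    """Apply one batch to a toy in-memory 'index' (a dict of id -> doc).
    Field names here ("type", "doc_ids", "docs") are hypothetical."""
    if batch_record["type"] == "delete":
        # Delete batch records carry enough information to re-apply the deletes.
        for doc_id in batch_record["doc_ids"]:
            local_index.pop(doc_id, None)
    else:
        for doc in batch_record["docs"]:
            local_index[doc["id"]] = doc
    # Finally, index the batch record itself so the follower can resume later.
    local_index["batch-%d" % batch_record["batch_number"]] = batch_record

def reconcile(primary_batch_numbers, follower_batch_numbers):
    """Step 4: batch numbers present on the primary but missing on the
    follower, found by comparing the two sets of batch records."""
    return sorted(set(primary_batch_numbers) - set(follower_batch_numbers))
```

Because each batch record is itself indexed on the follower, crash recovery
falls out naturally: on restart, the follower re-derives "last batch
processed" from its own index and resumes from there.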
> Remove CDCR from Solr
> ---------------------
>
> Key: SOLR-14022
> URL: https://issues.apache.org/jira/browse/SOLR-14022
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Components: CDCR
> Reporter: Joel Bernstein
> Priority: Major
>
> This ticket will remove CDCR from Solr
--
This message was sent by Atlassian Jira
(v8.3.4#803005)