[ 
https://issues.apache.org/jira/browse/PHOENIX-5315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Vissapragada reassigned PHOENIX-5315:
---------------------------------------------

    Assignee: Bharath Vissapragada

> Cross cluster replication of the base table only should be sufficient
> ---------------------------------------------------------------------
>
>                 Key: PHOENIX-5315
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5315
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Andrew Kyle Purtell
>            Assignee: Bharath Vissapragada
>            Priority: Major
>
> When replicating Phoenix tables using the HBase cross-cluster replication 
> facility, it should be sufficient (and, to avoid race conditions and 
> inconsistencies, necessary) to replicate the base table only. On the sink 
> cluster, the replication client's application of mutations from the 
> replication stream to the local base table should trigger all necessary index 
> update operations. To the extent this does not happen today due to 
> implementation details, those details should be reworked.
> This also has important efficiency benefits: no matter how many indexes are 
> defined for a base table, only the base table updates need be replicated 
> (presuming the Phoenix schema is synchronized across all sites by some other 
> external means).
> This would likely comprise multiple components, so we should use this issue 
> as an umbrella. We'd need:
>  # A Phoenix implementation of HBase's ReplicationEndpoint that tails the WAL 
> like a normal replication endpoint. However, rather than writing to HBase's 
> replication sink APIs (which issue HBase RPCs to a remote cluster), it 
> should write to a new Phoenix Endpoint coprocessor.
>  # An HBase coprocessor Endpoint hook that takes in a request from a remote 
> cluster (containing both the WALEdit's data and the WALKey's annotated 
> metadata telling the remote cluster which tenant_id, logical table name, and 
> timestamp the data is associated with). Ideally the API's message format 
> should be configurable, and could be either a protobuf or an Avro schema 
> similar to the one described by PHOENIX-5443. The endpoint hook would take 
> the metadata plus data and regenerate a complete set of Phoenix mutations, both 
> data and indexes, just as the Phoenix client did for the original SQL 
> statement that generated the source-side edits. These mutations would be 
> written to the remote cluster by the normal Phoenix write path. 
> (Unfortunately, HBase uses the term "endpoint" to mean both a replication 
> plugin AND a stored-procedure-like coprocessor hook. To be clear, item 1 is a 
> replication plugin and item 2 is a coprocessor hook.)
>  
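To make item 2 concrete, here is a minimal, self-contained sketch of what the sink-side regeneration step might look like. All type and class names here (WalEntry, Mutation, SinkEndpoint) are illustrative stand-ins, not actual HBase or Phoenix classes, and the toy row-key scheme is an assumption; the real hook would consume WALEdit/WALKey protos and drive the actual Phoenix write path.

```java
import java.util.*;

// Hypothetical stand-in for the replicated payload: the WALKey-annotated
// metadata (tenant_id, logical table name, timestamp) plus the cell data.
record WalEntry(String tenantId, String logicalTable, long timestamp,
                Map<String, String> cells) {}

// Hypothetical stand-in for a Phoenix mutation bound for one physical table.
record Mutation(String table, String rowKey,
                Map<String, String> values, long timestamp) {}

class SinkEndpoint {
    // Toy schema metadata: index table name -> indexed column. Assumed to be
    // synchronized across sites by external means, per the issue description.
    private final Map<String, String> indexes;

    SinkEndpoint(Map<String, String> indexes) {
        this.indexes = indexes;
    }

    // Regenerate the complete set of mutations (base table + every index)
    // from one replicated entry, mirroring what the Phoenix client produced
    // for the original SQL statement on the source cluster.
    List<Mutation> regenerate(WalEntry e) {
        List<Mutation> out = new ArrayList<>();
        // Base table row key: tenant prefix + primary key (illustrative).
        String baseRow = e.tenantId() + "." + e.cells().get("PK");
        out.add(new Mutation(e.logicalTable(), baseRow, e.cells(), e.timestamp()));
        for (Map.Entry<String, String> idx : indexes.entrySet()) {
            String col = idx.getValue();
            // Index row key leads with the indexed value, then the base row key,
            // so index updates carry the same timestamp as the base update.
            String idxRow = e.cells().get(col) + "\u0000" + baseRow;
            out.add(new Mutation(idx.getKey(), idxRow,
                    Map.of(col, e.cells().get(col)), e.timestamp()));
        }
        return out;
    }
}
```

The key property the sketch illustrates is that the sink derives index mutations locally from base-table data plus metadata, so only the base table ever crosses the wire.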



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
