[ 
https://issues.apache.org/jira/browse/PHOENIX-7794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Jasani reassigned PHOENIX-7794:
-------------------------------------

    Assignee: Viraj Jasani

> Eventually Consistent Global Secondary Indexes
> ----------------------------------------------
>
>                 Key: PHOENIX-7794
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7794
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>
> Achieving consistently low latency at any scale is a major challenge for many 
> critical web applications/services. This requires such applications to choose 
> the right database. Distributed NoSQL databases like Apache Phoenix offer the 
> scalability and throughput required for such critical workloads. However, 
> when it comes to global secondary indexes, Phoenix provides strongly 
> consistent (synchronous) indexes. Here, index updates are tightly coupled 
> with the data table updates, meaning as the number of indexes grows, write 
> latency on the data table can increase depending on the network overhead, 
> and/or WAL replica availability of each index table. As a result, 
> applications with high write volumes and multiple indexes can experience some 
> throughput and availability degradation.
> The purpose of this Jira is to provide the Eventually Consistent Global 
> Secondary Indexes. Here, index updates are managed separately from the data 
> table updates. This keeps write latency on the data table consistently lower 
> regardless of the number of indexes created on the data table. This allows 
> high write volume applications to take advantage of the global secondary 
> indexes in Phoenix without slowing down their writes, while accepting 
> eventual consistency of the indexes.
> The design document attached to the Jira describes several possible 
> approaches to achieve this, while finalizing two approaches to provide 
> eventually consistent indexes.
> h1. Requirements for Eventually Consistent Indexes
>  # Users should be able to create eventually consistent indexes for both 
> covered and uncovered indexes.
>  # The SQL statement should include the CONSISTENCY clause to determine 
> whether the given covered or uncovered index is strongly consistent or 
> eventually consistent. By default, consider the given index as strongly 
> consistent. *CREATE INDEX <index-name> ON <data-table> ( <col1>,... <colN>) 
> INCLUDE (<col1>,...<colN>) CONSISTENCY = EVENTUAL*
>  # Users should be able to seamlessly update the CONSISTENCY property of the 
> given index from strong to eventual and vice versa using ALTER INDEX SQL 
> statement. (although the change of consistency update depends on the 
> UPDATE_CACHE_FREQUENCY used at the table level) *ALTER INDEX <index-name> ON 
> <data-table> CONSISTENCY = EVENTUAL*
>  # Depending on the use cases, data tables can consist of the mix of zero or 
> more strongly consistent indexes and zero or more eventually consistent 
> indexes.
>  # Index verification MapReduce jobs should work for the eventually 
> consistent global secondary indexes similar to how they work for the strongly 
> consistent global secondary indexes.
>  # Concurrent mutations on the data table should work for eventually 
> consistent indexes.
>  # Data table mutations need to produce and store the time ordered metadata 
> (change records) for consumers to replay them and perform the index mutation 
> RPCs.
>  # Updates to eventually consistent indexes should mirror the pre-index and 
> post-index update semantics of strongly consistent updates. However, the 
> separate RPCs for pre-index and post-index updates can be combined into a 
> single RPC call. For instance, if the data table update failed, the consumer 
> should update corresponding indexes with unverified rows (pre-index updates) 
> only. If the data table update succeeded, the consumer should update 
> corresponding indexes with verified rows (post-index update) only. The 
> consumer does not need to perform both pre and post index update RPCs on the 
> indexes.
>  # To improve the scale of index updates, mutations on indexes should be 
> executed by consuming ordered change records per table region. This allows 
> for parallel processing across all table regions.
>  # Once the data table region splits or merges into new daughter regions, any 
> remaining ordered change records from the closed parent region should be 
> processed before consuming newly generated change records for the new 
> daughter regions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to