[
https://issues.apache.org/jira/browse/PHOENIX-7794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Jasani reassigned PHOENIX-7794:
-------------------------------------
Assignee: Viraj Jasani
> Eventually Consistent Global Secondary Indexes
> ----------------------------------------------
>
> Key: PHOENIX-7794
> URL: https://issues.apache.org/jira/browse/PHOENIX-7794
> Project: Phoenix
> Issue Type: New Feature
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Major
>
> Achieving consistently low latency at any scale is a major challenge for many
> critical web applications/services. This requires such applications to choose
> the right database. Distributed NoSQL databases like Apache Phoenix offer the
> scalability and throughput required for such critical workloads. However,
> when it comes to global secondary indexes, Phoenix provides strongly
> consistent (synchronous) indexes. Here, index updates are tightly coupled
> with the data table updates, meaning as the number of indexes grows, write
> latency on the data table can increase depending on the network overhead,
> and/or WAL replica availability of each index table. As a result,
> applications with high write volumes and multiple indexes can experience some
> throughput and availability degradation.
> The purpose of this Jira is to provide eventually consistent global
> secondary indexes. Here, index updates are managed separately from the data
> table updates. This keeps write latency on the data table consistently lower
> regardless of the number of indexes created on the data table. This allows
> high write volume applications to take advantage of the global secondary
> indexes in Phoenix without slowing down their writes, while accepting
> eventual consistency of the indexes.
> The design document attached to the Jira describes several possible
> approaches to achieve this and settles on two of them for providing
> eventually consistent indexes.
> h1. Requirements for Eventually Consistent Indexes
> # Users should be able to create both covered and uncovered indexes as
> eventually consistent indexes.
> # The CREATE INDEX statement should accept a CONSISTENCY clause that
> specifies whether the given covered or uncovered index is strongly
> consistent or eventually consistent. If the clause is omitted, the index is
> strongly consistent. *CREATE INDEX <index-name> ON <data-table> ( <col1>,... <colN>)
> INCLUDE (<col1>,...<colN>) CONSISTENCY = EVENTUAL*
> # Users should be able to seamlessly change the CONSISTENCY property of a
> given index from strong to eventual and vice versa using the ALTER INDEX SQL
> statement. How soon the change takes effect depends on the
> UPDATE_CACHE_FREQUENCY used at the table level. *ALTER INDEX <index-name> ON
> <data-table> CONSISTENCY = EVENTUAL*
> # Depending on the use case, a data table can have any mix of zero or
> more strongly consistent indexes and zero or more eventually consistent
> indexes.
> # Index verification MapReduce jobs should work for the eventually
> consistent global secondary indexes similar to how they work for the strongly
> consistent global secondary indexes.
> # Concurrent mutations on the data table should work for eventually
> consistent indexes.
> # Data table mutations need to produce and store time-ordered metadata
> (change records) so that consumers can replay them and perform the index
> mutation RPCs.
> # Updates to eventually consistent indexes should mirror the pre-index and
> post-index update semantics of strongly consistent updates. However, the
> separate RPCs for pre-index and post-index updates can be combined into a
> single RPC call. For instance, if the data table update failed, the consumer
> should update corresponding indexes with unverified rows (pre-index updates)
> only. If the data table update succeeded, the consumer should update
> corresponding indexes with verified rows (post-index update) only. The
> consumer does not need to perform both pre and post index update RPCs on the
> indexes.
> # To improve the scale of index updates, mutations on indexes should be
> executed by consuming ordered change records per table region. This allows
> for parallel processing across all table regions.
> # Once a data table region splits into daughter regions or merges into a
> new region, any remaining ordered change records from the closed parent
> region(s) should be processed before consuming newly generated change
> records for the resulting regions.
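
The last three requirements (one index RPC per change record, ordered replay per region, and parent regions drained before their daughters) can be sketched roughly as below. This is only an illustrative model in Python, not Phoenix code: the names ChangeRecord, index_mutation_for, replay and parent_of are hypothetical, and the sequential loop stands in for what would presumably be parallel per-region consumers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRecord:
    """Hypothetical change record produced by a data table mutation."""
    region: str                 # data table region that produced the record
    timestamp: int              # ordering key within that region
    row_key: str
    data_write_succeeded: bool  # outcome of the data table update


def index_mutation_for(record):
    """Pick the single index update to send for one change record.

    A failed data table write yields only the unverified (pre-index) row,
    a successful one yields only the verified (post-index) row.
    """
    state = "VERIFIED" if record.data_write_succeeded else "UNVERIFIED"
    return (record.row_key, state)


def replay(records, parent_of):
    """Return index mutations in the order a consumer would apply them.

    parent_of maps a region to the list of closed parent regions it came
    from (assumed to be known from split/merge metadata).
    """
    by_region = defaultdict(list)
    for record in records:
        by_region[record.region].append(record)

    drained = set()
    mutations = []

    def drain(region):
        if region in drained:
            return
        drained.add(region)
        # Leftover records of closed parents are consumed first.
        for parent in parent_of.get(region, []):
            drain(parent)
        # Within a region, records are replayed in timestamp order.
        for record in sorted(by_region.get(region, []),
                             key=lambda r: r.timestamp):
            mutations.append(index_mutation_for(record))

    # Regions with no parent/daughter relationship are independent, so a
    # real consumer could process them in parallel instead of in a loop.
    for region in sorted(by_region):
        drain(region)
    return mutations
```

In this sketch, records from a parent region are always emitted ahead of records from any region listed as its daughter, regardless of the order in which the records arrive at the consumer.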
--
This message was sent by Atlassian Jira
(v8.20.10#820010)