Kadir OZDEMIR created PHOENIX-6677:
--------------------------------------
Summary: Parallelism within a batch of mutations
Key: PHOENIX-6677
URL: https://issues.apache.org/jira/browse/PHOENIX-6677
Project: Phoenix
Issue Type: Improvement
Reporter: Kadir OZDEMIR
Fix For: 4.17.0, 5.2.0
Currently, the Phoenix client simply passes the batches of row mutations from the
application to the HBase client without any parallelism or intelligent grouping
(except grouping mutations for the same row).
Assume that the application creates a batch of 10000 row mutations for a given
table. The Phoenix client divides these rows, in arrival order, into HBase
batches of n (e.g., 100) rows based on the configured batch size limits, i.e.,
the maximum number of rows and bytes. Phoenix then calls the HBase batch API one
batch at a time (i.e., serially). The HBase client further divides a given batch of rows
into smaller batches based on their regions. This means that a large batch
created by the application is divided into many tiny batches and executed
mostly serially. For salted tables, this will result in even smaller batches.
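
To make the fragmentation concrete, here is a minimal sketch of the behavior described above. It is not Phoenix or HBase code: `region_of`, `serial_sub_batches`, the 10-region layout, and the row key format are all assumptions made for illustration. Rows are chunked by arrival order and each chunk is then split by region, the way the text says the HBase client does.

```python
from bisect import bisect_right
from collections import defaultdict


def region_of(row_key, region_start_keys):
    """Index of the region whose [start, next_start) range holds row_key.

    region_start_keys is assumed sorted, with b"" as the first start key.
    """
    return bisect_right(region_start_keys, row_key) - 1


def serial_sub_batches(row_keys, batch_size, region_start_keys):
    """Chunk rows by arrival order, then split every chunk by region."""
    sub_batches = []
    for i in range(0, len(row_keys), batch_size):
        chunk = row_keys[i:i + batch_size]
        by_region = defaultdict(list)
        for key in chunk:
            by_region[region_of(key, region_start_keys)].append(key)
        sub_batches.extend(by_region.values())
    return sub_batches


# Hypothetical workload: 10000 rows arriving round-robin across 10 regions.
keys = [f"{i % 10}{i:05d}".encode() for i in range(10000)]
starts = [b""] + [str(d).encode() for d in range(1, 10)]
subs = serial_sub_batches(keys, 100, starts)
print(len(subs), max(len(s) for s in subs))  # prints "1000 10"
```

Under these assumptions, one application batch of 10000 rows turns into 1000 region-level sub-batches of 10 rows each.
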
We can improve the current implementation greatly if we group the rows of the
batch prepared by the application into sub-batches based on table region
boundaries and then execute these sub-batches in parallel.
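
A rough sketch of the proposed idea, under stated assumptions: the whole application batch is grouped by region first, and the per-region sub-batches are then submitted concurrently. This is only an illustration in Python, not the Phoenix implementation. `group_by_region`, `execute_in_parallel`, and the `send_batch` callback (a stand-in for whatever call actually ships a sub-batch to a region server) are hypothetical names.

```python
from bisect import bisect_right
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_region(row_keys, region_start_keys):
    """Group the whole application batch by region.

    region_start_keys is assumed sorted, with b"" as the first start key.
    """
    groups = defaultdict(list)
    for key in row_keys:
        groups[bisect_right(region_start_keys, key) - 1].append(key)
    return dict(groups)


def execute_in_parallel(groups, send_batch, max_workers=8):
    """Submit one sub-batch per region concurrently; return {region: result}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            region: pool.submit(send_batch, region, rows)
            for region, rows in groups.items()
        }
        # result() re-raises any exception thrown by send_batch.
        return {region: f.result() for region, f in futures.items()}


# Hypothetical workload: 10000 rows spread evenly over 10 regions.
keys = [f"{i % 10}{i:05d}".encode() for i in range(10000)]
starts = [b""] + [str(d).encode() for d in range(1, 10)]
groups = group_by_region(keys, starts)
results = execute_in_parallel(groups, lambda region, rows: len(rows))
print(len(groups), results[0])  # prints "10 1000"
```

With the same 10000 rows, this yields 10 sub-batches of 1000 rows each that can run concurrently. A real implementation would presumably still cap each sub-batch by the configured row and byte limits.
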