Csaba Ringhofer created IMPALA-14879:
----------------------------------------
Summary: Support incremental stats in Iceberg tables with Puffin files
Key: IMPALA-14879
URL: https://issues.apache.org/jira/browse/IMPALA-14879
Project: IMPALA
Issue Type: Epic
Reporter: Csaba Ringhofer
Currently Impala always reads the whole table during COMPUTE STATS (unless
table sampling (TABLESAMPLE) is used). An efficient incremental implementation
is possible for Iceberg tables (as long as deletes are ignored) by saving the
stats in Puffin files (e.g. Theta sketches for NDV) for the snapshot where the
stats were computed. Subsequent COMPUTE STATS calls then only need to read the
files added since that snapshot and merge the results with the stats stored
for it.
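
The merge step described above can be illustrated with a minimal sketch. This
is not Impala code and does not read or write the Puffin format: it uses a
plain K-Minimum-Values sketch as a simplified stand-in for a Theta sketch, a
dict as a stand-in for the stats stored with a snapshot, and lists of column
values as stand-ins for data files. All names (build_sketch, merge_sketches,
estimate_ndv, compute_stats_incremental) are hypothetical.

```python
import hashlib

K = 1024           # number of smallest hash values kept per sketch (assumed size)
MAX_HASH = 2 ** 64  # hashes are treated as uniform in [0, 2**64)


def _hash64(value):
    """Deterministic 64-bit hash of a column value."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_sketch(values, k=K):
    """Return the k smallest distinct hashes of the values, sorted ascending."""
    return sorted({_hash64(v) for v in values})[:k]


def merge_sketches(a, b, k=K):
    """Union two sketches; the result equals a sketch built over both inputs."""
    return sorted(set(a) | set(b))[:k]


def estimate_ndv(sketch, k=K):
    """Estimate the number of distinct values represented by a sketch."""
    if len(sketch) < k:
        return len(sketch)  # fewer than k distinct hashes seen: count is exact
    # Standard KMV estimator based on the k-th smallest hash value.
    return int((k - 1) * MAX_HASH / max(sketch[-1], 1))


def compute_stats_incremental(stored_stats, new_files):
    """Merge stats stored for an older snapshot with stats of newly added files.

    Only new_files are scanned. Deletes are ignored: a sketch cannot
    "forget" values, so removed rows would still be counted.
    """
    sketch = stored_stats["ndv_sketch"]
    rows = stored_stats["row_count"]
    for file_values in new_files:
        sketch = merge_sketches(sketch, build_sketch(file_values))
        rows += len(file_values)
    return {"row_count": rows, "ndv_sketch": sketch}


# First COMPUTE STATS: the whole table (one file here) is scanned.
file_1 = list(range(0, 5000))
stats_v1 = {"row_count": len(file_1), "ndv_sketch": build_sketch(file_1)}

# Later COMPUTE STATS: only the file added since that snapshot is scanned.
file_2 = list(range(3000, 8000))
stats_v2 = compute_stats_incremental(stats_v1, [file_2])

# A full rescan over both files, kept here only for comparison.
full_sketch = build_sketch(file_1 + file_2)
```

In this sketch the merged result is identical to the one a full rescan would
produce, because the k smallest hashes of a union are always contained in the
union of the k smallest hashes of each part. That property is what lets the
second call skip the files already covered by the stored stats.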
--
This message was sent by Atlassian Jira
(v8.20.10#820010)