Peter Rozsa created IMPALA-14908:
------------------------------------
Summary: OPTIMIZE statement leaves equality-delete files in metadata
Key: IMPALA-14908
URL: https://issues.apache.org/jira/browse/IMPALA-14908
Project: IMPALA
Issue Type: Bug
Components: Frontend
Reporter: Peter Rozsa
Assignee: Noémi Pap-Takács
OPTIMIZE uses planFiles to collect all data files with associated deletes
during the catalog finalization phase. Iceberg's planFiles uses column-range
statistics to prune equality-delete files from scan tasks: if the value range
covered by a delete file does not overlap a data file's column bounds, the
delete file is excluded from that data file's FileScanTask.deletes(). As a
result, the rewrite operation
never sees those equality-delete files, and they are not passed to
rewrite.deleteFile(). The new snapshot therefore still contains the
equality-delete files after OPTIMIZE completes.
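The effect described above can be illustrated with a small, self-contained sketch. This is a toy model, not Iceberg's or Impala's actual code: the DataFile and EqDeleteFile classes, the plan_files function, and the file names are all hypothetical, and the model ignores sequence numbers and partitioning. It only shows how collecting delete files from the per-task delete lists can miss a delete file that was pruned from every task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str
    lower: int  # lower bound of the equality column in this data file
    upper: int  # upper bound of the equality column in this data file


@dataclass(frozen=True)
class EqDeleteFile:
    path: str
    lower: int  # smallest deleted value in this delete file
    upper: int  # largest deleted value in this delete file


def overlaps(data: DataFile, delete: EqDeleteFile) -> bool:
    # Two closed ranges overlap when each starts before the other ends.
    return delete.lower <= data.upper and data.lower <= delete.upper


def plan_files(data_files, delete_files):
    # Hypothetical stand-in for scan planning: each data file is paired
    # only with the delete files whose value range overlaps its bounds.
    return [(d, [e for e in delete_files if overlaps(d, e)]) for d in data_files]


data_files = [
    DataFile("data-1.parquet", 1, 10),
    DataFile("data-2.parquet", 20, 30),
]
delete_files = [
    EqDeleteFile("eq-del-1.parquet", 5, 5),      # overlaps data-1
    EqDeleteFile("eq-del-2.parquet", 100, 100),  # overlaps no data file
]

tasks = plan_files(data_files, delete_files)

# Delete files visible to a rewrite that only looks at per-task deletes.
seen = {e.path for _, deletes in tasks for e in deletes}

# Delete files actually present in the table metadata.
all_deletes = {e.path for e in delete_files}

# Delete files the rewrite would never remove.
leftover = sorted(all_deletes - seen)
print(leftover)  # ['eq-del-2.parquet']
```

In this model, a rewrite that removes only the delete files in `seen` leaves `eq-del-2.parquet` in the new snapshot, which mirrors the reported behavior.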
Steps to reproduce (rollback required after execution):
OPTIMIZE TABLE functional_parquet.iceberg_v2_delete_equality;