huaxingao commented on code in PR #16961: URL: https://github.com/apache/iceberg/pull/16961#discussion_r3522062607
########## format/index.md: ########## @@ -0,0 +1,322 @@ +--- +title: "Index Spec" +--- +<!-- + - Licensed to the Apache Software Foundation (ASF) under one or more + - contributor license agreements. See the NOTICE file distributed with + - this work for additional information regarding copyright ownership. + - The ASF licenses this file to You under the Apache License, Version 2.0 + - (the "License"); you may not use this file except in compliance with + - the License. You may obtain a copy of the License at + - + - http://www.apache.org/licenses/LICENSE-2.0 + - + - Unless required by applicable law or agreed to in writing, software + - distributed under the License is distributed on an "AS IS" BASIS, + - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + - See the License for the specific language governing permissions and + - limitations under the License. + --> + +# Iceberg Index Specification + +## Background and Motivation + +Indexes enable query engines to locate relevant rows without scanning entire datasets. +They can accelerate point lookups, range predicates, and other retrieval patterns +while preserving Iceberg's table format, snapshot isolation, and interoperability. + +This specification defines: + +- A portable metadata format for indexes +- A catalog-managed index object independent from table metadata +- A common storage architecture for index data +- A framework for defining new index types and transform functions + +Indexes are optional. Engines may choose to create, maintain, consume, or ignore them. + +## Goals + +- Define a common metadata format for indexes +- Allow indexes to evolve independently from table metadata +- Enable index sharing across engines +- Provide a foundation for future index types + +## Definitions + +### Index Type + +The index type defines the logical category of an index and the class of queries it is designed to accelerate. + +The following index type is defined in this specification: + +| Type | +|--------| +| SCALAR | + +The following index types are reserved for future specifications: + +| Type | +|--------| +| VECTOR | +| TERM | + +The index type communicates the capabilities of an index to query engines and helps determine whether an index is +applicable to a particular query. + +### Index Transform Function + +The index transform function defines how the index organization key is derived from source table columns when rows are +stored in the index. + +The transform function determines the physical organization of the indexed data and therefore influences which query +patterns can efficiently leverage the index. + +The following index types are reserved for future specifications: + +| Transform | +|-----------| +| IDENTITY | +| HASH | +| HILBERT | + +The following transform functions are reserved for future specifications: + +| Transform | +|-----------| +| IVF | + +Index Instances may share the same index type while using different transform functions. + +### Index Instance + +An index instance is a concrete realization of an index type and function applied to a specific table. + +Users create index instances by specifying: + +- The source table +- The index type +- The transform function +- The indexed columns +- The included columns +- Index properties + +Multiple instances of the same index type may exist for a table. + +### Index Snapshot + +An index snapshot is an immutable version of the index data generated from a specific table snapshot. + +Each index snapshot references a complete set of index files and contains all data from the referenced table snapshot. + +## Overview + +Indexes are stored as independent catalog objects. + +Like Iceberg tables, views, and functions: + +- Metadata and data files are immutable +- Updates create new metadata files +- Catalogs perform atomic metadata swaps + +Index data is stored separately from metadata. + +Each index snapshot references a tracking file which describes the leaf files belonging to the snapshot. + +```text +Index Metadata + | + +-- Index Snapshot + | + +-- Tracking File + | + +-- Leaf Data Files +``` + +Transform functions derive a transform value from the key columns and determine how index entries are organized within +the leaf files. +- The transform value space is divided into non-overlapping ranges. +- Each leaf file stores entries for a single range. +- The tracking file stores range bounds for each leaf file. + +This structure enables efficient planning while keeping the data layout flexible for different index implementations. + +## Index Metadata + +The index metadata file stores the index definition and snapshot history. + +### Index Metadata File + +| Requirement | Field | Type | Description | +|-------------|---------------------|--------------------------|-------------------------------------------------| +| required | format-version | int | Index specification version | +| required | uuid | string | Stable UUID assigned at creation | +| required | table-uuid | string | UUID of the indexed table | +| required | location | string | Index root location | +| required | type | string | Logical index type | Review Comment: You're right, @fpj. An engine that encounters a type it doesn't recognize can just ignore that index and fall back to a normal scan, with no correctness risk. So a strictly closed set isn't required for correctness, and custom indexes should be allowed. I think the two goals reconcile: define a set of standard, well-known types in the spec so engines that recognize a type agree on its semantics (interoperability), while still allowing custom/experimental types that unrecognized readers safely ignore. We'd just want custom names namespaced so they don't collide with future standard names. ########## format/index.md: ########## @@ -0,0 +1,309 @@ +--- +title: "Index Spec" +--- +<!-- + - Licensed to the Apache Software Foundation (ASF) under one or more + - contributor license agreements. See the NOTICE file distributed with + - this work for additional information regarding copyright ownership. + - The ASF licenses this file to You under the Apache License, Version 2.0 + - (the "License"); you may not use this file except in compliance with + - the License. You may obtain a copy of the License at + - + - http://www.apache.org/licenses/LICENSE-2.0 + - + - Unless required by applicable law or agreed to in writing, software + - distributed under the License is distributed on an "AS IS" BASIS, + - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + - See the License for the specific language governing permissions and + - limitations under the License. + --> + +# Iceberg Index Specification + +## Background and Motivation + +Indexes enable query engines to locate relevant rows without scanning entire datasets. +They can accelerate point lookups, range predicates, and other retrieval patterns +while preserving Iceberg's table format, snapshot isolation, and interoperability. + +Indexes are optional. Engines may choose to create, maintain, consume, or ignore them. + +## Goals + +- Define a portable metadata format for indexes +- Provide a common storage architecture for index data +- Expose indexes as catalog-managed objects +- Allow indexes to be operated independently from source table metadata +- Enable index sharing across engines +- Provide a framework for defining new index types and transform functions + +## Overview + +Indexes are stored as a collection of files with some Iceberg table like semantics. At a high level they consist of a tracking file (similar to a root manifest file) which contains listings for a defined set of leaf files (similar to data files.) Leaf files store an ordered set of rows containing at least a key and the path of a Iceberg Table data file and the position within that file where the row where that key is stored. The organization of leaf files is defined by an Index Transform Function which varies based on the type of index. This structure is recorded in an Index metadata.json file which contains a set of snapshots, each of which points to a single tracking file mapping to the complete state of an Iceberg table at a given Iceberg table snapshot. + +Like Iceberg tables, views, and functions: + +- Metadata files (index metadata and tracking files) and data files (leaf files) are immutable +- Updates create new metadata files +- Catalogs perform atomic metadata swaps + +Each index snapshot references a tracking file which describes the leaf files belonging to the snapshot. + +```text +Index Metadata + | + +-- Index Snapshot + | + +-- Tracking File + | + +-- Leaf Data Files +``` + +Transform functions derive a transform value from the key columns and determine how index entries are organized within +the leaf files. +- The transform value space is divided into non-overlapping ranges. +- Each leaf file stores entries for a single range. +- The tracking file stores range bounds for each leaf file. + +This structure enables efficient planning while keeping the data layout flexible for different index implementations. + +## Definitions + +### Index Type + +The index type defines the logical category of an index and the class of queries it is designed to accelerate. + +The metadata, snapshot, tracking-file, and leaf-file structures defined in this specification form a generic framework shared by all index types. Each index type builds on this framework by defining its type-specific details, such as the leaf schema and the applicable transform functions. + +The following index type is fully defined in this specification: + +| Type | Description | +|--------|------------------------------------------------------------------| +| SCALAR | Maps scalar key values to their locations for equality lookups. | + +The following index types are reserved for future specifications. Their identifiers are claimed so that engines and catalog implementations recognize them as valid type names and handle them gracefully, but this specification defines no type-specific requirements (leaf schema, transforms, or query semantics) for them: + +| Type | Description | +|--------|----------------------------------------------------------| +| VECTOR | Reserved for similarity search over vector embeddings. | +| TERM | Reserved for text/term search. | + +The index type communicates the capabilities of an index to query engines and helps determine whether an index is +applicable to a particular query. + +### Index Transform Function + +The index transform function defines how the transform value is derived from the key columns when rows are stored in the +index. The following terms are used throughout this specification: + +- **Key columns**: the source-table columns the transform function is applied to. +- **Transform value**: the value produced by applying the transform function to a row's key columns. Index entries are organized by transform value. +- **Included columns**: optional source-table columns copied into the index for read convenience. They do not affect how the index is organized. + +The transform function determines the physical organization of the indexed data and therefore influences which query +patterns can efficiently leverage the index. + +The following transform functions are defined in this specification. The bound interpretation describes what the +transform-value bounds stored in the tracking file represent for each transform: + +| Transform | Bound Interpretation | +|-----------|----------------------| +| IDENTITY | Original value range | +| HASH | Hash bucket range | +| HILBERT | Hilbert key range | + +The following transform function is reserved for future specifications: + +| Transform | Bound Interpretation | +|-----------|---------------------------| +| IVF | Centroid identifier range | + +An index type does not fix a single transform function; the same index type can be realized with different transform functions. + +### Index Instance + +An index instance is a concrete realization of an index type and function applied to a specific table. + +Users create index instances by specifying: + +- The source table +- The index type +- The transform function +- The key columns +- The included columns (optional) +- Index properties (optional) + +Multiple instances of the same index type may exist for a table. + +### Index Snapshot + +An index snapshot is an immutable version of the index data generated from a specific table snapshot. + +Each index snapshot references a complete set of index files and contains all data from the referenced table snapshot. + +## Index Metadata + +The index metadata file stores the index definition and snapshot history. + +### Index Metadata File + +| Requirement | Field | Type | Description | +|-------------|---------------------|--------------------------|-------------------------------------------------| +| required | format-version | int | Index specification version | +| required | uuid | string | Stable UUID assigned at creation | +| required | table-uuid | string | UUID of the indexed table | +| required | location | string | Index root location | +| required | type | string | Logical index type | +| required | transform-function | string | Physical organization transform | +| required | key-column-ids | list<int> | Source-table column IDs the transform is applied to (key columns) | +| optional | included-column-ids | list<int> | Source-table column IDs copied into the index for read convenience (included columns) | +| optional | properties | map<string,string> | Index properties applicable for every snapshot | +| optional | current-snapshot-id | long | ID of the current index snapshot | +| required | snapshots | list<index-snapshot> | Index snapshots | + +## Index Snapshot + +Each index snapshot corresponds to one version of the index data. + +| Requirement | Field | Type | Description | +|-------------|--------------------------|--------------------|---------------------------------------------------------------------| +| required | snapshot-id | long | Index snapshot identifier | +| required | source-table-snapshot-id | long | Source table snapshot | +| required | timestamp-ms | long | Snapshot creation timestamp | +| required | tracking-file | string | Tracking file location | +| optional | properties | map<string,string> | Snapshot properties specific to this snapshot | +| optional | key-metadata | binary | Implementation-specific key metadata, for tracking file encryption. | + +## Tracking File + +Each index snapshot references exactly one tracking file. + +It contains summary metadata about all leaf files belonging to the index snapshot and enables efficient planning +without scanning every leaf file. + +The tracking file may be stored using any supported metadata file format. + +### Leaf File Entry + +Each tracking file contains a collection of leaf file entries. A leaf file entry describes a single leaf file +tracked by an index snapshot. The fields are the subset of the V4 manifest entry fields that are relevant to planning +queries against the index. +Entries contain aggregated statistics for all referenced leaf files, enabling engines to perform pruning and planning +without opening every leaf file. + +| Field ID | Name | Type | Requirement | Description | +|----------|--------------------|---------|--------------|--------------------------------------------------------------------------------------------------------------| +| 100 | location | string | required | Location of the referenced file. | +| 101 | file_format | string | required | File format name, such as parquet, avro, or orc. | Review Comment: The `file_format` here is for the leaf files, which are tabular sorted-row collections (transform_value, key columns, optional included columns, file_path, position), so they use columnar/row formats (Parquet/Avro/ORC) that give a typed schema, sorted layout, and per-column stats. Puffin is a blob container for whole-read artifacts like sketches and deletion vectors; it has no columnar schema or sorted-row layout, so it isn't a fit for leaf data. The "index" Puffin refers to is that blob/sketch style, not this leaf layout. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
