Re: [PR] Spec v3: Add deletion vectors to the table spec [iceberg]

via GitHub Mon, 14 Oct 2024 18:50:21 -0700


szehon-ho commented on code in PR #11240:
URL: https://github.com/apache/iceberg/pull/11240#discussion_r1800282377



##########
format/spec.md:
##########
@@ -454,35 +457,40 @@ The schema of a manifest file is a struct called 
`manifest_entry` with the follo
 
 `data_file` is a struct with the following fields:
 
-| v1         | v2         | Field id, name                    | Type           
              | Description |
-| ---------- | ---------- 
|-----------------------------------|------------------------------|-------------|
-|            | _required_ | **`134  content`**                | `int` with 
meaning: `0: DATA`, `1: POSITION DELETES`, `2: EQUALITY DELETES` | Type of 
content stored by the data file: data, equality deletes, or position deletes 
(all v1 files are data files) |
-| _required_ | _required_ | **`100  file_path`**              | `string`       
              | Full URI for the file with FS scheme |
-| _required_ | _required_ | **`101  file_format`**            | `string`       
              | String file format name, avro, orc or parquet |
-| _required_ | _required_ | **`102  partition`**              | `struct<...>`  
              | Partition data tuple, schema based on the partition spec output 
using partition field ids for the struct field ids |
-| _required_ | _required_ | **`103  record_count`**           | `long`         
              | Number of records in this file |
-| _required_ | _required_ | **`104  file_size_in_bytes`**     | `long`         
              | Total file size in bytes |
-| _required_ |            | ~~**`105 block_size_in_bytes`**~~ | `long`         
              | **Deprecated. Always write a default in v1. Do not write in 
v2.** |
-| _optional_ |            | ~~**`106  file_ordinal`**~~       | `int`          
              | **Deprecated. Do not write.** |
-| _optional_ |            | ~~**`107  sort_columns`**~~       | `list<112: 
int>`             | **Deprecated. Do not write.** |
-| _optional_ | _optional_ | **`108  column_sizes`**           | `map<117: int, 
118: long>`   | Map from column id to the total size on disk of all regions 
that store the column. Does not include bytes necessary to read other columns, 
like footers. Leave null for row-oriented formats (Avro) |
-| _optional_ | _optional_ | **`109  value_counts`**           | `map<119: int, 
120: long>`   | Map from column id to number of values in the column (including 
null and NaN values) |
-| _optional_ | _optional_ | **`110  null_value_counts`**      | `map<121: int, 
122: long>`   | Map from column id to number of null values in the column |
-| _optional_ | _optional_ | **`137  nan_value_counts`**       | `map<138: int, 
139: long>`   | Map from column id to number of NaN values in the column |
-| _optional_ | _optional_ | **`111  distinct_counts`**        | `map<123: int, 
124: long>`   | Map from column id to number of distinct values in the column; 
distinct counts must be derived using values in the file by counting or using 
sketches, but not using methods like merging existing distinct counts |
-| _optional_ | _optional_ | **`125  lower_bounds`**           | `map<126: int, 
127: binary>` | Map from column id to lower bound in the column serialized as 
binary [1]. Each value must be less than or equal to all non-null, non-NaN 
values in the column for the file [2] |
-| _optional_ | _optional_ | **`128  upper_bounds`**           | `map<129: int, 
130: binary>` | Map from column id to upper bound in the column serialized as 
binary [1]. Each value must be greater than or equal to all non-null, non-Nan 
values in the column for the file [2] |
-| _optional_ | _optional_ | **`131  key_metadata`**           | `binary`       
              | Implementation-specific key metadata for encryption |
-| _optional_ | _optional_ | **`132  split_offsets`**          | `list<133: 
long>`            | Split offsets for the data file. For example, all row group 
offsets in a Parquet file. Must be sorted ascending |
-|            | _optional_ | **`135  equality_ids`**           | `list<136: 
int>`             | Field ids used to determine row equality in equality delete 
files. Required when `content=2` and should be null otherwise. Fields with ids 
listed in this column must be present in the delete file |
-| _optional_ | _optional_ | **`140  sort_order_id`**          | `int`          
              | ID representing sort order for this file [3]. |
+| v1         | v2         | v3         | Field id, name                    | 
Type                         | Description |
+| ---------- | ---------- | ---------- 
|-----------------------------------|------------------------------|-------------|
+|            | _required_ | _required_ | **`134  content`**                | 
`int` with meaning: `0: DATA`, `1: POSITION DELETES`, `2: EQUALITY DELETES` | 
Type of content stored by the data file: data, equality deletes, or position 
deletes (all v1 files are data files) |
+| _required_ | _required_ | _required_ | **`100  file_path`**              | 
`string`                     | Full URI for the file with FS scheme |
+| _required_ | _required_ | _required_ | **`101  file_format`**            | 
`string`                     | String file format name, `avro`, `orc`, 
`parquet`, or `puffin` |
+| _required_ | _required_ | _required_ | **`102  partition`**              | 
`struct<...>`                | Partition data tuple, schema based on the 
partition spec output using partition field ids for the struct field ids |
+| _required_ | _required_ | _required_ | **`103  record_count`**           | 
`long`                       | Number of records in this file, or the 
cardinality of a deletion vector |
+| _required_ | _required_ | _required_ | **`104  file_size_in_bytes`**     | 
`long`                       | Total file size in bytes |
+| _required_ |            |            | ~~**`105 block_size_in_bytes`**~~ | 
`long`                       | **Deprecated. Always write a default in v1. Do 
not write in v2.** |
+| _optional_ |            |            | ~~**`106  file_ordinal`**~~       | 
`int`                        | **Deprecated. Do not write.** |
+| _optional_ |            |            | ~~**`107  sort_columns`**~~       | 
`list<112: int>`             | **Deprecated. Do not write.** |
+| _optional_ | _optional_ | _optional_ | **`108  column_sizes`**           | 
`map<117: int, 118: long>`   | Map from column id to the total size on disk of 
all regions that store the column. Does not include bytes necessary to read 
other columns, like footers. Leave null for row-oriented formats (Avro) |
+| _optional_ | _optional_ | _optional_ | **`109  value_counts`**           | 
`map<119: int, 120: long>`   | Map from column id to number of values in the 
column (including null and NaN values) |
+| _optional_ | _optional_ | _optional_ | **`110  null_value_counts`**      | 
`map<121: int, 122: long>`   | Map from column id to number of null values in 
the column |
+| _optional_ | _optional_ | _optional_ | **`137  nan_value_counts`**       | 
`map<138: int, 139: long>`   | Map from column id to number of NaN values in 
the column |
+| _optional_ | _optional_ | _optional_ | **`111  distinct_counts`**        | 
`map<123: int, 124: long>`   | Map from column id to number of distinct values 
in the column; distinct counts must be derived using values in the file by 
counting or using sketches, but not using methods like merging existing 
distinct counts |
+| _optional_ | _optional_ | _optional_ | **`125  lower_bounds`**           | 
`map<126: int, 127: binary>` | Map from column id to lower bound in the column 
serialized as binary [1]. Each value must be less than or equal to all 
non-null, non-NaN values in the column for the file [2] |
+| _optional_ | _optional_ | _optional_ | **`128  upper_bounds`**           | 
`map<129: int, 130: binary>` | Map from column id to upper bound in the column 
serialized as binary [1]. Each value must be greater than or equal to all 
non-null, non-Nan values in the column for the file [2] |
+| _optional_ | _optional_ | _optional_ | **`131  key_metadata`**           | 
`binary`                     | Implementation-specific key metadata for 
encryption |
+| _optional_ | _optional_ | _optional_ | **`132  split_offsets`**          | 
`list<133: long>`            | Split offsets for the data file. For example, 
all row group offsets in a Parquet file. Must be sorted ascending |
+|            | _optional_ | _optional_ | **`135  equality_ids`**           | 
`list<136: int>`             | Field ids used to determine row equality in 
equality delete files. Required when `content=2` and should be null otherwise. 
Fields with ids listed in this column must be present in the delete file |
+| _optional_ | _optional_ | _optional_ | **`140  sort_order_id`**          | 
`int`                        | ID representing sort order for this file [3]. |
+|            | _optional_ | _optional_ | **`143  referenced_data_file`**   | 
`string`                     | Fully qualified location (URI with FS scheme) of 
a data file that all deletes reference [4] |
+|            |            | _optional_ | **`144  blob_offset`**            | 
`long`                       | The offset in the file where the content starts 
[5] |
+|            |            | _optional_ | **`145  blob_size_in_bytes`**     | 
`long`                       | The length of a referenced blob stored in the 
file [5] |
 
 Notes:
 
 1. Single-value serialization for lower and upper bounds is detailed in 
Appendix D.
 2. For `float` and `double`, the value `-0.0` must precede `+0.0`, as in the 
IEEE 754 `totalOrder` predicate. NaNs are not permitted as lower or upper 
bounds.
 3. If sort order ID is missing or unknown, then the order is assumed to be 
unsorted. Only data files and equality delete files should be written with a 
non-null order id. [Position deletes](#position-delete-files) are required to 
be sorted by file and position, not a table order, and should set sort order id 
to null. Readers must ignore sort order id for position delete files.
-4. The following field ids are reserved on `data_file`: 141.
+4. Position delete metadata can use `referenced_data_file` when all deletes 
tracked by the entry are in a single data file. Setting the referenced file is 
required for deletion vectors.
+5. The `blob_offset` and `blob_size_in_bytes` fields are used to reference a 
specific blob in a Puffin file for direct access to a deletion vector. The 
values must exactly match the `offset` and `length` stored in the Puffin footer 
for the deletion vector blob.

Review Comment:
   May be worth to mention, that both must be set if one is?



##########
format/spec.md:
##########
@@ -841,19 +855,45 @@ Notes:
 
 ## Delete Formats
 
-This section details how to encode row-level deletes in Iceberg delete files. 
Row-level deletes are not supported in v1.
+This section details how to encode row-level deletes in Iceberg delete files. 
Row-level deletes are added by v2 and are not supported in v1. Deletion vectors 
are added in v3 and are not supported in v2 or earlier. Position delete files 
must not be added to v3 tables, but existing position delete files are valid.
+
+There are three types of row-level deletes:
+* Deletion vectors (DVs) identify deleted rows within a single referenced data 
file by position in a bitmap
+* Position delete files identify deleted rows by file location and row 
position (**deprecated**)
+* Equality delete files identify deleted rows by the value of one or more 
columns
+
+Deletion vectors are a binary representation of deletes for a single data file 
that is more efficient at execution time than position delete files. Unlike 
equality or position delete files, there can be at most one deletion vector for 
a given data file in a table. Writers must ensure that there is at most one 
deletion vector per data file and must merge new deletes with existing vectors 
or position delete files.
+
+Row-level delete files (both equality and position delete files) are valid 
Iceberg data files: files must use valid Iceberg formats, schemas, and column 
projection. It is recommended that these delete files are written using the 
table's default file format.
+
+Row-level delete files and deletion vectors are tracked by manifests. A 
separate set of manifests is used for delete files and DVs, but the same 
manifest schema is used for both data and delete manifests. Deletion vectors 
are tracked individually by file location, offset, and length within the 
containing file. Deletion vector metadata must include the referenced data file.
+
+Both position and equality delete files allow encoding deleted row values with 
a delete. This can be used to reconstruct a stream of changes to a table.
+
 
-Row-level delete files are valid Iceberg data files: files must use valid 
Iceberg formats, schemas, and column projection. It is recommended that delete 
files are written using the table's default file format.
+### Deletion Vectors
 
-Row-level delete files are tracked by manifests, like data files. A separate 
set of manifests is used for delete files, but the manifest schemas are 
identical.
+Deletion vectors identify deleted rows of a file by encoding deleted positions 
in a bitmap. A set bit at position P indicates that the row at position P is 
deleted.
 
-Both position and equality deletes allow encoding deleted row values with a 
delete. This can be used to reconstruct a stream of changes to a table.
+These vectors are stored using the `delete-vector-v1` blob definition from the 
[Puffin spec][puffin-spec].
 
+Deletion vectors support positive 64-bit positions, but are optimized for 
cases where most positions fit in 32 bits by using a collection of 32-bit 
Roaring bitmaps. 64-bit positions are divided into a 32-bit "key" using the 
most significant 4 bytes and a 32-bit sub-position using the least significant 
4 bytes. For each key in the set of positions, a 32-bit Roaring bitmap is 
maintained to store a set of 32-bit sub-positions for that key.
+
+To test whether a certain position is set, its most significant 4 bytes (the 
key) are used to find a 32-bit bitmap and the least significant 4 bytes (the 
sub-position) are tested for inclusion in the bitmap. If a bitmap is not found 
for the key, then it is not set.

Review Comment:
   Should be combined with previous paragraph, I feel its continuing the 
concept of the last sentence of that paragraph.



##########
format/spec.md:
##########
@@ -454,35 +457,40 @@ The schema of a manifest file is a struct called 
`manifest_entry` with the follo
 
 `data_file` is a struct with the following fields:
 
-| v1         | v2         | Field id, name                    | Type           
              | Description |
-| ---------- | ---------- 
|-----------------------------------|------------------------------|-------------|
-|            | _required_ | **`134  content`**                | `int` with 
meaning: `0: DATA`, `1: POSITION DELETES`, `2: EQUALITY DELETES` | Type of 
content stored by the data file: data, equality deletes, or position deletes 
(all v1 files are data files) |
-| _required_ | _required_ | **`100  file_path`**              | `string`       
              | Full URI for the file with FS scheme |
-| _required_ | _required_ | **`101  file_format`**            | `string`       
              | String file format name, avro, orc or parquet |
-| _required_ | _required_ | **`102  partition`**              | `struct<...>`  
              | Partition data tuple, schema based on the partition spec output 
using partition field ids for the struct field ids |
-| _required_ | _required_ | **`103  record_count`**           | `long`         
              | Number of records in this file |
-| _required_ | _required_ | **`104  file_size_in_bytes`**     | `long`         
              | Total file size in bytes |
-| _required_ |            | ~~**`105 block_size_in_bytes`**~~ | `long`         
              | **Deprecated. Always write a default in v1. Do not write in 
v2.** |
-| _optional_ |            | ~~**`106  file_ordinal`**~~       | `int`          
              | **Deprecated. Do not write.** |
-| _optional_ |            | ~~**`107  sort_columns`**~~       | `list<112: 
int>`             | **Deprecated. Do not write.** |
-| _optional_ | _optional_ | **`108  column_sizes`**           | `map<117: int, 
118: long>`   | Map from column id to the total size on disk of all regions 
that store the column. Does not include bytes necessary to read other columns, 
like footers. Leave null for row-oriented formats (Avro) |
-| _optional_ | _optional_ | **`109  value_counts`**           | `map<119: int, 
120: long>`   | Map from column id to number of values in the column (including 
null and NaN values) |
-| _optional_ | _optional_ | **`110  null_value_counts`**      | `map<121: int, 
122: long>`   | Map from column id to number of null values in the column |
-| _optional_ | _optional_ | **`137  nan_value_counts`**       | `map<138: int, 
139: long>`   | Map from column id to number of NaN values in the column |
-| _optional_ | _optional_ | **`111  distinct_counts`**        | `map<123: int, 
124: long>`   | Map from column id to number of distinct values in the column; 
distinct counts must be derived using values in the file by counting or using 
sketches, but not using methods like merging existing distinct counts |
-| _optional_ | _optional_ | **`125  lower_bounds`**           | `map<126: int, 
127: binary>` | Map from column id to lower bound in the column serialized as 
binary [1]. Each value must be less than or equal to all non-null, non-NaN 
values in the column for the file [2] |
-| _optional_ | _optional_ | **`128  upper_bounds`**           | `map<129: int, 
130: binary>` | Map from column id to upper bound in the column serialized as 
binary [1]. Each value must be greater than or equal to all non-null, non-Nan 
values in the column for the file [2] |
-| _optional_ | _optional_ | **`131  key_metadata`**           | `binary`       
              | Implementation-specific key metadata for encryption |
-| _optional_ | _optional_ | **`132  split_offsets`**          | `list<133: 
long>`            | Split offsets for the data file. For example, all row group 
offsets in a Parquet file. Must be sorted ascending |
-|            | _optional_ | **`135  equality_ids`**           | `list<136: 
int>`             | Field ids used to determine row equality in equality delete 
files. Required when `content=2` and should be null otherwise. Fields with ids 
listed in this column must be present in the delete file |
-| _optional_ | _optional_ | **`140  sort_order_id`**          | `int`          
              | ID representing sort order for this file [3]. |
+| v1         | v2         | v3         | Field id, name                    | 
Type                         | Description |
+| ---------- | ---------- | ---------- 
|-----------------------------------|------------------------------|-------------|
+|            | _required_ | _required_ | **`134  content`**                | 
`int` with meaning: `0: DATA`, `1: POSITION DELETES`, `2: EQUALITY DELETES` | 
Type of content stored by the data file: data, equality deletes, or position 
deletes (all v1 files are data files) |

Review Comment:
   I think I just left a similar thought in the design doc: 
https://docs.google.com/document/d/18Bqhr-vnzFfQk1S4AgRISkA_5_m5m32Nnc2Cw0zn2XM/edit?disco=AAABVb7Ww5k.
  Is there any complication to add a new type for clarity?  Slightly agree with 
@emkornfield 



##########
format/spec.md:
##########
@@ -841,19 +855,45 @@ Notes:
 
 ## Delete Formats
 
-This section details how to encode row-level deletes in Iceberg delete files. 
Row-level deletes are not supported in v1.
+This section details how to encode row-level deletes in Iceberg delete files. 
Row-level deletes are added by v2 and are not supported in v1. Deletion vectors 
are added in v3 and are not supported in v2 or earlier. Position delete files 
must not be added to v3 tables, but existing position delete files are valid.
+
+There are three types of row-level deletes:
+* Deletion vectors (DVs) identify deleted rows within a single referenced data 
file by position in a bitmap
+* Position delete files identify deleted rows by file location and row 
position (**deprecated**)
+* Equality delete files identify deleted rows by the value of one or more 
columns
+
+Deletion vectors are a binary representation of deletes for a single data file 
that is more efficient at execution time than position delete files. Unlike 
equality or position delete files, there can be at most one deletion vector for 
a given data file in a table. Writers must ensure that there is at most one 
deletion vector per data file and must merge new deletes with existing vectors 
or position delete files.
+
+Row-level delete files (both equality and position delete files) are valid 
Iceberg data files: files must use valid Iceberg formats, schemas, and column 
projection. It is recommended that these delete files are written using the 
table's default file format.
+
+Row-level delete files and deletion vectors are tracked by manifests. A 
separate set of manifests is used for delete files and DVs, but the same 
manifest schema is used for both data and delete manifests. Deletion vectors 
are tracked individually by file location, offset, and length within the 
containing file. Deletion vector metadata must include the referenced data file.
+
+Both position and equality delete files allow encoding deleted row values with 
a delete. This can be used to reconstruct a stream of changes to a table.
+
 
-Row-level delete files are valid Iceberg data files: files must use valid 
Iceberg formats, schemas, and column projection. It is recommended that delete 
files are written using the table's default file format.
+### Deletion Vectors
 
-Row-level delete files are tracked by manifests, like data files. A separate 
set of manifests is used for delete files, but the manifest schemas are 
identical.
+Deletion vectors identify deleted rows of a file by encoding deleted positions 
in a bitmap. A set bit at position P indicates that the row at position P is 
deleted.
 
-Both position and equality deletes allow encoding deleted row values with a 
delete. This can be used to reconstruct a stream of changes to a table.
+These vectors are stored using the `delete-vector-v1` blob definition from the 
[Puffin spec][puffin-spec].
 
+Deletion vectors support positive 64-bit positions, but are optimized for 
cases where most positions fit in 32 bits by using a collection of 32-bit 
Roaring bitmaps. 64-bit positions are divided into a 32-bit "key" using the 
most significant 4 bytes and a 32-bit sub-position using the least significant 
4 bytes. For each key in the set of positions, a 32-bit Roaring bitmap is 
maintained to store a set of 32-bit sub-positions for that key.
+
+To test whether a certain position is set, its most significant 4 bytes (the 
key) are used to find a 32-bit bitmap and the least significant 4 bytes (the 
sub-position) are tested for inclusion in the bitmap. If a bitmap is not found 
for the key, then it is not set.
+
+Delete manifests track deletion vectors individually by the containing file 
location (`file_path`), starting offset of the DV magic bytes (`blob_offset`), 
and total length of the deletion vector blob (`blob_size_in_bytes`). Multiple 
deletion vectors can be stored in the same file. There are no restrictions on 
the data files that can be referenced by deletion vectors in the same Puffin 
file.

Review Comment:
   what do you mean no restrictions?  I thought one DV blob refer to one data 
file, from the previous section?  And obviously you have to make sure the data 
file has compatible partition spec, seq number.



##########
format/spec.md:
##########
@@ -619,19 +627,25 @@ Data files that match the query filter must be read by 
the scan.
 Note that for any snapshot, all file paths marked with "ADDED" or "EXISTING" 
may appear at most once across all manifest files in the snapshot. If a file 
path appears more than once, the results of the scan are undefined. Reader 
implementations may raise an error in this case, but are not required to do so.
 
 
-Delete files that match the query filter must be applied to data files at read 
time, limited by the scope of the delete file using the following rules.
+Delete files and deletion vector metadata that match the filters must be 
applied to data files at read time, limited by the following scope rules.
 
+* A deletion vector must be applied to a data file when all of the following 
are true:
+    - The data file's `file_path` is equal to the deletion vector's 
`referenced_data_file`
+    - The data file's data sequence number is _less than or equal to_ the 
deletion vector's data sequence number
+    - The data file's partition (both spec and partition values) is equal [4] 
to the deletion vector's partition
 * A _position_ delete file must be applied to a data file when all of the 
following are true:
+    - The data file's `file_path` is equal to the delete file's 
`referenced_data_file` if it is non-null
     - The data file's data sequence number is _less than or equal to_ the 
delete file's data sequence number
     - The data file's partition (both spec and partition values) is equal [4] 
to the delete file's partition
+    - There is no deletion vector that must be applied to the data file (when 
added, such a vector must contain all deletes from existing position delete 
files)

Review Comment:
   does this condition apply to eq delete application as well?



##########
format/spec.md:
##########
@@ -841,19 +855,45 @@ Notes:
 
 ## Delete Formats
 
-This section details how to encode row-level deletes in Iceberg delete files. 
Row-level deletes are not supported in v1.
+This section details how to encode row-level deletes in Iceberg delete files. 
Row-level deletes are added by v2 and are not supported in v1. Deletion vectors 
are added in v3 and are not supported in v2 or earlier. Position delete files 
must not be added to v3 tables, but existing position delete files are valid.
+
+There are three types of row-level deletes:
+* Deletion vectors (DVs) identify deleted rows within a single referenced data 
file by position in a bitmap
+* Position delete files identify deleted rows by file location and row 
position (**deprecated**)
+* Equality delete files identify deleted rows by the value of one or more 
columns
+
+Deletion vectors are a binary representation of deletes for a single data file 
that is more efficient at execution time than position delete files. Unlike 
equality or position delete files, there can be at most one deletion vector for 
a given data file in a table. Writers must ensure that there is at most one 
deletion vector per data file and must merge new deletes with existing vectors 
or position delete files.
+
+Row-level delete files (both equality and position delete files) are valid 
Iceberg data files: files must use valid Iceberg formats, schemas, and column 
projection. It is recommended that these delete files are written using the 
table's default file format.
+
+Row-level delete files and deletion vectors are tracked by manifests. A 
separate set of manifests is used for delete files and DVs, but the same 
manifest schema is used for both data and delete manifests. Deletion vectors 
are tracked individually by file location, offset, and length within the 
containing file. Deletion vector metadata must include the referenced data file.
+
+Both position and equality delete files allow encoding deleted row values with 
a delete. This can be used to reconstruct a stream of changes to a table.
+
 
-Row-level delete files are valid Iceberg data files: files must use valid 
Iceberg formats, schemas, and column projection. It is recommended that delete 
files are written using the table's default file format.
+### Deletion Vectors
 
-Row-level delete files are tracked by manifests, like data files. A separate 
set of manifests is used for delete files, but the manifest schemas are 
identical.
+Deletion vectors identify deleted rows of a file by encoding deleted positions 
in a bitmap. A set bit at position P indicates that the row at position P is 
deleted.
 
-Both position and equality deletes allow encoding deleted row values with a 
delete. This can be used to reconstruct a stream of changes to a table.
+These vectors are stored using the `delete-vector-v1` blob definition from the 
[Puffin spec][puffin-spec].
 
+Deletion vectors support positive 64-bit positions, but are optimized for 
cases where most positions fit in 32 bits by using a collection of 32-bit 
Roaring bitmaps. 64-bit positions are divided into a 32-bit "key" using the 
most significant 4 bytes and a 32-bit sub-position using the least significant 
4 bytes. For each key in the set of positions, a 32-bit Roaring bitmap is 
maintained to store a set of 32-bit sub-positions for that key.
+
+To test whether a certain position is set, its most significant 4 bytes (the 
key) are used to find a 32-bit bitmap and the least significant 4 bytes (the 
sub-position) are tested for inclusion in the bitmap. If a bitmap is not found 
for the key, then it is not set.
+
+Delete manifests track deletion vectors individually by the containing file 
location (`file_path`), starting offset of the DV magic bytes (`blob_offset`), 
and total length of the deletion vector blob (`blob_size_in_bytes`). Multiple 
deletion vectors can be stored in the same file. There are no restrictions on 
the data files that can be referenced by deletion vectors in the same Puffin 
file.
+
+At most one deletion vector is allowed per data file in a table. If a DV is 
written for a data file, it must replace all previously written position delete 
files so that when a DV is present, readers can safely ignore matching position 
delete files.

Review Comment:
   > At most one deletion vector is allowed per data file in a table.
   
   I guess this is pretty critical, and answer my question in 
https://docs.google.com/document/d/18Bqhr-vnzFfQk1S4AgRISkA_5_m5m32Nnc2Cw0zn2XM/edit?disco=AAABWolDRwg
 , just had a question how is this implemented?  In spark, do we cluster delete 
writers by data-files?  Not to block the spec, just curious here.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Spec v3: Add deletion vectors to the table spec [iceberg]

Reply via email to