Re: [PR] Spec: V4 Adaptive Metadata Tree Spec Changes for Entry Structures [iceberg]

via GitHub Fri, 03 Jul 2026 10:38:12 -0700


amogh-jahagirdar commented on code in PR #16025:
URL: https://github.com/apache/iceberg/pull/16025#discussion_r3521245590



##########
format/spec.md:
##########
@@ -685,21 +701,103 @@ The `manifest_entry` struct consists of the following 
fields:
     |            | _optional_ | **`4  file_sequence_number`** | `long`         
                                           | File sequence number indicating 
when the file was added. Inherited when null and status is 1 (added). |
     | _required_ | _required_ | **`2  data_file`**            | `data_file` 
`struct` (see below)                          | File path, partition tuple, 
metrics, ... |
 
-The manifest entry fields are used to keep track of the snapshot in which 
files were added or logically deleted. The `data_file` struct, defined below, 
is nested inside the manifest entry so that it can be easily passed to job 
planning without the manifest entry fields.
+    The manifest entry fields are used to keep track of the snapshot in which 
files were added or logically deleted. The `data_file` struct, defined below, 
is nested inside the manifest entry so that it can be easily passed to job 
planning without the manifest entry fields.
 
-When a file is added to the dataset, its manifest entry should store the 
snapshot ID in which the file was added and set status to 1 (added).
+    When a file is added to the dataset, its manifest entry should store the 
snapshot ID in which the file was added and set status to 1 (added).
 
-When a file is replaced or deleted from the dataset, its manifest entry fields 
store the snapshot ID in which the file was deleted and status 2 (deleted). The 
file may be deleted from the file system when the snapshot in which it was 
deleted is garbage collected, assuming that older snapshots have also been 
garbage collected [1].
+    When a file is replaced or deleted from the dataset, its manifest entry 
fields store the snapshot ID in which the file was deleted and status 2 
(deleted). The file may be deleted from the file system when the snapshot in 
which it was deleted is garbage collected, assuming that older snapshots have 
also been garbage collected [1].
 
-Iceberg v2 adds data and file sequence numbers to the entry and makes the 
snapshot ID optional. Values for these fields are inherited from manifest 
metadata when `null`. That is, if the field is `null` for an entry, then the 
entry must inherit its value from the manifest file's metadata, stored in the 
manifest list.
-The `sequence_number` field represents the data sequence number and must never 
change after a file is added to the dataset. The data sequence number 
represents a relative age of the file content and should be used for planning 
which delete files apply to a data file.
-The `file_sequence_number` field represents the sequence number of the 
snapshot that added the file and must also remain unchanged upon assigning at 
commit. The file sequence number can't be used for pruning delete files as the 
data within the file may have an older data sequence number.
-The data and file sequence numbers are inherited only if the entry status is 1 
(added). If the entry status is 0 (existing) or 2 (deleted), the entry must 
include both sequence numbers explicitly.
+    Iceberg v2 adds data and file sequence numbers to the entry and makes the 
snapshot ID optional. Values for these fields are inherited from manifest 
metadata when `null`. That is, if the field is `null` for an entry, then the 
entry must inherit its value from the manifest file's metadata, stored in the 
manifest list.
+    The `sequence_number` field represents the data sequence number and must 
never change after a file is added to the dataset. The data sequence number 
represents a relative age of the file content and should be used for planning 
which delete files apply to a data file.
+    The `file_sequence_number` field represents the sequence number of the 
snapshot that added the file and must also remain unchanged upon assigning at 
commit. The file sequence number can't be used for pruning delete files as the 
data within the file may have an older data sequence number.
+    The data and file sequence numbers are inherited only if the entry status 
is 1 (added). If the entry status is 0 (existing) or 2 (deleted), the entry 
must include both sequence numbers explicitly.
 
-Notes:
+    Notes:
 
-1. Technically, data files can be deleted when the last snapshot that contains 
the file as “live” data is garbage collected. But this is harder to detect and 
requires finding the diff of multiple snapshots. It is easier to track what 
files are deleted in a snapshot and delete them when that snapshot expires.  It 
is not recommended to add a deleted file back to a table. Adding a deleted file 
can lead to edge cases where incremental deletes can break table snapshots.
-2. Manifest list files are required in v2, so that the `sequence_number` and 
`snapshot_id` to inherit are always available.
+    1. Technically, data files can be deleted when the last snapshot that 
contains the file as "live" data is garbage collected. But this is harder to 
detect and requires finding the diff of multiple snapshots. It is easier to 
track what files are deleted in a snapshot and delete them when that snapshot 
expires.  It is not recommended to add a deleted file back to a table. Adding a 
deleted file can lead to edge cases where incremental deletes can break table 
snapshots.
+    2. Manifest list files are required in v2, so that the `sequence_number` 
and `snapshot_id` to inherit are always available.
+
+=== "v4"
+    **Content Entries**
+
+    | Field id | Name | Type | Write | Read | Description |
+    |----------|------|------|-------|------|-------------|
+    | 134 | **`content_type`** | `int` (0: DATA, 2: EQUALITY DELETES, 3: 
DATA_MANIFEST, 4: DELETE_MANIFEST) | *required* | *required* | Type of content 
stored in the entry. Content types 3 and 4 are only valid in root manifests. |
+    | 157 | **`writer_format_version`** | `int` (0: PRE-V4, 4: V4) | 
*required* | *required* | Writer format version. V4 writers must produce 
`writer_format_version` 4. |
+    | 100 | **`location`** | `string` | *required* | *required* | Location of 
the file or manifest. |
+    | 101 | **`file_format`** | `string` | *required* | *required* | String 
file format name: `avro`, `orc`, `parquet`, or `puffin` |
+    | 147 | **`tracking`** | `tracking` struct | *required* | *required* | 
Groups status, snapshot, and sequence number. See tracking struct below. |
+    | 141 | **`spec_id`** | `int` | *optional* | *optional* | ID of the 
partition spec used to write this manifest or data file. |

Review Comment:
   Will update this, yes it certainly can be null for leafs manifests since 
they are not bound to a partition spec,.
   
   We should reason about if it makes sense to make it required for data files 
rather than assume we just carry forward the requirement for data files that we 
have in v3 and older format versions. Remember the data file partition tuple 
currently modeled in v4 is the union schema, so we don't strictly need the 
specific partition spec for this file even for matching.
   
   I think the main benefit of having it required for data files is that it'd 
be explicit on read rather than readers having to have a fallback to assume 
unpartitioned in the null case; but what are the cases where we even need to 
read the spec, it's for the tuple for eq. delete matching and pruning, which as 
currently established is the union schema anyways. So do we strictly need to 
impose the writer side requirement for v4 data files? I would argue no, unless 
there's another reason the spec used at the time of the write is useful 
information later on.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Spec: V4 Adaptive Metadata Tree Spec Changes for Entry Structures [iceberg]

Reply via email to