laskoviymishka commented on code in PR #1126:
URL: https://github.com/apache/iceberg-go/pull/1126#discussion_r3302046767


##########
table/metadata_schema_compatibility.go:
##########
@@ -84,6 +84,10 @@ func checkSchemaCompatibility(sc *iceberg.Schema, 
formatVersion int) error {
        const defaultValuesMinFormatVersion = 3
        problems := make([]IncompatibleField, 0)
 
+       if err := validateNoReservedMetadataColumnIDs(sc); err != nil {

Review Comment:
   I think this call is effectively dead for the `NewMetadata` path — by the 
time `checkSchemaCompatibility` runs, `reassignIDs` has already remapped every 
ID to a fresh non-reserved value, so the validator can't find anything. The 
only meaningful check is the new one in `NewMetadataWithUUID` before 
reassignment.
   
   If the intent is to also catch callers that reach `checkSchemaCompatibility` 
without going through reassignment (e.g. `MetadataBuilder.AddSchema` later in a 
table's life), I'd move this guard into `AddSchema` directly and leave 
`checkSchemaCompatibility` focused on format-version compatibility. Otherwise 
we end up with two call sites for the same guard and an unclear invariant about 
which one is the actual gate. wdyt?



##########
table/metadata_schema_compatibility.go:
##########
@@ -150,6 +149,54 @@ func checkSchemaCompatibility(sc *iceberg.Schema, 
formatVersion int) error {
        return nil
 }
 
+func validateNoReservedMetadataColumnIDs(sc *iceberg.Schema) error {
+       for _, field := range sc.Fields() {
+               if err := validateNoReservedMetadataColumnID(field, 
field.Name); err != nil {
+                       return err
+               }
+       }
+
+       return nil
+}

Review Comment:
   Small consistency thing: the neighboring validators (`validateUnknownTypes`, 
`validateComplexTypeDefaults`) wrap their error with `fmt.Errorf("failed to 
validate ...: %w", err)` at the call site, while this one returns bare. Not a 
regression (the inline check it replaced also returned bare), but worth 
wrapping at the `checkSchemaCompatibility` call site for symmetry if we keep 
that call site at all.



##########
table/scanner_internal_test.go:
##########
@@ -492,45 +492,32 @@ func TestProjectionWithRowLineageRequiresV3(t *testing.T) 
{
        assert.ErrorContains(t, err, "row lineage")
 }
 
-// TestProjectionV3SchemaAlreadyHasRowID covers the case where the user schema
-// already declares _row_id (a reserved field id, but legal in v3). The
-// projection helper must be idempotent and not panic on the duplicate ID.
-func TestProjectionV3SchemaAlreadyHasRowID(t *testing.T) {
+// TestAppendMissingLineageFieldsAlreadyHasRowID covers the helper's idempotent
+// behavior when a schema already includes reserved lineage fields. New table
+// metadata rejects user schemas with these IDs, but projections can still
+// contain them after the scanner adds row-lineage columns.
+func TestAppendMissingLineageFieldsAlreadyHasRowID(t *testing.T) {

Review Comment:
   This rename narrows the test from end-to-end (`scan.Projection()` over real 
v3 metadata) to a unit test of `appendMissingLineageFields` in isolation. The 
motivation is clear — `NewMetadata` now rejects the old setup — but it leaves 
the full read-path projection over a schema-with-reserved-IDs uncovered, and 
existing tables loaded from catalogs can still have those IDs.
   
   I'd keep this unit test and add a sibling that constructs `*Scan` with a 
hand-built metadata (or via a path that bypasses the new validator) and 
exercises `Projection()` end-to-end. Otherwise we lose the original regression 
target.



##########
table/metadata.go:
##########
@@ -2194,6 +2194,10 @@ func NewMetadataWithUUID(sc *iceberg.Schema, partitions 
*iceberg.PartitionSpec,
                }
        }
 
+       if err := validateNoReservedMetadataColumnIDs(sc); err != nil {
+               return nil, err

Review Comment:
   Worth a one-line comment here explaining *why* the check runs before 
`reassignIDs` — "user-provided IDs must be validated before reassignment 
overwrites them with fresh non-reserved IDs". The placement is correct but 
non-obvious to a future reader.
   
   Also a heads-up: Java's `TableMetadata.newTableMetadata` doesn't reject 
reserved IDs here; it silently relies on `TypeUtil.assignFreshIds` to remap 
them. We're imposing a stricter precondition than Java. I think that's the 
right call (silent reassignment of a user-supplied `_row_id` is a footgun), but 
worth flagging in case it's intentional vs. accidental divergence.



##########
table/metadata_schema_compatibility.go:
##########
@@ -150,6 +149,54 @@ func checkSchemaCompatibility(sc *iceberg.Schema, 
formatVersion int) error {
        return nil
 }
 
+func validateNoReservedMetadataColumnIDs(sc *iceberg.Schema) error {
+       for _, field := range sc.Fields() {
+               if err := validateNoReservedMetadataColumnID(field, 
field.Name); err != nil {
+                       return err
+               }
+       }
+
+       return nil
+}
+
+func validateNoReservedMetadataColumnID(field iceberg.NestedField, name 
string) error {
+       if iceberg.IsMetadataColumn(field.ID) {
+               return fmt.Errorf("%w: field '%s' uses reserved metadata column 
ID %d",
+                       iceberg.ErrInvalidSchema, name, field.ID)
+       }
+
+       switch typ := field.Type.(type) {
+       case *iceberg.StructType:
+               if typ == nil {
+                       return nil
+               }
+               for _, child := range typ.FieldList {
+                       if err := validateNoReservedMetadataColumnID(child, 
name+"."+child.Name); err != nil {
+                               return err
+                       }
+               }

Review Comment:
   The three `if typ == nil { return nil }` arms (StructType / ListType / 
MapType) aren't reachable — a successful type assertion on `field.Type.(type)` 
already discriminates against the nil-interface case, and nothing in the 
codebase produces a typed-nil `*iceberg.StructType` here. 
`validateComplexDefault` in the same file does the same switch without these 
guards. I'd drop them.



##########
table/metadata_schema_compatibility_test.go:
##########
@@ -50,3 +50,31 @@ func TestNewMetadata_DuplicateValues(t *testing.T) {
        assert.ErrorIs(t, err, iceberg.ErrInvalidSchema)
        assert.Contains(t, err.Error(), "multiple fields for name foo")
 }
+

Review Comment:
   Two thoughts on the new test:
   
   The PR description says nested struct/list/map fields are validated, but the 
test only covers top-level reserved IDs. I'd add at least one case with a 
struct child or a list element carrying `RowIDFieldID` so the recursive walk is 
actually exercised.
   
   Also worth adding a `NestedField{ID: iceberg.RowIDFieldID, Name: "user_id"}` 
case — that confirms the error message uses the schema's name, not the 
canonical constant, which is the point of the `name` argument in 
`validateNoReservedMetadataColumnID`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to