huan233usc opened a new pull request, #731:
URL: https://github.com/apache/iceberg-cpp/pull/731
Closes #730 (item 2 of #637).
Implements Iceberg v3 column default values: `initial-default` / `write-default` on the schema, JSON serde, read-path application, schema-evolution support, and format-version validation.
## What changed
### Schema model
- `SchemaField` carries optional `initial_default` / `write_default` literals (`std::shared_ptr<Literal>` to keep `schema_field.h` free of the `literal.h → type.h → schema_field.h` include cycle), with `WithInitialDefault` / `WithWriteDefault` copy-modifiers in the style of `AsRequired`/`AsOptional`.
- `SchemaField::Validate()` checks that defaults are primitive literals matching the field type; `Schema::Validate(format_version)` rejects schemas with defaults below v3 (uses the previously-unused `TableMetadata::kMinFormatVersionDefaultValues`, resolving the TODO there).
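
As a rough illustration of the copy-modifier and validation rules above, here is a minimal, self-contained sketch. `Field`, `Literal`, `TypeId`, and `ValidateForFormatVersion` are hypothetical stand-ins rather than the iceberg-cpp classes, and the sketch keeps everything in one file, so it does not reproduce the include cycle that motivates the `shared_ptr` member.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <variant>

// Hypothetical stand-ins for illustration only; not the iceberg-cpp types.
enum class TypeId { kInt, kLong, kString };

struct Literal {
  TypeId type;
  std::variant<int32_t, int64_t, std::string> value;
};

class Field {
 public:
  Field(int id, std::string name, TypeId type, bool optional)
      : id_(id), name_(std::move(name)), type_(type), optional_(optional) {}

  // Copy-modifiers: return a modified copy and leave *this untouched.
  Field WithInitialDefault(Literal lit) const {
    Field copy = *this;
    copy.initial_default_ = std::make_shared<const Literal>(std::move(lit));
    return copy;
  }
  Field WithWriteDefault(Literal lit) const {
    Field copy = *this;
    copy.write_default_ = std::make_shared<const Literal>(std::move(lit));
    return copy;
  }

  bool has_default() const { return initial_default_ || write_default_; }

  // A default is accepted only if its literal type matches the field type.
  bool Validate() const {
    auto matches = [this](const std::shared_ptr<const Literal>& d) {
      return !d || d->type == type_;
    };
    return matches(initial_default_) && matches(write_default_);
  }

 private:
  int id_;
  std::string name_;
  TypeId type_;
  bool optional_;
  std::shared_ptr<const Literal> initial_default_;
  std::shared_ptr<const Literal> write_default_;
};

// Assumed value, taken from the "defaults below v3" wording above.
constexpr int kMinFormatVersionDefaultValues = 3;

// Schema-level check: defaults are rejected below the minimum format version.
bool ValidateForFormatVersion(const Field& field, int format_version) {
  if (!field.Validate()) return false;
  return !field.has_default() ||
         format_version >= kMinFormatVersionDefaultValues;
}
```

In this sketch a field with an `int` default passes at format version 3 and fails at version 2, while a `string` default on an `int` field fails `Validate()` regardless of version.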
### JSON serde
- `FieldFromJson` / `ToJson(SchemaField)` parse and write `initial-default` / `write-default` using the existing single-value serialization (`LiteralFromJson(json, type)`), resolving the `add default values` TODO in struct serialization. All primitive types are supported (incl. decimal, fixed, uuid, temporal).
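
To make the serialized shape concrete, the following sketch emits a field object with the two optional keys. It is not the library's serializer: `FieldJsonInput` and `FieldToJsonSketch` are hypothetical names, the defaults are passed in as already-serialized JSON text, and no string escaping is done.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical input record; defaults are pre-serialized JSON values.
struct FieldJsonInput {
  int id;
  std::string name;
  bool required;
  std::string type;
  std::optional<std::string> initial_default_json;
  std::optional<std::string> write_default_json;
};

// Writes the field as a JSON object. The default keys are emitted only
// when present. Names are assumed to need no escaping in this sketch.
std::string FieldToJsonSketch(const FieldJsonInput& f) {
  std::ostringstream out;
  out << "{\"id\":" << f.id
      << ",\"name\":\"" << f.name << "\""
      << ",\"required\":" << (f.required ? "true" : "false")
      << ",\"type\":\"" << f.type << "\"";
  if (f.initial_default_json) {
    out << ",\"initial-default\":" << *f.initial_default_json;
  }
  if (f.write_default_json) {
    out << ",\"write-default\":" << *f.write_default_json;
  }
  out << "}";
  return out.str();
}
```

A field without defaults serializes exactly as before, which is why older metadata round-trips unchanged.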
### Read path (`initial-default`)
- `Project()` maps a column missing from a data file to `FieldProjection::Kind::kDefault` carrying the literal when the field has an `initial-default`. This applies to required *and* optional fields, per the spec ("optional with default" reads the default, not null). Resolves the default-value TODO in `schema_util.cc`; the Avro-side projection in `avro_schema_util.cc` gets the same branch.
- New `iceberg::arrow` helpers (`literal_util`) convert a `Literal` to an Arrow scalar / constant array; the Parquet reader materializes `kDefault` via `MakeDefaultArray` and the Avro reader via `AppendDefaultToBuilder`.
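
The resolution order described above can be sketched as a small decision function. This is an illustration under assumptions, not the code in `schema_util.cc`: `ExpectedField`, `ResolveKind`, and `ResolveMissingOrPresent` are hypothetical, only `kDefault` is named in this description, and the literal is kept as text for brevity.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>

// Hypothetical outcome of projecting one expected field onto a data file.
enum class ResolveKind { kProjected, kDefault, kNull, kError };

struct ExpectedField {
  int id;
  bool optional;
  std::optional<std::string> initial_default;  // literal kept as text here
};

// Decides how one expected field is read from a file that contains the
// given field ids.
ResolveKind ResolveMissingOrPresent(const ExpectedField& field,
                                    const std::set<int>& file_field_ids) {
  // Present in the file: read the stored values and ignore the default.
  if (file_field_ids.count(field.id) > 0) return ResolveKind::kProjected;
  // Missing with an initial-default: read the default, for required and
  // optional fields alike.
  if (field.initial_default.has_value()) return ResolveKind::kDefault;
  // Missing, optional, no default: read nulls.
  if (field.optional) return ResolveKind::kNull;
  // Missing, required, no default: the file cannot satisfy the schema.
  return ResolveKind::kError;
}
```

The ordering matters: checking for the default before checking optionality is what makes an optional field with a default read the default instead of null.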
### Schema evolution (`write-default`)
- `AddColumn` / `AddRequiredColumn` accept an optional `default_value`, used as both the `initial-default` and `write-default` of the new column (Java parity). A required column with a default no longer needs `AllowIncompatibleChanges()`.
- `RequireColumn()` accepts a column added with a default in the same update (resolves the defaulted-add TODO in `UpdateColumnRequirementInternal`).
- New `UpdateColumnDefault()` updates the `write-default` of an existing column (`initial-default` stays fixed once the column exists).
- `UpdateColumnDoc` / `RenameColumn` / `UpdateColumn` preserve defaults when reconstructing the field; type promotion casts the defaults to the new type.
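
The add-with-default and update-default rules above can be modelled in a few lines. The class below is a toy model with hypothetical names (`ToySchemaUpdate`, `ToyColumn`) and string-valued defaults; it borrows the method names from this description but does not reflect the real `UpdateSchema` signatures.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical column record; defaults are kept as text for brevity.
struct ToyColumn {
  bool required = false;
  std::optional<std::string> initial_default;
  std::optional<std::string> write_default;
};

class ToySchemaUpdate {
 public:
  ToySchemaUpdate& AllowIncompatibleChanges() {
    allow_incompatible_ = true;
    return *this;
  }

  // A required column is accepted if it has a default or if incompatible
  // changes were allowed. The default seeds both default slots.
  ToySchemaUpdate& AddColumn(
      const std::string& name, bool required,
      std::optional<std::string> default_value = std::nullopt) {
    if (required && !default_value && !allow_incompatible_) {
      throw std::invalid_argument("required column needs a default");
    }
    columns_[name] = ToyColumn{required, default_value, default_value};
    return *this;
  }

  // Only write-default changes; initial-default stays fixed.
  ToySchemaUpdate& UpdateColumnDefault(const std::string& name,
                                       std::string new_default) {
    auto it = columns_.find(name);
    if (it == columns_.end()) throw std::invalid_argument("unknown column");
    it->second.write_default = std::move(new_default);
    return *this;
  }

  const ToyColumn& column(const std::string& name) const {
    return columns_.at(name);
  }

 private:
  bool allow_incompatible_ = false;
  std::map<std::string, ToyColumn> columns_;
};
```

Keeping `initial-default` immutable is what lets old data files keep reading the same value for the column even after the `write-default` changes.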
### Scope note: write-path application
Writers in this library consume complete Arrow arrays, so filling omitted columns with `write-default` at write time remains the engine's responsibility, as in Java. The library's role (storing, validating, and serializing the defaults, and exposing them through schema evolution) is covered here.
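
For context, the engine-side step that stays out of scope might look roughly like the sketch below: before handing a batch to a writer, any column the caller omitted is filled with a constant column of its `write-default`. `ColumnSpec`, `ToyBatch`, and `FillWriteDefaults` are hypothetical, plain vectors stand in for Arrow arrays, and a missing column without a default is simply filled with nulls.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical schema entry; the default is kept as text for brevity.
struct ColumnSpec {
  std::string name;
  std::optional<std::string> write_default;
};

// Column name -> values; std::nullopt models a null cell.
using ToyColumnData = std::vector<std::optional<std::string>>;
using ToyBatch = std::map<std::string, ToyColumnData>;

// Returns a copy of the batch in which every schema column is present.
// Omitted columns become constant columns of their write-default
// (or all nulls when no write-default is set).
ToyBatch FillWriteDefaults(const std::vector<ColumnSpec>& schema,
                           ToyBatch batch, std::size_t num_rows) {
  for (const auto& col : schema) {
    if (batch.count(col.name) > 0) continue;  // caller supplied this column
    batch[col.name] = ToyColumnData(num_rows, col.write_default);
  }
  return batch;
}
```

A real engine would also reject a required column that is both omitted and lacks a `write-default`; that check is left out here.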
## Testing
- Schema serde round-trips (top-level + nested struct fields, mismatch rejection).
- `Schema::Validate`: v2 rejects defaults, v3 accepts; mismatched default type rejected.
- Projection: missing required/optional fields with `initial-default` → `kDefault`; present fields ignore `initial-default`.
- Parquet `ProjectRecordBatch` and Avro `AppendDatumToBuilder`: missing columns materialize the default at top level and in nested structs.
- `UpdateSchema`: add with default (both defaults set), required-with-default without `AllowIncompatibleChanges()`, mismatched default rejected, `UpdateColumnDefault`, `RequireColumn` on defaulted add, doc-update preservation, type-promotion casting, and v2 rejection at `Apply()`; new `TableMetadataV3Valid.json` test resource.
- Full suite passing locally (the pre-existing S3 `file_io_test` is unrelated and fails only in my local environment due to a Homebrew AWS-SDK ABI issue; not touched by this change).
This pull request and its description were written by Isaac.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]