wgtmac commented on code in PR #662:
URL: https://github.com/apache/iceberg-cpp/pull/662#discussion_r3286530627
##########
src/iceberg/type_fwd.h:
##########
@@ -31,7 +31,7 @@ namespace iceberg {
/// This is not a complete data type by itself because some types are nested
/// and/or parameterized.
///
-/// Iceberg V3 types are not currently supported.
+/// Iceberg V3's `unknown` type is supported as a null-only placeholder type.
Review Comment:
Please register `TypeId::kUnknown -> 3` in
`TableMetadata::kMinFormatVersions`. This advertises a v3-only type, but the
validation table is still empty, so v1/v2 metadata can contain `unknown`. Java
rejects `UNKNOWN` before v3.
##########
src/iceberg/util/type_util.cc:
##########
@@ -426,6 +426,10 @@ bool IsPromotionAllowed(const std::shared_ptr<Type>&
from_type,
TypeId from_id = from_type->type_id();
TypeId to_id = to_type->type_id();
+ if (from_id == TypeId::kUnknown) {
Review Comment:
Java does not allow `UNKNOWN` as a schema-update promotion source. With
this, `UpdateColumn` can turn an `unknown` column into any primitive type,
which diverges from Java's `TypeUtil.isPromotionAllowed` rules.
##########
src/iceberg/json_serde.cc:
##########
@@ -540,6 +558,7 @@ Result<std::unique_ptr<SchemaField>> FieldFromJson(const
nlohmann::json& json) {
ICEBERG_ASSIGN_OR_RAISE(auto required, GetJsonValue<bool>(json, kRequired));
ICEBERG_ASSIGN_OR_RAISE(auto doc, GetJsonValueOrDefault<std::string>(json,
kDoc));
+ ICEBERG_RETURN_UNEXPECTED(ValidateUnknownFieldOptional(*type, !required,
name));
Review Comment:
This only rejects required `unknown` during JSON parsing. The public
construction/update paths still allow it via `SchemaField::MakeRequired(...,
unknown())` and `AllowIncompatibleChanges().AddRequiredColumn(..., unknown())`.
The invariant should live in the model/update path too.
##########
src/iceberg/schema_internal.cc:
##########
@@ -141,6 +141,9 @@ ArrowErrorCode ToArrowSchema(const Type& type, bool
optional, std::string_view n
ArrowMetadataBuilderAppend(&metadata_buffer,
ArrowCharView(kArrowExtensionName),
ArrowCharView(kArrowUuidExtensionName)));
} break;
+ case TypeId::kUnknown:
Review Comment:
Java Parquet skips unknown fields on write. Mapping `unknown` to Arrow null
here makes the Parquet writer emit a physical null column, or depend on Arrow
rejecting it. The write path should skip unknown columns instead.
##########
src/iceberg/schema_util.cc:
##########
@@ -79,6 +82,55 @@ Status ValidateSchemaEvolution(const Type& expected_type,
const Type& source_typ
return NotSupported("Cannot read {} from {}", expected_type, source_type);
}
+Result<FieldProjection> ProjectNested(const Type& expected_type, const Type&
source_type,
+ bool prune_source);
+
+Result<FieldProjection> ProjectField(const SchemaField& expected_field,
+ const SchemaField& source_field, size_t
source_index,
+ bool prune_source) {
+ FieldProjection projection;
+
+ if (expected_field.type()->type_id() == TypeId::kUnknown) {
+ if (!expected_field.optional()) {
+ return InvalidSchema("Cannot project required field with id {} as null",
+ expected_field.field_id());
+ }
+ projection.kind = FieldProjection::Kind::kNull;
+ return projection;
+ }
+
+ if (source_field.type()->type_id() == TypeId::kUnknown &&
+ expected_field.type()->is_nested()) {
+ if (!expected_field.optional()) {
+ return InvalidSchema("Cannot project required field with id {} as null",
+ expected_field.field_id());
+ }
+ projection.kind = FieldProjection::Kind::kNull;
+ return projection;
+ }
+
+ if (source_field.type()->type_id() == TypeId::kUnknown &&
!expected_field.optional()) {
+ return InvalidSchema("Cannot project required field with id {} as null",
+ expected_field.field_id());
+ }
+
+ if (expected_field.type()->is_nested()) {
Review Comment:
`unknown` is still a primitive type, so it should not evolve to
struct/list/map. This path should fail instead of projecting the whole nested
value as null.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]