jqin61 commented on code in PR #453:
URL: https://github.com/apache/iceberg-python/pull/453#discussion_r1506175834

##########
pyiceberg/partitioning.py:
##########
@@ -215,3 +246,59 @@ def assign_fresh_partition_spec_ids(spec: PartitionSpec, old_schema: Schema, fre
             )
         )
     return PartitionSpec(*partition_fields, spec_id=INITIAL_PARTITION_SPEC_ID)
+
+
+@dataclass(frozen=True)
+class PartitionFieldValue:
+    field: PartitionField
+    value: Any
+
+
+@dataclass(frozen=True)
+class PartitionKey:
+    raw_partition_field_values: List[PartitionFieldValue]
+    partition_spec: PartitionSpec
+    schema: Schema
+
+    @cached_property
+    def partition(self) -> Record:  # partition key transformed with iceberg internal representation as input
+        iceberg_typed_key_values = {}
+        for raw_partition_field_value in self.raw_partition_field_values:
+            partition_fields = self.partition_spec.source_id_to_fields_map[raw_partition_field_value.field.source_id]
+            if len(partition_fields) != 1:
+                raise ValueError("partition_fields must contain exactly one field.")
+            partition_field = partition_fields[0]
+            iceberg_type = self.schema.find_field(name_or_id=raw_partition_field_value.field.source_id).field_type
+            iceberg_typed_value = _to_iceberg_internal_representation(iceberg_type, raw_partition_field_value.value)
+            transformed_value = partition_field.transform.transform(iceberg_type)(iceberg_typed_value)
+            iceberg_typed_key_values[partition_field.name] = transformed_value
+        return Record(**iceberg_typed_key_values)

Review Comment:
   Hi @Fokko, thanks for the guidance! I added the keys because `PartitionKey.partition` is used not only to generate the file path but also to initialize `DataFile.partition` in `io.pyarrow.write_file()`. As the integration test shows,
   ```
   snapshot.manifests(iceberg_table.io)[0].fetch_manifest_entry(iceberg_table.io)[0].data_file.partition
   ```
   prints
   ```
   Record(timestamp_field=1672574401000000)
   ```
   So I assume `data_file.partition` is a `Record` with keys. Let me know what you think. Thank you!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
For additional commands, e-mail: issues-h...@iceberg.apache.org