kevinjqliu commented on code in PR #1304: URL: https://github.com/apache/iceberg-python/pull/1304#discussion_r1855257218
########## pyiceberg/table/metadata.py: ########## @@ -517,12 +517,15 @@ def new_table_metadata( location: str, properties: Properties = EMPTY_DICT, table_uuid: Optional[uuid.UUID] = None, + assign_fresh_ids: bool = True, ) -> TableMetadata: from pyiceberg.table import TableProperties - fresh_schema = assign_fresh_schema_ids(schema) - fresh_partition_spec = assign_fresh_partition_spec_ids(partition_spec, schema, fresh_schema) - fresh_sort_order = assign_fresh_sort_order_ids(sort_order, schema, fresh_schema) + if assign_fresh_ids: + fresh_schema = assign_fresh_schema_ids(schema) + partition_spec = assign_fresh_partition_spec_ids(partition_spec, schema, fresh_schema) + sort_order = assign_fresh_sort_order_ids(sort_order, schema, fresh_schema) + schema = fresh_schema Review Comment: this is where `assign_fresh_ids` is ultimately used ########## pyiceberg/table/metadata.py: ########## @@ -517,12 +517,15 @@ def new_table_metadata( location: str, properties: Properties = EMPTY_DICT, table_uuid: Optional[uuid.UUID] = None, + assign_fresh_ids: bool = True, ) -> TableMetadata: from pyiceberg.table import TableProperties - fresh_schema = assign_fresh_schema_ids(schema) - fresh_partition_spec = assign_fresh_partition_spec_ids(partition_spec, schema, fresh_schema) - fresh_sort_order = assign_fresh_sort_order_ids(sort_order, schema, fresh_schema) + if assign_fresh_ids: + fresh_schema = assign_fresh_schema_ids(schema) + partition_spec = assign_fresh_partition_spec_ids(partition_spec, schema, fresh_schema) + sort_order = assign_fresh_sort_order_ids(sort_order, schema, fresh_schema) + schema = fresh_schema Review Comment: and this function is called by `_create_staged_table` and `create_table` ########## pyiceberg/table/metadata.py: ########## @@ -517,12 +517,15 @@ def new_table_metadata( location: str, properties: Properties = EMPTY_DICT, table_uuid: Optional[uuid.UUID] = None, + assign_fresh_ids: bool = True, ) -> TableMetadata: from pyiceberg.table import TableProperties - fresh_schema = assign_fresh_schema_ids(schema) - fresh_partition_spec = assign_fresh_partition_spec_ids(partition_spec, schema, fresh_schema) - fresh_sort_order = assign_fresh_sort_order_ids(sort_order, schema, fresh_schema) + if assign_fresh_ids: + fresh_schema = assign_fresh_schema_ids(schema) + partition_spec = assign_fresh_partition_spec_ids(partition_spec, schema, fresh_schema) + sort_order = assign_fresh_sort_order_ids(sort_order, schema, fresh_schema) + schema = fresh_schema Review Comment: My proposal is to not include `assign_fresh_ids` as a flag in functions other than `new_table_metadata`. So when `_create_staged_table` and `create_table` is given * a `pa.Schema`, convert to `Schema` and set `assign_fresh_ids` to True in `new_table_metadata` * a `Schema`. Assume the user created `Schema` with the correct IDs (possibly verify some correctness characteristics such as uniqueness). And use the schema as is If a user wants to reassign IDs for `Schema`, this can be done outside the `create_table` functions and we can even provide a helper function to do so. I feel like this way can help break apart the responsibilities of schema id assignment from the `create_table` methods. ########## pyiceberg/table/metadata.py: ########## @@ -517,12 +517,15 @@ def new_table_metadata( location: str, properties: Properties = EMPTY_DICT, table_uuid: Optional[uuid.UUID] = None, + assign_fresh_ids: bool = True, ) -> TableMetadata: from pyiceberg.table import TableProperties - fresh_schema = assign_fresh_schema_ids(schema) - fresh_partition_spec = assign_fresh_partition_spec_ids(partition_spec, schema, fresh_schema) - fresh_sort_order = assign_fresh_sort_order_ids(sort_order, schema, fresh_schema) + if assign_fresh_ids: + fresh_schema = assign_fresh_schema_ids(schema) + partition_spec = assign_fresh_partition_spec_ids(partition_spec, schema, fresh_schema) + sort_order = assign_fresh_sort_order_ids(sort_order, schema, fresh_schema) + schema = fresh_schema Review Comment: both functions take `schema: Union[Schema, "pa.Schema"],` as input. * If `pa.Schema` is given, we want to convert and assign id (this is currently done by [setting the `assign_fresh_ids` flag to True](https://github.com/apache/iceberg-python/pull/1304/files#diff-276259d15e1e921df3b4783ec29c6390c828afa76bcf707f0f4923ff7f2d4194R862-R864)) * If `Schema` is given, currently the default is to assign the schema ids, `assign_fresh_ids: bool = True`. ########## pyiceberg/table/metadata.py: ########## @@ -517,12 +517,15 @@ def new_table_metadata( location: str, properties: Properties = EMPTY_DICT, table_uuid: Optional[uuid.UUID] = None, + assign_fresh_ids: bool = True, ) -> TableMetadata: from pyiceberg.table import TableProperties - fresh_schema = assign_fresh_schema_ids(schema) - fresh_partition_spec = assign_fresh_partition_spec_ids(partition_spec, schema, fresh_schema) - fresh_sort_order = assign_fresh_sort_order_ids(sort_order, schema, fresh_schema) + if assign_fresh_ids: + fresh_schema = assign_fresh_schema_ids(schema) + partition_spec = assign_fresh_partition_spec_ids(partition_spec, schema, fresh_schema) + sort_order = assign_fresh_sort_order_ids(sort_order, schema, fresh_schema) + schema = fresh_schema Review Comment: LMK if this makes sense or if im missing something! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org