Fokko commented on code in PR #498: URL: https://github.com/apache/iceberg-python/pull/498#discussion_r1537251443
########## pyiceberg/catalog/rest.py: ########## @@ -524,7 +537,32 @@ def create_table( except HTTPError as exc: self._handle_non_200_response(exc, {409: TableAlreadyExistsError}) - table_response = TableResponse(**response.json()) + return TableResponse(**response.json()) + + @retry(**_RETRY_ARGS) + def _create_staged_table( Review Comment: The inheritance feels off here. I would just expect to `create_table_transaction`. The current `_create_staged_table` on the `Catalog` is now very specific for non-REST catalogs. Should we introduce another layer in the OOP hierarchy there? Then we can override `create_table_transaction` there. ########## pyiceberg/table/__init__.py: ########## @@ -1707,7 +1831,8 @@ def union_by_name(self, new_schema: Union[Schema, "pa.Schema"]) -> UpdateSchema: visit_with_partner( Catalog._convert_schema_if_needed(new_schema), -1, - UnionByNameVisitor(update_schema=self, existing_schema=self._schema, case_sensitive=self._case_sensitive), # type: ignore + UnionByNameVisitor(update_schema=self, existing_schema=self._schema, case_sensitive=self._case_sensitive), Review Comment: Can you revert this change? ########## pyiceberg/catalog/__init__.py: ########## @@ -717,6 +791,10 @@ def _get_updated_props_and_update_summary( return properties_update_summary, updated_properties + @staticmethod + def empty_table_metadata() -> TableMetadata: + return TableMetadataV1(location="", last_column_id=-1, schema=Schema()) Review Comment: I recommend creating a V2 table by default. This is also the case for Java. Partition evolution is very awkward for V1 tables (keeping the old partitions as null-transforms). ########## pyiceberg/catalog/__init__.py: ########## @@ -288,6 +291,78 @@ def __init__(self, name: str, **properties: str): def _load_file_io(self, properties: Properties = EMPTY_DICT, location: Optional[str] = None) -> FileIO: return load_file_io({**self.properties, **properties}, location) + def _create_staged_table( + self, + identifier: Union[str, Identifier], + schema: Union[Schema, "pa.Schema"], + location: Optional[str] = None, + partition_spec: PartitionSpec = UNPARTITIONED_PARTITION_SPEC, + sort_order: SortOrder = UNSORTED_SORT_ORDER, + properties: Properties = EMPTY_DICT, + ) -> StagedTable: + """Create a table and return the table instance without committing the changes. + + Args: + identifier (str | Identifier): Table identifier. + schema (Schema): Table's schema. + location (str | None): Location for the table. Optional Argument. + partition_spec (PartitionSpec): PartitionSpec for the table. + sort_order (SortOrder): SortOrder for the table. + properties (Properties): Table properties that can be a string based dictionary. + + Returns: + Table: the created table instance. + + Raises: + TableAlreadyExistsError: If a table with the name already exists. + """ + schema: Schema = self._convert_schema_if_needed(schema) # type: ignore + + database_name, table_name = self.identifier_to_database_and_table(identifier) + + location = self._resolve_table_location(location, database_name, table_name) + metadata_location = self._get_metadata_location(location=location) + metadata = new_table_metadata( + location=location, schema=schema, partition_spec=partition_spec, sort_order=sort_order, properties=properties + ) + io = load_file_io(properties=self.properties, location=metadata_location) + return StagedTable( + identifier=(self.name, database_name, table_name), + metadata=metadata, + metadata_location=metadata_location, + io=io, + catalog=self, + ) + + def create_table_transaction( Review Comment: I was leaning towards staged since we use the same name for the REST catalog. I'm not super strong on this one. Let's keep it transaction unless anyone else jumps in here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org