HonahX commented on code in PR #140:
URL: https://github.com/apache/iceberg-python/pull/140#discussion_r1438987608
##########
pyiceberg/catalog/__init__.py:
##########
@@ -587,8 +590,34 @@ def _write_metadata(metadata: TableMetadata, io: FileIO,
metadata_path: str) ->
ToOutputFile.table_metadata(metadata, io.new_output(metadata_path))
@staticmethod
- def _get_metadata_location(location: str) -> str:
- return f"{location}/metadata/00000-{uuid.uuid4()}.metadata.json"
+ def _get_metadata_location(location: str, new_version: int = 0) -> str:
+ if new_version < 0:
+ raise ValueError(f"Table metadata version: {new_version} cannot be
negative")
Review Comment:
Thanks for the suggestion! Since this is an internal method, we can rely on
the type check to ensure new_version is never None. I think one benefit of
doing type-checking is that it reduces the number of `None` checks in our code.
Also, I've incorporated your suggested change to the error message.
Please let me know if you think the type-check is not enough and we'd better
do the None-check here. Thanks!
##########
pyiceberg/catalog/glue.py:
##########
@@ -177,6 +198,28 @@ def _create_glue_table(self, database_name: str,
table_name: str, table_input: T
except self.glue.exceptions.EntityNotFoundException as e:
raise NoSuchNamespaceError(f"Database {database_name} does not
exist") from e
+ def _update_glue_table(self, database_name: str, table_name: str,
table_input: TableInputTypeDef, version_id: str) -> None:
+ try:
+ self.glue.update_table(
+ DatabaseName=database_name,
+ TableInput=table_input,
+ SkipArchive=self.properties.get(GLUE_SKIP_ARCHIVE,
GLUE_SKIP_ARCHIVE_DEFAULT),
+ VersionId=version_id,
+ )
+ except self.glue.exceptions.EntityNotFoundException as e:
+ raise NoSuchTableError(f"Table does not exist:
{database_name}.{table_name}") from e
Review Comment:
Thanks for the suggestion! Updated.
##########
pyiceberg/catalog/glue.py:
##########
@@ -247,8 +290,52 @@ def _commit_table(self, table_request: CommitTableRequest)
-> CommitTableRespons
Raises:
NoSuchTableError: If a table with the given identifier does not
exist.
+ CommitFailedException: If the commit failed.
"""
- raise NotImplementedError
+ identifier_tuple = self.identifier_to_tuple_without_catalog(
+ tuple(table_request.identifier.namespace.root +
[table_request.identifier.name])
+ )
+ database_name, table_name =
self.identifier_to_database_and_table(identifier_tuple)
+
+ current_glue_table = self._get_glue_table(database_name=database_name,
table_name=table_name)
+ glue_table_version_id = current_glue_table.get("VersionId")
+ if glue_table_version_id is None:
Review Comment:
Thanks for the suggestion! I agree it is more robust to use `not` here.
`glue_table_version_id` is a string so it could be an empty string when version
id is unavailable
##########
pyiceberg/catalog/__init__.py:
##########
@@ -587,8 +590,34 @@ def _write_metadata(metadata: TableMetadata, io: FileIO,
metadata_path: str) ->
ToOutputFile.table_metadata(metadata, io.new_output(metadata_path))
@staticmethod
- def _get_metadata_location(location: str) -> str:
- return f"{location}/metadata/00000-{uuid.uuid4()}.metadata.json"
+ def _get_metadata_location(location: str, new_version: int = 0) -> str:
+ if new_version < 0:
+ raise ValueError(f"Table metadata version: {new_version} cannot be
negative")
+ version_str = f"{new_version:05d}"
+ return
f"{location}/metadata/{version_str}-{uuid.uuid4()}.metadata.json"
+
+ @staticmethod
+ def _parse_metadata_version(metadata_location: str) -> int:
+ """Parse the version from the metadata location.
+
+ The version is the first part of the file name, before the first dash.
+ For example, the version of the metadata file
+
`s3://bucket/db/tb/metadata/00001-6c97e413-d51b-4538-ac70-12fe2a85cb83.metadata.json`
+ is 1.
+ If the path does not comply with the pattern, the version is defaulted
to be -1, ensuring
+ that the next metadata file is treated as having version 0.
+
+ Args:
+ metadata_location (str): The location of the metadata file.
+
+ Returns:
+ int: The version of the metadata file. -1 if the file name does
not have valid version string
+ """
+ file_name = metadata_location.split("/")[-1]
Review Comment:
Similar reason as above, since this is an internal method, I think we can
rely on the type-check to ensure `metadata_location` is not `None`. But please
let me know if you think type-check is not enough.
##########
pyiceberg/catalog/__init__.py:
##########
@@ -74,6 +75,8 @@
LOCATION = "location"
EXTERNAL_TABLE = "EXTERNAL_TABLE"
+TABLE_METADATA_FILE_NAME_REGEX = re.compile(r"""(\d+)-.*\.json""", re.X)
Review Comment:
Thanks for the suggestion. I've updated the regex to capture the UUID.
Although in java we do not check the UUID validity:
https://github.com/apache/iceberg/blob/81bf8d30766b1b129b87abde15239645cb127046/core/src/main/java/org/apache/iceberg/BaseMetastoreTableOperations.java#L382-L404
I think it make sense to add the check here
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]