kevinjqliu commented on code in PR #2013: URL: https://github.com/apache/iceberg-python/pull/2013#discussion_r2122646431
########## pyiceberg/catalog/hive.py: ########## @@ -211,11 +211,18 @@ def _construct_hive_storage_descriptor( DEFAULT_PROPERTIES = {TableProperties.PARQUET_COMPRESSION: TableProperties.PARQUET_COMPRESSION_DEFAULT} -def _construct_parameters(metadata_location: str, previous_metadata_location: Optional[str] = None) -> Dict[str, Any]: +def _construct_parameters( + metadata_location: str, previous_metadata_location: Optional[str] = None, metadata_properties: Optional[Properties] = None +) -> Dict[str, Any]: properties = {PROP_EXTERNAL: "TRUE", PROP_TABLE_TYPE: "ICEBERG", PROP_METADATA_LOCATION: metadata_location} if previous_metadata_location: properties[PROP_PREVIOUS_METADATA_LOCATION] = previous_metadata_location + if metadata_properties: + for key, value in metadata_properties.items(): + if key not in properties: Review Comment: 👍 this is fine, it helps with not re-setting `PROP_EXTERNAL`, `PROP_TABLE_TYPE`, `PROP_METADATA_LOCATION`, and `PROP_PREVIOUS_METADATA_LOCATION` ########## tests/integration/test_reads.py: ########## @@ -111,6 +112,23 @@ def test_table_properties(catalog: Catalog) -> None: table.transaction().set_properties(property_name=None).commit_transaction() assert "None type is not a supported value in properties: property_name" in str(exc_info.value) + if isinstance(catalog, HiveCatalog): Review Comment: this is a great test! could you move this into its own test function? with just hive catalog ``` @pytest.mark.integration @pytest.mark.parametrize("catalog", [pytest.lazy_fixture("session_catalog_hive")]) ``` ########## tests/integration/test_reads.py: ########## @@ -111,6 +112,23 @@ def test_table_properties(catalog: Catalog) -> None: table.transaction().set_properties(property_name=None).commit_transaction() Review Comment: I was wondering why the rest of these tests pass since we're not setting the properties in the HMS. Turns out the table properties are saved in the [table metadata](https://iceberg.apache.org/spec/#table-metadata-fields) using its `properties` field. This is not what the table metadata's properties field should be used for, ``` properties A string to string map of table properties. This is used to control settings that affect reading and writing and is not intended to be used for arbitrary metadata. For example, commit.retry.num-retries is used to control the number of commit retries. ``` This is a side affect of https://github.com/apache/iceberg-python/blob/a67c5592f3243d255519581fedfcc5d93274b9c8/pyiceberg/table/__init__.py#L1131-L1134 and https://github.com/apache/iceberg-python/blob/a67c5592f3243d255519581fedfcc5d93274b9c8/pyiceberg/catalog/hive.py#L338-L344 We should fix this behavior and read/write properties using the HMS's table parameters. We can fix this separately from the current issue. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org