kevinjqliu commented on code in PR #2013:
URL: https://github.com/apache/iceberg-python/pull/2013#discussion_r2122646431


##########
pyiceberg/catalog/hive.py:
##########
@@ -211,11 +211,18 @@ def _construct_hive_storage_descriptor(
 DEFAULT_PROPERTIES = {TableProperties.PARQUET_COMPRESSION: 
TableProperties.PARQUET_COMPRESSION_DEFAULT}
 
 
-def _construct_parameters(metadata_location: str, previous_metadata_location: 
Optional[str] = None) -> Dict[str, Any]:
+def _construct_parameters(
+    metadata_location: str, previous_metadata_location: Optional[str] = None, 
metadata_properties: Optional[Properties] = None
+) -> Dict[str, Any]:
     properties = {PROP_EXTERNAL: "TRUE", PROP_TABLE_TYPE: "ICEBERG", 
PROP_METADATA_LOCATION: metadata_location}
     if previous_metadata_location:
         properties[PROP_PREVIOUS_METADATA_LOCATION] = 
previous_metadata_location
 
+    if metadata_properties:
+        for key, value in metadata_properties.items():
+            if key not in properties:

Review Comment:
   👍 this is fine, it helps with not re-setting `PROP_EXTERNAL`, 
`PROP_TABLE_TYPE`, `PROP_METADATA_LOCATION`, and 
`PROP_PREVIOUS_METADATA_LOCATION`
   



##########
tests/integration/test_reads.py:
##########
@@ -111,6 +112,23 @@ def test_table_properties(catalog: Catalog) -> None:
         
table.transaction().set_properties(property_name=None).commit_transaction()
     assert "None type is not a supported value in properties: property_name" 
in str(exc_info.value)
 
+    if isinstance(catalog, HiveCatalog):

Review Comment:
   this is a great test! could you move this into its own test function?
   
   with just hive catalog
   ```
   @pytest.mark.integration
   @pytest.mark.parametrize("catalog", 
[pytest.lazy_fixture("session_catalog_hive")])
   ```



##########
tests/integration/test_reads.py:
##########
@@ -111,6 +112,23 @@ def test_table_properties(catalog: Catalog) -> None:
         
table.transaction().set_properties(property_name=None).commit_transaction()

Review Comment:
   I was wondering why the rest of these tests pass since we're not setting the 
properties in the HMS. Turns out the table properties are saved in the [table 
metadata](https://iceberg.apache.org/spec/#table-metadata-fields) using its 
`properties` field. 
   
   This is not what the table metadata's properties field should be used for, 
   ```
   properties   A string to string map of table properties. This is used to 
control settings that affect reading and writing and is not intended to be used 
for arbitrary metadata. For example, commit.retry.num-retries is used to 
control the number of commit retries.
   ``` 
   
   This is a side affect of 
https://github.com/apache/iceberg-python/blob/a67c5592f3243d255519581fedfcc5d93274b9c8/pyiceberg/table/__init__.py#L1131-L1134
 and 
https://github.com/apache/iceberg-python/blob/a67c5592f3243d255519581fedfcc5d93274b9c8/pyiceberg/catalog/hive.py#L338-L344
   
   We should fix this behavior and read/write properties using the HMS's table 
parameters. We can fix this separately from the current issue.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to