tomtongue commented on PR #12017:
URL: https://github.com/apache/iceberg/pull/12017#issuecomment-3242092127
@flyrain Thanks for the comment. As you mentioned, they do seem confusing,
and I should have considered that the view metadata path can also be configured
with the `location` view property, e.g. `TBLPROPERTIES
('location'='/custom-path/to/view-metadata-location')`. However, I also confirmed
that there is a subtle difference between `location` and `write.metadata.path`.
Let me explain the difference first:
* When using `location`: the `gz.metadata.json` file is put in `<the path
specified by 'location'>/metadata` (`/metadata` is always appended to the view
location)
* When using `write.metadata.path`: the `gz.metadata.json` file is put directly
in `<the path specified by 'write.metadata.path'>/`
Here's the detail:
```
// When using location -> the metadata is put in
// '<the path specified by location>/metadata/<file>.gz.metadata.json'
spark.sql("""
CREATE VIEW hive_catalog.db.view_loc
TBLPROPERTIES('location'='s3://bucket/iceberg/custom-view-location')
AS SELECT id, count(*) as cnt FROM hive_catalog.db.iceberg_w_loc GROUP BY id
""")
/* DESCRIBE EXTENDED db.view_loc
+---------------------------+---------------------------------------------------------------------------------------------------------+-------+
|col_name |data_type |comment|
+---------------------------+---------------------------------------------------------------------------------------------------------+-------+
|id |int | |
|cnt |bigint | |
| | | |
|# Detailed View Information| | |
|Comment | | |
|View Catalog and Namespace |hive_catalog.db | |
|View Query Output Columns |[id, cnt] | |
|View Properties |['format-version' = '1', 'location' = 's3://bucket/iceberg/custom-view-location', 'provider' = 'iceberg']| |
|Created By |Spark 3.5.5-amzn-0 | |
+---------------------------+---------------------------------------------------------------------------------------------------------+-------+
Storage: -> `metadata/` path is added
s3://bucket/iceberg/custom-view-location/
- metadata/
- 00000-9fb69491-3dfe-4863-a1bc-04a9234f22c3.gz.metadata.json
*/
```
```
// When using write.metadata.path
spark.sql("""
CREATE VIEW hive_catalog.db.view_wmp
TBLPROPERTIES('write.metadata.path'='s3://bucket/iceberg/custom-view-metadata-path')
AS SELECT id, count(*) as cnt FROM hive_catalog.db.iceberg_w_loc GROUP BY id
""")
/* DESCRIBE EXTENDED db.view_wmp
+---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
|col_name |data_type |comment|
+---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
|id |int | |
|cnt |bigint | |
| | | |
|# Detailed View Information| | |
|Comment | | |
|View Catalog and Namespace |hive_catalog.db | |
|View Query Output Columns |[id, cnt] | |
|View Properties |['format-version' = '1', 'location' = 's3://bucket/iceberg/default-hive-db/view_wmp', 'provider' = 'iceberg', 'write.metadata.path' = 's3://bucket/iceberg/custom-view-metadata-path']| |
|Created By |Spark 3.5.5-amzn-0 | |
+---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
Storage: -> No `metadata/` path
s3://bucket/iceberg/custom-view-metadata-path/
- 00000-5e854155-25ac-4646-b79c-ba2ed1618b4f.gz.metadata.json
*/
```
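To summarize the two runs above, the observed layout can be sketched as a small function. This is only a hypothetical helper for illustration (`ViewMetadataPathSketch`, `metadataDir`, and `defaultLocation` are names I made up), not Iceberg's actual implementation; it just encodes what showed up in storage, including that `write.metadata.path` wins when both properties are present:

```scala
object ViewMetadataPathSketch {
  // Hypothetical helper: mirrors the storage layout observed above.
  // It is NOT Iceberg's code, only a sketch of the observed behavior.
  def metadataDir(props: Map[String, String], defaultLocation: String): String = {
    // Drop trailing slashes so the joined path has no "//".
    def stripTrailingSlash(p: String): String = p.replaceAll("/+$", "")

    props.get("write.metadata.path") match {
      // write.metadata.path is used as-is: no `metadata/` suffix.
      case Some(path) => stripTrailingSlash(path)
      // Otherwise `metadata/` is appended to the view location.
      case None =>
        stripTrailingSlash(props.getOrElse("location", defaultLocation)) + "/metadata"
    }
  }
}
```

For example, passing the properties of `view_loc` gives `s3://bucket/iceberg/custom-view-location/metadata`, while passing the properties of `view_wmp` gives `s3://bucket/iceberg/custom-view-metadata-path`.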
In terms of configuring the view metadata path, I believe the two properties
differ slightly: `write.metadata.path` seems better than `location` because
the `metadata/` path is not appended, but `location` is the parameter that
should generally be used.
So either `location` can be updated along with the `write.metadata.path`
configuration, or we can remove `write.metadata.path` and merge this
configurable parameter into `location` by adding `location` to the view
properties.