smaheshwar-pltr commented on code in PR #1537:
URL: https://github.com/apache/iceberg-python/pull/1537#discussion_r1921100667


##########
mkdocs/docs/configuration.md:
##########
@@ -195,6 +198,85 @@ PyIceberg uses [S3FileSystem](https://arrow.apache.org/docs/python/generated/pya
 
 <!-- markdown-link-check-enable-->
 
+## Location Providers
+
+Iceberg works with the concept of a LocationProvider that determines the file paths for a table's data. PyIceberg
+introduces a pluggable LocationProvider module; the LocationProvider used may be specified on a per-table basis via
+table properties. PyIceberg defaults to the [ObjectStoreLocationProvider](configuration.md#objectstorelocationprovider),
+which generates file paths that are optimised for object storage.
+
+### SimpleLocationProvider
+
+The SimpleLocationProvider places file names underneath a `data` directory in the table's storage location. For example,
+a non-partitioned table might have a data file with location:
+
+```txt
+s3://bucket/ns/table/data/0000-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-00001.parquet
+```
+
+When data is partitioned, the files under a given partition are grouped into a subdirectory, with that partition key
+and value as the directory name. For example, a table partitioned over a string column `category` might have a data file
+with location:
+
+```txt
+s3://bucket/ns/table/data/category=orders/0000-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-00001.parquet
+```
+
+The SimpleLocationProvider is enabled for a table by explicitly setting its `write.object-storage.enabled` table property to `false`.
+
+### ObjectStoreLocationProvider

Review Comment:
   There's a lot of natural duplication between this section and https://iceberg.apache.org/docs/latest/aws/#object-store-file-layout. I've gone less in-depth here though.
   
   I was unsure whether to link to this webpage (and if so, how to word it) because there's a lot that's not relevant to us, e.g.
   
   > Note, the path resolution logic for ObjectStoreLocationProvider is write.data.path then <tableLocation>/data. However, for the older versions up to 0.12.0, the logic is as follows:
   > - before 0.12.0, write.object-storage.path must be set.
   > - at 0.12.0, write.object-storage.path then write.folder-storage.path then <tableLocation>/data.
   
   and 
   
   > Previously provided base64 hash was updated to base2 in order to provide an improved auto-scaling behavior on S3 General Purpose Buckets.
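To make the SimpleLocationProvider layout described in the hunk concrete, here is a minimal Python sketch that reproduces the two example paths. It is not PyIceberg's API: `data_directory` and `simple_data_file_location` are hypothetical helpers. The base-directory resolution order (`write.data.path`, then `<tableLocation>/data`) is taken from the quoted Iceberg docs note and is only assumed to apply here. Partition values are also not URL-escaped, which a real implementation may do.

```python
from typing import Mapping, Optional, Sequence, Tuple


def data_directory(table_location: str, properties: Mapping[str, str]) -> str:
    """Hypothetical helper: resolve the base data directory.

    Assumption: `write.data.path` wins if set, otherwise `<tableLocation>/data`
    (order taken from the quoted Iceberg docs note).
    """
    explicit = properties.get("write.data.path")
    if explicit:
        return explicit.rstrip("/")
    return f"{table_location.rstrip('/')}/data"


def simple_data_file_location(
    table_location: str,
    file_name: str,
    partition: Sequence[Tuple[str, str]] = (),
    properties: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical sketch of the SimpleLocationProvider layout.

    Each (key, value) partition pair becomes a `key=value` directory, in order.
    Values are NOT URL-escaped here, unlike what a real implementation may do.
    """
    base = data_directory(table_location, properties or {})
    partition_path = "/".join(f"{key}={value}" for key, value in partition)
    if partition_path:
        return f"{base}/{partition_path}/{file_name}"
    return f"{base}/{file_name}"
```

For example, `simple_data_file_location("s3://bucket/ns/table", name)` yields the non-partitioned path from the hunk. Passing `[("category", "orders")]` as `partition` yields the `data/category=orders/...` path.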



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
