gaborkaszab commented on code in PR #13432:
URL: https://github.com/apache/iceberg/pull/13432#discussion_r2259567918


##########
docs/docs/maintenance.md:
##########
@@ -63,20 +63,25 @@ Expiring old snapshots removes them from metadata, so they 
are no longer availab
 
 ### Remove old metadata files
 
-Iceberg keeps track of table metadata using JSON files. Each change to a table 
produces a new metadata file to provide atomicity.
+Iceberg keeps track of table metadata using JSON files. Each change to a table 
produces a new metadata file to provide atomicity. 

Review Comment:
   nit: is there an extra 'space' at the end of the line? 



##########
docs/docs/maintenance.md:
##########
@@ -63,20 +63,25 @@ Expiring old snapshots removes them from metadata, so they 
are no longer availab
 
 ### Remove old metadata files
 
-Iceberg keeps track of table metadata using JSON files. Each change to a table 
produces a new metadata file to provide atomicity.
+Iceberg keeps track of table metadata using JSON files. Each change to a table 
produces a new metadata file to provide atomicity. 
 
-Old metadata files are kept for history by default. Tables with frequent 
commits, like those written by streaming jobs, may need to regularly clean 
metadata files.
+Old metadata files are kept for history by default. Tables with frequent 
commits, like those written by streaming jobs, may need to regularly clean 
metadata files to reduce metadata overhead.
 
-To automatically clean metadata files, set 
`write.metadata.delete-after-commit.enabled=true` in table properties. This 
will keep some metadata files (up to `write.metadata.previous-versions-max`) 
and will delete the oldest metadata file after each new one is created.
+Each metadata file tracks the older metadata files in the `metadata-log` 
field.  The number of metadata files being tracked is defined by 
`write.metadata.previous-versions-max` (default is 100).

Review Comment:
   nit: `(default is 100)` this information is present in the table, redundant 
here. If we decide to keep the table, I don't think this is needed.



##########
docs/docs/maintenance.md:
##########
@@ -63,20 +63,25 @@ Expiring old snapshots removes them from metadata, so they 
are no longer availab
 
 ### Remove old metadata files
 
-Iceberg keeps track of table metadata using JSON files. Each change to a table 
produces a new metadata file to provide atomicity.
+Iceberg keeps track of table metadata using JSON files. Each change to a table 
produces a new metadata file to provide atomicity. 
 
-Old metadata files are kept for history by default. Tables with frequent 
commits, like those written by streaming jobs, may need to regularly clean 
metadata files.
+Old metadata files are kept for history by default. Tables with frequent 
commits, like those written by streaming jobs, may need to regularly clean 
metadata files to reduce metadata overhead.

Review Comment:
   `to reduce metadata overhead` not sure what you mean by this. In a sense 
there is no `overhead` with these metadata files in terms of query performance. 
They occupy extra space on disk, though. But I think this addition to the 
comment is misleading.



##########
docs/docs/maintenance.md:
##########
@@ -63,20 +63,16 @@ Expiring old snapshots removes them from metadata, so they 
are no longer availab
 
 ### Remove old metadata files
 
-Iceberg keeps track of table metadata using JSON files. Each change to a table 
produces a new metadata file to provide atomicity.
+Iceberg keeps track of table metadata using JSON files. Each change to a table 
produces a new metadata file to provide atomicity. Old metadata files are kept 
for history by default, and are tracked in the `metadata-log` field of the 
metadata file. Tables with frequent commits, like those written by streaming 
jobs, may need to regularly clean metadata files to reduce metadata overhead.
 
-Old metadata files are kept for history by default. Tables with frequent 
commits, like those written by streaming jobs, may need to regularly clean 
metadata files.
+The number of metadata files being **tracked** is defined by 
`write.metadata.previous-versions-max` (default is 100).
 
-To automatically clean metadata files, set 
`write.metadata.delete-after-commit.enabled=true` in table properties. This 
will keep some metadata files (up to `write.metadata.previous-versions-max`) 
and will delete the oldest metadata file after each new one is created.
+To automatically delete older metadata files when they become untracked, set 
`write.metadata.delete-after-commit.enabled=true` in table properties. This 
will keep some metadata files as tracked (up to 
`write.metadata.previous-versions-max`), and will delete the oldest metadata 
file every time a new one is created.
+Alternatively, untracked metadata files can be deleted as part of [orphan file 
deletion](#delete-orphan-files).
 
-| Property                                     | Description                   
                                           |
-| -------------------------------------------- 
|--------------------------------------------------------------------------|
-| `write.metadata.delete-after-commit.enabled` | Whether to delete old 
**tracked** metadata files after each table commit |
-| `write.metadata.previous-versions-max`       | The number of old metadata 
files to keep                                 |
-
-Note that this will only delete metadata files that are **tracked** in the 
metadata log and will not delete orphaned metadata files.
-Example: With `write.metadata.delete-after-commit.enabled=false` and 
`write.metadata.previous-versions-max=10`, one will have 10 tracked metadata 
files and 90 orphaned metadata files after 100 commits.
-Configuring `write.metadata.delete-after-commit.enabled=true` and 
`write.metadata.previous-versions-max=20` will not automatically delete 
metadata files. Tracked metadata files would be deleted again when reaching 
`write.metadata.previous-versions-max=20`.

Review Comment:
   `Note that this will only delete metadata files that are **tracked** in the 
metadata log and will not delete orphaned metadata files.` This sentence was 
dropped but I think it;s essential to call out explicitly that only tracked 
files are deleted.
   I know you can extract the same info from the examples, but an explicit 
mention in a separate bullet point makes sense, because this is an area that 
causes confusions for many users.



##########
docs/docs/maintenance.md:
##########
@@ -63,20 +63,16 @@ Expiring old snapshots removes them from metadata, so they 
are no longer availab
 
 ### Remove old metadata files
 
-Iceberg keeps track of table metadata using JSON files. Each change to a table 
produces a new metadata file to provide atomicity.
+Iceberg keeps track of table metadata using JSON files. Each change to a table 
produces a new metadata file to provide atomicity. Old metadata files are kept 
for history by default, and are tracked in the `metadata-log` field of the 
metadata file. Tables with frequent commits, like those written by streaming 
jobs, may need to regularly clean metadata files to reduce metadata overhead.
 
-Old metadata files are kept for history by default. Tables with frequent 
commits, like those written by streaming jobs, may need to regularly clean 
metadata files.
+The number of metadata files being **tracked** is defined by 
`write.metadata.previous-versions-max` (default is 100).
 
-To automatically clean metadata files, set 
`write.metadata.delete-after-commit.enabled=true` in table properties. This 
will keep some metadata files (up to `write.metadata.previous-versions-max`) 
and will delete the oldest metadata file after each new one is created.
+To automatically delete older metadata files when they become untracked, set 
`write.metadata.delete-after-commit.enabled=true` in table properties. This 
will keep some metadata files as tracked (up to 
`write.metadata.previous-versions-max`), and will delete the oldest metadata 
file every time a new one is created.
+Alternatively, untracked metadata files can be deleted as part of [orphan file 
deletion](#delete-orphan-files).
 
-| Property                                     | Description                   
                                           |

Review Comment:
   oh, sorry for the extra work then. Let's see @manuzhang 's opinion here. I 
think keeping both tables makes sense for more visibility, but let's reach 
consensus here.
   
   Was it intentional that you didn't restore the table at the original place 
and moved down a bit?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to