snazy commented on code in PR #8382:
URL: https://github.com/apache/iceberg/pull/8382#discussion_r1324153854
##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieUtil.java:
##########
@@ -111,23 +111,31 @@ public static TableMetadata
updateTableMetadataWithNessieSpecificProperties(
// Update the TableMetadata with the Content of NessieTableState.
Map<String, String> newProperties =
Maps.newHashMap(tableMetadata.properties());
newProperties.put(NessieTableOperations.NESSIE_COMMIT_ID_PROPERTY,
reference.getHash());
+
// To prevent accidental deletion of files that are still referenced by
other branches/tags,
Review Comment:
Nit: I wonder whether all the GC-related warning code would be better placed in a
separate method (it's quite a few lines of code and comments).
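
A minimal sketch of what such an extraction could look like, not the PR's actual code. To stay standalone it uses a plain `Map` instead of `TableMetadata` and prints to stderr instead of calling `LOG.warn`. The class name `GcWarningSketch`, the method name `warnAboutIcebergGcOnce`, and the local `propertyAsBoolean` helper are hypothetical. The key strings and defaults for `gc.enabled` and `write.metadata.delete-after-commit.enabled` are what I believe Iceberg's `TableProperties` defines, so treat them as assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical standalone sketch; names below are illustrative only.
final class GcWarningSketch {
  // Assumed values of the TableProperties constants (copied here to avoid the Iceberg dependency).
  static final String GC_ENABLED = "gc.enabled";
  static final boolean GC_ENABLED_DEFAULT = true;
  static final String METADATA_DELETE_AFTER_COMMIT_ENABLED =
      "write.metadata.delete-after-commit.enabled";
  static final boolean METADATA_DELETE_AFTER_COMMIT_ENABLED_DEFAULT = false;
  // Marker property name as proposed in this PR.
  static final String NESSIE_GC_WARNING_PROPERTY = "nessie.gc.user.warned";

  private GcWarningSketch() {}

  // Local stand-in for a boolean property lookup with a default.
  static boolean propertyAsBoolean(Map<String, String> props, String key, boolean defaultValue) {
    String value = props.get(key);
    return value == null ? defaultValue : Boolean.parseBoolean(value);
  }

  /**
   * Warns once per table if Iceberg GC or metadata cleanup is enabled, and records
   * the marker property so later commits stay silent. Returns true if it warned.
   */
  static boolean warnAboutIcebergGcOnce(Map<String, String> newProperties, String tableName) {
    boolean warn =
        propertyAsBoolean(newProperties, GC_ENABLED, GC_ENABLED_DEFAULT)
            || propertyAsBoolean(
                newProperties,
                METADATA_DELETE_AFTER_COMMIT_ENABLED,
                METADATA_DELETE_AFTER_COMMIT_ENABLED_DEFAULT);
    if (!warn || newProperties.containsKey(NESSIE_GC_WARNING_PROPERTY)) {
      return false;
    }
    newProperties.put(NESSIE_GC_WARNING_PROPERTY, "1");
    // The real code would use LOG.warn with '{}' placeholders instead.
    System.err.printf(
        "Iceberg property '%s' and/or '%s' is enabled on table '%s' in NessieCatalog.%n",
        GC_ENABLED, METADATA_DELETE_AFTER_COMMIT_ENABLED, tableName);
    return true;
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    warnAboutIcebergGcOnce(props, "db.tbl"); // warns: gc.enabled falls back to its default
    warnAboutIcebergGcOnce(props, "db.tbl"); // silent: marker property is already set
  }
}
```

With this shape, `updateTableMetadataWithNessieSpecificProperties` would only need a single call after setting the commit id, and the long explanatory comment could move onto the helper.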
##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieTableOperations.java:
##########
@@ -55,6 +55,8 @@ public class NessieTableOperations extends
BaseMetastoreTableOperations {
*/
public static final String NESSIE_COMMIT_ID_PROPERTY = "nessie.commit.id";
+ public static final String NESSIE_GC_WARNING_PROPERTY =
"nessie.gc.user.warned";
Review Comment:
Should this maybe be something like `nessie.gc.no-warning`?
##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieUtil.java:
##########
@@ -111,23 +111,31 @@ public static TableMetadata
updateTableMetadataWithNessieSpecificProperties(
// Update the TableMetadata with the Content of NessieTableState.
Map<String, String> newProperties =
Maps.newHashMap(tableMetadata.properties());
newProperties.put(NessieTableOperations.NESSIE_COMMIT_ID_PROPERTY,
reference.getHash());
+
// To prevent accidental deletion of files that are still referenced by
other branches/tags,
- // setting GC_ENABLED to false. So that all Iceberg's gc operations like
expire_snapshots,
- // remove_orphan_files, drop_table with purge will fail with an error.
- // Nessie CLI will provide a reference aware GC functionality for the
expired/unreferenced
+ // setting GC_ENABLED to 'false' is recommended, so that all Iceberg's gc
operations like
+ // expire_snapshots, remove_orphan_files, drop_table with purge will fail
with an error.
+ // `nessie-gc` CLI provides a reference aware GC functionality for the
expired/unreferenced
// files.
- newProperties.put(TableProperties.GC_ENABLED, "false");
-
- boolean metadataCleanupEnabled =
- newProperties
-
.getOrDefault(TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED, "false")
- .equalsIgnoreCase("true");
- if (metadataCleanupEnabled) {
- newProperties.put(TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED,
"false");
+ // Advanced users may still want to use the simpler Iceberg GC tool iff
their Nessie Server
+ // contains only one branch (in which case the full Nessie history will be
reflected in the
+ // Iceberg sequence of snapshots).
+ boolean warn =
+ tableMetadata.propertyAsBoolean(
+ TableProperties.GC_ENABLED, TableProperties.GC_ENABLED_DEFAULT)
+ || tableMetadata.propertyAsBoolean(
+ TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED,
+ TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED_DEFAULT);
+
+ if (warn &&
!newProperties.containsKey(NessieTableOperations.NESSIE_GC_WARNING_PROPERTY)) {
+ newProperties.put(NessieTableOperations.NESSIE_GC_WARNING_PROPERTY, "1");
LOG.warn(
- "Automatic table metadata files cleanup was requested, but disabled
because "
- + "the Nessie catalog can use historical metadata files from
other references. "
- + "Use the 'nessie-gc' tool for history-aware GC");
+ "Standard Iceberg property '{}' and/or '{}' are enabled on table
'{}' in NessieCatalog."
+ + " This may make data in historical Nessie commits
inaccessible."
Review Comment:
```suggestion
+ " This likely makes data in other Nessie branches and tags and in earlier, historical Nessie commits inaccessible."
```
##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieUtil.java:
##########
@@ -111,23 +111,31 @@ public static TableMetadata
updateTableMetadataWithNessieSpecificProperties(
// Update the TableMetadata with the Content of NessieTableState.
Map<String, String> newProperties =
Maps.newHashMap(tableMetadata.properties());
newProperties.put(NessieTableOperations.NESSIE_COMMIT_ID_PROPERTY,
reference.getHash());
+
// To prevent accidental deletion of files that are still referenced by
other branches/tags,
- // setting GC_ENABLED to false. So that all Iceberg's gc operations like
expire_snapshots,
- // remove_orphan_files, drop_table with purge will fail with an error.
- // Nessie CLI will provide a reference aware GC functionality for the
expired/unreferenced
+ // setting GC_ENABLED to 'false' is recommended, so that all Iceberg's gc
operations like
+ // expire_snapshots, remove_orphan_files, drop_table with purge will fail
with an error.
+ // `nessie-gc` CLI provides a reference aware GC functionality for the
expired/unreferenced
// files.
- newProperties.put(TableProperties.GC_ENABLED, "false");
-
- boolean metadataCleanupEnabled =
- newProperties
-
.getOrDefault(TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED, "false")
- .equalsIgnoreCase("true");
- if (metadataCleanupEnabled) {
- newProperties.put(TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED,
"false");
+ // Advanced users may still want to use the simpler Iceberg GC tool iff
their Nessie Server
+ // contains only one branch (in which case the full Nessie history will be
reflected in the
+ // Iceberg sequence of snapshots).
+ boolean warn =
+ tableMetadata.propertyAsBoolean(
+ TableProperties.GC_ENABLED, TableProperties.GC_ENABLED_DEFAULT)
+ || tableMetadata.propertyAsBoolean(
+ TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED,
+ TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED_DEFAULT);
+
+ if (warn &&
!newProperties.containsKey(NessieTableOperations.NESSIE_GC_WARNING_PROPERTY)) {
+ newProperties.put(NessieTableOperations.NESSIE_GC_WARNING_PROPERTY, "1");
LOG.warn(
- "Automatic table metadata files cleanup was requested, but disabled
because "
- + "the Nessie catalog can use historical metadata files from
other references. "
- + "Use the 'nessie-gc' tool for history-aware GC");
+ "Standard Iceberg property '{}' and/or '{}' are enabled on table
'{}' in NessieCatalog."
+ + " This may make data in historical Nessie commits
inaccessible."
+ + " Consider setting those properties to 'false' use the
'nessie-gc' tool for history-aware GC.",
Review Comment:
```suggestion
+ " The recommended setting for those properties is 'false', use the 'nessie-gc' tool for Nessie reference aware garbage collection.",
```
##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieUtil.java:
##########
@@ -111,23 +111,31 @@ public static TableMetadata
updateTableMetadataWithNessieSpecificProperties(
// Update the TableMetadata with the Content of NessieTableState.
Map<String, String> newProperties =
Maps.newHashMap(tableMetadata.properties());
newProperties.put(NessieTableOperations.NESSIE_COMMIT_ID_PROPERTY,
reference.getHash());
+
// To prevent accidental deletion of files that are still referenced by
other branches/tags,
- // setting GC_ENABLED to false. So that all Iceberg's gc operations like
expire_snapshots,
- // remove_orphan_files, drop_table with purge will fail with an error.
- // Nessie CLI will provide a reference aware GC functionality for the
expired/unreferenced
+ // setting GC_ENABLED to 'false' is recommended, so that all Iceberg's gc
operations like
+ // expire_snapshots, remove_orphan_files, drop_table with purge will fail
with an error.
+ // `nessie-gc` CLI provides a reference aware GC functionality for the
expired/unreferenced
// files.
- newProperties.put(TableProperties.GC_ENABLED, "false");
-
- boolean metadataCleanupEnabled =
- newProperties
-
.getOrDefault(TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED, "false")
- .equalsIgnoreCase("true");
- if (metadataCleanupEnabled) {
- newProperties.put(TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED,
"false");
+ // Advanced users may still want to use the simpler Iceberg GC tool iff
their Nessie Server
+ // contains only one branch (in which case the full Nessie history will be
reflected in the
+ // Iceberg sequence of snapshots).
+ boolean warn =
+ tableMetadata.propertyAsBoolean(
+ TableProperties.GC_ENABLED, TableProperties.GC_ENABLED_DEFAULT)
+ || tableMetadata.propertyAsBoolean(
+ TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED,
+ TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED_DEFAULT);
+
+ if (warn &&
!newProperties.containsKey(NessieTableOperations.NESSIE_GC_WARNING_PROPERTY)) {
+ newProperties.put(NessieTableOperations.NESSIE_GC_WARNING_PROPERTY, "1");
LOG.warn(
- "Automatic table metadata files cleanup was requested, but disabled
because "
- + "the Nessie catalog can use historical metadata files from
other references. "
- + "Use the 'nessie-gc' tool for history-aware GC");
+ "Standard Iceberg property '{}' and/or '{}' are enabled on table
'{}' in NessieCatalog."
Review Comment:
```suggestion
"The Iceberg property '{}' and/or '{}' is enabled on table '{}' in NessieCatalog."
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.