laskoviymishka commented on code in PR #1185:
URL: https://github.com/apache/iceberg-go/pull/1185#discussion_r3417888615
##########
catalog/hadoop/hadoop.go:
##########
@@ -723,3 +772,26 @@ func (c *Catalog) LoadNamespaceProperties(_
context.Context, ns table.Identifier
func (c *Catalog) UpdateNamespaceProperties(_ context.Context, _
table.Identifier, _ []string, _ iceberg.Properties)
(catalog.PropertiesUpdateSummary, error) {
return catalog.PropertiesUpdateSummary{}, errors.New("hadoop catalog:
UpdateNamespaceProperties not yet implemented")
}
+
+// CheckedMkdirAll is a helper function that checks
+// all subdirectories of a given identifier path exist before creating the
full path.
+// This function is not atomic and may not return ErrNamespaceAlreadyExists if
+// called concurrently with other calls that change the same identifier
+func (c *Catalog) CheckedMkdirAll(id table.Identifier) error {
Review Comment:
I'd unexport this to `checkedMkdirAll`. The only caller is `CreateNamespace`
in this same package, it bypasses `validateIdentifier`, and exporting it bakes
an internal implementation detail into the stable public API right as we're
trying to slim the surface.
##########
catalog/hadoop/hadoop.go:
##########
@@ -723,3 +772,26 @@ func (c *Catalog) LoadNamespaceProperties(_
context.Context, ns table.Identifier
func (c *Catalog) UpdateNamespaceProperties(_ context.Context, _
table.Identifier, _ []string, _ iceberg.Properties)
(catalog.PropertiesUpdateSummary, error) {
return catalog.PropertiesUpdateSummary{}, errors.New("hadoop catalog:
UpdateNamespaceProperties not yet implemented")
}
+
+// CheckedMkdirAll is a helper function that checks
+// all subdirectories of a given identifier path exist before creating the
full path.
+// This function is not atomic and may not return ErrNamespaceAlreadyExists if
+// called concurrently with other calls that change the same identifier
+func (c *Catalog) CheckedMkdirAll(id table.Identifier) error {
+ path := c.namespaceToPath(id)
+ for pathIndex := range id {
Review Comment:
I think there's a real bug in this loop. `pathIndex` starts at 0, so the
first iteration uses `id[:0]` — the empty identifier — and stats the warehouse
root rather than the first namespace component. For a single-component
namespace like `["ns"]` we never verify any actual parent, and we redundantly
stat the warehouse on every call.
I'd start at 1: `for pathIndex := 1; pathIndex < len(id); pathIndex++`.
`TestCreateNamespaceNestedParentMissing` only passes today because the
warehouse always exists in the fixture, so the intended invariant ("all parent
namespaces must exist") isn't really being checked. wdyt?
##########
catalog/hadoop/hadoop.go:
##########
@@ -723,3 +772,26 @@ func (c *Catalog) LoadNamespaceProperties(_
context.Context, ns table.Identifier
func (c *Catalog) UpdateNamespaceProperties(_ context.Context, _
table.Identifier, _ []string, _ iceberg.Properties)
(catalog.PropertiesUpdateSummary, error) {
return catalog.PropertiesUpdateSummary{}, errors.New("hadoop catalog:
UpdateNamespaceProperties not yet implemented")
}
+
+// CheckedMkdirAll is a helper function that checks
+// all subdirectories of a given identifier path exist before creating the
full path.
+// This function is not atomic and may not return ErrNamespaceAlreadyExists if
+// called concurrently with other calls that change the same identifier
+func (c *Catalog) CheckedMkdirAll(id table.Identifier) error {
+ path := c.namespaceToPath(id)
+ for pathIndex := range id {
+ subPath := id[:pathIndex]
+ parentPath := c.namespaceToPath(subPath)
+ if _, err := c.filesystem.Stat(parentPath); err != nil {
Review Comment:
Returning the raw `Stat` error here breaks the `ErrNoSuchNamespace` contract
that `CreateNamespace` used to hold. The caller only catches `fs.ErrExist`;
everything else — including the `fs.ErrNotExist` from a missing parent — falls
through to the generic `fmt.Errorf`, so the error chain no longer contains
`catalog.ErrNoSuchNamespace`.
The old `CreateNamespace` had an explicit `if errors.Is(err, fs.ErrNotExist)
{ ... catalog.ErrNoSuchNamespace ... }` branch that this PR drops. I'd re-add
that branch in `CreateNamespace` (alongside the existing `ErrExist` check),
which covers both the intermediate-parent and missing-warehouse-root cases in
one spot. It'd also help to record which component failed so the message isn't
misleading.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]