laskoviymishka commented on code in PR #963:
URL: https://github.com/apache/iceberg-go/pull/963#discussion_r3188341606
##########
catalog/hadoop/hadoop.go:
##########
@@ -52,6 +52,31 @@ var _ catalog.Catalog = (*Catalog)(nil)
// v1.metadata.json, v42.metadata.json, v1.gz.metadata.json, etc.
var versionPattern = regexp.MustCompile(`^v([0-9]+)(?:\.gz)?\.metadata\.json$`)
+// uuidMetadataPattern matches UUID-style metadata filenames produced by
+// Java/PyIceberg catalogs: 00000-<uuid>.metadata.json, etc.
+var uuidMetadataPattern =
regexp.MustCompile(`^[0-9]+-[0-9a-fA-F-]+\.metadata\.json$`)
Review Comment:
`uuidMetadataPattern` should probably accept the gzip variant too. Java can
write `00000-<uuid>.gz.metadata.json`, and today those table dirs may be
mistaken for namespaces. We should mirror the existing `versionPattern` gzip
handling and tighten the UUID shape (5-digit sequence + 8-4-4-4-12 hex) while
we're here.
##########
catalog/hadoop/hadoop.go:
##########
@@ -52,6 +52,31 @@ var _ catalog.Catalog = (*Catalog)(nil)
// v1.metadata.json, v42.metadata.json, v1.gz.metadata.json, etc.
var versionPattern = regexp.MustCompile(`^v([0-9]+)(?:\.gz)?\.metadata\.json$`)
+// uuidMetadataPattern matches UUID-style metadata filenames produced by
+// Java/PyIceberg catalogs: 00000-<uuid>.metadata.json, etc.
+var uuidMetadataPattern =
regexp.MustCompile(`^[0-9]+-[0-9a-fA-F-]+\.metadata\.json$`)
+
+// validateIdentifier checks that each component of an identifier is safe for
+// use as a path segment. It rejects empty parts, path traversal sequences,
+// and components containing path separators.
+func validateIdentifier(ident table.Identifier) error {
+ for _, part := range ident {
+ if part == "" {
+ return errors.New("hadoop catalog: identifier component
must not be empty")
+ }
+
+ if part == "." || part == ".." {
+ return fmt.Errorf("hadoop catalog: invalid identifier
component %q", part)
+ }
+
+ if strings.ContainsAny(part, "/\\") {
+ return fmt.Errorf("hadoop catalog: identifier component
must not contain path separators: %q", part)
+ }
+ }
+
+ return nil
+}
Review Comment:
Empty namespace handling is still inconsistent. `CreateNamespace` /
`DropNamespace` reject empty identifiers, but `CheckNamespaceExists(ctx, nil)`
returns `true` and `LoadNamespaceProperties(ctx, nil)` returns warehouse
properties — because `validateIdentifier` is a no-op on an empty slice. I'd
move the empty check in here (wrapped with `catalog.ErrNoSuchNamespace`,
matching the SQL/REST catalogs) and leave `ListNamespaces` as the only method
where empty means "list root".
Also worth a short doc note that this is POSIX-best-effort — it doesn't
catch NUL bytes or Windows reserved names like `NUL`/`CON`/`COM1`.
##########
catalog/hadoop/hadoop_test.go:
##########
@@ -386,3 +386,318 @@ func (s *HadoopCatalogTestSuite)
TestFindVersionMixedGzipAndPlain() {
s.Require().NoError(err)
s.Equal(3, ver)
}
+
+// CreateNamespace tests
+
+func (s *HadoopCatalogTestSuite) TestCreateNamespace() {
+ err := s.cat.CreateNamespace(context.Background(), []string{"ns"}, nil)
+ s.Require().NoError(err)
+
+ info, err := os.Stat(filepath.Join(s.warehouse, "ns"))
+ s.Require().NoError(err)
+ s.True(info.IsDir())
+}
+
+func (s *HadoopCatalogTestSuite) TestCreateNamespaceAlreadyExists() {
+ s.Require().NoError(os.Mkdir(filepath.Join(s.warehouse, "ns"), 0o755))
+
+ err := s.cat.CreateNamespace(context.Background(), []string{"ns"}, nil)
+ s.ErrorIs(err, catalog.ErrNamespaceAlreadyExists)
+}
+
+func (s *HadoopCatalogTestSuite) TestCreateNamespaceNested() {
+ // Parent namespaces must exist first with atomic os.Mkdir.
+ s.Require().NoError(os.Mkdir(filepath.Join(s.warehouse, "a"), 0o755))
+ s.Require().NoError(os.Mkdir(filepath.Join(s.warehouse, "a", "b"),
0o755))
+
+ err := s.cat.CreateNamespace(context.Background(), []string{"a", "b",
"c"}, nil)
+ s.Require().NoError(err)
+
+ info, err := os.Stat(filepath.Join(s.warehouse, "a", "b", "c"))
+ s.Require().NoError(err)
+ s.True(info.IsDir())
+}
+
+func (s *HadoopCatalogTestSuite) TestCreateNamespaceNestedParentMissing() {
+ // Creating a nested namespace without parent should fail.
+ err := s.cat.CreateNamespace(context.Background(), []string{"a", "b",
"c"}, nil)
+ s.Require().Error(err)
+ s.Contains(err.Error(), "failed to create namespace")
Review Comment:
Several tests assert error-message substrings (this one, plus
431/569/596/602/640/646/652/658/664/670/676/682). It would be better to wrap
with sentinels where possible and use `ErrorIs`, especially for the
empty/invalid namespace and unsupported-properties cases. This particular
`"failed to create namespace"` substring wouldn't catch a real regression if
the implementation switches behavior — `errors.Is(err, fs.ErrNotExist)` would
be more meaningful.
##########
catalog/hadoop/hadoop.go:
##########
@@ -265,26 +300,139 @@ func (c *Catalog) CheckTableExists(_ context.Context, _
table.Identifier) (bool,
return false, errors.New("hadoop catalog: CheckTableExists not yet
implemented")
}
-func (c *Catalog) ListNamespaces(_ context.Context, _ table.Identifier)
([]table.Identifier, error) {
- return nil, errors.New("hadoop catalog: ListNamespaces not yet
implemented")
+func (c *Catalog) CreateNamespace(_ context.Context, ns table.Identifier,
props iceberg.Properties) error {
+ if len(ns) == 0 {
+ return errors.New("hadoop catalog: namespace identifier must
not be empty")
+ }
+
+ if err := validateIdentifier(ns); err != nil {
+ return err
+ }
+
+ if len(props) > 0 {
+ return errors.New("hadoop catalog: namespace properties are not
supported")
+ }
+
+ path := c.namespaceToPath(ns)
+
+ if err := os.Mkdir(path, 0o755); err != nil {
+ if os.IsExist(err) {
+ return fmt.Errorf("%w: %s",
catalog.ErrNamespaceAlreadyExists, strings.Join(ns, "."))
+ }
+
+ return fmt.Errorf("hadoop catalog: failed to create namespace:
%w", err)
Review Comment:
Nested namespace creation with a missing parent currently leaks raw `ENOENT`
through this wrapper. I'd either intentionally match Java with `MkdirAll`, or
detect the missing parent and return `ErrNoSuchNamespace`. Either is fine, but
it should be explicit.
##########
catalog/hadoop/hadoop.go:
##########
@@ -265,26 +300,139 @@ func (c *Catalog) CheckTableExists(_ context.Context, _
table.Identifier) (bool,
return false, errors.New("hadoop catalog: CheckTableExists not yet
implemented")
}
-func (c *Catalog) ListNamespaces(_ context.Context, _ table.Identifier)
([]table.Identifier, error) {
- return nil, errors.New("hadoop catalog: ListNamespaces not yet
implemented")
+func (c *Catalog) CreateNamespace(_ context.Context, ns table.Identifier,
props iceberg.Properties) error {
+ if len(ns) == 0 {
+ return errors.New("hadoop catalog: namespace identifier must
not be empty")
+ }
+
+ if err := validateIdentifier(ns); err != nil {
+ return err
+ }
+
+ if len(props) > 0 {
+ return errors.New("hadoop catalog: namespace properties are not
supported")
+ }
+
+ path := c.namespaceToPath(ns)
+
+ if err := os.Mkdir(path, 0o755); err != nil {
+ if os.IsExist(err) {
+ return fmt.Errorf("%w: %s",
catalog.ErrNamespaceAlreadyExists, strings.Join(ns, "."))
+ }
+
+ return fmt.Errorf("hadoop catalog: failed to create namespace:
%w", err)
+ }
+
+ return nil
}
-func (c *Catalog) CreateNamespace(_ context.Context, _ table.Identifier, _
iceberg.Properties) error {
- return errors.New("hadoop catalog: CreateNamespace not yet implemented")
+func (c *Catalog) DropNamespace(_ context.Context, ns table.Identifier) error {
+ if len(ns) == 0 {
+ return errors.New("hadoop catalog: namespace identifier must
not be empty")
+ }
+
+ if err := validateIdentifier(ns); err != nil {
+ return err
+ }
+
+ path := c.namespaceToPath(ns)
+
+ if _, err := os.Stat(path); os.IsNotExist(err) {
Review Comment:
Small cleanup while here: drop the standalone `Stat` and rely on `ReadDir`
returning `fs.ErrNotExist` — closes the small TOCTTOU and saves a syscall. Same
pass: switch `os.IsExist` / `os.IsNotExist` (here, line 319, 364, 390, 425) to
`errors.Is(err, fs.ErrExist)` / `errors.Is(err, fs.ErrNotExist)` for
consistency with the rest of the catalog package.
##########
catalog/hadoop/hadoop.go:
##########
@@ -265,26 +300,139 @@ func (c *Catalog) CheckTableExists(_ context.Context, _
table.Identifier) (bool,
return false, errors.New("hadoop catalog: CheckTableExists not yet
implemented")
}
-func (c *Catalog) ListNamespaces(_ context.Context, _ table.Identifier)
([]table.Identifier, error) {
- return nil, errors.New("hadoop catalog: ListNamespaces not yet
implemented")
+func (c *Catalog) CreateNamespace(_ context.Context, ns table.Identifier,
props iceberg.Properties) error {
+ if len(ns) == 0 {
+ return errors.New("hadoop catalog: namespace identifier must
not be empty")
+ }
+
+ if err := validateIdentifier(ns); err != nil {
+ return err
+ }
+
+ if len(props) > 0 {
+ return errors.New("hadoop catalog: namespace properties are not
supported")
+ }
+
+ path := c.namespaceToPath(ns)
+
+ if err := os.Mkdir(path, 0o755); err != nil {
+ if os.IsExist(err) {
+ return fmt.Errorf("%w: %s",
catalog.ErrNamespaceAlreadyExists, strings.Join(ns, "."))
+ }
+
+ return fmt.Errorf("hadoop catalog: failed to create namespace:
%w", err)
+ }
+
+ return nil
}
-func (c *Catalog) CreateNamespace(_ context.Context, _ table.Identifier, _
iceberg.Properties) error {
- return errors.New("hadoop catalog: CreateNamespace not yet implemented")
+func (c *Catalog) DropNamespace(_ context.Context, ns table.Identifier) error {
+ if len(ns) == 0 {
+ return errors.New("hadoop catalog: namespace identifier must
not be empty")
+ }
+
+ if err := validateIdentifier(ns); err != nil {
+ return err
+ }
+
+ path := c.namespaceToPath(ns)
+
+ if _, err := os.Stat(path); os.IsNotExist(err) {
+ return fmt.Errorf("%w: %s", catalog.ErrNoSuchNamespace,
strings.Join(ns, "."))
+ }
+
+ entries, err := os.ReadDir(path)
+ if err != nil {
+ return fmt.Errorf("hadoop catalog: failed to read namespace
directory: %w", err)
+ }
+
+ if len(entries) > 0 {
+ return fmt.Errorf("%w: %s", catalog.ErrNamespaceNotEmpty,
strings.Join(ns, "."))
+ }
+
+ return os.Remove(path)
}
-func (c *Catalog) DropNamespace(_ context.Context, _ table.Identifier) error {
- return errors.New("hadoop catalog: DropNamespace not yet implemented")
+func (c *Catalog) CheckNamespaceExists(_ context.Context, ns table.Identifier)
(bool, error) {
+ if err := validateIdentifier(ns); err != nil {
+ return false, err
+ }
+
+ path := c.namespaceToPath(ns)
+
+ info, err := os.Stat(path)
+ if os.IsNotExist(err) {
+ return false, nil
+ }
+
+ if err != nil {
+ return false, err
+ }
+
+ return info.IsDir(), nil
}
-func (c *Catalog) CheckNamespaceExists(_ context.Context, _ table.Identifier)
(bool, error) {
- return false, errors.New("hadoop catalog: CheckNamespaceExists not yet
implemented")
+func (c *Catalog) ListNamespaces(_ context.Context, parent table.Identifier)
([]table.Identifier, error) {
+ if len(parent) > 0 {
+ if err := validateIdentifier(parent); err != nil {
+ return nil, err
+ }
+ }
+
+ var path string
+
+ if len(parent) == 0 {
+ path = c.warehouse
+ } else {
+ path = c.namespaceToPath(parent)
+
+ info, err := os.Stat(path)
+ if os.IsNotExist(err) || (err == nil && !info.IsDir()) {
+ return nil, fmt.Errorf("%w: %s",
catalog.ErrNoSuchNamespace, strings.Join(parent, "."))
+ }
+ }
+
+ entries, err := os.ReadDir(path)
+ if err != nil {
+ return nil, fmt.Errorf("hadoop catalog: failed to read
directory: %w", err)
+ }
+
+ result := []table.Identifier{}
+ for _, e := range entries {
+ if !e.IsDir() {
+ continue
+ }
+
+ child := filepath.Join(path, e.Name())
+ if isTableDir(child) {
+ continue
+ }
+
+ result = append(result, table.Identifier{e.Name()})
+ }
+
+ return result, nil
}
-func (c *Catalog) LoadNamespaceProperties(_ context.Context, _
table.Identifier) (iceberg.Properties, error) {
- return nil, errors.New("hadoop catalog: LoadNamespaceProperties not yet
implemented")
+func (c *Catalog) LoadNamespaceProperties(_ context.Context, ns
table.Identifier) (iceberg.Properties, error) {
+ if err := validateIdentifier(ns); err != nil {
+ return nil, err
+ }
+
+ path := c.namespaceToPath(ns)
+
+ info, err := os.Stat(path)
+ if os.IsNotExist(err) || (err == nil && !info.IsDir()) {
+ return nil, fmt.Errorf("%w: %s", catalog.ErrNoSuchNamespace,
strings.Join(ns, "."))
+ }
+
+ if err != nil {
+ return nil, fmt.Errorf("hadoop catalog: failed to stat
namespace: %w", err)
+ }
+
+ return iceberg.Properties{"location": "file://" + path}, nil
Review Comment:
`"file://" + path` is pretty fragile: spaces / `#` / `?`, relative paths,
and Windows paths can all produce bad URIs. Either build it through
`url.URL{Scheme: "file", Path: ...}` after normalizing the warehouse to an
absolute path in `NewCatalog`, or drop the synthetic `location` entirely and
return empty properties. If we keep the synthetic location, I'd add a doc
comment saying this is a Go-only convenience and that user namespace properties
are not persisted.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]