laskoviymishka commented on code in PR #963:
URL: https://github.com/apache/iceberg-go/pull/963#discussion_r3188341606


##########
catalog/hadoop/hadoop.go:
##########
@@ -52,6 +52,31 @@ var _ catalog.Catalog = (*Catalog)(nil)
 // v1.metadata.json, v42.metadata.json, v1.gz.metadata.json, etc.
 var versionPattern = regexp.MustCompile(`^v([0-9]+)(?:\.gz)?\.metadata\.json$`)
 
+// uuidMetadataPattern matches UUID-style metadata filenames produced by
+// Java/PyIceberg catalogs: 00000-<uuid>.metadata.json, etc.
+var uuidMetadataPattern = 
regexp.MustCompile(`^[0-9]+-[0-9a-fA-F-]+\.metadata\.json$`)

Review Comment:
   `uuidMetadataPattern` should probably accept the gzip variant too. Java can 
write `00000-<uuid>.gz.metadata.json`, and today those table dirs may be 
mistaken for namespaces. We should mirror the existing `versionPattern` gzip 
handling and tighten the UUID shape (5-digit sequence + 8-4-4-4-12 hex) while 
we're here.



##########
catalog/hadoop/hadoop.go:
##########
@@ -52,6 +52,31 @@ var _ catalog.Catalog = (*Catalog)(nil)
 // v1.metadata.json, v42.metadata.json, v1.gz.metadata.json, etc.
 var versionPattern = regexp.MustCompile(`^v([0-9]+)(?:\.gz)?\.metadata\.json$`)
 
+// uuidMetadataPattern matches UUID-style metadata filenames produced by
+// Java/PyIceberg catalogs: 00000-<uuid>.metadata.json, etc.
+var uuidMetadataPattern = 
regexp.MustCompile(`^[0-9]+-[0-9a-fA-F-]+\.metadata\.json$`)
+
+// validateIdentifier checks that each component of an identifier is safe for
+// use as a path segment. It rejects empty parts, path traversal sequences,
+// and components containing path separators.
+func validateIdentifier(ident table.Identifier) error {
+       for _, part := range ident {
+               if part == "" {
+                       return errors.New("hadoop catalog: identifier component 
must not be empty")
+               }
+
+               if part == "." || part == ".." {
+                       return fmt.Errorf("hadoop catalog: invalid identifier 
component %q", part)
+               }
+
+               if strings.ContainsAny(part, "/\\") {
+                       return fmt.Errorf("hadoop catalog: identifier component 
must not contain path separators: %q", part)
+               }
+       }
+
+       return nil
+}

Review Comment:
   Empty namespace handling is still inconsistent. `CreateNamespace` / 
`DropNamespace` reject empty identifiers, but `CheckNamespaceExists(ctx, nil)` 
returns `true` and `LoadNamespaceProperties(ctx, nil)` returns warehouse 
properties — because `validateIdentifier` is a no-op on an empty slice. I'd 
move the empty check in here (wrapped with `catalog.ErrNoSuchNamespace`, 
matching the SQL/REST catalogs) and leave `ListNamespaces` as the only method 
where empty means "list root".
   
   Also worth a short doc note that this is POSIX-best-effort — it doesn't 
catch NUL bytes or Windows reserved names like `NUL`/`CON`/`COM1`.



##########
catalog/hadoop/hadoop_test.go:
##########
@@ -386,3 +386,318 @@ func (s *HadoopCatalogTestSuite) 
TestFindVersionMixedGzipAndPlain() {
        s.Require().NoError(err)
        s.Equal(3, ver)
 }
+
+// CreateNamespace tests
+
+func (s *HadoopCatalogTestSuite) TestCreateNamespace() {
+       err := s.cat.CreateNamespace(context.Background(), []string{"ns"}, nil)
+       s.Require().NoError(err)
+
+       info, err := os.Stat(filepath.Join(s.warehouse, "ns"))
+       s.Require().NoError(err)
+       s.True(info.IsDir())
+}
+
+func (s *HadoopCatalogTestSuite) TestCreateNamespaceAlreadyExists() {
+       s.Require().NoError(os.Mkdir(filepath.Join(s.warehouse, "ns"), 0o755))
+
+       err := s.cat.CreateNamespace(context.Background(), []string{"ns"}, nil)
+       s.ErrorIs(err, catalog.ErrNamespaceAlreadyExists)
+}
+
+func (s *HadoopCatalogTestSuite) TestCreateNamespaceNested() {
+       // Parent namespaces must exist first with atomic os.Mkdir.
+       s.Require().NoError(os.Mkdir(filepath.Join(s.warehouse, "a"), 0o755))
+       s.Require().NoError(os.Mkdir(filepath.Join(s.warehouse, "a", "b"), 
0o755))
+
+       err := s.cat.CreateNamespace(context.Background(), []string{"a", "b", 
"c"}, nil)
+       s.Require().NoError(err)
+
+       info, err := os.Stat(filepath.Join(s.warehouse, "a", "b", "c"))
+       s.Require().NoError(err)
+       s.True(info.IsDir())
+}
+
+func (s *HadoopCatalogTestSuite) TestCreateNamespaceNestedParentMissing() {
+       // Creating a nested namespace without parent should fail.
+       err := s.cat.CreateNamespace(context.Background(), []string{"a", "b", 
"c"}, nil)
+       s.Require().Error(err)
+       s.Contains(err.Error(), "failed to create namespace")

Review Comment:
   Several tests assert error-message substrings (this one, plus 
431/569/596/602/640/646/652/658/664/670/676/682). It would be better to wrap 
with sentinels where possible and use `ErrorIs`, especially for the 
empty/invalid namespace and unsupported-properties cases. This particular 
`"failed to create namespace"` substring wouldn't catch a real regression if 
the implementation switches behavior — `errors.Is(err, fs.ErrNotExist)` would 
be more meaningful.



##########
catalog/hadoop/hadoop.go:
##########
@@ -265,26 +300,139 @@ func (c *Catalog) CheckTableExists(_ context.Context, _ 
table.Identifier) (bool,
        return false, errors.New("hadoop catalog: CheckTableExists not yet 
implemented")
 }
 
-func (c *Catalog) ListNamespaces(_ context.Context, _ table.Identifier) 
([]table.Identifier, error) {
-       return nil, errors.New("hadoop catalog: ListNamespaces not yet 
implemented")
+func (c *Catalog) CreateNamespace(_ context.Context, ns table.Identifier, 
props iceberg.Properties) error {
+       if len(ns) == 0 {
+               return errors.New("hadoop catalog: namespace identifier must 
not be empty")
+       }
+
+       if err := validateIdentifier(ns); err != nil {
+               return err
+       }
+
+       if len(props) > 0 {
+               return errors.New("hadoop catalog: namespace properties are not 
supported")
+       }
+
+       path := c.namespaceToPath(ns)
+
+       if err := os.Mkdir(path, 0o755); err != nil {
+               if os.IsExist(err) {
+                       return fmt.Errorf("%w: %s", 
catalog.ErrNamespaceAlreadyExists, strings.Join(ns, "."))
+               }
+
+               return fmt.Errorf("hadoop catalog: failed to create namespace: 
%w", err)

Review Comment:
   Nested namespace creation with a missing parent currently leaks raw `ENOENT` 
through this wrapper. I'd either intentionally match Java with `MkdirAll`, or 
detect the missing parent and return `ErrNoSuchNamespace`. Either is fine, but 
it should be explicit.



##########
catalog/hadoop/hadoop.go:
##########
@@ -265,26 +300,139 @@ func (c *Catalog) CheckTableExists(_ context.Context, _ 
table.Identifier) (bool,
        return false, errors.New("hadoop catalog: CheckTableExists not yet 
implemented")
 }
 
-func (c *Catalog) ListNamespaces(_ context.Context, _ table.Identifier) 
([]table.Identifier, error) {
-       return nil, errors.New("hadoop catalog: ListNamespaces not yet 
implemented")
+func (c *Catalog) CreateNamespace(_ context.Context, ns table.Identifier, 
props iceberg.Properties) error {
+       if len(ns) == 0 {
+               return errors.New("hadoop catalog: namespace identifier must 
not be empty")
+       }
+
+       if err := validateIdentifier(ns); err != nil {
+               return err
+       }
+
+       if len(props) > 0 {
+               return errors.New("hadoop catalog: namespace properties are not 
supported")
+       }
+
+       path := c.namespaceToPath(ns)
+
+       if err := os.Mkdir(path, 0o755); err != nil {
+               if os.IsExist(err) {
+                       return fmt.Errorf("%w: %s", 
catalog.ErrNamespaceAlreadyExists, strings.Join(ns, "."))
+               }
+
+               return fmt.Errorf("hadoop catalog: failed to create namespace: 
%w", err)
+       }
+
+       return nil
 }
 
-func (c *Catalog) CreateNamespace(_ context.Context, _ table.Identifier, _ 
iceberg.Properties) error {
-       return errors.New("hadoop catalog: CreateNamespace not yet implemented")
+func (c *Catalog) DropNamespace(_ context.Context, ns table.Identifier) error {
+       if len(ns) == 0 {
+               return errors.New("hadoop catalog: namespace identifier must 
not be empty")
+       }
+
+       if err := validateIdentifier(ns); err != nil {
+               return err
+       }
+
+       path := c.namespaceToPath(ns)
+
+       if _, err := os.Stat(path); os.IsNotExist(err) {

Review Comment:
   Small cleanup while here: drop the standalone `Stat` and rely on `ReadDir` 
returning `fs.ErrNotExist` — closes the small TOCTTOU and saves a syscall. Same 
pass: switch `os.IsExist` / `os.IsNotExist` (here, line 319, 364, 390, 425) to 
`errors.Is(err, fs.ErrExist)` / `errors.Is(err, fs.ErrNotExist)` for 
consistency with the rest of the catalog package.



##########
catalog/hadoop/hadoop.go:
##########
@@ -265,26 +300,139 @@ func (c *Catalog) CheckTableExists(_ context.Context, _ 
table.Identifier) (bool,
        return false, errors.New("hadoop catalog: CheckTableExists not yet 
implemented")
 }
 
-func (c *Catalog) ListNamespaces(_ context.Context, _ table.Identifier) 
([]table.Identifier, error) {
-       return nil, errors.New("hadoop catalog: ListNamespaces not yet 
implemented")
+func (c *Catalog) CreateNamespace(_ context.Context, ns table.Identifier, 
props iceberg.Properties) error {
+       if len(ns) == 0 {
+               return errors.New("hadoop catalog: namespace identifier must 
not be empty")
+       }
+
+       if err := validateIdentifier(ns); err != nil {
+               return err
+       }
+
+       if len(props) > 0 {
+               return errors.New("hadoop catalog: namespace properties are not 
supported")
+       }
+
+       path := c.namespaceToPath(ns)
+
+       if err := os.Mkdir(path, 0o755); err != nil {
+               if os.IsExist(err) {
+                       return fmt.Errorf("%w: %s", 
catalog.ErrNamespaceAlreadyExists, strings.Join(ns, "."))
+               }
+
+               return fmt.Errorf("hadoop catalog: failed to create namespace: 
%w", err)
+       }
+
+       return nil
 }
 
-func (c *Catalog) CreateNamespace(_ context.Context, _ table.Identifier, _ 
iceberg.Properties) error {
-       return errors.New("hadoop catalog: CreateNamespace not yet implemented")
+func (c *Catalog) DropNamespace(_ context.Context, ns table.Identifier) error {
+       if len(ns) == 0 {
+               return errors.New("hadoop catalog: namespace identifier must 
not be empty")
+       }
+
+       if err := validateIdentifier(ns); err != nil {
+               return err
+       }
+
+       path := c.namespaceToPath(ns)
+
+       if _, err := os.Stat(path); os.IsNotExist(err) {
+               return fmt.Errorf("%w: %s", catalog.ErrNoSuchNamespace, 
strings.Join(ns, "."))
+       }
+
+       entries, err := os.ReadDir(path)
+       if err != nil {
+               return fmt.Errorf("hadoop catalog: failed to read namespace 
directory: %w", err)
+       }
+
+       if len(entries) > 0 {
+               return fmt.Errorf("%w: %s", catalog.ErrNamespaceNotEmpty, 
strings.Join(ns, "."))
+       }
+
+       return os.Remove(path)
 }
 
-func (c *Catalog) DropNamespace(_ context.Context, _ table.Identifier) error {
-       return errors.New("hadoop catalog: DropNamespace not yet implemented")
+func (c *Catalog) CheckNamespaceExists(_ context.Context, ns table.Identifier) 
(bool, error) {
+       if err := validateIdentifier(ns); err != nil {
+               return false, err
+       }
+
+       path := c.namespaceToPath(ns)
+
+       info, err := os.Stat(path)
+       if os.IsNotExist(err) {
+               return false, nil
+       }
+
+       if err != nil {
+               return false, err
+       }
+
+       return info.IsDir(), nil
 }
 
-func (c *Catalog) CheckNamespaceExists(_ context.Context, _ table.Identifier) 
(bool, error) {
-       return false, errors.New("hadoop catalog: CheckNamespaceExists not yet 
implemented")
+func (c *Catalog) ListNamespaces(_ context.Context, parent table.Identifier) 
([]table.Identifier, error) {
+       if len(parent) > 0 {
+               if err := validateIdentifier(parent); err != nil {
+                       return nil, err
+               }
+       }
+
+       var path string
+
+       if len(parent) == 0 {
+               path = c.warehouse
+       } else {
+               path = c.namespaceToPath(parent)
+
+               info, err := os.Stat(path)
+               if os.IsNotExist(err) || (err == nil && !info.IsDir()) {
+                       return nil, fmt.Errorf("%w: %s", 
catalog.ErrNoSuchNamespace, strings.Join(parent, "."))
+               }
+       }
+
+       entries, err := os.ReadDir(path)
+       if err != nil {
+               return nil, fmt.Errorf("hadoop catalog: failed to read 
directory: %w", err)
+       }
+
+       result := []table.Identifier{}
+       for _, e := range entries {
+               if !e.IsDir() {
+                       continue
+               }
+
+               child := filepath.Join(path, e.Name())
+               if isTableDir(child) {
+                       continue
+               }
+
+               result = append(result, table.Identifier{e.Name()})
+       }
+
+       return result, nil
 }
 
-func (c *Catalog) LoadNamespaceProperties(_ context.Context, _ 
table.Identifier) (iceberg.Properties, error) {
-       return nil, errors.New("hadoop catalog: LoadNamespaceProperties not yet 
implemented")
+func (c *Catalog) LoadNamespaceProperties(_ context.Context, ns 
table.Identifier) (iceberg.Properties, error) {
+       if err := validateIdentifier(ns); err != nil {
+               return nil, err
+       }
+
+       path := c.namespaceToPath(ns)
+
+       info, err := os.Stat(path)
+       if os.IsNotExist(err) || (err == nil && !info.IsDir()) {
+               return nil, fmt.Errorf("%w: %s", catalog.ErrNoSuchNamespace, 
strings.Join(ns, "."))
+       }
+
+       if err != nil {
+               return nil, fmt.Errorf("hadoop catalog: failed to stat 
namespace: %w", err)
+       }
+
+       return iceberg.Properties{"location": "file://" + path}, nil

Review Comment:
   `"file://" + path` is pretty fragile: spaces / `#` / `?`, relative paths, 
and Windows paths can all produce bad URIs. Either build it through 
`url.URL{Scheme: "file", Path: ...}` after normalizing the warehouse to an 
absolute path in `NewCatalog`, or drop the synthetic `location` entirely and 
return empty properties. If we keep the synthetic location, I'd add a doc 
comment saying this is a Go-only convenience and that user namespace properties 
are not persisted.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to