Re: [I] About /v1/config REST API endpoint [iceberg]
nastra commented on issue #9880: URL: https://github.com/apache/iceberg/issues/9880#issuecomment-1985238077 The `warehouse` query parameter for that endpoint refers to https://github.com/apache/iceberg/blob/5b84f34a5386fc61b17bfe7dc7c1cbe565550958/core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java#L896-L906, indicating that the client can send the used warehouse location to the server to keep the location in-sync. The response in the config endpoint then has defaults and overrides, which can control how the catalog is configured. I don't think any other catalog properties make sense to pass as a query param, because `warehouse` might be the only one that can be configured on the client-side and might be used on the server-side. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Feature make oauth `audience` configurable [iceberg]
nastra commented on code in PR #9839: URL: https://github.com/apache/iceberg/pull/9839#discussion_r1517380056 ## core/src/main/java/org/apache/iceberg/rest/auth/OAuth2Util.java: ## @@ -394,20 +430,24 @@ public static class AuthSession { private volatile boolean keepRefreshed = true; private final String oauth2ServerUri; +private volatile Map optionalOAuthParams = ImmutableMap.of(); Review Comment: why does this have to be `volatile`? The others are `volatile` because they are being updated when the session token is refreshed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Feature make oauth `audience` configurable [iceberg]
nastra commented on code in PR #9839: URL: https://github.com/apache/iceberg/pull/9839#discussion_r1517381648 ## core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java: ## @@ -1444,7 +1484,7 @@ public void testCatalogTokenRefresh(String oauth2ServerUri) { eq(HTTPMethod.POST), eq(oauth2ServerUri), any(), - Mockito.argThat(firstRefreshRequest::equals), Review Comment: unrelated change -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Feature make oauth `audience` configurable [iceberg]
nastra commented on code in PR #9839: URL: https://github.com/apache/iceberg/pull/9839#discussion_r1517389925 ## core/src/main/java/org/apache/iceberg/rest/auth/OAuth2Util.java: ## @@ -471,7 +515,13 @@ static void setTokenRefreshNumRetries(int retries) { */ public static AuthSession empty() { return new AuthSession( - ImmutableMap.of(), null, null, null, OAuth2Properties.CATALOG_SCOPE, null); + ImmutableMap.of(), + null, + null, + null, + OAuth2Properties.CATALOG_SCOPE, + null, + ImmutableMap.of()); Review Comment: it has become quite painful to add new functionality/parameters to `AuthSession`, since it requires adding a single param at a bunch of places. The most recent change was done by #8976. A while ago I experimented with an [`AuthConfig`](https://github.com/apache/iceberg/commit/3bed59907f4230d10a97742c58ab4b85ce7b1ebf#diff-d783470ff004c360395125d467f1aadd776c8c9e925ba52639a405a03a72b870R647), which would carry all parameters required for an `AuthSession`, so that it would be easier to add new configuration parameters to it. So I wonder if we should refactor `AuthSession` and its configuration so that it's less painful to add new stuff to it? /cc @danielcweeks @amogh-jahagirdar @rdblue -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core: Add test for renaming table to a non-existing namespace [iceberg]
nastra commented on PR #9895: URL: https://github.com/apache/iceberg/pull/9895#issuecomment-1985258825 thanks everyone for reviewing -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core: Add test for renaming table to a non-existing namespace [iceberg]
nastra merged PR #9895: URL: https://github.com/apache/iceberg/pull/9895 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Feature make oauth `audience` configurable [iceberg]
nastra commented on code in PR #9839: URL: https://github.com/apache/iceberg/pull/9839#discussion_r1517380056 ## core/src/main/java/org/apache/iceberg/rest/auth/OAuth2Util.java: ## @@ -394,20 +430,24 @@ public static class AuthSession { private volatile boolean keepRefreshed = true; private final String oauth2ServerUri; +private volatile Map optionalOAuthParams = ImmutableMap.of(); Review Comment: why does this have to be `volatile`? The others are `volatile` because they are being updated when the session token is refreshed but I don't think this appleis for that map -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Feature make oauth `audience` configurable [iceberg]
himadripal commented on code in PR #9839: URL: https://github.com/apache/iceberg/pull/9839#discussion_r1517402474 ## core/src/main/java/org/apache/iceberg/rest/auth/OAuth2Util.java: ## @@ -471,7 +515,13 @@ static void setTokenRefreshNumRetries(int retries) { */ public static AuthSession empty() { return new AuthSession( - ImmutableMap.of(), null, null, null, OAuth2Properties.CATALOG_SCOPE, null); + ImmutableMap.of(), + null, + null, + null, + OAuth2Properties.CATALOG_SCOPE, + null, + ImmutableMap.of()); Review Comment: +1 to the painful to add parameters. With this PR, I have tried to add all optional oauth parameters (including scope) in a map but having a class like `[AuthConfig](https://github.com/apache/iceberg/commit/3bed59907f4230d10a97742c58ab4b85ce7b1ebf#diff-d783470ff004c360395125d467f1aadd776c8c9e925ba52639a405a03a72b870R647),` which holds all params with validation logic and passing it around would be best. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Feature make oauth `audience` configurable [iceberg]
himadripal commented on code in PR #9839: URL: https://github.com/apache/iceberg/pull/9839#discussion_r1517402474 ## core/src/main/java/org/apache/iceberg/rest/auth/OAuth2Util.java: ## @@ -471,7 +515,13 @@ static void setTokenRefreshNumRetries(int retries) { */ public static AuthSession empty() { return new AuthSession( - ImmutableMap.of(), null, null, null, OAuth2Properties.CATALOG_SCOPE, null); + ImmutableMap.of(), + null, + null, + null, + OAuth2Properties.CATALOG_SCOPE, + null, + ImmutableMap.of()); Review Comment: +1 to the painful to add parameters. With this PR, I have tried to add all optional oauth parameters (including scope) in a map but having a class like `[AuthConfig](https://github.com/apache/iceberg/commit/3bed59907f4230d10a97742c58ab4b85ce7b1ebf#diff-d783470ff004c360395125d467f1aadd776c8c9e925ba52639a405a03a72b870R647)` which holds all params with validation logic and passing it around would be best. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Feature make oauth `audience` configurable [iceberg]
himadripal commented on code in PR #9839: URL: https://github.com/apache/iceberg/pull/9839#discussion_r1517402474 ## core/src/main/java/org/apache/iceberg/rest/auth/OAuth2Util.java: ## @@ -471,7 +515,13 @@ static void setTokenRefreshNumRetries(int retries) { */ public static AuthSession empty() { return new AuthSession( - ImmutableMap.of(), null, null, null, OAuth2Properties.CATALOG_SCOPE, null); + ImmutableMap.of(), + null, + null, + null, + OAuth2Properties.CATALOG_SCOPE, + null, + ImmutableMap.of()); Review Comment: +1 to the painful to add parameters. With this PR, I have tried to add all optional oauth parameters (including scope) in a map but having a class like `AuthConfig` which holds all params with validation logic and passing it around would be best. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Feature make oauth `audience` configurable [iceberg]
nastra commented on PR #9839: URL: https://github.com/apache/iceberg/pull/9839#issuecomment-1985279889 overall I think adding these optional OAuth2 params make sense to me. I'll also link to the [OAuth2 spec](https://datatracker.ietf.org/doc/html/rfc8693#name-token-exchange-request-and-) for other reviewers to better understand what those optional parameters do. I think we should decide how we want to deal with `AuthSession` configuration paramters as it causes lots of changes to add a single parameter to `AuthSession`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Feature make oauth `audience` configurable [iceberg]
himadripal commented on code in PR #9839: URL: https://github.com/apache/iceberg/pull/9839#discussion_r1517414956 ## core/src/main/java/org/apache/iceberg/rest/auth/OAuth2Util.java: ## @@ -257,8 +292,9 @@ private static Map tokenExchangeRequest( formData.put(ACTOR_TOKEN, actorToken); formData.put(ACTOR_TOKEN_TYPE, actorTokenType); } +formData.putAll(optionalParams); -return formData.build(); +return formData.buildKeepingLast(); Review Comment: please note, changed it to `buildKeepingLast()` , I noticed everywhere else `build` is used which throws exception if it encounters duplicate keys. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] [feature request] easier API to set table properties [iceberg-python]
HonahX commented on issue #502: URL: https://github.com/apache/iceberg-python/issues/502#issuecomment-1985304133 Thanks @syun64 @kevinjqliu for the thoughts and explanation. Since users seem to benefit from dictionaries and kwargs, I think it is good idea to support both (as in #503 ) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Set table properties with dictionary [iceberg-python]
HonahX merged PR #503: URL: https://github.com/apache/iceberg-python/pull/503 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] [feature request] easier API to set table properties [iceberg-python]
HonahX closed issue #502: [feature request] easier API to set table properties URL: https://github.com/apache/iceberg-python/issues/502 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Imports decouple [iceberg-python]
HonahX merged PR #505: URL: https://github.com/apache/iceberg-python/pull/505 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] About /v1/config REST API endpoint [iceberg]
ajantha-bhat commented on issue #9880: URL: https://github.com/apache/iceberg/issues/9880#issuecomment-1985360044 Thanks. But may be we should use a separate PATCH/PUT instead of sending in a GET request as we want to update the resource here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Free disk space before running action in Spark CI [iceberg]
nastra commented on PR #9786: URL: https://github.com/apache/iceberg/pull/9786#issuecomment-1985450508 Seems to be still an issue in the latest runs: https://github.com/apache/iceberg/actions/runs/8200693411/job/22427953001 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Migrate Snapshot files in Core to JUnit5 [iceberg]
tomtongue commented on PR #9892: URL: https://github.com/apache/iceberg/pull/9892#issuecomment-1985474689 @nastra Update the snapshot files. Could you review this PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core: Handle listing tables/views in empty namespaces [iceberg]
nastra commented on code in PR #9890: URL: https://github.com/apache/iceberg/pull/9890#discussion_r1517567287 ## core/src/test/java/org/apache/iceberg/catalog/CatalogTests.java: ## @@ -168,6 +168,14 @@ protected boolean supportsNamesWithDot() { return true; } + protected boolean supportsCreatingEmptyNamespace() { Review Comment: currently, the only reason that two flags exist is because Nessie's behavior is different from JDBC + InMemory. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core: Handle listing tables/views in empty namespaces [iceberg]
nastra commented on code in PR #9890: URL: https://github.com/apache/iceberg/pull/9890#discussion_r1517571160 ## core/src/test/java/org/apache/iceberg/catalog/CatalogTests.java: ## @@ -987,6 +995,61 @@ public void testListTables() { assertEmpty("Should not contain ns_2.table_1 after drop", catalog, ns2); } + @Test + public void listNamespacesWithEmptyNamespace() { Review Comment: this behavior applies for all catalogs -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core: Handle listing tables/views in empty namespaces [iceberg]
nastra commented on code in PR #9890: URL: https://github.com/apache/iceberg/pull/9890#discussion_r1517613054 ## core/src/main/java/org/apache/iceberg/inmemory/InMemoryCatalog.java: ## @@ -395,7 +395,9 @@ public void doCommit(TableMetadata base, TableMetadata metadata) { String oldLocation = base == null ? null : base.metadataFileLocation(); synchronized (InMemoryCatalog.this) { -if (null == base && !namespaceExists(tableIdentifier.namespace())) { +if (null == base +&& !namespaceExists(tableIdentifier.namespace()) +&& !tableIdentifier.namespace().isEmpty()) { Review Comment: I've reverted this change as it should actually require that the empty namespace has been created first -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Migrate Snapshot files in Core to JUnit5 [iceberg]
nastra commented on code in PR #9892: URL: https://github.com/apache/iceberg/pull/9892#discussion_r1517614107 ## core/src/test/java/org/apache/iceberg/LocalTableOperations.java: ## @@ -63,7 +65,9 @@ public String metadataFileLocation(String fileName) { fileName, name -> { try { -return temp.newFile(name).getAbsolutePath(); +return java.nio.file.Files.createTempDirectory(temp, "junit") Review Comment: shouldn't this create a temp file with the given name rather than a temp directory? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Migrate Snapshot files in Core to JUnit5 [iceberg]
nastra commented on code in PR #9892: URL: https://github.com/apache/iceberg/pull/9892#discussion_r1517614411 ## core/src/test/java/org/apache/iceberg/TestSnapshotJson.java: ## @@ -19,23 +19,24 @@ package org.apache.iceberg; import static org.apache.iceberg.Files.localInput; +import static org.assertj.core.api.Assertions.assertThat; import java.io.File; import java.io.IOException; +import java.nio.file.Path; import java.util.List; -import java.util.UUID; import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList; import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; -import org.assertj.core.api.Assertions; -import org.junit.Assert; -import org.junit.Rule; -import org.junit.Test; -import org.junit.rules.TemporaryFolder; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.api.io.TempDir; public class TestSnapshotJson { - @Rule public TemporaryFolder temp = new TemporaryFolder(); + @TempDir private static Path temp; Review Comment: ```suggestion @TempDir private Path temp; ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Migrate Snapshot files in Core to JUnit5 [iceberg]
nastra commented on code in PR #9892: URL: https://github.com/apache/iceberg/pull/9892#discussion_r1517614955 ## core/src/test/java/org/apache/iceberg/TestSnapshotJson.java: ## @@ -19,23 +19,24 @@ package org.apache.iceberg; import static org.apache.iceberg.Files.localInput; +import static org.assertj.core.api.Assertions.assertThat; import java.io.File; import java.io.IOException; +import java.nio.file.Path; import java.util.List; -import java.util.UUID; import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList; import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; -import org.assertj.core.api.Assertions; -import org.junit.Assert; -import org.junit.Rule; -import org.junit.Test; -import org.junit.rules.TemporaryFolder; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.api.io.TempDir; public class TestSnapshotJson { - @Rule public TemporaryFolder temp = new TemporaryFolder(); + @TempDir private static Path temp; - public TableOperations ops = new LocalTableOperations(temp); + public TableOperations ops = + new LocalTableOperations(java.nio.file.Files.createTempDirectory(temp, "junit")); Review Comment: why does this create a new temp directory? I think it should just pass `temp` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Migrate Snapshot files in Core to JUnit5 [iceberg]
nastra commented on code in PR #9892: URL: https://github.com/apache/iceberg/pull/9892#discussion_r1517615186 ## core/src/test/java/org/apache/iceberg/TestSnapshotJson.java: ## @@ -19,23 +19,24 @@ package org.apache.iceberg; import static org.apache.iceberg.Files.localInput; +import static org.assertj.core.api.Assertions.assertThat; import java.io.File; import java.io.IOException; +import java.nio.file.Path; import java.util.List; -import java.util.UUID; import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList; import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; -import org.assertj.core.api.Assertions; -import org.junit.Assert; -import org.junit.Rule; -import org.junit.Test; -import org.junit.rules.TemporaryFolder; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.api.io.TempDir; public class TestSnapshotJson { - @Rule public TemporaryFolder temp = new TemporaryFolder(); + @TempDir private static Path temp; - public TableOperations ops = new LocalTableOperations(temp); + public TableOperations ops = + new LocalTableOperations(java.nio.file.Files.createTempDirectory(temp, "junit")); + + public TestSnapshotJson() throws IOException {} Review Comment: why is the constructor here? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Migrate Snapshot files in Core to JUnit5 [iceberg]
nastra commented on code in PR #9892: URL: https://github.com/apache/iceberg/pull/9892#discussion_r1517616912 ## core/src/test/java/org/apache/iceberg/TestSnapshotJson.java: ## @@ -149,29 +182,34 @@ public void testJsonConversionWithV1Manifests() { timestampMillis); String json = SnapshotParser.toJson(expected, true); -Assertions.assertThat(json).isEqualTo(expectedJson); +assertThat(json).isEqualTo(expectedJson); Snapshot snapshot = SnapshotParser.fromJson(json); -Assertions.assertThat(snapshot).isEqualTo(expected); - -Assert.assertEquals("Sequence number should default to 0 for v1", 0, snapshot.sequenceNumber()); -Assert.assertEquals("Snapshot ID should match", expected.snapshotId(), snapshot.snapshotId()); -Assert.assertEquals( -"Timestamp should match", expected.timestampMillis(), snapshot.timestampMillis()); -Assert.assertEquals("Parent ID should match", expected.parentId(), snapshot.parentId()); -Assert.assertEquals( -"Manifest list should match", -expected.manifestListLocation(), -snapshot.manifestListLocation()); -Assert.assertEquals( -"Files should match", expected.allManifests(ops.io()), snapshot.allManifests(ops.io())); -Assert.assertEquals("Operation should match", expected.operation(), snapshot.operation()); -Assert.assertEquals("Summary should match", expected.summary(), snapshot.summary()); -Assert.assertEquals("Schema ID should match", expected.schemaId(), snapshot.schemaId()); +assertThat(snapshot).isEqualTo(expected); + +assertThat(snapshot.sequenceNumber()).isEqualTo(0); +assertThat(snapshot.snapshotId()).isEqualTo(expected.snapshotId()); + assertThat(snapshot.timestampMillis()).isEqualTo(expected.timestampMillis()); +assertThat(snapshot.parentId()).isEqualTo(expected.parentId()); + assertThat(snapshot.manifestListLocation()).isEqualTo(expected.manifestListLocation()); + +assertThat(snapshot.allManifests(ops.io())).hasSize(2); + +for (int i = 0; i < snapshot.allManifests(ops.io()).size(); i++) { + ManifestFile actualManifestFile = snapshot.allManifests(ops.io()).get(i); + ManifestFile expectedManifestFile = expected.allManifests(ops.io()).get(i); + + assertThat(actualManifestFile.content()).isEqualTo(expectedManifestFile.content()); + assertThat(actualManifestFile.path()).isEqualTo(expectedManifestFile.path()); +} + +assertThat(snapshot.operation()).isEqualTo(expected.operation()); +assertThat(snapshot.summary()).isEqualTo(expected.summary()); +assertThat(snapshot.schemaId()).isEqualTo(expected.schemaId()); } private String createManifestListWithManifestFiles(long snapshotId, Long parentSnapshotId) throws IOException { -File manifestList = temp.newFile("manifests" + UUID.randomUUID()); +File manifestList = java.nio.file.Files.createTempDirectory(temp, "junit").toFile(); Review Comment: this should create a new temp file instead of a temp directory -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Migrate Snapshot files in Core to JUnit5 [iceberg]
nastra commented on code in PR #9892: URL: https://github.com/apache/iceberg/pull/9892#discussion_r1517617834 ## core/src/test/java/org/apache/iceberg/TestSnapshotJson.java: ## @@ -49,12 +50,23 @@ public void testJsonConversion() throws IOException { String json = SnapshotParser.toJson(expected); Snapshot snapshot = SnapshotParser.fromJson(json); -Assert.assertEquals("Snapshot ID should match", expected.snapshotId(), snapshot.snapshotId()); -Assert.assertEquals( -"Files should match", expected.allManifests(ops.io()), snapshot.allManifests(ops.io())); -Assert.assertNull("Operation should be null", snapshot.operation()); -Assert.assertNull("Summary should be null", snapshot.summary()); -Assert.assertEquals("Schema ID should match", Integer.valueOf(1), snapshot.schemaId()); +assertThat(snapshot.snapshotId()).isEqualTo(expected.snapshotId()); +assertThat(snapshot.allManifests(ops.io())).hasSize(2); + +for (int i = 0; i < snapshot.allManifests(ops.io()).size(); i++) { Review Comment: are these new checks? can you elaborate why those are being added? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Migrate Snapshot files in Core to JUnit5 [iceberg]
nastra commented on code in PR #9892: URL: https://github.com/apache/iceberg/pull/9892#discussion_r1517620107 ## core/src/test/java/org/apache/iceberg/TestTableMetadata.java: ## @@ -87,7 +86,7 @@ public class TestTableMetadata { .desc(Expressions.bucket("z", 4), NullOrder.NULLS_LAST) .build(); - @Rule public TemporaryFolder temp = new TemporaryFolder(); + @TempDir protected static Path temp; Review Comment: ```suggestion @TempDir private Path temp; ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Migrate Snapshot files in Core to JUnit5 [iceberg]
nastra commented on code in PR #9892: URL: https://github.com/apache/iceberg/pull/9892#discussion_r1517620401 ## core/src/test/java/org/apache/iceberg/TestTableMetadata.java: ## @@ -1731,7 +1730,7 @@ public void testNoTrailingLocationSlash() { private String createManifestListWithManifestFile( long snapshotId, Long parentSnapshotId, String manifestFile) throws IOException { -File manifestList = temp.newFile("manifests" + UUID.randomUUID()); +File manifestList = java.nio.file.Files.createTempDirectory(temp, "junit").toFile(); Review Comment: same as above, shouldn't this create a temp file with the given name rather than a temp folder? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Migrate Snapshot files in Core to JUnit5 [iceberg]
nastra commented on code in PR #9892: URL: https://github.com/apache/iceberg/pull/9892#discussion_r1517621786 ## core/src/test/java/org/apache/iceberg/TestSnapshotManager.java: ## @@ -177,120 +177,115 @@ public void testCherryPickFromBranch() { long lastSnapshotId = table.currentSnapshot().snapshotId(); // pick the snapshot into the current state -Assertions.assertThatThrownBy( -() -> table.manageSnapshots().cherrypick(replaceSnapshotId).commit()) +assertThatThrownBy(() -> table.manageSnapshots().cherrypick(replaceSnapshotId).commit()) .isInstanceOf(ValidationException.class) .hasMessageStartingWith( "Cannot cherry-pick overwrite not based on an ancestor of the current state"); -Assert.assertEquals( -"Failed cherry-pick should not change the table state", -lastSnapshotId, -table.currentSnapshot().snapshotId()); +assertThat(table.currentSnapshot().snapshotId()) +.as("Failed cherry-pick should not change the table state") +.isEqualTo(lastSnapshotId); validateTableFiles(table, FILE_A); } - @Test + @TestTemplate public void testCherryPickOverwrite() { table.newAppend().appendFile(FILE_A).commit(); // stage an overwrite to replace FILE_A table.newOverwrite().deleteFile(FILE_A).addFile(REPLACEMENT_FILE_A).stageOnly().commit(); Snapshot staged = Iterables.getLast(table.snapshots()); -Assert.assertEquals( -"Should find the staged overwrite snapshot", DataOperations.OVERWRITE, staged.operation()); +assertThat(staged.operation()) +.as("Should find the staged overwrite snapshot") +.isEqualTo(DataOperations.OVERWRITE); // add another append so that the original commit can't be fast-forwarded table.newAppend().appendFile(FILE_B).commit(); long lastSnapshotId = table.currentSnapshot().snapshotId(); // pick the snapshot into the current state -Assertions.assertThatThrownBy( -() -> table.manageSnapshots().cherrypick(staged.snapshotId()).commit()) +assertThatThrownBy(() -> table.manageSnapshots().cherrypick(staged.snapshotId()).commit()) .isInstanceOf(ValidationException.class) .hasMessageEndingWith("not append, dynamic overwrite, or fast-forward"); -Assert.assertEquals( -"Failed cherry-pick should not change the table state", -lastSnapshotId, -table.currentSnapshot().snapshotId()); +assertThat(table.currentSnapshot().snapshotId()) +.as("Failed cherry-pick should not change the table state") +.isEqualTo(lastSnapshotId); validateTableFiles(table, FILE_A, FILE_B); } - @Test + @TestTemplate public void testCreateBranch() { table.newAppend().appendFile(FILE_A).commit(); long snapshotId = table.currentSnapshot().snapshotId(); // Test a basic case of creating a branch table.manageSnapshots().createBranch("branch1", snapshotId).commit(); SnapshotRef expectedBranch = table.ops().refresh().ref("branch1"); -Assert.assertTrue( -expectedBranch != null -&& expectedBranch.equals(SnapshotRef.branchBuilder(snapshotId).build())); +assertThat(expectedBranch).isNotNull(); + assertThat(expectedBranch).isEqualTo(SnapshotRef.branchBuilder(snapshotId).build()); Review Comment: ```suggestion assertThat(expectedBranch).isNotNull().isEqualTo(SnapshotRef.branchBuilder(snapshotId).build()); ``` ## core/src/test/java/org/apache/iceberg/TestSnapshotManager.java: ## @@ -177,120 +177,115 @@ public void testCherryPickFromBranch() { long lastSnapshotId = table.currentSnapshot().snapshotId(); // pick the snapshot into the current state -Assertions.assertThatThrownBy( -() -> table.manageSnapshots().cherrypick(replaceSnapshotId).commit()) +assertThatThrownBy(() -> table.manageSnapshots().cherrypick(replaceSnapshotId).commit()) .isInstanceOf(ValidationException.class) .hasMessageStartingWith( "Cannot cherry-pick overwrite not based on an ancestor of the current state"); -Assert.assertEquals( -"Failed cherry-pick should not change the table state", -lastSnapshotId, -table.currentSnapshot().snapshotId()); +assertThat(table.currentSnapshot().snapshotId()) +.as("Failed cherry-pick should not change the table state") +.isEqualTo(lastSnapshotId); validateTableFiles(table, FILE_A); } - @Test + @TestTemplate public void testCherryPickOverwrite() { table.newAppend().appendFile(FILE_A).commit(); // stage an overwrite to replace FILE_A table.newOverwrite().deleteFile(FILE_A).addFile(REPLACEMENT_FILE_A).stageOnly().commit(); Snapshot staged = Iterables.getLast(table.snapshots()); -Assert.assertEquals( -"Should find the staged overwrite snapshot", DataOperations.OVERWRITE, staged.oper
Re: [PR] Migrate Snapshot files in Core to JUnit5 [iceberg]
nastra commented on code in PR #9892: URL: https://github.com/apache/iceberg/pull/9892#discussion_r1517622216 ## core/src/test/java/org/apache/iceberg/TestSnapshotManager.java: ## @@ -177,120 +177,115 @@ public void testCherryPickFromBranch() { long lastSnapshotId = table.currentSnapshot().snapshotId(); // pick the snapshot into the current state -Assertions.assertThatThrownBy( -() -> table.manageSnapshots().cherrypick(replaceSnapshotId).commit()) +assertThatThrownBy(() -> table.manageSnapshots().cherrypick(replaceSnapshotId).commit()) .isInstanceOf(ValidationException.class) .hasMessageStartingWith( "Cannot cherry-pick overwrite not based on an ancestor of the current state"); -Assert.assertEquals( -"Failed cherry-pick should not change the table state", -lastSnapshotId, -table.currentSnapshot().snapshotId()); +assertThat(table.currentSnapshot().snapshotId()) +.as("Failed cherry-pick should not change the table state") +.isEqualTo(lastSnapshotId); validateTableFiles(table, FILE_A); } - @Test + @TestTemplate public void testCherryPickOverwrite() { table.newAppend().appendFile(FILE_A).commit(); // stage an overwrite to replace FILE_A table.newOverwrite().deleteFile(FILE_A).addFile(REPLACEMENT_FILE_A).stageOnly().commit(); Snapshot staged = Iterables.getLast(table.snapshots()); -Assert.assertEquals( -"Should find the staged overwrite snapshot", DataOperations.OVERWRITE, staged.operation()); +assertThat(staged.operation()) +.as("Should find the staged overwrite snapshot") +.isEqualTo(DataOperations.OVERWRITE); // add another append so that the original commit can't be fast-forwarded table.newAppend().appendFile(FILE_B).commit(); long lastSnapshotId = table.currentSnapshot().snapshotId(); // pick the snapshot into the current state -Assertions.assertThatThrownBy( -() -> table.manageSnapshots().cherrypick(staged.snapshotId()).commit()) +assertThatThrownBy(() -> table.manageSnapshots().cherrypick(staged.snapshotId()).commit()) .isInstanceOf(ValidationException.class) .hasMessageEndingWith("not append, dynamic overwrite, or fast-forward"); -Assert.assertEquals( -"Failed cherry-pick should not change the table state", -lastSnapshotId, -table.currentSnapshot().snapshotId()); +assertThat(table.currentSnapshot().snapshotId()) +.as("Failed cherry-pick should not change the table state") +.isEqualTo(lastSnapshotId); validateTableFiles(table, FILE_A, FILE_B); } - @Test + @TestTemplate public void testCreateBranch() { table.newAppend().appendFile(FILE_A).commit(); long snapshotId = table.currentSnapshot().snapshotId(); // Test a basic case of creating a branch table.manageSnapshots().createBranch("branch1", snapshotId).commit(); SnapshotRef expectedBranch = table.ops().refresh().ref("branch1"); -Assert.assertTrue( -expectedBranch != null -&& expectedBranch.equals(SnapshotRef.branchBuilder(snapshotId).build())); +assertThat(expectedBranch).isNotNull(); + assertThat(expectedBranch).isEqualTo(SnapshotRef.branchBuilder(snapshotId).build()); } - @Test + @TestTemplate public void testCreateBranchWithoutSnapshotId() { table.newAppend().appendFile(FILE_A).commit(); long snapshotId = table.currentSnapshot().snapshotId(); // Test a basic case of creating a branch table.manageSnapshots().createBranch("branch1").commit(); SnapshotRef actualBranch = table.ops().refresh().ref("branch1"); -Assertions.assertThat(actualBranch).isNotNull(); - Assertions.assertThat(actualBranch).isEqualTo(SnapshotRef.branchBuilder(snapshotId).build()); +assertThat(actualBranch).isNotNull(); + assertThat(actualBranch).isEqualTo(SnapshotRef.branchBuilder(snapshotId).build()); Review Comment: both checks can be combined -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Migrate Snapshot files in Core to JUnit5 [iceberg]
nastra commented on code in PR #9892: URL: https://github.com/apache/iceberg/pull/9892#discussion_r1517622548 ## core/src/test/java/org/apache/iceberg/TestSnapshotManager.java: ## @@ -301,32 +296,31 @@ public void testCreateBranchFailsWhenRefAlreadyExists() { .hasMessage("Ref branch2 already exists"); } - @Test + @TestTemplate public void testCreateTag() { table.newAppend().appendFile(FILE_A).commit(); long snapshotId = table.currentSnapshot().snapshotId(); // Test a basic case of creating a tag table.manageSnapshots().createTag("tag1", snapshotId).commit(); SnapshotRef expectedTag = table.ops().refresh().ref("tag1"); -Assert.assertTrue( -expectedTag != null && expectedTag.equals(SnapshotRef.tagBuilder(snapshotId).build())); +assertThat(expectedTag).isNotNull(); + assertThat(expectedTag).isEqualTo(SnapshotRef.tagBuilder(snapshotId).build()); Review Comment: same as above -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Migrate Snapshot files in Core to JUnit5 [iceberg]
nastra commented on code in PR #9892: URL: https://github.com/apache/iceberg/pull/9892#discussion_r1517624708 ## core/src/test/java/org/apache/iceberg/TestSnapshotManager.java: ## @@ -700,81 +689,82 @@ public void testAttemptToRollbackToCurrentSnapshot() { table.manageSnapshots().rollbackTo(currentSnapshotId).commit(); } - @Test + @TestTemplate public void testSnapshotManagerThroughTransaction() { table.newAppend().appendFile(FILE_A).commit(); Snapshot snapshotAfterFirstAppend = readMetadata().currentSnapshot(); validateSnapshot(null, snapshotAfterFirstAppend, FILE_A); table.newAppend().appendFile(FILE_B).commit(); validateSnapshot(snapshotAfterFirstAppend, readMetadata().currentSnapshot(), FILE_B); -Assert.assertEquals("Table should be on version 2 after appending twice", 2, (int) version()); +assertThat(version()).as("Table should be on version 2 after appending twice").isEqualTo(2); TableMetadata base = readMetadata(); Transaction txn = table.newTransaction(); -Assert.assertSame( -"Base metadata should not change when transaction is created", base, readMetadata()); -Assert.assertEquals( -"Table should be on version 2 after creating transaction", 2, (int) version()); +assertThat(readMetadata()) +.as("Base metadata should not change when transaction is created") +.isEqualTo(base); +assertThat(version()) +.as("Table should be on version 2 after creating transaction") +.isEqualTo(2); ManageSnapshots manageSnapshots = txn.manageSnapshots(); -Assert.assertNotNull(manageSnapshots); +assertThat(manageSnapshots).isNotNull(); -Assert.assertSame( -"Base metadata should not change when manageSnapshots is created", base, readMetadata()); -Assert.assertEquals( -"Table should be on version 2 after creating manageSnapshots", 2, (int) version()); +assertThat(readMetadata()) +.as("Base metadata should not change when manageSnapshots is created") +.isEqualTo(base); Review Comment: this should be `isSameAs()` as this check actually verifies that both objects are referentially equal. Same for the other places that previously used `assertSame()` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core: Handle listing tables/views in empty namespaces [iceberg]
nastra commented on code in PR #9890: URL: https://github.com/apache/iceberg/pull/9890#discussion_r1517627456 ## core/src/main/java/org/apache/iceberg/inmemory/InMemoryCatalog.java: ## @@ -288,6 +288,7 @@ public List listNamespaces(Namespace namespace) throws NoSuchNamespac List filteredNamespaces = namespaces.keySet().stream() +.filter(n -> !n.isEmpty()) Review Comment: this would previously fail in L303 when calling `listNamespaces(Namespace.empty())` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Library public api isolation and import decoupling [iceberg-python]
ndrluis commented on issue #499: URL: https://github.com/apache/iceberg-python/issues/499#issuecomment-1985722458 @kevinjqliu Agreed, I haven't found anything that addresses this issue or provides a warning when it occurs. It seems like something the community has accepted, but I believe it's a problem. I will open an issue on Ruff to ask about this and understand how we can write a rule. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump com.palantir.baseline:gradle-baseline-java from 4.42.0 to 5.43.0 [iceberg]
nastra closed pull request #9858: Build: Bump com.palantir.baseline:gradle-baseline-java from 4.42.0 to 5.43.0 URL: https://github.com/apache/iceberg/pull/9858 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump com.palantir.baseline:gradle-baseline-java from 4.42.0 to 5.43.0 [iceberg]
dependabot[bot] commented on PR #9858: URL: https://github.com/apache/iceberg/pull/9858#issuecomment-1985726060 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting `@dependabot ignore this major version` or `@dependabot ignore this minor version`. You can also ignore all major, minor, or patch releases for a dependency by adding an [`ignore` condition](https://docs.github.com/en/code-security/supply-chain-security/configuration-options-for-dependency-updates#ignore) with the desired `update_types` to your config file. If you change your mind, just re-open this PR and I'll resolve any conflicts on it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump spring-boot from 2.5.4 to 3.2.3 [iceberg]
nastra closed pull request #9796: Build: Bump spring-boot from 2.5.4 to 3.2.3 URL: https://github.com/apache/iceberg/pull/9796 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump spring-boot from 2.5.4 to 3.2.3 [iceberg]
dependabot[bot] commented on PR #9796: URL: https://github.com/apache/iceberg/pull/9796#issuecomment-1985729696 OK, I won't notify you again about this release, but will get in touch when a new version is available. You can also ignore all major, minor, or patch releases for a dependency by adding an [`ignore` condition](https://docs.github.com/en/code-security/supply-chain-security/configuration-options-for-dependency-updates#ignore) with the desired `update_types` to your config file. If you change your mind, just re-open this PR and I'll resolve any conflicts on it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump com.google.cloud:libraries-bom from 26.28.0 to 26.33.0 [iceberg]
nastra closed pull request #9794: Build: Bump com.google.cloud:libraries-bom from 26.28.0 to 26.33.0 URL: https://github.com/apache/iceberg/pull/9794 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump com.google.cloud:libraries-bom from 26.28.0 to 26.33.0 [iceberg]
dependabot[bot] commented on PR #9794: URL: https://github.com/apache/iceberg/pull/9794#issuecomment-1985744377 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting `@dependabot ignore this major version` or `@dependabot ignore this minor version`. You can also ignore all major, minor, or patch releases for a dependency by adding an [`ignore` condition](https://docs.github.com/en/code-security/supply-chain-security/configuration-options-for-dependency-updates#ignore) with the desired `update_types` to your config file. If you change your mind, just re-open this PR and I'll resolve any conflicts on it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump net.snowflake:snowflake-jdbc from 3.14.5 to 3.15.0 [iceberg]
nastra closed pull request #9797: Build: Bump net.snowflake:snowflake-jdbc from 3.14.5 to 3.15.0 URL: https://github.com/apache/iceberg/pull/9797 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump net.snowflake:snowflake-jdbc from 3.14.5 to 3.15.0 [iceberg]
dependabot[bot] commented on PR #9797: URL: https://github.com/apache/iceberg/pull/9797#issuecomment-1985745044 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting `@dependabot ignore this major version` or `@dependabot ignore this minor version`. You can also ignore all major, minor, or patch releases for a dependency by adding an [`ignore` condition](https://docs.github.com/en/code-security/supply-chain-security/configuration-options-for-dependency-updates#ignore) with the desired `update_types` to your config file. If you change your mind, just re-open this PR and I'll resolve any conflicts on it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump org.springframework:spring-web from 5.3.30 to 6.1.4 [iceberg]
nastra closed pull request #9748: Build: Bump org.springframework:spring-web from 5.3.30 to 6.1.4 URL: https://github.com/apache/iceberg/pull/9748 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump org.springframework:spring-web from 5.3.30 to 6.1.4 [iceberg]
dependabot[bot] commented on PR #9748: URL: https://github.com/apache/iceberg/pull/9748#issuecomment-1985745769 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting `@dependabot ignore this major version` or `@dependabot ignore this minor version`. You can also ignore all major, minor, or patch releases for a dependency by adding an [`ignore` condition](https://docs.github.com/en/code-security/supply-chain-security/configuration-options-for-dependency-updates#ignore) with the desired `update_types` to your config file. If you change your mind, just re-open this PR and I'll resolve any conflicts on it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump com.google.errorprone:error_prone_annotations from 2.24.1 to 2.25.0 [iceberg]
nastra closed pull request #9746: Build: Bump com.google.errorprone:error_prone_annotations from 2.24.1 to 2.25.0 URL: https://github.com/apache/iceberg/pull/9746 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump com.google.errorprone:error_prone_annotations from 2.24.1 to 2.25.0 [iceberg]
dependabot[bot] commented on PR #9746: URL: https://github.com/apache/iceberg/pull/9746#issuecomment-1985746185 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting `@dependabot ignore this major version` or `@dependabot ignore this minor version`. You can also ignore all major, minor, or patch releases for a dependency by adding an [`ignore` condition](https://docs.github.com/en/code-security/supply-chain-security/configuration-options-for-dependency-updates#ignore) with the desired `update_types` to your config file. If you change your mind, just re-open this PR and I'll resolve any conflicts on it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump org.openapitools:openapi-generator-gradle-plugin from 6.6.0 to 7.3.0 [iceberg]
nastra closed pull request #9703: Build: Bump org.openapitools:openapi-generator-gradle-plugin from 6.6.0 to 7.3.0 URL: https://github.com/apache/iceberg/pull/9703 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump org.openapitools:openapi-generator-gradle-plugin from 6.6.0 to 7.3.0 [iceberg]
dependabot[bot] commented on PR #9703: URL: https://github.com/apache/iceberg/pull/9703#issuecomment-1985747945 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting `@dependabot ignore this major version` or `@dependabot ignore this minor version`. You can also ignore all major, minor, or patch releases for a dependency by adding an [`ignore` condition](https://docs.github.com/en/code-security/supply-chain-security/configuration-options-for-dependency-updates#ignore) with the desired `update_types` to your config file. If you change your mind, just re-open this PR and I'll resolve any conflicts on it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] HIVE-28021: escape percent symbol [iceberg]
nastra commented on code in PR #9667: URL: https://github.com/apache/iceberg/pull/9667#discussion_r1517758155 ## mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java: ## @@ -966,4 +966,41 @@ private String getCurrentSnapshotForHiveCatalogTable(org.apache.iceberg.Table ic return ((BaseMetastoreTableOperations) ((BaseTable) icebergTable).operations()) .currentMetadataLocation(); } + + @Test + public void testCreateTableWithPercentInName() throws IOException { +Assume.assumeTrue("This test requires Hive Version 4.", HiveVersion.min(HiveVersion.HIVE_4)); Review Comment: is this test even applicable then for the Iceberg codebase? Iceberg + Hive4 are maintained by the Hive community in https://github.com/apache/hive/tree/master/iceberg -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Add hive metastore catalog support [iceberg-rust]
marvinlanhenke commented on issue #113: URL: https://github.com/apache/iceberg-rust/issues/113#issuecomment-1985760454 @Xuanwo @liurenjie1024 ...just a quick update and a request for feedback. I 'finished' the work on all the namespace operations. Since this would be my first contribution to this repo, I would appreciate it if someone could take a quick look at the current status. I just want to make sure it aligns with the rest of the codebase, before I put in the effort to implement the rest of the catalog. https://github.com/marvinlanhenke/iceberg-rust/blob/hms/crates/catalog/hms/src/catalog.rs#L260-L428 sidenote: For the hive-metastore docker-compose, I used a file I had from a previous project; however this needs some optimization since the connection to the mysql backend slows down the integration tests. My approach would be to use `derby` instead of mysql to speed things up - but I havent gotten around to take a look at it. Anyway best regards and thanks in advance for the feedback. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump com.diffplug.spotless:spotless-plugin-gradle from 6.13.0 to 6.25.0 [iceberg]
dependabot[bot] commented on PR #9569: URL: https://github.com/apache/iceberg/pull/9569#issuecomment-1985765560 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting `@dependabot ignore this major version` or `@dependabot ignore this minor version`. You can also ignore all major, minor, or patch releases for a dependency by adding an [`ignore` condition](https://docs.github.com/en/code-security/supply-chain-security/configuration-options-for-dependency-updates#ignore) with the desired `update_types` to your config file. If you change your mind, just re-open this PR and I'll resolve any conflicts on it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump com.diffplug.spotless:spotless-plugin-gradle from 6.13.0 to 6.25.0 [iceberg]
nastra closed pull request #9569: Build: Bump com.diffplug.spotless:spotless-plugin-gradle from 6.13.0 to 6.25.0 URL: https://github.com/apache/iceberg/pull/9569 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Migrate Snapshot files in Core to JUnit5 [iceberg]
tomtongue commented on code in PR #9892: URL: https://github.com/apache/iceberg/pull/9892#discussion_r1517821079 ## core/src/test/java/org/apache/iceberg/LocalTableOperations.java: ## @@ -63,7 +65,9 @@ public String metadataFileLocation(String fileName) { fileName, name -> { try { -return temp.newFile(name).getAbsolutePath(); +return java.nio.file.Files.createTempDirectory(temp, "junit") Review Comment: thanks. yes, that should create a temp file. Will fix. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Migrate Snapshot files in Core to JUnit5 [iceberg]
tomtongue commented on code in PR #9892: URL: https://github.com/apache/iceberg/pull/9892#discussion_r1517827090 ## core/src/test/java/org/apache/iceberg/TestSnapshotJson.java: ## @@ -19,23 +19,24 @@ package org.apache.iceberg; import static org.apache.iceberg.Files.localInput; +import static org.assertj.core.api.Assertions.assertThat; import java.io.File; import java.io.IOException; +import java.nio.file.Path; import java.util.List; -import java.util.UUID; import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList; import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; -import org.assertj.core.api.Assertions; -import org.junit.Assert; -import org.junit.Rule; -import org.junit.Test; -import org.junit.rules.TemporaryFolder; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.api.io.TempDir; public class TestSnapshotJson { - @Rule public TemporaryFolder temp = new TemporaryFolder(); + @TempDir private static Path temp; - public TableOperations ops = new LocalTableOperations(temp); + public TableOperations ops = + new LocalTableOperations(java.nio.file.Files.createTempDirectory(temp, "junit")); + + public TestSnapshotJson() throws IOException {} Review Comment: It doesn't need. As the above comment, I previously set `createTempDirectory` for `LocalTableOperations`, then the `createTempDirectory` needs the `IOException` handling. But as your feedback, `temp` should be set for `LocalTableOperations` and this `IOException` is not needed. Remove in the next commit. ## core/src/test/java/org/apache/iceberg/TestSnapshotJson.java: ## @@ -19,23 +19,24 @@ package org.apache.iceberg; import static org.apache.iceberg.Files.localInput; +import static org.assertj.core.api.Assertions.assertThat; import java.io.File; import java.io.IOException; +import java.nio.file.Path; import java.util.List; -import java.util.UUID; import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList; import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; -import org.assertj.core.api.Assertions; -import org.junit.Assert; -import org.junit.Rule; -import org.junit.Test; -import org.junit.rules.TemporaryFolder; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.api.io.TempDir; public class TestSnapshotJson { - @Rule public TemporaryFolder temp = new TemporaryFolder(); + @TempDir private static Path temp; - public TableOperations ops = new LocalTableOperations(temp); + public TableOperations ops = + new LocalTableOperations(java.nio.file.Files.createTempDirectory(temp, "junit")); + + public TestSnapshotJson() throws IOException {} Review Comment: It doesn't need. As the above comment, I previously set `createTempDirectory` for `LocalTableOperations`, then the `createTempDirectory` needs the `IOException` handling. But as your feedback, `temp` should be set for `LocalTableOperations` and this `IOException` is not needed. Remove it in the next commit. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[I] Parquet bloom filter doesn't work with nested fields [iceberg]
hussein-awala opened a new issue, #9898: URL: https://github.com/apache/iceberg/issues/9898 ### Apache Iceberg version 1.4.3 (latest release) ### Query engine Spark ### Please describe the bug 🐞 I have an Iceberg table, and I want to create two bloom filters on a root string column and nested string column in a struct, I've set the properties `write.parquet.bloom-filter-enabled.column.a` and `write.parquet.bloom-filter-enabled.column.b.c` to `true`, and I checked with `parquet-cli`: ```bash $ parquet bloom-filter /path/to/file.parquet -c a -v Row group 0: value NOT exists. $ parquet bloom-filter /path/to/file.parquet -c a -v Row group 0: value maybe exists. $ parquet bloom-filter /path/to/file.parquet -c b.c -v Row group 0: column b.c has no bloom filter # check if it's an issue with column name parsing: $ parquet bloom-filter /path/to/file.parquet -c b.d -v Argument error: Schema doesn't have column: b.d ``` However, I tried with Spark and parquet, and it worker without any issue: ```scala import org.apache.spark.sql.types._ import org.apache.spark.sql.Row import spark.implicits._ val schema = StructType(Array( StructField("a", StringType, true), StructField("b", StringType, true), StructField("nested", StructType(Array( StructField("c", StringType, true), StructField("d", StringType, true) )), true) )) val data = Seq( Row("1", "25", Row("100", "a")), Row("2", "30", Row("200", "b")), Row("3", "35", Row("300", "c")), Row("4", "40", Row("400", "d")), Row("5", "45", Row("500", "e")) ) val df = spark.createDataFrame( spark.sparkContext.parallelize(data), schema ) df.write.format("parquet") .option("parquet.bloom.filter.enabled#a", "true") .option("parquet.bloom.filter.enabled#nested.c", "true") .save("bloom_parquet") ``` Check with `parquet-cli` ```bash $ github parquet bloom-filter bloom_parquet/part-2-9fac4c38-7113-45df-8db9-d96c3f6b6a8e-c000.snappy.parquet -c a -v "1" Row group 0: value 1 maybe exists. $ github parquet bloom-filter bloom_parquet/part-2-9fac4c38-7113-45df-8db9-d96c3f6b6a8e-c000.snappy.parquet -c nested.c -v "1" Row group 0: value 1 NOT exists. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Migrate Snapshot files in Core to JUnit5 [iceberg]
tomtongue commented on code in PR #9892: URL: https://github.com/apache/iceberg/pull/9892#discussion_r1517838222 ## core/src/test/java/org/apache/iceberg/TestTableMetadata.java: ## @@ -1731,7 +1730,7 @@ public void testNoTrailingLocationSlash() { private String createManifestListWithManifestFile( long snapshotId, Long parentSnapshotId, String manifestFile) throws IOException { -File manifestList = temp.newFile("manifests" + UUID.randomUUID()); +File manifestList = java.nio.file.Files.createTempDirectory(temp, "junit").toFile(); Review Comment: yes it should be a temp file. Will fix this part. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Migrate Snapshot files in Core to JUnit5 [iceberg]
tomtongue commented on code in PR #9892: URL: https://github.com/apache/iceberg/pull/9892#discussion_r1517839588 ## core/src/test/java/org/apache/iceberg/TestSnapshotManager.java: ## @@ -177,120 +177,115 @@ public void testCherryPickFromBranch() { long lastSnapshotId = table.currentSnapshot().snapshotId(); // pick the snapshot into the current state -Assertions.assertThatThrownBy( -() -> table.manageSnapshots().cherrypick(replaceSnapshotId).commit()) +assertThatThrownBy(() -> table.manageSnapshots().cherrypick(replaceSnapshotId).commit()) .isInstanceOf(ValidationException.class) .hasMessageStartingWith( "Cannot cherry-pick overwrite not based on an ancestor of the current state"); -Assert.assertEquals( -"Failed cherry-pick should not change the table state", -lastSnapshotId, -table.currentSnapshot().snapshotId()); +assertThat(table.currentSnapshot().snapshotId()) +.as("Failed cherry-pick should not change the table state") +.isEqualTo(lastSnapshotId); validateTableFiles(table, FILE_A); } - @Test + @TestTemplate public void testCherryPickOverwrite() { table.newAppend().appendFile(FILE_A).commit(); // stage an overwrite to replace FILE_A table.newOverwrite().deleteFile(FILE_A).addFile(REPLACEMENT_FILE_A).stageOnly().commit(); Snapshot staged = Iterables.getLast(table.snapshots()); -Assert.assertEquals( -"Should find the staged overwrite snapshot", DataOperations.OVERWRITE, staged.operation()); +assertThat(staged.operation()) +.as("Should find the staged overwrite snapshot") +.isEqualTo(DataOperations.OVERWRITE); // add another append so that the original commit can't be fast-forwarded table.newAppend().appendFile(FILE_B).commit(); long lastSnapshotId = table.currentSnapshot().snapshotId(); // pick the snapshot into the current state -Assertions.assertThatThrownBy( -() -> table.manageSnapshots().cherrypick(staged.snapshotId()).commit()) +assertThatThrownBy(() -> table.manageSnapshots().cherrypick(staged.snapshotId()).commit()) .isInstanceOf(ValidationException.class) .hasMessageEndingWith("not append, dynamic overwrite, or fast-forward"); -Assert.assertEquals( -"Failed cherry-pick should not change the table state", -lastSnapshotId, -table.currentSnapshot().snapshotId()); +assertThat(table.currentSnapshot().snapshotId()) +.as("Failed cherry-pick should not change the table state") +.isEqualTo(lastSnapshotId); validateTableFiles(table, FILE_A, FILE_B); } - @Test + @TestTemplate public void testCreateBranch() { table.newAppend().appendFile(FILE_A).commit(); long snapshotId = table.currentSnapshot().snapshotId(); // Test a basic case of creating a branch table.manageSnapshots().createBranch("branch1", snapshotId).commit(); SnapshotRef expectedBranch = table.ops().refresh().ref("branch1"); -Assert.assertTrue( -expectedBranch != null -&& expectedBranch.equals(SnapshotRef.branchBuilder(snapshotId).build())); +assertThat(expectedBranch).isNotNull(); + assertThat(expectedBranch).isEqualTo(SnapshotRef.branchBuilder(snapshotId).build()); } - @Test + @TestTemplate public void testCreateBranchWithoutSnapshotId() { table.newAppend().appendFile(FILE_A).commit(); long snapshotId = table.currentSnapshot().snapshotId(); // Test a basic case of creating a branch table.manageSnapshots().createBranch("branch1").commit(); SnapshotRef actualBranch = table.ops().refresh().ref("branch1"); -Assertions.assertThat(actualBranch).isNotNull(); - Assertions.assertThat(actualBranch).isEqualTo(SnapshotRef.branchBuilder(snapshotId).build()); +assertThat(actualBranch).isNotNull(); + assertThat(actualBranch).isEqualTo(SnapshotRef.branchBuilder(snapshotId).build()); Review Comment: sure, thank you. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Flink: implement range partitioner for map data statistics [iceberg]
stevenzwu commented on code in PR #9321: URL: https://github.com/apache/iceberg/pull/9321#discussion_r1517846647 ## flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/MapRangePartitioner.java: ## @@ -0,0 +1,378 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.flink.sink.shuffle; + +import java.util.Arrays; +import java.util.Comparator; +import java.util.Iterator; +import java.util.List; +import java.util.Map; +import java.util.NavigableMap; +import java.util.concurrent.ThreadLocalRandom; +import java.util.concurrent.TimeUnit; +import org.apache.flink.api.common.functions.Partitioner; +import org.apache.flink.table.data.RowData; +import org.apache.iceberg.Schema; +import org.apache.iceberg.SortKey; +import org.apache.iceberg.SortOrder; +import org.apache.iceberg.SortOrderComparators; +import org.apache.iceberg.StructLike; +import org.apache.iceberg.flink.FlinkSchemaUtil; +import org.apache.iceberg.flink.RowDataWrapper; +import org.apache.iceberg.relocated.com.google.common.annotations.VisibleForTesting; +import org.apache.iceberg.relocated.com.google.common.base.MoreObjects; +import org.apache.iceberg.relocated.com.google.common.base.Preconditions; +import org.apache.iceberg.relocated.com.google.common.collect.Lists; +import org.apache.iceberg.relocated.com.google.common.collect.Maps; +import org.apache.iceberg.util.Pair; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * Internal partitioner implementation that supports MapDataStatistics, which is typically used for + * low-cardinality use cases. While MapDataStatistics can keep accurate counters, it can't be used + * for high-cardinality use cases. Otherwise, the memory footprint is too high. + * + * It is a greedy algorithm for bin packing. With close file cost, the calculation isn't always + * precise when calculating close cost for every file, target weight per subtask, padding residual + * weight, assigned weight without close cost. + * + * All actions should be executed in a single Flink mailbox thread. So there is no need to make + * it thread safe. + */ +class MapRangePartitioner implements Partitioner { + private static final Logger LOG = LoggerFactory.getLogger(MapRangePartitioner.class); + + private final RowDataWrapper rowDataWrapper; + private final SortKey sortKey; + private final Comparator comparator; + private final Map mapStatistics; + private final double closeFileCostInWeightPercentage; + + // Counter that tracks how many times a new key encountered + // where there is no traffic statistics learned about it. + private long newSortKeyCounter; + private long lastNewSortKeyLogTimeMilli; + + // lazily computed due to the need of numPartitions + private Map assignment; + private NavigableMap sortedStatsWithCloseFileCost; + + MapRangePartitioner( + Schema schema, + SortOrder sortOrder, + MapDataStatistics dataStatistics, + double closeFileCostInWeightPercentage) { +dataStatistics +.statistics() +.entrySet() +.forEach( +entry -> +Preconditions.checkArgument( +entry.getValue() > 0, +"Invalid statistics: weight is 0 for key %s", +entry.getKey())); + +this.rowDataWrapper = new RowDataWrapper(FlinkSchemaUtil.convert(schema), schema.asStruct()); +this.sortKey = new SortKey(schema, sortOrder); +this.comparator = SortOrderComparators.forSchema(schema, sortOrder); +this.mapStatistics = dataStatistics.statistics(); +this.closeFileCostInWeightPercentage = closeFileCostInWeightPercentage; +this.newSortKeyCounter = 0; +this.lastNewSortKeyLogTimeMilli = System.currentTimeMillis(); + } + + @Override + public int partition(RowData row, int numPartitions) { +// assignment table can only be built lazily when first referenced here, +// because number of partitions (downstream subtasks) is needed +Map assignmentMap = assignment(numPartitions); +// reuse the sortKey and rowDataWrapper +sortKey.wrap(rowDataWrapper.wrap(row)); +KeyAssignment keyAssignment = assignmentMap.get(sortKey); +if (keyAssignment
Re: [I] Add hive metastore catalog support [iceberg-rust]
Xuanwo commented on issue #113: URL: https://github.com/apache/iceberg-rust/issues/113#issuecomment-1985888081 > I 'finished' the work on all the namespace operations. Awesome! > I just want to make sure it aligns with the rest of the codebase, before I put in the effort to implement the rest of the catalog. Let's merge this PR first and then start another one. There's no need to combine them into a single large PR. In short, submit a PR and then we can polish it together. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Rename `stat_table` in `Catalog` trait [iceberg-rust]
Xuanwo commented on issue #236: URL: https://github.com/apache/iceberg-rust/issues/236#issuecomment-1985889243 I remember that `stat_table` will also return some metadata of this table. cc @Fokko @liurenjie1024, any comments? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Migrate Snapshot files in Core to JUnit5 [iceberg]
tomtongue commented on code in PR #9892: URL: https://github.com/apache/iceberg/pull/9892#discussion_r1517865244 ## core/src/test/java/org/apache/iceberg/TestSnapshotJson.java: ## @@ -49,12 +50,23 @@ public void testJsonConversion() throws IOException { String json = SnapshotParser.toJson(expected); Snapshot snapshot = SnapshotParser.fromJson(json); -Assert.assertEquals("Snapshot ID should match", expected.snapshotId(), snapshot.snapshotId()); -Assert.assertEquals( -"Files should match", expected.allManifests(ops.io()), snapshot.allManifests(ops.io())); -Assert.assertNull("Operation should be null", snapshot.operation()); -Assert.assertNull("Summary should be null", snapshot.summary()); -Assert.assertEquals("Schema ID should match", Integer.valueOf(1), snapshot.schemaId()); +assertThat(snapshot.snapshotId()).isEqualTo(expected.snapshotId()); +assertThat(snapshot.allManifests(ops.io())).hasSize(2); + +for (int i = 0; i < snapshot.allManifests(ops.io()).size(); i++) { Review Comment: @nastra I tried to check each element in the manifest list, but it's a bit redundant. I believe it's enough to check with `isEqualTo` between two lists. Let me fix this part and other part like this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Migrate Snapshot files in Core to JUnit5 [iceberg]
tomtongue commented on code in PR #9892: URL: https://github.com/apache/iceberg/pull/9892#discussion_r1517865244 ## core/src/test/java/org/apache/iceberg/TestSnapshotJson.java: ## @@ -49,12 +50,23 @@ public void testJsonConversion() throws IOException { String json = SnapshotParser.toJson(expected); Snapshot snapshot = SnapshotParser.fromJson(json); -Assert.assertEquals("Snapshot ID should match", expected.snapshotId(), snapshot.snapshotId()); -Assert.assertEquals( -"Files should match", expected.allManifests(ops.io()), snapshot.allManifests(ops.io())); -Assert.assertNull("Operation should be null", snapshot.operation()); -Assert.assertNull("Summary should be null", snapshot.summary()); -Assert.assertEquals("Schema ID should match", Integer.valueOf(1), snapshot.schemaId()); +assertThat(snapshot.snapshotId()).isEqualTo(expected.snapshotId()); +assertThat(snapshot.allManifests(ops.io())).hasSize(2); + +for (int i = 0; i < snapshot.allManifests(ops.io()).size(); i++) { Review Comment: @nastra I tried to check each element in the manifest list, but it's a bit redundant. I believe it's enough to check with `isEqualTo` between two lists. Let me revert this part and other part like this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Library public api isolation and import decoupling [iceberg-python]
ndrluis commented on issue #499: URL: https://github.com/apache/iceberg-python/issues/499#issuecomment-1985907062 Issue to follow up https://github.com/astral-sh/ruff/issues/10300 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Migrate Snapshot files in Core to JUnit5 [iceberg]
tomtongue commented on code in PR #9892: URL: https://github.com/apache/iceberg/pull/9892#discussion_r1517877405 ## core/src/test/java/org/apache/iceberg/TestSnapshotManager.java: ## @@ -700,81 +689,82 @@ public void testAttemptToRollbackToCurrentSnapshot() { table.manageSnapshots().rollbackTo(currentSnapshotId).commit(); } - @Test + @TestTemplate public void testSnapshotManagerThroughTransaction() { table.newAppend().appendFile(FILE_A).commit(); Snapshot snapshotAfterFirstAppend = readMetadata().currentSnapshot(); validateSnapshot(null, snapshotAfterFirstAppend, FILE_A); table.newAppend().appendFile(FILE_B).commit(); validateSnapshot(snapshotAfterFirstAppend, readMetadata().currentSnapshot(), FILE_B); -Assert.assertEquals("Table should be on version 2 after appending twice", 2, (int) version()); +assertThat(version()).as("Table should be on version 2 after appending twice").isEqualTo(2); TableMetadata base = readMetadata(); Transaction txn = table.newTransaction(); -Assert.assertSame( -"Base metadata should not change when transaction is created", base, readMetadata()); -Assert.assertEquals( -"Table should be on version 2 after creating transaction", 2, (int) version()); +assertThat(readMetadata()) +.as("Base metadata should not change when transaction is created") +.isEqualTo(base); +assertThat(version()) +.as("Table should be on version 2 after creating transaction") +.isEqualTo(2); ManageSnapshots manageSnapshots = txn.manageSnapshots(); -Assert.assertNotNull(manageSnapshots); +assertThat(manageSnapshots).isNotNull(); -Assert.assertSame( -"Base metadata should not change when manageSnapshots is created", base, readMetadata()); -Assert.assertEquals( -"Table should be on version 2 after creating manageSnapshots", 2, (int) version()); +assertThat(readMetadata()) +.as("Base metadata should not change when manageSnapshots is created") +.isEqualTo(base); Review Comment: Okay sure. Thanks for the advice. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Rename `stat_table` in `Catalog` trait [iceberg-rust]
marvinlanhenke commented on issue #236: URL: https://github.com/apache/iceberg-rust/issues/236#issuecomment-1985944355 > I remember that `stat_table` will also return some metadata of this table. I only took a look at the java & python hive implementation and I couldn't find a similiar table operation - but perhaps `stat_table` is used in a different catalag implementation that does indeed return the metadata. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[PR] Core: add missing `@Test` annotation in TestRESTCatalog [iceberg]
adutra opened a new pull request, #9899: URL: https://github.com/apache/iceberg/pull/9899 (no comment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[PR] Add hive metastore catalog support (part 1/2) [iceberg-rust]
marvinlanhenke opened a new pull request, #237: URL: https://github.com/apache/iceberg-rust/pull/237 ### Which issue does this PR close? Closes #113 partly. ### Rationale for this change Add hive metastore support, to reach feature parity with other iceberg implementations. ### What changes are included in this PR? Added test infra (hive-metastore, docker-compose) for integration test. Implemented all of the namespace operations from the `Catalog` trait. ### Are these changes tested? Yes. Unit and integration tests are included. ### Are there any user-facing changes? No. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Add hive metastore catalog support [iceberg-rust]
marvinlanhenke commented on issue #113: URL: https://github.com/apache/iceberg-rust/issues/113#issuecomment-1985986494 > In short, submit a PR and then we can polish it together. Sounds good to me - looking forward to it. Best regards. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Allow setting non-string typed values in `set_properties` [iceberg-python]
kevinjqliu commented on PR #504: URL: https://github.com/apache/iceberg-python/pull/504#issuecomment-1985996781 rebased after #503 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]
rahil-c commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1518068800 ## open-api/rest-catalog-open-api.yaml: ## @@ -2068,6 +2162,145 @@ components: items: $ref: '#/components/schemas/PartitionStatisticsFile' +PlanTask: + description: +A JSON object that contains information provided by the server, +to be utilized by clients for distributed planning, should be supplied +as is for input in PlanTable operation. + type: object + +FileScanTask: + type: object + required: +- schema +- spec +- start +- length +- data-file + properties: +data-file: + $ref: '#/components/schemas/ContentFile' +partition: + type: object + additionalProperties: +type: string +size-bytes: + type: number +start: + type: number +length: + type: number +estimated-rows-count: + type: number +delete-files: + type: array + items: +$ref: '#/components/schemas/ContentFile' +schema: + $ref: '#/components/schemas/Schema' +spec: + $ref: '#/components/schemas/PartitionSpec' +residual-filter: + $ref: '#/components/schemas/Expression' + +TypeValue: + oneOf: +- $ref: '#/components/schemas/PrimitiveTypeValue' +- $ref: '#/components/schemas/MapTypeValue' +- $ref: '#/components/schemas/StructTypeValue' + +MapTypeValue: + type: object + properties: +keys: + type: array + items: +$ref: '#/components/schemas/TypeValue' +values: + type: array + items: +$ref: '#/components/schemas/TypeValue' + +StructTypeValue: + type: object + additionalProperties: +oneOf: + - $ref: '#/components/schemas/TypeValue' + +PrimitiveTypeValue: + oneOf: +- type: boolean +- type: integer +- type: integer + format: int64 +- type: number +- type: string +- type: string + format: byte +- type: array + items: +$ref: '#/components/schemas/TypeValue' + +ContentFile: + type: object + required: +- spec-id +- content +- file-path +- file-format +- file-size-in-bytes +- record-count + properties: +spec-id: + type: integer +content: + $ref: '#/components/schemas/FileContent' +file-path: + type: string +file-format: + type: string +partition: + $ref: '#/components/schemas/StructTypeValue' +file-size-in-bytes: + type: integer + format: int64 +record-count: + type: integer + format: int64 +column-sizes: + $ref: '#/components/schemas/MapTypeValue' +value-counts: + $ref: '#/components/schemas/MapTypeValue' +null-value-counts: + $ref: '#/components/schemas/MapTypeValue' +nan-value-counts: + $ref: '#/components/schemas/MapTypeValue' +lower-bounds: + $ref: '#/components/schemas/MapTypeValue' +upper-bounds: + $ref: '#/components/schemas/MapTypeValue' +key-metadata: + type: string + format: byte +split-offsets: + type: array + items: +type: integer +format: int64 +equality-ids: + type: array + items: +type: integer +sort-order-id: Review Comment: Looks like we kept this in the ContentFile pr: https://github.com/apache/iceberg/commit/bb53c3d4e0e27ac6706803c2371793ad2476ae04 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]
rahil-c commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1518068800 ## open-api/rest-catalog-open-api.yaml: ## @@ -2068,6 +2162,145 @@ components: items: $ref: '#/components/schemas/PartitionStatisticsFile' +PlanTask: + description: +A JSON object that contains information provided by the server, +to be utilized by clients for distributed planning, should be supplied +as is for input in PlanTable operation. + type: object + +FileScanTask: + type: object + required: +- schema +- spec +- start +- length +- data-file + properties: +data-file: + $ref: '#/components/schemas/ContentFile' +partition: + type: object + additionalProperties: +type: string +size-bytes: + type: number +start: + type: number +length: + type: number +estimated-rows-count: + type: number +delete-files: + type: array + items: +$ref: '#/components/schemas/ContentFile' +schema: + $ref: '#/components/schemas/Schema' +spec: + $ref: '#/components/schemas/PartitionSpec' +residual-filter: + $ref: '#/components/schemas/Expression' + +TypeValue: + oneOf: +- $ref: '#/components/schemas/PrimitiveTypeValue' +- $ref: '#/components/schemas/MapTypeValue' +- $ref: '#/components/schemas/StructTypeValue' + +MapTypeValue: + type: object + properties: +keys: + type: array + items: +$ref: '#/components/schemas/TypeValue' +values: + type: array + items: +$ref: '#/components/schemas/TypeValue' + +StructTypeValue: + type: object + additionalProperties: +oneOf: + - $ref: '#/components/schemas/TypeValue' + +PrimitiveTypeValue: + oneOf: +- type: boolean +- type: integer +- type: integer + format: int64 +- type: number +- type: string +- type: string + format: byte +- type: array + items: +$ref: '#/components/schemas/TypeValue' + +ContentFile: + type: object + required: +- spec-id +- content +- file-path +- file-format +- file-size-in-bytes +- record-count + properties: +spec-id: + type: integer +content: + $ref: '#/components/schemas/FileContent' +file-path: + type: string +file-format: + type: string +partition: + $ref: '#/components/schemas/StructTypeValue' +file-size-in-bytes: + type: integer + format: int64 +record-count: + type: integer + format: int64 +column-sizes: + $ref: '#/components/schemas/MapTypeValue' +value-counts: + $ref: '#/components/schemas/MapTypeValue' +null-value-counts: + $ref: '#/components/schemas/MapTypeValue' +nan-value-counts: + $ref: '#/components/schemas/MapTypeValue' +lower-bounds: + $ref: '#/components/schemas/MapTypeValue' +upper-bounds: + $ref: '#/components/schemas/MapTypeValue' +key-metadata: + type: string + format: byte +split-offsets: + type: array + items: +type: integer +format: int64 +equality-ids: + type: array + items: +type: integer +sort-order-id: Review Comment: Looks like we kept this in Drews ContentFile pr: https://github.com/apache/iceberg/commit/bb53c3d4e0e27ac6706803c2371793ad2476ae04 so I believe we can keep this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]
rahil-c commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1518075237 ## open-api/rest-catalog-open-api.yaml: ## @@ -2068,6 +2162,145 @@ components: items: $ref: '#/components/schemas/PartitionStatisticsFile' +PlanTask: + description: +A JSON object that contains information provided by the server, +to be utilized by clients for distributed planning, should be supplied +as is for input in PlanTable operation. + type: object + +FileScanTask: + type: object + required: +- schema +- spec +- start +- length +- data-file + properties: +data-file: + $ref: '#/components/schemas/ContentFile' +partition: + type: object + additionalProperties: +type: string +size-bytes: + type: number +start: + type: number +length: + type: number +estimated-rows-count: + type: number +delete-files: + type: array + items: +$ref: '#/components/schemas/ContentFile' +schema: + $ref: '#/components/schemas/Schema' +spec: + $ref: '#/components/schemas/PartitionSpec' +residual-filter: + $ref: '#/components/schemas/Expression' + +TypeValue: + oneOf: +- $ref: '#/components/schemas/PrimitiveTypeValue' +- $ref: '#/components/schemas/MapTypeValue' +- $ref: '#/components/schemas/StructTypeValue' + +MapTypeValue: + type: object + properties: +keys: + type: array + items: +$ref: '#/components/schemas/TypeValue' +values: + type: array + items: +$ref: '#/components/schemas/TypeValue' + +StructTypeValue: + type: object + additionalProperties: +oneOf: + - $ref: '#/components/schemas/TypeValue' + +PrimitiveTypeValue: + oneOf: +- type: boolean +- type: integer +- type: integer + format: int64 +- type: number +- type: string +- type: string + format: byte +- type: array + items: +$ref: '#/components/schemas/TypeValue' + +ContentFile: + type: object + required: +- spec-id Review Comment: in the recent content file pr that was merged: https://github.com/apache/iceberg/commit/bb53c3d4e0e27ac6706803c2371793ad2476ae04 we have `spec-id` as required but not `partition` so can we resolve this thread? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]
rahil-c commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1518075237 ## open-api/rest-catalog-open-api.yaml: ## @@ -2068,6 +2162,145 @@ components: items: $ref: '#/components/schemas/PartitionStatisticsFile' +PlanTask: + description: +A JSON object that contains information provided by the server, +to be utilized by clients for distributed planning, should be supplied +as is for input in PlanTable operation. + type: object + +FileScanTask: + type: object + required: +- schema +- spec +- start +- length +- data-file + properties: +data-file: + $ref: '#/components/schemas/ContentFile' +partition: + type: object + additionalProperties: +type: string +size-bytes: + type: number +start: + type: number +length: + type: number +estimated-rows-count: + type: number +delete-files: + type: array + items: +$ref: '#/components/schemas/ContentFile' +schema: + $ref: '#/components/schemas/Schema' +spec: + $ref: '#/components/schemas/PartitionSpec' +residual-filter: + $ref: '#/components/schemas/Expression' + +TypeValue: + oneOf: +- $ref: '#/components/schemas/PrimitiveTypeValue' +- $ref: '#/components/schemas/MapTypeValue' +- $ref: '#/components/schemas/StructTypeValue' + +MapTypeValue: + type: object + properties: +keys: + type: array + items: +$ref: '#/components/schemas/TypeValue' +values: + type: array + items: +$ref: '#/components/schemas/TypeValue' + +StructTypeValue: + type: object + additionalProperties: +oneOf: + - $ref: '#/components/schemas/TypeValue' + +PrimitiveTypeValue: + oneOf: +- type: boolean +- type: integer +- type: integer + format: int64 +- type: number +- type: string +- type: string + format: byte +- type: array + items: +$ref: '#/components/schemas/TypeValue' + +ContentFile: + type: object + required: +- spec-id Review Comment: in the recent content file pr that was merged: https://github.com/apache/iceberg/commit/bb53c3d4e0e27ac6706803c2371793ad2476ae04 we have `spec-id` only as required so can we resolve this thread? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]
rahil-c commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1518068800 ## open-api/rest-catalog-open-api.yaml: ## @@ -2068,6 +2162,145 @@ components: items: $ref: '#/components/schemas/PartitionStatisticsFile' +PlanTask: + description: +A JSON object that contains information provided by the server, +to be utilized by clients for distributed planning, should be supplied +as is for input in PlanTable operation. + type: object + +FileScanTask: + type: object + required: +- schema +- spec +- start +- length +- data-file + properties: +data-file: + $ref: '#/components/schemas/ContentFile' +partition: + type: object + additionalProperties: +type: string +size-bytes: + type: number +start: + type: number +length: + type: number +estimated-rows-count: + type: number +delete-files: + type: array + items: +$ref: '#/components/schemas/ContentFile' +schema: + $ref: '#/components/schemas/Schema' +spec: + $ref: '#/components/schemas/PartitionSpec' +residual-filter: + $ref: '#/components/schemas/Expression' + +TypeValue: + oneOf: +- $ref: '#/components/schemas/PrimitiveTypeValue' +- $ref: '#/components/schemas/MapTypeValue' +- $ref: '#/components/schemas/StructTypeValue' + +MapTypeValue: + type: object + properties: +keys: + type: array + items: +$ref: '#/components/schemas/TypeValue' +values: + type: array + items: +$ref: '#/components/schemas/TypeValue' + +StructTypeValue: + type: object + additionalProperties: +oneOf: + - $ref: '#/components/schemas/TypeValue' + +PrimitiveTypeValue: + oneOf: +- type: boolean +- type: integer +- type: integer + format: int64 +- type: number +- type: string +- type: string + format: byte +- type: array + items: +$ref: '#/components/schemas/TypeValue' + +ContentFile: + type: object + required: +- spec-id +- content +- file-path +- file-format +- file-size-in-bytes +- record-count + properties: +spec-id: + type: integer +content: + $ref: '#/components/schemas/FileContent' +file-path: + type: string +file-format: + type: string +partition: + $ref: '#/components/schemas/StructTypeValue' +file-size-in-bytes: + type: integer + format: int64 +record-count: + type: integer + format: int64 +column-sizes: + $ref: '#/components/schemas/MapTypeValue' +value-counts: + $ref: '#/components/schemas/MapTypeValue' +null-value-counts: + $ref: '#/components/schemas/MapTypeValue' +nan-value-counts: + $ref: '#/components/schemas/MapTypeValue' +lower-bounds: + $ref: '#/components/schemas/MapTypeValue' +upper-bounds: + $ref: '#/components/schemas/MapTypeValue' +key-metadata: + type: string + format: byte +split-offsets: + type: array + items: +type: integer +format: int64 +equality-ids: + type: array + items: +type: integer +sort-order-id: Review Comment: Looks like we kept this in the ContentFile pr: https://github.com/apache/iceberg/commit/bb53c3d4e0e27ac6706803c2371793ad2476ae04 , so wondering if we can resolve this thread -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]
rahil-c commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1518113629 ## open-api/rest-catalog-open-api.yaml: ## @@ -537,6 +537,108 @@ paths: 5XX: $ref: '#/components/responses/ServerErrorResponse' + /v1/{prefix}/namespaces/{namespace}/tables/{table}/preplan: +parameters: + - $ref: '#/components/parameters/prefix' + - $ref: '#/components/parameters/namespace' + - $ref: '#/components/parameters/table' +post: + tags: +- Catalog API + summary: Find plan-tasks based on a plan context. Review Comment: Will fix this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]
rahil-c commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1518118829 ## open-api/rest-catalog-open-api.yaml: ## @@ -3789,6 +4097,21 @@ components: EmptyResponse: $ref: '#/components/examples/ListNamespacesEmptyExample' +PayLoadTooLargeResponse: Review Comment: will fix -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Getting java.lang.NoSuchMethodError org.apache.iceberg.util.JsonUtil.mapper() With Hive catalog [iceberg]
ismailsimsek commented on issue #9848: URL: https://github.com/apache/iceberg/issues/9848#issuecomment-1986226824 Fixed https://github.com/memiiso/debezium-server-iceberg/issues/271 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Getting java.lang.NoSuchMethodError org.apache.iceberg.util.JsonUtil.mapper() With Hive catalog [iceberg]
ismailsimsek closed issue #9848: Getting java.lang.NoSuchMethodError org.apache.iceberg.util.JsonUtil.mapper() With Hive catalog URL: https://github.com/apache/iceberg/issues/9848 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Feature make oauth `audience` configurable [iceberg]
flyrain commented on code in PR #9839: URL: https://github.com/apache/iceberg/pull/9839#discussion_r1518181685 ## core/src/main/java/org/apache/iceberg/rest/auth/OAuth2Util.java: ## @@ -471,7 +515,13 @@ static void setTokenRefreshNumRetries(int retries) { */ public static AuthSession empty() { return new AuthSession( - ImmutableMap.of(), null, null, null, OAuth2Properties.CATALOG_SCOPE, null); + ImmutableMap.of(), + null, + null, + null, + OAuth2Properties.CATALOG_SCOPE, + null, + ImmutableMap.of()); Review Comment: Can we generate the request class(e.g., [OAuthClientCredentialsRequest](https://github.com/apache/iceberg/blob/bcbcbb263ea7e13ab22d0feb918e207c7e42dbbd/open-api/rest-catalog-open-api.yaml#L2919) ) from the spec so that we can pass a request object with all required and optional params? By doing that, we will always make sure the impl. synced with spec. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]
rahil-c commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1518181730 ## open-api/rest-catalog-open-api.yaml: ## @@ -3789,6 +4097,21 @@ components: EmptyResponse: $ref: '#/components/examples/ListNamespacesEmptyExample' +PayLoadTooLargeResponse: + description: +Payload is too large, clients should leverage the preplan endpoint and provide a plan-task before doing a plan. Review Comment: Will make description generic. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]
rahil-c commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1518182591 ## open-api/rest-catalog-open-api.yaml: ## @@ -537,6 +537,108 @@ paths: 5XX: $ref: '#/components/responses/ServerErrorResponse' + /v1/{prefix}/namespaces/{namespace}/tables/{table}/preplan: +parameters: + - $ref: '#/components/parameters/prefix' + - $ref: '#/components/parameters/namespace' + - $ref: '#/components/parameters/table' +post: + tags: +- Catalog API + summary: Find plan-tasks based on a plan context. + description: +Scan pre-planning creates a set of opaque planning tasks for a set of scan configuration options. +Each task can be passed to the plan endpoint to fetch a (disjoint) subset of the file scan tasks for the scan. + +Scan pre-planning enables breaking scan planning across multiple tasks. +This can be used to parallelize scan planning requests, use fewer resources in each planning request, +or to delay parts of scan planning that may not be needed. Review Comment: will fix. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]
rahil-c commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1518183297 ## open-api/rest-catalog-open-api.yaml: ## @@ -537,6 +537,108 @@ paths: 5XX: $ref: '#/components/responses/ServerErrorResponse' + /v1/{prefix}/namespaces/{namespace}/tables/{table}/preplan: +parameters: + - $ref: '#/components/parameters/prefix' + - $ref: '#/components/parameters/namespace' + - $ref: '#/components/parameters/table' +post: + tags: +- Catalog API + summary: Find plan-tasks based on a plan context. + description: +Scan pre-planning creates a set of opaque planning tasks for a set of scan configuration options. +Each task can be passed to the plan endpoint to fetch a (disjoint) subset of the file scan tasks for the scan. + +Scan pre-planning enables breaking scan planning across multiple tasks. +This can be used to parallelize scan planning requests, use fewer resources in each planning request, +or to delay parts of scan planning that may not be needed. + operationId: PreplanTable + requestBody: +content: + application/json: +schema: + $ref: '#/components/schemas/PreplanTableRequest' + responses: +200: + $ref: '#/components/responses/PreplanTableResponse' +400: + $ref: '#/components/responses/BadRequestErrorResponse' +401: + $ref: '#/components/responses/UnauthorizedResponse' +403: + $ref: '#/components/responses/ForbiddenResponse' +404: + description: +Not Found +- NoSuchTableException, the table does not exist +- NoSuchNamespaceException, the namespace does not exist + content: +application/json: + schema: +$ref: '#/components/schemas/ErrorModel' + examples: +TableDoesNotExist: + $ref: '#/components/examples/NoSuchTableError' +NamespaceDoesNotExist: + $ref: '#/components/examples/NoSuchNamespaceError' +419: + $ref: '#/components/responses/AuthenticationTimeoutResponse' +503: + $ref: '#/components/responses/ServiceUnavailableResponse' +5XX: + $ref: '#/components/responses/ServerErrorResponse' + + /v1/{prefix}/namespaces/{namespace}/tables/{table}/plan: +parameters: + - $ref: '#/components/parameters/prefix' + - $ref: '#/components/parameters/namespace' + - $ref: '#/components/parameters/table' +post: + tags: +- Catalog API + summary: Initiate plan on table to find relevant file scan tasks for a specific query. Review Comment: will fix -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]
rahil-c commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1518184139 ## open-api/rest-catalog-open-api.yaml: ## @@ -537,6 +537,108 @@ paths: 5XX: $ref: '#/components/responses/ServerErrorResponse' + /v1/{prefix}/namespaces/{namespace}/tables/{table}/preplan: +parameters: + - $ref: '#/components/parameters/prefix' + - $ref: '#/components/parameters/namespace' + - $ref: '#/components/parameters/table' +post: + tags: +- Catalog API + summary: Find plan-tasks based on a plan context. + description: +Scan pre-planning creates a set of opaque planning tasks for a set of scan configuration options. +Each task can be passed to the plan endpoint to fetch a (disjoint) subset of the file scan tasks for the scan. + +Scan pre-planning enables breaking scan planning across multiple tasks. +This can be used to parallelize scan planning requests, use fewer resources in each planning request, +or to delay parts of scan planning that may not be needed. + operationId: PreplanTable + requestBody: +content: + application/json: +schema: + $ref: '#/components/schemas/PreplanTableRequest' + responses: +200: + $ref: '#/components/responses/PreplanTableResponse' +400: + $ref: '#/components/responses/BadRequestErrorResponse' +401: + $ref: '#/components/responses/UnauthorizedResponse' +403: + $ref: '#/components/responses/ForbiddenResponse' +404: + description: +Not Found +- NoSuchTableException, the table does not exist +- NoSuchNamespaceException, the namespace does not exist + content: +application/json: + schema: +$ref: '#/components/schemas/ErrorModel' + examples: +TableDoesNotExist: + $ref: '#/components/examples/NoSuchTableError' +NamespaceDoesNotExist: + $ref: '#/components/examples/NoSuchNamespaceError' +419: + $ref: '#/components/responses/AuthenticationTimeoutResponse' +503: + $ref: '#/components/responses/ServiceUnavailableResponse' +5XX: + $ref: '#/components/responses/ServerErrorResponse' + + /v1/{prefix}/namespaces/{namespace}/tables/{table}/plan: +parameters: + - $ref: '#/components/parameters/prefix' + - $ref: '#/components/parameters/namespace' + - $ref: '#/components/parameters/table' +post: + tags: +- Catalog API + summary: Initiate plan on table to find relevant file scan tasks for a specific query. + operationId: PlanTable + description: +When a user submits a query, a plan will be initiated to find the relevant file scan tasks +based on the user's selected columns, and filters. For enhanced query performance, +users can provide a plan-task which will distribute the planning work. Review Comment: will fix -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]
rahil-c commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1518068800 ## open-api/rest-catalog-open-api.yaml: ## @@ -2068,6 +2162,145 @@ components: items: $ref: '#/components/schemas/PartitionStatisticsFile' +PlanTask: + description: +A JSON object that contains information provided by the server, +to be utilized by clients for distributed planning, should be supplied +as is for input in PlanTable operation. + type: object + +FileScanTask: + type: object + required: +- schema +- spec +- start +- length +- data-file + properties: +data-file: + $ref: '#/components/schemas/ContentFile' +partition: + type: object + additionalProperties: +type: string +size-bytes: + type: number +start: + type: number +length: + type: number +estimated-rows-count: + type: number +delete-files: + type: array + items: +$ref: '#/components/schemas/ContentFile' +schema: + $ref: '#/components/schemas/Schema' +spec: + $ref: '#/components/schemas/PartitionSpec' +residual-filter: + $ref: '#/components/schemas/Expression' + +TypeValue: + oneOf: +- $ref: '#/components/schemas/PrimitiveTypeValue' +- $ref: '#/components/schemas/MapTypeValue' +- $ref: '#/components/schemas/StructTypeValue' + +MapTypeValue: + type: object + properties: +keys: + type: array + items: +$ref: '#/components/schemas/TypeValue' +values: + type: array + items: +$ref: '#/components/schemas/TypeValue' + +StructTypeValue: + type: object + additionalProperties: +oneOf: + - $ref: '#/components/schemas/TypeValue' + +PrimitiveTypeValue: + oneOf: +- type: boolean +- type: integer +- type: integer + format: int64 +- type: number +- type: string +- type: string + format: byte +- type: array + items: +$ref: '#/components/schemas/TypeValue' + +ContentFile: + type: object + required: +- spec-id +- content +- file-path +- file-format +- file-size-in-bytes +- record-count + properties: +spec-id: + type: integer +content: + $ref: '#/components/schemas/FileContent' +file-path: + type: string +file-format: + type: string +partition: + $ref: '#/components/schemas/StructTypeValue' +file-size-in-bytes: + type: integer + format: int64 +record-count: + type: integer + format: int64 +column-sizes: + $ref: '#/components/schemas/MapTypeValue' +value-counts: + $ref: '#/components/schemas/MapTypeValue' +null-value-counts: + $ref: '#/components/schemas/MapTypeValue' +nan-value-counts: + $ref: '#/components/schemas/MapTypeValue' +lower-bounds: + $ref: '#/components/schemas/MapTypeValue' +upper-bounds: + $ref: '#/components/schemas/MapTypeValue' +key-metadata: + type: string + format: byte +split-offsets: + type: array + items: +type: integer +format: int64 +equality-ids: + type: array + items: +type: integer +sort-order-id: Review Comment: Looks like we kept this in the ContentFile pr: https://github.com/apache/iceberg/commit/bb53c3d4e0e27ac6706803c2371793ad2476ae04 , so wondering if we can resolve this thread? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]
rahil-c commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1517180435 ## open-api/rest-catalog-open-api.yaml: ## @@ -2838,6 +3093,59 @@ components: additionalProperties: type: string +PreplanTableRequest: + type: object + required: +- select +- filter +- case-sensitive + properties: +select: + description: A list of the selected column names + type: array + items: +type: string +filter: + $ref: '#/components/schemas/Expression' +case-sensitive: + description: Indicates whether column selection and filtering should be case sensitive + type: boolean +snapshot-id: + description: an int64 snapshot ID (if snapshot-range is not present); optional and defaults to the table's current snapshot + type: integer + format: int64 +snapshot-range: + description: a JSON list containing exactly 2 int64 snapshot IDs (if snapshot-id is not present) representing the start (exclusive) and end (inclusive) snapshots Review Comment: > Isn't a range typically inclusive start and exclusive end? why it is reversed here? I am starting to feel maybe using start-snapshot-id and end-snapshot-id might be a better choice to be less ambiguous, and those values can also be null to indicate no start or no end. I think @stevenzwu had also brought this up for the `IncrementalScan` for having inclusive and exclusive for the from/start snapshot. https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/IncrementalScan.java#L34 On further looking at the incrementalScan it seems that for to/end snapshot it is only inclusive https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/IncrementalScan.java#L87 ``` ThisT toSnapshot(long toSnapshotId); /** * Instructs this scan to look for changes up to a particular snapshot ref (inclusive * ``` So should we have a boolean flag to tell if from/start is inclusive or exclusive? And for end we always make it inclusive? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]
rahil-c commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1517180435 ## open-api/rest-catalog-open-api.yaml: ## @@ -2838,6 +3093,59 @@ components: additionalProperties: type: string +PreplanTableRequest: + type: object + required: +- select +- filter +- case-sensitive + properties: +select: + description: A list of the selected column names + type: array + items: +type: string +filter: + $ref: '#/components/schemas/Expression' +case-sensitive: + description: Indicates whether column selection and filtering should be case sensitive + type: boolean +snapshot-id: + description: an int64 snapshot ID (if snapshot-range is not present); optional and defaults to the table's current snapshot + type: integer + format: int64 +snapshot-range: + description: a JSON list containing exactly 2 int64 snapshot IDs (if snapshot-id is not present) representing the start (exclusive) and end (inclusive) snapshots Review Comment: > Isn't a range typically inclusive start and exclusive end? why it is reversed here? I am starting to feel maybe using start-snapshot-id and end-snapshot-id might be a better choice to be less ambiguous, and those values can also be null to indicate no start or no end. I think @stevenzwu had also brought this up for the `IncrementalScan` for having inclusive and exclusive for the from/start snapshot. https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/IncrementalScan.java#L34 On further looking at the incrementalScan it seems that for to/end snapshot it is only inclusive https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/IncrementalScan.java#L87 ``` ThisT toSnapshot(long toSnapshotId); /** * Instructs this scan to look for changes up to a particular snapshot ref (inclusive * ``` So should we have a boolean flag to tell if from/start is inclusive or exclusive? And for end we always make it inclusive? What do you think @rdblue ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]
jackye1995 commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1518328719 ## open-api/rest-catalog-open-api.yaml: ## @@ -2106,6 +2204,41 @@ components: items: $ref: '#/components/schemas/PartitionStatisticsFile' +PlanTask: + description: +A JSON object that contains information provided by the server, +to be utilized by clients for distributed planning, should be supplied +as is for input in PlanTable operation. Review Comment: ```suggestion An opaque JSON object that contains information provided by the REST server, to be utilized by clients for distributed table scan planning, should be supplied as is for input in `PlanTable` operation. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]
jackye1995 commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1518329140 ## open-api/rest-catalog-open-api.yaml: ## @@ -2068,6 +2162,145 @@ components: items: $ref: '#/components/schemas/PartitionStatisticsFile' +PlanTask: + description: +A JSON object that contains information provided by the server, +to be utilized by clients for distributed planning, should be supplied +as is for input in PlanTable operation. + type: object + +FileScanTask: + type: object + required: +- schema +- spec +- start +- length +- data-file + properties: +data-file: + $ref: '#/components/schemas/ContentFile' +partition: + type: object + additionalProperties: +type: string +size-bytes: + type: number +start: + type: number +length: Review Comment: This comment seems not addressed? I agree that for scanning an entire file, length is not needed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]
jackye1995 commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1518330379 ## open-api/rest-catalog-open-api.yaml: ## @@ -2106,6 +2204,41 @@ components: items: $ref: '#/components/schemas/PartitionStatisticsFile' +PlanTask: + description: +A JSON object that contains information provided by the server, +to be utilized by clients for distributed planning, should be supplied +as is for input in PlanTable operation. + type: object + +FileScanTask: + type: object + required: +- schema +- spec +- start +- length +- data-file + properties: +data-file: + $ref: '#/components/schemas/ContentFile' +start: + type: number +length: + type: number +delete-files: + type: array + items: +$ref: '#/components/schemas/ContentFile' +schema: + description: Table Schema + $ref: '#/components/schemas/Schema' +spec: + $ref: '#/components/schemas/PartitionSpec' +residual-filter: + description: An optional residual filter provided by a service, if not present clients shall calculate this residual or use the original filter. Review Comment: ```suggestion description: filters that are not fully applied by the server and should be pushed down to the file reader when reading the file. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]
jackye1995 commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1518331988 ## open-api/rest-catalog-open-api.yaml: ## @@ -3789,6 +3995,21 @@ components: EmptyResponse: $ref: '#/components/examples/ListNamespacesEmptyExample' +PayloadTooLargeResponse: + description: +Payload is too large. + content: +application/json: + schema: +$ref: '#/components/schemas/IcebergErrorResponse' + example: { +"error": { + "message": " Payload is too large, client should obtain a plan-task by using the preplan endpoint.", Review Comment: ```suggestion "message": " Payload is too large for PlanTable, please call PreplanTable to distribute table scan planning", ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]
jackye1995 commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1518332560 ## open-api/rest-catalog-open-api.yaml: ## @@ -3804,6 +4025,21 @@ components: } } +UnprocessableEntityResponse: + description: +Clients are required to do preplanning and provide a plan-task, before initiating a plan. Review Comment: Similar comment to PayloadTooLarge, this description should be generic. We can make the example specific to PlanTable. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]
jackye1995 commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1518375778 ## open-api/rest-catalog-open-api.yaml: ## @@ -2838,6 +3093,59 @@ components: additionalProperties: type: string +PreplanTableRequest: + type: object + required: +- select +- filter +- case-sensitive + properties: +select: + description: A list of the selected column names + type: array + items: +type: string +filter: + $ref: '#/components/schemas/Expression' +case-sensitive: + description: Indicates whether column selection and filtering should be case sensitive + type: boolean +snapshot-id: + description: an int64 snapshot ID (if snapshot-range is not present); optional and defaults to the table's current snapshot + type: integer + format: int64 +snapshot-range: + description: a JSON list containing exactly 2 int64 snapshot IDs (if snapshot-id is not present) representing the start (exclusive) and end (inclusive) snapshots Review Comment: btw just to clarify my comment, I definitely agree exclusive start and inclusive end is the most common use case in incremental pipelines. I just think it might confuse user because the common "range" in programming language has been inclusive start exclusive end, and not everyone would read the description carefully. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[PR] Minor: Parse rowGroupSize as a long [iceberg]
hussein-awala opened a new pull request, #9900: URL: https://github.com/apache/iceberg/pull/9900 `withRowGroupSize(int rowGroupSize)` is deprecated in `ParquetWriter` and currently, the max possible rowGroupSize in Iceberg is 2GB (max Java int) which will be fixed after this PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Spark 3.4, 3.3 : support output-spec-id in rewrite (back port #9803) [iceberg]
himadripal commented on PR #9901: URL: https://github.com/apache/iceberg/pull/9901#issuecomment-1986580223 @RussellSpitzer @aokolnychyi @flyrain @szehon-ho -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[PR] Spark 3.4, 3.3 : support output-spec-id in rewrite (back port #9803) [iceberg]
himadripal opened a new pull request, #9901: URL: https://github.com/apache/iceberg/pull/9901 Backport #9803 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Migrate Snapshot files in Core to JUnit5 [iceberg]
tomtongue commented on PR #9892: URL: https://github.com/apache/iceberg/pull/9892#issuecomment-1986579621 @nastra Thanks for the review. Reflected your comments. Could you review the new one? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Flink 1.11.1 Not Supoort timestamptz [iceberg]
github-actions[bot] commented on issue #1915: URL: https://github.com/apache/iceberg/issues/1915#issuecomment-1986587888 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Spark Table With Timestamp Cannot Query By Flink [iceberg]
github-actions[bot] commented on issue #1916: URL: https://github.com/apache/iceberg/issues/1916#issuecomment-1986587898 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org