Re: [PR] Open-API: Refactor updates with discriminator [iceberg]
Fokko merged PR #9240: URL: https://github.com/apache/iceberg/pull/9240 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
nastra commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422076161 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java: ## @@ -239,33 +240,24 @@ public boolean dropTable(TableIdentifier identifier, boolean purge) { TableReference tableReference = parseTableReference(identifier); return client .withReference(tableReference.getReference(), tableReference.getHash()) -.dropTable(identifierWithoutTableReference(identifier, tableReference), purge); +.dropTable(identifierWithoutTableReference(identifier, tableReference), false); Review Comment: @ajantha-bhat shouldn't this keep the `purge` flag?
Re: [I] iceberg-spark: Switch tests to JUnit5 + AssertJ-style assertions [iceberg]
nastra commented on issue #9086: URL: https://github.com/apache/iceberg/issues/9086#issuecomment-1849563358 For parameterized testing we would need to have https://github.com/apache/iceberg/issues/9210 implemented first.
Re: [PR] Support parameterized tests at class-level with JUnit5 [iceberg]
nastra commented on code in PR #9161: URL: https://github.com/apache/iceberg/pull/9161#discussion_r1422097196 ## LICENSE: ## @@ -323,3 +323,15 @@ This product includes code from Apache HttpComponents Client. Copyright: 1999-2022 The Apache Software Foundation. Home page: https://hc.apache.org/ License: https://www.apache.org/licenses/LICENSE-2.0 + + + +This product includes code from Apache Flink. + +* Parameterized test at class level logic in ParameterizedTestExtension.java +* Parameter provider annotation for parameterized tests in Parameters.java +* Parameter field annotation for parameterized tests in Parameter.java + +Copyright: 1999-2022 The Apache Software Foundation. +Home page: https://hc.apache.org/ Review Comment: ```suggestion Home page: https://flink.apache.org/ ```
Re: [PR] Support parameterized tests at class-level with JUnit5 [iceberg]
nastra commented on code in PR #9161: URL: https://github.com/apache/iceberg/pull/9161#discussion_r1422099227 ## api/src/test/java/org/apache/iceberg/Parameter.java: ## @@ -0,0 +1,41 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg; + +import java.lang.annotation.ElementType; +import java.lang.annotation.Retention; +import java.lang.annotation.RetentionPolicy; +import java.lang.annotation.Target; +import org.junit.runners.Parameterized; + +/** + * The annotation is used to replace {@link Parameterized.Parameter} for Junit 5 parameterized + * tests. Review Comment: The javadoc for this and the other classes should mention that this is taken from the Flink codebase (with a link pointing to it)
Re: [PR] Support parameterized tests at class-level with JUnit5 [iceberg]
nastra commented on code in PR #9161: URL: https://github.com/apache/iceberg/pull/9161#discussion_r1422100956 ## api/src/test/java/org/apache/iceberg/ParameterizedTestExtension.java: ## @@ -0,0 +1,245 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.iceberg; + +import java.lang.reflect.Field; +import java.lang.reflect.Method; +import java.text.MessageFormat; +import java.util.Arrays; +import java.util.Collection; +import java.util.Collections; +import java.util.List; +import java.util.function.Function; +import java.util.stream.Stream; +import org.assertj.core.util.Preconditions; +import org.junit.jupiter.api.TestTemplate; +import org.junit.jupiter.api.extension.BeforeEachCallback; +import org.junit.jupiter.api.extension.Extension; +import org.junit.jupiter.api.extension.ExtensionContext; +import org.junit.jupiter.api.extension.ParameterContext; +import org.junit.jupiter.api.extension.ParameterResolutionException; +import org.junit.jupiter.api.extension.ParameterResolver; +import org.junit.jupiter.api.extension.TestTemplateInvocationContext; +import org.junit.jupiter.api.extension.TestTemplateInvocationContextProvider; +import org.junit.platform.commons.support.AnnotationSupport; +import org.junit.platform.commons.support.HierarchyTraversalMode; + +/** + * This extension is used to implement parameterized tests for JUnit 5, replacing Parameterized in + * JUnit 4. + * + * When using this extension, all tests must be annotated with {@link TestTemplate}. 
+ */ +public class ParameterizedTestExtension implements TestTemplateInvocationContextProvider { + + private static final ExtensionContext.Namespace NAMESPACE = + ExtensionContext.Namespace.create("parameterized"); + private static final String PARAMETERS_STORE_KEY = "parameters"; + private static final String PARAMETER_FIELD_STORE_KEY_PREFIX = "parameterField_"; + private static final String INDEX_TEMPLATE = "{index}"; + + @Override + public boolean supportsTestTemplate(ExtensionContext context) { +return true; + } + + @Override + public Stream<TestTemplateInvocationContext> provideTestTemplateInvocationContexts( + ExtensionContext context) { + +// Search method annotated with @Parameters +final List<Method> parameterProviders = +AnnotationSupport.findAnnotatedMethods( +context.getRequiredTestClass(), Parameters.class, HierarchyTraversalMode.TOP_DOWN); +if (parameterProviders.isEmpty()) { + throw new IllegalStateException("Cannot find any parameter provider"); +} +if (parameterProviders.size() > 1) { + throw new IllegalStateException("Multiple parameter providers are found"); +} + +Method parameterProvider = parameterProviders.get(0); +// Get potential test name +String testNameTemplate = parameterProvider.getAnnotation(Parameters.class).name(); + +// Get parameter values +final Object parameterValues; +try { + parameterProvider.setAccessible(true); + parameterValues = parameterProvider.invoke(null); + context.getStore(NAMESPACE).put(PARAMETERS_STORE_KEY, parameterValues); +} catch (Exception e) { + throw new IllegalStateException("Failed to invoke parameter provider", e); +} +Preconditions.checkState(parameterValues != null, "Parameter values cannot be null"); + +// Parameter values could be Object[][] +if (parameterValues instanceof Object[][]) { + Object[][] typedParameterValues = (Object[][]) parameterValues; + return createContextForParameters( + Arrays.stream(typedParameterValues), testNameTemplate, context); +} + +// or a Collection +if (parameterValues instanceof Collection) { + final 
Collection<?> typedParameterValues = (Collection<?>) parameterValues; + final Stream<Object[]> parameterValueStream = + typedParameterValues.stream() + .map( + (Function<Object, Object[]>) + parameterValue -> { +if (parameterValue instanceof Object[]) { + return (Object[]) parameterValue; +} else { + return new Object[] {parameterValue}; +} + }); +
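The flattened snippet above ends with the part of the extension that accepts parameter values either as an `Object[][]` matrix or as a `Collection` whose elements may be `Object[]` rows or single values. A minimal standalone sketch of just that normalization step (illustrative only; the class and method names here are hypothetical, not the PR's code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical sketch of the normalization logic quoted above: a @Parameters
// provider may return Object[][] or a Collection, and every case is reduced
// to a list of Object[] rows, one row per test invocation.
public class ParameterNormalization {

  static List<Object[]> normalize(Object parameterValues) {
    List<Object[]> rows = new ArrayList<>();
    if (parameterValues instanceof Object[][]) {
      // Each inner array is already one row of parameters
      rows.addAll(Arrays.asList((Object[][]) parameterValues));
    } else if (parameterValues instanceof Collection) {
      // Wrap bare values into single-element rows
      for (Object value : (Collection<?>) parameterValues) {
        rows.add(value instanceof Object[] ? (Object[]) value : new Object[] {value});
      }
    } else {
      throw new IllegalStateException("Unsupported parameter values: " + parameterValues);
    }
    return rows;
  }
}
```

Each returned row would then back one `TestTemplateInvocationContext`, which is how class-level parameterization maps onto JUnit 5's template mechanism.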
Re: [I] iceberg-mr: Switch tests to JUnit5 + AssertJ-style assertions [iceberg]
nastra closed issue #9083: iceberg-mr: Switch tests to JUnit5 + AssertJ-style assertions URL: https://github.com/apache/iceberg/issues/9083
Re: [PR] Switch to junit5 for mr [iceberg]
nastra merged PR #9241: URL: https://github.com/apache/iceberg/pull/9241
[PR] Modernize the readme's Status paragraph [iceberg]
mt-ronkorving opened a new pull request, #9272: URL: https://github.com/apache/iceberg/pull/9272 I've updated the README's Status paragraph to be more in line with today's reality, and hopefully it won't have to be kept up to date going forward. Please let me know if there are any inaccuracies in the text, or stylistic changes anyone would like to see. Thank you. Closes #9127
[I] AWS access denied from metadata file on S3 [iceberg-python]
itaise opened a new issue, #201: URL: https://github.com/apache/iceberg-python/issues/201 ### Apache Iceberg version 0.1.0 ### Please describe the bug 🐞 Hi, I am trying to read a table schema and getting permission denied. This is the code that I use:
```
from pyiceberg.catalog import load_catalog

catalog = load_catalog("prod")
table = catalog.load_table("stores.checkout")
table
```
The code fails when trying to open the metadata file, which is in S3 ([on this line](https://github.com/apache/iceberg-python/blob/main/pyiceberg/catalog/hive.py#L240)). I am getting the following error (I removed the file name and bucket name but they are the correct ones): (, OSError("When reading information for key 'X' in bucket 'X': AWS Error ACCESS_DENIED during HeadObject operation: No response body."), ) I checked that the credentials have HeadObject permission to this file. The env variables which I set are: PYICEBERG_CATALOG__DEFAULT__S3__ACCESS_KEY_ID PYICEBERG_CATALOG__DEFAULT__S3__SECRET_ACCESS_KEY PYICEBERG_CATALOG__PROD__URI Can someone please assist? Thanks a lot!
Re: [I] The "Status" paragraph in the readme seems very outdated [iceberg]
ronkorving commented on issue #9127: URL: https://github.com/apache/iceberg/issues/9127#issuecomment-1849677224 PR is ready for review ^
Re: [PR] Data: Add GenericFileWriterFactory [iceberg]
aokolnychyi commented on code in PR #9267: URL: https://github.com/apache/iceberg/pull/9267#discussion_r1422245545 ## flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/source/TestMetadataTableReadableMetrics.java: ## @@ -219,27 +219,27 @@ public void testPrimitiveColumns() throws Exception { Row binaryCol = Row.of( -59L, Review Comment: The column sizes have changed because the new writer picks up all table properties, which was not true before. I validated that the actual values are correct using `parquet-tools`.
Re: [PR] Data: Add GenericFileWriterFactory [iceberg]
aokolnychyi commented on PR #9267: URL: https://github.com/apache/iceberg/pull/9267#issuecomment-1849745072 @nastra @Fokko @flyrain @amogh-jahagirdar, could you check this one?
Re: [I] Can databricks volume be implemented on Iceberg? [iceberg]
coolderli commented on issue #9249: URL: https://github.com/apache/iceberg/issues/9249#issuecomment-1849747058 @Fokko Thanks for your reply. I think we can use a table for files that have a schema; if a file does not have a schema, tables cannot be used. Of course, files can be used directly on object storage, but directories are difficult to manage. I am wondering if it is possible to provide external references to files stored on object storage in a unified directory structure, such as using a fixed prefix: /volume/catalog_name/database_name/volume_name. We can record the actual object storage address of the file in the Iceberg manifest file, so that we can also provide a snapshot without relying on atomic renaming of object storage. This does indeed reduce the read and write performance of file semantics, and committing too frequently may cause small-file problems, but it is feasible if we use Spark for batch commits. In addition, I found that Spark does not have a catalog of files, so maybe it is possible to implement a FileCatalog.
Re: [PR] Data: Add GenericFileWriterFactory [iceberg]
aokolnychyi commented on code in PR #9267: URL: https://github.com/apache/iceberg/pull/9267#discussion_r1422256919 ## data/src/test/java/org/apache/iceberg/io/TestFileWriterFactory.java: ## @@ -76,18 +76,15 @@ public static Object[] parameters() { private final FileFormat fileFormat; private final boolean partitioned; - private final List dataRows; private StructLike partition = null; private OutputFileFactory fileFactory = null; + private List dataRows; public TestFileWriterFactory(FileFormat fileFormat, boolean partitioned) { super(TABLE_FORMAT_VERSION); this.fileFormat = fileFormat; this.partitioned = partitioned; -this.dataRows = Review Comment: Had to move the initialization after the table creation.
Re: [PR] API: Replace special case of deprecated RuntimeIOException [iceberg]
ajantha-bhat commented on code in PR #6887: URL: https://github.com/apache/iceberg/pull/6887#discussion_r1422303733 ## core/src/main/java/org/apache/iceberg/avro/AvroFileAppender.java: ## @@ -78,14 +78,13 @@ public Metrics metrics() { @Override public long length() { -if (stream != null) { - try { -return stream.getPos(); - } catch (IOException e) { -throw new RuntimeIOException(e, "Failed to get stream length"); - } +Preconditions.checkNotNull(stream, "Failed to get stream length: no open stream"); Review Comment: Aah, I used `checkNotNull` instead of `checkState` as it was just replacing a null check. But yeah, it is better to throw `IllegalStateException` instead of `NPE`. I have updated it now.
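The exchange above hinges on the difference between a null check (`checkNotNull`, which throws `NullPointerException`) and a state check (`checkState`, which throws `IllegalStateException`). A minimal sketch of the two styles, with helpers hand-rolled here purely for illustration of the Guava-style API (this is not Iceberg's code):

```java
// Hand-rolled stand-ins for the Guava-style precondition helpers discussed
// above, purely to illustrate which exception type each style signals.
public class PreconditionStyles {

  // State check: the object exists but is not in a usable state
  static void checkState(boolean expression, String message) {
    if (!expression) {
      throw new IllegalStateException(message);
    }
  }

  // Null check: a required reference was never supplied
  static <T> T checkNotNull(T reference, String message) {
    if (reference == null) {
      throw new NullPointerException(message);
    }
    return reference;
  }
}
```

For `AvroFileAppender.length()` the review settled on the state check: a closed or never-opened stream is an illegal state of the appender rather than a missing argument, so `IllegalStateException` describes the failure more accurately than an `NPE`.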
Re: [I] iceberg-spark: Switch tests to JUnit5 + AssertJ-style assertions [iceberg]
chinmay-bhat commented on issue #9086: URL: https://github.com/apache/iceberg/issues/9086#issuecomment-1849868289 Ok, so can I pick this up once the parameterized tests PR is merged? Also, iceberg-spark has folders for each version (v3.2, 3.3, 3.4, 3.5). Do you recommend creating separate PRs for each?
Re: [I] iceberg-spark: Switch tests to JUnit5 + AssertJ-style assertions [iceberg]
nastra commented on issue #9086: URL: https://github.com/apache/iceberg/issues/9086#issuecomment-1849884128 We should be mostly focusing on migrating Spark 3.5, which is already a big task.
Re: [PR] Core: Properly suppress historical snapshots when building TableMetadata with suppressHistoricalSnapshots() [iceberg]
nastra commented on code in PR #9234: URL: https://github.com/apache/iceberg/pull/9234#discussion_r1422342150 ## core/src/main/java/org/apache/iceberg/TableMetadata.java: ## @@ -1384,7 +1386,9 @@ public Builder setPreviousFileLocation(String previousFileLocation) { private boolean hasChanges() { return changes.size() != startingChangeCount || (discardChanges && !changes.isEmpty()) - || metadataLocation != null; + || metadataLocation != null + || suppressHistoricalSnapshots + || null != snapshotsSupplier; Review Comment: this is for the client-side in https://github.com/apache/iceberg/blob/f19643a93f5dac99bbdbc9881ef19c89d7bcd3eb/core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java#L359-L368 so that it doesn't exit early in the `build()` method
Re: [PR] Core: Properly suppress historical snapshots when building TableMetadata with suppressHistoricalSnapshots() [iceberg]
nastra commented on code in PR #9234: URL: https://github.com/apache/iceberg/pull/9234#discussion_r1422341041 ## core/src/main/java/org/apache/iceberg/TableMetadata.java: ## @@ -1384,7 +1386,9 @@ public Builder setPreviousFileLocation(String previousFileLocation) { private boolean hasChanges() { return changes.size() != startingChangeCount || (discardChanges && !changes.isEmpty()) - || metadataLocation != null; + || metadataLocation != null + || suppressHistoricalSnapshots Review Comment: this is for the server-side that calls `suppressHistoricalSnapshots()` to indicate that there are changes and to not exit early in `build()`
Re: [PR] API: Replace special case of deprecated RuntimeIOException [iceberg]
ajantha-bhat commented on PR #6887: URL: https://github.com/apache/iceberg/pull/6887#issuecomment-1849920644 PR is ready @danielcweeks, @nastra, @Fokko
[PR] Spark 3.5: Add Spark application id to summary of RewriteDataFilesSparkAction [iceberg]
manuzhang opened a new pull request, #9273: URL: https://github.com/apache/iceberg/pull/9273 (no comment)
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
ajantha-bhat commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422497424 ## core/src/test/java/org/apache/iceberg/view/ViewCatalogTests.java: ## @@ -400,8 +400,15 @@ public void replaceTableViaTransactionThatAlreadyExistsAsView() { .buildTable(viewIdentifier, SCHEMA) .replaceTransaction() .commitTransaction()) -.isInstanceOf(NoSuchTableException.class) -.hasMessageStartingWith("Table does not exist: ns.view"); +.satisfiesAnyOf( Review Comment: Since #9102 is merged and I rebased, this change no longer exists. ## core/src/test/java/org/apache/iceberg/view/ViewCatalogTests.java: ## @@ -465,8 +472,15 @@ public void replaceViewThatAlreadyExistsAsTable() { .withDefaultNamespace(tableIdentifier.namespace()) .withQuery("spark", "select * from ns.tbl") .replace()) -.isInstanceOf(NoSuchViewException.class) -.hasMessageStartingWith("View does not exist: ns.table"); +.satisfiesAnyOf( Review Comment: done
Re: [I] iceberg-spark: Switch tests to JUnit5 + AssertJ-style assertions [iceberg]
chinmay-bhat commented on issue #9086: URL: https://github.com/apache/iceberg/issues/9086#issuecomment-1850148369 As it's a big task, I'll get started migrating tests in `iceberg-spark` v3.5 that are not parameterized, and later open a new PR for the parameterized ones :)
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
dimas-b commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422508540 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java: ## @@ -347,4 +339,65 @@ private TableIdentifier identifierWithoutTableReference( protected Map<String, String> properties() { return catalogOptions; } + + @Override + protected ViewOperations newViewOps(TableIdentifier identifier) { +TableReference tr = parseTableReference(identifier); +return new NessieViewOperations( +ContentKey.of( + org.projectnessie.model.Namespace.of(identifier.namespace().levels()), tr.getName()), +client.withReference(tr.getReference(), tr.getHash()), +fileIO); + } + + @Override + public List<TableIdentifier> listViews(Namespace namespace) { +return client.listViews(namespace); + } + + @Override + public boolean dropView(TableIdentifier identifier) { +TableReference tableReference = parseTableReference(identifier); +return client +.withReference(tableReference.getReference(), tableReference.getHash()) +.dropView(identifierWithoutTableReference(identifier, tableReference), false); + } + + @Override + public void renameView(TableIdentifier from, TableIdentifier to) { +TableReference fromTableReference = parseTableReference(from); +TableReference toTableReference = parseTableReference(to); + +validateReferenceForRename(fromTableReference, toTableReference, Content.Type.ICEBERG_VIEW); + +TableIdentifier fromIdentifier = +NessieUtil.removeCatalogName( +identifierWithoutTableReference(from, fromTableReference), name()); +TableIdentifier toIdentifier = +NessieUtil.removeCatalogName(identifierWithoutTableReference(to, toTableReference), name()); +client +.withReference(fromTableReference.getReference(), fromTableReference.getHash()) +.renameView(fromIdentifier, toIdentifier); + } + + private void validateReferenceForRename( + TableReference fromTableReference, TableReference toTableReference, Content.Type type) { +String fromReference = +fromTableReference.hasReference() +? 
fromTableReference.getReference() +: client.getRef().getName(); +String toReference = +toTableReference.hasReference() +? toTableReference.getReference() +: client.getRef().getName(); +Preconditions.checkArgument( +fromReference.equalsIgnoreCase(toReference), +"Cannot rename %s '%s' on reference '%s' to '%s' on reference '%s':" ++ " source and target references must be the same.", +NessieUtil.contentTypeString(type).toLowerCase(), Review Comment: Since the expected text is in English, we should use `.toLowerCase(Locale.ENGLISH)`. Using the default locale can have surprising results, maybe not in this particular case, but we ought to ensure correctness in general, I think. Try this: `assertThat("VIEW".toLowerCase(new Locale("TR"))).isEqualTo("view");` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
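The locale pitfall the reviewer describes can be demonstrated with plain JDK code (a minimal sketch, no Iceberg classes involved; the class name is illustrative):

```java
import java.util.Locale;

public class TurkishLocaleDemo {
    public static void main(String[] args) {
        Locale turkish = new Locale("tr");
        // In the Turkish locale, uppercase 'I' lower-cases to the dotless 'ı' (U+0131),
        // so English literals no longer match after a default-locale conversion.
        System.out.println("VIEW".toLowerCase(turkish));                // prints "vıew"
        System.out.println("VIEW".toLowerCase(Locale.ENGLISH));         // prints "view"
        System.out.println("VIEW".toLowerCase(turkish).equals("view")); // prints "false"
    }
}
```

If the JVM's default locale happens to be Turkish, `"VIEW".toLowerCase()` silently produces the dotless form, which is exactly why the suggested `assertThat("VIEW".toLowerCase(new Locale("TR"))).isEqualTo("view")` assertion fails and why pinning `Locale.ENGLISH` is safer for English message text.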
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
ajantha-bhat commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422521681 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java: ## @@ -347,4 +339,65 @@ private TableIdentifier identifierWithoutTableReference( protected Map properties() { return catalogOptions; } + + @Override + protected ViewOperations newViewOps(TableIdentifier identifier) { +TableReference tr = parseTableReference(identifier); +return new NessieViewOperations( +ContentKey.of( + org.projectnessie.model.Namespace.of(identifier.namespace().levels()), tr.getName()), +client.withReference(tr.getReference(), tr.getHash()), +fileIO); + } + + @Override + public List listViews(Namespace namespace) { +return client.listViews(namespace); + } + + @Override + public boolean dropView(TableIdentifier identifier) { +TableReference tableReference = parseTableReference(identifier); +return client +.withReference(tableReference.getReference(), tableReference.getHash()) +.dropView(identifierWithoutTableReference(identifier, tableReference), false); + } + + @Override + public void renameView(TableIdentifier from, TableIdentifier to) { +TableReference fromTableReference = parseTableReference(from); +TableReference toTableReference = parseTableReference(to); + +validateReferenceForRename(fromTableReference, toTableReference, Content.Type.ICEBERG_VIEW); + +TableIdentifier fromIdentifier = +NessieUtil.removeCatalogName( +identifierWithoutTableReference(from, fromTableReference), name()); +TableIdentifier toIdentifier = +NessieUtil.removeCatalogName(identifierWithoutTableReference(to, toTableReference), name()); +client +.withReference(fromTableReference.getReference(), fromTableReference.getHash()) +.renameView(fromIdentifier, toIdentifier); + } + + private void validateReferenceForRename( + TableReference fromTableReference, TableReference toTableReference, Content.Type type) { +String fromReference = +fromTableReference.hasReference() +? 
fromTableReference.getReference() +: client.getRef().getName(); +String toReference = +toTableReference.hasReference() +? toTableReference.getReference() +: client.getRef().getName(); +Preconditions.checkArgument( +fromReference.equalsIgnoreCase(toReference), +"Cannot rename %s '%s' on reference '%s' to '%s' on reference '%s':" ++ " source and target references must be the same.", +NessieUtil.contentTypeString(type).toLowerCase(), Review Comment: While adding it, I did check the Iceberg code: `.toLowerCase()` is used in other places as well, so I followed that.
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
ajantha-bhat commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422535596 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java: ## @@ -347,4 +339,65 @@ private TableIdentifier identifierWithoutTableReference( protected Map properties() { return catalogOptions; } + + @Override + protected ViewOperations newViewOps(TableIdentifier identifier) { +TableReference tr = parseTableReference(identifier); +return new NessieViewOperations( +ContentKey.of( + org.projectnessie.model.Namespace.of(identifier.namespace().levels()), tr.getName()), +client.withReference(tr.getReference(), tr.getHash()), +fileIO); + } + + @Override + public List listViews(Namespace namespace) { +return client.listViews(namespace); + } + + @Override + public boolean dropView(TableIdentifier identifier) { +TableReference tableReference = parseTableReference(identifier); +return client +.withReference(tableReference.getReference(), tableReference.getHash()) +.dropView(identifierWithoutTableReference(identifier, tableReference), false); + } + + @Override + public void renameView(TableIdentifier from, TableIdentifier to) { +TableReference fromTableReference = parseTableReference(from); +TableReference toTableReference = parseTableReference(to); + +validateReferenceForRename(fromTableReference, toTableReference, Content.Type.ICEBERG_VIEW); + +TableIdentifier fromIdentifier = +NessieUtil.removeCatalogName( +identifierWithoutTableReference(from, fromTableReference), name()); +TableIdentifier toIdentifier = +NessieUtil.removeCatalogName(identifierWithoutTableReference(to, toTableReference), name()); +client +.withReference(fromTableReference.getReference(), fromTableReference.getHash()) +.renameView(fromIdentifier, toIdentifier); + } + + private void validateReferenceForRename( + TableReference fromTableReference, TableReference toTableReference, Content.Type type) { +String fromReference = +fromTableReference.hasReference() +? 
fromTableReference.getReference() +: client.getRef().getName(); +String toReference = +toTableReference.hasReference() +? toTableReference.getReference() +: client.getRef().getName(); +Preconditions.checkArgument( +fromReference.equalsIgnoreCase(toReference), +"Cannot rename %s '%s' on reference '%s' to '%s' on reference '%s':" ++ " source and target references must be the same.", +NessieUtil.contentTypeString(type).toLowerCase(), Review Comment: Fixed for the new code added by this PR. We can open a GH issue to track the existing code.
Re: [PR] Api: Track partition statistics via TableMetadata [iceberg]
ajantha-bhat commented on code in PR #8502: URL: https://github.com/apache/iceberg/pull/8502#discussion_r1422541957 ## core/src/main/java/org/apache/iceberg/ReachableFileUtil.java: ## @@ -137,7 +137,9 @@ public static List manifestListLocations(Table table, Set snapshot * * @param table table for which statistics files needs to be listed * @return the location of statistics files + * @deprecated use the {@code allStatisticsFilesLocations(table)} instead. */ + @Deprecated Review Comment: Removed the changes in this file, as the new APIs added were not called from anywhere (they are needed for the Spark actions). I will raise a follow-up PR for the Spark actions to consider partition stats files in the remove-orphan-files and expire-snapshots actions.
Re: [PR] Api: Track partition statistics via TableMetadata [iceberg]
ajantha-bhat commented on code in PR #8502: URL: https://github.com/apache/iceberg/pull/8502#discussion_r1422543077 ## .palantir/revapi.yml: ## @@ -873,6 +873,10 @@ acceptedBreaks: new: "method void org.apache.iceberg.encryption.Ciphers::()" justification: "Static utility class - should not have public constructor" "1.4.0": +org.apache.iceberg:iceberg-api: +- code: "java.method.addedToInterface" Review Comment: Added it as a `default` method.
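Why a `default` method resolves the revapi `java.method.addedToInterface` break can be shown with a small sketch (interface and method names below are illustrative, not the actual Iceberg API): existing implementations inherit the default body, so the addition is no longer binary-incompatible for them.

```java
import java.util.Collections;
import java.util.List;

// Illustrative interface standing in for a released Iceberg API.
interface StatsAware {
    String metadataLocation();

    // New API added after the release. Because it has a default body,
    // implementations compiled against the old interface still link and run.
    default List<String> partitionStatisticsFiles() {
        return Collections.emptyList();
    }
}

public class DefaultMethodSketch {
    public static void main(String[] args) {
        // A "legacy" implementation that only knows the original method.
        StatsAware legacyImpl = () -> "s3://bucket/metadata.json";
        System.out.println(legacyImpl.partitionStatisticsFiles()); // prints "[]"
    }
}
```

Without the `default` body, every existing implementor (including user code outside the Iceberg repo) would fail to compile, which is why revapi flags a plain added interface method as a breaking change.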
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
dimas-b commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422564943 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieTableOperations.java: ## @@ -133,69 +114,30 @@ protected void doCommit(TableMetadata base, TableMetadata metadata) { boolean newTable = base == null; String newMetadataLocation = writeNewMetadataIfRequired(newTable, metadata); -String refName = client.refName(); boolean failure = false; try { String contentId = table == null ? null : table.getId(); client.commitTable(base, metadata, newMetadataLocation, contentId, key); -} catch (NessieConflictException ex) { - failure = true; - if (ex instanceof NessieReferenceConflictException) { -// Throws a specialized exception, if possible -maybeThrowSpecializedException((NessieReferenceConflictException) ex); +} catch (NessieConflictException | NessieNotFoundException | HttpClientException ex) { + if (ex instanceof NessieConflictException || ex instanceof NessieNotFoundException) { +failure = true; } - throw new CommitFailedException( - ex, - "Cannot commit: Reference hash is out of date. " - + "Update the reference '%s' and try again", - refName); -} catch (HttpClientException ex) { - // Intentionally catch all nessie-client-exceptions here and not just the "timeout" variant - // to catch all kinds of network errors (e.g. connection reset). Network code implementation - // details and all kinds of network devices can induce unexpected behavior. So better be - // safe than sorry. - throw new CommitStateUnknownException(ex); -} catch (NessieNotFoundException ex) { + NessieUtil.handleExceptionsForCommits(ex, client.refName(), Content.Type.ICEBERG_TABLE) + .ifPresent( + exception -> { +throw exception; Review Comment: If I follow the logic correctly, `handleExceptionsForCommits` will always return something for the exception types caught in this case, so `ifPresent()` is really confusing. Could the code be refactored to avoid unnecessary conditional execution? 
## nessie/src/main/java/org/apache/iceberg/nessie/NessieViewOperations.java: ## @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.nessie; + +import org.apache.iceberg.exceptions.NoSuchViewException; +import org.apache.iceberg.io.FileIO; +import org.apache.iceberg.view.BaseViewOperations; +import org.apache.iceberg.view.ViewMetadata; +import org.apache.iceberg.view.ViewMetadataParser; +import org.projectnessie.client.http.HttpClientException; +import org.projectnessie.error.NessieBadRequestException; +import org.projectnessie.error.NessieConflictException; +import org.projectnessie.error.NessieContentNotFoundException; +import org.projectnessie.error.NessieNotFoundException; +import org.projectnessie.model.Content; +import org.projectnessie.model.ContentKey; +import org.projectnessie.model.IcebergView; +import org.projectnessie.model.Reference; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class NessieViewOperations extends BaseViewOperations { + + private static final Logger LOG = LoggerFactory.getLogger(NessieViewOperations.class); + + private final NessieIcebergClient client; + private final ContentKey key; + private final FileIO fileIO; + 
private IcebergView icebergView; + + NessieViewOperations(ContentKey key, NessieIcebergClient client, FileIO fileIO) { +this.key = key; +this.client = client; +this.fileIO = fileIO; + } + + @Override + public void doRefresh() { +try { + client.refresh(); +} catch (NessieNotFoundException e) { + throw new RuntimeException( + String.format( + "Failed to refresh as ref '%s' is no longer valid.", client.getRef().getName()), + e); +} +String metadataLocation = null; +Reference reference = client.getRef().getReference(); +try { + Content content = client.getApi().getContent().key(key).reference(reference).get().get(key); + LOG.
[PR] Spark 3.5: Parallelize file listing in add_files procedure [iceberg]
manuzhang opened a new pull request, #9274: URL: https://github.com/apache/iceberg/pull/9274 (no comment)
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
ajantha-bhat commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422622180 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieViewOperations.java: ## @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.iceberg.nessie; + +import org.apache.iceberg.exceptions.NoSuchViewException; +import org.apache.iceberg.io.FileIO; +import org.apache.iceberg.view.BaseViewOperations; +import org.apache.iceberg.view.ViewMetadata; +import org.apache.iceberg.view.ViewMetadataParser; +import org.projectnessie.client.http.HttpClientException; +import org.projectnessie.error.NessieBadRequestException; +import org.projectnessie.error.NessieConflictException; +import org.projectnessie.error.NessieContentNotFoundException; +import org.projectnessie.error.NessieNotFoundException; +import org.projectnessie.model.Content; +import org.projectnessie.model.ContentKey; +import org.projectnessie.model.IcebergView; +import org.projectnessie.model.Reference; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class NessieViewOperations extends BaseViewOperations { + + private static final Logger LOG = LoggerFactory.getLogger(NessieViewOperations.class); + + private final NessieIcebergClient client; + private final ContentKey key; + private final FileIO fileIO; + private IcebergView icebergView; + + NessieViewOperations(ContentKey key, NessieIcebergClient client, FileIO fileIO) { +this.key = key; +this.client = client; +this.fileIO = fileIO; + } + + @Override + public void doRefresh() { +try { + client.refresh(); +} catch (NessieNotFoundException e) { + throw new RuntimeException( + String.format( + "Failed to refresh as ref '%s' is no longer valid.", client.getRef().getName()), + e); +} +String metadataLocation = null; +Reference reference = client.getRef().getReference(); +try { + Content content = client.getApi().getContent().key(key).reference(reference).get().get(key); + LOG.debug("Content '{}' at '{}': {}", key, reference, content); + if (content == null) { +if (currentMetadataLocation() != null) { + throw new NoSuchViewException("View does not exist: %s in %s", key, reference); +} + } else { +this.icebergView = +content +.unwrap(IcebergView.class) 
+.orElseThrow(() -> new NessieContentNotFoundException(key, reference.getName())); +metadataLocation = icebergView.getMetadataLocation(); + } +} catch (NessieNotFoundException ex) { + if (currentMetadataLocation() != null) { +throw new NoSuchViewException("View does not exist: %s in %s", key, reference); + } +} +refreshFromMetadataLocation( +metadataLocation, +null, +2, +location -> +NessieUtil.loadViewMetadata( +ViewMetadataParser.read(io().newInputFile(location)), location, reference)); + } + + @Override + public void doCommit(ViewMetadata base, ViewMetadata metadata) { +String newMetadataLocation = writeNewMetadataIfRequired(metadata); + +boolean failure = false; +try { + String contentId = icebergView == null ? null : icebergView.getId(); + client.commitView(base, metadata, newMetadataLocation, contentId, key); +} catch (NessieConflictException | NessieNotFoundException | HttpClientException ex) { + if (ex instanceof NessieConflictException || ex instanceof NessieNotFoundException) { +failure = true; + } + NessieUtil.handleExceptionsForCommits(ex, client.refName(), Content.Type.ICEBERG_VIEW) + .ifPresent( + exception -> { +throw exception; + }); +} catch (NessieBadRequestException ex) { + failure = true; + throw NessieUtil.handleBadRequestForCommit(client, key, Content.Type.ICEBERG_VIEW).orElse(ex); +} finally { + if (failure) { +io().deleteFile(newMetadataLocation); Review Comment: I tried before. But one works on `client.commitView` and one is `client.commitTable`. So, it canno
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
ajantha-bhat commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422626565 ## nessie/src/test/java/org/apache/iceberg/nessie/TestNessieViewCatalog.java: ## @@ -0,0 +1,220 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.iceberg.nessie; + +import static org.assertj.core.api.Assertions.assertThat; + +import java.io.IOException; +import java.net.URI; +import java.nio.file.Path; +import org.apache.hadoop.conf.Configuration; +import org.apache.iceberg.CatalogProperties; +import org.apache.iceberg.catalog.Catalog; +import org.apache.iceberg.catalog.TableIdentifier; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; +import org.apache.iceberg.view.BaseView; +import org.apache.iceberg.view.View; +import org.apache.iceberg.view.ViewCatalogTests; +import org.apache.iceberg.view.ViewMetadata; +import org.junit.jupiter.api.AfterEach; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.api.extension.ExtendWith; +import org.junit.jupiter.api.extension.RegisterExtension; +import org.junit.jupiter.api.io.TempDir; +import org.projectnessie.client.api.NessieApiV1; +import org.projectnessie.client.ext.NessieApiVersion; +import org.projectnessie.client.ext.NessieApiVersions; +import org.projectnessie.client.ext.NessieClientFactory; +import org.projectnessie.client.ext.NessieClientUri; +import org.projectnessie.error.NessieConflictException; +import org.projectnessie.error.NessieNotFoundException; +import org.projectnessie.jaxrs.ext.NessieJaxRsExtension; +import org.projectnessie.model.Branch; +import org.projectnessie.model.Reference; +import org.projectnessie.model.Tag; +import org.projectnessie.versioned.storage.common.persist.Persist; +import org.projectnessie.versioned.storage.inmemory.InmemoryBackendTestFactory; +import org.projectnessie.versioned.storage.testextension.NessieBackend; +import org.projectnessie.versioned.storage.testextension.NessiePersist; +import org.projectnessie.versioned.storage.testextension.PersistExtension; + +@ExtendWith(PersistExtension.class) +@NessieBackend(InmemoryBackendTestFactory.class) +@NessieApiVersions // test all versions +public class TestNessieViewCatalog 
extends ViewCatalogTests { + + @NessiePersist static Persist persist; + + @RegisterExtension + static NessieJaxRsExtension server = NessieJaxRsExtension.jaxRsExtension(() -> persist); + + @TempDir private Path temp; + + private NessieCatalog catalog; + private NessieApiV1 api; + private NessieApiVersion apiVersion; + private Configuration hadoopConfig; + private String initialHashOfDefaultBranch; + private String uri; + + @BeforeEach + public void setUp(NessieClientFactory clientFactory, @NessieClientUri URI nessieUri) + throws NessieNotFoundException { +api = clientFactory.make(); +apiVersion = clientFactory.apiVersion(); +initialHashOfDefaultBranch = api.getDefaultBranch().getHash(); +uri = nessieUri.toASCIIString(); +hadoopConfig = new Configuration(); +catalog = initNessieCatalog("main"); + } + + @AfterEach + public void afterEach() throws IOException { +resetData(); +try { + if (catalog != null) { +catalog.close(); + } + api.close(); +} finally { + catalog = null; + api = null; + hadoopConfig = null; +} + } + + private void resetData() throws NessieConflictException, NessieNotFoundException { +Branch defaultBranch = api.getDefaultBranch(); +for (Reference r : api.getAllReferences().get().getReferences()) { Review Comment: I don't see the advantage of adding it on the Nessie side, but it can be added as a util for the test classes, maybe as a follow-up PR.
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
nastra commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422634486 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieTableOperations.java: ## @@ -133,69 +114,30 @@ protected void doCommit(TableMetadata base, TableMetadata metadata) { boolean newTable = base == null; String newMetadataLocation = writeNewMetadataIfRequired(newTable, metadata); -String refName = client.refName(); boolean failure = false; try { String contentId = table == null ? null : table.getId(); client.commitTable(base, metadata, newMetadataLocation, contentId, key); -} catch (NessieConflictException ex) { - failure = true; - if (ex instanceof NessieReferenceConflictException) { -// Throws a specialized exception, if possible -maybeThrowSpecializedException((NessieReferenceConflictException) ex); +} catch (NessieConflictException | NessieNotFoundException | HttpClientException ex) { + if (ex instanceof NessieConflictException || ex instanceof NessieNotFoundException) { +failure = true; } - throw new CommitFailedException( - ex, - "Cannot commit: Reference hash is out of date. " - + "Update the reference '%s' and try again", - refName); -} catch (HttpClientException ex) { - // Intentionally catch all nessie-client-exceptions here and not just the "timeout" variant - // to catch all kinds of network errors (e.g. connection reset). Network code implementation - // details and all kinds of network devices can induce unexpected behavior. So better be - // safe than sorry. - throw new CommitStateUnknownException(ex); -} catch (NessieNotFoundException ex) { + NessieUtil.handleExceptionsForCommits(ex, client.refName(), Content.Type.ICEBERG_TABLE) + .ifPresent( + exception -> { +throw exception; Review Comment: +1 on improving exception handling in the nessie code as I've raised this already before, because it's difficult to read & understand when certain things are thrown. 
However, I would suggest doing this as an immediate follow-up after this PR is merged, as otherwise it makes it more difficult to review the changes being introduced for views.
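One shape the suggested follow-up could take (a hypothetical sketch; the method name, the exception types used as stand-ins, and the messages are illustrative, not the real `NessieUtil` API): if the translator always returns an exception for the caught types, the caller throws unconditionally and the confusing `Optional.ifPresent(exception -> { throw exception; })` pattern disappears.

```java
public class CommitExceptionSketch {

    // Always maps a caught commit exception to something throwable, so the
    // caller never needs an Optional or a conditional throw.
    static RuntimeException translateCommitException(Exception cause, String refName) {
        if (cause instanceof IllegalStateException) { // stand-in for NessieConflictException
            return new RuntimeException(
                "Cannot commit: Reference hash is out of date. Update the reference '"
                    + refName + "' and try again",
                cause);
        }
        // stand-in for the HttpClientException -> CommitStateUnknownException case
        return new RuntimeException("Commit state unknown", cause);
    }

    public static void main(String[] args) {
        try {
            // Caller side: the throw is unconditional and easy to follow.
            throw translateCommitException(new IllegalStateException("conflict"), "main");
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The design point is that `Optional`-returning translators push a hidden control-flow decision onto every call site; returning a non-null exception keeps the "this path always fails" invariant visible in the caller's code.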
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
dimas-b commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422637569 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java: ## @@ -347,4 +339,65 @@ private TableIdentifier identifierWithoutTableReference( protected Map properties() { return catalogOptions; } + + @Override + protected ViewOperations newViewOps(TableIdentifier identifier) { +TableReference tr = parseTableReference(identifier); +return new NessieViewOperations( +ContentKey.of( + org.projectnessie.model.Namespace.of(identifier.namespace().levels()), tr.getName()), +client.withReference(tr.getReference(), tr.getHash()), +fileIO); + } + + @Override + public List listViews(Namespace namespace) { +return client.listViews(namespace); + } + + @Override + public boolean dropView(TableIdentifier identifier) { +TableReference tableReference = parseTableReference(identifier); +return client +.withReference(tableReference.getReference(), tableReference.getHash()) +.dropView(identifierWithoutTableReference(identifier, tableReference), false); + } + + @Override + public void renameView(TableIdentifier from, TableIdentifier to) { +TableReference fromTableReference = parseTableReference(from); +TableReference toTableReference = parseTableReference(to); + +validateReferenceForRename(fromTableReference, toTableReference, Content.Type.ICEBERG_VIEW); + +TableIdentifier fromIdentifier = +NessieUtil.removeCatalogName( +identifierWithoutTableReference(from, fromTableReference), name()); +TableIdentifier toIdentifier = +NessieUtil.removeCatalogName(identifierWithoutTableReference(to, toTableReference), name()); +client +.withReference(fromTableReference.getReference(), fromTableReference.getHash()) +.renameView(fromIdentifier, toIdentifier); + } + + private void validateReferenceForRename( + TableReference fromTableReference, TableReference toTableReference, Content.Type type) { +String fromReference = +fromTableReference.hasReference() +? 
fromTableReference.getReference() +: client.getRef().getName(); +String toReference = +toTableReference.hasReference() +? toTableReference.getReference() +: client.getRef().getName(); +Preconditions.checkArgument( +fromReference.equalsIgnoreCase(toReference), +"Cannot rename %s '%s' on reference '%s' to '%s' on reference '%s':" ++ " source and target references must be the same.", +NessieUtil.contentTypeString(type).toLowerCase(), Review Comment: I did a quick scan of the Iceberg codebase for this, and I do see a few places where `.toLowerCase()` may be a concern: 1) `TableIdentifier.toLowerCase()` is used only in `CachingCatalog` internally, which is probably ok, since the lower case strings are not exposed to the outside. 2) [VectorizedSupport](https://github.com/apache/iceberg/blob/c07f2aabc0a1d02f068ecf1514d2479c0fbdd3b0/mr/src/main/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedSupport.java#L33) seems like it may be a problem, but I do not really know whether the lower case data is exposed and how. 3) [IcebergRecordObjectInspector](https://github.com/apache/iceberg/blob/c07f2aabc0a1d02f068ecf1514d2479c0fbdd3b0/mr/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergRecordObjectInspector.java#L55) converts field names to lower case, and that seems to be affected by the locale problem, as various case names are accepted as input parameters to its methods. 4) `jQuery` code uses a lot of `toLowerCase()`, but I do not really know how it is supposed to be used. @nastra : Do you think this is worth opening an issue? To the best of my knowledge this kind of case conversion can be problematic only in German and Turkish locales. The German locale affects only proper German language words (so it is less of a problem), but the Turkish locale can cause English words to be converted in unexpected ways. Does Iceberg support using its libraries in user-defined locales?
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
dimas-b commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422637569 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java: ## @@ -347,4 +339,65 @@ private TableIdentifier identifierWithoutTableReference(
 protected Map<String, String> properties() { return catalogOptions; }
+
+  @Override
+  protected ViewOperations newViewOps(TableIdentifier identifier) {
+    TableReference tr = parseTableReference(identifier);
+    return new NessieViewOperations(
+        ContentKey.of(
+            org.projectnessie.model.Namespace.of(identifier.namespace().levels()), tr.getName()),
+        client.withReference(tr.getReference(), tr.getHash()),
+        fileIO);
+  }
+
+  @Override
+  public List<TableIdentifier> listViews(Namespace namespace) {
+    return client.listViews(namespace);
+  }
+
+  @Override
+  public boolean dropView(TableIdentifier identifier) {
+    TableReference tableReference = parseTableReference(identifier);
+    return client
+        .withReference(tableReference.getReference(), tableReference.getHash())
+        .dropView(identifierWithoutTableReference(identifier, tableReference), false);
+  }
+
+  @Override
+  public void renameView(TableIdentifier from, TableIdentifier to) {
+    TableReference fromTableReference = parseTableReference(from);
+    TableReference toTableReference = parseTableReference(to);
+
+    validateReferenceForRename(fromTableReference, toTableReference, Content.Type.ICEBERG_VIEW);
+
+    TableIdentifier fromIdentifier =
+        NessieUtil.removeCatalogName(
+            identifierWithoutTableReference(from, fromTableReference), name());
+    TableIdentifier toIdentifier =
+        NessieUtil.removeCatalogName(identifierWithoutTableReference(to, toTableReference), name());
+    client
+        .withReference(fromTableReference.getReference(), fromTableReference.getHash())
+        .renameView(fromIdentifier, toIdentifier);
+  }
+
+  private void validateReferenceForRename(
+      TableReference fromTableReference, TableReference toTableReference, Content.Type type) {
+    String fromReference =
+        fromTableReference.hasReference()
+            ? fromTableReference.getReference()
+            : client.getRef().getName();
+    String toReference =
+        toTableReference.hasReference()
+            ? toTableReference.getReference()
+            : client.getRef().getName();
+    Preconditions.checkArgument(
+        fromReference.equalsIgnoreCase(toReference),
+        "Cannot rename %s '%s' on reference '%s' to '%s' on reference '%s':"
+            + " source and target references must be the same.",
+        NessieUtil.contentTypeString(type).toLowerCase(),
Review Comment: I did a quick scan of the Iceberg codebase for this, and I see a few places where `.toLowerCase()` may be a concern: 1) `TableIdentifier.toLowerCase()` is used only in `CachingCatalog` internally, which is probably ok, since the lower-case strings are not exposed to the outside. 2) [VectorizedSupport](https://github.com/apache/iceberg/blob/c07f2aabc0a1d02f068ecf1514d2479c0fbdd3b0/mr/src/main/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedSupport.java#L33) seems like it may be a problem, but I do not really know whether the lower-case data is exposed and how. 3) [IcebergRecordObjectInspector](https://github.com/apache/iceberg/blob/c07f2aabc0a1d02f068ecf1514d2479c0fbdd3b0/mr/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergRecordObjectInspector.java#L55) converts field names to lower case, and that seems to be affected by the locale problem, since mixed-case names are accepted as input parameters to its methods. 4) The `jQuery` code uses a lot of `toLowerCase()`, but I do not really know how it is supposed to be used. @nastra: Do you think this is worth opening an issue? To the best of my knowledge, this kind of case conversion is problematic only in the German and Turkish locales. The German locale affects only proper German words (so it is less of a problem), but the Turkish locale can convert English words in unexpected ways. Does Iceberg support using its libraries in user-defined locales? -- This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
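For readers unfamiliar with the locale pitfall dimas-b describes, here is a minimal, self-contained sketch (not Iceberg code) of why `.toLowerCase()` without an explicit locale is risky:

```java
import java.util.Locale;

public class CaseFoldingDemo {

    // Locale-sensitive lowering: under the Turkish locale the dotted
    // capital 'I' (U+0049) lowers to dotless 'ı' (U+0131), so keyword
    // and identifier comparisons silently break.
    static String lowerIn(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    // Locale-insensitive lowering, safe for programmatic identifiers.
    static String lowerRoot(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        System.out.println(lowerIn("ICEBERG_VIEW", turkish)); // ıceberg_vıew (dotless ı)
        System.out.println(lowerRoot("ICEBERG_VIEW"));        // iceberg_view
        // The round trip fails too: "view".toUpperCase(tr) yields "VİEW" (dotted İ)
        System.out.println("view".toUpperCase(turkish));
    }
}
```

Pinning `Locale.ROOT` (or `Locale.ENGLISH`, as the `dropContent` change elsewhere in this PR already does) avoids the problem for identifier-style strings.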
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
dimas-b commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422655391 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieViewOperations.java: ## @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */
+package org.apache.iceberg.nessie;
+
+import org.apache.iceberg.exceptions.NoSuchViewException;
+import org.apache.iceberg.io.FileIO;
+import org.apache.iceberg.view.BaseViewOperations;
+import org.apache.iceberg.view.ViewMetadata;
+import org.apache.iceberg.view.ViewMetadataParser;
+import org.projectnessie.client.http.HttpClientException;
+import org.projectnessie.error.NessieBadRequestException;
+import org.projectnessie.error.NessieConflictException;
+import org.projectnessie.error.NessieContentNotFoundException;
+import org.projectnessie.error.NessieNotFoundException;
+import org.projectnessie.model.Content;
+import org.projectnessie.model.ContentKey;
+import org.projectnessie.model.IcebergView;
+import org.projectnessie.model.Reference;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class NessieViewOperations extends BaseViewOperations {
+
+  private static final Logger LOG = LoggerFactory.getLogger(NessieViewOperations.class);
+
+  private final NessieIcebergClient client;
+  private final ContentKey key;
+  private final FileIO fileIO;
+  private IcebergView icebergView;
+
+  NessieViewOperations(ContentKey key, NessieIcebergClient client, FileIO fileIO) {
+    this.key = key;
+    this.client = client;
+    this.fileIO = fileIO;
+  }
+
+  @Override
+  public void doRefresh() {
+    try {
+      client.refresh();
+    } catch (NessieNotFoundException e) {
+      throw new RuntimeException(
+          String.format(
+              "Failed to refresh as ref '%s' is no longer valid.", client.getRef().getName()),
+          e);
+    }
+    String metadataLocation = null;
+    Reference reference = client.getRef().getReference();
+    try {
+      Content content = client.getApi().getContent().key(key).reference(reference).get().get(key);
+      LOG.debug("Content '{}' at '{}': {}", key, reference, content);
+      if (content == null) {
+        if (currentMetadataLocation() != null) {
+          throw new NoSuchViewException("View does not exist: %s in %s", key, reference);
+        }
+      } else {
+        this.icebergView =
+            content
+                .unwrap(IcebergView.class)
+                .orElseThrow(() -> new NessieContentNotFoundException(key, reference.getName()));
+        metadataLocation = icebergView.getMetadataLocation();
+      }
+    } catch (NessieNotFoundException ex) {
+      if (currentMetadataLocation() != null) {
+        throw new NoSuchViewException("View does not exist: %s in %s", key, reference);
+      }
+    }
+    refreshFromMetadataLocation(
+        metadataLocation,
+        null,
+        2,
+        location ->
+            NessieUtil.loadViewMetadata(
+                ViewMetadataParser.read(io().newInputFile(location)), location, reference));
+  }
+
+  @Override
+  public void doCommit(ViewMetadata base, ViewMetadata metadata) {
+    String newMetadataLocation = writeNewMetadataIfRequired(metadata);
+
+    boolean failure = false;
+    try {
+      String contentId = icebergView == null ? null : icebergView.getId();
+      client.commitView(base, metadata, newMetadataLocation, contentId, key);
+    } catch (NessieConflictException | NessieNotFoundException | HttpClientException ex) {
+      if (ex instanceof NessieConflictException || ex instanceof NessieNotFoundException) {
+        failure = true;
+      }
+      NessieUtil.handleExceptionsForCommits(ex, client.refName(), Content.Type.ICEBERG_VIEW)
+          .ifPresent(
+              exception -> {
+                throw exception;
+              });
+    } catch (NessieBadRequestException ex) {
+      failure = true;
+      throw NessieUtil.handleBadRequestForCommit(client, key, Content.Type.ICEBERG_VIEW).orElse(ex);
+    } finally {
+      if (failure) {
+        io().deleteFile(newMetadataLocation);
Review Comment: We could define our own functional `interface` with declared exceptions.
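The functional interface with declared exceptions that the reviewer suggests could look like the following sketch. The names and types here are hypothetical illustrations, not the PR's actual code; the point is that the checked exception type propagates to the caller's signature instead of being smuggled inside a `RuntimeException`:

```java
import java.io.IOException;

// Hypothetical sketch: a functional interface that declares a checked
// exception type parameter, so lambdas can throw it without wrapping.
@FunctionalInterface
interface CommitAction<E extends Exception> {
    void run() throws E;
}

public class CommitRunner {

    // Runs the action; `throws E` forwards the declared checked
    // exception to the caller.
    static <E extends Exception> void execute(CommitAction<E> action) throws E {
        action.run();
    }

    // Demonstrates catching the declared checked exception directly,
    // with an explicit type witness for the generic method.
    static String tryCommit() {
        try {
            CommitRunner.<IOException>execute(
                () -> {
                    throw new IOException("simulated network failure");
                });
            return "ok";
        } catch (IOException e) {
            return "caught: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCommit()); // caught: simulated network failure
    }
}
```

This shape would let the commit cleanup paths above pass throwing lambdas around without the `Optional`/`instanceof` gymnastics, at the cost of one extra interface.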
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
dimas-b commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422656154 ## nessie/src/test/java/org/apache/iceberg/nessie/TestNessieViewCatalog.java: ## @@ -0,0 +1,220 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.iceberg.nessie; + +import static org.assertj.core.api.Assertions.assertThat; + +import java.io.IOException; +import java.net.URI; +import java.nio.file.Path; +import org.apache.hadoop.conf.Configuration; +import org.apache.iceberg.CatalogProperties; +import org.apache.iceberg.catalog.Catalog; +import org.apache.iceberg.catalog.TableIdentifier; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; +import org.apache.iceberg.view.BaseView; +import org.apache.iceberg.view.View; +import org.apache.iceberg.view.ViewCatalogTests; +import org.apache.iceberg.view.ViewMetadata; +import org.junit.jupiter.api.AfterEach; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.api.extension.ExtendWith; +import org.junit.jupiter.api.extension.RegisterExtension; +import org.junit.jupiter.api.io.TempDir; +import org.projectnessie.client.api.NessieApiV1; +import org.projectnessie.client.ext.NessieApiVersion; +import org.projectnessie.client.ext.NessieApiVersions; +import org.projectnessie.client.ext.NessieClientFactory; +import org.projectnessie.client.ext.NessieClientUri; +import org.projectnessie.error.NessieConflictException; +import org.projectnessie.error.NessieNotFoundException; +import org.projectnessie.jaxrs.ext.NessieJaxRsExtension; +import org.projectnessie.model.Branch; +import org.projectnessie.model.Reference; +import org.projectnessie.model.Tag; +import org.projectnessie.versioned.storage.common.persist.Persist; +import org.projectnessie.versioned.storage.inmemory.InmemoryBackendTestFactory; +import org.projectnessie.versioned.storage.testextension.NessieBackend; +import org.projectnessie.versioned.storage.testextension.NessiePersist; +import org.projectnessie.versioned.storage.testextension.PersistExtension; + +@ExtendWith(PersistExtension.class) +@NessieBackend(InmemoryBackendTestFactory.class) +@NessieApiVersions // test all versions +public class TestNessieViewCatalog 
extends ViewCatalogTests<NessieCatalog> {
+
+  @NessiePersist static Persist persist;
+
+  @RegisterExtension
+  static NessieJaxRsExtension server = NessieJaxRsExtension.jaxRsExtension(() -> persist);
+
+  @TempDir private Path temp;
+
+  private NessieCatalog catalog;
+  private NessieApiV1 api;
+  private NessieApiVersion apiVersion;
+  private Configuration hadoopConfig;
+  private String initialHashOfDefaultBranch;
+  private String uri;
+
+  @BeforeEach
+  public void setUp(NessieClientFactory clientFactory, @NessieClientUri URI nessieUri)
+      throws NessieNotFoundException {
+    api = clientFactory.make();
+    apiVersion = clientFactory.apiVersion();
+    initialHashOfDefaultBranch = api.getDefaultBranch().getHash();
+    uri = nessieUri.toASCIIString();
+    hadoopConfig = new Configuration();
+    catalog = initNessieCatalog("main");
+  }
+
+  @AfterEach
+  public void afterEach() throws IOException {
+    resetData();
+    try {
+      if (catalog != null) {
+        catalog.close();
+      }
+      api.close();
+    } finally {
+      catalog = null;
+      api = null;
+      hadoopConfig = null;
+    }
+  }
+
+  private void resetData() throws NessieConflictException, NessieNotFoundException {
+    Branch defaultBranch = api.getDefaultBranch();
+    for (Reference r : api.getAllReferences().get().getReferences()) {
Review Comment: fair enough.
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
dimas-b commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422658039 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieTableOperations.java: ## @@ -133,69 +114,30 @@ protected void doCommit(TableMetadata base, TableMetadata metadata) { boolean newTable = base == null; String newMetadataLocation = writeNewMetadataIfRequired(newTable, metadata); -String refName = client.refName(); boolean failure = false; try { String contentId = table == null ? null : table.getId(); client.commitTable(base, metadata, newMetadataLocation, contentId, key); -} catch (NessieConflictException ex) { - failure = true; - if (ex instanceof NessieReferenceConflictException) { -// Throws a specialized exception, if possible -maybeThrowSpecializedException((NessieReferenceConflictException) ex); +} catch (NessieConflictException | NessieNotFoundException | HttpClientException ex) { + if (ex instanceof NessieConflictException || ex instanceof NessieNotFoundException) { +failure = true; } - throw new CommitFailedException( - ex, - "Cannot commit: Reference hash is out of date. " - + "Update the reference '%s' and try again", - refName); -} catch (HttpClientException ex) { - // Intentionally catch all nessie-client-exceptions here and not just the "timeout" variant - // to catch all kinds of network errors (e.g. connection reset). Network code implementation - // details and all kinds of network devices can induce unexpected behavior. So better be - // safe than sorry. - throw new CommitStateUnknownException(ex); -} catch (NessieNotFoundException ex) { + NessieUtil.handleExceptionsForCommits(ex, client.refName(), Content.Type.ICEBERG_TABLE) + .ifPresent( + exception -> { +throw exception; Review Comment: I'm fine with a follow-up. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
[I] java.lang.ClassNotFoundException: Failed to find data source: iceberg. Issue when we are using Java Custom Catalog [iceberg]
mahendrachandrasekhar opened a new issue, #9275: URL: https://github.com/apache/iceberg/issues/9275 We are using a Java custom catalog with Iceberg. The table is created properly; however, we get an error when we insert the data. `public String createCustomTable(String tableName) {
  try {
    TableIdentifier tableIdentifier = TableIdentifier.of(name(), tableName);
    Schema schema = readSchema(tableIdentifier);
    Map<String, String> properties =
        ImmutableMap.of(TableProperties.DEFAULT_FILE_FORMAT, FileFormat.PARQUET.name());
    PartitionSpec partitionSpec =
        PartitionSpec.builderFor(schema)
            .identity(getPartitionKeyfromSchema(tableIdentifier.name()))
            .build();
    String tableLocation =
        defaultLocation + tableIdentifier.namespace().toString() + "/" + tableIdentifier.name();
    catalog.createTable(tableIdentifier, schema, partitionSpec, tableLocation, properties);
    catalog.loadTable(TableIdentifier.of(name(), tableName));
    return "Table created";
  } catch (Exception e) {
    return e.getMessage();
  }
}

public String insertData(String tableName, String csvPath) throws IOException {
  Table icebergTable = catalog.loadTable(TableIdentifier.of(name(), tableName));
  SparkSession spark = SparkSession.builder().config("spark.master", "local").getOrCreate();
  String headerJson = readHeaderJson(tableName);
  LOGGER.info("Header JSON for {}: {}", tableName, headerJson);
  String[] columns = headerJson.split(",");
  Dataset<Row> df =
      spark.read()
          .option("header", "false")
          .option("inferSchema", "false")
          .option("comment", "#")
          .option("sep", "|")
          .csv(csvPath)
          .toDF(columns);
  LOGGER.info("Actual columns: {}", Arrays.toString(df.columns()));
  for (String col : df.columns()) {
    df = df.withColumn(col, df.col(col).cast("string"));
  }
  df.write().format("iceberg").mode(SaveMode.Append).save(icebergTable.location());
  LOGGER.info("Data inserted successfully into table: {}", tableName);
}` When we execute this via a main program in Java, it works perfectly. However, when we build a jar out of this and call it from Spark, it gives us this 
error ` ERROR:root:Error: An error occurred while calling o0.insertData. : java.lang.ClassNotFoundException: Failed to find data source: iceberg. Please find packages at http://spark.apache.org/third-party-projects.html at org.apache.spark.sql.errors.QueryExecutionErrors$.failedToFindDataSourceError(QueryExecutionErrors.scala:443) at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:670) at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:720) at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:852) at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:256) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239) at com.xyz.catalog.CustomCatalog.insertData(CustomCatalog.java:178) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:282) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:238) at java.base/java.lang.Thread.run(Thread.java:829) Caused by: java.lang.ClassNotFoundException: iceberg.DefaultSource at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581) at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527) at 
org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$5(DataSource.scala:656)
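The `Caused by: java.lang.ClassNotFoundException: iceberg.DefaultSource` line is characteristic of Spark's data-source lookup: Spark scans `META-INF/services` provider files for a `DataSourceRegister` whose short name matches `format("iceberg")`, and only falls back to loading `<name>.DefaultSource` when no provider is registered on the classpath. The sketch below mimics that lookup with a stand-in interface (none of this is Spark's actual code; the interface and method names are illustrative):

```java
import java.util.ServiceLoader;

public class ServiceLoaderDemo {

    // Stand-in for Spark's DataSourceRegister SPI (illustrative only).
    public interface DataSourceRegister {
        String shortName();
    }

    // Mimics the lookup: scan registered providers for a matching short
    // name; with no match, fall back to "<name>.DefaultSource", which is
    // the class name in the ClassNotFoundException above.
    static String lookup(String name) {
        for (DataSourceRegister r : ServiceLoader.load(DataSourceRegister.class)) {
            if (r.shortName().equalsIgnoreCase(name)) {
                return r.getClass().getName();
            }
        }
        return name + ".DefaultSource"; // would then be passed to Class.forName
    }

    public static void main(String[] args) {
        // No provider file is on the classpath here, so lookup falls through.
        System.out.println(lookup("iceberg")); // iceberg.DefaultSource
    }
}
```

In practice this means the error points at a missing or mismatched `iceberg-spark-runtime` jar on the driver/executor classpath rather than at the catalog code itself.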
Re: [I] java.lang.ClassNotFoundException: Failed to find data source: iceberg. Issue when we are using Java Custom Catalog [iceberg]
mahendrachandrasekhar commented on issue #9275: URL: https://github.com/apache/iceberg/issues/9275#issuecomment-1850354505 We have this included in our Docker image (which is based on bitnami/spark:3.2.4):
RUN curl -L -o /home/airflow/spark/jars/hadoop-aws.jar https://repo.maven.apache.org/maven2/org/apache/hadoop/hadoop-aws/2.10.2/hadoop-aws-2.10.2.jar && \
  curl -L -o /home/airflow/spark/jars/caffeine.jar https://repo.maven.apache.org/maven2/com/github/ben-manes/caffeine/caffeine/3.1.3/caffeine-3.1.3.jar && \
  curl -L -o /home/airflow/spark/jars/iceberg-core.jar https://repo.maven.apache.org/maven2/org/apache/iceberg/iceberg-core/1.4.2/iceberg-core-1.4.2.jar && \
  curl -L -o /home/airflow/spark/jars/iceberg-spark.jar https://repo.maven.apache.org/maven2/org/apache/iceberg/iceberg-spark-3.5_2.13/1.4.2/iceberg-spark-3.5_2.13-1.4.2.jar && \
  curl -L -o /home/airflow/spark/jars/iceberg-spark-runtime.jar https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runtime-3.5_2.13/1.4.2/iceberg-spark-runtime-3.5_2.13-1.4.2.jar && \
  curl -L -o /home/airflow/spark/jars/iceberg-parquet.jar https://repo.maven.apache.org/maven2/org/apache/iceberg/iceberg-parquet/1.4.2/iceberg-parquet-1.4.2.jar
ENV SPARK_DIST_CLASSPATH $SPARK_DIST_CLASSPATH:/home/airflow/spark/jars/hadoop-aws.jar:/home/airflow/spark/jars/iceberg-core.jar:/home/airflow/spark/jars/caffeine.jar:/home/airflow/spark/jars/iceberg-spark-runtime.jar:/home/airflow/spark/jars/iceberg-parquet.jar:/home/airflow/spark/jars/iceberg-spark.jar
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
nastra commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422706464 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java: ## @@ -347,4 +339,65 @@ private TableIdentifier identifierWithoutTableReference(
 protected Map<String, String> properties() { return catalogOptions; }
+
+  @Override
+  protected ViewOperations newViewOps(TableIdentifier identifier) {
+    TableReference tr = parseTableReference(identifier);
+    return new NessieViewOperations(
+        ContentKey.of(
+            org.projectnessie.model.Namespace.of(identifier.namespace().levels()), tr.getName()),
+        client.withReference(tr.getReference(), tr.getHash()),
+        fileIO);
+  }
+
+  @Override
+  public List<TableIdentifier> listViews(Namespace namespace) {
+    return client.listViews(namespace);
+  }
+
+  @Override
+  public boolean dropView(TableIdentifier identifier) {
+    TableReference tableReference = parseTableReference(identifier);
+    return client
+        .withReference(tableReference.getReference(), tableReference.getHash())
+        .dropView(identifierWithoutTableReference(identifier, tableReference), false);
+  }
+
+  @Override
+  public void renameView(TableIdentifier from, TableIdentifier to) {
+    TableReference fromTableReference = parseTableReference(from);
+    TableReference toTableReference = parseTableReference(to);
+
+    validateReferenceForRename(fromTableReference, toTableReference, Content.Type.ICEBERG_VIEW);
+
+    TableIdentifier fromIdentifier =
+        NessieUtil.removeCatalogName(
+            identifierWithoutTableReference(from, fromTableReference), name());
+    TableIdentifier toIdentifier =
+        NessieUtil.removeCatalogName(identifierWithoutTableReference(to, toTableReference), name());
+    client
+        .withReference(fromTableReference.getReference(), fromTableReference.getHash())
+        .renameView(fromIdentifier, toIdentifier);
+  }
+
+  private void validateReferenceForRename(
+      TableReference fromTableReference, TableReference toTableReference, Content.Type type) {
+    String fromReference =
+        fromTableReference.hasReference()
+            ? fromTableReference.getReference()
+            : client.getRef().getName();
+    String toReference =
+        toTableReference.hasReference()
+            ? toTableReference.getReference()
+            : client.getRef().getName();
+    Preconditions.checkArgument(
+        fromReference.equalsIgnoreCase(toReference),
+        "Cannot rename %s '%s' on reference '%s' to '%s' on reference '%s':"
+            + " source and target references must be the same.",
+        NessieUtil.contentTypeString(type).toLowerCase(),
Review Comment: It's probably a good idea to open an issue and raise awareness about this.
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
ajantha-bhat commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422710861 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieTableOperations.java: ## @@ -133,69 +114,30 @@ protected void doCommit(TableMetadata base, TableMetadata metadata) { boolean newTable = base == null; String newMetadataLocation = writeNewMetadataIfRequired(newTable, metadata); -String refName = client.refName(); boolean failure = false; try { String contentId = table == null ? null : table.getId(); client.commitTable(base, metadata, newMetadataLocation, contentId, key); -} catch (NessieConflictException ex) { - failure = true; - if (ex instanceof NessieReferenceConflictException) { -// Throws a specialized exception, if possible -maybeThrowSpecializedException((NessieReferenceConflictException) ex); +} catch (NessieConflictException | NessieNotFoundException | HttpClientException ex) { + if (ex instanceof NessieConflictException || ex instanceof NessieNotFoundException) { +failure = true; } - throw new CommitFailedException( - ex, - "Cannot commit: Reference hash is out of date. " - + "Update the reference '%s' and try again", - refName); -} catch (HttpClientException ex) { - // Intentionally catch all nessie-client-exceptions here and not just the "timeout" variant - // to catch all kinds of network errors (e.g. connection reset). Network code implementation - // details and all kinds of network devices can induce unexpected behavior. So better be - // safe than sorry. - throw new CommitStateUnknownException(ex); -} catch (NessieNotFoundException ex) { + NessieUtil.handleExceptionsForCommits(ex, client.refName(), Content.Type.ICEBERG_TABLE) + .ifPresent( + exception -> { +throw exception; Review Comment: Slightly improved now ;) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
ajantha-bhat commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422711613 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieTableOperations.java: ## @@ -133,69 +114,30 @@ protected void doCommit(TableMetadata base, TableMetadata metadata) { boolean newTable = base == null; String newMetadataLocation = writeNewMetadataIfRequired(newTable, metadata); -String refName = client.refName(); boolean failure = false; try { String contentId = table == null ? null : table.getId(); client.commitTable(base, metadata, newMetadataLocation, contentId, key); -} catch (NessieConflictException ex) { - failure = true; - if (ex instanceof NessieReferenceConflictException) { -// Throws a specialized exception, if possible -maybeThrowSpecializedException((NessieReferenceConflictException) ex); +} catch (NessieConflictException | NessieNotFoundException | HttpClientException ex) { + if (ex instanceof NessieConflictException || ex instanceof NessieNotFoundException) { +failure = true; Review Comment: That will be more complicated I guess. I have simplified a bit now. Please take a look -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
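The `handleExceptionsForCommits(...).ifPresent(exception -> { throw exception; })` shape discussed in this thread — mapping a caught low-level exception to an `Optional` of a domain `RuntimeException` — can be sketched in isolation like this. The class, method, and exception choices below are a simplified, hypothetical stand-in for `NessieUtil`, not the actual implementation:

```java
import java.util.Optional;
import java.util.ConcurrentModificationException;

public class CommitExceptionMapper {

    // Simplified stand-in for NessieUtil.handleExceptionsForCommits:
    // translate a low-level failure into a domain RuntimeException, or
    // return empty when the caller should handle the original as-is.
    // ConcurrentModificationException plays the role of a conflict here.
    static Optional<RuntimeException> mapCommitFailure(Exception ex, String refName) {
        if (ex instanceof ConcurrentModificationException) {
            return Optional.of(
                new IllegalStateException(
                    "Cannot commit: reference '" + refName + "' is out of date", ex));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        try {
            // ifPresent with a throwing lambda rethrows the mapped exception,
            // mirroring the pattern in the diff above.
            mapCommitFailure(new ConcurrentModificationException(), "main")
                .ifPresent(e -> { throw e; });
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The benefit of the `Optional` return is that the shared mapping logic stays side-effect free, while each call site decides whether to rethrow, wrap, or ignore.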
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
ajantha-bhat commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422711943 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java: ## @@ -448,33 +480,85 @@ public void renameTable(TableIdentifier from, TableIdentifier to) { // behavior. So better be safe than sorry. } + private static void validateToContentForRename( + TableIdentifier from, TableIdentifier to, IcebergContent existingToContent) { +if (existingToContent != null) { + if (existingToContent.getType() == Content.Type.ICEBERG_VIEW) { +throw new AlreadyExistsException("Cannot rename %s to %s. View already exists", from, to); + } else if (existingToContent.getType() == Content.Type.ICEBERG_TABLE) { +throw new AlreadyExistsException("Cannot rename %s to %s. Table already exists", from, to); + } else { +throw new AlreadyExistsException( +"Cannot rename %s to %s. Another content of type %s with same name already exists", +from, to, existingToContent.getType()); + } +} + } + + private static void validateFromContentForRename( + TableIdentifier from, Content.Type type, IcebergContent existingFromContent) { +if (existingFromContent == null) { + if (type == Content.Type.ICEBERG_VIEW) { +throw new NoSuchViewException("View does not exist: %s", from); + } else if (type == Content.Type.ICEBERG_TABLE) { +throw new NoSuchTableException("Table does not exist: %s", from); + } else { +throw new RuntimeException("Cannot perform rename for content type: " + type); + } +} else if (existingFromContent.getType() != type) { + throw new RuntimeException( + String.format("content type of from identifier %s should be of %s", from, type)); +} + } + public boolean dropTable(TableIdentifier identifier, boolean purge) { +return dropContent(identifier, purge, Content.Type.ICEBERG_TABLE); + } + + public boolean dropView(TableIdentifier identifier, boolean purge) { +return dropContent(identifier, purge, Content.Type.ICEBERG_VIEW); + } + + private boolean 
dropContent(TableIdentifier identifier, boolean purge, Content.Type type) {
 getRef().checkMutable();
-IcebergTable existingTable = table(identifier);
-if (existingTable == null) {
+IcebergContent existingContent = fetchContent(identifier);
+if (existingContent == null) {
 return false;
 }
+if (existingContent.getType() != type) {
+  throw new RuntimeException(
+      String.format(
+          "Cannot drop %s: not matching with the type `%s`",
+          identifier, NessieUtil.contentTypeString(type)));
+}
+
+String contentType = NessieUtil.contentTypeString(type).toLowerCase(Locale.ENGLISH);
+
 if (purge) {
-  LOG.info("Purging data for table {} was set to true but is ignored", identifier.toString());
+  LOG.info(
+      "Purging data for {} {} was set to true but is ignored",
+      contentType,
+      identifier.toString());
 }
-// We try to drop the table. Simple retry after ref update.
+// We try to drop the content. Simple retry after ref update.
 try {
 commitRetry(
 String.format("Iceberg delete table %s", identifier),
Review Comment: good catch. Induced during rebase.
Re: [PR] Table Requirements Validation [iceberg-python]
Fokko merged PR #200: URL: https://github.com/apache/iceberg-python/pull/200
Re: [PR] Hive: Refactor TestHiveCatalog tests to use the core CatalogTests [iceberg]
nastra commented on code in PR #8918: URL: https://github.com/apache/iceberg/pull/8918#discussion_r1422744205 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java: ## @@ -261,6 +261,12 @@ public void renameTable(TableIdentifier from, TableIdentifier originalTo) {
 } catch (InterruptedException e) {
 Thread.currentThread().interrupt();
 throw new RuntimeException("Interrupted in call to rename", e);
+} catch (RuntimeException e) {
Review Comment: @pvary @nk1506 what's the outcome of this? Are we ok with the changes in `HiveCatalog` or not? This PR is required for https://github.com/apache/iceberg/pull/8907
Re: [PR] Hive: Add View support for HIVE catalog [iceberg]
nastra commented on code in PR #8907: URL: https://github.com/apache/iceberg/pull/8907#discussion_r1422744838

## hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveViewCatalog.java:
## @@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.hive;
+
+import java.util.Collections;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.view.ViewCatalogTests;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeEach;
+
+public class TestHiveViewCatalog extends ViewCatalogTests {
+
+  private HiveCatalog catalog;
+
+  @BeforeEach
+  public void before() throws Exception {
+    HiveMetastoreTest.startMetastore(Collections.emptyMap());

Review Comment: this should use the JUnit5 extension from https://github.com/apache/iceberg/pull/8918 rather than using `HiveMetastoreTest` directly
Re: [PR] Hive: Refactor TestHiveCatalog tests to use the core CatalogTests [iceberg]
nastra commented on code in PR #8918: URL: https://github.com/apache/iceberg/pull/8918#discussion_r1422755860

## hive-metastore/src/test/java/org/apache/iceberg/hive/HiveMetastoreTest.java:
## @@ -31,6 +31,10 @@
 import org.junit.jupiter.api.AfterAll;
 import org.junit.jupiter.api.BeforeAll;
+/*
+ * This meta-setup has been deprecated; use {@link HiveMetastoreExtension} instead.
+ * */
+@Deprecated

Review Comment: @nk1506 can you please open a proper PR against the Iceberg repo with those changes where subclasses of `HiveMetastoreTest` are moved to JUnit5? That being said, I don't think the deprecation here is necessary, since `HiveMetastoreTest` could also be seen as a JUnit4 thing, whereas the `HiveMetastoreExtension` is what you would use for JUnit5.
Re: [PR] Support parameterized tests at class-level with JUnit5 [iceberg]
nastra commented on code in PR #9161: URL: https://github.com/apache/iceberg/pull/9161#discussion_r1422771246 ## api/src/test/java/org/apache/iceberg/TestHelpers.java: ## @@ -173,6 +178,60 @@ public static void assertSameSchemaMap(Map map1, Map
Re: [PR] Hive: Refactor TestHiveCatalog tests to use the core CatalogTests [iceberg]
pvary commented on code in PR #8918: URL: https://github.com/apache/iceberg/pull/8918#discussion_r1422793318

## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java:
## @@ -261,6 +261,12 @@ public void renameTable(TableIdentifier from, TableIdentifier originalTo) {
 } catch (InterruptedException e) {
   Thread.currentThread().interrupt();
   throw new RuntimeException("Interrupted in call to rename", e);
+} catch (RuntimeException e) {

Review Comment: From my side:
- I am not sure #9143 is generally ok - maybe there were some reasons behind the `RuntimeException` which I am not aware of.
- I prefer not to have to unwrap the `RuntimeException` in exception handling.
- We need to understand why it is needed, and remove the root cause.
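The "unwrap the `RuntimeException`" concern above is about handlers no longer being able to catch the real exception type once it has been re-thrown inside a generic wrapper. A small Python analogue of why that is awkward (names here are illustrative, not the actual HiveCatalog code):

```python
class MetastoreRenameError(Exception):
    """Stand-in for the underlying metastore exception."""


def rename_table():
    try:
        raise MetastoreRenameError("target table already exists")
    except MetastoreRenameError as e:
        # the pattern under discussion: wrapping hides the real exception type
        raise RuntimeError("Failed in call to rename") from e


def caller() -> str:
    try:
        rename_table()
    except RuntimeError as e:
        # the handler is forced to inspect the cause to recover the original type,
        # instead of simply catching MetastoreRenameError directly
        return type(e.__cause__).__name__
    return "no error"
```

Every caller that cares about the specific failure mode has to repeat this cause-inspection dance, which is the maintenance cost pvary is pointing at.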
Re: [PR] Add SQLite support [iceberg-python]
rdblue merged PR #178: URL: https://github.com/apache/iceberg-python/pull/178
[I] Support schema.name-mapping.default Column Projection property [iceberg-python]
syun64 opened a new issue, #202: URL: https://github.com/apache/iceberg-python/issues/202

### Feature Request / Improvement

The `schema.name-mapping.default` property is well supported in Spark Iceberg in order to enable consistent column reads for Iceberg tables that rely on migration procedures, where the field metadata of the underlying data files may not align with the current schema IDs of the Iceberg table. We should add support for this property in PyIceberg to make sure that read operations do not misalign data onto the wrong columns.

[Add NameMapping to Spec Document](https://github.com/apache/iceberg/issues/3542)
[Column Projection](https://iceberg.apache.org/spec/#column-projection)
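The `schema.name-mapping.default` property stores a JSON name mapping (see the spec's Column Projection section linked above) that assigns Iceberg field IDs to file column names when the data files themselves carry no field-ID metadata. A minimal sketch of resolving a file column name through such a mapping (the mapping values and the flat structure are illustrative; the spec also allows nested `fields`):

```python
import json

# Example name mapping as it might be stored in the
# schema.name-mapping.default table property (illustrative values).
name_mapping_json = json.dumps([
    {"field-id": 1, "names": ["id", "record_id"]},
    {"field-id": 2, "names": ["data"]},
])


def field_id_for(column_name: str, mapping_json: str):
    """Return the mapped field id for a file column name, or None if unmapped."""
    for entry in json.loads(mapping_json):
        if column_name in entry["names"]:
            return entry["field-id"]
    return None
```

With this in place, a reader encountering a migrated Parquet file whose column is named `record_id` can still project it as Iceberg field 1 rather than misaligning it.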
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
dimas-b commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422818781

## nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java:
## @@ -347,4 +339,65 @@ private TableIdentifier identifierWithoutTableReference(
 protected Map properties() {
   return catalogOptions;
 }
+
+@Override
+protected ViewOperations newViewOps(TableIdentifier identifier) {
+  TableReference tr = parseTableReference(identifier);
+  return new NessieViewOperations(
+      ContentKey.of(
+          org.projectnessie.model.Namespace.of(identifier.namespace().levels()), tr.getName()),
+      client.withReference(tr.getReference(), tr.getHash()),
+      fileIO);
+}
+
+@Override
+public List listViews(Namespace namespace) {
+  return client.listViews(namespace);
+}
+
+@Override
+public boolean dropView(TableIdentifier identifier) {
+  TableReference tableReference = parseTableReference(identifier);
+  return client
+      .withReference(tableReference.getReference(), tableReference.getHash())
+      .dropView(identifierWithoutTableReference(identifier, tableReference), false);
+}
+
+@Override
+public void renameView(TableIdentifier from, TableIdentifier to) {
+  TableReference fromTableReference = parseTableReference(from);
+  TableReference toTableReference = parseTableReference(to);
+
+  validateReferenceForRename(fromTableReference, toTableReference, Content.Type.ICEBERG_VIEW);
+
+  TableIdentifier fromIdentifier =
+      NessieUtil.removeCatalogName(
+          identifierWithoutTableReference(from, fromTableReference), name());
+  TableIdentifier toIdentifier =
+      NessieUtil.removeCatalogName(identifierWithoutTableReference(to, toTableReference), name());
+  client
+      .withReference(fromTableReference.getReference(), fromTableReference.getHash())
+      .renameView(fromIdentifier, toIdentifier);
+}
+
+private void validateReferenceForRename(
+    TableReference fromTableReference, TableReference toTableReference, Content.Type type) {
+  String fromReference =
+      fromTableReference.hasReference()
+          ? fromTableReference.getReference()
+          : client.getRef().getName();
+  String toReference =
+      toTableReference.hasReference()
+          ? toTableReference.getReference()
+          : client.getRef().getName();
+  Preconditions.checkArgument(
+      fromReference.equalsIgnoreCase(toReference),
+      "Cannot rename %s '%s' on reference '%s' to '%s' on reference '%s':"
+          + " source and target references must be the same.",
+      NessieUtil.contentTypeString(type).toLowerCase(),

Review Comment: #9276
Re: [PR] Add doc for rewriting manifest with spec id [iceberg]
puchengy commented on PR #9253: URL: https://github.com/apache/iceberg/pull/9253#issuecomment-1850509409 @aokolnychyi Addressed your comment, PTAL, thanks!
Re: [I] DatasourceV2 does not prune columns after V2ScanRelationPushDown [iceberg]
rdblue commented on issue #9268: URL: https://github.com/apache/iceberg/issues/9268#issuecomment-1850720036 I don't think I'm following the logic here. Is there a case where you're not seeing columns being properly pruned?
Re: [PR] JDBC catalog fix namespaceExists check [iceberg]
dramaticlly commented on PR #8340: URL: https://github.com/apache/iceberg/pull/8340#issuecomment-1850721026 @rdblue @amogh-jahagirdar can you take another look? I think it's awesome if we can merge this fix.
Re: [I] DatasourceV2 does not prune columns after V2ScanRelationPushDown [iceberg]
akshayakp97 commented on issue #9268: URL: https://github.com/apache/iceberg/issues/9268#issuecomment-1850779601

Thanks for your response. I am looking at the TPCDS q16 physical plan for Iceberg on EMR.

Link to q16: https://github.com/apache/spark/blob/a78d6ce376edf2a8836e01f47b9dff5371058d4c/sql/core/src/test/resources/tpcds/q16.sql

The physical plan: https://gist.github.com/akshayakp97/102715c66eee44bc6f72493f427528f8

Line 46 projects only two columns, `Project [cs_warehouse_sk#54840, cs_order_number#54843L]`; however, it looks like Iceberg is scanning all columns of the `catalog_sales` table in line 47. Upon further digging, I found that the `ColumnPruning` [rule](https://github.com/apache/spark/blob/7db85642600b1e3b39ca11e41d4e3e0bf1c8962b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L859) adds the new `Project [cs_warehouse_sk#54840, cs_order_number#54843L]` operator, but we still see all columns read by the corresponding `BatchScanExec`.
Re: [PR] Core: Iceberg streaming streaming-skip-overwrite-snapshots SparkMicroBatchStream only skips over one file per trigger [iceberg]
singhpk234 commented on PR #8980: URL: https://github.com/apache/iceberg/pull/8980#issuecomment-1850795022

> Can you explain what is the purpose of using existingFilesCount here?

I am not fully aware of this. Logically I totally agree with you that it makes no sense to keep it, but what I am skeptical about is whether there is some handling happening due to backward compatibility (not sure about it either), and you also suggested that you are not able to populate this field (referring to your comment here: https://github.com/apache/iceberg/pull/8980#discussion_r1386580961). So my rationale was to add this change once we can add some coverage, or better, to take this change out of the current PR and involve more folks in the discussion, letting this PR go in as the functionality to skip overwrite commits only. Please let me know your thoughts.
Re: [PR] Spark 3.5: Parallelize file listing in add_files procedure [iceberg]
singhpk234 commented on code in PR #9274: URL: https://github.com/apache/iceberg/pull/9274#discussion_r1423038582

## docs/spark-procedures.md:
## @@ -639,6 +639,7 @@ Keep in mind the `add_files` procedure will fetch the Parquet metadata from each
 | `source_table` | ✔️ | string | Table where files should come from, paths are also possible in the form of \`file_format\`.\`path\` |
 | `partition_filter` | ️ | map | A map of partitions in the source table to import from |
 | `check_duplicate_files` | ️ | boolean | Whether to prevent files existing in the table from being added (defaults to true) |
+| `listing_parallelism` | | long | The parallelism to use when listing files (defaults to 1) |

Review Comment: can this be an int?
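Independent of the Spark implementation in PR #9274, the idea behind `listing_parallelism` — fanning partition listings out over a bounded worker pool instead of listing sequentially — can be sketched as follows (the lister below is a fake in-memory one; the real procedure lists files from the source table's storage):

```python
from concurrent.futures import ThreadPoolExecutor


def list_partition(partition: str) -> list:
    # Fake lister standing in for a real filesystem/object-store listing call.
    return [f"{partition}/file-{i}.parquet" for i in range(2)]


def list_all(partitions: list, parallelism: int = 1) -> list:
    """List all partitions, using up to `parallelism` concurrent listers."""
    with ThreadPoolExecutor(max_workers=max(1, parallelism)) as pool:
        per_partition = list(pool.map(list_partition, partitions))
    # flatten per-partition results into one file list
    return [f for files in per_partition for f in files]
```

Listing is I/O-bound, so a thread pool is the natural fit; `parallelism=1` preserves the old sequential behavior, which is why the parameter defaults to 1 in the proposed docs row above.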
Re: [I] DatasourceV2 does not prune columns after V2ScanRelationPushDown [iceberg]
akshayakp97 commented on issue #9268: URL: https://github.com/apache/iceberg/issues/9268#issuecomment-1850797730

After `ColumnPruning` adds the new `Project [cs_warehouse_sk#54840, cs_order_number#54843L]`, when the `V2ScanRelationPushDown` rule triggers, it doesn't match the [`ScanOperation`](https://github.com/apache/spark/blob/7db85642600b1e3b39ca11e41d4e3e0bf1c8962b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala#L352-L353) pattern, because the scan operator for `catalog_sales` in the logical plan seems to have been updated in previous iterations of the `V2ScanRelationPushDown` rule, which resulted in the conversion of `ScanBuilderHolder` to `DataSourceV2ScanRelation`. To summarize: in this case it looks like a new `Project` was added after the creation of `DataSourceV2ScanRelation` in the logical plan, causing it not to prune columns.
Re: [I] DatasourceV2 does not prune columns after V2ScanRelationPushDown [iceberg]
akshayakp97 commented on issue #9268: URL: https://github.com/apache/iceberg/issues/9268#issuecomment-1850822846 In general, if a `Project` is added after the execution of the `V2ScanRelationPushDown` rule, how do the columns get pruned? Or do we not expect any new `Project`s at that point?
Re: [PR] Support parameterized tests at class-level with JUnit5 [iceberg]
GianlucaPrincipini commented on code in PR #9161: URL: https://github.com/apache/iceberg/pull/9161#discussion_r1423154765 ## api/src/test/java/org/apache/iceberg/TestHelpers.java: ## @@ -173,6 +178,60 @@ public static void assertSameSchemaMap(Map map1, Map
[PR] Core: Add StandardEncryptionManager [iceberg]
rdblue opened a new pull request, #9277: URL: https://github.com/apache/iceberg/pull/9277 This is an update to https://github.com/apache/iceberg/pull/6884 with a few final fixes to the `StandardEncryptionManager` and related classes. This is almost entirely @ggershinsky's work, I'm just submitting this to unblock merging since he's OOO right now.
[PR] Build: Bump mkdocs-material from 9.5.1 to 9.5.2 [iceberg-python]
dependabot[bot] opened a new pull request, #203: URL: https://github.com/apache/iceberg-python/pull/203

Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.5.1 to 9.5.2.

Release notes (sourced from mkdocs-material's releases):

mkdocs-material-9.5.2
- Fixed types for slugify settings in blog plugin config
- Fixed #6469: Horizontal scrollbars on MathJax containers

Changelog (sourced from mkdocs-material's CHANGELOG):

mkdocs-material-9.5.2+insiders-4.47.1 (2023-12-11)
- Improved editing experience for projects plugin
- Improved resilience of optimize and social plugin
- Fixed race condition when writing manifest in optimize and social plugin
- Fixed #6475: Logo not taking precedence over icon in social card
- Fixed #6399: Projects plugin doesn't pick up added/removed projects
- Fixed #6306: Projects plugin cache not correctly updated

mkdocs-material-9.5.2 (2023-12-11)
- Fixed types for slugify settings in blog plugin config
- Fixed #6469: Horizontal scrollbars on MathJax containers

mkdocs-material-9.5.1+insiders-4.47.0 (2023-12-08)
- Added support for staying on page when switching languages
- Added configurable logging capabilities to projects plugin
- Removed temporary warning on blog plugin authors file format change
- Fixed projects plugin logging messages twice on Linux systems
- Fixed projects plugin trying to hoist theme assets of divergent themes
- Fixed compatibility of optimize plugin and projects plugin
- Fixed compatibility of social plugin and projects plugin
- Fixed #6448: Code line selection broken for code blocks with custom ids
- Fixed #6437: Projects plugin crashing for certain site URL configurations
- Fixed #6414: Projects plugin doesn't prefix messages coming from projects

mkdocs-material-9.5.1 (2023-12-08)
- Updated Greek translations
- Fixed #6464: Privacy plugin cannot be enabled
- Fixed #6461: Sorting blog posts ignores time component in date

mkdocs-material-9.5.0 (2023-12-07)
- Merged Insiders features of 'Goat's Horn' funding goal
- Added privacy plugin: automatic downloading of external assets
- Added support for card grids and grid layouts
- Added support for improved tooltips
- Added support for content tabs anchor links (deep linking)
- Added support for automatic dark/light mode
- Added support for document contributors

mkdocs-material-9.4.14+insiders-4.46.0 (2023-11-26)
- Added support for author profiles in blog plugin
- Fixed custom index pages yielding two navigation items (4.45.0 regression)

mkdocs-material-9.4.14 (2023-11-26)
... (truncated)

Commits:
- 2ca5cf0 Prepare 9.5.2 release
- 52feaba Merge branch 'master' of github.com:squidfunk/mkdocs-material
- 649db51 Updated dependencies
- 738dd7d Merge pull request #6486 from squidfunk/dependabot/github_actions/actions/set...
- ef00a47 Merge pull request #6487 from squidfunk/dependabot/github_actions/actions/dep...
- cc03f10 Bump actions/deploy-pages from 2 to 3
- 85a3298 Bump actions/setup-python from 4 to 5
- 419898a Documentation (#6477)
... (truncated)
[PR] Build: Bump typing-extensions from 4.8.0 to 4.9.0 [iceberg-python]
dependabot[bot] opened a new pull request, #204: URL: https://github.com/apache/iceberg-python/pull/204

Bumps [typing-extensions](https://github.com/python/typing_extensions) from 4.8.0 to 4.9.0.

Release notes (sourced from typing-extensions's releases):

4.9.0
This feature release adds typing_extensions.ReadOnly, as specified by PEP 705, and makes various other improvements, especially to @typing_extensions.deprecated(). There are no changes since 4.9.0rc1.

4.9.0rc1
- Add support for PEP 705, adding typing_extensions.ReadOnly. Patch by Jelle Zijlstra.
- All parameters on NewType.__call__ are now positional-only. This means that the signature of typing_extensions.NewType.__call__ now exactly matches the signature of typing.NewType.__call__. Patch by Alex Waygood.
- Fix bug with using @deprecated on a mixin class. Inheriting from a deprecated class now raises a DeprecationWarning. Patch by Jelle Zijlstra.
- @deprecated now gives a better error message if you pass a non-str argument to the msg parameter. Patch by Alex Waygood.
- @deprecated is now implemented as a class for better introspectability. Patch by Jelle Zijlstra.
- Exclude __match_args__ from Protocol members. Backport of python/cpython#110683 by Nikita Sobolev.
- When creating a typing_extensions.NamedTuple class, ensure __set_name__ is called on all objects that define __set_name__ and exist in the values of the NamedTuple class's class dictionary. Patch by Alex Waygood, backporting python/cpython#111876.
- Improve the error message when trying to call issubclass() against a Protocol that has non-method members. Patch by Alex Waygood (backporting python/cpython#112344, by Randolph Scholz).

The CHANGELOG entries for 4.9.0 (December 9, 2023) and 4.9.0rc1 (November 29, 2023) repeat the release notes above.

Commits:
- fc461d6 Release 4.9.0 (#313)
- f82d636 Prepare release 4.9.0rc1 (#306)
- daa7931 Run typed-argument-parser tests on 3.12 in the daily workflow (#307)
- 0b0166d Add support for PEP 705 (#284)
- db6f9b4 Update @deprecated implementation (#302...
... (truncated)
[PR] Core, API: Move SQLViewRepresentation to API [iceberg]
amogh-jahagirdar opened a new pull request, #9278: URL: https://github.com/apache/iceberg/pull/9278 This change is similar to the one done a while back for `BaseViewVersion`. SQLViewRepresentation is moved from the core module to the API module (since it is a concept that's apparent in the spec). This stems from this thread https://github.com/apache/iceberg/pull/9247/files#r1419759456
Re: [PR] Core, API: Move SQLViewRepresentation to API [iceberg]
amogh-jahagirdar commented on code in PR #9278: URL: https://github.com/apache/iceberg/pull/9278#discussion_r1423218929

## core/src/main/java/org/apache/iceberg/view/BaseSQLViewRepresentation.java:
## @@ -0,0 +1,29 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.view;
+
+import org.immutables.value.Value;
+
+@Value.Immutable
+@SuppressWarnings("ImmutablesStyle")
+@Value.Style(
+    typeImmutable = "ImmutableSQLViewRepresentation",
+    visibilityString = "PUBLIC",
+    builderVisibilityString = "PUBLIC")
+interface BaseSQLViewRepresentation extends SQLViewRepresentation {}

Review Comment: Package-private `BaseSQLViewRepresentation` which extends the public `SQLViewRepresentation`. The generated Immutables class is public for preserving compatibility.
[I] Querying metadata tables for a branch or tag [iceberg]
wypoon opened a new issue, #9279: URL: https://github.com/apache/iceberg/issues/9279 ### Query engine _No response_ ### Question The documentation states that metadata tables can be queried with `TIMESTAMP AS OF` and `VERSION AS OF`. As tables can be queried for a branch or tag using `VERSION AS OF` as well, this suggests that metadata tables can be queried for a branch or tag this way too. However, I have found that `VERSION AS OF` seems to have an effect on only some metadata tables, not all. E.g., querying the `history` table always returns the history of the main branch. Which metadata tables support querying for a branch or tag (or snapshot, for that matter)? And which ones do not? For the ones that do not, is it simply that the support hasn't been implemented yet, or is there some inherent technical reason it is not supported? @jackye1995 @amogh-jahagirdar In any case, it would be helpful to document the current state of support.
Re: [PR] Core, API: Move SQLViewRepresentation to API [iceberg]
amogh-jahagirdar commented on code in PR #9278: URL: https://github.com/apache/iceberg/pull/9278#discussion_r1423222132 ## .palantir/revapi.yml: ## @@ -877,6 +877,10 @@ acceptedBreaks: - code: "java.field.serialVersionUIDChanged" new: "field org.apache.iceberg.util.SerializableMap.serialVersionUID" justification: "Serialization is not be used" +- code: "java.method.visibilityReduced" + old: "method org.apache.iceberg.view.ImmutableSQLViewRepresentation org.apache.iceberg.view.ImmutableSQLViewRepresentation::copyOf(org.apache.iceberg.view.SQLViewRepresentation)" + new: "method org.apache.iceberg.view.ImmutableSQLViewRepresentation org.apache.iceberg.view.ImmutableSQLViewRepresentation::copyOf(org.apache.iceberg.view.BaseSQLViewRepresentation)" + justification: "Immutables generated copyOf visibility changed" Review Comment: I think there must have been some behavior change in Immutables between the original code generation for SQLViewRepresentation and now. Immutables now generates a package-private copyOf (confirmed with ViewVersion). However, prior to this change the generated copyOf for SQLViewRepresentation was actually public. This is technically breaking to downgrade the copyOf, but I don't see any options in the `Immutables` docs to control this behavior. Before merging this though, I'll explore a few options.
Re: [PR] Core, API: Move SQLViewRepresentation to API [iceberg]
amogh-jahagirdar closed pull request #9278: Core, API: Move SQLViewRepresentation to API URL: https://github.com/apache/iceberg/pull/9278
Re: [PR] Core, API: Move SQLViewRepresentation to API [iceberg]
amogh-jahagirdar commented on PR #9278: URL: https://github.com/apache/iceberg/pull/9278#issuecomment-1851051133 Looks like some unrelated Flink tests failed, noting here: ``` TestIcebergSourceWithWatermarkExtractor > testThrottling FAILED java.lang.AssertionError: Expecting
Re: [PR] Core, API: Move SQLViewRepresentation to API [iceberg]
amogh-jahagirdar commented on PR #9278: URL: https://github.com/apache/iceberg/pull/9278#issuecomment-1851053058 Ah I see the flink failure was fixed recently in https://github.com/apache/iceberg/pull/9216, I just need to rebase
Re: [PR] Spark 3.5: Parallelize file listing in add_files procedure [iceberg]
amogh-jahagirdar commented on code in PR #9274: URL: https://github.com/apache/iceberg/pull/9274#discussion_r1423238036 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkTableUtil.java: ## @@ -530,14 +515,16 @@ private static void importUnpartitionedSparkTable( * @param spec a partition spec * @param stagingDir a staging directory to store temporary manifest files * @param checkDuplicateFiles if true, throw exception if import results in a duplicate data file + * @param listingParallelism the parallelism to use when listing files */ public static void importSparkPartitions( SparkSession spark, List partitions, Table targetTable, PartitionSpec spec, String stagingDir, - boolean checkDuplicateFiles) { + boolean checkDuplicateFiles, + int listingParallelism) { Review Comment: Important: This is a public utility; we can't just add a parameter without breaking users who may be relying on it (and this API has existed for a while). It's better to duplicate and add a new method (and then leverage the implementation of the parallel one in the original one). ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkTableUtil.java: ## @@ -417,33 +427,6 @@ public static void importSparkTable( } } - /** - * Import files from an existing Spark table to an Iceberg table. - * - * The import uses the Spark session to get table metadata. It assumes no operation is going on - * the original and target table and thus is not thread-safe. - * - * @param spark a Spark session - * @param sourceTableIdent an identifier of the source Spark table - * @param targetTable an Iceberg table where to import the data - * @param stagingDir a staging directory to store temporary manifest files - * @param checkDuplicateFiles if true, throw exception if import results in a duplicate data file - */ - public static void importSparkTable( - SparkSession spark, Review Comment: Important: this is a public utility; we want to maintain compatibility and not remove it.
If there's a reason we want to deprecate it, it should go down a deprecation path. But I don't see why we would want to deprecate this. ## data/src/main/java/org/apache/iceberg/data/TableMigrationUtil.java: ## @@ -215,11 +215,11 @@ private static DataFile buildDataFile( .build(); } - private static ExecutorService migrationService(int concurrentDeletes) { + private static ExecutorService migrationService(int numThreads) { Review Comment: Nit: This seems like an unnecessary name change to me; there are other places in the code where concurrentX refers to the parallelism we want, and I don't see a need to change that convention.
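The compatibility concern above can be sketched as follows (a minimal illustration with hypothetical names, not the actual `SparkTableUtil` signatures): the existing public method keeps its signature and delegates to a new overload that takes the extra parameter, so already-compiled callers are unaffected.

```java
public class ImportSketch {
    // Existing public signature: unchanged, so compiled callers keep working.
    public static int importPartitions(String stagingDir, boolean checkDuplicateFiles) {
        // Delegate with the previous behavior (sequential listing, parallelism 1).
        return importPartitions(stagingDir, checkDuplicateFiles, 1);
    }

    // New overload: adds listing parallelism without breaking the old method.
    public static int importPartitions(
            String stagingDir, boolean checkDuplicateFiles, int listingParallelism) {
        // A real implementation would list and import files here; this sketch
        // just returns the parallelism so the delegation is observable.
        return listingParallelism;
    }
}
```

The same pattern is why removing the public `importSparkTable` overload outright would be a breaking change: existing call sites compiled against the old signature would fail to link.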
Re: [PR] Core, API: Move SQLViewRepresentation to API [iceberg]
amogh-jahagirdar commented on code in PR #9278: URL: https://github.com/apache/iceberg/pull/9278#discussion_r1423222132 ## .palantir/revapi.yml: ## @@ -877,6 +877,10 @@ acceptedBreaks: - code: "java.field.serialVersionUIDChanged" new: "field org.apache.iceberg.util.SerializableMap.serialVersionUID" justification: "Serialization is not be used" +- code: "java.method.visibilityReduced" + old: "method org.apache.iceberg.view.ImmutableSQLViewRepresentation org.apache.iceberg.view.ImmutableSQLViewRepresentation::copyOf(org.apache.iceberg.view.SQLViewRepresentation)" + new: "method org.apache.iceberg.view.ImmutableSQLViewRepresentation org.apache.iceberg.view.ImmutableSQLViewRepresentation::copyOf(org.apache.iceberg.view.BaseSQLViewRepresentation)" + justification: "Immutables generated copyOf visibility changed" Review Comment: I think there must have been some behavior change in Immutables between the original code generation for SQLViewRepresentation and now. Immutables now generates a package-private copyOf (confirmed with ViewVersion). However, prior to this change the generated copyOf for SQLViewRepresentation was actually public. This is technically breaking to downgrade the copyOf but I don't see any options in the `Immutables` docs to control this behavior. Before merging this though, I'll explore a few options. Since the view APIs are newer we have more flexibility here -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Quickstart should give an example of a REST catalog enablement in the spark iceberg docker compose setup [iceberg]
github-actions[bot] commented on issue #7615: URL: https://github.com/apache/iceberg/issues/7615#issuecomment-1851097951 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'
Re: [I] Quickstart should give an example of a REST catalog enablement in the spark iceberg docker compose setup [iceberg]
github-actions[bot] closed issue #7615: Quickstart should give an example of a REST catalog enablement in the spark iceberg docker compose setup URL: https://github.com/apache/iceberg/issues/7615
Re: [I] Is is possible to control the number of partitions (groups) for compaction ? [iceberg]
github-actions[bot] commented on issue #7506: URL: https://github.com/apache/iceberg/issues/7506#issuecomment-1851097979 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Re: [I] Inconsistent API for remove_orphan_files and DeleteOrphanFiles [iceberg]
github-actions[bot] commented on issue #7480: URL: https://github.com/apache/iceberg/issues/7480#issuecomment-1851098001 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Re: [I] DecimalType declaration check missing proper assertions [iceberg]
github-actions[bot] commented on issue #7420: URL: https://github.com/apache/iceberg/issues/7420#issuecomment-1851098029 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Re: [PR] Spark 3.5: Parallelize file listing in add_files procedure [iceberg]
manuzhang commented on code in PR #9274: URL: https://github.com/apache/iceberg/pull/9274#discussion_r1423283454 ## data/src/main/java/org/apache/iceberg/data/TableMigrationUtil.java: ## @@ -215,11 +215,11 @@ private static DataFile buildDataFile( .build(); } - private static ExecutorService migrationService(int concurrentDeletes) { + private static ExecutorService migrationService(int numThreads) { Review Comment: However, this service is not for deletes. It might have been copied from somewhere.
Re: [PR] Spark 3.5: Parallelize file listing in add_files procedure [iceberg]
manuzhang commented on code in PR #9274: URL: https://github.com/apache/iceberg/pull/9274#discussion_r1423305940 ## docs/spark-procedures.md: ## @@ -639,6 +639,7 @@ Keep in mind the `add_files` procedure will fetch the Parquet metadata from each | `source_table` | ✔️| string | Table where files should come from, paths are also possible in the form of \`file_format\`.\`path\` | | `partition_filter` | ️ | map | A map of partitions in the source table to import from | | `check_duplicate_files` | ️ | boolean | Whether to prevent files existing in the table from being added (defaults to true) | +| `listing_parallelism` | | long| The parallelism to use when listing files (defaults to 1) | Review Comment: There's no `asInt` method in `ProcedureInput`. Maybe I should add one?
Re: [PR] Spark 3.5: Add Spark application id to summary of RewriteDataFilesSparkAction [iceberg]
manuzhang commented on PR #9273: URL: https://github.com/apache/iceberg/pull/9273#issuecomment-1851176081 @amogh-jahagirdar @nastra could you please take a look?
Re: [I] Documentation [iceberg-rust]
liurenjie1024 commented on issue #114: URL: https://github.com/apache/iceberg-rust/issues/114#issuecomment-1851182139 cc @Xuanwo
Re: [PR] Spark 3.5: Parallelize file listing in add_files procedure [iceberg]
amogh-jahagirdar commented on code in PR #9274: URL: https://github.com/apache/iceberg/pull/9274#discussion_r1423322127 ## data/src/main/java/org/apache/iceberg/data/TableMigrationUtil.java: ## @@ -215,11 +215,11 @@ private static DataFile buildDataFile( .build(); } - private static ExecutorService migrationService(int concurrentDeletes) { + private static ExecutorService migrationService(int numThreads) { Review Comment: Ah, you are right. I confused myself; in that case this name change is a good change. Thanks for fixing this!
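For reference, the `migrationService` pattern under discussion boils down to a bounded pool sized by the requested thread count, used to fan listing work out in parallel. A minimal sketch (the `"listed:"` task body and class name are hypothetical, not the actual TableMigrationUtil code):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class ParallelListingSketch {
    // numThreads is a thread count (the listing parallelism), not a delete count.
    static ExecutorService migrationService(int numThreads) {
        return Executors.newFixedThreadPool(numThreads);
    }

    // Lists each directory on the pool and collects results in input order.
    static List<String> listAll(List<String> dirs, int numThreads) {
        ExecutorService pool = migrationService(numThreads);
        try {
            return dirs.stream()
                .map(dir -> CompletableFuture.supplyAsync(() -> "listed:" + dir, pool))
                .collect(Collectors.toList()) // submit everything before joining
                .stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }
}
```

Submitting all tasks before joining any of them is what lets the listings overlap; joining inside the first stream would serialize the work.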
Re: [I] Documentation [iceberg-rust]
Xuanwo commented on issue #114: URL: https://github.com/apache/iceberg-rust/issues/114#issuecomment-1851232081 > Just curious, what would be the relationship with docs of crates on docs.rs? From the ASF perspective, anything outside of `apache.org` does not exist. We should host our guide and API documentation on `rust.iceberg.apache.org` as the single source of truth. The documents on `docs.rs` are merely for the convenience of Rust users.
[I] Create Branches / TAGS between 2 snapshots [iceberg]
fanaticjo opened a new issue, #9280: URL: https://github.com/apache/iceberg/issues/9280 ### Feature Request / Improvement Is there a way to create a branch/tag based on two snapshot IDs, or only on the latest data? We have a use case where we write a monthly generated report to an Iceberg table, and for every month we want to tag/branch the data for audit purposes. Currently a branch/tag contains the data from that snapshot back to the first snapshot. ### Query engine Spark
[I] Create Branches / TAGS between 2 snapshots [iceberg]
fanaticjo opened a new issue, #9281: URL: https://github.com/apache/iceberg/issues/9281 ### Feature Request / Improvement Is there a way to create a branch/tag based on two snapshot IDs, or only on the latest data? We have a use case where we write a monthly generated report to an Iceberg table, and for every month we want to tag/branch the data for audit purposes. Currently a branch/tag contains the data from that snapshot back to the first snapshot. If this is possible, please let us know; otherwise, let us know if we can contribute it. ### Query engine Spark
Re: [I] Unable to merge CDC data into snapshot data. java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Long [iceberg]
harshith-bolar-rapido commented on issue #8333: URL: https://github.com/apache/iceberg/issues/8333#issuecomment-1851252377 Any update on this? Facing the same issue with 1.2.1
Re: [I] Unable to merge CDC data into snapshot data. java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Long [iceberg]
chandu-1101 commented on issue #8333: URL: https://github.com/apache/iceberg/issues/8333#issuecomment-1851283161 Abandon Iceberg, Abandon Hudi. Both are useless frameworks. Go with Parquet and spark sql. Your life will be simple and happy
Re: [PR] Data: Add GenericFileWriterFactory [iceberg]
szehon-ho commented on code in PR #9267: URL: https://github.com/apache/iceberg/pull/9267#discussion_r1423408972 ## flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/source/TestMetadataTableReadableMetrics.java: ## @@ -219,27 +219,27 @@ public void testPrimitiveColumns() throws Exception { Row binaryCol = Row.of( -59L, Review Comment: I probably should have originally picked these up from the file itself, instead of hard-coding them, but it's a suggestion for another PR.
[PR] Hive: Use jUnit5 based HiveMetastoreExtension with Hive catalog tests [iceberg]
nk1506 opened a new pull request, #9282: URL: https://github.com/apache/iceberg/pull/9282 (no comment)
Re: [PR] Hive: Refactor TestHiveCatalog tests to use the core CatalogTests [iceberg]
nk1506 commented on code in PR #8918: URL: https://github.com/apache/iceberg/pull/8918#discussion_r1423500567 ## hive-metastore/src/test/java/org/apache/iceberg/hive/HiveMetastoreTest.java: ## @@ -31,6 +31,10 @@ import org.junit.jupiter.api.AfterAll; import org.junit.jupiter.api.BeforeAll; +/* + * This meta-setup has been deprecated; use {@link HiveMetastoreExtension} instead. + * */ +@Deprecated Review Comment: Created a [PR](https://github.com/apache/iceberg/pull/9282) with jUnit5.
Re: [PR] Set HadoopFileIO as default for HadoopCatalog [iceberg]
agrawalreetika commented on PR #9283: URL: https://github.com/apache/iceberg/pull/9283#issuecomment-1851415341 @nastra, could you please help me with the review on this?